Opened 5 years ago
Last modified 3 years ago
#5105 new defect (bug)
Remove bot blocking (403 responses) on *.trac.wordpress.org sites.
| Reported by: | jonoaldersonwp | Owned by: | |
|---|---|---|---|
| Milestone: | | Priority: | high |
| Component: | Trac | Keywords: | seo |
| Cc: | | | |
Description
We have systems in place which actively prevent Google (and other agents?) from accessing *.trac.wordpress.org sites/URLs. We return a 403 response (and a raw NGINX template) in these scenarios.
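For illustration, the blocking is presumably something along the lines of the following NGINX pattern. This is a hedged sketch only: the actual wp.org configuration isn't public, and the variable name and user-agent patterns here are hypothetical.

```nginx
# Hypothetical sketch of the kind of user-agent blocking presumably in place.
# The real wp.org configuration is not shown in this ticket; names and
# patterns here are illustrative only.
map $http_user_agent $blocked_bot {
    default                 0;
    ~*(googlebot|bingbot)   1;  # match common crawler user agents
}

server {
    server_name core.trac.wordpress.org;

    location / {
        # A blanket 403 like this also covers /robots.txt, so crawlers
        # never see the rules telling them not to crawl.
        if ($blocked_bot) {
            return 403;
        }
        # ... proxy to Trac ...
    }
}
```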
This 'solution' prevents these agents from seeing/accessing the robots.txt file on those respective sites, and thus results in them continuing to attempt to crawl/index them (especially as these URLs are heavily linked to throughout the wp.org ecosystem).
I propose that we remove the 403 behaviour, and rely on the robots.txt file to do its job.
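As a minimal sketch of what "doing its job" means here, assuming the goal is simply to stop compliant crawlers (the actual contents of the *.trac.wordpress.org robots.txt aren't shown in this ticket):

```
# Illustrative only; not the real *.trac.wordpress.org robots.txt.
# Served with a 200, this is what tells compliant crawlers to stay away;
# a 403 on this URL means they never receive these rules.
User-agent: *
Disallow: /
```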
If we believe that it's necessary to restrict crawling for performance reasons, then we can consider tailoring the robots.txt rule(s) to be more restrictive, and/or implementing performance improvements throughout the site(s), of which many are available and achievable, both front-end and back-end.
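As a hedged example of "more restrictive" rules, crawling could be permitted on cheap pages (e.g. tickets) while the expensive dynamic Trac views stay blocked. The paths below follow standard Trac URL conventions; the exact set would need tuning:

```
# Sketch: allow ticket pages, block the expensive dynamic Trac views.
User-agent: *
Disallow: /changeset
Disallow: /log
Disallow: /browser
Disallow: /timeline
Disallow: /search
Disallow: /query
Disallow: /report
Disallow: /attachment
```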
Change History (4)
#2 @ 5 years ago
- Priority changed from normal to high
@afercia rightly points out that the current behaviour is likely to negatively impact contributors' ability to contribute, as they rely on Google (either through internal or external site search) to find tickets/issues and related files. Upgrading the priority accordingly.
NB: It looks like this might be tied to some rate-limiting logic. That doesn't change anything, though; this should still be removed.
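If rate limiting is genuinely needed, it can be achieved without a blanket 403. A sketch using standard NGINX directives follows; the zone name, rate, burst, and paths are placeholders, not the actual wp.org values:

```nginx
# Sketch only: throttle request bursts instead of hard-blocking crawlers.
# Zone name, rate, and burst values are placeholders.
limit_req_zone $binary_remote_addr zone=trac_limit:10m rate=2r/s;

server {
    server_name core.trac.wordpress.org;

    # Always let crawlers fetch the crawl rules themselves.
    location = /robots.txt {
        root /var/www/trac;  # serve the static file directly
    }

    location / {
        limit_req zone=trac_limit burst=10 nodelay;
        limit_req_status 429;  # "Too Many Requests" rather than 403
        # ... proxy to Trac ...
    }
}
```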