Making WordPress.org

Opened 7 months ago

Last modified 7 months ago

#5105 new defect

Remove bot blocking (403 responses) on *.trac.wordpress.org sites.

Reported by: jonoaldersonwp
Owned by:
Milestone:
Priority: high
Component: Trac
Keywords: seo


We have systems in place which actively prevent Google (and other agents?) from accessing *.trac.wordpress.org sites/URLs. We return a 403 response (and a raw NGINX template) in these scenarios.

This 'solution' prevents these agents from seeing/accessing the robots.txt file on those respective sites, and thus results in them continuing to attempt to crawl/index them (especially as these URLs are heavily linked to throughout the wp.org ecosystem).

I propose that we remove the 403 behaviour, and rely on the robots.txt file to do its job.

If we believe that it's necessary to restrict crawling for performance reasons, then we can tailor the robots.txt rule(s) to be more restrictive, and/or implement performance improvements throughout the site(s) (of which there are many available and achievable, both front-end and back-end).
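For illustration, a more restrictive robots.txt along these lines could reduce crawl load while still letting crawlers discover the rules. The paths and values below are assumptions for the sketch, not the actual Trac configuration:

```
# Hypothetical robots.txt for a *.trac.wordpress.org site.
# The disallowed paths are illustrative guesses at expensive Trac views.
User-agent: *
Disallow: /changeset/
Disallow: /log/
Disallow: /browser/
Disallow: /timeline
Disallow: /search
Crawl-delay: 10

# Keep ticket pages crawlable so contributors can find them via Google.
Allow: /ticket/
```

Unlike a blanket 403, this approach lets each crawler fetch the rules and then self-limit, rather than retrying blindly.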

Change History (2)

#1 @jonoaldersonwp
7 months ago

NB: It looks like this might be tied to some rate-limiting logic. That doesn't change anything, though; this should still be removed.
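If the 403s do originate from rate limiting, one alternative shape for the fix (a minimal nginx sketch; the zone name, rate, and thresholds are invented for illustration and not taken from the actual wp.org configuration) is to exempt robots.txt from the limiter, so crawlers can always read the rules, and to answer bursts with 429 rather than 403:

```
# Illustrative only: zone name, rate, and burst values are assumptions.
limit_req_zone $binary_remote_addr zone=trac_rl:10m rate=2r/s;

server {
    # Always serve robots.txt, bypassing the rate limiter,
    # so crawlers can discover the crawl rules.
    location = /robots.txt {
        try_files $uri =404;
    }

    location / {
        # 429 tells well-behaved crawlers to back off and retry later,
        # instead of the opaque 403 they currently receive.
        limit_req zone=trac_rl burst=10 nodelay;
        limit_req_status 429;
        # ...normal Trac proxy/fastcgi configuration here...
    }
}
```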

#2 @jonoaldersonwp
7 months ago

  • Priority changed from normal to high

@afercia rightly points out that the current behaviour is likely to negatively impact contributors' ability to contribute, as they rely on Google (either through internal or external site search) to find tickets/issues and related files. Upgrading the priority accordingly.
