Making WordPress.org

Opened 19 months ago

Last modified 3 months ago

#5105 new defect

Remove bot blocking (403 responses) on *.trac.wordpress.org sites.

Reported by: jonoaldersonwp
Owned by:
Milestone:
Priority: high
Component: Trac
Keywords: seo


We have systems in place which actively prevent Google (and other agents?) from accessing *.trac.wordpress.org sites/URLs. We return a 403 response (and a raw NGINX template) in these scenarios.

This 'solution' prevents these agents from seeing/accessing the robots.txt file on the respective sites, and thus results in them continuing to attempt to crawl/index those sites (especially as these URLs are heavily linked to throughout the wp.org ecosystem).
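To illustrate why a 403 on robots.txt does not stop crawling, here is a minimal sketch of how RFC 9309 (section 2.3.1) tells a crawler to interpret the HTTP status of a robots.txt fetch. The function name and the returned labels are my own for illustration, and individual crawlers may deviate from the RFC in places.

```python
def robots_policy(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to a crawl policy (per RFC 9309)."""
    if 200 <= status < 300:
        # Successful access: the crawler parses and follows the rules in the file.
        return "parse"
    if 400 <= status < 500:
        # "Unavailable" (includes 403): the crawler MAY access any resource.
        return "allow_all"
    if 500 <= status < 600:
        # "Unreachable": the crawler MUST assume complete disallow.
        return "disallow_all"
    # Redirects are followed first; anything else is outside this sketch.
    return "undefined"
```

Under this reading, a 403 on robots.txt falls in the "unavailable" range, so a compliant crawler is free to treat the site as having no restrictions at all.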

I propose that we remove the 403 behaviour, and rely on the robots.txt file to do its job.

If we believe that it's necessary to restrict crawling behaviour for performance reasons, then we can consider tailoring the robots.txt rule(s) to be more restrictive, and/or implementing performance improvements throughout the site(s) (of which there are myriad available and achievable, both front-end and back-end).
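As a rough sketch of what "more restrictive" rules could look like, the snippet below checks a candidate robots.txt with Python's standard `urllib.robotparser`. The `Disallow` paths are hypothetical examples of expensive Trac views, not the current rules on any *.trac.wordpress.org site, and the ticket URL is only used as a sample.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep tickets crawlable, block assumed-expensive views.
ROBOTS_TXT = """\
User-agent: *
Disallow: /changeset/
Disallow: /log/
Disallow: /browser/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "https://meta.trac.wordpress.org"

# Ticket pages stay crawlable; changeset pages are disallowed.
ticket_allowed = parser.can_fetch("Googlebot", base + "/ticket/5105")
changeset_allowed = parser.can_fetch("Googlebot", base + "/changeset/123")
```

Checking rules this way before deploying them makes it easier to confirm that ticket pages remain discoverable while heavier endpoints are excluded.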

Change History (4)

#1 @jonoaldersonwp
19 months ago

NB: It looks like this might be tied to some rate-limiting logic. That doesn't change anything, though; this should still be removed.

#2 @jonoaldersonwp
19 months ago

  • Priority changed from normal to high

@afercia rightly points out that the current behaviour is likely to negatively impact the ability of contributors to contribute, as they rely on Google (either through internal or external site search) to find tickets/issues and related files. Upgrading the severity accordingly.

#3 @jonoaldersonwp
7 months ago

This has been stale for a year; how can we escalate addressing this?

This ticket was mentioned in Slack in #meta by jonoaldersonwp.

3 months ago
