Making WordPress.org

Opened 3 weeks ago

Last modified 3 weeks ago

#4559 new enhancement

Dedicated robots.txt file for translate.wordpress.org

Reported by: jonoaldersonwp
Owned by:
Milestone:
Priority: low
Component: Translate Site & Plugins
Keywords: seo

Description (last modified by jonoaldersonwp)

translate.wordpress.org consumes huge amounts of crawl budget for relatively little return. We'd like to block crawling of it entirely via robots.txt.

At the moment, it shares a robots.txt with other wordpress.org domains, which makes this impossible.

Can we give it a dedicated robots.txt file, separate from the other sites, with the following contents:

User-agent: *
Disallow: /*
Noindex: /*
Allow: /$
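A minimal Python sketch of how a crawler following RFC 9309 and Google's documented matching rules would read the proposed file. In those rules, `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and `allow` wins a tie. The names `parse_rules`, `rule_to_regex` and `is_allowed` are hypothetical. The sketch assumes a single `User-agent: *` group and skips `Noindex:`, because that directive is not part of RFC 9309 (Google stopped honouring it in September 2019).

```python
import re

# The rules proposed in this ticket, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*
Noindex: /*
Allow: /$
"""


def parse_rules(text):
    """Collect (directive, pattern) pairs, keeping only Allow/Disallow lines."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        key, _, value = line.partition(":")
        key = key.strip().lower()
        # Noindex is not an RFC 9309 directive, so this sketch ignores it.
        if key in ("allow", "disallow"):
            rules.append((key, value.strip()))
    return rules


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # a trailing '$' means "end of URL"
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters (slashes included); the rest is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Check a path (plus query string): longest pattern wins, Allow wins ties."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            # Tuple comparison: longer pattern first, then True (allow) > False.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]  # no matching rule means allowed


if __name__ == "__main__":
    rules = parse_rules(ROBOTS_TXT)
    for path in ("/", "/stats", "/?filter=all"):
        print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Run as a script, only `/` comes back allowed; every other path, including `?filter` URLs, is blocked, because `Allow: /$` (3 characters) outranks `Disallow: /*` (2 characters) only for the bare homepage.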

NB: We'll need to be absolutely certain that this is a standalone file that doesn't bleed through to any other WP domains/contexts, or we'll cause the end of the world.
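One way to keep the file from bleeding through is to select it by an exact, normalised match on the request's `Host` header and fall back to the shared file for everything else. This is a hypothetical Python sketch: `robots_for_host`, `normalise_host`, `TRANSLATE_ROBOTS` and `SHARED_ROBOTS` are illustrative names, and the ticket doesn't say how wordpress.org actually serves robots.txt.

```python
# Hypothetical names; the ticket does not describe how wordpress.org serves robots.txt.
TRANSLATE_HOST = "translate.wordpress.org"

# The dedicated file proposed in this ticket.
TRANSLATE_ROBOTS = "User-agent: *\nDisallow: /*\nNoindex: /*\nAllow: /$\n"

# Placeholder standing in for the existing shared robots.txt (contents not shown).
SHARED_ROBOTS = "# shared wordpress.org robots.txt goes here\n"


def normalise_host(host_header: str) -> str:
    """Lowercase the Host header and drop any port and trailing dot."""
    return host_header.split(":", 1)[0].strip().lower().rstrip(".")


def robots_for_host(host_header: str) -> str:
    """Serve the dedicated file only on an exact host match; otherwise fall back."""
    if normalise_host(host_header) == TRANSLATE_HOST:
        return TRANSLATE_ROBOTS
    # Every other host, including lookalike or malformed ones, gets the shared
    # file, so this function never serves the blanket Disallow elsewhere.
    return SHARED_ROBOTS


if __name__ == "__main__":
    for host in ("translate.wordpress.org", "wordpress.org", "make.wordpress.org"):
        kind = "dedicated" if robots_for_host(host) == TRANSLATE_ROBOTS else "shared"
        print(host, kind)
```

Matching exactly, rather than with a suffix or substring test, means a lookalike host such as `translate.wordpress.org.example.com` still gets the shared file, and unexpected input fails safe instead of disallowing everything.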

If/when this is complete, the ?filter rule can be removed from the shared/global robots.txt file.

Change History (3)

#1 @jonoaldersonwp
3 weeks ago

  • Description modified

#2 @ocean90
3 weeks ago

  • Priority changed from high to low
  • Type changed from defect to enhancement

There are probably a few pages that should still be indexed, like /stats, /consistency, or each /locale/$locale.
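Allowlisting those pages can be checked the same way. This hypothetical Python sketch repeats the longest-match logic (so the block stands alone) against a rule set that keeps `/`, `/stats`, `/consistency` and locale landing pages crawlable. Because `*` also matches `/`, `Allow: /locale/*/$` on its own would also allow any deeper URL ending in a slash, so the sketch adds a longer `Disallow: /locale/*/*/` that wins for deeper paths. The rule set and example paths such as `/locale/de/default/wp/dev/` are assumptions for illustration, not rules agreed in this ticket, and `Noindex:` is left out because only Allow/Disallow are modelled.

```python
import re

# Hypothetical allowlist variant of the proposed rules (Noindex left out).
ALLOWLIST_ROBOTS_TXT = """\
User-agent: *
Disallow: /*
Allow: /$
Allow: /stats
Allow: /consistency
Allow: /locale/*/$
Disallow: /locale/*/*/
"""


def parse_rules(text):
    """Keep (directive, pattern) pairs for Allow/Disallow lines only."""
    rules = []
    for line in text.splitlines():
        key, _, value = line.split("#", 1)[0].partition(":")
        key = key.strip().lower()
        if key in ("allow", "disallow"):
            rules.append((key, value.strip()))
    return rules


def matches(pattern, path):
    """'*' matches any characters (including '/'); a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(rules, path):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    hits = [(len(p), d == "allow") for d, p in rules if p and matches(p, path)]
    return max(hits)[1] if hits else True


if __name__ == "__main__":
    rules = parse_rules(ALLOWLIST_ROBOTS_TXT)
    for path in ("/", "/stats", "/locale/de/", "/locale/de/default/wp/dev/"):
        print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Dropping the final `Disallow` line makes `/locale/de/default/wp/dev/` come back allowed, which is the pitfall the extra, longer rule guards against.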

"We'd like to block crawling of it entirely"

Just out of curiosity, who is "we"?

#3 @jonoaldersonwp
3 weeks ago

Happy to add a small number of whitelisted pages. In this case, "we" is me and Joost.
