Making WordPress.org

Opened 15 months ago

Last modified 15 months ago

#6766 new enhancement

Update developer.wordpress.org/robots.txt to prevent spam attacks

Reported by: jonoaldersonwp
Owned by:
Milestone:
Priority: low
Component: General
Keywords: seo performance has-patch changes-requested
Cc:

Description

The developer site is being targeted by internal site-search spam attacks.

This eats into our crawl budget and floods our Google Search Console account, potentially blinding us to other issues.

We can reduce the impact by updating the site's robots.txt rules as follows, blocking search URL patterns (and adding some best practices while we're there).

# Prevent crawling of WP internals
# --------------------------------
User-agent: *
Disallow: /wp-admin/
Disallow: /?rest_route=
Disallow: /xmlrpc.php

# Prevent crawling of search URLs
# --------------------------------
Disallow: /?s=
Disallow: /search/
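
To make the effect of these rules concrete, here is a minimal Python sketch that checks a few URLs against the proposed Disallow prefixes. It assumes plain prefix matching on the path plus query string, which is enough here because none of the proposed rules use `*` or `$` wildcards. The helper names and the sample URLs are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit

# Disallow prefixes proposed above for developer.wordpress.org.
PROPOSED_DISALLOW = [
    "/wp-admin/",
    "/?rest_route=",
    "/xmlrpc.php",
    "/?s=",
    "/search/",
]


def path_and_query(url: str) -> str:
    """Return the part of the URL that robots.txt rules are matched against."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def is_blocked(url: str, disallow=PROPOSED_DISALLOW) -> bool:
    """True if any Disallow prefix matches the start of the path and query."""
    target = path_and_query(url)
    return any(target.startswith(prefix) for prefix in disallow)


if __name__ == "__main__":
    # Hypothetical URLs, not taken from the ticket.
    samples = [
        "https://developer.wordpress.org/?s=cheap+pills",
        "https://developer.wordpress.org/search/cheap-pills/",
        "https://developer.wordpress.org/reference/functions/get_posts/",
        "https://developer.wordpress.org/?foo=1&s=spam",
    ]
    for url in samples:
        print(is_blocked(url), url)
```

Because the rules are prefixes, `/?s=` only matches when `s` is the first query parameter, and `/search/` does not match a bare `/search` without the trailing slash.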

Change History (3)

#1
15 months ago

This ticket was mentioned in PR #121 on WordPress/wordpress.org by @tellyworth.

  • Keywords has-patch added

See https://meta.trac.wordpress.org/ticket/6766

This adds new rules to both https://wordpress.org/robots.txt and https://developer.wordpress.org/robots.txt.

Before, main site:

Sitemap: https://wordpress.org/sitemap.xml
Sitemap: https://wordpress.org/news-sitemap.xml
Sitemap: https://wordpress.org/themes/sitemap.xml
Sitemap: https://wordpress.org/plugins/sitemap.xml
Sitemap: https://wordpress.org/news/sitemap.xml
Sitemap: https://wordpress.org/showcase/sitemap.xml
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/load-scripts.php
Allow: /wp-admin/load-styles.php

User-agent: *
Disallow: /search
Disallow: /?s=

User-agent: *
Disallow: /plugins/search/

After, main site:

Sitemap: https://wordpress.org/sitemap.xml
Sitemap: https://wordpress.org/news-sitemap.xml
Sitemap: https://wordpress.org/themes/sitemap.xml
Sitemap: https://wordpress.org/plugins/sitemap.xml
Sitemap: https://wordpress.org/news/sitemap.xml
Sitemap: https://wordpress.org/showcase/sitemap.xml
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/load-scripts.php
Allow: /wp-admin/load-styles.php

User-agent: *
Disallow: /wp-admin/
Disallow: /?rest_route=
Disallow: /xmlrpc.php
Disallow: /search
Disallow: /?s=

User-agent: *
Disallow: /plugins/search/

Before, developer:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/load-scripts.php
Allow: /wp-admin/load-styles.php

After, developer:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/load-scripts.php
Allow: /wp-admin/load-styles.php

User-agent: *
Disallow: /wp-admin/
Disallow: /?rest_route=
Disallow: /xmlrpc.php
Disallow: /search
Disallow: /?s=
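
The "after" file for the developer site contains two `User-agent: *` groups and mixes Allow with Disallow. As a rough sketch of how a crawler following RFC 9309 would be expected to read it, the Python below merges every `*` group into one rule list and picks the longest matching prefix, with Allow winning a tie. This is a simplified illustration, not any crawler's actual implementation: it ignores `*` and `$` wildcards (none are used here), and the function names and test paths are hypothetical.

```python
AFTER_DEVELOPER = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/load-scripts.php
Allow: /wp-admin/load-styles.php

User-agent: *
Disallow: /wp-admin/
Disallow: /?rest_route=
Disallow: /xmlrpc.php
Disallow: /search
Disallow: /?s=
"""


def star_rules(robots_txt):
    """Collect (directive, prefix) pairs from every `User-agent: *` group."""
    rules = []
    applies = False         # does the current group name "*"?
    prev_was_agent = False  # consecutive User-agent lines share one group
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (p.strip() for p in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            applies = (applies and prev_was_agent) or value == "*"
            prev_was_agent = True
        else:
            prev_was_agent = False
            if field in ("allow", "disallow") and applies and value:
                rules.append((field, value))
    return rules


def is_allowed(rules, target):
    """Longest matching prefix wins; Allow wins when lengths are equal."""
    best = ("allow", "")  # no matching rule means the URL may be crawled
    for directive, prefix in rules:
        if not target.startswith(prefix):
            continue
        longer = len(prefix) > len(best[1])
        tie_for_allow = len(prefix) == len(best[1]) and directive == "allow"
        if longer or tie_for_allow:
            best = (directive, prefix)
    return best[0] == "allow"


if __name__ == "__main__":
    rules = star_rules(AFTER_DEVELOPER)
    for target in ["/wp-admin/admin-ajax.php", "/wp-admin/options.php",
                   "/?s=spam", "/search/spam/", "/reference/"]:
        print(is_allowed(rules, target), target)
```

Under these assumptions the duplicated `Disallow: /wp-admin/` line is harmless, and `/wp-admin/admin-ajax.php` stays crawlable because its Allow rule is longer than the Disallow prefix. Note that `Disallow: /search` has no trailing slash, so it also matches any other path that merely begins with `/search`.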

#2 @tellyworth
15 months ago

  • Keywords changes-requested added

@jonoaldersonwp: @dd32 pointed out in code review that part of this contradicts an earlier ticket, #5806:

https://github.com/WordPress/wordpress.org/pull/121#discussion_r1109379789

How should that be resolved? Can this be simplified to only include the search URLs?

#3 @jonoaldersonwp
15 months ago

I don't think there's a contradiction; #5806 is about wordpress.org and Rosetta sites. This issue should apply to developer.wordpress.org only.
