Making WordPress.org

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#5938 closed defect (bug) (fixed)

Add an x-robots-tag header to md5/sha1 etc file URLs

Reported by: jonoaldersonwp Owned by:
Milestone: Priority: lowest
Component: WordPress.org Site Keywords: seo
Cc:

Description

We expose lots of URLs like this across wordpress.org:

https://wordpress.org/wordpress-5.6.3.zip.sha1
https://wordpress.org/wordpress-4.0.5.tar.gz.md5
https://br.wordpress.org/wordpress-4.7-pt_BR.zip.md5

These consume considerable crawl resources and often trip the 'soft 404' warning in Google Search Console.

We should manage this by adding an x-robots-tag header, with a value of noindex, follow, to all responses for URLs ending in .sha1, .md5 and similar.
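
For illustration only, here is a minimal sketch of the matching rule in shell. It assumes that just the two extensions named above are covered; the function name needs_noindex is hypothetical, and this is not the actual WordPress.org server configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a URL should be served with
# "x-robots-tag: noindex, follow". Only .sha1 and .md5 are handled;
# any further "similar" extensions would need to be added to the case.
needs_noindex() {
    url=${1%%\?*}   # drop any query string before matching
    case "$url" in
        *.sha1|*.md5) return 0 ;;
        *)            return 1 ;;
    esac
}

for u in \
    https://wordpress.org/wordpress-5.6.3.zip.sha1 \
    https://wordpress.org/wordpress-4.0.5.tar.gz.md5 \
    https://wordpress.org/wordpress-5.6.3.zip
do
    if needs_noindex "$u"; then
        echo "$u -> x-robots-tag: noindex, follow"
    else
        echo "$u -> no header"
    fi
done
```

Matching on the end of the path means a plain archive URL falls through to the default branch and is left alone.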

Change History (4)

#1 @dd32
3 years ago

This should not affect .zip$ or .gz$ links, correct?

What about https://downloads.wordpress.org/* links? Same as the above?

#2 @jonoaldersonwp
3 years ago

Ah, good question!

We can indeed safely ignore zip and gz links.

Yes please for downloads.wordpress.org!

#3 @dd32
3 years ago

  • Component changed from General to WordPress.org Site
  • Resolution set to fixed
  • Status changed from new to closed

Added.

$ curl -Is https://wordpress.org/wordpress-5.6.3.zip.sha1 | grep -i 'x-robots-tag'
x-robots-tag: noindex, follow

$ curl -Is https://wordpress.org/wordpress-5.6.3.zip | grep -i 'x-robots-tag'
# No output

$ curl -Is https://downloads.wordpress.org/release/en_AU/latest.zip.sha1 | grep -i 'x-robots-tag'
x-robots-tag: noindex, follow

$ curl -Is https://br.wordpress.org/wordpress-4.7-pt_BR.zip.md5 | grep -i 'x-robots-tag'
x-robots-tag: noindex, follow
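
The manual checks above could be wrapped in a small helper for repeated use. This is only a sketch: the function name has_noindex is hypothetical, and the demonstration feeds it canned headers rather than live responses.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin (for example
# from `curl -Is URL`) and succeed only if an x-robots-tag header
# containing "noindex" is present.
has_noindex() {
    tr -d '\r' | grep -i '^x-robots-tag:' | grep -qi 'noindex'
}

# Offline demonstration with canned headers; replace printf with
# `curl -Is "$url"` to check a live URL.
if printf 'HTTP/2 200\r\nx-robots-tag: noindex, follow\r\n\r\n' | has_noindex; then
    echo "noindex present"
fi
if ! printf 'HTTP/2 200\r\ncontent-type: application/zip\r\n\r\n' | has_noindex; then
    echo "noindex absent"
fi
```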

Plugins have a .json checksum file that I haven't added the header to, but those files are served with the proper content-type headers and aren't linked to, so I think they should be fine?

#4 @jonoaldersonwp
3 years ago

Nice one!
Yeah, not seeing any problems with the JSON files. Much appreciated!
