Opened 2 months ago

Last modified 8 weeks ago

#5381 new defect

Prevent broken internal links in plugin readme content

Reported by: jonoaldersonwp
Owned by:
Milestone:
Priority: low
Component: Plugin Directory
Keywords: seo
Cc:

Description

Plugin authors occasionally include broken/malformed links to wordpress.org URLs in their readme content.

This harms our UX and SEO.

We can (partially) prevent this by checking for the presence of broken 'internal' links when a plugin is updated.

Specifically, on plugin update we should:

  • Make a request to each link in the text which has a hostname matching wordpress.org or a wordpress.org subdomain.
  • If it returns a 404, transform the <a> to a <s> (maintaining tag attributes and content).
  • Notify the plugin author (details/mechanism TBD).
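
The steps above can be sketched as follows. This is a minimal illustration in Python, not the Plugin Directory's actual code: the names `is_internal`, `strike_broken_links` and the `get_status` callback are hypothetical, and the regex-based matching assumes simple, well-formed `<a>` elements (a real implementation would more likely use an HTML parser). The HTTP request is passed in as a callback so the rewriting logic can be exercised without network access.

```python
import re
from typing import Callable
from urllib.parse import urlsplit

# Assumption: anchors are simple, non-nested <a ...>...</a> elements.
ANCHOR_RE = re.compile(r"<a\b([^>]*)>(.*?)</a>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r"""href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def is_internal(url: str) -> bool:
    """True if the hostname is wordpress.org or a wordpress.org subdomain."""
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:
        # Malformed URLs (e.g. broken IPv6 literals) are not treated as internal.
        return False
    return host == "wordpress.org" or host.endswith(".wordpress.org")


def strike_broken_links(
    html: str, get_status: Callable[[str], int]
) -> tuple[str, list[str]]:
    """Turn <a> into <s> for internal links whose status is 404.

    Tag attributes and content are kept unchanged. Returns the rewritten
    HTML plus the list of broken URLs, which could feed an author notification.
    """
    broken: list[str] = []

    def replace(match: re.Match) -> str:
        attrs, content = match.group(1), match.group(2)
        href = HREF_RE.search(attrs)
        if href is None:
            return match.group(0)
        url = href.group(2)
        # Only internal links are requested; external links are left alone.
        if is_internal(url) and get_status(url) == 404:
            broken.append(url)
            return f"<s{attrs}>{content}</s>"
        return match.group(0)

    return ANCHOR_RE.sub(replace, html), broken
```

In practice `get_status` would wrap whatever HTTP client the update process uses. Returning the list of broken URLs alongside the rewritten HTML keeps the notification mechanism (still TBD above) separate from the rewriting step.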

Change History (4)

This ticket was mentioned in Slack in #meta by jonoaldersonwp. View the logs.


2 months ago

#2 @dd32
2 months ago

Did some digging.

There are 35,061 links in published plugin readmes that point to WordPress.org, of which 17,902 are unique. I didn't sanitize the URLs, so some of the "unique" ones differ only by a trailing slash and others only by http vs https (70% were https).
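
For illustration, here is a small Python sketch of the kind of normalization that was skipped in the count above. The `normalize` helper and its rules are assumptions, not what was actually run: it forces https, lowercases the hostname, strips trailing slashes and drops fragments, so that variants of the same URL collapse to one entry.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Collapse http/https and trailing-slash variants of a URL.

    Assumed rules: scheme forced to https, hostname lowercased,
    trailing slashes stripped from the path, fragment dropped.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, parts.query, ""))


def count_unique(urls: list[str]) -> int:
    """Number of distinct URLs after normalization."""
    return len({normalize(u) for u in urls})
```

Whether a trailing slash is really insignificant depends on how each WordPress.org site routes its URLs, so the deduplicated count from a sketch like this would be an approximation.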

Here are the response codes for those URLs:

  • 200: 17,607 URLs
  • 404: 295 URLs

Some of the 404s were parsing failures on my part. I've put the remaining 246 links in a spreadsheet:
https://docs.google.com/spreadsheets/d/e/2PACX-1vTuzw6BkceSipfSubm8Q8aCkrdW16GHMjCpRluV9qvNZXbrsFXrs5WIighRrRfeILJrPjJMWSfDCbnm/pubhtml

So yeah, 0.7% of the links to WordPress.org from plugin pages currently return a 404. Most of those point to URLs that haven't been in use for a long time and should be redirects, if anything. The remaining few links (downloads, and 14 support forum links) can probably remain as 404s.
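
As a quick check of the figures quoted in this comment (a sketch only; the choice of denominator is an assumption, since the comment does not say which one was used), the 0.7% appears to correspond to the 246 remaining 404s measured against all 35,061 links rather than against the 17,902 unique URLs:

```python
# Figures quoted in the comment above.
total_links = 35_061
unique_urls = 17_902
status_200 = 17_607
status_404 = 295
remaining_404 = 246  # after discarding parsing failures

# The two response-code buckets account for every unique URL.
assert status_200 + status_404 == unique_urls

# Assumption: the quoted percentage uses all links as the denominator.
share_of_all_links = remaining_404 / total_links
share_of_unique_urls = remaining_404 / unique_urls

print(f"{share_of_all_links:.1%} of all links")
print(f"{share_of_unique_urls:.1%} of unique URLs")
```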

This ticket was mentioned in Slack in #meta by tellyworth. View the logs.


8 weeks ago

#4 @jonoaldersonwp
8 weeks ago

I've reviewed that list, and identified a subset of URLs which can be safely and simply redirected. Could we get those implemented?

URLs marked as 'Phase 2' or 'Misc' require further consideration, as they come with additional complexity.
