Opened 4 years ago

Last modified 4 years ago

#5381 new defect (bug)

Prevent broken internal links in plugin readme content

Reported by: jonoaldersonwp Owned by:
Milestone: Priority: low
Component: Plugin Directory Keywords: seo


Plugin authors occasionally include broken or malformed links in their readme content.

This harms our UX and SEO.

We can (partially) prevent this by checking for the presence of broken 'internal' links when a plugin is updated.

Specifically, on plugin update we should:

  • Make a request to each link in the text which has a hostname matching or a subdomain.
  • If it returns a 404, transform the <a> to a <s> (maintaining tag attributes and content).
  • Notify the plugin author (details/mechanism TBD).
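A minimal Python sketch of the first two steps follows. It rests on several assumptions. The internal hostname is passed in as `base_host`, a hypothetical parameter. `status_of` is a hypothetical callable that returns an HTTP status code. Anchors are matched with a regular expression rather than a full HTML parser, so the sketch expects well-formed, non-nested `<a>` tags. The names `is_internal` and `strike_broken_links` are illustrative and are not part of the Plugin Directory code.

```python
import re
from typing import Callable
from urllib.parse import urlparse

# Matches an <a ...>...</a> element, capturing its attributes and inner content.
# Simplification (assumption): anchors are well-formed and not nested.
ANCHOR_RE = re.compile(r"<a(\s[^>]*)?>(.*?)</a>", re.IGNORECASE | re.DOTALL)
# Matches a standalone href attribute (not e.g. data-href) and captures its value.
HREF_RE = re.compile(r"""(?:^|\s)href\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def is_internal(url: str, base_host: str) -> bool:
    """True if the URL's hostname equals base_host or is a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    base = base_host.lower()
    return host == base or host.endswith("." + base)


def strike_broken_links(html: str, base_host: str,
                        status_of: Callable[[str], int]) -> tuple[str, list[str]]:
    """Turn internal <a> links that return 404 into <s>, keeping attributes and content.

    Returns the rewritten HTML plus the broken URLs, e.g. for notifying the author.
    """
    broken: list[str] = []

    def replace(match: re.Match) -> str:
        attrs = match.group(1) or ""
        content = match.group(2)
        href = HREF_RE.search(attrs)
        if href is None:
            return match.group(0)  # no href: leave the anchor untouched
        url = href.group(1)
        # External links are never requested (is_internal short-circuits).
        if is_internal(url, base_host) and status_of(url) == 404:
            broken.append(url)
            return f"<s{attrs}>{content}</s>"
        return match.group(0)

    return ANCHOR_RE.sub(replace, html), broken
```

In practice `status_of` might issue an HTTP HEAD request (for example via `urllib.request`). It is injected here so the sketch runs without network access. The returned `broken` list is one possible input for the third step, whose mechanism is still TBD.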

Change History (4)

This ticket was mentioned in Slack in #meta by jonoaldersonwp.

4 years ago

#2 @dd32
4 years ago

Did some digging.

There are 35,061 links in published plugin readmes that link to, of which 17,902 are unique. I didn't sanitize the URLs, so a number of them differ only by a trailing slash and others only by http vs https (70% were SSL'd).
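Deduplicating the raw link list before counting would merge the trailing-slash and http/https variants mentioned above into one entry. The following is a minimal Python sketch. It assumes that forcing https, lowercasing the host, stripping trailing slashes, and dropping fragments is an acceptable definition of "same URL". The helper names `normalize` and `unique_urls` are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Collapse http/https, host-case, and trailing-slash variants of a URL.

    Assumption: the fragment (#...) is irrelevant and is dropped; the query is kept.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"  # a bare host normalizes to "/"
    return urlunsplit(("https", parts.netloc.lower(), path, parts.query, ""))


def unique_urls(urls: Iterable[str]) -> list[str]:
    """Return the sorted set of normalized URLs."""
    return sorted({normalize(u) for u in urls})
```

Forcing https is a choice specific to this sketch. A stricter check could request both variants before merging them.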

Of those URLs, here are the response codes:

  • 200: 17,607 URLs
  • 404: 295 URLs

Of the 404s, some were parsing failures on my part. I've put the remaining 246 links into a spreadsheet:

So yeah, 0.7% of links to from plugin pages are currently 404s. Most of them point to URLs that haven't been in use for a long time and should be redirects, if anything. The remaining few (downloads, and 14 support forum links) can probably stay as 404s.
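The comment does not say which denominator produced the 0.7% figure. A rough consistency check follows, assuming that figure is the 246 remaining 404 links measured against all 35,061 links.

```latex
\begin{align*}
17{,}607 + 295 &= 17{,}902 \quad \text{(unique URLs)} \\
\frac{246}{35{,}061} &\approx 0.0070 = 0.70\% \\
\frac{246}{17{,}902} &\approx 0.0137 \approx 1.4\% \quad \text{(against unique URLs instead)}
\end{align*}
```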

This ticket was mentioned in Slack in #meta by tellyworth.

4 years ago

#4 @jonoaldersonwp
4 years ago

I've reviewed that list and identified a subset of URLs that can be safely and simply redirected. Could we get those redirects implemented?

URLs marked as 'Phase 2' or 'Misc' require further consideration, as they come with additional complexity.
