WordPress.org

Making WordPress.org

Opened 4 days ago

Last modified 30 hours ago

#4684 new defect

Trac 'reports' crawl and indexing controls

Reported by: jonoaldersonwp
Owned by:
Milestone:
Priority: low
Component: Trac
Keywords: seo analytics
Cc:

Description (last modified by jonoaldersonwp)

Trac 'reports' pages, like https://core.trac.wordpress.org/report/5, require the following improvements:

  • Remove default sort (and other) parameters from pagination links. E.g., the link to 'page 2' in this state includes a redundant asc=1 parameter. This is the default behaviour, and it should be removed.
  • Add a canonical URL tag, referencing the current report and paginated state, omitting the (now removed) asc=1 parameter.
  • Update the page title to include the paginated state (E.g., {5} Next Major Release – Page 1 of 5 – WordPress Trac)
  • Add (or modify the output of) a meta robots tag with a value of noindex, follow on all requests which include the following parameters: max, sfp_email, sfph_mail, sort, USER, asc

Note that these must all be deployed as a single change. A partial/selective implementation of these may make things worse.
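
To illustrate the first two points, here is a minimal sketch of how a canonical URL could be derived by dropping default-valued parameters while keeping the paginated state. It is not Trac code: the function name, the `page` parameter name, and the contents of `DEFAULT_PARAMS` are assumptions for illustration (only asc=1 is named above as a default).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameter values that merely restate the report's default state.
# Only asc=1 comes from the ticket; page=1 is a guess at the first-page default.
DEFAULT_PARAMS = {"asc": "1", "page": "1"}


def canonical_report_url(url):
    """Return url without default-valued parameters, keeping everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        # Drop a parameter only when it is set to its default value.
        if DEFAULT_PARAMS.get(key) != value
    ]
    # Rebuild the URL with the filtered query string and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_report_url("https://core.trac.wordpress.org/report/5?asc=1&page=2"))
```

The same function could be applied to the pagination links themselves, so that the links and the canonical tag always agree.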

Change History (4)

#1 @jonoaldersonwp
4 days ago

  • Description modified (diff)
  • Priority changed from low to normal

#2 in reply to: ↑ description @dd32
35 hours ago

  • Priority changed from normal to low

  • Remove default sort (and other) parameters from pagination links. E.g., the link to 'page 2' in this state includes a redundant asc=1 parameter. This is the default behaviour, and it should be removed.

That's built-in Trac behaviour. I don't think we'll be modifying that; it requires too much digging into the way Trac works.

  • Add a canonical URL tag, referencing the current report and paginated state, omitting the (now removed) asc=1 parameter.

For some reports, that won't be possible due to the complicated things that happen behind the scenes in Trac.
For example, take a look at the reports on the Themes Trac, for an idea of what future reports on core.trac may look like.

  • Update the page title to include the paginated state (E.g., {5} Next Major Release – Page 1 of 5 – WordPress Trac)

Due to the way Trac templates work, while we can filter the title, it's significantly harder to insert content into the middle of it. Prefixes and suffixes are doable, but won't work with the current structure (which we also can't change).

  • Add (or modify the output of) a meta robots tag with a value of noindex, follow on all requests which include the following parameters: max, sfp_email, sfph_mail, sort, USER, asc

The sfp args are present on a lot of requests, and as you've noted are mostly useless. The USER parameter, and a few others similar to it, completely change the page's content into a new report, though.
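
For reference, the rule as written in the ticket could be sketched roughly as below. This is only an illustration, not Trac code: the function name is hypothetical, and it treats every listed parameter the same way, so it does not account for parameters like USER that produce a different report.

```python
from urllib.parse import parse_qs, urlsplit

# Parameter names copied from the ticket description (matching is case-sensitive).
NOINDEX_PARAMS = {"max", "sfp_email", "sfph_mail", "sort", "USER", "asc"}


def robots_meta(url):
    """Return a meta robots tag for url, or None when no listed parameter is present."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # Any listed parameter in the query string triggers noindex, follow.
    if NOINDEX_PARAMS.intersection(query):
        return '<meta name="robots" content="noindex, follow">'
    return None


print(robots_meta("https://core.trac.wordpress.org/report/5?asc=1&page=2"))
```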

Most of these things seem like low priority tasks which can be done individually, but don't really seem like they'll improve the user experience, so are going to remain as a low priority fix in the scheme of Trac things.

#3 @jonoaldersonwp
30 hours ago

As I said, these can't be rolled out as individual changes, they need to be done as a single release. Otherwise, we'll make more mess than we resolve. The only exception is the final point, which would still be valuable to implement on its own.

These aren't intended to make a significant difference to user experience; they're intended to reduce crawl budget waste and to consolidate value into more important areas of the site.

#4 @dd32
30 hours ago

As I said, these can't be rolled out as individual changes, they need to be done as a single release.

That's why I responded to each of them. If it's all-or-nothing, then it's a nothing for the foreseeable future, crawl budget or not.
