Making WordPress.org

Opened 6 years ago

Closed 6 years ago

#4068 closed defect (bug) (fixed)

Change 503 behaviour on paginated states

Reported by: jonoaldersonwp Owned by: dd32
Milestone: Priority: high
Component: General Keywords: seo
Cc:

Description (last modified by SergeyBiryukov)

Requests for paginated states (i.e., /page/n/) by known bots are intentionally served a 503 HTTP status (and a ‘raw’ error template); this is managed by the load balancer.

This is a critical issue for SEO, and must be altered as a matter of urgency.

Full context, rationale, supporting information and details (including longer-term improvements) can be found here: https://docs.google.com/document/d/1smvevKgp29CP9mop7s8F-q8WhMo7cTipEWAqS2wehWI/edit?usp=sharing

Short-term actions:

  • Reduce the maximum number of paginated results for logged-out users. At the moment, we stop pagination at page 100 (even if there are more results than this) and return a 404 error. We should reduce this threshold to 30 for logged-out users. Requests to page > 30 should return a 404 error.
    • See #3985 for making the load balancer return a 'friendly' error page in these scenarios.
    • We should add a notice at the end of page 30, indicating that people can search for more/different results if they haven’t found what they’re looking for.
  • We should increase the number of results on each page (currently 14) to 20 to offset the reduction in crawling/discovery.
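As a rough check on how these two short-term actions interact: the number of results a logged-out visitor (or crawler) can reach by paging alone is the page cap multiplied by the results per page. The figures below are the ones quoted in this description; the helper name `reachable_results` is hypothetical.

```python
# Hypothetical helper: results reachable through pagination alone,
# given a page cap and a page size.
def reachable_results(max_pages: int, per_page: int) -> int:
    return max_pages * per_page


current = reachable_results(100, 14)   # current cap and page size, as quoted above
proposed = reachable_results(30, 20)   # proposed cap and page size

print(current, proposed)  # 1400 600
```

On these numbers, the larger page size only partly offsets the lower cap, which leaves the search notice on page 30 as the route to everything else.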

Change History (15)

This ticket was mentioned in Slack in #meta by joostdevalk.


6 years ago

#2 @SergeyBiryukov
6 years ago

  • Description modified (diff)

#3 @SergeyBiryukov
6 years ago

To clarify, this is about the plugin directory, right?

#4 @jonoaldersonwp
6 years ago

Yes, though I suspect that the same behaviour might be implemented elsewhere?

#6 @dd32
6 years ago

@jonoaldersonwp In general, a ticket per affected application is easier, as some issues are easy to fix in one place, but impossible in others, etc.

#7 @dd32
6 years ago

Just noting for reference: currently, bot pagination blocking applies to any request with /page/ in its URI, on all WordPress.org domains, for pages >= 50. This may change at any time in response to increased load from bots.

I think dropping all logged out users to a maximum of 50 pages is acceptable and shouldn't be too much extra logic. It'd need to affect Plugin Directory, Theme Directory, and Support Forums.
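A rough model of the load-balancer rule described above: a bot request on a WordPress.org domain whose path contains /page/N, with N >= 50, is blocked. The ticket doesn't say how the load balancer identifies bots, so `is_known_bot` is a hypothetical stand-in, and the threshold is a parameter because, as noted, it may change. All function names here are illustrative.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Threshold quoted in the comment above; it "may change at any time".
PAGE_BLOCK_THRESHOLD = 50

# Matches a /page/N segment anywhere in the URL path.
PAGE_RE = re.compile(r"/page/(\d+)(?:/|$)")


def page_number(url: str) -> Optional[int]:
    """Return N from a /page/N/ path segment, or None if there isn't one."""
    match = PAGE_RE.search(urlsplit(url).path)
    return int(match.group(1)) if match else None


def is_known_bot(user_agent: str) -> bool:
    # Hypothetical stand-in: the ticket doesn't say how bots are identified.
    return "bot" in user_agent.lower()


def is_blocked(url: str, user_agent: str,
               threshold: int = PAGE_BLOCK_THRESHOLD) -> bool:
    """True if a bot request for a high page on a WordPress.org domain would be blocked."""
    host = urlsplit(url).hostname or ""
    on_wporg = host == "wordpress.org" or host.endswith(".wordpress.org")
    page = page_number(url)
    return (on_wporg and page is not None and page >= threshold
            and is_known_bot(user_agent))
```

Under this model, a Googlebot request for `/plugins/browse/popular/page/50/` is blocked, the same request for page 49 is not, and a regular browser user agent is never blocked by this rule.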

This ticket was mentioned in Slack in #meta by sergey.


6 years ago

#9 @joostdevalk
6 years ago

> I think dropping all logged out users to a maximum of 50 pages is acceptable and shouldn't be too much extra logic. It'd need to affect Plugin Directory, Theme Directory, and Support Forums.

Let's do that!

#10 @SergeyBiryukov
6 years ago

  • Owner set to SergeyBiryukov
  • Status changed from new to accepted

#11 @dd32
6 years ago

  • Owner changed from SergeyBiryukov to dd32

I'm currently testing a plugin to do this.

#12 @dd32
6 years ago

In [8174]:

Add a mu-plugin which limits logged out users to 49 pages of content.

This is being done for SEO purposes, as search crawlers are blocked from accessing highly paginated results, and the linked 404 causes problems.

This may be reverted/disabled if it causes issues (including reduced discoverability).

See #4068.
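The commit message doesn't include the mu-plugin's code, so what follows is only a language-neutral sketch of the rule it describes, written in Python; the real mu-plugin is WordPress (PHP) code and its internals may differ. It assumes, as the description proposes, that requests beyond the cap get a 404. `MAX_LOGGED_OUT_PAGES` and `status_for_page` are hypothetical names.

```python
from http import HTTPStatus

# Logged-out users may see pages 1..49 (the cap named in the commit message).
MAX_LOGGED_OUT_PAGES = 49


def status_for_page(page: int, logged_in: bool) -> HTTPStatus:
    """Return the HTTP status a paginated request would get under this rule.

    Assumes page >= 1; beyond the cap, logged-out requests are treated
    as not found (assumption, following the ticket description).
    """
    if not logged_in and page > MAX_LOGGED_OUT_PAGES:
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.OK
```

Capping at 49 keeps every page a logged-out visitor can reach below the load balancer's >= 50 pagination block, so crawlers should no longer follow links to pages that are then blocked.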

#13 @dd32
6 years ago

Just noting that there were two different bot blocks in place:

  • A requests-per-second limit, returning a 503 (now a 429, with increased limits)
  • A pagination limit for requests to /page/50 and above, returning a 404

I originally missed the first block, which is what this ticket was actually about, and thought it was the second. Limiting the paginated results to ~50 pages "solves" the latter, and also reduces the number of URLs to be crawled, which should help with the first block (fewer requests overall).
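The ticket doesn't state the actual requests-per-second limits, so the limit below is a placeholder. A minimal per-client sliding-window limiter illustrates the changed behaviour of the first block: once a client exceeds its per-second allowance, it receives 429 Too Many Requests (where it previously got a 503). `RateLimiter` is a hypothetical class, not the load balancer's implementation.

```python
import time
from collections import defaultdict, deque
from http import HTTPStatus
from typing import Callable, Deque, Dict


class RateLimiter:
    """Sliding one-second window per client; the limit is a placeholder."""

    def __init__(self, max_per_second: int,
                 clock: Callable[[], float] = time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, client: str) -> HTTPStatus:
        now = self.clock()
        window = self.hits[client]
        # Forget requests that fall outside the last second.
        while window and now - window[0] >= 1.0:
            window.popleft()
        if len(window) >= self.max_per_second:
            return HTTPStatus.TOO_MANY_REQUESTS  # 429; previously a 503
        window.append(now)
        return HTTPStatus.OK
```

The injectable `clock` makes the window easy to test. The fewer paginated URLs a crawler has to fetch, the less often it hits a limit like this one, which is the indirect benefit described above.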

There are also a few issues with the high-pagination block as implemented: Google IS indexing highly paginated URLs (under multiple variations, due to junk args appended to the URL). However, that doesn't seem like something we want to encourage, given that the requests-per-second block is being triggered.
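One way to see the "multiple variations" problem is to collapse URL variants to a single canonical form before comparing or de-duplicating them. The sketch below lowercases the host, forces a trailing slash, and drops the query string and fragment. Which arguments count as junk on WordPress.org isn't stated in the ticket, so dropping all of them is an assumption, and `canonical_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Collapse URL variants: lowercase host, trailing slash, no query or fragment."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    # Assumption: every query argument is junk for paginated archive URLs.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))
```

With this, `https://WordPress.org/plugins/browse/popular/page/12/?utm_source=x` and `https://wordpress.org/plugins/browse/popular/page/12?foo=bar#top` both collapse to the same URL, so they would count as one page rather than two.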

#14 @SergeyBiryukov
6 years ago

> We should increase the number of results on each page (currently 14) to 20 to offset the reduction in crawling/discovery.

Looks like this is the only item left here.

  • The Plugin Directory uses the standard posts_per_page option from Reading Settings. The value has been updated from 14 to 20.
  • In the Theme Directory, themes load via infinite scrolling at 24 per page, which seems good enough as is. Note: the limit in [8174] does not apply there, as themes are requested via the Themes API rather than as part of a regular page request.
  • On the Support Forums, there are 30 topics per page (see #2480), which also seems good enough.

#15 @dd32
6 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

I think we're done here.

There's a similar pending ticket for the bland error pages: #3985.
