Opened 7 years ago
Closed 7 years ago
#4068 closed defect (bug) (fixed)
Change 503 behaviour on paginated states
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | | Priority: | high |
| Component: | General | Keywords: | seo |
| Cc: | | | |
Description
Requests for paginated states (i.e., /page/n/) by known bots are intentionally served a 503 HTTP status (and a ‘raw’ error template); this is managed by the load balancer.
This is a critical issue for SEO, and must be altered as a matter of urgency.
Full context, rationale, supporting information and details (including longer-term improvements) can be found here: https://docs.google.com/document/d/1smvevKgp29CP9mop7s8F-q8WhMo7cTipEWAqS2wehWI/edit?usp=sharing
Short-term actions:
- Reduce the maximum number of paginated results for logged-out users. At the moment, we stop pagination at page 100 (even if there are more results than this) and return a 404 error. We should reduce this threshold to 30 for logged-out users. Requests to page > 30 should return a 404 error.
- See #3985 for making the load balancer return a 'friendly' error page in these scenarios.
- We should add a notice at the end of page 30, indicating that people can search for more/different results if they haven’t found what they’re looking for.
- We should increase the number of results on each page (currently 14) to 20 to offset the reduction in crawling/discovery.
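The short-term actions above can be sketched roughly as follows. This is a minimal Python illustration, not the WordPress.org implementation: the function and constant names are hypothetical, the limits (100 pages currently, 30 proposed for logged-out users, 14 vs. 20 results per page) come from the bullets above, and treating page 100 itself as still valid is an assumption. It also ignores whether a given page actually has results.

```python
# Hypothetical sketch of the proposed pagination cap; all names are illustrative.
MAX_PAGE_DEFAULT = 100     # current cap described in the ticket (assumed inclusive)
MAX_PAGE_LOGGED_OUT = 30   # proposed cap for logged-out users


def pagination_status(page: int, logged_in: bool) -> int:
    """Return the HTTP status a request for /page/<page>/ would receive."""
    if page < 1:
        return 404
    limit = MAX_PAGE_DEFAULT if logged_in else MAX_PAGE_LOGGED_OUT
    return 200 if page <= limit else 404


def reachable_results(per_page: int, max_page: int) -> int:
    """Upper bound on the number of results reachable through pagination."""
    return per_page * max_page
```

With a 30-page cap, 14 results per page exposes at most 420 results, while 20 per page exposes at most 600, which is the offset the last bullet is aiming for.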
Change History (15)
This ticket was mentioned in Slack in #meta by joostdevalk.
7 years ago
#5
@
7 years ago
Found here, also: https://en-gb.wordpress.org/themes/browse/popular/page/2/.
#6
@
7 years ago
@jonoaldersonwp In general, a ticket per affected application is easier, as some issues are easy to fix in one place, but impossible in others, etc.
#7
@
7 years ago
Just noting for reference: currently, the bot pagination blocking applies to any request with /page/ in its URI on all WordPress.org domains for pages >= 50, but this may change at any time in response to increased load from bots.
I think dropping all logged out users to a maximum of 50 pages is acceptable and shouldn't be too much extra logic. It'd need to affect Plugin Directory, Theme Directory, and Support Forums.
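As an illustration of the rule described in this comment, here is a minimal Python sketch that extracts the page number from a URI and checks it against the threshold of 50. The regular expression, function names, and the `is_known_bot` flag are assumptions made for the example; the real check lives in the load balancer and is not shown in this ticket.

```python
import re
from typing import Optional

# Matches the page number in paths such as /themes/browse/popular/page/2/
PAGE_RE = re.compile(r"/page/(\d+)(?:[/?#]|$)")

BOT_PAGE_THRESHOLD = 50  # per the comment above; subject to change


def page_number(uri: str) -> Optional[int]:
    """Return the pagination number in the URI, or None if it is not paginated."""
    match = PAGE_RE.search(uri)
    return int(match.group(1)) if match else None


def is_blocked_for_bots(uri: str, is_known_bot: bool) -> bool:
    """True when a known bot requests a page at or above the threshold."""
    number = page_number(uri)
    return is_known_bot and number is not None and number >= BOT_PAGE_THRESHOLD
```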
This ticket was mentioned in Slack in #meta by sergey.
7 years ago
#9
@
7 years ago
> I think dropping all logged out users to a maximum of 50 pages is acceptable and shouldn't be too much extra logic. It'd need to affect Plugin Directory, Theme Directory, and Support Forums.
Let's do that!
#11
@
7 years ago
- Owner changed from SergeyBiryukov to dd32
I'm currently testing a plugin to do this.
#13
@
7 years ago
Just noting that there were two different bot blocks in place:
- A requests per second limit = 503 (now a 429 and increased limits)
- A pagination limit for >= `/page/50` = 404
I originally missed the first block, which this ticket was actually about, and thought it was about the second. Limiting the paginated results to ~50 "solves" the latter point, and also reduces the number of URLs to be crawled, which should help with the first block (fewer requests overall).
There are also a few issues with the high-pagination block as implemented, as Google IS indexing high-paginated URLs (under multiple variations, due to junk args appended to the URL); however, it doesn't seem like something we want to encourage, given that the requests/second block is being triggered.
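To make the distinction between the two blocks concrete, here is a minimal Python sketch of a gate that returns 404 for high pagination and 429 when a per-second request limit is exceeded. Everything here is hypothetical: the class name, the default limits, the fixed one-second window, and the order of the two checks are assumptions for illustration, since the ticket does not describe how the load balancer implements them.

```python
from collections import defaultdict


class BotGate:
    """Toy model of the two bot blocks: a pagination limit and a rate limit."""

    def __init__(self, max_requests_per_second: int = 5, page_limit: int = 50):
        self.max_requests_per_second = max_requests_per_second  # assumed value
        self.page_limit = page_limit  # pages >= this return 404
        # (bot_id, whole second) -> requests seen; never pruned in this sketch
        self._counts = defaultdict(int)

    def status(self, bot_id: str, page: int, now: float) -> int:
        """Return the HTTP status for a bot request made at time `now` (seconds)."""
        if page >= self.page_limit:
            return 404  # pagination block
        window = (bot_id, int(now))
        self._counts[window] += 1
        if self._counts[window] > self.max_requests_per_second:
            return 429  # requests-per-second block (previously a 503)
        return 200
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read a clock and expire old windows.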
#14
@
7 years ago
> We should increase the number of results on each page (currently 14) to 20 to offset the reduction in crawling/discovery.
Looks like this is the only item left here.
- The Plugin Directory uses the standard `posts_per_page` option from Reading Settings. The value has been updated from 14 to 20.
- In Theme Directory, there's infinite loading with 24 themes per page, seems good enough as is. Note: the limit in [8174] does not apply there, as themes are requested via Themes API, not as a part of a regular request.
- On Support Forums, there are 30 topics per page (see #2480), also seems good enough.
To clarify, this is about the plugin directory, right?