Opened 6 years ago
Closed 6 years ago
#4068 closed defect (bug) (fixed)
Change 503 behaviour on paginated states
| Reported by: | jonoaldersonwp | Owned by: | dd32 |
|---|---|---|---|
| Milestone: | | Priority: | high |
| Component: | General | Keywords: | seo |
| Cc: | | | |
Description (last modified by )
Requests for paginated states (i.e., /page/n/) by known bots are intentionally served a 503 HTTP status (and a ‘raw’ error template); this is managed by the load balancer.
This is a critical issue for SEO, and must be altered as a matter of urgency.
Full context, rationale, supporting information and details (including longer-term improvements) can be found here: https://docs.google.com/document/d/1smvevKgp29CP9mop7s8F-q8WhMo7cTipEWAqS2wehWI/edit?usp=sharing
Short-term actions:
- Reduce the maximum number of paginated results for logged-out users. At the moment, we stop pagination at page 100 (even if there are more results than this) and return a 404 error. We should reduce this threshold to 30 for logged-out users. Requests to page > 30 should return a 404 error.
- See #3985 for making the load balancer return a 'friendly' error page in these scenarios.
- We should add a notice at the end of page 30, indicating that people can search for more/different results if they haven’t found what they’re looking for.
- We should increase the number of results on each page (currently 14) to 20 to offset the reduction in crawling/discovery.
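For illustration, the first short-term action amounts to a simple status-code rule. The Python below is a minimal sketch, not the WordPress.org implementation. The function `pagination_status` and the constant names are hypothetical, and the sketch assumes that logged-in users keep the existing page-100 cap while logged-out users get the proposed page-30 cap.

```python
# Hypothetical sketch of the proposed rule; not the actual WordPress.org code.
CURRENT_MAX_PAGE = 100     # existing cap from the ticket; assumed to remain for logged-in users
LOGGED_OUT_MAX_PAGE = 30   # proposed cap for logged-out users


def pagination_status(page: int, logged_in: bool) -> int:
    """Return the HTTP status a /page/{page}/ request would get under the proposal."""
    cap = CURRENT_MAX_PAGE if logged_in else LOGGED_OUT_MAX_PAGE
    # Anything past the cap is "not found", even if more results exist.
    return 404 if page > cap else 200


print(pagination_status(30, logged_in=False))  # 200
print(pagination_status(31, logged_in=False))  # 404
print(pagination_status(31, logged_in=True))   # 200
```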
Change History (15)
This ticket was mentioned in Slack in #meta by joostdevalk.
6 years ago
#5
6 years ago
Also found here: https://en-gb.wordpress.org/themes/browse/popular/page/2/.
#6
6 years ago
@jonoaldersonwp In general, a ticket per affected application is easier, as an issue can be easy to fix in one place but impossible in another.
#7
6 years ago
Just noting for reference: currently, the bot pagination blocking applies to any request with /page/ in its URI, on all WordPress.org domains, for pages >= 50. This may change at any time in response to increased load from bots.
I think dropping all logged-out users to a maximum of 50 pages is acceptable and shouldn't be too much extra logic. It'd need to affect Plugin Directory, Theme Directory, and Support Forums.
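The bot pagination block described above can be approximated as a URI check. This is a hedged Python sketch only: the real rule lives in the load balancer configuration, which the ticket does not show, and the names `is_blocked_pagination`, `PAGE_RE` and `BOT_PAGE_THRESHOLD` are hypothetical. It assumes the page number is the run of digits immediately following `/page/`.

```python
import re

# Hypothetical threshold mirroring the ticket; the real value "may change at any time".
BOT_PAGE_THRESHOLD = 50

# Matches "/page/<digits>" anywhere in the URI, e.g. /plugins/browse/popular/page/51/
PAGE_RE = re.compile(r"/page/(\d+)")


def is_blocked_pagination(uri: str, threshold: int = BOT_PAGE_THRESHOLD) -> bool:
    """True if the URI contains /page/N with N >= threshold."""
    match = PAGE_RE.search(uri)
    return match is not None and int(match.group(1)) >= threshold


print(is_blocked_pagination("/plugins/browse/popular/page/49/"))  # False
print(is_blocked_pagination("/plugins/browse/popular/page/50/"))  # True
print(is_blocked_pagination("/themes/browse/popular/page/2/"))    # False
```

Because the threshold may change in response to bot load, the sketch keeps it as a parameter rather than hard-coding it in the function body.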
This ticket was mentioned in Slack in #meta by sergey.
6 years ago
#9
6 years ago
> I think dropping all logged-out users to a maximum of 50 pages is acceptable and shouldn't be too much extra logic. It'd need to affect Plugin Directory, Theme Directory, and Support Forums.
Let's do that!
#11
6 years ago
- Owner changed from SergeyBiryukov to dd32
I'm currently testing a plugin to do this.
#13
6 years ago
Just noting that there were two different bot blocks in place:
- A requests-per-second limit returning 503 (now a 429, with increased limits)
- A pagination limit returning 404 for /page/50 and above
I originally missed the first block, which this ticket was actually about, and thought it was the second. Limiting paginated results to ~50 pages "solves" the latter, and also reduces the number of URLs to be crawled, which should help with the first block (fewer requests overall).
There are also a few issues with the high-pagination block as implemented: Google IS indexing highly paginated URLs (under multiple variations, due to junk args appended to the URL). However, that doesn't seem like something we want to encourage, given that the requests-per-second block is being triggered.
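The first block (the requests-per-second limit, now answered with 429) can be modelled with a fixed-window counter. This Python sketch illustrates the general technique under assumptions; it is not the load balancer's actual algorithm. `FixedWindowLimiter` and `max_rps` are hypothetical names, and the ticket states neither the real limit nor how clients are identified.

```python
class FixedWindowLimiter:
    """Hypothetical per-client requests-per-second limiter.

    Counts each client's requests in fixed one-second windows and answers 429
    once the client exceeds max_rps within the current window. The limit is a
    placeholder; the real value is not given in the ticket.
    """

    def __init__(self, max_rps: int):
        self.max_rps = max_rps
        self.window: dict[str, int] = {}  # client -> current window (whole second)
        self.count: dict[str, int] = {}   # client -> requests seen in that window

    def status(self, client: str, now: float) -> int:
        second = int(now)
        if self.window.get(client) != second:
            # New one-second window for this client: reset its counter.
            self.window[client] = second
            self.count[client] = 0
        self.count[client] += 1
        return 429 if self.count[client] > self.max_rps else 200


limiter = FixedWindowLimiter(max_rps=2)
print(limiter.status("bot", 100.0))  # 200
print(limiter.status("bot", 100.5))  # 200
print(limiter.status("bot", 100.9))  # 429
print(limiter.status("bot", 101.0))  # 200 (new window)
```

A fixed window is the simplest choice; it permits short bursts around window boundaries, which sliding-window or token-bucket limiters smooth out.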
#14
6 years ago
> We should increase the number of results on each page (currently 14) to 20 to offset the reduction in crawling/discovery.
Looks like this is the only item left here.
- The Plugin Directory uses the standard `posts_per_page` option from Reading Settings. The value has been updated from 14 to 20.
- In the Theme Directory, there's infinite loading with 24 themes per page, which seems good enough as is. Note: the limit in [8174] does not apply there, as themes are requested via the Themes API, not as part of a regular request.
- On the Support Forums, there are 30 topics per page (see #2480), which also seems good enough.
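To put the per-page numbers above in context: with a page cap, the most items a crawler can reach by paging is roughly cap × per-page. The Python below just does that arithmetic. `reachable_results` is a hypothetical helper, and the sketch assumes the ~50-page cap discussed in this ticket applies to both the Plugin Directory and the Support Forums (the Theme Directory is excluded, per the note above).

```python
def reachable_results(max_page: int, per_page: int) -> int:
    """Upper bound on items reachable by paging through pages 1..max_page."""
    return max_page * per_page


# Assumption: the ~50-page cap applies to both listings below.
PAGE_CAP = 50
print("Plugin Directory, 14/page:", reachable_results(PAGE_CAP, 14))  # 700
print("Plugin Directory, 20/page:", reachable_results(PAGE_CAP, 20))  # 1000
print("Support Forums, 30/page:", reachable_results(PAGE_CAP, 30))    # 1500
```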
To clarify, this is about the plugin directory, right?