Making WordPress.org

Opened 3 years ago

Closed 13 months ago

#5458 closed defect (bug) (duplicate)

Redirects should not trigger 429 responses, and rate limiting should be relaxed.

Reported by: jonoaldersonwp Owned by:
Milestone: Priority: highest omg bbq
Component: General Keywords: seo
Cc:

Description (last modified by jonoaldersonwp)

Requests to https://wordpress.org/tags/privacy/page/15 are redirected to https://wordpress.org/support/topic-tag/privacy/page/15/.

However, the process involves several interstitial redirects (the change in path, and the appended trailing slash).

This frequently triggers a 429 error for Google (and presumably other systems).

To address this:

  • Rate limiting on 301 redirects should be removed (or radically relaxed).
  • System-level redirect rules (altering paths, appending trailing slashes, case normalisation) should be consolidated into a single redirect.
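The consolidation suggested above can be sketched roughly as follows. This is a minimal illustration, not how WordPress.org actually routes requests: the `PATH_RULES` table and the function names are hypothetical, and only the `/tags/` to `/support/topic-tag/` mapping is taken from this ticket. The idea is to apply every normalisation step in memory first, then issue at most one 301 to the final URL.

```python
import re

# Hypothetical rule table of (pattern, replacement) pairs.
# Only the /tags/ -> /support/topic-tag/ mapping comes from the ticket.
PATH_RULES = [
    (re.compile(r"^/tags/"), "/support/topic-tag/"),
]


def final_path(path: str) -> str:
    """Apply every normalisation step in memory and return the final path."""
    path = path.lower()                      # case normalisation (assumed safe here)
    for pattern, replacement in PATH_RULES:  # path changes
        path = pattern.sub(replacement, path, count=1)
    if not path.endswith("/"):               # trailing slash
        path += "/"
    return path


def redirect_for(path: str):
    """Return (301, target) if one redirect is needed, or None if the path is final."""
    target = final_path(path)
    return (301, target) if target != path else None


print(redirect_for("/tags/privacy/page/15"))
print(redirect_for("/support/topic-tag/privacy/page/15/"))
```

With this shape, a crawler requesting the old URL makes two requests (the redirect and the destination) rather than one per intermediate hop. Query strings are not handled in this sketch.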

This also appears to apply to any paginated request (e.g., https://wordpress.org/support/topic/my-newest-blog/page/2/), which makes this even more damaging, significant, and urgent to fix.

Given that we 'removed' paginated states beyond 50, the performance implications of this should be somewhat mitigated. If there are still performance concerns, there's no shortage of opportunities to optimise other parts of the site(s)/stack to recoup our losses.

Change History (13)

#1 @jonoaldersonwp
3 years ago

NB, I suspect this is responsible for hundreds of small oddities that I've been seeing but unable to diagnose over the past few months. Aside from fixing the underlying issue, this would clean up our reporting significantly (and thus make it easier to find/manage other issues).

#2 @jonoaldersonwp
3 years ago

As we gradually fix other areas, I'm seeing more evidence emerging that this is hurting crawling and discoverability. This remains a (very) high priority.

#3 @jonoaldersonwp
3 years ago

  • Description modified (diff)

This ticket was mentioned in Slack in #meta by jonoaldersonwp.

3 years ago

#5 @jonoaldersonwp
3 years ago

  • Description modified (diff)
  • Summary changed from Redirects should not trigger 429 responses to Redirects should not trigger 429 responses, and rate limiting should be relaxed.

#6 follow-up: @dd32
3 years ago

Just noting that it's impossible to rate limit redirects at a different rate than non-redirects.

Due to how nginx rate limiting works, the counters are updated before the request is processed, and the blocking is applied before the system processes the request. As a result, the module does not know whether the request will result in a redirect, either when bumping the request counters or when blocking the request.
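To illustrate the ordering described in this comment, here is a simplified leaky-bucket model. It is only a sketch under stated assumptions: it is not nginx's actual code, and the class, function, and parameter names (`LeakyBucket`, `handle`, `rate`, `burst`) are hypothetical. The point it shows is that the allow/deny decision is made before the backend runs, so the limiter cannot know whether the response would have been a 301 or a 200.

```python
class LeakyBucket:
    """Simplified per-client limiter: `rate` requests/second plus `burst` excess."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.excess = 0.0   # how far the client is ahead of the steady rate
        self.last = None    # time of the last accepted request

    def allow(self, now: float) -> bool:
        if self.last is None:
            self.last = now
            return True
        # Drain for the elapsed time, then count this request.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False    # rejected; state is left unchanged
        self.excess = excess
        self.last = now
        return True


def handle(limiter: LeakyBucket, now: float, backend) -> int:
    """Return the HTTP status for one request arriving at time `now`."""
    if not limiter.allow(now):
        return 429          # decided without ever calling the backend
    return backend()        # only here is the response status known


limiter = LeakyBucket(rate=1.0, burst=2)
always_redirect = lambda: 301   # hypothetical backend that always redirects
statuses = [handle(limiter, 0.0, always_redirect) for _ in range(4)]
print(statuses)  # [301, 301, 301, 429]
```

In this model a redirect consumes exactly the same budget as any other request, which is why the only levers are the rate and burst values or the request rate of the client.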

The only way to resolve that is to increase the allowed rate limits, or to fix the clients that request pages at too high a rate and ultimately trigger the blocking.

Consolidating redirects can be done to some extent (and has been), but the vast majority are either not worth the effort or almost impossible to consolidate, due to the varied places where the redirects happen, as has been explained many times.

#7 in reply to: ↑ 6 @dd32
3 years ago

Replying to dd32:

Due to how nginx rate limiting works...

Looks like I might've been wrong: server-level redirects probably bypass the rate limiting already, as per https://trac.nginx.org/nginx/ticket/1834. So any redirect requests that are being rate limited here are redirects from PHP, which means many of the redirects are already consolidated or unaffected by the rate limiting.

#8 @jonoaldersonwp
3 years ago

Interesting!
So, how do we unpick this?
Feels like the easiest fix would just be to (significantly) relax the rate-limiting for Google?

#9 @jonoaldersonwp
3 years ago

  • Priority changed from high to highest omg bbq

This is resulting in large numbers of errors in Google Search Console, and continues to contribute to the serious delays we see in discovery, crawling, indexing and consolidation.

#10 @jonoaldersonwp
3 years ago

This continues to be a critical problem for the whole .org ecosystem, resulting in huge volumes of errors and value leakage.

This ticket was mentioned in Slack in #meta by jonoaldersonwp.

3 years ago

#12 @jonoaldersonwp
2 years ago

Google Search Console is now reporting half a million URLs with errors, which prevents discovery and crawling of paginated plugins.

It's absolutely critical that we remove the block on allowing search engines to crawl paginated states.

#13 @jonoaldersonwp
13 months ago

  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #6773.

Closing in favour of https://meta.trac.wordpress.org/ticket/6773#ticket
