#5184 closed defect (bug) (reported-upstream)
Homepage requests with a 'page' parameter should return a 404
Reported by: | jonoaldersonwp | Owned by: | |
---|---|---|---|
Milestone: | Priority: | lowest | |
Component: | General | Keywords: | seo |
Cc: |
Description (last modified by )
Requests like https://wordpress.org/page/3/ should return a 404 template and HTTP header.
Requests to paginated states of /download/, like https://wordpress.org/download/6/, should return a 404 template and HTTP header.
Requests to paginated states of pages in (and including) the 'about' section, such as https://en-gb.wordpress.org/about/features/5/, https://en-gb.wordpress.org/about/5/ and https://wordpress.org/about/license/8/, should return a 404 template and HTTP header.
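To make the requested behaviour concrete, here is a minimal Python sketch (not WordPress code). It assumes a hypothetical rule that a trailing numeric segment, optionally preceded by `page`, marks a paginated state of a page that has no pagination. The function name `status_for_static_page` and the regular expression are illustrative assumptions, not anything from core or the wordpress.org theme.

```python
import re
from urllib.parse import urlsplit

# Assumption: a trailing "/<number>/" or "/page/<number>/" segment marks a
# paginated state (e.g. /page/3/, /download/6/, /about/features/5/).
_PAGINATED_SUFFIX = re.compile(r"/(?:page/)?\d+/?$")


def status_for_static_page(url: str) -> int:
    """Return the HTTP status this ticket asks for on a non-paginated page.

    Hypothetical helper: 404 for any paginated state, 200 otherwise.
    """
    path = urlsplit(url).path
    return 404 if _PAGINATED_SUFFIX.search(path) else 200


if __name__ == "__main__":
    for example in (
        "https://wordpress.org/page/3/",
        "https://wordpress.org/download/6/",
        "https://wordpress.org/about/license/",
    ):
        print(example, status_for_static_page(example))
```

This only models static pages; a real site would first need to know which URLs are genuinely paginated archives.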
Change History (18)
#2
@
5 years ago
- Description modified (diff)
Closing the others as duplicates of this, as they're all Paginated states of Pages which is the same thing at the core.
#5
follow-up:
↓ 6
@
5 years ago
The last two examples should be fixed by [WP47727].
#6
in reply to:
↑ 5
@
5 years ago
Replying to ocean90:
The last two examples should be fixed by [WP47727].
Ah, so they are. Thanks, @SergeyBiryukov!
#7
in reply to:
↑ 1
;
follow-up:
↓ 8
@
5 years ago
Replying to dd32:
Would returning a canonical tag of https://wordpress.org/ suffice here? (Currently it returns <link rel="canonical" href="https://wordpress.org/3/" />)
Just noting that core should really be returning a canonical of either https://wordpress.org/page/3/ or https://wordpress.org/ here; https://wordpress.org/3/ is just plain wrong. This specific canonical issue only happens on the homepage, and there is an open core ticket for this specific issue: https://core.trac.wordpress.org/ticket/49220
For wordpress.org specifically, the canonical should be equal to https://wordpress.org/
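As a rough illustration of the canonical being described, the Python sketch below strips any pagination suffix and query string from a front-page URL before building the tag. It is not how core computes canonicals; `canonical_for_front_page` and `canonical_link_tag` are hypothetical names, and the suffix pattern is an assumption.

```python
import html
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: pagination appears as a trailing "/<n>/" or "/page/<n>/".
_PAGINATED_SUFFIX = re.compile(r"/(?:page/)?\d+/?$")


def canonical_for_front_page(url: str) -> str:
    """Drop the pagination suffix and query string from a front-page URL."""
    parts = urlsplit(url)
    path = _PAGINATED_SUFFIX.sub("/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> tag for the cleaned URL."""
    href = html.escape(canonical_for_front_page(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_link_tag("https://wordpress.org/page/3/"))
```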
#8
in reply to:
↑ 7
@
5 years ago
Replying to bradleyt:
Replying to dd32:
Would returning a canonical tag of https://wordpress.org/ suffice here? (Currently it returns <link rel="canonical" href="https://wordpress.org/3/" />)
...
For wordpress.org specifically, the canonical should be equal to https://wordpress.org/
Would returning that canonical tag fulfil the needs of this ticket? Specifically, can we avoid having to return a 301 or 404 here and just use the canonical tag instead?
#9
follow-up:
↓ 10
@
5 years ago
A canonical tag would definitely help, but we'd still be in a position where we have infinite crawl traps and pages which shouldn't exist. That'd continue to impact crawl budget, discovery, etc., across the site(s).
#10
in reply to:
↑ 9
@
5 years ago
Replying to jonoaldersonwp:
we'd still be in a position where we have infinite crawl traps and pages which shouldn't exist. That'd continue to impact crawl budget, discovery, etc., across the site(s).
As paginated states of the front-page aren't ever actually linked, I'm not sure if that's realistically an issue here? 3rd party websites may link to one or two such pages, but on the whole it shouldn't be massive traffic?
#11
@
5 years ago
The problem isn't traffic volume, it's that they're queryable and public. That means they'll still represent a point of leakage. That aside, they shouldn't exist / be exposed, regardless.
#12
@
5 years ago
- Resolution set to fixed
- Status changed from new to closed
Returns a canonical tag now.
I'm not inclined to add a redirect here right now.
All other urls mentioned redirect thanks to [WP47727].
#13
@
5 years ago
- Priority changed from normal to lowest
- Resolution fixed deleted
- Status changed from closed to reopened
This is a huge improvement, but we still need to improve the handling of invalid requests to optimize crawl budget.
As per the brief, URLs like https://wordpress.org/page/3/ need to return a 404 or 301.
Prefer a 404, as these URLs might feasibly be valid in the future.
#15
@
5 years ago
- Resolution set to reported-upstream
- Status changed from reopened to closed
Opened https://core.trac.wordpress.org/ticket/50163 with a possible patch.
Going to mark this as reported-upstream, as it can be handled upstream.
#17
@
3 months ago
Cross-posting: Fixed via https://core.trac.wordpress.org/changeset/59091.
#18
@
2 months ago
I've done some independent research related to this, so I figured I'd share what I found:
HTTP Codes:
- 301 is not quite right because it communicates permanent non-existence
- 404 is currently best because it communicates current non-existence
- 416 means "Range Not Satisfiable" but has a specific use-case that does not apply here (at all)
- There isn't a better response code than 404 to communicate that a request is out-of-bounds or what the boundaries would be relative to a specific URI
WordPress Code:
- The way that redirect_canonical() integrates with pagination is specific to something like /page/1/ or page=1, and is consistently handled for all core cases (archives, multi-page singulars, comments, etc.), and it also properly does a 301 back to the root/canonical URI. It does not redirect page requests that are too large
- I noticed that wp_get_canonical_url() does handle pagination query variables, but does not currently consider is_front_page() and the difference between page and paged, though I have not confirmed if that matters yet
- The cpage query var for Comments may be worth checking too, if paginated comments are set anywhere and out-of-bounds requests are being made/crawled
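The response-code reasoning above can be summarised in a short Python sketch. It is a model of the decision only, under the assumption that the caller already knows the requested page number and the total page count. `Response` and `respond_to_page_request` are hypothetical names and do not correspond to any WordPress function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    status: int
    location: Optional[str] = None  # only set for redirects


def respond_to_page_request(base_url: str, page: int, total_pages: int) -> Response:
    """Pick a status for an explicitly paginated request.

    Assumed rules, taken from the discussion in this ticket:
    - an explicit page 1 (e.g. /page/1/) gets a 301 back to the base URL;
    - a page outside 1..total_pages gets a 404;
    - anything else is a normal 200.
    """
    if page == 1:
        return Response(301, base_url)
    if page < 1 or page > total_pages:
        return Response(404)
    return Response(200)


if __name__ == "__main__":
    # A front page with a single page of content: /page/3/ is out of bounds.
    print(respond_to_page_request("https://wordpress.org/", 3, 1))
```

A 404 is used for the out-of-bounds case rather than a redirect, which leaves the URL free to become valid later if more pages appear.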