#5740 closed task (blessed) (fixed)
Add /?s= disallow rule to robots.txt
| Reported by: | jonoaldersonwp | Owned by: | dd32 |
|---|---|---|---|
| Milestone: | | Priority: | high |
| Component: | General | Keywords: | seo performance |
| Cc: | | | |
Description
https://wordpress.org/robots.txt already disallows /search, but doesn't disallow /?s=*.
This omission creates an attack vector for negative SEO and spam attacks, and we're currently under heavy attack.
To prevent this, we should add the following rule to the robots.txt file:
Disallow: /?s=
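For illustration, a minimal sketch of how the search-related section of the robots.txt file might look with the new rule in place (the User-agent grouping and surrounding rules are assumptions; the live file contains additional entries):

```
User-agent: *
Disallow: /search
Disallow: /?s=
```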
Change History (11)
#3 @ 4 years ago
Potentially, but risky without tailoring and configuring on a site-by-site basis, depending on how their internal site search works.
#4 4 years ago
This ticket was mentioned in Slack in #meta by tellyworth. View the logs.
#5 @ 4 years ago
Is this fixed by https://core.trac.wordpress.org/ticket/52457?
#6 @ 4 years ago
/search/ is excluded for performance reasons, not for SEO spam. /?s=* redirects to /search/, so I don't think there's any need to exclude it specifically? It seems like there was no reason for this ticket, if I understand correctly?
https://core.trac.wordpress.org/ticket/52457 as mentioned by @tellyworth sets a noindex tag on all search results on other sites, such as https://wordpress.org/news/?s=spam
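For reference, a noindex directive of that kind is delivered as a meta tag inside the HTML of the search results page itself (the exact attributes WordPress emits may differ slightly):

```
<meta name="robots" content="noindex">
```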
#7 @ 4 years ago
https://core.trac.wordpress.org/ticket/52457 is related, but isn't a solution here, and isn't the same thing.
Why would I create a ticket with no reason?
Noindex tags prevent indexing of an already-crawled URL. Robots.txt directives prevent crawling. They're different systems, with different effects.
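To make the distinction concrete, here is a small illustrative Python sketch (the URLs are examples only): a compliant crawler consults robots.txt before fetching a URL at all, whereas a noindex tag can only be discovered after the page has already been crawled.

```python
from urllib import robotparser

# Parse a hypothetical robots.txt containing the proposed rule.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /search",
    "Disallow: /?s=",
])

# A compliant crawler calls can_fetch() *before* requesting the URL,
# so a disallowed URL is never downloaded at all.
print(rp.can_fetch("*", "https://wordpress.org/?s=spam"))   # False
print(rp.can_fetch("*", "https://wordpress.org/about/"))    # True

# A noindex meta tag, by contrast, lives inside the HTML response:
# the crawler must fetch the page first, and only then learns that
# it shouldn't index it. Crawl budget is already spent by that point.
```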
It's because our ?s URLs redirect to /search/ URLs that we have a problem - to the tune of ~500,000 spam URLs indexed in Google, damaging the WordPress brand, and consuming crawl budget which we desperately need elsewhere.
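As a hypothetical way to observe that redirect from the outside (not part of any proposed change; the example URL is an assumption), one could request a ?s= URL without following redirects and inspect the Location header:

```python
import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None tells urllib not to follow the redirect, which
    # surfaces the 3xx response as an HTTPError instead.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)
try:
    resp = opener.open("https://wordpress.org/?s=example")
    print("No redirect:", resp.status)
except urllib.error.HTTPError as err:
    # If ?s= URLs really do redirect, we expect a 3xx status here,
    # with a Location header pointing at a /search/ URL.
    print(err.code, err.headers.get("Location"))
```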
#8 @ 4 years ago
There have been very significant changes in SEO over the past decade or so.
While, yeah, some tactics are still really spammy, search engines *have* gotten better at detecting them - and those tactics are now mostly only successful in the short term.
SEO is not a dirty word. When done right - and when taking a long-term view of things - it is really about user experience (including, but not limited to, performance, since page loading time is now a major factor for many search engines).
.org has changed a lot too. It is now in a mature phase, which means that the same strategies that worked even three years ago won't keep working.
We all want to give users the best experience that we possibly can - and to do that, we need to start taking a long term view on SEO.
Users need to be able to find answers, potential users need to be able to see our marketing content, and developers need to be able to find information related to the development of WordPress.
The unfortunate reality is that a massive number of low-quality URLs is preventing users, current and future, and the budding developers who can help build WordPress, from getting where they need to be.
And this ticket can help fix some of that.
#9 @ 4 years ago
- Owner set to dd32
- Resolution set to fixed
- Status changed from new to closed
In 10989:
#10 @ 4 years ago
In future, please include an example rather than "please just do this".
It wouldn't be the first time a ticket has been created for behaviours that aren't actually happening, and when all the data we have available to us says "this isn't a problem", a ticket without an example should always be taken with a grain of salt.
Would this apply to all WP sites? (as in, should this be in core?)