Making WordPress.org

Opened 5 years ago

Last modified 5 years ago

#4685 new defect (bug)

Disallow /changeset in https://core.trac.wordpress.org/robots.txt

Reported by: jonoaldersonwp Owned by:
Milestone: Priority: lowest
Component: Trac Keywords: seo
Cc:

Description

Add a disallow rule for /changeset.
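For reference, robots.txt Disallow values are prefix matches, so a bare "Disallow: /changeset" would block every changeset URL, including the overview pages as well as the per-file diffs. The sketch below checks this with Python's standard urllib.robotparser. The sample paths are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The single rule requested in the ticket description.
ROBOTS_TXT = """\
User-agent: *
Disallow: /changeset
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://core.trac.wordpress.org"


def crawlable(path: str) -> bool:
    """True if a generic crawler ('*') may fetch BASE + path under the rules."""
    return parser.can_fetch("*", BASE + path)


# Hypothetical paths: a changeset overview, a per-file diff, and a ticket.
for path in ["/changeset/12345", "/changeset/12345/trunk/src/wp-load.php", "/ticket/4685"]:
    print(path, "allowed" if crawlable(path) else "blocked")
```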

Change History (6)

#1 @nacin
5 years ago

Why would we block changesets from search? They're very valuable.

#2 @jonoaldersonwp
5 years ago

Because there are ~13,000 low-quality, undifferentiated, unoptimised pages indexed, sucking value out of the rest of the site.

If somebody is willing to invest in radical improvements to the Trac site to bring it up to standard, we could get away with having these indexed. In their current state, they're a drain.

Are you suggesting that people search/Google for specific queries, and expect/find these pages? Can you give me some example queries and scenarios?

#3 @dd32
5 years ago

I agree, /changeset URLs are useful and valuable. Let's not block them.

There's potentially a better request here, which is to noindex the specific file URLs under it, such as /changeset/\d+/trunk/....., which are less useful in search results.
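Unlike a robots.txt Disallow, noindex travels with the page itself (a robots meta tag or an X-Robots-Tag response header), so the crawler must still be allowed to fetch the URL in order to see it. A minimal sketch of deciding that header per request path, using the pattern quoted above; the helper name robots_headers_for and where it would hook into Trac are hypothetical.

```python
import re
from typing import Dict

# Pattern quoted in the comment: per-file diff pages under trunk.
# Other prefixes (e.g. branches) could be added to the alternation if wanted.
NOINDEX_PATTERN = re.compile(r"^/changeset/\d+/trunk/")


def robots_headers_for(path: str) -> Dict[str, str]:
    """Hypothetical helper: extra response headers for a Trac request path."""
    if NOINDEX_PATTERN.match(path):
        # The page stays crawlable, but search engines are asked not to index it.
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_headers_for("/changeset/12345/trunk/src/wp-load.php"))
print(robots_headers_for("/changeset/12345"))
```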

#4 @jonoaldersonwp
5 years ago

Ok, let's alter the approach to add rules for requests containing:

  • *old_path=
  • /trunk
  • /branches

#5 @dd32
5 years ago

So that'd be this then?

  • /changeset/*/old_path=
  • /changeset/*/trunk/
  • /changeset/*/branches/
  • /changeset/*/tag/
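RFC 9309 (and Google's crawler) treat * in a robots.txt path as "any sequence of characters" and a trailing $ as an end anchor. Python's urllib.robotparser does not implement these wildcards, so checking the four rules above takes a small matcher. A minimal sketch, assuming literal matching on the raw path plus query (no percent-decoding) and ignoring Allow rules and rule precedence; the sample paths are hypothetical.

```python
import re
from typing import Sequence

# The four Disallow patterns proposed in comment #5.
RULES = [
    "/changeset/*/old_path=",
    "/changeset/*/trunk/",
    "/changeset/*/branches/",
    "/changeset/*/tag/",
]


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal. Without '$' the rule is a prefix match,
    so the regex is anchored only at the start (via re.match).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, patterns: Sequence[str]) -> bool:
    """True if any pattern matches from the start of path (query included)."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)


# Hypothetical paths, for illustration only.
for path in [
    "/changeset/12345",
    "/changeset/12345/trunk/src/wp-load.php",
    "/changeset/12345/branches/5.0/readme.html",
]:
    print(path, "blocked" if is_disallowed(path, RULES) else "crawlable")
```

Because none of these rules end in $, each is a prefix match, so the changeset overview page (/changeset/12345) stays crawlable, as comments #1 and #3 ask. The trailing slashes also mean a bare /changeset/12345/trunk is not matched.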

Also just noting that currently the robots.txt here is shared with every Trac and SVN instance, and requires a systems request to alter.

#6 @jonoaldersonwp
5 years ago

I'm wondering if it'd be easier just to roll these rules out to all Trac/SVN sites.

The 'old path' rule should be /changeset/*old_path=*.
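The two old_path patterns differ by a literal slash: /changeset/*/old_path= only matches when old_path= directly follows a "/", whereas /changeset/*old_path=* matches it anywhere after /changeset/, for example inside a query string. A minimal sketch with a small wildcard matcher (RFC 9309 style * and $, literal matching, no percent-decoding) and a hypothetical URL:

```python
import re

OLD_RULE = "/changeset/*/old_path="  # as written in comment #5
NEW_RULE = "/changeset/*old_path=*"  # as corrected in comment #6

# Hypothetical URL with old_path= in the query string, not after a slash.
URL = "/changeset/12345?old_path=%2Ftrunk%2Fwp-load.php&old=12344"


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """'*' matches any run of characters; a trailing '$' anchors the end.

    Matching is a prefix match from the start of the path (via re.match).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


for rule in (OLD_RULE, NEW_RULE):
    hit = robots_pattern_to_regex(rule).match(URL) is not None
    print(f"{rule!r}: {'matches' if hit else 'no match'}")
```

Under prefix matching the trailing * in the corrected rule is redundant: /changeset/*old_path= behaves the same way.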
