Opened 5 weeks ago

Last modified 4 weeks ago

#4685 new defect

Disallow /changeset in https://core.trac.wordpress.org/robots.txt

Reported by: jonoaldersonwp
Owned by:
Milestone:
Priority: lowest
Component: Trac
Keywords: seo
Cc:

Description

Add a disallow rule for /changeset.

Change History (6)

#1 @nacin
5 weeks ago

Why would we block changesets from search? They're very valuable.

#2 @jonoaldersonwp
5 weeks ago

Because there are ~13,000 low-quality, undifferentiated, unoptimised pages indexed, sucking value out of the rest of the site.

If somebody's willing to invest in radical improvements to the Trac site to bring it up to standard, we could get away with having these indexed. In their current state, they're a drain.

Are you suggesting that people search/Google for specific queries, and expect/find these pages? Can you give me some example queries and scenarios?

#3 @dd32
4 weeks ago

I agree, /changeset URLs are useful and valuable. Let's not block them.

There's potentially a better request here, which is to noindex the file-specific URLs under it, such as /changeset/\d+/trunk/....., which are less useful in search results.
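
To make the distinction concrete, here is a minimal Python sketch that separates a changeset overview URL from a file-level one. It reads the pattern above only as "a numeric changeset ID followed by /trunk/". The helper name `is_file_level` and the example paths are hypothetical, and nothing here reflects how Trac or any crawler actually evaluates URLs.

```python
import re

# Assumption: only the visible prefix of the pattern is used; whatever
# follows /trunk/ in the comment above is left unspecified.
FILE_LEVEL = re.compile(r"^/changeset/\d+/trunk/")


def is_file_level(path: str) -> bool:
    """Return True if the path looks like a per-file changeset URL."""
    return FILE_LEVEL.match(path) is not None


# Hypothetical example paths, for illustration only.
for path in ("/changeset/12345", "/changeset/12345/trunk/src/index.php"):
    print(path, is_file_level(path))
```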

#4 @jonoaldersonwp
4 weeks ago

Ok, let's alter the approach to add rules for requests containing:

  • *old_path=
  • /trunk
  • /branches

#5 @dd32
4 weeks ago

So that'd be this then?

  • /changeset/*/old_path=
  • /changeset/*/trunk/
  • /changeset/*/branches/
  • /changeset/*/tag/

Also just noting that currently the robots.txt here is shared with every Trac and SVN instance, and requires a systems request to alter.

#6 @jonoaldersonwp
4 weeks ago

I'm wondering if it'd be easier just to roll these rules out to all Trac/SVN sites.

The 'old path' rule should be /changeset/*old_path=*.
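
For anyone wanting to sanity-check these patterns before filing the systems request, the following is a rough Python sketch of robots.txt-style wildcard matching. It assumes the common convention that `*` matches any run of characters, that a trailing `$` anchors the end, and that rules otherwise match as prefixes. The function names and example paths are hypothetical, and real crawlers may differ in the details.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt-style path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns: list) -> bool:
    """Return True if any pattern matches the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)


# The rules from comment 5, with the 'old path' rule as amended in comment 6.
RULES = [
    "/changeset/*old_path=*",
    "/changeset/*/trunk/",
    "/changeset/*/branches/",
    "/changeset/*/tag/",
]

# Hypothetical paths, for illustration only.
EXAMPLE_PATHS = [
    "/changeset/12345",
    "/changeset/12345/trunk/src/index.php",
    "/changeset/12345?old_path=trunk",
]

for path in EXAMPLE_PATHS:
    print(path, is_disallowed(path, RULES))
```

Under these assumptions, the earlier form `/changeset/*/old_path=` would not match a path where `old_path=` follows a `?` rather than a `/`, which is what the amended rule addresses.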
