Making WordPress.org

Opened 6 weeks ago

Last modified 5 weeks ago

#8225 new enhancement

Plugin search: slug field boost creates permanent advantage for grandfathered trademark slugs

Reported by: jeangalea Owned by:
Milestone: Improved Search Priority: normal
Component: Plugin Directory Keywords:
Cc:

Description

Problem

The plugin search algorithm gives the slug_text field a 5x boost, the highest text-field weight in the query (shared with title in the same clause). This is by design and generally makes sense: a slug match is a strong relevance signal.

However, combined with three other policies, it creates an unintended and permanent competitive distortion:

  1. Guideline 17 and trademark enforcement block new plugins from using trademarked terms (e.g., "instagram", "facebook") in their slugs.
  2. Slugs cannot be changed after approval (confirmed in https://make.wordpress.org/plugins/2018/11/21/reminder-we-cant-rename-plugins-post-approval/).
  3. Pre-2015 plugins are grandfathered and keep their trademark-containing slugs indefinitely.

The result: a small number of grandfathered plugins receive a permanent 5x text-relevance boost for trademarked search terms that no new plugin can ever obtain, regardless of quality, support, or user satisfaction.

Concrete example

Search: "instagram feed"

  • #1: Smash Balloon Social Photo Feed — slug instagram-feed (grandfathered). Gets a 5x boost on both "instagram" and "feed" via slug_text. Note: the display name no longer contains "Instagram" — Meta's lawyers required them to remove it.
  • #2: Spotlight Social Feeds — slug spotlight-social-photo-feeds. Can only match "instagram" via tags (2x boost). Cannot obtain an equivalent slug because "instagram" is now blocked.

The #1 plugin gets a 2.5x text-relevance advantage on the highest-volume search term in this category, solely because it was approved before 2015. This advantage persists even though the plugin was already forced to remove "Instagram" from its display name to comply with trademark enforcement.

The inconsistency

WordPress.org enforces trademarks in display names and blocks them in new slugs, but the ranking algorithm still rewards grandfathered slugs containing those same trademarked terms with a 5x boost. The trademark is being enforced on the surface (names, banners, icons) but continues to confer a ranking advantage underneath (slug-based search scoring).

This affects not just Instagram — any grandfathered slug containing a trademark that is now blocked gives its holder an insurmountable ranking advantage.

Code reference

The 5x slug boost is set in class-plugin-search.php:

$should_match[] = [
    'multi_match' => [
        'query'  => $search_phrase,
        'fields' => $this->localise_es_fields( 'title', 'slug_text' ),
        'type'   => 'most_fields',
        'boost'  => 5,
    ],
];

Source: https://github.com/WordPress/wordpress.org/blob/trunk/wordpress.org/public_html/wp-content/plugins/plugin-directory/class-plugin-search.php

Suggested approaches

Any of these would address the issue:

Option A: Reduce or remove slug_text from the boosted fields.
The slug was chosen by the developer at submission time, often years ago. It reflects naming strategy, not ongoing quality. Removing it from the $should_match clause (or reducing its boost from 5 to 1-2) would make rankings more dependent on current signals like active installs, support quality, and content relevance.
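A minimal sketch of what Option A could look like, assuming the title clause keeps its 5x boost and slug_text moves into its own clause with a lower weight. The function name build_text_should_clauses() and the 1.5 default are hypothetical, and field names are shown without the locale expansion that localise_es_fields() applies in the real query:

```php
<?php
// Hypothetical sketch of Option A: title keeps boost 5, slug_text gets its
// own clause with a reduced boost. Field names omit locale expansion.
function build_text_should_clauses( string $search_phrase, float $slug_boost = 1.5 ): array {
    $should_match = [];

    // Title keeps the existing 5x boost. In production this would be a list
    // of localised title fields, which is why multi_match is kept here.
    $should_match[] = [
        'multi_match' => [
            'query'  => $search_phrase,
            'fields' => [ 'title' ],
            'type'   => 'most_fields',
            'boost'  => 5,
        ],
    ];

    // slug_text moves to its own clause with a reduced boost (1-2 per Option A).
    $should_match[] = [
        'match' => [
            'slug_text' => [
                'query' => $search_phrase,
                'boost' => $slug_boost,
            ],
        ],
    ];

    return $should_match;
}

echo json_encode( build_text_should_clauses( 'instagram feed' ), JSON_PRETTY_PRINT ), "\n";
```

Splitting the fields into separate clauses is what allows the two boosts to differ; inside a single multi_match they share one boost.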

Option B: Exclude trademarked terms from slug-based scoring.
If a slug contains a term that would be blocked under current Guideline 17 enforcement, exclude that term from the slug boost calculation. This preserves slug matching for non-trademarked terms while removing the grandfathered advantage for trademarked ones.
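Option B could be approximated at indexing time by filtering slug tokens before they are written to slug_text. This is a sketch only: the helper slug_tokens_for_scoring() and the BLOCKED_TERMS list are hypothetical stand-ins, not the directory's real Guideline 17 list or indexing code:

```php
<?php
// Hypothetical sketch of Option B: drop blocked (trademarked) terms from the
// slug tokens used for scoring. BLOCKED_TERMS is illustrative only.
const BLOCKED_TERMS = [ 'instagram', 'facebook', 'twitter', 'youtube' ];

function slug_tokens_for_scoring( string $slug, array $blocked = BLOCKED_TERMS ): array {
    // Slugs are hyphen-separated; split them into lowercase tokens.
    $tokens = explode( '-', strtolower( $slug ) );

    // Keep only non-empty tokens that are not on the blocked list.
    $kept = array_filter(
        $tokens,
        fn( string $t ): bool => $t !== '' && ! in_array( $t, $blocked, true )
    );

    // Re-index so the result is a plain list.
    return array_values( $kept );
}

print_r( slug_tokens_for_scoring( 'instagram-feed' ) );
print_r( slug_tokens_for_scoring( 'custom-facebook-feed' ) );
```

Whole-token matching is a deliberate simplification; a slug that fuses a trademark into a longer token (without a hyphen) would need additional handling.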

Option C: Migrate grandfathered trademark slugs to compliant slugs with redirects.
Update slugs like instagram-feed to non-trademarked equivalents (the plugin already has a compliant display name). Set up 301 redirects from old URLs. This fully aligns slug policy with ranking behavior.
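For Option C, the redirect side amounts to a lookup from old slugs to new ones. The sketch below assumes a hypothetical resolve_plugin_redirect() helper and an invented replacement slug (social-photo-feed); it does not cover the rest of a slug migration:

```php
<?php
// Hypothetical sketch of Option C: map grandfathered slugs to compliant
// replacements and answer old URLs with a 301. The new slug is invented.
const SLUG_MIGRATIONS = [
    'instagram-feed' => 'social-photo-feed', // hypothetical replacement slug
];

/**
 * Returns [ status, location ] for a plugin URL path, or null when no redirect applies.
 */
function resolve_plugin_redirect( string $path ): ?array {
    // Expect paths like /plugins/<slug>/ (trailing slash optional).
    if ( ! preg_match( '#^/plugins/([a-z0-9-]+)/?$#', $path, $m ) ) {
        return null;
    }

    $old_slug = $m[1];
    if ( ! array_key_exists( $old_slug, SLUG_MIGRATIONS ) ) {
        return null;
    }

    // Permanent redirect to the canonical URL under the new slug.
    return [ 301, '/plugins/' . SLUG_MIGRATIONS[ $old_slug ] . '/' ];
}

var_dump( resolve_plugin_redirect( '/plugins/instagram-feed/' ) );
```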

Option A is the simplest code change and has the broadest fairness benefit.

Change History (8)

#1 @dd32
6 weeks ago

  • Keywords search relevance elasticsearch removed
  • Milestone set to Improved Search

The main thing we need to retain in any change to this section of code is that a search for a plugin by its slug returns that plugin.

In your example, searching for instagram shouldn't need to boost instagram-feed, but searching for instagram-feed should.

This doesn't help with single-term plugin names, which are highly sought after because of this bump.
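The distinction between a query that is the slug and a query that only shares a token with it can be sketched as a small classifier. classify_slug_match() is a hypothetical helper for illustration, not code from class-plugin-search.php:

```php
<?php
// Hypothetical helper: 'full' means the query is the slug (navigation),
// 'partial' means it only shares a token with the slug (ranking signal).
function classify_slug_match( string $query, string $slug ): string {
    $q = strtolower( trim( $query ) );

    // Exact slug: the user is trying to navigate to this plugin.
    if ( $q === $slug ) {
        return 'full';
    }

    // Split both sides into tokens and look for any overlap.
    $slug_tokens  = explode( '-', $slug );
    $query_tokens = preg_split( '/[\s-]+/', $q, -1, PREG_SPLIT_NO_EMPTY );
    if ( array_intersect( $query_tokens, $slug_tokens ) ) {
        return 'partial';
    }

    return 'none';
}

echo classify_slug_match( 'instagram', 'instagram-feed' ), "\n";
```

Whether a space-separated query such as "instagram feed" should also count as a full match is a design choice; this sketch treats it as partial.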

Note: I'm not closing this as a duplicate; but I'm fairly sure there's discussion of this in the other Search tickets: https://meta.trac.wordpress.org/query?status=!closed&component=Plugin+Directory&milestone=Improved+Search

#2 @jeangalea
6 weeks ago

Thanks for the quick look and for putting this on Improved Search.

Your refinement is sharper than my original framing: the distinction between "search matches the full slug" (navigate-to-plugin) and "search matches a sub-token of the slug" (ranking boost) captures the actual issue.

When I went back to the code to trace the technical flow, I noticed something I'd like your read on. Line 281 of class-plugin-search.php currently reads:

'fields' => $this->localise_es_fields( 'title', 'slug_text' ),

localise_es_fields is declared as public function localise_es_fields( $fields ) — single argument. Every other call site in the file passes either a single string or a single array (lines 198, 259, 271, 290, 303). Line 281 is the only one passing two positional strings, so PHP receives $fields = 'title' and silently drops 'slug_text'.
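The behaviour described here can be reproduced in isolation. PHP does not raise an error when a user-defined function receives more arguments than it declares; the extra value is bound to no parameter and is only reachable through func_get_args(). takes_one() is a throwaway name for this demonstration:

```php
<?php
// A user-defined function declaring a single parameter.
function takes_one( $fields ) {
    return [
        'fields'    => $fields,          // only the first argument is bound
        'arg_count' => func_num_args(),  // but PHP still counts both
        'all_args'  => func_get_args(),  // the second one is only visible here
    ];
}

// No warning or error: 'slug_text' is silently ignored by $fields.
var_dump( takes_one( 'title', 'slug_text' ) );
```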

Looking at git history, this appears to have been introduced during the search reformat in [13640]. The pre-refactor code explicitly included slug_text as a separate field:

'fields' => ( $this->is_english ? [
    0 => 'title_en',
    1 => 'slug_text',
] : [
    'title_' . $this->locale,
    'title_en^' . $this->en_boost,
    'slug_text',
] ),

So slug_text was a real field in the query until the refactor. It looks like the same typo pattern you fixed in [13804] shortly after ('contributor' → 'contributors'), just on a different line.

This changes the framing of my ticket in a useful way. The 5x slug boost I raised isn't actively in production, but it will return as soon as line 281 gets noticed and fixed. That makes this the right moment to decide how slug_text should be re-added to the query: as a tokenized most_fields match (which is what the refactor intended and what creates the sub-token concern I originally raised), or as a phrase/exact match that supports navigation-by-slug without the sub-token leakage.

The sub-token concern still applies to the intended behavior: a user searching for a bare trademark term getting a 5x boost to a grandfathered compound slug via a tokenized slug match is the inequity, and it would return the moment slug_text goes back into the query in tokenized form. This isn't specific to one plugin or platform — it applies to any grandfathered compound slug containing a term that's now blocked for new submissions under Guideline 17. Examples across different trademark categories: amazon-s3-and-cloudfront, custom-facebook-feed, google-captcha, google-sitemap-generator, instagram-feed, mailchimp-for-wp, wp-twitter-feed, wp-youtube-lyte, youtube-embed-plus. Scoping slug matching to the full slug would preserve the navigation case while closing the leakage across all of them.
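One possible shape for the "full slug only" option, sketched as the PHP array the query builder would emit. It assumes the index exposes a keyword-mapped (non-tokenized) slug field, called slug here; that field name, the helper slug_should_clauses(), and the boost values are all hypothetical:

```php
<?php
// Hypothetical query shape: exact slug match via a term query on an assumed
// keyword field "slug", plus an optional low-weight tokenized slug_text match.
function slug_should_clauses( string $search_phrase, float $token_boost = 0.0 ): array {
    // term queries are not analyzed, so normalise the phrase like a slug.
    $candidate = strtolower( trim( $search_phrase ) );

    $clauses = [
        [
            // Exact match only: "instagram-feed" hits, "instagram" does not.
            'term' => [
                'slug' => [ 'value' => $candidate, 'boost' => 5 ],
            ],
        ],
    ];

    if ( $token_boost > 0 ) {
        // Sub-token matches still count, but far less than an exact slug hit.
        $clauses[] = [
            'match' => [
                'slug_text' => [ 'query' => $search_phrase, 'boost' => $token_boost ],
            ],
        ];
    }

    return $clauses;
}

echo json_encode( slug_should_clauses( 'instagram-feed', 1.0 ), JSON_PRETTY_PRINT ), "\n";
```

With this shape a search for "instagram" cannot earn the exact-slug boost from instagram-feed, while the optional tokenized clause still lets partial slug matches contribute at a much lower weight.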

Could you confirm my reading of the [13640] regression? If it's accurate, I'm happy to put together a patch that restores slug_text to the query in a form that addresses both the bug fix and the sub-token design question at the same time, localized to the should_match construction on line 278.

#3 @dd32
6 weeks ago

> Could you confirm my reading of the [13640] regression?

:facepalm: That is indeed an issue; I missed the array syntax around it there. A happy little accident that it skipped over the slug.

This is what I intended it to be (to match the previous behaviour):

-'fields' => $this->localise_es_fields( 'title', 'slug_text' ),
+'fields' => $this->localise_es_fields( [ 'title', 'slug_text' ] ),

However, I'm not going to change this immediately: it's been like this for 2 years, and fixing it appears likely to put even more weight onto the slug. (That's something I don't want to do; I've always wanted to put less weight on it, which probably explains why I missed this when testing the change.)

As for the rest of your comment: yes, I agree with your understanding (I think). A patch/PR to alter this would be reasonable if you feel you can make the search work so that exact slugs are matched but partial matches are boosted less overall. I will, however, get someone with more ES knowledge than myself to review it before I commit it :)

Others might argue, "My slug is foo-bar-baz-by-acme, so I should get a slug boost if someone searches for foo-bar-baz", so it's worth keeping in mind that slug searching can be important, but not critical to ranking (IMHO).

#4 @jeangalea
6 weeks ago

Thanks for confirming the regression. Given the "less slug weight overall" direction and your offer to get an ES reviewer involved, I'd rather wait to see how they'd want the scoring structured before proposing a specific query shape. Happy to follow up if/when that discussion takes shape.

#5 @gibrown
6 weeks ago

Howdy. So yeah, that regression is a problem. It also looks like line 198 (https://github.com/WordPress/wordpress.org/blob/7c52c748eab8f1d3b33bca81f15cf30341f66694/wordpress.org/public_html/wp-content/plugins/plugin-directory/class-plugin-search.php#L198) is incorrect, but it isn't causing problems as long as no one adds an extra param.

As a first point: I also kinda hate the slug boosting, but when I was building and evaluating the algorithm we needed it in order to have some consistency for the use cases people had. We used to always put an exact match at the top, so heavily boosting on the slug was the best way to approximate that behavior while also ensuring that a search for "social" would not show https://wordpress.org/plugins/social/ at the top (a random example, but there are many common words used as slugs).

That said, I suspect this bug/regression may be causing real issues. I've been asked in the past about a few particular searches and I think this regression could be why I didn't understand the ranking:
https://wordpress.org/plugins/search/form/
https://wordpress.org/plugins/search/forms/
https://wordpress.org/plugins/search/form+builder/
https://wordpress.org/plugins/search/quizzes/
https://wordpress.org/plugins/search/lead+magnet/
https://wordpress.org/plugins/search/reCAPTCHA/

Not having any boosting on the slug is also a problem.

For the given "instagram" examples, I don't think adding the slug boosting back in would matter at all. Active installs is by far the most heavily weighted field, and in the two example results it is the difference between a plugin with 60k installs and one with a 20x bigger install base. For the goal of guiding users to plugins where they are likely to have the best long-term support experience, this seems like a good ranking.

A 5x boost sounds like a lot, but you have to look at it in context. The boost for title is 5x plus another 2x against partial prefix string phrase matches (title.engram). Excerpt, Description, and Tag matches also have a 2x boost. The biggest difference with these examples is that active_installs of 1m is going to have a ~5x boost while 60k is about 2.5x (plus support and ratings).
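A quick break-even check using the figures above, under an assumption the ticket does not state: that the install-based boost multiplies the text-relevance score. A is the plugin with about 1m installs (about 5x) and B the one with about 60k (about 2.5x); S is each plugin's combined text score:

```latex
\mathrm{score}(p) \approx S_{\mathrm{text}}(p) \cdot B_{\mathrm{installs}}(p)

\frac{\mathrm{score}(B)}{\mathrm{score}(A)} \approx \frac{2.5\, S_B}{5\, S_A} = \frac{S_B}{2\, S_A}
\quad\Rightarrow\quad
\mathrm{score}(B) > \mathrm{score}(A) \iff S_B > 2\, S_A
```

Under that assumption, B needs roughly twice A's text score just to offset the install gap, before support and ratings are considered, which is why a difference in one text-field boost may not change the order. The real combination in class-plugin-search.php may differ.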

@dd32 my recommendation is to try fixing the bug that was introduced and see how it affects some of these search examples. It is a bit of a moving target because plugin authors are constantly adjusting their descriptions to boost rankings. Still I think we should try it. I suspect it won't make that much of a difference for most cases tbh.

#6 @dd32
5 weeks ago

> Also it looks like line 198: https://github.com/WordPress/wordpress.org/blob/7c52c748eab8f1d3b33bca81f15cf30341f66694/wordpress.org/public_html/wp-content/plugins/plugin-directory/class-plugin-search.php#L198 is also incorrect, but not causing problems as long as no one adds an extra param.

That's not an issue; passing a string as the first/only param will cast it to an array. The issue only arises when a second or later argument is passed outside of an array.
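A hypothetical stand-in for localise_es_fields() makes the three call shapes concrete. The real method's body is not quoted in this ticket, so the sketch only assumes the (array) cast described above and omits the per-locale expansion:

```php
<?php
// Hypothetical stand-in for localise_es_fields(); only the cast is modelled.
function localise_es_fields_sketch( $fields ): array {
    // A single string becomes a one-element array: 'title' => [ 'title' ].
    $fields = (array) $fields;

    // The real method would expand each field per locale here.
    return $fields;
}

var_dump( localise_es_fields_sketch( 'title' ) );                   // single string: fine
var_dump( localise_es_fields_sketch( [ 'title', 'slug_text' ] ) );  // array: both fields kept
var_dump( localise_es_fields_sketch( 'title', 'slug_text' ) );      // two positional args: slug_text lost
```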

> is to try fixing the bug that was introduced and see how it affects some of these search examples

I'll defer to your judgement here; let's give that a shot and see how it affects results.

#7 @dd32
5 weeks ago

In 14809:

Plugin Directory: Search: Restore slug boosting in the search algorithm.

This was a bug introduced in [13640], which went unnoticed due to a desire to reduce reliance upon slugs for search boosting.

Slug boosting has some legitimate uses however, so this is being restored for the time being.

Props jeangalea, gibrown.
See #8225.

#8 @gibrown
5 weeks ago

I took a look through some results, and it feels like it did fix a few minor ranking issues in those form results. Not really the main issues, though; I think the fix for those is more about boosting plugins that have lots of reviews combined with the review average, rather than the average alone.
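One common way to combine review count with review average is a damped (Bayesian) average, which pulls a plugin's mean rating toward a prior mean with a strength that fades as reviews accumulate. This is a swapped-in technique for illustration, not what the directory currently does; damped_rating() is a hypothetical name, and the prior mean of 4.0 and weight of 50 are arbitrary placeholders:

```php
<?php
// Damped (Bayesian) average: blend the plugin's own average with a prior,
// weighting the prior as if it were $prior_weight extra reviews.
function damped_rating( float $average, int $count, float $prior_mean = 4.0, int $prior_weight = 50 ): float {
    // Guard against division by zero if both weight and count are zero.
    if ( $count <= 0 ) {
        return $prior_mean;
    }
    return ( $prior_weight * $prior_mean + $count * $average ) / ( $prior_weight + $count );
}

printf( "%.3f\n", damped_rating( 5.0, 5 ) );   // 5 reviews, perfect average
printf( "%.3f\n", damped_rating( 4.6, 500 ) ); // 500 reviews, slightly lower average
```

With these placeholder values, a plugin with 5 perfect reviews scores about 4.09, while one with 500 reviews averaging 4.6 scores about 4.55, so volume plus a good average outranks a tiny sample of perfect scores.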
