Making WordPress.org

Opened 6 months ago

Last modified 4 months ago

#2642 new enhancement

Better boosting for phrase matches (or add support for the user to specify a phrase)

Reported by: gibrown
Owned by:
Milestone: Plugin Directory v3 - Future
Priority: normal
Component: Plugin Directory
Keywords:
Cc:

Description

There are a number of examples of users wanting to find a particular plugin. We used to support this with exact matching against slugs, but that has other downsides. In general, exact matches don't work great when the words in the plugin name are very generic and there are not that many of them. But when there are many words we can probably do better.

Some examples:

  • "Easy AdSense Ads & Scripts Manager"
  • "Post Tags and Categories for Pages"

When the user has written a lot of characters, I'm wondering if we should increase the boosting of the phrase search. I experimented with higher boosting on this field at one point already and decided against it, though other things have changed since then.

I think that just adding support for having the user specify the phrase by putting quotes around it would solve this use case and would be less likely to have bad effects for other searches. About 0.2% of all unique searches currently use quotes, so some users are already using them (though not many).
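
To make the quoted-phrase idea concrete, here is a minimal sketch of how a query could be split into quoted phrases and loose terms and turned into an Elasticsearch-style `bool` query. The `match_phrase`, `match`, and `bool`/`must` clauses are standard Elasticsearch query DSL, but the function names and the `title` field are assumptions made for illustration, not the Plugin Directory's actual query builder.

```python
import re
from typing import Dict, List, Tuple

# Matches text between a pair of double quotes.
_QUOTED = re.compile(r'"([^"]+)"')


def split_query(query: str) -> Tuple[List[str], str]:
    """Return (quoted phrases, remaining loose terms) for a raw search string."""
    phrases = [p.strip() for p in _QUOTED.findall(query) if p.strip()]
    # Drop the quoted parts, then any stray unbalanced quote characters.
    remainder = " ".join(_QUOTED.sub(" ", query).replace('"', " ").split())
    return phrases, remainder


def build_query(query: str, field: str = "title") -> Dict:
    """Build a bool query: every quoted phrase must match as a phrase.

    `field` is a hypothetical index field name.
    """
    phrases, remainder = split_query(query)
    must: List[Dict] = [{"match_phrase": {field: p}} for p in phrases]
    if remainder:
        must.append({"match": {field: remainder}})
    return {"bool": {"must": must}}
```

With this sketch, `build_query('"post timeline"')` requires the two words to appear together, while an unquoted `post timeline` falls back to an ordinary `match` on the individual words, so searches without quotes behave as before.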

I think the most compelling use case this adds support for is when a user wants to find a specific plugin to be able to get to the support forums.

Change History (30)

#1 @Ipstenu
6 months ago

Quotes is how Google does it, so that behavior should be familiar enough. I think that's a good choice.

Maybe we could also suggest they look at the tag for single term search? Like "you searched for the word 'forum' - have you looked at the forum tag yet?"

#2 @agilelogix
6 months ago

Here is an example of searching for a plugin by its exact name.

https://wordpress.org/plugins/search/post+timeline/ (the plugin does not appear when searching for its exact name)
https://wordpress.org/plugins/post-timeline/

This is a big problem: if the search does not show new plugins or plugins with fewer active downloads, they have no way to climb up the ranking.

Edit: After these search changes, I see the download stats dropped to 0. I would suggest creating a few new plugins as test cases for search optimization. In my opinion, the old search was somehow better than the current one. New plugins come out every other day with better ideas than the old ones, and it would be good to give them more of a chance.

Thanks

Last edited 6 months ago by agilelogix

#3 @gibrown
6 months ago

Good example @agilelogix thanks!

This is a big problem: if the search does not show new plugins or plugins with fewer active downloads, they have no way to climb up the ranking.

There is a bit of a catch-22 here unfortunately. A generic-ish name like "Post Timeline" is the type of search that anyone could run whether they are looking for that plugin or not. Given the low install count, it is probably pretty unlikely that a given searcher is actually looking for this particular plugin, and unfortunately we just can't be sure.

But it is an interesting case because a "post timeline" is a fairly different thing from a plugin that has both the words "post" and "timeline" in it. It's actually a pretty good argument for supporting quotes I think.

Better phrase boosting would also help somewhat (I suspect the phrase boosting is probably too low), but maybe not too much.

This ticket was mentioned in Slack in #meta by gibrown. View the logs.


6 months ago

#5 in reply to: ↑ 4 @agilelogix
6 months ago

In my opinion, the name of the plugin is pretty much perfect: people who want to make a timeline of their posts will type "Post Timeline", and my plugin does exactly that. If active downloads are the deciding factor, then new plugins have no chance of moving up in the ranking.

Replying to slackbot:

This ticket was mentioned in Slack in #meta by gibrown. View the logs.

#6 @quantumcloud
6 months ago

Agreed with @agilelogix. The new search ranking algorithm, with its massive ranking boost for older plugins with more downloads and installs, makes it next to impossible for new plugins with better ideas to gain any traction.

The downloads for our plugins dropped over 90% with the update. Plugin age seems to be the single most important factor now, which establishes a status quo. We have 4 new plugins ready to submit. I understand that it is not WordPress' responsibility to help us get more downloads. But honestly, as a plugin developer, this change is frustrating. Why bother releasing a useful free version in the repository that will hardly ever get downloaded?

Sorry if I am venting in the wrong place or said something wrong. My intention is only to share the difficulty and frustration we are facing as relative newcomers to WP plugin development.

My 2 cents: reduce the value of plugin age, active installs and downloads as ranking factors.

Thanks

#7 @gibrown
6 months ago

Another potentially interesting example.

https://wordpress.org/plugins/search/buddyforms/ ends up finding a lot of the other buddyforms related plugins rather than https://wordpress.org/plugins/buddyforms/

This is probably because the other plugins are mentioning "buddyforms" a lot while the main plugin has no real reason to mention itself.

#9 @lukecavanagh
6 months ago

It looks like it does find that plugin just fine, since the plugin name is "Form Builder & Front End Editor BuddyForms" whilst the plugin slug just happens to be "buddyforms".

https://plugins.trac.wordpress.org/browser/buddyforms/trunk/BuddyForms.php#L4

#10 @gibrown
6 months ago

Ya, I take it back, that is not a problematic search. I think it was a caching problem with how the plugin was getting updated.

#11 @gibrown
6 months ago

"team" gives somewhat noisy results also due to it being mentioned in a number of plugins with very large install counts. Good example to try and tweak a bit.

https://wordpress.org/plugins/search/team/

#14 @SergeyBiryukov
5 months ago

Related/duplicate: #2734

#15 @gibrown
5 months ago

Not exactly related to exact matching, but related to apparent noise in the results: "charts" and "directory" should be looked at more closely.

In both cases, around results 10-14 we start to see some high install plugins match that are not really very relevant to the search. I suspect it is because all the other results are out of date or not tested on the latest WP, but I should look at this in more detail. It somewhat makes me wonder if the boosting between the 10, 100, and 1000 plugins is a bit too steep though.
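
As a rough illustration of how the steepness of an install-count boost can flip a ranking, the sketch below multiplies a text relevance score by a log-scaled boost. The formula, the `weight` parameter, and the numbers are assumptions chosen for the example; the ticket does not state the boosting function the directory actually uses.

```python
import math


def install_boost(active_installs: int, weight: float = 1.0) -> float:
    """Hypothetical multiplier that grows with log10 of the active install count."""
    return 1.0 + weight * math.log10(1 + active_installs)


def score(text_relevance: float, active_installs: int, weight: float = 1.0) -> float:
    """Combine a text relevance score with the install boost."""
    return text_relevance * install_boost(active_installs, weight)


# A weakly matching plugin with 1,000,000 installs versus a plugin that
# matches the text twice as well but has only 100 installs.
steep_popular = score(1.0, 1_000_000, weight=1.0)
steep_relevant = score(2.0, 100, weight=1.0)
gentle_popular = score(1.0, 1_000_000, weight=0.25)
gentle_relevant = score(2.0, 100, weight=0.25)
```

Under these assumed numbers the popular plugin wins with `weight=1.0` and the better text match wins with `weight=0.25`, which is the kind of trade-off a flatter boost would be aiming for.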

#16 @tobifjellner
5 months ago

Let's say I find out about a specific plugin one way or another: I got a direct URL from a trusted article or a friend, for instance. So I'm right there, reading the page of this very plugin, and I want to install it. However, I prefer installing without uploading a zip, so I copy the slug of this plugin, jump over to the admin pages of my website, and go to Plugins > Add New. There, my only option is to paste that slug into the search field.
Previously I'd typically see the wanted plugin within the first 3-4 results. Now I may have to browse much further than that.

I can see different ways of fixing this:

  1. I change my behavior totally and start downloading and uploading zip files instead.
  2. Adjust the search algorithm so that IF there's a 100% slug match for the search phrase, it is presented first, possibly marked "Slug match" or similar (i.e. a meta adjustment of the search algorithm).
  3. In WP-admin, when adding a plugin, add a third option next to standard search and zip upload: indicate the slug of the plugin. This field would allow any of
    [slug|((http(s)://)wordpress.org)(/)plugins/slug|plugins]

Option 3 would mean fixing my biggest pain point in a totally different way.
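
A minimal sketch of how such a field (or the search box itself) could recognize a pasted slug or plugin URL is shown below. The regular expressions and the `extract_slug` helper are hypothetical; the optional subdomain is an assumption meant to cover localized sites such as de.wordpress.org.

```python
import re
from typing import Optional

# Lowercase letters, digits and single hyphens, as in plugin directory slugs.
_SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"

# Optional scheme, optional locale subdomain, then /plugins/<slug>/.
_PLUGIN_URL = re.compile(
    r"(?:https?://)?(?:[a-z-]+\.)?wordpress\.org/plugins/(" + _SLUG + r")/?",
    re.IGNORECASE,
)
_BARE_SLUG = re.compile(_SLUG)


def extract_slug(text: str) -> Optional[str]:
    """Return the plugin slug if the input is a plugin URL or a bare slug."""
    text = text.strip()
    url_match = _PLUGIN_URL.fullmatch(text)
    if url_match:
        return url_match.group(1).lower()
    if _BARE_SLUG.fullmatch(text):
        return text
    return None  # Anything else is treated as an ordinary search query.
```

Note that a bare common word such as `backup` still looks like a slug to this helper, so it only makes sense for a dedicated "install by slug" field or for full URLs, not as a blanket rule for the normal search box.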

(Note: edited to correct numbering of these options. I had started with 0, but the wiki formatting changed that to 1...)

Last edited 5 months ago by tobifjellner

#17 @tobifjellner
5 months ago

For the suggested change in WP core UI, I've created https://core.trac.wordpress.org/ticket/40475

#19 @gibrown
5 months ago

I really like the idea of matching the plugin's full url. There are some tricky pieces with the localization of the urls, but it is a pretty easy solution.

I also think supporting quotes would help this case a lot.

Exact slug matching is very problematic because so many slugs are common words, e.g. "backup".

#20 @tobifjellner
5 months ago

Keep in mind, though, that visitors are now often redirected to various localized versions, depending on their Accept-Language headers.

#22 @gibrown
5 months ago

"knowledgebase" is another interesting example where the boosting by active installs pushes out 1-2 poorly reviewed/old/very-very-new plugins that should probably be in spots 10-14.

https://wordpress.org/plugins/search/knowledgebase/

#23 @yani.iliev
5 months ago

Changing a plugin's tags or display name carries no penalty and is indexed almost instantly. This lets a developer adjust the tags and the display name, test the search rankings, and repeat until the desired result is reached.
The search should look at how long a plugin has been using a certain name and also how old the tags are.
A new plugin that just added the tag "automate" should not be put ahead of a plugin that has had the tag "automate" for the last year.
The same should apply to the name: any recent change should reset the weight of the display name.
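
One way to read this suggestion is as a weight that ramps up with how long a tag or display name has been in place and resets when it changes. The sketch below is hypothetical: the one-year ramp comes from the "automate" example, while the `floor` value (so that new plugins are not zeroed out entirely) is an assumption added for illustration.

```python
from datetime import date


def age_weight(first_seen: date, today: date,
               ramp_days: int = 365, floor: float = 0.25) -> float:
    """Weight for a tag or name match based on how long it has been in place.

    Starts at `floor` on the day the tag or name was set (or last changed)
    and rises linearly to 1.0 after `ramp_days`.
    """
    days = max(0, (today - first_seen).days)
    ramp = min(1.0, days / ramp_days)
    return floor + (1.0 - floor) * ramp
```

Changing the name or a tag would set `first_seen` back to the date of the change, which removes the incentive to keep tweaking them just to test the rankings.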

Active installs need to have a lower weight than reviews.
Reviews provide instant feedback on how satisfied users are with a plugin. Active installs tell us that a plugin is active, but not whether users are satisfied or whether the plugin should be ahead in the results.
Plugins that are used once and then removed suffer a lot from this. One example is migration plugins, which do not need to stay activated after a transfer.

Thinking more about it, reviews and forum posts tell the plugin's story from the user's perspective. They best define what the plugin does and what it is good at. They should be indexed and factored into the rankings.

Stemming is not very good: "migration" and "migrator" match very different plugins.

#24 follow-up: @gibrown
5 months ago

Hi @yani.iliev, thanks for the feedback, very helpful. I will look into these. On:

Stemming is not very good

We are intentionally using minimal stemming so for English we are only removing plurals. Very aggressive stemming like you suggest tends to reduce search relevancy based on most of the papers I have read. A good one looking at stemming in various languages is https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf

Off the top of my head, a good example where more aggressive stemming would hurt us is "stored", "store", and "storing". This change would likely boost some popular plugins that are storing data into the results for building an e-commerce "store".
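
The difference can be sketched with two toy stemmers. Neither is the analyzer actually used by the directory: `minimal_stem` is a simplified plural stripper and `aggressive_stem` is a crude suffix stripper standing in for a heavier algorithm, both written only for this illustration.

```python
def minimal_stem(word: str) -> str:
    """Strip simple English plurals only (illustrative, not the real analyzer)."""
    w = word.lower()
    if len(w) > 3 and w.endswith("ies"):
        return w[:-3] + "y"
    if len(w) > 3 and w.endswith("s") and not w.endswith(("ss", "us")):
        return w[:-1]
    return w


def aggressive_stem(word: str) -> str:
    """Crude suffix stripping on top of plural removal (illustrative only)."""
    w = minimal_stem(word)
    for suffix in ("ing", "ed", "e"):
        # Keep at least three characters of the stem.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


words = ["store", "stores", "stored", "storing"]
minimal_terms = {minimal_stem(w) for w in words}
aggressive_terms = {aggressive_stem(w) for w in words}
```

In this sketch the minimal stemmer only merges "store" and "stores", while the aggressive one collapses all four words into a single term, so a search for an e-commerce "store" would also match plugins that merely talk about "storing" data.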

It may be interesting to try more aggressive stemming as a boost, but I think the impact would be marginal at best and we don't really have the A/B testing infrastructure in place to evaluate such a change. (I wish we did because I'd really like to know what impact this would have).

#25 in reply to: ↑ 24 @yani.iliev
5 months ago

Replying to gibrown:

Hi @gibrown,

Could you provide some details on the stemming algorithm currently in use? I might be able to provide some suggestions on how to make it slightly more aggressive. I think there is a case of understemming in English, and I'd like to see slightly more aggressive stemming, but not so much that it causes overstemming.

Every language is different. What works in English will likely not work in Bulgarian, so I'd optimize per language.

You mentioned that there is no A/B testing infrastructure. In the search data that you have - is there a link between a search term and the plugins that were installed after the search?

#26 @gibrown
5 months ago

Hi @yani.iliev

Most of my thinking on stemming is still applicable from this post: https://greg.blog/2013/05/01/three-principles-for-multilingal-indexing-in-elasticsearch/

The analyzers we are using are here: https://github.com/Automattic/wpes-lib/blob/master/src/common/class.wpes-analyzer-builder.php

is there a link between a search term and the plugins that were installed after the search?

No unfortunately.

#27 follow-up: @gsexton
4 months ago

I'm seeing cases where search relevancy is overwhelmed by large numbers of downloads. For example, if you search for the single word "calendar", JetPack is listed on page 1 of the results. I looked at the readme.txt file and found that it contains that keyword exactly one time. Evidently, its massive installed base overwhelms many plugins that, from a straight textual analysis, are vastly more relevant.

#28 in reply to: ↑ 27 @quantumcloud
4 months ago

100% agreed. The search results are quite polluted with false positives from big names. I reported the issue multiple times but nothing has been done on it so far. The highest priority seems to be that the big names show up for the most popular searches on 1st page. All other searches and the search result quality are largely ignored at this point. Not to mention how next to impossible it has become for new plugin developers to gain any momentum as they start with 0 active installs. New WordPress plugin developers can pretty much forget about their plugins ever getting downloaded from the repository through organic searches.

Truly hope that the situation is rectified asap.

Replying to gsexton:

I'm seeing cases where search relevancy is overwhelmed by large numbers of downloads. For example, if you search for the single word "calendar", JetPack is listed on page 1 of the results. I looked at the readme.txt file and found that it contains that keyword exactly one time. Evidently, its massive installed base overwhelms many plugins that, from a straight textual analysis, are vastly more relevant.

#30 @gibrown
4 months ago

#2832 was marked as a duplicate.