Making WordPress.org

Opened 9 years ago

Closed 8 years ago

Last modified 3 years ago

#1692 closed enhancement (fixed)

Plugin search quality improvements

Reported by: tellyworth
Owned by: tellyworth
Milestone: Plugin Directory v3.0
Priority: normal
Component: Plugin Directory
Keywords: dev-feedback needs-docs 2nd-opinion needs-testing
Cc:

Description

We need some improvements to ElasticSearch relevance scoring, and possibly data storage and indexing, in order to produce better quality results.

Some ideas for indicators of good quality results:

  • A search for a general term like "gallery" or "shopping cart" should favour plugins that have been updated relatively recently, are popular, and tested-up-to recent stable releases of WP.
  • A search for a plugin name, or partial name, that is relatively unique (such as "akismet" or "super cache"), should show that plugin as the top result.
  • A search for a partial plugin name that is somewhat generic (such as "contact form" or "wordpress seo") should show that plugin near the top result, provided it is recent and popular.
  • Plugins that are abandoned, have few active installs, or possible compatibility issues, should be ranked lower or not shown at all in search results.

The approximate active install count, modified date, and tested-up-to version, are already indexed, and should go a long way towards ranking results appropriately.

Attachments (2)

inject-translated-meta-keys.diff (3.5 KB) - added by tellyworth 8 years ago.
precision-matching.diff (3.9 KB) - added by gibrown 8 years ago.
Improve precision of search relevancy

Change History (156)

#1 @danieliser
9 years ago

I would also add a score for # of reviews to avg rating.

For example a plugin with one 5 star review isn't necessarily the same as one with 400+ avg 4.9.

Maybe making results sortable by any of these criteria would be useful in general.

Sort By: Recently Updated, # of Reviews, Avg Rating, Oldest - Youngest.

#2 follow-up: @tellyworth
9 years ago

I agree that user-controlled sorting and/or filtering will be useful.

Ratings are somewhat complex and deserve to be considered separately I think. The best way to deal with them as far as search is concerned is to compute a score when ratings are changed, using a method something like this:

http://www.evanmiller.org/ranking-items-with-star-ratings.html

..and store the result in postmeta for ElasticSearch to use as a ranking factor.
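
To make the idea concrete, here is a minimal sketch in Python of a score in the spirit of the linked article: the lower bound of a credible interval on the mean rating, with one pseudo-rating added per star level. The function name, the default `z`, and the sample counts are assumptions for illustration, not code from the plugin directory.

```python
import math


def star_rating_score(counts, z=1.65):
    """Conservative score for a list of star-rating counts.

    counts[i] is the number of (i + 1)-star ratings.
    z is a normal quantile; 1.65 is roughly the 95th percentile.
    """
    levels = len(counts)
    total = sum(counts)
    # One pseudo-rating per star level keeps tiny samples from looking perfect.
    weights = [(c + 1) / (total + levels) for c in counts]
    mean = sum((i + 1) * w for i, w in enumerate(weights))
    second_moment = sum((i + 1) ** 2 * w for i, w in enumerate(weights))
    variance = second_moment - mean ** 2
    return mean - z * math.sqrt(variance / (total + levels + 1))


one_review = star_rating_score([0, 0, 0, 0, 1])        # a single 5-star review
many_reviews = star_rating_score([0, 0, 0, 40, 360])   # 400 reviews, average 4.9
```

With these inputs the plugin with 400 reviews scores well above the plugin with a single perfect review, which is the behaviour asked for in comment #1. The value would only need recomputing when a rating changes, and could then be stored in postmeta as described.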

#3 in reply to: ↑ 2 @danieliser
9 years ago

Love it, I do think a weighted score is more appropriate though. Currently none of the sites that scrape the WP plugin repo can sort by true rank; they always show plugins with a single 5-star review at the top simply because it's a perfect 5. That's not truly representative of the info though, as often the plugin's own author is the first to review it... Cough cough, realizing I did this..

#4 follow-up: @gibrown
9 years ago

I've built a custom Elasticsearch index for the plugins which does a better job of indexing a lot of the data that was previously only available in meta fields. This index is also running on our Elasticsearch 2.3 cluster which means that all of the newer aggregations and query options should be available (previous index was on 1.3).

I've disabled the index that @tellyworth was using before and enabled this one. It's backwards compatible, so the currently deployed test search query is still working.

The index building code and full mapping can be seen in this gist: https://gist.github.com/gibrown/d4750aa773154948c81791fe18bdc521. It relies on the wpes-lib framework (https://github.com/automattic/wpes-lib).

Some details on how we are currently indexing content:

  • We have a separate field for each custom language analyzer we have configured. Currently have 29 language analyzers: ar, bg, ca, cs, da, de, el, en, es, eu, fa, fi, fr, he, hi, hu, hy, id, it, ja, ko, nl, no, pt, ro, ru, sv, tr, zh
  • All of the analyzed text fields ('content', 'title', 'excerpt', and 'upgrade_notice') have an associated field for this analyzer (eg 'content_es').
  • If a language doesn't have a custom language analyzer, then it should use the default field (eg 'content') which tries to use some reasonable defaults.
  • Right now the non-English fields are being populated by looking for an associated meta value (eg content for Spanish looks for the meta key 'content_es'). An open question is whether we should instead reindex nightly and query GlotPress for the translation.
  • For every content field (and a few others) we also have an _ngram version of the field
  • ngrams take an individual token (like 'wordpress') and store it as multiple character sequences (eg 2+3 grams: 'wo', 'wor', 'or', 'ord', 'rd', 'rdp', 'dp', 'dpr', 'pr', 'pre', 're', 'res', 'es', 'ess', 'ss')
  • ngrams should enable us to build very fast and relevant instant search results so that a user never has to hit enter. Simply run a query any time the user seems to pause for more than 100-200 ms.
  • There is not anything special that the client needs to do for ngrams or language analysis. It just needs to run a multi_match query on the appropriate fields.
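
As an illustration of the ngram bullet above, a small Python sketch that produces the same 2+3 grams for a token. This only mimics what the Elasticsearch ngram analysis does at index time; the function name is hypothetical and the real index uses its own analyzer configuration.

```python
def char_ngrams(token, min_n=2, max_n=3):
    """Return all character n-grams of length min_n..max_n, in position order."""
    grams = []
    for start in range(len(token)):
        for size in range(min_n, max_n + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


indexed = char_ngrams("wordpress")
partial = char_ngrams("wordp")  # what a user has typed so far
```

Every gram of the partial input "wordp" also appears among the grams of "wordpress", which is why a partially typed word can already match the indexed field during instant search.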

Some fields I want to highlight and how I think they could be used for improving search results:

  • Obviously all of the content fields should be searched with a multi_match query. Adding phrase queries to the query should also help. Example:
{ "query" : { "bool" : {
  "must" : [
    { "multi_match" : {
      "type" : "cross_fields", // enables matching terms in different fields (eg "Matt Hello Dolly")
      "fields" : ["content_en","title_en^2","excerpt_en","upgrade_notice_en^0.5","slug_ngram","header_author","contributors"],
      "query" : "Matt Hello Dolly"
    } }
  ],
  "should" : [
    { "multi_match" : {
      "type" : "phrase", // treat the whole query as a phrase
      "fields" : ["content_en","title_en^2","excerpt_en","upgrade_notice_en^0.5","slug_ngram","header_author","contributors"],
      "query" : "Matt Hello Dolly"
    } }
  ]
} } }

  • number_of_translations : Counts the number of fields translated for each language (translating content, title, excerpt, and upgrade_notice to one language will get 4 points). We should rank translated plugins higher. Should be a good signal of quality and encourages plugin authors to translate plugins. Just multiply in a log1p field_value_factor scoring
  • tested: latest WP version as a float. Higher is always better. Just multiply in a log1p field_value_factor scoring
  • required : potentially could use this as a signal of how long a plugin has been around. But easy to game.
  • stable_tag : not sure if this is useful :)
  • tagged_versions : unsure if useful
  • number_of_versions : more tags (maybe) means it has been supported for a while. Would be better if we had some dates when the tags happened
  • percent_on_stable : based on meta.usage and the meta.stable_tag this is (roughly) percentage of users who trust this plugin enough to upgrade. Could be used in a 'script_score' to adjust number of active installs or in a log1p field_value_factor scoring.
  • active_installs : obviously in log1p field_value_factor scoring.
  • support_resolution_yes, support_resolution_no, support_resolution_mu : (I'm not sure about definition of this compared to support threads)
  • support_resolution_percentage: = support_resolution_yes / (support_resolution_yes + support_resolution_no + support_resolution_mu) log1p field_value_factor scoring
  • support_threads : log1p field_value_factor scoring
  • support_threads_resolved : log1p field_value_factor scoring
  • support_threads_percentage : support_threads_resolved / support_threads : log1p field_value_factor scoring
  • contributors_active_installs: the sum of the number of active installs across all the plugins and all the authors of this plugin. Should be a great signal. Example https://wordpress.org/plugins/slug-control/ by Mark Jaquith has 30 active installs. Because of who wrote it I have no doubt it is a great plugin for that use case, but 99% of users don't know that. Total active installs by the contributors should be a good proxy for this.
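
The "log1p field_value_factor" idea that recurs in the list above could be wired into a query roughly as follows. This is a sketch in Python that only builds the request body; the helper names, the choice of fields, and the `missing` default are assumptions for illustration, not the query that was deployed.

```python
def resolved_ratio(resolved, total):
    """support_threads_resolved / support_threads, guarding against no threads."""
    return resolved / total if total else 0.0


def with_quality_signals(text_query,
                         fields=("active_installs", "tested", "number_of_translations")):
    """Wrap a text query so each numeric field multiplies the relevance score."""
    functions = [
        {
            "field_value_factor": {
                "field": field,
                "modifier": "log1p",  # log10(1 + value)
                "missing": 1,         # assumed default when a document lacks the field
            }
        }
        for field in fields
    ]
    return {
        "query": {
            "function_score": {
                "query": text_query,
                "functions": functions,
                "score_mode": "multiply",  # combine the factors with each other
                "boost_mode": "multiply",  # then combine with the text relevance
            }
        }
    }


body = with_quality_signals({"match": {"title_en": "gallery"}})
```

A ratio such as support_threads_percentage would be computed at index time, for example `resolved_ratio(4, 5)` for a plugin with 4 of 5 threads resolved. Multiplying several log-scaled factors compresses large install counts, but the relative weighting still needs the experimentation discussed in this ticket.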

Open issues (I think):

  • Should we bulk reindex this index daily and query GlotPress to get the latest translations? Can someone provide me with the PHP code for doing that?
  • We should probably have a complete whitelist of supported langs and have a field for each with the appropriately configured lang analyzer? Is there a full list?
  • What should the full query be when doing instant search?
  • What should the full query be when doing non-instant search?
  • It might be nice to add some meta fields with actual user names of the contributors to search against.
  • Related to @tellyworth's point about generic terms. We should not treat an exact name match as highly as we have been in the past. I think just boosting the title a bit will do that, but it needs some experimentation. I'm biased with this example, but let's look at https://wordpress.org/plugins/search.php?q=related+posts. The top recommendation has 40k installs. YARPP at #2 has 300k. Jetpack is not on the first couple of pages despite having far more installs. The currently deployed test query though is also far too heavily weighted towards number of installs (see https://cloudup.com/cZK8ZPNUE5p). That's pretty terrible results. We need to find some middle ground and balance for these weightings. "SEO" is another case that is probably worth testing with.

#5 @dd32
9 years ago

support_resolution_yes, support_resolution_no, support_resolution_mu : (I'm not sure about definition of this compared to support threads)

@gibrown I think you might be looking at some old data there, as those fields aren't present anymore, although I'm seeing a lot of bad data in there at present.
The two fields are support_threads which is the total number of support threads from the last 2 months, and support_threads_resolved which is the number of those which are marked as resolved. For example, a plugin has 5 threads in 2 months, 4 of which are resolved.

#6 @gibrown
9 years ago

I think you might be looking at some old data there, as those fields aren't present anymore, although I'm seeing a lot of bad data in there at present.

Ok, will remove those fields when I next rebuild the index. We're running some tests of ngram indexing for other purposes which will probably cause us to change how we build the ngram fields. Won't impact anything on the client side. Just improvements in relevancy of the results and how fast the queries run.

#7 @aaroncampbell
9 years ago

First, sorry that this is based on a specific search, but hopefully understanding what's going on with some specific queries will help me wrap my head around the new search.

I was testing out the new plugin repo a little and thought I'd search for "ithemes" to take a look at iThemes Security (for those that don't know, the plugin I work on...so I was trying to use it as a reference point for what's changed, etc). I was surprised to see "White Label CMS", "ManageWP Worker", "Sidekick", "MainWP Child", and "InfiniteWP Client" all show before iThemes Security. iThemes Exchange and other smaller iThemes plugins don't even display.

I tried to dig through some of them and see where "ithemes" was matching:
White Label CMS

  • Twice in one line of the changelog: "Patch submitted by Chris @ iThemes. Better support for users of iThemes."

InfiniteWP Client

  • Once in FAQ: "Does ManageWP work with all popular plugins like WordPress SEO by Yoast, WPTouch, Google XML Sitemaps, NextGEN Gallery, Contact Form 7, WooCommerce, iThemes Security, WordPres importer, Wordfence Security and others?"

Sidekick

  • Once in Changelog: "Fixed incompatibility with iThemes Builder and For Loop JS Loops"

Compare those to iThemes Security:

  • In title: "iThemes Security (formerly Better WP Security)"
  • Contributor
  • 20x in description tab
  • Once in installation tab
  • 11x in FAQ
  • 37x in Changelog
  • Once in screenshots

It also has more active installs than any of the ones listed before it (I see that it's a metric).

I noticed other issues with searches like "Aaron Campbell" and "Google Analytics" but maybe dealing with this iThemes one will help me understand those.

#8 follow-up: @danieliser
9 years ago

Are these changes being worked out on the live site? IE With each change you mention should the result orders be adjusted accordingly?

I ask because I have been watching this ticket and noticed that since yesterday my own plugins have dropped significantly. I know I am being biased, but my own plugin Popup Maker seems way more relevant to the keyword popup or popups, yet is shown in the 16th spot. To top it off, one of my main competitors, and arguably the other most popular (more installs than mine, fewer reviews), is pushed to the 3rd page.

Is there a breakdown of weighting/scoring metrics so that everyone can give feedback on it?

I have a feeling keyword matches in the slug, tags, title and keyword stuffing in the body/description are being over-weighted, causing less relevant results to come up higher simply because of word counts.

I think the weight on those needs to be scaled back and weighed evenly against # of reviews/avg rating & update activity.

#9 in reply to: ↑ 8 @samuelsidler
9 years ago

Replying to danieliser:

Are these changes being worked out on the live site? IE With each change you mention should the result orders be adjusted accordingly?

No, they aren't. Separately, @Otto42 is making changes to the current weighting. Please ping him in #meta on Slack.

#10 @Otto42
9 years ago

The minor adjustments I've been making to the current system don't have anything to do with this ticket. This ticket is about the changes for the new system which will be replacing the current one.

You can ping me directly about the current system.

#11 @obenland
9 years ago

  • Milestone changed from Plugin Directory v3 - M3 to Plugin Directory v3 - M4

This ticket was mentioned in Slack in #meta by obenland. View the logs.


9 years ago

#13 in reply to: ↑ description ; follow-up: @rglennnall
9 years ago

i found this thread when i was about to start a "suggestions" post regarding the current search features. am glad to hear that it's being improved.

my personal thoughts were initially that, as often as i return to plugins depository, it sure would be nice if i could mark certain plugins as ones i've already seen, and are satisfactory to me, or not, in some way. There's a Favorites HEART already, so i was just thinking something along those lines. "Seen It - Has Potential" "Seen It - Not Worth a 2nd Look", etc...

especially after having downloaded and tested a plugin to find real deficiencies, it's frustrating to then, months later, spend time going back to the plugin page and ultimately remembering, "oh, yeah, I've already tested this - it broke my website..."

thanks for ya'lls consideration of something simple like this. would save many people with poor memories loads of time, i think.

cheers,
GN


#14 in reply to: ↑ 4 @ocean90
9 years ago

Replying to gibrown:

  • Should we bulk reindex this index daily and query GlotPress to get the latest translations? Can someone provide me with the PHP code for doing that?

Please no, let's not make GlotPress another source for it. All data should be stored in the plugin directory itself. See #1691 for possible implementations.

#15 @rglennnall
9 years ago

  • Keywords dev-feedback needs-docs added

A thought on whatever standards there are, or are even suggested, for plugin developers and their submissions to the Plugin Directory:

After having experienced likely thousands of plugins, i'm amazed at the incredible range of 'installation and use instructions' presented with the plugins by the authors. Some are quite thorough, but more are woefully inadequate to nonexistent - there are times when i actually have to abandon my attempts to try one out before I even see it in action because i cannot find where or how it's implemented.

I realize there are vast arrays of developers and development teams submitting plugins, and I realize that WP cannot monitor the plugins.

what i continue to wonder is what kinds of standards or suggestions, or tips, exist to aid these developers in a) presenting a useful product, and b) having successful conversions (free or paid)...?

It's suggested within this ticket thread that we pay attention to the 'last updated' data, and this certainly tells us something toward what to expect, but we nevertheless must invest some time (often WAY TOO MUCH time) in finding and testing a needed plugin.

along these lines, it's also fairly depressing to scroll through hundreds of plugin search results in which many of them haven't been updated, likely even looked at, by its authors for over a year, sometimes 2 years.

So my questions/suggestions are 2:

1) ARE there standards that address not only the coding but the user instructions provided at some level for plugin authors? If not, shouldn't there be?

2) ISN'T there a justifiable means of weeding the garden of, say, year-old or more, unattended weeds? Seems to me that an author who wants his plugin to be a success would put in a bit of time to at least bump his product up by "updating" the thing periodically, and conversely, these plugins which are clearly abandoned don't seem to be given any concern by their authors and will not likely be missed by them.

I appreciate the FANTASTIC work WordPress and its people (you) do for us developers. Big Time.

thanks,
GN

This ticket was mentioned in Slack in #feature-shinyupdates by obenland. View the logs.


9 years ago

#17 @afercia
9 years ago

Hello everyone. Reporting here a question raised on Slack:

About the new plugin search: will the search "by author" also be improved? Asking because currently you have to search for the author's username to get any results, and I guess most users just don't know the author's username. For example, if I search for "author:John Blackbourn" I don't get any results. But if I search for "author:johnbillion" I get 21 results.

Unless I'm doing something wrong, seems something worth checking. Thanks! :)

#18 @coffee2code
9 years ago

@afercia: See #100, which was waiting until this rewrite and would be a welcome improvement to the search once tackled, though I don't believe it has been planned for yet.

#19 @afercia
9 years ago

@coffee2code that is, thanks :)

This ticket was mentioned in Slack in #meta by obenland. View the logs.


9 years ago

#21 @obenland
9 years ago

  • Owner set to tellyworth
  • Status changed from new to assigned

#22 @tellyworth
9 years ago

In 3310:

Some simple improvements to search results - better handling of queries for titles, more of a boost to recent and popular plugins.

More tweaks to come.

See #1692.

#23 @tellyworth
8 years ago

attachment:inject-translated-meta-keys.diff leverages [3319] and [3320] to pull translated content from GlotPress and inject it into (non-stored) meta keys like post_content_fr_fr. Those meta values can then be indexed by ES for localized search. It attempts to only inject them if a Jetpack sync is about to occur, since fetching those values any other time would be needlessly expensive.

It's difficult to test and probably doesn't quite work right. Posting here for a sanity check before trying it live - see #1691 for some discussion also.

I've intentionally limited this to two languages and post_ids below 200, so we can manually sync those posts and see what goes wrong before expanding to everything.

#24 @obenland
8 years ago

  • Milestone changed from Plugin Directory v3 - M4 to Plugin Directory v3 - M5

This ticket was mentioned in Slack in #meta by obenland. View the logs.


8 years ago

#26 follow-up: @squarecandy
8 years ago

1) This is awesome and much needed. +1. Thanks to everyone working on it.
2) My only comment after reading the thread above is to be careful with the concept of "weeding the garden". Sure there's a ton of complex over-reaching now-abandoned plugins that need to be weeded out, but there are also some plugins that are very small, very useful and may not need a lot of tweaking. Perhaps plugins without recent updates get lower ranking in the default sort, but they should be easy to find by searching by title, author and sorting by various attributes as discussed.

#27 in reply to: ↑ 26 @rglennnall
8 years ago

Replying to squarecandy:

No question - many obscure plugins I've found indispensable - trusting the process... :)

#28 @tellyworth
8 years ago

In 3449:

Inject translated content, title, and excerpt into pseudo-meta values for search indexing.

This is intentionally limited for testing purposes to fr and es locales, and to post IDs below 200.

See #1692.

#29 @tellyworth
8 years ago

In 3494:

Tweak search scoring to give better results with current data.

Date scoring is over-sensitive because modified dates currently do not accurately reflect plugin updates.

See #1692.

#30 @tellyworth
8 years ago

In 3498:

Fix some issues with the pseudo-meta translated content for JP sync.

See #1691, #1692

#31 @tellyworth
8 years ago

In 3500:

Add last_updated meta value to plugin posts.

The correct post_modified date isn't available to ElasticSearch, this will help till we resolve that.

See #1692

#32 @tellyworth
8 years ago

In 3504:

Search: filter on tested-up-to >= 4.0.

See #1692

#33 @tellyworth
8 years ago

In 3505:

Search: improve [3504] by calculating the version cutoff as WP_CORE_STABLE_BRANCH minus 5 releases. Thanks @dd32.

See #1692

#34 @tellyworth
8 years ago

In 3509:

Search: add plugin_status meta at import, so we can filter on 'disabled', which is ignored by Jetpack sync.

See #1692

#35 @tellyworth
8 years ago

In 3519:

Plugin directory: include all available translations in pseudo-meta for search.

Also add some defensive code, better testing and bugfixes.

See #1691, #1692

#36 @tellyworth
8 years ago

In 3522:

Plugin directory: search locale-specific fields when get_locale() is non-English.

English fields are still searched as a low-weighted default, in case translated content is not available.

See #1691, #1692

#37 @gibrown
8 years ago

New index has been built and deployed with fields for all locales. Form is:

  • content_es_ES
  • title_es_ES
  • excerpt_es_ES
  • upgrade_notice_es_ES

Also added fields for:

  • disabled - a boolean
  • plugin_modified - a date field
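
Combining these per-locale fields with the locale-aware search from [3522], the list of fields to query might be assembled as in the Python sketch below. The function name, the 0.1 English weight, and treating only en_US as "English" are assumptions for illustration; the English field names follow the example query in comment #4.

```python
TEXT_FIELDS = ("title", "excerpt", "content")


def search_fields(locale, english_weight=0.1):
    """Fields for a multi_match query: localized first, English as a weak fallback."""
    english = [f"{name}_en" for name in TEXT_FIELDS]
    if locale == "en_US":
        return english
    localized = [f"{name}_{locale}" for name in TEXT_FIELDS]
    # "field^0.1" is the usual Elasticsearch per-field boost syntax.
    return localized + [f"{field}^{english_weight}" for field in english]


spanish_fields = search_fields("es_ES")
```

Keeping the English fields in the list at a low weight means a plugin without a translation can still match, but a translated plugin wins when both match.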

#38 @tellyworth
8 years ago

In 3529:

Plugin directory: use new plugin_modified value for search scoring.

See #1692

#39 @tellyworth
8 years ago

In 3535:

Plugin directory: exclude disabled plugins from search.

Also rearrange filters somewhat, and reduce the date decay.

See #1692

#40 @tellyworth
8 years ago

In 3536:

Plugin directory: treat en_AU, en_GB etc as separate locales for search.

See #1692

#41 @tellyworth
8 years ago

In 3537:

Plugin directory: use log scale for active_installs when scoring, to give better results for exact name searches.

See #1692

#42 @tellyworth
8 years ago

In 3542:

Plugin directory: include tested-up-to as a factor in result scoring.

See #1692

#43 @tellyworth
8 years ago

In 3543:

Plugin directory: use WP_Http to query the translate API.

See #1691, #1692

#44 @tellyworth
8 years ago

In 3568:

Plugin directory search: give a boost to exact phrase matches in title; also incorporate support_threads_resolved.

See #1692

#45 @tellyworth
8 years ago

In 3571:

Plugin directory search: boost the translated title, not just title_en; include excerpt in search fields.

See #1692

#46 @tellyworth
8 years ago

In 3572:

Plugin directory search: use log2p so as not to exclude plugins with zero resolved support threads.

See #1692
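
A short Python sketch of why the modifier matters when the factor is multiplied into the score. The Elasticsearch log1p and log2p modifiers add 1 or 2 to the field value and take the base-10 logarithm; the helper names and the relevance value below are made up for the example.

```python
import math


def log1p_factor(value):
    """Elasticsearch 'log1p' modifier: log10(1 + value)."""
    return math.log10(1 + value)


def log2p_factor(value):
    """Elasticsearch 'log2p' modifier: log10(2 + value)."""
    return math.log10(2 + value)


relevance = 7.5          # hypothetical text relevance score
resolved_threads = 0     # a new plugin with no resolved support threads

score_with_log1p = relevance * log1p_factor(resolved_threads)
score_with_log2p = relevance * log2p_factor(resolved_threads)
```

With zero resolved threads, log1p yields a factor of exactly 0, so a multiplied score collapses to 0 regardless of text relevance, while log2p yields about 0.3 and keeps the plugin in the ranking.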

#47 @tellyworth
8 years ago

In 3573:

Plugin directory search: lower the title phrase match boost now that [3572] has resolved issues with newer and less popular plugins.

See #1692

#48 @tellyworth
8 years ago

In 3591:

Plugin directory search: fix an off-by-one date range bug that was excluding some plugins from search results.

See #1692

#49 @tellyworth
8 years ago

In 3597:

Plugin directory search: further tweaks for search quality - notably, better handling of close name matches.

See #1692

#50 @remyb92
8 years ago

Hello,

I'm coming here after reading this article on WPTavern, https://wptavern.com/new-wordpress-plugin-directory-now-in-open-beta, where they ask plugin developers to report issues or bugs.

I did several tests on the new plugin directory. The main issues I encountered are in the search results, so I'm commenting on this ticket.

1. When I search for several important plugin types, I get many irrelevant results. For instance:
a) Searching for "Cache" (say I want a cache plugin), the first page includes TablePress, NextGEN Gallery, Yoast SEO, and MailPoet Newsletters, which are not cache plugins.
b) Searching for "Translate" (say I want a translation plugin), only 4 out of 14 plugins on the first page are translation plugins. The others include TablePress, Yoast SEO, MailPoet Newsletters, and WooCommerce.
c) Searching for "Image compression" (say I want a compression plugin for images), I again get Jetpack, Events Manager, MailPoet Newsletters, and NextGEN Gallery, which are not relevant.

2. Some terms don't return anything.
a) Searching for "Newsletters" returns nothing (but "Newsletter" with no "s" returns something). This is weird because some plugins are named with the "s", like MailPoet Newsletters and Email Subscribers & Newsletters.
b) Searching for "translation" returns nothing. It should return translation plugins. (Searching for "translator" returns TablePress as the first result, which is not relevant.)

3. It doesn't handle typos.
a) Searching for "woocommercz" returns nothing, and it should return the WooCommerce plugin. Someone managed to do it with Algolia: https://wordpress.algolia.com

I hope these tests can help. I'm not sure why the results are ordered this way, but to me the search results are a lot less relevant than the current ones.
It seems too much weight is given to active installs over relevancy, which causes some plugins to appear in almost every search result, like MailPoet or TablePress.
Cheers.
Cheers.

#51 @SMB-dev
8 years ago

  • Keywords 2nd-opinion added

Search for "jobmanager" omits plugin. Similar to @remyb92's comments - this is with a plugin I manage.

While I know the search functionality is in beta, it does need serious work.

As the lead dev for Job Manager (https://wordpress.org/plugins/job-manager/), I am wondering why a search for "job", "job-manager", or "jobmanager" pulls up "WP Job Manager" but NOT Job Manager. Can someone please look into this and/or explain to me the criteria used in the new search that would cause job-manager to drop out? Very concerned....

#52 @bfintal
8 years ago

Searching for the term "SEO" in the old directory shows mostly SEO related plugins, although Yoast SEO isn't in the first page. In the new directory, searching "SEO" shows Yoast SEO at the top, however it's closely followed by non-SEO plugins such as NextGEN Gallery, qTranslateX, The Events Calendar, JetPack, Ajax Load More, Business Directory Plugin. The next page shows more mixed in plugins like social sharing plugins, security plugins, translation plugins, and caching plugins.

New directory search for SEO: https://wordpress.org/plugins-wp/search/seo/
Current directory search for SEO: https://wordpress.org/plugins/search.php?q=seo

I think there is too much weight on the more established plugins as against the old one in this current iteration. In this specific scenario regarding SEO, it would be hard for users to discover other SEO related plugins, or new SEO plugins unless they search for the actual plugin name. This can set a high barrier of entry for new plugins.

#53 follow-up: @danieliser
8 years ago

@bfintal - I am thinking similarly, though the problem may not be with the weighting per se, but with when the weighting is applied.

I have a feeling that the weighting for active install count is applied during the initial search. IMHO that is wrong.

It should get all results relevant to the term SEO, weighted by their relevancy to the term (based on content, category, tags etc).

Once that is done, then apply weighting for active installs, review count / avg rating etc to sort the results of the query.

I think if the first query takes AI (active installs) into account at all, the results will be wrong, as they won't be relevant so much as highly weighted.

Hope that makes sense.
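
One way to express this two-phase idea in Elasticsearch is its rescore feature: a first query ranks purely on text relevance, then only the top window of hits is re-scored with popularity signals. The Python sketch below just builds such a request body; the function name, field list, window size, and weights are assumptions for illustration, and nothing in this ticket says this approach was adopted.

```python
def two_phase_search(terms, window_size=200):
    """Build a request body: text relevance first, popularity only as a rescore."""
    return {
        # Phase 1: pick candidates by text relevance alone.
        "query": {
            "multi_match": {
                "query": terms,
                "fields": ["title_en^2", "excerpt_en", "content_en"],
            }
        },
        # Phase 2: re-rank only the top `window_size` hits.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "field_value_factor": {
                            "field": "active_installs",
                            "modifier": "log1p",
                        }
                    }
                },
                "query_weight": 1.0,          # weight of the phase 1 score
                "rescore_query_weight": 0.5,  # weight of the popularity score
            },
        },
    }


seo_body = two_phase_search("SEO")
```

Because popularity never decides which documents enter the candidate window, a very popular but irrelevant plugin cannot appear just on the strength of its install count; it can only move within the set that already matched on text.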

#54 in reply to: ↑ 53 @bfintal
8 years ago

Replying to danieliser:

@bfintal - I am thinking similarly, though the problem may not be with the weighting per se, but with when the weighting is applied.

Yup, weighting is needed and that may be a possible solution.

[It] should get all results relevant to the term SEO, weighted by their relevancy to the term (based on content, category, tags etc).

Once that is done then apply weighting for active installs, review count / avg rating etc to sort the results of the query.

I think if the first query takes AI into account at all the results will be wrong as they won't be relevant as much as higher weighted.

I agree.

Also, active Installs and review count / avg ratings are nice, but if those were the majority of the sorting factors then new entrants or upcoming plugins would have a hard time getting some visibility.

From what I know, "supports up to", and the number of resolved tickets are also used, maybe to weed out the stale plugins. But how about for the new ones? I think it would be good too if other factors were added in as well that would help newcomers with potential. I imagine new plugins would always be in the bottom of the list and would rarely rise a few positions up the search results since it would be really hard to catch up with the thousands of active installs and hundreds of ratings the established plugins already have.

I also think your suggested ability to sort results is really needed to complement search. This may help other non-established plugins as well.

Sort By: Recently Updated, # of Reviews, Avg Rating, Oldest - Youngest.

#55 @Otto42
8 years ago

#1838 was marked as a duplicate.

This ticket was mentioned in Slack in #meta by obenland. View the logs.


8 years ago

#57 in reply to: ↑ 13 ; follow-up: @joyously
8 years ago

Replying to rglennnall:

my personal thoughts were initially that, as often as i return to plugins depository, it sure would be nice if i could mark certain plugins as ones i've already seen, and are satisfactory to me, or not, in some way. There's a Favorites HEART already, so i was just thinking something along those lines. "Seen It - Has Potential" "Seen It - Not Worth a 2nd Look", etc...

I agree with the sentiment of this (I do this a lot with WordPress and Google Play). However, it would be even nicer to have a way to be more specific, such as links to support tickets you opened about the problem, or individualized Notes to Self if you are logged in. Remembering why you didn't choose it is important (broken and version #, client didn't like, wrong functionality).

#58 in reply to: ↑ description @joyously
8 years ago

Replying to tellyworth:

Some ideas for indicators of good quality results:

  • A search for a general term like "gallery" or "shopping cart" should favour plugins that have been updated relatively recently, are popular, and tested-up-to recent stable releases of WP.

I disagree that these criteria indicate good quality results because change does not equate to "better".

  • A search for a plugin name, or partial name, that is relatively unique (such as "akismet" or "super cache"), should show that plugin as the top result.

The title should have the most weight, and it shouldn't matter if it's relatively unique or not because a partial name search is often done intentionally.

  • A search for a partial plugin name that is somewhat generic (such as "contact form" or "wordpress seo") should show that plugin near the top result, provided it is recent and popular.

If I can't find an older plugin by partial name, how do I find it? Why does it have to be popular or recent if I know part of the name?

  • Plugins that are abandoned, have few active installs, or possible compatibility issues, should be ranked lower or not shown at all in search results.

I disagree. The search shouldn't decide what I want to see. How would a maintainer of the directory find those plugins if they need to if the search is rigged against them? What if those are the ones I want to find? I should be able to search and jump to the last page of results to get my answer.

The approximate active install count, modified date, and tested-up-to version, are already indexed, and should go a long way towards ranking results appropriately.

These should be displayed in results, but not used for relevancy ranking due to not knowing the searcher's actual intent.

#59 @joyously
8 years ago

I can't tell if anyone mentioned any connection between the new plugin taxonomies (whatever they end up being) and how they are used in the search. It seems like they should play a big part in improving the search and in being able to limit a search to a section of the whole by taxonomy.

#60 in reply to: ↑ 57 @rglennnall
8 years ago

Replying to joyously:

Replying to rglennnall:

my personal thoughts were initially that, as often as i return to plugins depository, it sure would be nice if i could mark certain plugins as ones i've already seen, and are satisfactory to me, or not, in some way. There's a Favorites HEART already, so i was just thinking something along those lines. "Seen It - Has Potential" "Seen It - Not Worth a 2nd Look", etc...

I agree with the sentiment of this (I do this a lot with WordPress and Google Play). However, it would be even nicer to have a way to be more specific, such as links to support tickets you opened about the problem, or individualized Notes to Self if you are logged in. Remembering why you didn't choose it is important (broken and version #, client didn't like, wrong functionality).

Yes, that's a great idea. There is a great AJAX plugin that does this very thing for the plugins in your own WP installation. I love it, since I have so many plugins, many similar in name, and have to stop and remember "what does this one do...???" It would work great in the plugins directory.

This ticket was mentioned in Slack in #meta by ocean90. View the logs.


8 years ago

#62 @tellyworth
8 years ago

  • Milestone changed from Plugin Directory v3 - M5 to Plugin Directory v3 - M9

#63 @tellyworth
8 years ago

In 4333:

Plugin directory search: fix linear function score and remove ngram fields as suggested by @gibrown

See #1692

#64 follow-up: @tellyworth
8 years ago

In 4439:

Plugin directory: assorted search tweaks.

See #1692

This ticket was mentioned in Slack in #meta by sareiodata. View the logs.


8 years ago

#66 in reply to: ↑ 64 @sareiodata
8 years ago

  • Keywords needs-testing added

Replying to tellyworth:

In 4439:

Plugin directory: assorted search tweaks.

See #1692

I don't know if this change is the reason (I didn't test the search prior to this), however I can't find a single plugin that has under 100,000 active installs in the search.

I might be exaggerating, however most search results are returning plugins with a lot of active installs that are not particularly relevant to the search query.

I can't find a boost in the code for title, slug, etc. (perhaps I'm not looking where I should?). Only for active installs, plugin_modified, tested up to version, rating and support_threads_resolved. So wouldn't that list results that might not be relevant just because they have a lot of active installs?

#67 @tellyworth
8 years ago

In 4443:

Plugin directory: try to balance the search scoring a little better.

See #1692

#68 @nerrad
8 years ago

https://wordpress.org/plugins-wp/search/series/ returns no results. I would expect series related plugins.

It’s possible that’s related to the plural string issue (where searching for something like events returns no results but event does) but this particular issue will likely be tricky to fix because series is one of those terms in which the singular (serial?) is not an obvious permutation of the plural (i.e. it’s not “serie”).

So if I use the term post series to try to get around the series results issue “https://wordpress.org/plugins-wp/search/post+series/” I get results, but nothing relevant to what I’m looking for.

I'm expecting to see results like: Organize Series (http://wordpress.org/plugins-wp/organize-series) or Series (https://wordpress.org/plugins-wp/series/).

Also, one thing I'm noticing is that if the term exactly matches the name of a plugin, that plugin is often not the first result (which I would expect). For example, if I search for organize series, the first three matches not only don't match the specific term but also have nothing to do with organizing series. The plugin I am looking for is the fourth result.

This ticket was mentioned in Slack in #meta by ipstenu. View the logs.


8 years ago

#70 @lukecavanagh
8 years ago

If you search contact form, you have to go to page three to find Formidable Forms in the search results.

https://wordpress.org/plugins-wp/search/contact+form/page/3/

One of the tags is already contact form
https://wordpress.org/plugins-wp/tags/contact-form/

#71 @progmastery
8 years ago

Some search queries give a strange order of results.

Example:
https://wordpress.org/plugins-wp/search/instagram/

Why do "Custom Instagram Feed", "Enjoy Instagram" and "Lightweight Social Icons" rank higher than "Instagram"?

#72 @tellyworth
8 years ago

In 4450:

Plugin directory: improve handling of multi word searches and plurals.

See #1692

#73 @gibrown
8 years ago

@progmastery thanks for the report, fixed now. The index was missing the correct support threads resolved and rating info.

@tellyworth it looks like we had slightly stale data in the index. The synced data was correct though in the DB. I'm not sure if this was just due to us juggling indices or what, but we should keep an eye out for this. May indicate a bug in keeping the index up to date. I started a bulk reindex.

#74 @tellyworth
8 years ago

In 4451:

Plugin directory: remove old active_installs filter from search, it was incorrectly excluding a few plugins

See #1692

#75 @gibrown
8 years ago

@lukecavanagh for Formidable, it looks like we are weighting partial matches against the slug too heavily. Will work on it.

#76 @lukecavanagh
8 years ago

@gibrown

Thanks.

#77 @lukecavanagh
8 years ago

@gibrown

Having a similar issue with Ninja Forms using the search term contact form: I am on page 14 and still have not found it.

https://wordpress.org/plugins-wp/search/contact+form/page/14/

One of the tags on the plugin is contact form.

https://wordpress.org/plugins-wp/ninja-forms/

#78 @gibrown
8 years ago

@nerrad I think the cases you mentioned are all fixed now.

On "serial" vs "series", I see what you're saying. It's a hard case. I'm not sure I agree that "series" and "serial" are completely interchangeable because the first is always a noun, and the second is predominantly used as an adjective. We're using pretty light stemming of plurals. I wrote a bunch more about why here: https://gibrown.com/2013/05/01/three-principles-for-multilingal-indexing-in-elasticsearch/

#79 @gibrown
8 years ago

@lukecavanagh ya that should be the same issue. In general the high number of installs, high review scores, and high number of resolved support threads should be boosting these much higher.

#80 @lukecavanagh
8 years ago

@gibrown

But Ninja Forms has 4.5 stars on reviews and over 700,000 active installs. It seems like it should be showing on the first page.

It seems like more of the search results are based on the actual plugin name, which is why Contact Form 7 related plugins are mostly showing in the results for contact form.

#81 @gibrown
8 years ago

Yep, I agree. It's broken. Needs to be worked on.

Unfortunately it may take some iterations. It's a tricky balance against trying to do exact/partial matches against the slug and title of the plugin.

#82 @tellyworth
8 years ago

In 4452:

Plugin directory search: reduce slug_ngram weighting because it's already high.

See #1692

#83 @tellyworth
8 years ago

  • Milestone changed from Plugin Directory v3 - M9 to Plugin Directory v3.0

#84 @nerrad
8 years ago

@gibrown thanks for the update. So I've confirmed.

  • plurals return results
  • series is returning some relevant results now.
  • Seems to be better matching for exact plugin name matches.

However, still seeing some odd results with some queries:

If I search the term event the first page of results is relevant. However, beginning on the second page, more irrelevant results appear to be creeping into the results. There are a number of results that appear to have nothing to do with event management that are ahead of plugins that do (and even have Event or Events in the name of the plugin). For example, Tickera, Events Maker by dFactory, Event Espresso 4 Decaf - Event Registration Event Ticketing, Import Eventbrite Events are all on the fourth page of results, below plugins like: All in One WP Security & Firewall, Collapse-O-Matic, DuracellTomi's Google Tag Manager for WordPress, Google Analytics Dashboard for WP, Max Mega Menu, and others.

So it appears there still isn't enough weighting given to title/slug for the plugin - it's better, but there are still unexpected results coming ahead of others.

I know search is hard and I appreciate the work that's been done to improve things, seems like we're getting closer to a great search tool :)

Last edited 8 years ago by nerrad (previous) (diff)

#85 @nerrad
8 years ago

Regarding this comment:

high number of resolved support threads should be boosting these much higher.

Is this a good thing to use in weighting results? There's a couple potential issues I see with this:

  1. Plugins that "just work" for people are likely not going to have many support thread posts.
  2. Plugins that are active on more sites are going to generally have more support threads than plugins active on fewer sites. So there's a multiplier effect here that doesn't necessarily equate to better quality or more relevance when someone is looking for plugins matching their term.

However, if this weighting were based on a ratio of resolved to total support threads, that could be helpful, because a plugin that has 10 support threads and resolved 8 of them would be given roughly the same weighting for that metric as a plugin with a bigger footprint that has 100 support threads and resolved 90 of them.
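
As a rough illustration of the ratio idea, here is a minimal Python sketch. The `resolved_ratio` helper and its `default` value for plugins with no support threads at all (the "just works" case from point 1) are assumptions for the example, not anything taken from the search code.

```python
def resolved_ratio(resolved: int, total: int, default: float = 0.5) -> float:
    """Share of support threads marked resolved, from 0.0 to 1.0.

    `default` is a hypothetical neutral value for plugins that have no
    threads, so that a quiet plugin is neither rewarded nor punished.
    """
    if total <= 0:
        return default
    return min(resolved, total) / total


small_plugin = resolved_ratio(8, 10)     # 0.8
large_plugin = resolved_ratio(90, 100)   # 0.9
quiet_plugin = resolved_ratio(0, 0)      # falls back to the neutral default
```

The two plugins from the example land within 0.1 of each other, even though the raw resolved counts (8 versus 90) differ by more than a factor of ten.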

#86 follow-up: @lukecavanagh
8 years ago

@gibrown

How much weight is given in the search for active installs, what would happen for new plugins on the repo and users being able to discover those?

#87 in reply to: ↑ 86 @nerrad
8 years ago

Replying to lukecavanagh:

@gibrown

How much weight is given in the search for active installs, what would happen for new plugins on the repo and users being able to discover those?

To build on this question, if the weighting is based on total active installs, this is another area where a ratio-based rating might work better. In other words, if you built that weighting against a ratio of active_installs/days_in_repository, then that might give a "trending" weighting effect to where the plugin shows in the results. That way plugins that are newer could surface earlier in the results until they start aging with no traction. Conversely, more established plugins with more total active installs due to how long they've been in the directory would see that weighting have less and less overall impact if they continue to age with no significant increase in active installs.

It gives a chance for newer plugins to surface in results and be "seen" while still attributing value to whether users are finding those results useful, trying the plugin, and then keeping the plugin active on their sites.
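
The active_installs/days_in_repository ratio suggested above could look something like this in Python. It is only a sketch: the function name, the dates, and the install counts are invented for illustration, and clamping the age to at least one day is an assumption to avoid dividing by zero for plugins added today.

```python
from datetime import date


def installs_per_day(active_installs: int, added: date, today: date) -> float:
    """Active installs divided by the number of days the plugin has been listed."""
    days_in_repository = max((today - added).days, 1)
    return active_installs / days_in_repository


today = date(2016, 12, 1)
# A month-old plugin with modest installs versus a long-established one.
newcomer = installs_per_day(3_000, date(2016, 11, 1), today)
veteran = installs_per_day(100_000, date(2010, 1, 1), today)
```

Here the newcomer gains installs faster per day than the veteran, so a trending signal would favour it, and that advantage fades as the plugin ages without further growth.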

#88 @lukecavanagh
8 years ago

@nerrad

Okay good to hear.

#89 @nerrad
8 years ago

@lukecavanagh just want to make it clear that I'm not on the team making the changes, I was just making a suggestion building off of your question for the team making the changes.

#90 @lukecavanagh
8 years ago

@nerrad

I agree, there does need to be some logic so that new plugins with no active installs have a chance of being discovered through search on the WP repo.

#91 @gibrown
8 years ago

@nerrad thanks for testing and giving feedback. That's really helpful.

"event" search term

Ya, thanks for pointing this one out. Similar to "backup" this is tricky because it is a common word. We've played with weights a lot in an ad hoc manner already, so I get worried that more ad hoc tuning will just break something else. I'm planning to work on testing some large samples of search terms as mentioned a bit in this post. This may end up happening after the new search launches because there seems to be good agreement that the new search is still much better.

high number of resolved support threads - Is this a good thing to use

Right now we are only using the count of resolved threads, and I agree with some of your points. The new index we switched to earlier this week also includes a percent resolved field which I'm going to experiment with adding to the query. In general, I think we should include this signal in search for two reasons:

  1. It is a pretty good signal that the plugin author cares about supporting users. If the plugin doesn't need much support it is likely that the other plugins "competing" on a particular search term also don't need much support.
  2. To the extent plugin authors want to improve their search rankings, let's use signals that encourage them to improve the user experience. Other signals in the index along these lines (not yet deployed):
    • total active installs for the contributors to a plugin: encourage more plugins; for authors of popular plugins, rank them higher (under the assumption they are probably also well written)
    • number of months during which the plugin was updated : encourage longevity, and regular updates
    • percentage of one star ratings responded to: encourage taking feedback seriously, addressing problems
    • number of translations: wordpress is global, plugins should not just be in English, encourage translations

Also, unlike active installs, these are things that are more under the plugin author's control. I think this also tries to address @lukecavanagh's concerns about new plugins and active installs. By adding more signals, we should be able to provide other opportunities for new plugins to rank better.

That said, I think active installs is a very very good signal. A count of 100,000 says that there are at least tens of thousands of individual users who have both installed the plugin and for whom it is working. Given the 100k search queries a day, a very small percentage are developers, and so IMO we should not be suggesting brand new plugins that have no history to them. In cases where we know something about the plugin author we could try to do some boosting.

WP has a very large and successful ecosystem. It is one of our strengths. We should help users take advantage of it. I'm reminded of the yearly feedback in the WP user survey where "plugins" are highlighted as both the good part of WP and also the most frustrating (see https://ma.tt/2014/10/sotw-2014/ slide 22). I'd say the current state of plugin search is a big part of why plugins are so frustrating to users. They search for something, it's not clear what would be a good choice, and then what they choose is not actually well supported and breaks six months later.

Anyway... that's my reasoning for some of the ideas I'm pursuing... these are important things to discuss, so thanks for bringing them up.

#92 @nerrad
8 years ago

Thanks for engaging in discussion on this @gibrown - I know ultimately the team's goal is to improve the experience for users and that's awesome.

I'd say the current state of plugin search is a big part of why plugins are so frustrating to users.

I disagree with that statement. At the time of Matt's SotW address which you linked to, I would have agreed. However I actually believe the current search form at wp.org/plugins is very useful to users (admittedly without having any real-world data to back that up). The changes that were made there helped things significantly both in presentation of results and the ability to choose which criteria to search on. Personally I found those changes significant when looking for plugins (both in my WordPress backend dash and on WordPress.org). Also, the fact that there was some parity between what I saw in my WordPress admin and the WordPress.org search was useful to me.

The concern I'm communicating in this case is that the gains obtained from the changes to search before the most recent work are being overlooked/lost here.

Last edited 8 years ago by nerrad (previous) (diff)

This ticket was mentioned in Slack in #meta by nerrad. View the logs.


8 years ago

#94 @sethshoultes
8 years ago

@gibrown

I think active installs is a very very good signal. A count of 100,000 says that there are at least tens of thousands of individual users who have both installed the plugin and for whom it is working. Given the 100k search queries a day, a very small percentage are developers, and so IMO we should not be suggesting brand new plugins that have no history to them. In cases where we know something about the plugin author we could try to do some boosting.

Not trying to sound negative here, but to me this sounds like you're opening the doors wide to favoritism and politicization of the search results. I'm wary of this turning into some type of pay-to-play search system.

Edit: I think it's great that you're taking feedback and working with plugin developers on this. Thank you! :)

Last edited 8 years ago by sethshoultes (previous) (diff)

#95 follow-up: @gibrown
8 years ago

@nerrad thanks for continuing this conversation. It's likely our fundamental disagreement here is between what a developer sees when searching vs what someone who is non-technical would see. An old WP argument, just wanted to acknowledge it. It's a hard balance. I'm sure we'll have to continue iterating on all of this.

The fundamental way I approach search is that when a user takes any time to put something into a search box, we should be trying to infer and answer their question as best we can. We're providing answers, and the first answer we give matters a lot.

So as an example let's compare "stats". It is the 13th most common search on .org, at about 93k times a year.

https://wordpress.org/plugins/search.php?type=term&q=stats

https://wordpress.org/plugins-wp/search/stats/

There are really only two options in the world that can work for a hundred thousand sites a year: Google Analytics and Jetpack. Neither of those show up on the old search page. Yes, there are also some other players that are great - especially in certain niches. But scaling stats is really hard and pretty expensive, which is why there are not many. I can go and look at almost any common search term on .org, and I see this exact same pattern. It's not just how do we satisfy this current user, but how do we satisfy 100k users. What plugins can support that volume of users and give them a great WordPress experience?

So yes, if the user does multiple searches and narrows down their results, then they can find those better plugins, but most users are trained by Google. They will see that first page of results and think that is all there is. Most users never get off the first search results page. You can see this in the user testing: https://make.wordpress.org/meta/2016/11/08/plugin-directory-user-testing-round-1/ They don't refine their searches, they go after the first few results and then they decide.

So from my perspective the current search is not at all answering the question of "what plugin would serve me best for tracking stats on my website?". Everyone seems to agree that the new search is significantly better. Personally, I still think it is pretty bad and has lots of room for improvement, but it's a good start.

We're working on adding more stats tracking to the search pages also so we can get more hard data. I agree this is a tough conversation to have without more data about what users click on.

#96 follow-up: @gibrown
8 years ago

@sethshoultes

opening the doors wide open to favoritism and politicization of the search results

I certainly understand the concern, and that is a big part of the reason I am going into such depth on this ticket with my thinking. The search query is all out in the open for folks to see it (which was a big discussion because we are worried about people trying to game it).

pay-to-play search system

Do you mean the count of contributors' active installs in particular? I agree with your concern there. That signal should probably not be weighted super heavily, and I haven't experimented with it enough to decide if it is a good idea or not. What I am trying to address here is the long tail of niche plugins (and the niche searches they are serving). If there are a bunch of plugins that match but are only installed on a few hundred or a few thousand sites, how should we separate them out? Pointing users at plugin authors who have been active in the community for years and have other successful plugins is a potential way to improve that.

Maybe that is a bit of favoritism, but I often have heard that developers will select plugins from people in the community that they know write good plugins. I've done this myself. I'd like to find a signal that helps a new WP user get that benefit even if they don't know the ecosystem.

#97 in reply to: ↑ 96 @sethshoultes
8 years ago

Replying to gibrown:

Thanks for clarifying. I haven't been able to follow the entire discussion here (or any other related discussions), yet. Those last statements caught my eye.

Have you considered adding interactive filters for including/excluding certain terms, tags, titles, published dates, number of installs, etc?

Last edited 8 years ago by sethshoultes (previous) (diff)

#98 @gibrown
8 years ago

Have you considered adding interactive filters for including/excluding certain terms, tags, titles, published dates, number of installs, etc?

It's been brought up a lot. I think most users would have no idea how to use any of those filters and it's more important to work on the core query relevancy. When I've looked at stats for those types of features they are used very very rarely.

If we're making UX improvements to the search my top list would be:

  • search as you type
  • error correction and auto correction
  • suggested searches as you type

#99 follow-up: @dd32
8 years ago

@aleksanderkuczek is at WCUS contributor day looking at a few search items.

I spotted this case; User Switching is showing up on https://wordpress.org/plugins-wp/search/caching/page/2/ but I can't see any field which should be matching the caching term.

#100 in reply to: ↑ 99 @aleksanderkuczek
8 years ago

Here's the list of plugins that show on certain search results (on first or second page) that I think are not the right results:

Search term: "gallery"
strange plugin: https://wordpress.org/plugins-wp/slideshow-jquery-image-gallery/
reason: Irrelevant to search term. Plugin is not a gallery. Term "gallery" is only in the slug and once in the content. Additionally 1000+ reviews and very high ranking.

Search term: "slider"
strange plugin: https://wordpress.org/plugins-wp/gallery-video/
reason: Irrelevant to search term. It's a gallery, not a slider plugin. May be caused by 15+ occurrences of term "slider" in content

Search term: "caching"
strange plugin: https://wordpress.org/plugins-wp/wordfence/
strange plugin: https://wordpress.org/plugins-wp/mailchimp-for-wp/
strange plugin: https://wordpress.org/plugins-wp/loco-translate/
strange plugin: https://wordpress.org/plugins-wp/advanced-access-manager/
strange plugin: https://wordpress.org/plugins-wp/child-theme-configurator/
strange plugin: https://wordpress.org/plugins-wp/ad-inserter/
reason: Irrelevant to search term

Search term: "form"
strange plugin: https://wordpress.org/plugins-wp/events-manager/
reason: Irrelevant to search term.

Search term: "youtube"
strange plugin: https://wordpress.org/plugins-wp/easy-fancybox/
strange plugin: https://wordpress.org/plugins-wp/slider-image/
strange plugin: https://wordpress.org/plugins-wp/jetpack/
reason: Irrelevant to search term

Search term: "sitemap"
strange plugin: https://wordpress.org/plugins-wp/ecwid-shopping-cart/
strange plugin: https://wordpress.org/plugins-wp/polylang/
strange plugin: https://wordpress.org/plugins-wp/email-subscribers/
reason: Irrelevant to search term

Replying to dd32:

@aleksanderkuczek is at WCUS contributor day looking at a few search items.

I spotted this case; User Switching is showing up on https://wordpress.org/plugins-wp/search/caching/page/2/ but I can't see any field which should be matching the caching term.

#101 @sareiodata
8 years ago

As far as I can tell the new categories (not tags) are taken into account, however, not all plugins have them added?

$es_wp_query_args['query_fields'] = array( 'title_en^4', 'content_en', 'excerpt_en', 'author', 'tag', 'category', 'slug_ngram^0.25', 'slug', 'contributors' );

Right now I can't add them here https://wordpress.org/plugins-wp/profile-builder/admin/ for example.

Perhaps that's one of the reasons behind the weird search results for some queries.
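
For readers unfamiliar with the notation in the `query_fields` line above: in Elasticsearch field lists, a `^N` suffix multiplies that field's contribution to the score by N, and a field with no suffix has a boost of 1. The small Python sketch below just makes those weights easier to read. The `parse_boosts` helper is hypothetical and is not part of the plugin directory code.

```python
def parse_boosts(query_fields):
    """Map each 'field^boost' entry to a (field -> boost) dictionary."""
    boosts = {}
    for spec in query_fields:
        name, separator, boost = spec.partition("^")
        # Fields listed without a '^' suffix keep the default boost of 1.
        boosts[name] = float(boost) if separator else 1.0
    return boosts


query_fields = [
    "title_en^4", "content_en", "excerpt_en", "author", "tag",
    "category", "slug_ngram^0.25", "slug", "contributors",
]
boosts = parse_boosts(query_fields)
```

Read this way, `category` and `tag` carry the same boost as the body content, and the title boost is 16 times the boost on partial slug matches. Actual scores also depend on term statistics and the other signals in the query, so the boosts alone do not determine the ranking.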

#102 follow-up: @lukecavanagh
8 years ago

@aleksanderkuczek

https://wordpress.org/plugins-wp/wordfence/

Shows up on caching search term since it did have Falcon Engine in core.

#103 in reply to: ↑ 95 @joyously
8 years ago

Replying to gibrown:

The fundamental way I approach search is that when a user takes any time to put something into a search box, we should be trying to infer and answer their question as best we can. We're providing answers, and the first answer we give matters a lot.
...
So from my perspective the current search is not at all answering the question of "what plugin would serve me best for tracking stats on my website?".

I think the big assumption that we make is the turning of a one word query into a complete sentence. The search is assuming the sentence that you wrote when that is not necessarily the sentence that the user intended. Google worked for years to get their algorithm perfected. I doubt this search will equal that.

I would rather err on the side of more text matches than on the side of any other weighting factors, because all the user has is text to enter. That is, if you don't give them the interactive controls to weight the factors their way. Would it be so hard to show those controls, set to the defaults? Then at least the user could go after what they really want, if they choose to. If not, why doesn't the Browse page work to scroll through everything?

#104 in reply to: ↑ 102 @aleksanderkuczek
8 years ago

Okay, so that may be a bad example then.

Replying to lukecavanagh:

@aleksanderkuczek

https://wordpress.org/plugins-wp/wordfence/

Shows up on caching search term since it did have Falcon Engine in core.

#105 @joyously
8 years ago

I just tried the search "responsive slider" on both new and old. The new search has the first 6 results that are not good matches. The old search has all good matches.

#106 @awcode
8 years ago

The new plugin directory search seems to be matching any words that share a few letters, regardless of relevance.

eg. https://wordpress.org/plugins-wp/search/transport/
Only 1 of first 12 is relevant, 5 are to do with translate which shares first 5 letters but completely different topic.

eg. https://wordpress.org/plugins-wp/search/contextual/
Some are relevant, but there are lots of completely irrelevant plugins with the word content or contact in them, again matching the first 4 letters rather than the true meaning of the search word.

Old directory was much more effective on these and all other searches I have tried so far.
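
One possible explanation for matches like these is partial (n-gram) matching, where words are compared by their short character fragments. The Python sketch below only illustrates that general effect. It assumes plain character trigrams, which may not be how the directory's index is actually configured, and the helper names are made up.

```python
def char_ngrams(word: str, n: int = 3) -> set:
    """All contiguous character sequences of length n in the word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def shared_ngrams(a: str, b: str, n: int = 3) -> set:
    """Fragments that two words have in common, ignoring case."""
    return char_ngrams(a.lower(), n) & char_ngrams(b.lower(), n)


transport_vs_translate = shared_ngrams("transport", "translate")
contextual_vs_content = shared_ngrams("contextual", "content")
```

Both pairs share three trigrams from the start of the word ("tra", "ran", "ans" and "con", "ont", "nte"), so a query that gives any weight to fragment overlap can pull in words with unrelated meanings.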

#107 @Ipstenu
8 years ago

#2318 was marked as a duplicate.

#108 @Ipstenu
8 years ago

#2324 was marked as a duplicate.

#109 @tellyworth
8 years ago

In 4517:

Plugin directory search: rebalance search weightings a bit.

In particular, drastically lower the effect of partial slug matches, and limit the strength of other signals. This is an incremental improvement that still needs further work.

See #1692.

#110 @tellyworth
8 years ago

In 4519:

Plugin directory search: title needs a little more weight for some exact matches.

See #1692

This ticket was mentioned in Slack in #meta by gibrown. View the logs.


8 years ago

#112 @jdailey
8 years ago

Searching "Image Compression" and "Image Optimization" seem to pull up a few less relevant plugins and drop plugins like Smush and EWWW 2 to 3 pages back. Example: Speed Booster Pack seems a bit less relevant based on Issues resolved, active installs, Tested with, and ratings but is ranking much higher than Smush and EWWW. Optima Express + MarketBoost IDX Plugin, Ultimate WordPress Auction Plugin...Great plugins just don't seem to fit the criteria.

Full disclosure: I am involved in Smush plugin support and copy but I saw the report that some search results are not responding the way they should so I thought I would share my experience. :) Thanks for all your work on this!

#113 follow-up: @dd32
8 years ago

IMHO the tested version is still being relied upon way too much.

Take for example a search for Debug Bar Console. I'd expect that exact plugin to show up; unfortunately, though, it has a tested-up-to of 3.4.2, which results in it being pushed to result 55 of 56 matching plugins.
This actually makes me want to bring back the compatibility widget.. or add per-plugin wp-usage stats so that search can say "Oh this must work, lots of people use it on recent versions" - both of those seem like hacks around search though.

#114 follow-up: @Ipstenu
8 years ago

I disagree.

Tested up to is one of the few ways we have to show end users, who don't understand the ecosystem, that developers are paying attention and making sure their code works on new versions of WP. If 'punishing' them with lousy search results forces them to test and update, I think that benefits the users.

And of course if they bump and lie and didn't test and it breaks, they'll get a lot of 1 star reviews.

That said, I would think if tested up to is .1 above or below the current version, it should be okay. Like if I tested on 4.8 beta and tagged my plugin, that should be okay while 4.7 is current.
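The tolerance suggested above could be sketched roughly as follows. This is an illustration only, not how the directory compares versions: `minor_index` and `is_current_enough` are hypothetical helpers, and the sketch assumes plain numeric version strings where the minor number runs from 0 to 9 (so 4.9 is followed by 5.0).

```python
def minor_index(version):
    """Map '4.7' or '4.7.2' to a single minor-release counter, e.g. 4.7 -> 47."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major * 10 + minor


def is_current_enough(tested_up_to, current, tolerance=1):
    """True if tested-up-to is within `tolerance` minor releases of current."""
    return abs(minor_index(tested_up_to) - minor_index(current)) <= tolerance


print(is_current_enough("4.8", "4.7"))    # True: one release ahead
print(is_current_enough("4.6.1", "4.7"))  # True: one release behind
print(is_current_enough("3.4.2", "4.7"))  # False: far behind
```

Pre-release strings such as "4.8-beta" would need extra parsing that this sketch leaves out.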

#115 in reply to: ↑ 114 @rglennnall
8 years ago

Replying to Ipstenu:

I disagree.

Tested up to is one of the few ways we have to show end users, who don't understand the ecosystem, that developers are paying attention and making sure their code works on new versions of WP. If 'punishing' them with lousy search results forces them to test and update, I think that benefits the users.

And of course if they bump and lie and didn't test and it breaks, they'll get a lot of 1 star reviews.

That said, I would think if tested up to is .1 above or below the current version, it should be okay. Like if I tested on 4.8 beta and tagged my plugin, that should be okay while 4.7 is current.

MAN, no DOUBT. There's little more infuriating than coming across a plugin that hasn't been tested ---> IN NINE YEARS! or SIX years. came across these last couple of days.

Why would they possibly still be in there, Matt???

I'm happy to try one that goes back to say, v. 3.* - some still work because of their simplicity; some of my favorite plugins are old ones. BUT it's nice to know what we're dealing with beforehand. Tested up to, or something equivalent, is pretty damn necessary in my opinion.

as far as the thousands of old, obsolete, useless, space-taking, hair-pulling plugins - this should be a consideration of we "professional plugin searchers," as well.

Good GOD, man! :)

#116 in reply to: ↑ 113 @rglennnall
8 years ago

Replying to dd32:

per-plugin wp-usage stats...

I like this idea. I can see your point that certain factors, like tested up to..., on their own, are simply not sufficient. It clearly requires not only many factors but an intricate balance of them all.

These great programmers working on this are to be admired for both their stamina and their clear lack of any social life.

#117 follow-up: @Ipstenu
8 years ago

Your sarcasm came on a bit strong there, @rglennnall - Can you please scale it back a bit?

We're not the normal users. We know how things work (like the perfectly valid reasons not to update code). Which is why I stress we do need to downgrade. Usage doesn't actually indicate all that much either. I know a couple well used, by a LOT of people, plugins that aren't upgraded often (if at all) and can be hit or miss because they stop being compatible with other plugins and themes.

Basically, the tested-up-to is the only metric I can think of that indicates the developer is around and testing in a reasonable timeframe.

If there's a more reliable one, I'd love to hear it.

#118 in reply to: ↑ 117 @rglennnall
8 years ago

Replying to Ipstenu:

Your sarcasm came on a bit strong there, @rglennnall - Can you please scale it back a bit?

We're not the normal users. We know how things work (like the perfectly valid reasons not to update code). Which is why I stress we do need to downgrade. Usage doesn't actually indicate all that much either. I know a couple well used, by a LOT of people, plugins that aren't upgraded often (if at all) and can be hit or miss because they stop being compatible with other plugins and themes.

Basically, the tested-up-to is the only metric I can think of that indicates the developer is around and testing in a reasonable timeframe.

If there's a more reliable one, I'd love to hear it.

Warn't a sarcastic word in my reply, Mika. Meant every word of it. I was agreeing with you.

I may have been a little melodramatic in my description of NINE year old plugins, but they really are frustrating to no end to encounter when one is actually seeking to solve an important need.

as well, I complimented you hard-working guys in my reply to dd32.

I totally realize you guys have your hands full. major balancing act making this search work properly.

Ya'll have my respect. I was just agreeing with you. that's why I closed my comment with a smiley face.

sorry you misunderstood.

#119 @joyously
8 years ago

I still wonder if all these criteria are used in the actual search or if they are only used in the ranking of the search results (with the search being on the text entered, whether that is title or description or a taxonomy).
I think the two methods would yield different results, and if the ranking is done after the search, it would be easier to allow the user to manipulate it using a sort or other filter.

#120 @garthkoyle
8 years ago

From what I understand replies to support topics are used to weight the quality of search results, but what about replies to reviews?

#121 @tellyworth
8 years ago

In 4599:

Plugin directory search: try using decay instead of a hard cutoff for last-updated and tested-to values.

Also adjust weightings a bit more.

See #1692
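To illustrate the difference between a hard cutoff and a decay, here is a minimal sketch. The numbers (a 730-day limit, a 365-day scale, a decay of 0.5) are made up for the example and are not the values used in the commit; the decay formula has the same shape as Elasticsearch's `exp` decay function with no offset, though the thread does not say which decay curve was chosen.

```python
import math


def hard_cutoff(age_days, limit_days=730):
    """Full score up to the limit, nothing after it."""
    return 1.0 if age_days <= limit_days else 0.0


def exp_decay(age_days, scale_days=365, decay=0.5):
    """Score is 1.0 at age 0 and has fallen to `decay` at `scale_days`."""
    lam = math.log(decay) / scale_days
    return math.exp(lam * age_days)


for age in (0, 365, 730, 731, 1460):
    print(age, hard_cutoff(age), round(exp_decay(age), 4))
```

With a cutoff, a plugin updated 731 days ago scores the same as one abandoned for a decade; with a decay, it scores only slightly lower than one updated 730 days ago.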

#122 @progmastery
8 years ago

Please search for "gallery" https://wordpress.org/plugins-wp/search/gallery/. Why is the gallery plugin named "gallery", with 300k installs (second most popular after NextGen), shown so much lower down?

#123 @Ipstenu
8 years ago

While the display name is "Gallery" the URL is photo-gallery and the readme really isn't very good.

Keep in mind, single word searches are insanely hard to get 'right' for all interpretations of right. Not to mention that's usually not what people search for. Even when they look for 'gallery' they're looking for 'gallery slider' or something more specific. Users use multiple words, or even sentences.

#124 @gibrown
8 years ago

@progmastery thanks for pointing that out. There are some syncing problems between .org and the Elasticsearch index that we are still hunting down and this was one of them. I just fixed that case which happens to have bumped it up to the top of results, but there are still a lot of other very popular gallery plugins that are not on that page, so rankings will probably change some more.

#125 follow-up: @gibrown
8 years ago

@awcode what were you expecting as a result for "transport"?

The only result I can guess is https://wordpress.org/plugins/seo-data-transporter/. That is not ranking well because it hasn't been updated in over two years and the last version of WP it was tested on is 5 versions behind.

For "contextual" I assume you are looking for https://wordpress.org/plugins-wp/contextual-related-posts/ which is showing up. Anything else? The other results with "contextual" in the title on the old search page all get ranked lower because they are so far behind or haven't been updated in a long time.

#126 follow-up: @gibrown
8 years ago

@joyously just circling back to your search for "responsive slider": I think the changes @tellyworth has made have improved that a fair bit. There may be a few more improvements coming as well, but are there other results you think are missing in this case?

#127 in reply to: ↑ 125 ; follow-up: @awcode
8 years ago

For "Transport" a perfect example of a good set of results would be https://wordpress.org/plugins/search.php?q=transport
Doesn't matter whether my meaning is the transport industry or data transport, both come up.

Right now I get several email plugins, redis cache, permalink manager, pdf embedder, not a single thing with any relevance to the typed word on the first page. Actually got worse since I first posted.

First relevant plugin is updated recently, latest tested by but only 70+ installs. However since this would be considered a niche phrase a relevant 70+ installs is more suitable than an irrelevant plugin with millions of installs.

Contextual was just another example of partial word matches, can now see the better content matches are lagging on updates so more understandable.

Replying to gibrown:

@awcode what were you expecting as a result for "transport"?

The only result I can guess is https://wordpress.org/plugins/seo-data-transporter/. That is not ranking well because it hasn't been updated in over two years and the last version of WP it was tested on is 5 versions behind.

For "contextual" I assume you are looking for https://wordpress.org/plugins-wp/contextual-related-posts/ which is showing up. Anything else? The other results with "contextual" in the title on the old search page all get ranked lower because they are so far behind or haven't been updated in a long time.

#128 in reply to: ↑ 126 @joyously
8 years ago

Replying to gibrown:

@joyously just circling back to your search for "responsive slider": I think the changes @tellyworth has made have improved that a fair bit. There may be a few more improvements coming as well, but are there other results you think are missing in this case?

I think it has improved, however, the old search is still better because I see 30 results that are responsive image sliders on the first page. The new search shows only 14 on the first page and some of those aren't image sliders (testimonials, instagram, logos, thumbnails). Why don't I see some of those 30 instead? And page 2 should have the rest of those 30, but again it has what I would call sketchy matches.
What is used for ordering -- it doesn't make much sense.

#129 @progmastery
8 years ago

Thanks @gibrown. Do you fix sync errors case by case?

Perhaps these are other cases: "form" https://wordpress.org/plugins-wp/search/form/ or "forms" https://wordpress.org/plugins-wp/search/forms/.

A number of popular (>200k) form plugins are missing from the results, e.g. "Fast Secure Contact Form" and "Formidable Forms".

A search for "google map" https://wordpress.org/plugins-wp/search/google+map/. It is difficult to understand why "Google Map" and "WP Google Map Plugin" are lower than "WP Store Locator". "WordPress Google Maps Plugin" with 40k+ installs is missing from first and second pages of search results, also it is missing from https://wordpress.org/plugins-wp/search/google+maps/ search. Also perhaps there is an issue with weighting precise search terms in multi-word query, because it is difficult to explain why "Google Analytics Dashboard for WP" has so high position in the results. Please take a look at its readme file https://plugins.svn.wordpress.org/google-analytics-dashboard-for-wp/tags/4.9.5/readme.txt.

p.s. I could not find elastic search index code, or the one responsible for syncing between .org main and secondary databases. Is it publicly available?

#130 in reply to: ↑ 127 ; follow-up: @gibrown
8 years ago

@awcode I'm not sure I would agree that none of the new results are relevant since at least a few of them are also on the old search page. For "data transport" the new results look ok-ish. It is a very niche and ambiguous term, but that is what makes it interesting to look at. The "data transport" plugins are kinda swamping out other results.

I'm having a hard time figuring out what a good result would be though. What is your intention when using that search term? Is it public transportation? BTW, I think there is also a British vs American English aspect here as well. "transport" is not how I (as an American) would search for transportation plugins, but again that makes it interesting because it means that the language on the plugins is a bit different. "transportation" doesn't really do very well on the new search either...

https://wordpress.org/plugins/transportersio/ (the only one that is really up to date and we should be )
https://wordpress.org/plugins/transport-and-business-locator/
https://wordpress.org/plugins/transizroutes/
https://wordpress.org/plugins/transitart/

Seem like the best results, but all are getting pretty out of date. Kinda feels like an area where there aren't any good solutions.

I do agree with you that the search is matching partial words too much. Planning to experiment more with that today to see if I can come up with some better solutions. Will play with this example some more.

Thanks for the feedback.

#131 @gibrown
8 years ago

I think it has improved, however, the old search is still better because I see 30 results that are responsive image sliders on the first page. The new search shows only 14 on the first page and some of those aren't image sliders (testimonials, instagram, logos, thumbnails). Why don't I see some of those 30 instead? And page 2 should have the rest of those 30, but again it has what I would call sketchy matches.

@joyously all of the results on the new search are sliders that are responsive. Some of the lower down ones happen to also be for different use cases (logos, testimonials, instagram). The top 6 results are all very recently updated and collectively are installed on over 180k sites. I actually think the results are quite good. It points a user that is looking for a responsive slider towards plugins that are already successfully supporting many users and getting good reviews. The old search points the user at results that are quite obviously less well supported (haven't been updated in years, not tested on recent versions, bad reviews).

I don't agree that 30 results is better than 14. Based on lots of data most users don't look beyond the first few results. Here's a quick link I found that also mentions some other studies: https://moz.com/blog/google-organic-click-through-rates-in-2014

Less than 5% of users go to the second page of results. Personally I'm trying to focus a lot on the top 4 results when evaluating these results.

The only case for "responsive slider" that isn't showing up that I don't understand is this plugin: https://wordpress.org/plugins-wp/slider-image/

It has 100k installs and lots of updates. Going to look into this case a bit more.

Thanks for the continued testing. It's really helpful to me to look at individual cases and examine them.

#132 @joyously
8 years ago

I don't agree that 30 results is better than 14. Based on lots of data most users don't look beyond the first few results.

My point wasn't that 30 results is better than 14, although I happen to prefer more choices with fewer page loads. My point was that there were 30 that easily matched my search, whereas out of the 14 there were only some that matched. It seems that those 30 are the better matches, rather than the less than 14.

I tried an experiment. I entered 4 random words to see what I got. Old search for cast rotten easter block returned zero results, which makes sense. New search returned 1 result (MasterBlogster Scroll Top and Bottom). I changed it to rotten easter block and old search gave zero results while new search returned 2 pages of results. Is that supposed to be an improvement?

#133 @dd32
8 years ago

I tried an experiment. I entered 4 random words to see what I got.

Searching random words and being surprised that results show up isn't a good way to test search.
Searching for junk is going to give weird results, almost as weird as the input you give it, but it'll do its best to find something that you actually meant (which it assumes wasn't the junk it got).

For example, compare https://wordpress.org/plugins/search.php?q=taxomomy to https://wordpress.org/plugins-wp/search/taxomomy. The first contains items which only contain that term, the newer search shows the items that a user would expect to see.

At the end of the day, search is a lot of weighing one up against another, a bad search term returning worse results than a good search term is OK, bettering the results for a bad search at the detriment of good searches isn't; weighing those up against one another is the battle that's being fought.

#134 @tellyworth
8 years ago

In 4627:

Plugin directory search: adjust the active_installs weighting curve (props @gibrown).

See #1692

#135 @joyously
8 years ago

Searching random words and being surprised that results show up isn't a good way to test search.
Searching for junk is going to give weird results, almost as weird as the input you give it, but it'll do its best to find something that you actually meant (which it assumes wasn't the junk it got).

Really, it's a good way to test what the search is matching. I think the correct result is what the old search gave (no results), not the junk that the new search assumes matches due to stemming and whatever else it's doing. A valid result is zero matches, so why didn't the new search show me that? I think there is too much weighting going on in the new search.

#136 in reply to: ↑ 130 @awcode
8 years ago

Maybe a few are relevant but bulk aren't.

Don't see any relevance at all in # 2 result "The ultimate toolkit for theme developers using the WordPress Customizer"
There seem to be a few mentions of the word in the changelog and that's it: nothing in the title, description or tags that is even remotely connected that I can see.

Backup/migration/email plugins I can kind of see the connection, although the only one that someone may actually be intending with a phrase like this would be data migration, eg "transport my data". Interestingly modifying to this phrase actually removes/demotes these relevant ones.

Redis object cache - completely irrelevant except for one mention of "transport layer security" and lots of installs.

Is it possible to group results before ordering?
eg. results with an 90-100% relevancy are at the top and then popularity sorting applies to that batch.
Then 70-90% relevancy below sorted internally by popularity.
Then these anomalies would be in the 40-70% relevancy section and buried on page 4 rather than in top 5 just because they have lots more installs than the more relevant matches.
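The grouping idea above can be sketched as a two-level sort. This is only an illustration of the proposal, not something the directory is known to do: relevancy is assumed to be a fraction between 0 and 1, the band edges come from the percentages in this comment, and the plugin names and numbers are invented.

```python
def band(relevancy):
    """Bucket a 0..1 relevancy score; a lower band number ranks first."""
    if relevancy >= 0.9:
        return 0
    if relevancy >= 0.7:
        return 1
    if relevancy >= 0.4:
        return 2
    return 3


def banded_sort(results):
    """Order by relevancy band first, then by active installs within a band."""
    return sorted(results, key=lambda r: (band(r["relevancy"]), -r["installs"]))


# Hypothetical results for a niche query.
results = [
    {"name": "popular-cache", "relevancy": 0.45, "installs": 1_000_000},
    {"name": "niche-transport", "relevancy": 0.95, "installs": 70},
    {"name": "data-migrator", "relevancy": 0.75, "installs": 500_000},
    {"name": "tiny-transport", "relevancy": 0.92, "installs": 10},
]

for r in banded_sort(results):
    print(r["name"])
```

Popularity then only breaks ties among results of similar relevancy, so the million-install plugin with a passing mention lands below the niche exact matches.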

On a more specific phrase try https://wordpress.org/plugins-wp/search/transport+booking/
# 2 - Responsive coming soon!
# 3 - Responsive pricing table!

# 1 is semi related to this phrase, only other one is Transporters.io right at bottom of this page.
Disclaimer: I am associated with this plugin and can confirm that people are actively searching for transportation related plugins using searches like this on a daily basis. Not yet had anyone mistakenly install it when they were really looking for an email/backup/redis cache tool instead.

Replying to gibrown:

@awcode I'm not sure I would agree that none of the new results are relevant since at least a few of them are also on the old search page. For "data transport" the new results look ok-ish. It is a very niche and ambiguous term, but that is what makes it interesting to look at. The "data transport" plugins are kinda swamping out other results.

I'm having a hard time figuring out what a good result would be though. What is your intention when using that search term? Is it public transportation? BTW, I think there is also a British vs American English aspect here as well. "transport" is not how I (as an American) would search for transportation plugins, but again that makes it interesting because it means that the language on the plugins is a bit different. "transportation" doesn't really do very well on the new search either...

https://wordpress.org/plugins/transportersio/ (the only one that is really up to date and we should be )
https://wordpress.org/plugins/transport-and-business-locator/
https://wordpress.org/plugins/transizroutes/
https://wordpress.org/plugins/transitart/

Seem like the best results, but all are getting pretty out of date. Kinda feels like an area where there aren't any good solutions.

I do agree with you that the search is matching partial words too much. Planning to experiment more with that today to see if I can come up with some better solutions. Will play with this example some more.

Thanks for the feedback.

#137 follow-ups: @joeguilmette
8 years ago

Disclosure: I work on WP All Import

Interestingly, WP All Import is nowhere to be found on page 1 results when searching for 'import'. The first result is a 1 star'd plugin with 400 actives, and the second is a third party add-on for WP All Import. There are plugins without any active installs that haven't been updated in two years on page 1.

Anyway, I don't want this to come off as me complaining that our plugin isn't doing better in search results. But since I work on it I pay more attention to its search results than other plugins. Hoping that this is just another data point you fine folks can use to help improve search results for users.

#138 in reply to: ↑ 137 @sethshoultes
8 years ago

Replying to joeguilmette:

Disclosure: I work on WP All Import

Interestingly, WP All Import is nowhere to be found on page 1 results when searching for 'import'. The first result is a 1 star'd plugin with 400 actives, and the second is a third party add-on for WP All Import. There are plugins without any active installs that haven't been updated in two years on page 1.

Anyway, I don't want this to come off as me complaining that our plugin isn't doing better in search results. But since I work on it I pay more attention to its search results than other plugins. Hoping that this is just another data point you fine folks can use to help improve search results for users.

I work for Event Espresso, an event registration plugin. I've seen similar results with the search term "event registration". Here are some examples:

Old search: https://wordpress.org/plugins/search.php?type=term&q=event+registration
New search: https://wordpress.org/plugins-wp/search/event+registration/

As you can see in the old search, it shows plugins with the correct terms, in both the title and description of the plugin. However, in the new search, plugins with names like event planning and event booking are ranking higher than plugins with the term "event registration" in the name. Also, four of the results are for contact forms.

Now if you search for event calendar:
Old search: https://wordpress.org/plugins/search.php?type=term&q=event+calendar
New search: https://wordpress.org/plugins-wp/search/event+calendar/

As you can see in the old search the results are highly relevant to the search, while in the new search you have a mixed bag, where four of the plugins have no relevance in the short description or the title. Also, if you click to page two of the "event calendar" results, it gets even worse. Not even half the results have "event calendar" in the name or the title: https://wordpress.org/plugins-wp/search/events+calendar/page/2/

Here are some other things that need improvement in the new search:

  • Show the number of total search results and the number of results listed on the page.
  • Please, please, show more than 14 results. I agree with the others that 30 results are better than 14. The current results feel very limited, especially with so many unexpected results showing.

On a side note, after watching this thread for a while. I think this recent blog post by Seth Godin is relevant to the overall discussion: http://sethgodin.typepad.com/seths_blog/2017/01/but-where-did-the-algorithm-come-from.html

Not saying you are hiding things or blaming an algorithm, just wanted to put that article at the top of your mind.

Edit (to add more searches):

Front-end Registration:
Old: https://wordpress.org/plugins/search.php?type=term&q=front-end+registration
New: https://wordpress.org/plugins-wp/search/front-end+registration/

Registration:
Old: https://wordpress.org/plugins/search.php?type=term&q=registration
New: https://wordpress.org/plugins-wp/search/registration/

Donation:
Old: https://wordpress.org/plugins/search.php?type=term&q=donation
New: https://wordpress.org/plugins-wp/search/donation/

#139 follow-up: @megamenu
8 years ago

My 2p: the new search results seem a lot more relevant to me.

The old search is very easy to game, as a lot (too much?) weighting is put on the title. This has led to people renaming their plugins to single keywords in order to easily boost their ranking, "slider" is a great example:

Old Search: https://wordpress.org/plugins/search.php?type=term&q=slider
New Search: https://wordpress.org/plugins-wp/search/slider/

On the old search, you will find the most relevant and expected results grouped onto page 3, with fairly irrelevant results on page 1 and 2 (unless you are looking for relatively unpopular plugins with exact match titles). On the new search, you will find the most relevant results on page 1. With the old search, I suspect I could release a plugin called "Slider Slider" and get to position one.

It's a similar story for other popular keywords, e.g. Lightbox, gallery, contact form.

I've been following this thread since the early days, and I must say I don't envy those working on this ticket! It seems there are a lot of conflicting opinions, some backed up with evidence from the old (also imperfect) search. I appreciate the work everyone is putting into it.

#140 in reply to: ↑ 137 @joeguilmette
8 years ago

Replying to joeguilmette:

Interestingly, WP All Import is nowhere to be found on page 1 results when searching for 'import'. The first result is a 1 star'd plugin with 400 actives, and the second is a third party add-on for WP All Import. There are plugins without any active installs that haven't been updated in two years on page 1.

Oops! I made a mistake, I was using the old search 😂

Keyword: import
New search - Only half of the page 1 results are importers: https://wordpress.org/plugins-wp/search/import/
Old search - The results are more relevant, but the plugin ranking is off because there are many popular, well-reviewed plugins not shown and page 1 has many plugins with very few installs, 1 year+ without updates, etc: https://wordpress.org/plugins/search.php?q=import

Keyword: export
New search - Same, less than half of the page 1 results are exporters: https://wordpress.org/plugins-wp/search/export/
Old search - Relevant results, but poor ranking: https://wordpress.org/plugins/search.php?q=export

#141 in reply to: ↑ 139 ; follow-up: @gibrown
8 years ago

Thanks all. Some very helpful examples. I'm not going to respond to everyone because I think a lot of these are related to the same problem of over-matching partial words, or very common terms getting somewhat swamped out by very popular plugins that also use that same term ("import" and "export" are very good examples of this).

I think the new search is a lot better than the old, but I am also still disappointed in the current results and think we can do much better. Going to spend time working on it rather than responding to individual cases for the moment.

I've been following this thread since the early days, and I must say I don't envy those working on this ticket! It seems there are a lot of conflicting opinions, some backed up with evidence from the old (also imperfect) search. I appreciate the work everyone is putting into it.

Thanks for the kind words @megamenu. Honestly, this is pretty fun to work on. It's a hard problem and having engaged users with opinions is a great way to learn.

#142 @gibrown
8 years ago

@progmastery we've been tracking down various sync issues off and on when they come up. Often cases where Jetpack sync is not hooked into updates to the plugin directory properly, but there have also been some problems when indexing the synced data into Elasticsearch (right now I'm about to go fix the 900 we are missing).

Jetpack sync code is here: https://github.com/Automattic/jetpack/tree/master/sync
All of the ES indexing is done on the WP.com side. The exact index code (with the custom fields) is not currently public, though it is built on top of https://github.com/Automattic/wpes-lib

Here are the mappings for the index though: https://gist.github.com/gibrown/1be2434f14e9abf17e5a2e6ec6de93e4

Happy to chat about that more if you want to get into it. Feel free to contact me on .org Slack also.

#143 in reply to: ↑ 141 @sethshoultes
8 years ago

Replying to gibrown:

Thanks all. Some very helpful examples. I'm not going to respond to everyone because I think a lot of these are related to the same problem of over-matching partial words, or very common terms getting somewhat swamped out by very popular plugins that also use that same term ("import" and "export" are very good examples of this).

I think the new search is a lot better than the old, but I am also still disappointed in the current results and think we can do much better. Going to spend time working on it rather than responding to individual cases for the moment.

I've been following this thread since the early days, and I must say I don't envy those working on this ticket! It seems there are a lot of conflicting opinions, some backed up with evidence from the old (also imperfect) search. I appreciate the work everyone is putting into it.

Thanks for the kind words @megamenu. Honestly, this is pretty fun to work on. It's a hard problem and having engaged users with opinions is a great way to learn.

Keep up the great work. We know you have worked hard on this. We are all very optimistic about the results.

#144 @tellyworth
8 years ago

In 4654:

Plugin directory search: correct weightings from @gibrown for r4627.

See #1692

@gibrown
8 years ago

Improve precision of search relevancy

#145 @gibrown
8 years ago

@tellyworth that patch fixes a lot of issues with imprecise search results. Pretty much every search discussed on this thread has been tested and improved. I've also tested and tried to evaluate it against over 3000 queries. Reasonably happy with the results.

It does have some downsides. Increasing the precision of the search causes us to do worse when trying to match queries that have some error in them. I estimate that with this patch about 35-40% of all queries will get fewer than 5 results, and 18% will get zero results. I think before this patch, those rates were half that. (Sidenote: the old search was probably just as bad.) I think the best way to deal with these cases, though, is to add some auto-correction, and that is probably best broken out into another ticket.
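For anyone wanting to measure the same trade-off on their own query set, the two rates mentioned above can be computed from a list of per-query result counts. This is a minimal sketch with invented counts; `coverage` is a hypothetical helper and not part of the patch.

```python
def coverage(result_counts, few=5):
    """Fraction of queries with zero results and with fewer than `few` results."""
    total = len(result_counts)
    if total == 0:
        raise ValueError("need at least one query")
    zero = sum(1 for n in result_counts if n == 0)
    under = sum(1 for n in result_counts if n < few)
    return {"zero": zero / total, "under_few": under / total}


# Hypothetical result counts for ten test queries.
counts = [0, 0, 3, 4, 12, 30, 7, 1, 50, 9]
print(coverage(counts))  # {'zero': 0.2, 'under_few': 0.5}
```

Tracking both numbers before and after a weighting change shows how much recall is being traded for precision.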

#146 @tellyworth
8 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 4752:

Plugin directory search: improve search precision. Props @gibrown.

See https://meta.trac.wordpress.org/ticket/1692#comment:145 for detailed notes. This fixes most previously reported search issues.

Fixes #1692

#147 @lukecavanagh
8 years ago

@gibrown

How much weight does the search give to a plugin not resolving support posts, or not providing support on the repo, given that the plugin's support might be on a 3rd party site?

#150 @rrv4813
7 years ago

Hi,

I don't know if this is the right place to ask a question about https://meta.trac.wordpress.org/browser/sites/trunk/wordpress.org/public_html/wp-content/plugins/plugin-directory/libs/site-search/jetpack-search.php#L1000

I will post my query here; can you please guide me to get it resolved?

While calculating function_score, we have a function 'field_value_factor' for 'support_threads_resolved' at line 1000.
My question is whether 'support_threads_resolved' is the number of resolved support threads or the percentage of resolved support threads: the name suggests the former, but the 'missing' => 0.5 term at line 1003 suggests the latter.

Any help will be highly appreciated.

Thanks,
Rohit

#151 follow-up: @bfintal
7 years ago

@rrv4813 From what I know, it's the number of support threads marked as resolved.
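For context on what `missing` does, here is a rough Python model of how an Elasticsearch `field_value_factor` function turns a field into a score: the value is multiplied by `factor`, passed through `modifier`, and `missing` is substituted when the document has no value for the field. The factor and modifier used in jetpack-search.php are not reproduced here; the defaults below are illustrative, and the sketch does not settle what the field holds.

```python
import math

# A few of the modifiers Elasticsearch offers; log1p is log10(1 + x) there.
MODIFIERS = {
    "none": lambda x: x,
    "log1p": lambda x: math.log10(1 + x),
    "sqrt": math.sqrt,
}


def field_value_factor(doc, field, factor=1.0, modifier="none", missing=None):
    """Model of field_value_factor: modifier(factor * value)."""
    value = doc.get(field)
    if value is None:
        if missing is None:
            raise KeyError(f"{field} is missing and no default was given")
        value = missing  # stand-in value for documents without the field
    return MODIFIERS[modifier](factor * value)


print(field_value_factor({"support_threads_resolved": 3},
                         "support_threads_resolved", missing=0.5))  # 3.0
print(field_value_factor({}, "support_threads_resolved", missing=0.5))  # 0.5
```

So `missing` only says which value to assume for plugins with no data; on its own it does not tell you whether the stored values are counts or ratios.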

#152 in reply to: ↑ 151 @rrv4813
7 years ago

Replying to bfintal:

@rrv4813 From what I know, it's the number of support threads marked as resolved.

Thanks.

This ticket was mentioned in Slack in #slackhelp by nikolam.
6 years ago

This ticket was mentioned in Slack in #core by joyously.
3 years ago
