WordPress.org

Making WordPress.org

Opened 13 months ago

Last modified 3 months ago

#2114 new defect

Possible abuse on popular themes list based on active installs

Reported by: acosmin Owned by:
Milestone: Priority: low
Component: Theme Directory Keywords: close
Cc:

Description

The Theme Review Team doesn't police theme names anymore, and this can lead to abuse of the popular themes list.

If an author chooses a theme name that already exists outside the .org directory, the new theme inherits all the active installs of the existing one.

The best example is ThemeForest themes:

This theme (just an example; I'm not saying it abuses the system, since it was released a long time ago, when active installs were not part of the algorithm)
https://wordpress.org/themes/total/

gets all the active installs from this popular TF theme
https://themeforest.net/item/total-responsive-multipurpose-wordpress-theme/6339019

The solution could be to add a few more parameters into the mix and change the algorithm:

  • use the Author Name, Author URI or Theme URI to get unique results;
  • maybe add a new tag in style.css.


This isn't fair to theme authors and needs a fix.

Thanks!

Change History (19)

This ticket was mentioned in Slack in #themereview by acosmin.

13 months ago

#2 @ocean90
13 months ago

Related: #1604 for plugins

#3 @grapplerulrich
13 months ago

Thank you for creating the ticket. A problem may arise if the Author URI or Theme URI changes over time. We could warn theme authors about this in advance.

#4 follow-up: @acosmin
13 months ago

Another option, in TF's case, would be to look for this in style.css:

License: Custom license
License URI: http://themeforest.net/licenses/terms/regular

There are only 18 fully GPL-compatible themes, and they don't have many sales. The rest are on a custom license.

Last edited 13 months ago by acosmin

#5 @dd32
13 months ago

  • Priority changed from high to low

I'll refer this to #14179 on Core Trac.
The only solution here is to adjust how core theme update notifications are done; if an alteration is made there, it will flow through to the active install counts for themes (and likewise for plugins).

It's well documented that, at present, theme updates are based purely on the theme slug (folder name). Plugin updates use more data points, but at the end of the day they are based almost purely on the plugin slug (folder name) and the plugin header name.
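
To make the slug-only behavior concrete, here is a minimal Python sketch of an update check keyed purely on the folder name. It is not the actual wordpress.org API code; the `DIRECTORY` data, the version strings and the function names are hypothetical. It shows why an unrelated theme that happens to sit in a folder called `total` is indistinguishable from the directory theme: it is offered the update and, by the same logic, counted as one of its active installs.

```python
from typing import Dict, Tuple

# Hypothetical directory data: latest hosted version, keyed ONLY by slug.
DIRECTORY: Dict[str, str] = {"total": "1.0.4"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.0.4' into (1, 0, 4) so versions compare numerically.

    Simplification: assumes purely numeric dotted versions.
    """
    return tuple(int(part) for part in version.split("."))


def update_available(slug: str, installed_version: str) -> bool:
    """Slug-only check: any install whose folder name matches a directory
    slug is treated as that directory theme, whoever actually wrote it."""
    latest = DIRECTORY.get(slug)
    if latest is None:
        return False
    return parse_version(latest) > parse_version(installed_version)


# A genuine directory install and an unrelated commercial theme that happens
# to live in a folder named "total" get the same answer.
print(update_available("total", "1.0.0"))  # True (directory user)
print(update_available("total", "0.9"))    # True (unrelated theme, same folder name)
```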

The ideal solution is to include a unique header/UUID/hash in the headers to base update notifications on (as suggested in Core #10814 and others).
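
A rough Python sketch of the UUID idea: the installed theme is matched to a directory theme by a UUID read from its style.css, so two themes with the same folder name no longer collide. The `Theme UUID:` header, the registry contents and all names here are hypothetical illustrations; WordPress core defines no such header, and Core #10814 may settle on a different format.

```python
import re
import uuid
from typing import Dict, Optional

# Hypothetical registry: directory themes keyed by a per-theme UUID, not slug.
REGISTERED: Dict[str, Dict[str, str]] = {
    "3f2b6c1e-8d4a-4c1b-9a55-2f0e7d9b1a10": {"slug": "total", "version": "1.0.4"},
}

# Hypothetical "Theme UUID:" header line (not a real WordPress header).
UUID_HEADER = re.compile(r"^[ \t*]*Theme UUID:[ \t]*(\S+)", re.MULTILINE | re.IGNORECASE)


def read_theme_uuid(style_css: str) -> Optional[str]:
    """Return the normalized (lowercase, hyphenated) UUID, or None."""
    match = UUID_HEADER.search(style_css)
    if match is None:
        return None
    try:
        return str(uuid.UUID(match.group(1)))
    except ValueError:  # malformed UUID string
        return None


def lookup(style_css: str) -> Optional[Dict[str, str]]:
    """Match an installed theme to a directory theme by UUID, never by slug."""
    theme_uuid = read_theme_uuid(style_css)
    return REGISTERED.get(theme_uuid) if theme_uuid else None


hosted = "/*\nTheme Name: Total\nTheme UUID: 3F2B6C1E-8D4A-4C1B-9A55-2F0E7D9B1A10\n*/"
same_name = "/*\nTheme Name: Total\nAuthor: Someone Else\n*/"
print(lookup(hosted))     # {'slug': 'total', 'version': '1.0.4'}
print(lookup(same_name))  # None
```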

I'd argue that the TRT should probably reject theme names that are already used in the wild, primarily to prevent unexpected update notifications that would point existing themes to the new w.org-hosted theme.
If this is something the TRT is interested in, I could hook up something based on our existing stats engines to detect and report it, say if 100+ sites in the wild already use that name. However, that gets murky when you consider that many themes are live in the wild via the author's site for months before approval on w.org. It would also significantly reduce the number of usable theme names (hint: all the good ones are taken, with millions of theme names in the wild).

Marking this as low priority because, to be honest, this is something we've known about for 7+ years and never made a move on. Marking it as a duplicate of one of the core tickets is also an option.

#6 in reply to: ↑ 4 @StephenCronin
13 months ago

Replying to acosmin:

Another option, in TF's case, would be to look for this in style.css:

License: Custom license
License URI: http://themeforest.net/licenses/terms/regular

There are only 18 fully GPL-compatible themes, and they don't have many sales. The rest are on a custom license.

Hi @acosmin,

This is sort of irrelevant given Dion's response (which makes sense to me), but I just wanted to let you know that you can't rely on the tag the URL links to for working out how many 100% GPL themes there are on ThemeForest. That collection has 100 themes, and I doubt it includes them all.

Also, I just checked a few ThemeForest themes (I work for Envato) and found that the use of the license field is pretty inconsistent, with some themes not even including it in style.css, so it can't be relied on.

Last edited 13 months ago by StephenCronin

#7 @edge22
13 months ago

If that's the case, perhaps the algorithm on the Popular tab needs to be tweaked.

The way it is now, this theme is going to skyrocket to the top of the list after the first two weeks (or whatever it is).

#8 @Shaped Pixels
13 months ago

Looks like my comment about this in Slack got some attention. I've noticed these discrepancies from time to time, ranging from minimal counts to crazy numbers like 30,000 active installs on a new theme. I follow the theme lists regularly, so I'm glad to see this brought forward.

I would assume that in some cases it's accidental, while in others it could be a loophole someone discovered, which opens the possibility of abuse, especially when it comes to the Popular list. There are at least a couple of themes there, plus others I've seen over the months, with numbers that seem out of the ordinary.

As a side note, and as a theme developer (author), I would still love to see the total download count alongside the active install count. It gives the author (and users) a way of seeing the ratio of active installs to downloads. For me, it would give a better idea of how my themes perform. If a theme has a large download count but few active installs, that would tell me users are not overly happy with the theme, which creates an incentive for the author to do better.

This ticket was mentioned in Slack in #themereview by charliel.

12 months ago

#10 follow-up: @ionutn
12 months ago

Hey,

To me this looks like a directory-related bug. We should somehow calculate the number of active installs from the time the theme went live in the repo. It isn't perfect in all cases, but it's better than what we have now :)

#11 @grapplerulrich
10 months ago

  • Keywords close added

I think we can close this ticket, as the active install count is now added to the Trac tickets of new themes that have more than 500 active installs.

If we want to support the use of the same slug for multiple themes in the WordPress ecosystem, then someone should create a patch in core that gives each theme a unique ID.

This ticket was mentioned in Slack in #themereview by thinkupthemes.

5 months ago

#13 @ocean90
5 months ago

#2840 was marked as a duplicate.

#14 @MH Themes
5 months ago

Regarding the popular list issue pointed out by @thinkupthemes in ticket #2840, wouldn't it already help to adjust the time parameter in the algorithm? At the moment, new themes appear on the popular list after 2 weeks. You could, for example, change this to 3 months; that way it would be much harder to game the system, because while it's quite easy to get 2-3k active installs, it's much harder to get 10-20k.

Usually, after 3 months, most new themes with only 2-3k active installs already rank much lower because they fail to attract more users. So changing the algorithm from 2 weeks to 3 months could prevent new themes with only 2-3k active installs from appearing in the top spots on the popular list right after release. If these themes really become popular and reach 10-20k active installs or more, they will rank accordingly anyway.
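
A minimal Python sketch of the minimum-age rule being discussed, assuming (purely for illustration) that the Popular list ranks eligible themes by raw active installs; the real ranking formula is not described in this ticket. The `Theme` record, the dates and the install figures are hypothetical, and "3 months" is approximated as 90 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Theme:
    slug: str
    live_since: date       # date the theme went live in the directory
    active_installs: int


def popular_list(themes: List[Theme], today: date, min_age: timedelta) -> List[str]:
    """Rank by active installs, but only themes live for at least min_age.

    Simplification: the real Popular ranking is not described in this ticket.
    """
    eligible = [t for t in themes if today - t.live_since >= min_age]
    eligible.sort(key=lambda t: t.active_installs, reverse=True)
    return [t.slug for t in eligible]


themes = [
    Theme("brand-new", date(2017, 7, 1), 3000),    # e.g. inherited 2-3k installs
    Theme("established", date(2016, 1, 1), 2500),
]
today = date(2017, 7, 22)
print(popular_list(themes, today, timedelta(weeks=2)))  # ['brand-new', 'established']
print(popular_list(themes, today, timedelta(days=90)))  # ['established']
```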

#15 in reply to: ↑ 10 @ThemeZee
5 months ago

Keeping new themes off the popular list until after three months is a disadvantage for all new themes, not just the ones with naming conflicts or those gaming the system.

Right now, the system is a great advantage only for new themes that already have active installs before going live. It does not matter whether these come from a different theme with the same name or from an existing user base.

I would therefore measure the growth of active installs, not the total count, as @ionutn said:

  • Save the initial active install count when the theme goes live
  • From day two on, calculate active installs as the difference between the current and the initial count
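
The two steps above could look roughly like this in Python. The class and method names are hypothetical, and clamping the result at zero (so a theme that loses installs after launch doesn't get a negative score) is an assumption added here rather than part of the suggestion.

```python
from typing import Dict


class GrowthTracker:
    """Rank by installs gained since going live, not by the raw total."""

    def __init__(self) -> None:
        self._baseline: Dict[str, int] = {}

    def go_live(self, slug: str, installs_at_launch: int) -> None:
        # Step 1: save the initial active install count when the theme goes live.
        self._baseline[slug] = installs_at_launch

    def ranked_installs(self, slug: str, current_installs: int) -> int:
        # Step 2: from day two on, use current minus initial.
        # Clamped at 0 so a dip below the launch count never goes negative.
        return max(current_installs - self._baseline.get(slug, 0), 0)


tracker = GrowthTracker()
tracker.go_live("total", 30000)                  # inherited from a same-named theme
print(tracker.ranked_installs("total", 30400))   # 400
```

As the comment above notes, this also discounts a legitimate pre-existing user base, which is the trade-off of measuring growth instead of totals.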

#16 @MH Themes
5 months ago

In general, I think it's inappropriate to rank brand-new themes with a bunch of active installs (2-3k) in the top 10-20 on the popular list. These are not popular themes; they are trending, if at all. But users who don't know about these issues may think these themes are the best .org has to offer and start using them even more, which keeps these new themes high in the popular list because of a flaw in the algorithm. It's a self-fulfilling prophecy.

#17 @thinkupthemes
3 months ago

Hello,

This issue is really becoming a hot topic, and it would be great if it could be addressed. It causes controversy primarily because of its effect on the position of themes in the popular list. Is there a possibility of updating the ranking algorithm for the popular list alone? This might be a good midway position until a full fix is in place.

Additionally, I believe the theme info collected along with the theme name also contains the theme author. One way to address this issue could be to count active installs only when they come from the same author.
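
A hypothetical Python sketch of the same-author rule. It assumes each report carries a (slug, author) pair, which the comment above only believes to be the case, and that the directory stores one author per slug. Names and data are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical: author recorded for each directory theme at upload time.
DIRECTORY_AUTHORS: Dict[str, str] = {"total": "Jane Author"}


def count_active_installs(reports: Iterable[Tuple[str, str]]) -> Counter:
    """Count a (slug, author) report only if the author matches the directory's."""
    counts: Counter = Counter()
    for slug, author in reports:
        if DIRECTORY_AUTHORS.get(slug) == author:
            counts[slug] += 1
    return counts


reports = [("total", "Jane Author"), ("total", "Someone Else"), ("total", "Jane Author")]
print(count_active_installs(reports)["total"])  # 2
```

Storing a single author per slug is a simplification: it breaks when a theme legitimately changes hands, which is one of the cases the UID proposal further down handles.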

Kind regards.

This ticket was mentioned in Slack in #themereview by greenshady.

3 months ago

#19 @dingdang
3 months ago

Hello,

Since this is a different ticket about the same problems, which can be easily solved, I'm copying my proposal from the core ticket here as well.

Regarding the topic of this ticket: if/after the proposed solution is implemented, the "active installs" counts will automatically include only themes (all of their versions) published at wordpress.org, so this will correct even past cases in the directory and the "Popular themes" tab. (Ex: a theme that has hijacked another theme's active installs will lose them and will display only the number of active installs of that specific theme and its versions at wordpress.org.)

https://core.trac.wordpress.org/ticket/14179#comment:44

Here is version 2: MD5 is no longer needed, and there is an explanation in the "How it works" section that should be easy for anyone to understand.

Proposal for a solution to the “collisions” of WordPress themes.
Simplified Version 2.

Table of contents:
Changes compared to Version 1.
Introduction.
Formal composition of a unique ID.
API: determination of available theme updates.
Other: calculation of theme's active installs.
Benefits.
How it works.
Technical data.
Software changes.

Changes compared to Version 1.

  • eliminated the need of the Author URI field
  • eliminated the need to calculate MD5 hashes
  • new section “How it works”
  • new section “Software changes”

After analyzing the current set of themes “native” to wordpress.org and all of their versions (4876 themes, 56730 versions), I concluded that only two fields are needed in the process: the theme slug and the author. The author URI is redundant.
As a result, composing the UIDs is simpler and calculating MD5 hashes is unnecessary, which simplifies the changes to the system even further.

Introduction.

A collision is a slug match between two unrelated themes that have the same name.

Two main problems are related to these cases of collisions:

  1. If there is a theme in wordpress.org's theme database and another theme with the same slug created by a different author, the second one will get an “Update” option and may be replaced by the theme published at wordpress.org. This can also happen to widely distributed themes: after a new theme with the same name is uploaded to wordpress.org, an unwanted update can unexpectedly replace themes on websites published long before.
  2. The active installation count includes not just the themes from wordpress.org's database but also external themes, as well as random child themes residing in a folder with a matching name. Authors exploit this to artificially place their new themes at the top of the list by taking the names of long-distributed, popular external themes.

The proposed technique solves all of these problems with very little coding while keeping backward compatibility, and it solves the related problems for old themes as well, not just newly released ones.

Formal composition of a unique ID.

  1. Choose a separator that is currently not allowed in theme names, e.g. “|”, which is used below.
  2. For every theme since WordPress 3.0 (and maybe even earlier versions), core already reports the following two strings:
  • theme Slug (ex: nicetheme)
  • theme Author (ex: John Doe). This may be absent; if so, it is an empty string.
  3. Compose the UID: “slug|author”. Ex: “nicetheme|John Doe”

Since both fields are present in the themes (through style.css) and are reported by WordPress (even by very old versions), there is no need to add any new data to the themes (like manually added codes/hashes) or any new handling of such data to core or the API.

The invention: a one-time composition of the UIDs for the current themes and all of their versions must be performed, and the list stored in a table. For every new theme version update and new theme upload, the UID will be composed and added to the same table if it doesn't already exist.

As the UID contains the theme slug as a prefix, it is trivial to relate a given UID unambiguously to the theme slug if needed by extracting the string that precedes the first occurrence of the separator. No other relations need to be stored.
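
The UID composition and slug extraction described above are small enough to sketch directly. This is a minimal Python illustration, not code from wordpress.org; the function names are hypothetical, and rejecting a slug that contains the separator is an assumption added for safety.

```python
from typing import Optional

SEPARATOR = "|"  # assumed never to appear in a theme slug


def compose_uid(slug: str, author: Optional[str]) -> str:
    """Build the "slug|author" UID; a missing Author header becomes ""."""
    if SEPARATOR in slug:
        raise ValueError(f"slug must not contain {SEPARATOR!r}: {slug!r}")
    return f"{slug}{SEPARATOR}{author or ''}"


def slug_from_uid(uid: str) -> str:
    """Everything before the FIRST separator is the slug."""
    return uid.split(SEPARATOR, 1)[0]


print(compose_uid("nicetheme", "John Doe"))       # nicetheme|John Doe
print(compose_uid("nicetheme", None))             # nicetheme|
print(slug_from_uid("nicetheme|John | Doe Ltd"))  # nicetheme
```

Because the slug is taken from everything before the first separator, an author name that happens to contain "|" does not break slug extraction; in this sketch only the slug has to be free of the separator.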

API: determination of available theme updates.

A small update (several lines of code) is needed to identify themes not just by slug but by this new UID, checked against the table of UIDs. Only if the UID is present does the algorithm continue: it identifies the theme slug from the UID, checks as usual whether there is a newer version, and if so, sends back an “update available” reply.
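
A minimal Python sketch of that update check, assuming an in-memory set of native UIDs and a map of latest versions per slug (both hypothetical stand-ins for the real tables). Version comparison is simplified to numeric dotted versions; the real API would need something closer to PHP's `version_compare`.

```python
from typing import Dict, Optional, Set, Tuple

SEPARATOR = "|"

# Hypothetical tables: native UIDs and the latest hosted version per slug.
NATIVE_UIDS: Set[str] = {"nicetheme|John Doe", "nicetheme|Jane Roe"}
LATEST_VERSIONS: Dict[str, str] = {"nicetheme": "2.1.0"}


def _version_key(version: str) -> Tuple[int, ...]:
    # Simplification: numeric dotted versions only.
    return tuple(int(part) for part in version.split("."))


def check_update(slug: str, author: Optional[str], installed: str) -> Optional[str]:
    """Return the newer version to offer, or None."""
    uid = f"{slug}{SEPARATOR}{author or ''}"
    if uid not in NATIVE_UIDS:
        return None  # unknown UID: ignored, like an unknown slug is today
    theme_slug = uid.split(SEPARATOR, 1)[0]
    latest = LATEST_VERSIONS.get(theme_slug)
    if latest is not None and _version_key(latest) > _version_key(installed):
        return latest
    return None


print(check_update("nicetheme", "John Doe", "2.0.0"))   # 2.1.0
print(check_update("nicetheme", "Acme Corp", "1.0.0"))  # None
```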

Other: calculation of theme's active installs.

Active installations of a given theme are calculated as the sum of active installations for all the UIDs related to that theme. This will result in real numbers, and the “Popular themes” list will be sorted using the real numbers for the themes at wordpress.org, automatically excluding all the counts related to external themes (the current wrong numbers will be corrected to their true values).
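
Summing per-UID counts into per-theme totals is a simple group-by on the slug prefix. The sketch below is hypothetical Python and assumes the per-UID counts already contain only native UIDs, as the proposal requires.

```python
from collections import defaultdict
from typing import Dict

SEPARATOR = "|"


def installs_per_theme(installs_per_uid: Dict[str, int]) -> Dict[str, int]:
    """Sum the per-UID counts (native UIDs only) into one total per slug."""
    totals: Dict[str, int] = defaultdict(int)
    for uid, count in installs_per_uid.items():
        totals[uid.split(SEPARATOR, 1)[0]] += count
    return dict(totals)


counts = {"nicetheme|John Doe": 1200, "nicetheme|Jane Roe": 300, "othertheme|Acme": 50}
print(installs_per_theme(counts))  # {'nicetheme': 1500, 'othertheme': 50}
```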

Benefits.

  • it is handled automatically;
  • solves all the problems;
  • fully backward compatible (old WP versions);
  • solves the problem for the old existing themes as well;
  • solves the "Active installs" count problem: only the real active installs of the wordpress.org theme will be counted, automatically, even for old cases and exploits;
  • theme authors don't have to do anything: no changes to style.css or anywhere else on their side;
  • external authors don't have to do anything to prevent their themes from being messed up by unwanted updates; no need for a "private" tag;
  • no need for changes in the core (unless for optimization);
  • the check for updates at the backend (API) is almost the same; the search is performed in a table of UIDs instead of theme slugs;
  • since there are no changes to the theme structure and no new fields, the software updates related to the API and to active install counting are independent and can be done at different points in time;
  • backward compatibility for the old versions of WordPress and old versions of the themes without the need to change them, which is the best part of this proposal;
  • handles well the cases where a theme is acquired by another author: the theme will continue to catch updates;
  • handles well the cases of themes distributed by an author prior to uploading them to wordpress.org: all previous installations will continue to catch updates from wordpress.org.

In simple words: implementing it the proposed way will put everything in place as if it had been that way since the beginning of WordPress.

How it works.

  • There are N themes "native" to wordpress.org (those that are currently active). For each of them, UIDs are precomposed for all of their old and current versions in SVN and stored in a table. Only unique values are stored; they act like a database of fingerprints, just as a person has 10 different fingerprints that all link to the same person;
  • There are about N × 1.16 UIDs in total (because some themes have "evolved" and changed authors);
  • This means a single theme can be identified by more than one UID;
  • Any site with any of these UIDs is unambiguously linked by the API to a specific theme slug (the part that precedes the delimiter), and the API sends back the new version as usual;
  • Any external theme with the same name, however, comes with a different UID, so the API stops at the point where that UID turns out to be unknown (not present in the table of UIDs). As a result, it neither sends back update info nor counts this as an active install.

Technical data.

Some tests were performed to help with the decisions.

  1. There are:
  • 4876 total themes at wordpress.org;
  • 56730 total different versions;
  • 11.6 average versions per theme;
  • 1.16 the average ratio of different UIDs per theme (a single theme has more than one related UID if the author has been changed over the time);
  • 5600 (approximately) generated UIDs for the current themes (the new list to search in, instead of 4876), i.e. practically no difference in the CPU time needed to process search requests.
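
As a quick check, the averages in the list follow from the raw counts (the last figure rounded as in the list):

```latex
\frac{56730~\text{versions}}{4876~\text{themes}} \approx 11.6~\text{versions per theme},
\qquad
4876 \times 1.16 \approx 5656 \approx 5.6 \times 10^{3}~\text{UIDs}
```

So the table of UIDs is only about 16% larger than the list of slugs it replaces.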

Software changes.

This is my guess at where in the system software changes are needed.

The API:

  1. compose the UID based on slug, author
  2. check in the table of native UIDs
  3. if the UID is present, slug = the part that precedes the delimiter and continue as usual
  4. else, ignore that theme and continue (the same way it is ignored if the slug is not present in wordpress' database of slugs now)

The "one time job":

  1. for each active theme and all of its versions in SVN
  2. read its style.css and compose the UID based on slug, author
  3. store the UID in the table of UIDs (only if it doesn't exist yet)

On new theme/update approval:

  1. compose the UID based on slug, author
  2. store the UID in the table of UIDs (only if it doesn't exist yet)
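
The one-time job and the approval step both come down to "compose the UID, add it if new". Here is a hypothetical Python sketch; it assumes the style.css text of each version has already been fetched from SVN, and its `Author:` header parsing is deliberately simplified (WordPress's own header reading handles more cases).

```python
import re
from typing import Iterable, Optional, Set, Tuple

SEPARATOR = "|"
# Matches an "Author:" header line but not "Author URI:" (simplified parsing).
AUTHOR_HEADER = re.compile(r"^[ \t*]*Author:[ \t]*(.*?)[ \t\r]*$", re.MULTILINE)


def read_author(style_css: str) -> str:
    """Return the Author header value, or "" when there is none."""
    match = AUTHOR_HEADER.search(style_css)
    return match.group(1) if match else ""


def compose_uid(slug: str, author: Optional[str]) -> str:
    return f"{slug}{SEPARATOR}{author or ''}"


def build_uid_table(versions: Iterable[Tuple[str, str]]) -> Set[str]:
    """One-time job over (slug, style.css text) for every version in SVN."""
    table: Set[str] = set()
    for slug, style_css in versions:
        table.add(compose_uid(slug, read_author(style_css)))  # set keeps UIDs unique
    return table


def register_on_approval(table: Set[str], slug: str, style_css: str) -> bool:
    """Add the UID of a newly approved theme/update; True if it was new."""
    uid = compose_uid(slug, read_author(style_css))
    if uid in table:
        return False
    table.add(uid)
    return True
```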

The active themes counter/collector:

  1. compose the UID based on slug, author
  2. check whether it is present in the table of UIDs
  3. only if it is present, increment the counter for the slug, which is the part that precedes the delimiter
  4. also count, in a second table, the active installs for unknown UIDs (as it probably does now for unknown slugs), so the reviewer can see how many active installs a newly uploaded theme already has and investigate whether it comes from a legitimate author whose existing copies should be linked to it, or whether someone uploaded someone else's theme
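
The counter/collector steps, including the second table for unknown UIDs, could be sketched in Python as follows. The class name and in-memory counters are hypothetical stand-ins for whatever storage the stats system actually uses.

```python
from collections import Counter
from typing import Iterable, Optional

SEPARATOR = "|"


class ActiveInstallCollector:
    """Counts reports toward a slug only when their UID is native."""

    def __init__(self, native_uids: Iterable[str]) -> None:
        self.native_uids = set(native_uids)
        self.per_slug: Counter = Counter()      # counted toward directory themes
        self.unknown_uids: Counter = Counter()  # second table, for reviewers only

    def record(self, slug: str, author: Optional[str]) -> None:
        uid = f"{slug}{SEPARATOR}{author or ''}"
        if uid in self.native_uids:
            self.per_slug[uid.split(SEPARATOR, 1)[0]] += 1
        else:
            self.unknown_uids[uid] += 1


collector = ActiveInstallCollector({"nicetheme|John Doe"})
for author in ["John Doe", "John Doe", "Acme Corp"]:
    collector.record("nicetheme", author)
print(collector.per_slug["nicetheme"])                # 2
print(collector.unknown_uids["nicetheme|Acme Corp"])  # 1
```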

The code that reports "currently has ... active installations":

  1. it must report not just the >500 cases but the exact number of installations for the exact UID match (the exact combination of slug and author); this comes from the second table in step 4 of the previous section
  2. to prevent abuse via theme updates: if the author has changed (such cases are very rare) and the number of active installations of the newly composed UID is not 0 (or close to 0, bearing in mind that there may be test installations of that version), the update shouldn't be auto-approved by themetrackbot; instead, a reviewer must manually check the author change in style.css to avoid hijacking of an external theme's UID
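
The author-change safeguard in point 2 reduces to a small predicate. In this hypothetical Python sketch, `installs_of_new_uid` would come from the unknown-UID table described in step 4 of the previous section, and the tolerance of 5 test installs standing in for "close to 0" is an arbitrary placeholder, not a number from the proposal.

```python
from typing import Optional

# Hypothetical cut-off for "close to 0", allowing for a few test installs.
TEST_INSTALL_TOLERANCE = 5


def needs_manual_review(old_author: Optional[str], new_author: Optional[str],
                        installs_of_new_uid: int,
                        tolerance: int = TEST_INSTALL_TOLERANCE) -> bool:
    """Flag an update whose Author changed to a UID that is already in use."""
    if (old_author or "") == (new_author or ""):
        return False  # no author change: normal auto-approval path
    return installs_of_new_uid > tolerance


print(needs_manual_review("John Doe", "John Doe", 0))      # False
print(needs_manual_review("John Doe", "Acme Corp", 2000))  # True
print(needs_manual_review("John Doe", "Jane Roe", 3))      # False
```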

07/22/2017
by dingdang
