Making WordPress.org

Opened 3 months ago

Last modified 4 weeks ago

#7477 new defect (bug)

Introduce limits for readme field length

Reported by: dd32 Owned by:
Milestone: Priority: normal
Component: Plugin Directory Keywords: has-patch
Cc:

Description

In short: I propose limiting readme content to 1,500 words per section.

Reasoning & the data

While reviewing some search-related tickets, I found a plugin that was appearing in results where it arguably shouldn't, and on inspecting it, found that it was violating Guideline 12, "do not spam".

On reflection, and after looking at the data for all plugins, I feel we would be best placed to introduce limits on readme content length, to limit spammy behaviour and reduce its impact on search results.

I propose limiting each section to 1,500 words, similar to how we already limit the short description to 150 characters and assets to certain file sizes.

The 1,500 figure was chosen because it means zero change for (literally) 99% of plugins, and 99.8% have fewer than 3,000 words.

The final 0.2% of plugins are those I would call out as being in blatant violation of the guideline; in one case, the readme is over 260KB, amounting to more than 26,000 individual words.

This is not to say that a plugin with fewer than 1,500 words is not in violation of the guideline; this is only intended to be an absolute hard limit. A plugin could violate it with a few hundred words. Once you start to exceed 1,500, however, it becomes increasingly hard to prove that the content is for a human's benefit, and not simply to benefit search rankings.
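A minimal Python sketch of the proposed hard limit, assuming a parsed readme is available as a dict of section name to plain text and that a "word" is any whitespace-separated token (the real parser is PHP and may count differently). `MAX_WORDS_PER_SECTION` and `sections_over_limit` are hypothetical names:

```python
# Hypothetical constant mirroring the proposed 1,500-word hard limit.
MAX_WORDS_PER_SECTION = 1500


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def sections_over_limit(
    sections: dict[str, str], limit: int = MAX_WORDS_PER_SECTION
) -> dict[str, int]:
    """Return {section name: word count} for each section above the limit."""
    over = {}
    for name, body in sections.items():
        count = word_count(body)
        if count > limit:
            over[name] = count
    return over


readme = {"description": "word " * 1600, "installation": "Upload and activate."}
print(sections_over_limit(readme))  # {'description': 1600}
```

Sections at or below the limit are never touched, which is the point of choosing a figure that leaves 99% of plugins unchanged.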

To put these numbers into perspective:

  • an average book has 250-300 words per page
  • an average readable computer print-out has 400-500 words per page, possibly up to 800 if you squeeze it in.
  • 26k words is between 30 and 80 printed sheets of paper.
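The print-out figures can be checked with a little arithmetic. This sketch divides the 26,000-word readme by the quoted print-out densities and rounds up to whole sheets; single-sided printing is an assumption:

```python
import math

README_WORDS = 26_000  # the largest readme mentioned above


def sheets_needed(words: int, words_per_page: int) -> int:
    """Whole single-sided sheets needed to print `words` at a given density."""
    return math.ceil(words / words_per_page)


# Print-out densities quoted above: 400-500 words per page, up to 800 if squeezed.
for density in (400, 500, 800):
    print(f"{density} words/page: {sheets_needed(README_WORDS, density)} sheets")
```

That prints 65, 52 and 33 sheets respectively, which sits inside the 30-80 range above.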

Change History (14)

#1
3 months ago

This ticket was mentioned in PR #204 on WordPress/wordpress.org by @dd32.

  • Keywords has-patch added

#2 @dd32
3 months ago

I should also note that I personally feel even 1,500 is most likely too long. I'd argue that fewer than 1,000 words should be encouraged, as even that is getting far too long to read.

Some plugins with longer readmes could use the FAQ section to move some of the content out of the main description. FAQs are not searched, which is (IMHO) why we don't see plugins listing their kitchen sink of features there instead.

#3 follow-up: @dufresnesteven
3 months ago

Agreed. When we truncate the section should we add a link to the readme in source control?

... [Read more](link-to-svn-readme.txt).
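A rough sketch of that suggestion, assuming the section is truncated by word count and the link target is passed in; the function name `truncate_with_read_more` and the example URL are hypothetical. This simple version rejoins words with single spaces, so line breaks are lost:

```python
def truncate_with_read_more(text: str, max_words: int, readme_url: str) -> str:
    """Cut `text` to `max_words` words and append a Markdown "Read more" link.

    Words are rejoined with single spaces, so original line breaks are lost.
    """
    words = text.split()
    if len(words) <= max_words:
        return text  # within the limit: leave the section untouched
    return " ".join(words[:max_words]) + f" ... [Read more]({readme_url})"


url = "https://example.org/my-plugin/trunk/readme.txt"
print(truncate_with_read_more("one two three four", 2, url))
# one two ... [Read more](https://example.org/my-plugin/trunk/readme.txt)
```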

#4 in reply to: ↑ 3 ; follow-up: @dd32
3 months ago

Replying to dufresnesteven:

should we add a link to the readme in source control?

... [Read more](link-to-svn-readme.txt).

Having looked at the data, I'm not sure this would actually be beneficial to most, but that is a reasonable suggestion so I'm not against it.

#5 @dd32
3 months ago

In 13235:

Plugin Directory: Introduce a maximum length to the readme sections.

See #7477.

#6 @dd32
3 months ago

In 13236:

Plugin Directory: Use a custom word split functionality due to wp_trim_words() eating whitespace.

Followup to [13235].
See #7477.
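`wp_trim_words()` splits on whitespace and rejoins the words with a single space, so line breaks that carry readme structure (paragraphs, list items) are lost. A whitespace-preserving alternative finds where the Nth word ends and slices the original string. This is a Python sketch of the idea, not the PHP in [13236]; `trim_words_preserving_whitespace` is a hypothetical name:

```python
import re

# A "word" is assumed to be a maximal run of non-whitespace characters.
WORD = re.compile(r"\S+")


def trim_words_preserving_whitespace(text: str, max_words: int) -> tuple[str, bool]:
    """Keep the first `max_words` words of `text`, leaving its whitespace intact.

    Returns (trimmed_text, was_trimmed).
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    for index, match in enumerate(WORD.finditer(text), start=1):
        if index == max_words:
            if WORD.search(text, match.end()) is None:
                return text, False  # exactly max_words words: nothing to cut
            return text[: match.end()], True
    return text, False  # fewer than max_words words


section = "Intro line.\n\n* item one\n* item two"
print(repr(trim_words_preserving_whitespace(section, 5)))
# ('Intro line.\n\n* item one', True)
```

By contrast, `" ".join(section.split()[:5])` gives `Intro line. * item one`, flattening the list onto one line.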

#7 @dd32
3 months ago

In 13238:

Plugin Directory: Add warnings generated during a plugin import to an alert on the plugin page to the author.

See #6108, #7477, #6921.

#8 @dd32
3 months ago

I've had a few plugin authors DM me about this, mostly with the same kind of underlying tone.

Although I've asked them to make their comments public, none have done so yet.

Some paraphrased issues that have been raised include:

  1. 1,500 words is too short to explain the plugin in friendly terms.
  2. Since you don't allow images, we have to use lots of text to explain the functionality.
  3. Increasing to 3,000 would allow us to bring the readme down to size by removing our cross-promotional material.
  4. The changelog should be longer; we have a large team and a lot of changes in each release.

To publicly respond to these:

  1. I disagree, but I hear you. I really don't think this is the case: plugins don't need to go to extreme lengths in the description to detail how every piece of functionality works. This is perhaps a situation where the FAQ or a plugin-specific site is needed, but IMHO it's a bigger thing than what should be considered a "plugin description".
  2. Images are supported as screenshots, and videos are (IMHO unfortunately) supported inline. If this is too restrictive, I would suggest looking into adding a plugin Blueprint for a live demo (#7251) as another way to explain the plugin.
  3. I question whether this already falls afoul of the "readmes must not spam" guideline, as a description that is unrelated to the plugin could be considered unwanted / spam. That could be a stretch of the guideline though, and would likely depend on the actual wording and which plugin is being referenced.
  4. The changelog appears to have been the main impacted field. Currently there are 200 plugins with a truncated changelog, and 100 with a truncated description.

On the last note, I'm tempted to suggest that the Changelog should either be excluded or have a higher limit. Edit: This was written before running the numbers; I feel the numbers suggest that the 1,500 limit is acceptable for changelogs (see below).
When I initially ran the numbers on how many plugins were affected, I was looking at the description section, not the changelog.

Looking at the data for the changelog, some plugins display a very lengthy changelog. One such plugin has 70 pages (in print view) covering every feature and bug fix since 2014, which I'd suggest is a bit much. It would be under 1,500 words if limited to the last year's changelog entries.

As noted in the description, the choice of 1,500 means zero change for (literally) 99% of plugins, and 99.8% have fewer than 3,000 words.

For reference, comparing this to the changelog (even though 70% of plugins do not have a changelog):

  • 98.45% have 1,500 words or fewer
  • 99.35% have fewer than 3,000

Focusing only on plugins that have changelogs, to make it a realistic comparison:

  • 95% have fewer than 1,500 words
  • 98% have fewer than 3,000 words
  • 99% have fewer than 5,000 words
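For anyone wanting to reproduce this kind of breakdown, here is a sketch assuming one changelog word count per plugin, with `None` for plugins without a changelog. The ticket doesn't publish the underlying data, so the sample values are made up purely to show the calculation, and the helper names are hypothetical:

```python
from typing import Optional, Sequence


def percent_at_or_below(counts: Sequence[int], limit: int) -> float:
    """Percentage of plugins whose section has at most `limit` words."""
    if not counts:
        raise ValueError("no data")
    return 100.0 * sum(1 for c in counts if c <= limit) / len(counts)


def changelog_breakdown(
    counts: Sequence[Optional[int]], limits: Sequence[int] = (1500, 3000, 5000)
) -> dict[str, dict[int, float]]:
    """Compare all plugins (no changelog = 0 words) with only those that have one."""
    all_plugins = [c or 0 for c in counts]
    with_changelog = [c for c in counts if c is not None]
    return {
        "all plugins": {n: percent_at_or_below(all_plugins, n) for n in limits},
        "with changelog": {n: percent_at_or_below(with_changelog, n) for n in limits},
    }


sample = [None, None, 100, 2000, 4000, 6000]  # made-up word counts
print(changelog_breakdown(sample))
```

Plugins without a changelog always fall under every limit, so excluding them can only lower (or keep) the percentages; that is why the second set of figures is the more realistic comparison.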

When we truncate the section should we add a link to the readme in source control?
... [Read more](link-to-svn-readme.txt).

Coming back to Steve's suggestion above, this might be reasonable for these cases, although readme.txt, while designed for human consumption, is really not a nice document to read manually.

Last edited 3 months ago by dd32

#9 @smub
3 months ago

I can see why this proposal was implemented rather quickly without much discussion. There is no reason why anyone should have a plugin readme that contains 26k words. That is clearly spam.

Typically, when combatting spam on our websites, one critical thing we look at is avoiding false positives as much as we can, because that ensures a better user experience.

One way to improve this implementation and avoid false positives is to increase the limit to 3,000 words. That would cover the extra 0.8% of plugins whose descriptions fall between 1,500 and 3,000 words, and most of the 5% of plugins whose changelogs exceed 1,500 words. This matters because not many authors are following this Trac ticket; they only find out that their changelogs or readmes are truncated after they push an update.

This leads to extra work because, in some cases, the entire readme has to be rewritten. Furthermore, I agree that rewriting the readme to fit 1,500 words may, for certain plugins, lead to less user-friendly descriptions, or even to more videos in the description (which may not be a bad thing). But videos require users to watch them, which isn't everyone's preferred medium; for some, reading text is easier.

FAQs are a good alternative for adding details, but unfortunately they are hidden by default in an accordion layout (i.e. each requires an extra click to open). This can lead to a cluttered UI, unless people start using the FAQ accordions as sub-sections and then call one of those accordions "FAQs", which again would lead to a sub-optimal user experience.

IMHO a quick fix here would be to increase the truncation limit to 3,000 words.

#10 @dd32
2 months ago

In 13262:

Plugin Directory: Increase the maximum wordcount for descriptions from 1,500 to 2,500 words, and FAQ and Changelogs to 5,000 words.

This increases the limits placed in [13235] to reduce the impact upon some plugins, while still maintaining some level of reasonableness for end-users.

See #7477.
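The revised limits amount to a per-section lookup with a default. A minimal sketch, assuming section names are normalised to lowercase keys such as "faq" and "changelog"; the constant and helper names are hypothetical, and the real parser may organise this differently:

```python
# Limits after [13262] as described in this ticket.
DEFAULT_WORD_LIMIT = 2500
SECTION_WORD_LIMITS = {"faq": 5000, "changelog": 5000}


def word_limit_for(section: str) -> int:
    """Return the word limit for a readme section, falling back to the default."""
    return SECTION_WORD_LIMITS.get(section.strip().lower(), DEFAULT_WORD_LIMIT)


def exceeds_limit(section: str, text: str) -> bool:
    """True if `text` has more whitespace-separated words than its section allows."""
    return len(text.split()) > word_limit_for(section)


print(word_limit_for("Description"), word_limit_for("FAQ"))  # 2500 5000
```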

#11 @dd32
2 months ago

In the above commit I've increased the general word-count limit to 2,500 (from 1,500) and the FAQ and Changelog limits to 5,000 (from 1,500).
Any plugin currently showing as truncated will automatically be re-parsed in the next few hours.

These numbers are still subject to change, and we may need to consider alternative UI adjustments to make some plugin pages more user-friendly.

For example, the FAQ is currently buried at the end of a very long page for some plugins (particularly those with a long list of included Blocks), and the screenshots are very easy to miss as well.

#12 @dd32
2 months ago

In 13266:

Plugin Directory: Readme Validator: Show the correct limit for the FAQ/Changelog section.

See #7477.

#13 in reply to: ↑ 4 @webdevmattcrom
2 months ago

Replying to dd32:

Replying to dufresnesteven:

should we add a link to the readme in source control?

... [Read more](link-to-svn-readme.txt).

Having looked at the data, I'm not sure this would actually be beneficial to most, but that is a reasonable suggestion so I'm not against it.

The Plugin Handbook suggests using changelog.txt for longer changelogs: https://developer.wordpress.org/plugins/wordpress-org/how-your-readme-txt-works/#file-size.

Automatically linking to it (if present in the plugin) for all changelogs would be beneficial.
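A sketch of that idea, assuming the parser can see the list of files at the root of the plugin and knows their base URL (both assumptions; `find_changelog_file`, `changelog_footer` and the example URL are hypothetical). Matching is case-insensitive, but the link keeps the file's real name because SVN paths are case-sensitive:

```python
from typing import Iterable, Optional


def find_changelog_file(plugin_files: Iterable[str]) -> Optional[str]:
    """Return the actual name of a root-level changelog.txt, matched case-insensitively."""
    for name in plugin_files:
        if name.lower() == "changelog.txt":
            return name
    return None


def changelog_footer(was_truncated: bool, plugin_files: Iterable[str], base_url: str) -> str:
    """Markdown link to the full changelog, only when the section was truncated."""
    if not was_truncated:
        return ""
    name = find_changelog_file(plugin_files)
    if name is None:
        return ""  # no changelog.txt shipped: nothing to link to
    return f"[Full changelog]({base_url.rstrip('/')}/{name})"


files = ["readme.txt", "my-plugin.php", "CHANGELOG.txt"]
print(changelog_footer(True, files, "https://example.org/my-plugin/trunk/"))
# [Full changelog](https://example.org/my-plugin/trunk/CHANGELOG.txt)
```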

#14 @dd32
4 weeks ago

In 13525:

Plugin Directory: Readme Parser: When trimming sections, if invalid UTF8 data is encountered, trim it in a non-utf8 safe manner.

This avoids warnings and unexpected outputs from the function.

See #7477, [13236].
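The commit describes a fallback rather than a failure. A Python analogue, assuming the section arrives as raw bytes: strict UTF-8 decoding acts as the validity check, and invalid input is trimmed as bytes instead. This mirrors the idea only; `trim_section_bytes` is a hypothetical name, and the PHP in [13525] may detect invalid data differently:

```python
import re

WORD_STR = re.compile(r"\S+")     # on str: Unicode-aware whitespace
WORD_BYTES = re.compile(rb"\S+")  # on bytes: ASCII whitespace only


def _cut_after_word(data, pattern, max_words):
    """Return `data` up to the end of its `max_words`-th word, whitespace intact."""
    for index, match in enumerate(pattern.finditer(data), start=1):
        if index == max_words:
            return data[: match.end()]
    return data  # fewer words than the limit


def trim_section_bytes(raw: bytes, max_words: int) -> bytes:
    """Trim UTF-8 text safely; fall back to a byte-level trim on invalid UTF-8."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    try:
        text = raw.decode("utf-8")  # strict mode raises on invalid sequences
    except UnicodeDecodeError:
        # Not UTF-8 aware: only ASCII whitespace separates words here,
        # so e.g. non-breaking spaces are not separators, but it never raises.
        return _cut_after_word(raw, WORD_BYTES, max_words)
    return _cut_after_word(text, WORD_STR, max_words).encode("utf-8")


print(trim_section_bytes(b"abc \xff\xfe def ghi", 2))  # b'abc \xff\xfe'
```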
