Making WordPress.org

Opened 4 years ago

Last modified 8 weeks ago

#5152 assigned enhancement

Put a limit on adding new translation after multiple warnings

Reported by: Nao
Owned by:
Milestone:
Priority: normal
Component: Translate Site & Plugins
Keywords:
Cc:

Description

There should be some kind of block that prevents users from entering poor-quality translation suggestions, including unreviewed machine-translated text.

I gathered the points below to avoid wrongly flagging novice translators, who simply make mistakes as they learn, while giving a hard stop to users whose submissions burden locale teams.

Idea

For both file upload (import) and manual entry, put a limit on a user once their translations have accumulated a certain number of warnings that they have not corrected.
The limit can be a ban for a set period of time, or ban until mistakes are corrected (or both?).

Flag these warnings

  • HTML mangled: Certain patterns that are obvious signs of a machine translation should be rejected (e.g. space around opening/closing tags). But we need to be aware regular contributors can make a typo.
  • Extra HTML attribute
  • Placeholders missing or extra added
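
As a rough illustration of two of the checks listed above, here is a minimal Python sketch. The regular expressions, the function names, and the warning labels `mangled_html` and `placeholders` are assumptions made for this example; they are not the checks GlotPress actually runs. The tag pattern is only a heuristic and would also match plain text such as "a < b", which is one reason a single hit should not count heavily against a contributor.

```python
import re
from collections import Counter

# Hypothetical heuristic: whitespace right after "<" or "</", or right before ">".
MANGLED_TAG = re.compile(
    r"<\s+/?\s*[A-Za-z]|</\s+[A-Za-z]|<\s*/?\s*[A-Za-z][^<>]*\s+>"
)

# Simplified printf-style placeholders such as %s, %d, %1$s.
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sdf]")


def placeholder_mismatch(original: str, translation: str) -> bool:
    """True if placeholders are missing or extra, ignoring their order."""
    return Counter(PLACEHOLDER.findall(original)) != Counter(
        PLACEHOLDER.findall(translation)
    )


def flag_warnings(original: str, translation: str) -> list[str]:
    """Return the (hypothetical) warning labels triggered by a translation."""
    warnings = []
    # Only flag spacing that the translation introduced, not spacing
    # that was already present in the original string.
    if MANGLED_TAG.search(translation) and not MANGLED_TAG.search(original):
        warnings.append("mangled_html")
    if placeholder_mismatch(original, translation):
        warnings.append("placeholders")
    return warnings
```

Reordered placeholders such as `%2$d ... %1$s` are treated as valid here, since word order often changes between languages.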

Ignore, or be lenient on these warnings

  • Different URL from the original: This is often necessary for pointing to localized docs.
  • Accidental newlines: should be automatically matched to the original.

Not sure about this

  • Missing/too many tags: Sometimes I do this on purpose for an em element (I remove the tags and use a different way of emphasis) since italic text is not common in Japanese writing. I can't think of any other case though, and I don't know whether other locales have something like this.

Other considerations

  • Extra flag for a user submitting to multiple locales with many warnings.
  • Banned users should be visible in #polyglots-warnings or somewhere private for GTEs or global mentors.
  • Never ban GTEs and PTEs on projects where they have validating rights.
  • Sometimes translators keep adding several translations with warnings because they don't know they can reject their own earlier translation. Should we overwrite (or ask?) when the same user submits a new translation for a string they have already translated?
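
The considerations above could be combined into a decision rule along these lines. This is only a sketch under stated assumptions: the `Contributor` fields, the two thresholds, and the `allow` / `review` / `limit` outcomes are all invented for illustration, and real limits would have to come from the logged data.

```python
from dataclasses import dataclass

WARNING_LIMIT = 20       # hypothetical: uncorrected warnings before a limit applies
MULTI_LOCALE_LIMIT = 3   # hypothetical: locales with warnings that trigger an extra flag


@dataclass
class Contributor:
    username: str
    open_warnings: int          # warnings on translations not yet corrected or rejected
    locales_with_warnings: int  # number of locales in which those warnings occurred
    can_validate: bool          # GTE or PTE with validating rights on this project


def decide(contributor: Contributor) -> str:
    """Return 'allow', 'review' (surface to GTEs/mentors) or 'limit'."""
    # Never limit GTEs and PTEs on projects where they can validate.
    if contributor.can_validate:
        return "allow"
    if contributor.open_warnings >= WARNING_LIMIT:
        return "limit"
    # Extra flag: many warnings spread over multiple locales.
    if (
        contributor.locales_with_warnings >= MULTI_LOCALE_LIMIT
        and contributor.open_warnings >= WARNING_LIMIT // 2
    ):
        return "review"
    return "allow"
```

Counting only uncorrected warnings means a contributor can lift the limit by fixing or rejecting their own earlier suggestions.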

Related: #4171

Attachments (5)

5152.diff (4.3 KB) - added by ocean90 4 years ago.
5152-logging.png (561.9 KB) - added by ocean90 4 years ago.
translation-warnings-type.png (47.8 KB) - added by Nao 4 years ago.
translation-warnings-users.png (94.0 KB) - added by Nao 4 years ago.
translation-warnings-users-types.png (64.7 KB) - added by Nao 4 years ago.


Change History (26)

This ticket was mentioned in Slack in #polyglots by nao. View the logs.


4 years ago

#2 @jeroenrotty
4 years ago

One idea is to integrate some of the checks from the GlotDict extension into the GlotPress-WP project by default. GlotDict runs a few extra checks on grammar, tags, leading or trailing spaces, etc. It can be found here: https://github.com/Mte90/GlotDict

#3 @garrett-eclipse
4 years ago

I can give a hand migrating the GlotDict warnings (JS) into the custom warnings (PHP) plugin here:
https://meta.trac.wordpress.org/browser/sites/trunk/wordpress.org/public_html/wp-content/plugins/wporg-gp-custom-warnings/wporg-gp-custom-warnings.php

Would all GlotDict checks be desired?

  • Validation for final "...", ".", ":"
  • Validation for final ;.!:、。؟?!
  • First letter in translation is not uppercase but the original string is
  • Detect first and last character if they are space
  • Missing term translated using the locale glossary
  • Check for curly apostrophe
  • Check for non typographic quotes

And would it be toggle-able via user settings? In GlotDict all warnings can be silenced individually if they aren't a fit for the locale or translator.
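
To make the question about individually toggled checks concrete, here is a small Python sketch of a check registry with a per-user "disabled" set. The check names and the registry are hypothetical, and the two checks are simplified versions of the ideas in the list above rather than ports of the GlotDict code.

```python
def ending_punctuation(original: str, translation: str) -> bool:
    """True if the original ends with "...", "." or ":" and the translation does not."""
    for ending in ("...", ".", ":"):  # "..." is tested before "."
        if original.endswith(ending):
            return not translation.endswith(ending)
    return False


def edge_whitespace(original: str, translation: str) -> bool:
    """True if leading or trailing whitespace differs from the original."""
    return (
        original[:1].isspace() != translation[:1].isspace()
        or original[-1:].isspace() != translation[-1:].isspace()
    )


# Hypothetical registry: check name -> check function.
CHECKS = {
    "ending_punctuation": ending_punctuation,
    "edge_whitespace": edge_whitespace,
}


def run_checks(original: str, translation: str, disabled=frozenset()) -> list[str]:
    """Run every check the user has not switched off."""
    return [
        name
        for name, check in CHECKS.items()
        if name not in disabled and check(original, translation)
    ]
```

A translator whose locale does not use the same ending punctuation could then pass `disabled={"ending_punctuation"}` and keep the whitespace check.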

#4 @Nao
4 years ago

I always thought this could be a good warning to display by default!

  • Missing term translated using the locale glossary

These sound good too but we probably shouldn't give heavy penalties toward banning/limitation?

  • Check for curly apostrophe
  • Check for non typographic quotes
  • Detect first and last character if they are space

These rules are great for many languages but could be confusing to others if they get warnings when writing properly in their language (especially for newbies).

  • Validation for final "...", ".", ":"
  • First letter in translation is not uppercase but the original string is (e.g. languages without upper/lowercase characters)

I only know two languages, so other opinions are welcome! :)

#5 @garrett-eclipse
4 years ago

Thanks @Nao for the feedback. For the warnings that could be confusing, I would suggest we either make them optional or not implement them at all. The options could be handled through the Translation Settings (https://translate.wordpress.org/settings/).

#6 @Mte90
4 years ago

GlotDict creator here (@garrett-eclipse is one of the maintainers too).
Migrating the checks to GlotPress or translate.wp.org, or adding new ones, is not a problem.
I think the big problem in GlotPress is that it is not possible to turn off specific warnings. Right now there is no feature to customize this kind of thing per user, so it will probably require some development in GlotPress.
Many of the checks were added over the years based on requests and for specific languages, so it is probably better to first migrate the ones that are mandatory for all languages.

#7 @ocean90
4 years ago

Limiting users based on triggered warnings and adding new warnings should be handled separately. This ticket should only be about creating solid rules for a limit. That said, I agree that some of the warnings from GlotDict should be contributed back to GlotPress itself and not only to translate.w.org.

Currently we only log discarded warnings into a database table while new warnings are only pushed to the #polyglots-warnings channel on Slack.
As a quick first step we should extend the database logging for warnings in general so we have some structured data which can be exported. This data can then be used to make some data driven decisions.

#8 @ocean90
4 years ago

In 9739:

Translate: Add timestamp to discarded warning logging.

See #5152.

#9 @dd32
4 years ago

"As a quick first step we should extend the database logging for warnings in general so we have some structured data which can be exported. This data can then be used to make some data driven decisions."

I was surprised to find out that the warnings are only stored within the translations table, so storing them like that is definitely a +1 from me.

This ticket was mentioned in Slack in #polyglots by nao. View the logs.


4 years ago

#11 @ocean90
4 years ago

In 9890:

Translate: Log translation warnings to a database table for analysis.

See #5152.

#12 @ocean90
4 years ago

In 9891:

Translate: Log multiple warnings for the same translation.

See #5152.

#13 @ocean90
4 years ago

The extra logging for warnings is now enabled. I'll provide an export in a week or two so we can review the data.

#14 @ocean90
4 years ago

In 9893:

Translate: Include the message of a warning in translation warnings logging.

See #5152.

#15 @Nao
4 years ago

Based on the data @ocean90 shared, I am seeing an opportunity to catch unreviewed machine translations using the warning log.
(the data contains usernames and their activities in detail, so it's not appropriate to share the raw version publicly)

  • Between 2020-05-19 21:20:32 and 2020-08-26 10:13:13, there were 13,983 warnings
  • Among the top 20 users with the most warnings, 4 were probably uploading unreviewed machine-translated files, 2 were well-meaning inexperienced users, 1 was a CLPTE, and 13 were very active users submitting lots of translations.

I could easily detect the possible offenders by checking the translation history on the profiles of unfamiliar usernames. They had all submitted suggestions for the same plugin or theme in multiple languages. The log also showed that these users' warnings occurred within a short period of time (from 5 to hundreds in a row), a pattern that is not typical of legitimate translation contributors.
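
The pattern described above (many warnings in a short time, across several locales) could be searched for in an export of the warning log roughly as follows. This is a sketch only: the `(user, locale, timestamp)` row layout, the function name, and the default thresholds are assumptions, not the actual structure of the log table.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def suspicious_users(
    rows,
    burst_size=5,                   # hypothetical: warnings in a row that count as a burst
    window=timedelta(minutes=10),   # hypothetical: time span for one burst
    min_locales=2,                  # hypothetical: warnings must span this many locales
):
    """rows: iterable of (username, locale, timestamp) tuples from the log."""
    by_user = defaultdict(list)
    for user, locale, timestamp in rows:
        by_user[user].append((timestamp, locale))

    flagged = []
    for user, events in by_user.items():
        if len({locale for _, locale in events}) < min_locales:
            continue
        times = sorted(timestamp for timestamp, _ in events)
        start = 0
        # Sliding window over the sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= burst_size:
                flagged.append(user)
                break
    return flagged
```

Very active legitimate translators also produce many warnings in total, so the result is better treated as a list for human review than as an automatic ban list.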

The types of warnings themselves may not be enough to detect offenders, but mismatching_url warnings often tend to be intentional.

I think we should block an upload when a set number of warnings (let's say 5) is detected in the uploaded file. At the same time, we should display a message telling users that they should never upload unreviewed machine translations.
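
A minimal sketch of that upload rule, assuming the warnings for each entry in the file have already been computed. The function name, the input shape, and the message wording are made up for illustration; 5 is just the example value from this comment.

```python
MAX_UPLOAD_WARNINGS = 5  # example value from the comment above, not a decided limit


def review_upload(warnings_per_entry):
    """warnings_per_entry: one list of warning labels per entry in the uploaded file.

    Returns (allowed, message).
    """
    total = sum(len(warnings) for warnings in warnings_per_entry)
    if total >= MAX_UPLOAD_WARNINGS:
        message = (
            f"Import blocked: {total} warnings were found in this file. "
            "Please review your translations and never upload "
            "unreviewed machine translations."
        )
        return False, message
    return True, ""
```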

#16 @psmits1567
4 years ago

I would also like to propose adding a fuzzy check to the import. If an imported record is fuzzy, do not import the record. Alternatively, set a threshold that blocks the import after too many fuzzy records, and throw an error stating that the import contains too many fuzzy records and needs to be improved.
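
The threshold variant could look roughly like this for a PO file. It is a simplified sketch: it only understands the `#, fuzzy` flag line and `msgid` lines, the 20% threshold is an arbitrary placeholder, and the function names are hypothetical. It does not reflect how the GlotPress importer parses files.

```python
def count_fuzzy(po_text: str):
    """Return (total_entries, fuzzy_entries), skipping the PO header entry."""
    lines = [line.strip() for line in po_text.splitlines()]
    total = fuzzy = 0
    pending_fuzzy = False
    for index, line in enumerate(lines):
        if line.startswith("#,"):
            flags = {flag.strip() for flag in line[2:].split(",")}
            pending_fuzzy = "fuzzy" in flags
        elif line.startswith("msgid "):
            following = lines[index + 1] if index + 1 < len(lines) else ""
            # The header is the entry with an empty msgid directly followed by msgstr.
            is_header = line == 'msgid ""' and following.startswith("msgstr")
            if not is_header:
                total += 1
                fuzzy += pending_fuzzy
            pending_fuzzy = False
    return total, fuzzy


def import_allowed(po_text: str, max_fuzzy_ratio: float = 0.2):
    """Return (allowed, message); max_fuzzy_ratio is a hypothetical threshold."""
    total, fuzzy = count_fuzzy(po_text)
    if total and fuzzy / total > max_fuzzy_ratio:
        return False, (
            f"Import blocked: {fuzzy} of {total} records are fuzzy. "
            "Please review the fuzzy records before importing."
        )
    return True, ""
```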

#17 @psmits1567
4 years ago

I do not know if this is the proper ticket, but I noticed remarks about warnings,
so I decided to add my comment here.

Another thought on improving the import is the following:
Currently, if something is wrong with the import, an error is thrown.
But this error does not indicate what is wrong with the import.
That is a bit frustrating when you are acting with good intentions.
If there is an error, why not indicate the line number that causes it?
Or even the reason for the error message?
That would save a lot of time finding the problem.
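
As an illustration of what line-level feedback could look like, here is a small Python sketch that scans a PO file and reports the line number and reason for two simple kinds of problems. The function name, the error wording, and the limited set of checks are assumptions for this example; a real importer would need to cover far more cases.

```python
import re

# A double-quoted PO string, allowing backslash escapes inside.
QUOTED = re.compile(r'^"(?:[^"\\]|\\.)*"$')
# Keywords that may start a line in a PO entry.
KEYWORD = re.compile(r"^(?:msgctxt|msgid|msgid_plural|msgstr(?:\[\d+\])?)$")


def find_po_errors(po_text: str):
    """Return a list of (line_number, reason) tuples for malformed lines."""
    errors = []
    for number, raw in enumerate(po_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        if line.startswith('"'):
            value = line  # continuation line of a multi-line string
        else:
            keyword, _, value = line.partition(" ")
            if not KEYWORD.match(keyword):
                errors.append((number, f"unknown keyword {keyword!r}"))
                continue
            value = value.strip()
        if not QUOTED.match(value):
            errors.append((number, "string is not properly quoted"))
    return errors
```

An import error message built from this list could then point the translator directly at the offending line.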

This ticket was mentioned in Slack in #polyglots by nao. View the logs.


4 years ago

This ticket was mentioned in Slack in #polyglots by nao. View the logs.


3 years ago

This ticket was mentioned in Slack in #polyglots by nao. View the logs.


3 years ago

#21 @dd32
8 weeks ago

  • Owner dd32 deleted