Opened 20 months ago

Closed 19 months ago

Last modified 7 months ago

#5154 closed defect (fixed)

Automatically fix some translation errors

Reported by: dd32
Owned by:
Milestone:
Priority: normal
Component: Translate Site & Plugins
Keywords:
Cc:

Description

Split off from #5152

Some common translation warnings can be corrected automatically, which simplifies the translation process by reducing the time translators and editors have to spend checking translations.

The two main ones I see right now that can be automatically fixed are:

  • Newlines, either prefixed or suffixed to originals/translations
  • Unicode percent signs being used rather than ASCII percent signs in placeholders
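A minimal sketch of the two fixes listed above. The function names, the set of look-alike percent signs, and the placeholder pattern are assumptions made for illustration; this is not the plugin's actual code.

```python
import re

# Assumed look-alikes: fullwidth (U+FF05), small (U+FE6A) and Arabic (U+066A)
# percent signs. They are only replaced when directly followed by something
# that looks like a placeholder, e.g. "s", "d" or "1$s".
UNICODE_PERCENT = re.compile(r"[\uFF05\uFE6A\u066A](?=(?:\d+\$)?[sd])")


def fix_percent_signs(translation: str) -> str:
    """Swap a Unicode percent sign for ASCII '%' inside a placeholder."""
    return UNICODE_PERCENT.sub("%", translation)


def fix_newlines(original: str, translation: str) -> str:
    """Give the translation the same leading/trailing newlines as the original."""
    lead = original[: len(original) - len(original.lstrip("\n"))]
    trail = original[len(original.rstrip("\n")):]
    return lead + translation.strip("\n") + trail
```

A percent sign that is not part of a placeholder (for example a fullwidth sign after a number) is deliberately left alone in this sketch.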

Are there any others? I'm hesitant to fix mangled HTML tags: although most are just an extra space around a < or </, they are a good sign of a machine translation, which is usually not perfect.

Change History (20)

#1 @dd32
20 months ago

In 9741:

Translate: Add a plugin to automatically fix some common translation errors.

If it can't fix all of a translation's errors, it doesn't alter the submitted translation.

See #5154.

#2 @tobifjellner
20 months ago

I agree that automatically fixing signs of machine translation should be avoided.
As a vector for the future, perhaps some automatic fixes could be offered for specific locales, but that would probably become a project on its own.

This ticket was mentioned in Slack in #polyglots by casiepa.

20 months ago

#4 @Mte90
20 months ago

From experience with other tools such as Transifex, Poedit, Lokalize and, I think, Crowdin, I think it is better to offer suggestions that apply the fix for you, as an opt-in for each sentence.
Example: the sentence is missing a final period. Would you like to add it? Yes/No, and the tool does it for you.
This can be helpful for sanitization, and it would allow custom warnings for specific locales, such as the use of different Unicode symbols.

#5 @dd32
20 months ago

I agree that not everything should be fixed automatically. This was a case where the fixes were obvious and "always" right, and it has significantly reduced the number of warnings (IMHO).

Some warnings/fixes could be applied before the string is submitted, for example by highlighting missing placeholders or tags. Offering automatic fixes at that point would make sense for things like HTML tags, as would highlighting what was added or removed between the original and the translation.

This ticket was mentioned in Slack in #polyglots by nao.

20 months ago

#7 @dd32
20 months ago

In 9766:

Translate: The 'warnings' property isn't always set, since a translation might be being updated by the importer which does it in stages.

Amends r9741.
See #5154.

#8 @dd32
20 months ago

In 9801:

Translate: Sometimes the warnings key is set to a non-empty value that isn't an array.

See #5154.

#9 @dd32
19 months ago

In monitoring the #polyglots-warnings channel, the only other things I've seen that would be reasonable to autocorrect are:

  • non-ASCII $ in printf placeholders; there are a few other Unicode variants of the dollar sign
  • non-ASCII characters used in printf placeholders, such as a Unicode variant of S

Those seem to happen very rarely, so I'm going to skip adding anything for those and close this ticket as fixed for now.

There's another ticket to add some JS-based warnings pre-submit as well, which will hopefully remove the need for this in the first place and/or support auto-fixing some warnings.

If the warning logging that will hopefully be added as part of #5152 reveals anything major, we can re-open or create a new ticket.

#10 @dd32
19 months ago

  • Resolution set to fixed
  • Status changed from new to closed

#11 @ocean90
19 months ago

In 9889:

Translate: Add missing static keyword to avoid a deprecation notice.

See #5154.

#12 @dd32
10 months ago

In 10683:

Translate: Auto-correct spaced placeholders, common with machine translations.

This replaces % 1 $ s with %1$s in translations if it corrects the translation.

Machine translations aren't always wanted, but in many cases all that happens is that the translation gets re-submitted with the placeholder fixed. This just reduces the number of warnings generated, allowing translators to focus on the language content of the string.

See #5563, #5154.

#13 @dd32
10 months ago

In 10684:

Translate: Auto-correct curly quotes when HTML tags mismatch, catches cases where attributes are incorrectly curled.

See #5154.
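As a rough illustration of the curly-quote fix in [10684]: the sketch below straightens quotes only inside HTML tags, and only when the tags in the translation differ from those in the original. The function name, the simple tag regex and the list of quote characters are assumptions, not the committed code.

```python
import re

# Very loose notion of "an HTML tag", good enough for a sketch.
TAG = re.compile(r"<[^<>]+>")

# Assumed set of curly quotes that end up around attribute values.
STRAIGHTEN = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"',
    "\u2018": "'", "\u2019": "'",
})


def fix_curly_quotes(original: str, translation: str) -> str:
    """Straighten quotes inside tags when the tags do not match the original."""
    if TAG.findall(original) == TAG.findall(translation):
        return translation  # tags already match, nothing to do
    return TAG.sub(lambda m: m.group(0).translate(STRAIGHTEN), translation)
```

Curly quotes in the visible text are untouched, since they are often typographically correct for the locale.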

#14 @dd32
10 months ago

In 10685:

Translate: Automatically correct strings which contain extra spaces within HTML tags.

This fixes strings such as </ p>, <a href="%s" >, <a href="#" >, and < / strong >.
This does not affect spaces between attributes, such as <a href="#"  target="_blank">, because few cases of that being an issue have been recorded.

See #5154.
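The kind of tag clean-up described in [10685] could be sketched as follows. The regex and function name are assumptions for illustration; the sketch trims whitespace right after <, around a leading / and right before >, and leaves the inside of the tag alone.

```python
import re

# "<", optional spaces, optional "/", optional spaces, a tag body starting
# with a letter, optional spaces, ">". Spaces between attributes are part of
# the lazily matched body and are therefore preserved.
SPACED_TAG = re.compile(r"<\s*(/?)\s*([a-zA-Z][^<>]*?)\s*>")


def fix_tag_spaces(translation: str) -> str:
    """Remove stray spaces at the edges of HTML tags."""
    return SPACED_TAG.sub(r"<\1\2>", translation)
```

A pattern this loose can also match non-tag text such as "a < b > c", so it only makes sense behind a check that the result actually resolves a tag warning.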

#15 @dd32
10 months ago

In 10686:

Translate: Automatically correct the letter case of sprintf format placeholders.

This only applies to a sub-set of placeholders which can only be lower-case %[bcdosu], not those which can be both cases but mean different things: %[EFGX]
The most common %s and %d placeholders are covered by this, so %S / %D will be corrected.

See #5154.
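A small sketch of the case correction from [10686], assuming a hypothetical function name and regex. Only the conversions that exist solely in lower case (b, c, d, o, s, u) are lowered; E, F, G and X are left alone, and an escaped %% is skipped so that "%%S" is not mistaken for a placeholder.

```python
import re

# Match "%%" first so an escaped percent sign is never read as the start of
# a placeholder; otherwise match "%", an optional "N$", and an upper-case
# letter from the lower-case-only set.
UPPER_PLACEHOLDER = re.compile(r"%%|%(\d+\$)?([BCDOSU])")


def fix_placeholder_case(translation: str) -> str:
    """Lower-case placeholders such as %S, %D or %1$S."""
    def lower(match: re.Match) -> str:
        if match.group(0) == "%%":
            return "%%"
        return "%" + (match.group(1) or "") + match.group(2).lower()

    return UPPER_PLACEHOLDER.sub(lower, translation)
```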

#16 @dd32
10 months ago

Follow up to remove some warnings: #5621

Another possible auto-correct, both to speed up translators who are in the know and because it seems to be a common issue in the generated warnings:

  • If the original contains printf placeholders, the translation contains none, and the translation contains bare % signs (including ones next to spaces), replace those % signs with the original's placeholders, in order. E.g. if the original was %s to %s, a translation of % ba % would end up as %s ba %s.
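One way this idea could look, purely as a sketch: the ticket only notes the idea, so everything here (names, regexes, and the rule that the number of bare % signs must equal the number of placeholders) is an assumption.

```python
import re

# Tokenise "%%" together with real placeholders so escapes can be ignored.
PLACEHOLDER = re.compile(r"%%|%(?:\d+\$)?[bcdeEfFgGosuxX]")
# A "bare" percent sign: followed by whitespace or the end of the string.
BARE = re.compile(r"%%|%(?=\s|$)")


def _unescaped(tokens):
    return [t for t in tokens if t != "%%"]


def fill_bare_percents(original: str, translation: str) -> str:
    """Replace bare % signs with the original's placeholders, in order."""
    wanted = _unescaped(PLACEHOLDER.findall(original))
    present = _unescaped(PLACEHOLDER.findall(translation))
    bare = _unescaped(BARE.findall(translation))
    if not wanted or present or len(bare) != len(wanted):
        return translation  # not the situation this fix is meant for
    remaining = iter(wanted)
    return BARE.sub(
        lambda m: m.group(0) if m.group(0) == "%%" else next(remaining),
        translation,
    )
```

Requiring an exact count match keeps the sketch conservative: if the numbers differ, there is no obvious way to decide which % belongs to which placeholder.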

Leaving this as closed, just noting the idea.

#17 @dd32
7 months ago

In 10956:

Translate: When removing spaced placeholders, only do it if it ends on a word boundary.

This fixes it attempting to correct 100% over to 100%over, while still allowing it to convert <a href='% 1 $ s'> to <a href='%1$s'>.

Follow up to r10683.
See #5563, #5154.
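The word-boundary rule from [10956] can be illustrated with a short sketch. The regex and function name are assumptions; the point is the trailing \b, which stops "% o" in "100% over" from being collapsed into "%o" while still allowing "% 1 $ s" to become "%1$s".

```python
import re

# "%", optional spaces, an optional "N $" argument number, then a conversion
# letter that must end on a word boundary.
SPACED_PLACEHOLDER = re.compile(r"%\s*(?:(\d+)\s*\$\s*)?([bcdeEfFgGosuxX])\b")


def fix_spaced_placeholders(translation: str) -> str:
    """Collapse whitespace inside placeholders such as '% 1 $ s'."""
    def rebuild(match: re.Match) -> str:
        number, conversion = match.groups()
        return "%" + (number + "$" if number else "") + conversion

    return SPACED_PLACEHOLDER.sub(rebuild, translation)
```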

#18 follow-up: ↓ 19 @tobifjellner
7 months ago

This fixes it attempting to correct 100% over to 100%over,...

In Swedish, we always put a space between a number and a following percent sign or unit. I.e. 110V, 25mm, and 50% we change to 110 V, 25 mm and 50 %.

#19 in reply to: ↑ 18 ; follow-up: ↓ 20 @dd32
7 months ago

Replying to tobifjellner:

This fixes it attempting to correct 100% over to 100%over,...

In Swedish, we always put a space between a number and a following percent sign or unit. I.e. 110V, 25mm, and 50% we change to 110 V, 25 mm and 50 %.

Yeah, that's fine, that won't be affected by this at all. The descriptions for these fixes are a bit... not great, and I apologise for that :)
I also need to write some unit tests, so that you can see exactly what is happening here, and so that errors don't creep in.

This was an issue where the auto-correction was improper.

Take this translation for example: https://translate.wordpress.org/projects/meta/wordpress-org/en-gb/default/?filters%5Bstatus%5D=either&filters%5Boriginal_id%5D=11765836&filters%5Btranslation_id%5D=83944362

Original: In March 2004, the <a href="%1$s">.... 80%% of our users? ....
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....

The translation fixer kicked in and transformed it like so:
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....
Whitespace fix: In March 2004, the <a href="%1$s">.... 80%of our users? ....
Percent fix: No change, doesn't apply as %o is a valid printf-style placeholder (which we shouldn't touch)
Result: Still has warnings because %% isn't present. Translation fix failed, Translation accepted with warnings about improper placeholders.

What should've happened, and will after this:
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....
Whitespace fix: No change, doesn't apply
Percent fix: In March 2004, the <a href="%1$s">.... 80%% of our users? ....
Result: No warnings, all placeholders exist, no stray placeholders, translation is accepted.

The whitespace fix would look something like this:
Submitted: In March 2004, the <a href="% 1 $ s">.... 80% of our users? .... (automated translation tools, etc)
Whitespace fix: In March 2004, the <a href="%1$s">.... 80% of our users? ....
Percent fix: In March 2004, the <a href="%1$s">.... 80%% of our users? ....
Result: Accepted, no warnings, all placeholders as expected.

There was also an issue, fixed in a separate commit ([10955]), where an escaped placeholder wasn't detected as incorrect.
For example:
Original: Over in %s we code.
Translation: Over in %%s we code.
Previous Result: No warnings, %s exists within string, all is okay.
Expected Result: Warning, %s placeholder not present.

Or:
Original: ... 100%% ...
Translation: ... 100% ...
Previous Result: No warnings, %% is not a placeholder to check for.
Expected Result: Warning, %% placeholder not present. (But the translation fixer will kick in and fix this, see next example)
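The two cases above come down to how placeholders are tokenised. A sketch, with assumed names and an assumed regex: if %% is matched as its own token before anything else, then "%%s" is read as an escaped percent sign followed by "s" rather than as containing %s, and a missing %% can be reported like any other placeholder.

```python
import re
from collections import Counter

# "%%" is listed first so it always wins over a placeholder match.
TOKEN = re.compile(r"%%|%(?:\d+\$)?[bcdeEfFgGosuxX]")


def missing_placeholders(original: str, translation: str) -> list:
    """Return the original's tokens that the translation lacks."""
    missing = Counter(TOKEN.findall(original)) - Counter(TOKEN.findall(translation))
    return sorted(missing.elements())
```

An empty list means every token in the original, escaped percent signs included, is present in the translation.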

Or:
Original: ... <a href="%s">100 percent</a> ...
Translation: ... <a href="%s">100 %</a> ...
Previous Result: No warnings, %s placeholder exists. String will fail though since %< will be picked up by printf().
Expected Result: Warning, % not escaped.
Fixer kicks in and converts the translation to ... <a href="%s">100 %%</a> ...
Final Result: String is accepted, no warnings.
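A sketch of the percent fix used in these examples, under the assumption that any % which is neither part of %% nor the start of a valid printf-style placeholder should be escaped. The name and regex are illustrative only.

```python
import re

# Order matters: escaped percent, then a real placeholder, then a lone "%".
PERCENT_TOKEN = re.compile(r"%%|%(?:\d+\$)?[bcdeEfFgGosuxX]|%")


def escape_lone_percents(translation: str) -> str:
    """Turn a stray '%' into '%%' without touching real placeholders."""
    return PERCENT_TOKEN.sub(
        lambda m: "%%" if m.group(0) == "%" else m.group(0),
        translation,
    )
```

Note that "80%of" is left as is, because %o is a valid placeholder; that is exactly why the whitespace fix must not glue "80% of" together first.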


#20 in reply to: ↑ 19 @dd32
7 months ago

Replying to dd32:

Original: In March 2004, the <a href="%1$s">.... 80%% of our users? ....
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....

The translation fixer kicked in and transformed it like so:
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....
Whitespace fix: In March 2004, the <a href="%1$s">.... 80%of our users? ....
Percent fix: No change, doesn't apply as %o is a valid printf-style placeholder (which we shouldn't touch)
Result: Still has warnings because %% isn't present. Translation fix failed, Translation accepted with warnings about improper placeholders.

Just to clarify even further: if there are any warnings after the fixer runs (either existing or new ones), it doesn't touch the string. It bails, and whatever was submitted gets used. The only time the fixer makes a change is if making the change 100% resolves the warning and doesn't produce new warnings.
So for example, if after it runs the string now has a %o where it didn't previously, that's a new warning, so it'll abort and not change anything.

A better example is maybe the strings with leading/trailing whitespace

Original: Blue⏎
Submitted: Blauw
Fixer kicks in and updates it to: Blauw⏎, appending the trailing whitespace that the original string had. The trailing whitespace warning is no longer present and no new warnings were generated, so it saves the translation as Blauw⏎.
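The all-or-nothing behaviour described in this comment could be sketched like this. The warning checker and the single fixer below are toy stand-ins invented for the example; only the control flow (apply the fixes, re-check, keep the result only if no warnings remain) reflects what is described above.

```python
from typing import Callable, List


def check(original: str, translation: str) -> List[str]:
    """Toy warning checker: trailing newline and %s count only."""
    warnings = []
    if original.endswith("\n") != translation.endswith("\n"):
        warnings.append("trailing newline mismatch")
    if original.count("%s") != translation.count("%s"):
        warnings.append("placeholder mismatch")
    return warnings


def match_trailing_newline(original: str, translation: str) -> str:
    """Toy fixer: copy the original's trailing newline (or lack of one)."""
    stripped = translation.rstrip("\n")
    return stripped + "\n" if original.endswith("\n") else stripped


def autofix(original: str, submitted: str,
            fixers: List[Callable[[str, str], str]]) -> str:
    """Return the fixed translation only if it ends up warning-free."""
    if not check(original, submitted):
        return submitted  # nothing to fix
    candidate = submitted
    for fix in fixers:
        candidate = fix(original, candidate)
    # Bail out and keep the submission if any warning is left or was added.
    return candidate if not check(original, candidate) else submitted
```

With these toy pieces, autofix("Blue\n", "Blauw", [match_trailing_newline]) yields "Blauw\n", while a submission that is also missing a placeholder is returned unchanged.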
