Making WordPress.org

Opened 12 months ago

Closed 11 months ago

Last modified 8 weeks ago

#5154 closed defect (fixed)

Automatically fix some translation errors

Reported by: dd32
Priority: normal
Component: Translate Site & Plugins


Split off from #5152

There are some common translation warnings that can be corrected automatically. Fixing them simplifies the translation process by reducing the time a translator (and editor) has to spend checking translations.

The two main ones I see right now that can be automatically fixed are:

  • Newlines, either prefixed or suffixed to originals/translations
  • Unicode percent signs being used rather than ASCII percent signs in placeholders
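A minimal Python sketch of these two fixes (illustrative only, not the actual plugin code). It assumes that a newline mismatch is fixed by copying the original's leading and trailing newlines onto the translation. It also assumes a small, hypothetical set of look-alike percent signs, which are only replaced when they begin a placeholder, so that "50％" in running text is left alone.

```python
import re

# Assumed set of non-ASCII percent signs; the ticket does not list which
# variants the real fixer handles.
PERCENT_VARIANTS = "\uff05\ufe6a\u066a"  # fullwidth, small, Arabic percent sign

# A variant percent sign that begins a printf placeholder, e.g. "％s" or "％1$d".
VARIANT_PLACEHOLDER = re.compile(
    "[" + PERCENT_VARIANTS + "]" + r"(?=(?:\d+\$)?[bcdeEfFgGosuxX])"
)


def fix_percent_signs(translation: str) -> str:
    """Swap a non-ASCII percent sign for '%' only where it begins a placeholder."""
    return VARIANT_PLACEHOLDER.sub("%", translation)


def fix_edge_newlines(original: str, translation: str) -> str:
    """Give the translation the same leading/trailing newlines as the original."""
    leading = original[: len(original) - len(original.lstrip("\n"))]
    trailing = original[len(original.rstrip("\n")):]
    return leading + translation.strip("\n") + trailing
```

For example, `fix_edge_newlines("Hello\n", "Bonjour")` gives `"Bonjour\n"`, and `fix_percent_signs("\uff05s fichiers")` gives `"%s fichiers"`.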

Are there any others? I'm hesitant to fix mangled HTML tags: although most are just an extra space around a < or </, they are a good sign of a machine translation, which is usually not perfect.

Change History (16)

#1 @dd32
12 months ago

In 9741:

Translate: Add a plugin to automatically fix some common translation errors.

If it can't fix all of a translation's errors, it doesn't alter the submitted translation.

See #5154.
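The all-or-nothing rule from r9741 can be sketched as follows, assuming that checks and fixers are plain functions (hypothetical names, not the plugin's real API). Every fixer is applied, and the result is kept only if no check still reports a warning.

```python
from typing import Callable, List

# Hypothetical shapes: a check returns warning messages (empty when fine),
# a fixer returns a possibly modified translation.
Check = Callable[[str, str], List[str]]
Fixer = Callable[[str, str], str]


def try_autofix(original: str, translation: str,
                checks: List[Check], fixers: List[Fixer]) -> str:
    """Apply every fixer, but keep the result only if no warnings remain."""
    if not any(check(original, translation) for check in checks):
        return translation  # nothing to fix
    fixed = translation
    for fix in fixers:
        fixed = fix(original, fixed)
    if any(check(original, fixed) for check in checks):
        return translation  # only a partial fix: leave the submission alone
    return fixed
```

Returning the untouched submission on a partial fix means a translator never sees a string that was silently half-rewritten; the remaining warnings still point at the original text.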

#2 @tobifjellner
12 months ago

I agree that automatically fixing signs of machine translation should be avoided.
As a future direction, perhaps some automatic fixes could be offered for specific locales, but that would probably become a project of its own.

This ticket was mentioned in Slack in #polyglots by casiepa.

12 months ago

#4 @Mte90
12 months ago

Based on experience with other tools like Transifex, PoEdit, Lokalize and, I think, Crowdin, I think it is better to offer suggestions than to fix things automatically for you.
Something like an opt-in for each sentence.
Example: the sentence is missing a final period; would you like to add it? Yes/No, and the tool then does it for you.
This can help with sanitization and would allow custom warnings for specific locales, such as the use of different Unicode symbols.

#5 @dd32
12 months ago

I agree that not everything should be fixed automatically. This was a case where the fixes were obvious and "always" right, and (IMHO) it has significantly reduced the number of warnings.

Some warnings could be shown before the string is submitted, for example highlighting missing placeholders or tags. Offering automatic fixes at that point would make sense for things like HTML tags, or for highlighting additions/deletions between the original and the translation.

This ticket was mentioned in Slack in #polyglots by nao.

12 months ago

#7 @dd32
12 months ago

In 9766:

Translate: The 'warnings' property isn't always set, since a translation may be updated by the importer, which does so in stages.

Amends r9741.
See #5154.

#8 @dd32
12 months ago

In 9801:

Translate: Sometimes the warnings key is set to a non-empty value that isn't an array.

See #5154.
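The two guards from r9766 and r9801 amount to reading the warnings defensively. Here is a small Python sketch, assuming the translation is a dict-like record and that a PHP array of warnings corresponds to a Python dict. The function name is hypothetical.

```python
from typing import Any, Mapping


def get_warnings(translation: Mapping[str, Any]) -> dict:
    """Return the translation's warnings, or {} when missing or malformed.

    Covers both cases: the key may be absent (the importer saves in stages),
    or it may hold a non-empty value that is not an array.
    """
    warnings = translation.get("warnings")
    # Assumption: a PHP array maps onto a Python dict here; anything else
    # (None, a string, a number, ...) is treated as "no usable warnings".
    if not isinstance(warnings, dict):
        return {}
    return warnings
```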

#9 @dd32
11 months ago

While monitoring the #polyglots-warnings channel, the only other things I've seen that would be reasonable to autocorrect are:

  • non-ASCII $ in printf placeholders; there are a few different Unicode variants of the dollar sign
  • non-ASCII characters used in printf placeholders, such as a Unicode S variant

Those seem to happen very rarely, so I'm going to skip adding anything for those and close this ticket as fixed for now.

There's another ticket to add some JS-based warnings pre-submit as well, which will hopefully remove the need for this in the first place and/or support auto-fixing some warnings.

If the warning logging that will hopefully be added as part of #5152 reveals anything major, we can re-open or create a new ticket.

#10 @dd32
11 months ago

  • Resolution set to fixed
  • Status changed from new to closed

#11 @ocean90
11 months ago

In 9889:

Translate: Add missing static keyword to avoid a deprecation notice.

See #5154.

#12 @dd32
8 weeks ago

In 10683:

Translate: Auto-correct spaced placeholders, common with machine translations.

This replaces % 1 $ s with %1$s in translations when doing so corrects the translation.

Machine translations aren't always wanted, but in many cases the translator simply re-submits the translation with the placeholder fixed. This just reduces the number of warnings generated, allowing translators to focus on the language content of the string.

See #5563, #5154.
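A Python sketch of the r10683 idea, under the assumption that "corrects the translation" means the de-spaced placeholders now match the original's set of placeholders. The regex covers only the positional form quoted in the commit (% 1 $ s) and is a simplification of printf syntax (no flags, width or precision).

```python
import re

PLACEHOLDER = re.compile(r"%(?:\d+\$)?[bcdeEfFgGosuxX]")
# A positional placeholder with stray spaces, e.g. "% 1 $ s".
SPACED = re.compile(r"%\s*(\d+)\s*\$\s*([bcdeEfFgGosuxX])")


def placeholders(text: str) -> list:
    return sorted(PLACEHOLDER.findall(text))


def fix_spaced_placeholders(original: str, translation: str) -> str:
    candidate = SPACED.sub(r"%\1$\2", translation)
    # Only accept the change if it makes the placeholders match the original.
    if placeholders(candidate) == placeholders(original):
        return candidate
    return translation
```

For example, with the original `%1$s of %2$s`, the translation `% 1 $ s de % 2 $ s` becomes `%1$s de %2$s`, while `% 1 $ d` against an original `%1$s` is left unchanged because the fix would not make the placeholders match.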

#13 @dd32
8 weeks ago

In 10684:

Translate: Auto-correct curly quotes when HTML tags mismatch; this catches cases where attribute quotes are incorrectly curled.

See #5154.
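A Python sketch of r10684's approach, assuming tags are compared as sorted lists and that the set of curled or look-alike quote characters below is representative (the commit does not list them). Quotes are straightened only inside tags, and only when the tags mismatch, so curly quotes in the translated text itself are preserved.

```python
import re

TAG = re.compile(r"<[^<>]+>")
# Assumed list of curled / look-alike quotes mapped back to ASCII.
STRAIGHTEN = str.maketrans({
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u2033": '"',
    "\u2018": "'", "\u2019": "'",
})


def tag_list(text: str) -> list:
    return sorted(TAG.findall(text))


def fix_curly_quotes_in_tags(original: str, translation: str) -> str:
    """Straighten quotes inside tags if that makes the tags match the original."""
    if tag_list(original) == tag_list(translation):
        return translation  # tags already match: leave the text alone
    fixed = TAG.sub(lambda m: m.group(0).translate(STRAIGHTEN), translation)
    return fixed if tag_list(fixed) == tag_list(original) else translation
```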

#14 @dd32
8 weeks ago

In 10685:

Translate: Automatically correct strings which contain extra spaces within HTML tags.

This fixes strings such as </ p>, <a href="%s" >, <a href="#" >, and < / strong >.
This does not affect spaces between attributes, such as <a href="#" target="_blank">, because few cases of that being an issue have been recorded.

See #5154.
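A Python sketch of the r10685 clean-up. It removes spaces right after "<", around the "/" of a closing tag, and right before ">", but leaves spaces between attributes and the " />" of self-closing tags alone. The final "keep only if tags now match the original" guard is an assumption carried over from the r9741 rule rather than something this commit states.

```python
import re

# Something that looks like a tag, allowing stray spaces after "<" and
# around the "/" of a closing tag, e.g. "< / strong >".
LOOSE_TAG = re.compile(r"<\s*/?\s*[A-Za-z][A-Za-z0-9]*[^<>]*>")
OPEN_SPACES = re.compile(r"^<\s*(/?)\s*")
CLOSE_SPACES = re.compile(r"\s+>$")


def tidy_tag(tag: str) -> str:
    tag = OPEN_SPACES.sub(r"<\1", tag)  # "< / strong >" -> "</strong >"
    return CLOSE_SPACES.sub(">", tag)   # '<a href="%s" >' -> '<a href="%s">'


def fix_tag_spacing(original: str, translation: str) -> str:
    fixed = LOOSE_TAG.sub(lambda m: tidy_tag(m.group(0)), translation)
    # Keep the result only if the tags now match the original's.
    if sorted(LOOSE_TAG.findall(fixed)) == sorted(LOOSE_TAG.findall(original)):
        return fixed
    return translation
```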

#15 @dd32
8 weeks ago

In 10686:

Translate: Automatically correct the case of sprintf format placeholders.

This only applies to the subset of placeholders that can only be lower-case, %[bcdosu], not those that exist in both cases with different meanings: %[EFGX].
The most common placeholders, %s and %d, are covered by this, so %S / %D will be corrected.

See #5154.
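A Python sketch of the r10686 rule, using a simplified placeholder pattern (optional argument number, no flags, width or precision). Upper-case versions of the lower-case-only conversions are lowered, while %E, %F, %G and %X are left alone. A lookbehind skips the letter after an escaped %%.

```python
import re

# Upper-case forms of conversions that only exist in lower case.
WRONG_CASE = re.compile(r"(?<!%)%(\d+\$)?([BCDOSU])")


def fix_placeholder_case(translation: str) -> str:
    return WRONG_CASE.sub(
        lambda m: "%" + (m.group(1) or "") + m.group(2).lower(), translation
    )
```

For example, `fix_placeholder_case("%S et %D")` returns `"%s et %d"`, while `"%X %E %F"` is returned unchanged.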

#16 @dd32
8 weeks ago

Follow-up to remove some warnings: #5621

Another possible auto-correction, both to speed up translators who are in the know and because it seems to be a common issue in the generated warnings:

  • If the translation contains no printf placeholders, the original does contain them, and the translation contains bare % signs (including the space after them), replace those in order. For example, % ba % would end up as %s to %s if that's what the original was.
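The idea above could be sketched in Python like this. The sketch assumes that a "bare %" is a percent sign followed by whitespace or the end of the string, and that the translation keeps the original's placeholder order, which may not hold for languages with different word order. It keeps the translated words and only swaps in the placeholders. It gives up if the number of bare % signs doesn't match the number of placeholders.

```python
import re

PLACEHOLDER = re.compile(r"%(?:\d+\$)?[bcdeEfFgGosuxX]")
# A bare "%" followed by whitespace or the end of the string.
BARE_PERCENT = re.compile(r"%(?=\s|$)")


def restore_bare_placeholders(original: str, translation: str) -> str:
    wanted = PLACEHOLDER.findall(original)
    if not wanted or PLACEHOLDER.search(translation):
        return translation  # original has none, or translation already has some
    if len(BARE_PERCENT.findall(translation)) != len(wanted):
        return translation  # counts differ: too risky to guess
    replacements = iter(wanted)
    return BARE_PERCENT.sub(lambda m: next(replacements), translation)
```

For example, with the original `%s of %d`, the translation `% de %` becomes `%s de %d`.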

Leaving this as closed, just noting the idea.
