#5154 closed defect (bug) (fixed)
Automatically fix some translation errors
Reported by: | dd32 | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Component: | Translate Site & Plugins | Keywords: | |
Cc: |
Description
Split off from #5152
There are some common translation warnings which can be automatically corrected, simplifying the translation process by reducing the amount of time a translator (and editor) has to spend checking translations.
The two main ones I see right now that can be automatically fixed are:
- Newlines, either prefixed or suffixed to originals/translations
- Unicode percent signs being used rather than ASCII percent signs in placeholders
Are there any others? I'm hesitant to fix mangled HTML tags, as although most are usually just an extra space around a < or </ it's a good sign of a machine translation that's usually not perfect.
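As a rough illustration of the two fixes listed in the description, here is a minimal Python sketch. It is not the code that was committed for this ticket. The function names, the set of percent look-alikes (fullwidth and small percent sign), and the narrow placeholder shape (%s, %d, %f, optionally numbered) are all assumptions made for the example.

```python
import re

# Assumed set of percent look-alikes; the ticket does not list them.
UNICODE_PERCENTS = "\uFF05\uFE6A"  # fullwidth percent sign, small percent sign

# Match a look-alike only when it is followed by something shaped like a
# placeholder body, e.g. "s", "d" or "1$s" (an assumed, narrow shape).
LOOKALIKE = re.compile("[" + UNICODE_PERCENTS + r"](?=(?:\d+\$)?[sdf])")


def fix_unicode_percent(translation: str) -> str:
    """Swap percent look-alikes for ASCII % where they start a placeholder."""
    return LOOKALIKE.sub("%", translation)


def fix_newlines(original: str, translation: str) -> str:
    """Make leading/trailing newlines of the translation match the original."""
    lead = len(original) - len(original.lstrip("\n"))
    trail = len(original) - len(original.rstrip("\n"))
    return "\n" * lead + translation.strip("\n") + "\n" * trail
```

A look-alike that is not part of a placeholder (for example a trailing "50％") is deliberately left alone in this sketch, since it may be a legitimate character in the target language.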
Change History (20)
#2
@
4 years ago
I agree that automatically fixing signs of machine translation should be avoided.
As a vector for the future, perhaps some automatic fixes could be offered for specific locales, but that would probably become a project on its own.
This ticket was mentioned in Slack in #polyglots by casiepa. View the logs.
4 years ago
#4
@
4 years ago
Based on experience with other tools like Transifex, PoEdit, Lokalize and, I think, Crowdin, I think it is better to offer suggestions than to fix things automatically for you.
Something like an opt-in fix for any sentence.
Example: the sentence is missing a final dot, would you like to add it? Yes/No, and then it is applied automatically for you.
This can be helpful for sanitization, and it would allow custom warnings for specific locales, such as the use of different unicode symbols and so on.
#5
@
4 years ago
I agree that not everything should be fixed automatically. This was a case where the fixes were obvious and "always" right, and it has significantly reduced the number of warnings (IMHO).
Some warnings/fixes could be applied before the string is submitted, for example highlighting missing placeholders or tags. Offering automatic fixes there would make sense for things like HTML tags, or for highlighting additions/deletions between the original and the translation.
This ticket was mentioned in Slack in #polyglots by nao. View the logs.
4 years ago
#9
@
4 years ago
In monitoring the #polyglots-warnings channel, the only other things I've seen that would be reasonable to autocorrect are:
- non-ASCII $ in printf placeholders, there's a few various other unicode variants of the dollar sign
- non-ASCII characters used in printf placeholders, such as a unicode S variant
Those seem to happen very rarely, so I'm going to skip adding anything for those and close this ticket as fixed for now.
There's another ticket to add some JS-based warnings pre-submit as well, which will hopefully remove the need for this in the first place and/or support auto-fixing some warnings.
If the warning logging that will hopefully be added as part of #5152 reveals anything major, we can re-open or create a new ticket.
#16
@
4 years ago
Follow up to remove some warnings: #5621
Another possible auto-correction, both to speed up translators in the know and because it seems to be a common issue when looking through the generated warnings:
- If no printf placeholders are present in the translation, but they exist in the original, and "% " (space inclusive) is contained within the translation, replace them in order. I.e. "% ba %" would end up as "%s to %s" if that's what the original was.
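One way the idea above might be sketched in Python. This is only an illustration of the proposal, not something that was implemented in this ticket. The function name and the placeholder pattern are hypothetical, the sketch treats a percent sign at the very end of the string as "bare" too, it substitutes only the placeholders (the surrounding translated words are left as submitted), and it bails if the counts don't line up. All of those are assumptions.

```python
import re

# Assumed, narrow placeholder shape: %s, %d, %f, optionally numbered (%1$s).
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sdf]")

# A "bare" percent: a % followed by whitespace or the end of the string.
BARE_PERCENT = re.compile(r"%(?=\s|$)")


def fill_bare_percents(original: str, translation: str) -> str:
    """Replace bare % signs with the original's placeholders, in order."""
    wanted = PLACEHOLDER.findall(original)
    # Only act when the original has placeholders and the translation has none.
    if not wanted or PLACEHOLDER.search(translation):
        return translation
    # Bail unless there is exactly one bare % per expected placeholder.
    if len(BARE_PERCENT.findall(translation)) != len(wanted):
        return translation
    remaining = iter(wanted)
    return BARE_PERCENT.sub(lambda match: next(remaining), translation)
```

Requiring the counts to match keeps the sketch conservative: a translation such as "50 % off" for an original with a single %s would otherwise be rewritten incorrectly only when it happens to have the same number of bare percent signs, which is why a real implementation would still want the warnings to be re-checked afterwards.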
Leaving this as closed, just noting the idea.
#18
follow-up:
↓ 19
@
3 years ago
This fixes it attempting to correct 100% over to 100%over,...
In Swedish, we always put a space between a number and a following percent sign or unit. I.e. 110V, 25mm, and 50% we change to 110 V, 25 mm and 50 %.
#19
in reply to:
↑ 18
;
follow-up:
↓ 20
@
3 years ago
Replying to tobifjellner:
This fixes it attempting to correct 100% over to 100%over,...
In Swedish, we always put a space between a number and a following percent sign or unit. I.e. 110V, 25mm, and 50% we change to 110 V, 25 mm and 50 %.
Yeah that's fine, that won't be affected by this at all. The descriptions for these fixes are a bit.. not great and I apologise for that :)
I also need to write some unit tests, so that you can see what exactly is happening here.. so that errors don't happen too..
This was an issue where the auto-correction was improper.
Take this translation for example: https://translate.wordpress.org/projects/meta/wordpress-org/en-gb/default/?filters%5Bstatus%5D=either&filters%5Boriginal_id%5D=11765836&filters%5Btranslation_id%5D=83944362
Original: In March 2004, the <a href="%1$s">.... 80%% of our users? ....
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....
The translation fixer kicked in and transformed it like so:
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....
Whitespace fix: In March 2004, the <a href="%1$s">.... 80%of our users? ....
Percent fix: No change, doesn't apply as %o is a valid printf-style placeholder (which we shouldn't touch)
Result: Still has warnings because %% isn't present. Translation fix failed, Translation accepted with warnings about improper placeholders.
What should've happened, and will after this:
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....
Whitespace fix: No change, doesn't apply
Percent fix: In March 2004, the <a href="%1$s">.... 80%% of our users? ....
Result: No warnings, all placeholders exist, no stray placeholders, translation is accepted.
The whitespace fix would look something like this:
Submitted: In March 2004, the <a href="% 1 $ s">.... 80% of our users? ....
(automated translation tools, etc)
Whitespace fix: In March 2004, the <a href="%1$s">.... 80% of our users? ....
Percent fix: In March 2004, the <a href="%1$s">.... 80%% of our users? ....
Result: Accepted, no warnings, all placeholders as expected.
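To make the percent fix in the examples above a bit more concrete, here is a small Python sketch of escaping stray percent signs. It is an illustration only, not the actual fixer. The function name is hypothetical, and the sketch assumes a deliberately narrow placeholder set (%s, %d, %f, optionally numbered), so unlike the real checks it would not treat %o as a placeholder.

```python
import re

# Alternatives are tried left to right at each position:
#   %%                an already-escaped percent, kept as-is
#   %s, %d, %1$s ...  an assumed, narrow set of placeholders, kept as-is
#   %                 anything else is a stray percent and gets escaped
TOKEN = re.compile(r"%%|%(?:\d+\$)?[sdf]|%")


def escape_stray_percents(translation: str) -> str:
    """Turn every lone % into %% while leaving placeholders and %% alone."""
    return TOKEN.sub(
        lambda match: "%%" if match.group() == "%" else match.group(),
        translation,
    )
```

With the string from the example, `escape_stray_percents('the <a href="%1$s">.... 80% of our users?')` leaves `%1$s` untouched and turns `80%` into `80%%`.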
There was also an issue, fixed through a different commit, [10955] where a placeholder wouldn't be picked up as being incorrect if escaped.
For example:
Original: Over in %s we code.
Translation: Over in %%s we code.
Previous Result: No warnings, %s exists within string, all is okay.
Expected Result: Warning, %s placeholder not present.
Or:
Original: ... 100%% ...
Translation: ... 100% ...
Previous Result: No warnings, %% is not a placeholder to check for.
Expected Result: Warning, %% placeholder not present. (But the translation fixer will kick in and fix this, see next example)
Or:
Original: ... <a href="%s">100 percent</a> ...
Translation: ... <a href="%s">100 %</a> ...
Previous Result: No warnings, %s placeholder exists. String will fail though since %< will be picked up by printf().
Expected Result: Warning, % not escaped.
Fixer kicks in and converts the translation to ... <a href="%s">100 %%</a> ...
Final Result: String is accepted, no warnings.
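The three cases above can be reproduced with a small escape-aware placeholder check. The Python sketch below is an assumption-laden illustration and not the code from [10955]: the function names, the warning strings, and the narrow placeholder set (%s, %d, %f, optionally numbered) are all made up for the example. The key point is that %% is consumed as one token, so %%s is never mistaken for %s.

```python
import re
from collections import Counter

# Tokenise left to right so that "%%s" reads as "%%" followed by a plain "s".
TOKEN = re.compile(r"%%|%(?:\d+\$)?[sdf]|%")


def placeholder_counts(text: str) -> Counter:
    """Count escaped percents, placeholders and stray percents in text."""
    return Counter(TOKEN.findall(text))


def placeholder_warnings(original: str, translation: str) -> list:
    """Return warning strings for the translation (empty list means OK)."""
    want = placeholder_counts(original)
    have = placeholder_counts(translation)
    warnings = []
    for token in sorted(set(want) | set(have)):
        if token == "%":
            # A lone % in the translation would be picked up by printf().
            if have[token]:
                warnings.append("unescaped %")
        elif have[token] < want[token]:
            warnings.append(f"missing {token}")
        elif have[token] > want[token] and token != "%%":
            # An extra %% is only a literal percent sign, so it is allowed.
            warnings.append(f"unexpected {token}")
    return warnings
```

Allowing extra %% tokens matches the last example, where the fixed translation gains a %% that the original never had and is still accepted.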
#20
in reply to:
↑ 19
@
3 years ago
Replying to dd32:
Original: In March 2004, the <a href="%1$s">.... 80%% of our users? ....
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....
The translation fixer kicked in and transformed it like so:
Submitted: In March 2004, the <a href="%1$s">.... 80% of our users? ....
Whitespace fix: In March 2004, the <a href="%1$s">.... 80%of our users? ....
Percent fix: No change, doesn't apply as %o is a valid printf-style placeholder (which we shouldn't touch)
Result: Still has warnings because %% isn't present. Translation fix failed, Translation accepted with warnings about improper placeholders.
Just to clarify even further: if, after the fixer runs, there are any warnings (either existing or new ones), it doesn't touch the string. It bails, and whatever was submitted gets used. The only time the fixer makes a change is if making the change 100% resolves the warning and doesn't produce new warnings.
So for example, if after it runs the string now has a %o where it didn't previously, that's a new warning, so it'll abort and not change anything.
A better example is maybe the strings with leading/trailing whitespace
Original: Blue⏎
Submitted: Blauw
Fixer kicks in and updates it to: Blauw⏎
This appends the trailing whitespace that the original string had. The trailing whitespace warning is no longer present and no new warnings were generated, so it saves it as Blauw⏎.
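The all-or-nothing behaviour described in this comment could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions rather than the actual implementation: `apply_fixes`, `newline_warnings` and `match_trailing_newline` are hypothetical names, and the warning check here only knows about trailing newlines.

```python
from typing import Callable, List

Fixer = Callable[[str, str], str]          # (original, translation) -> translation
Checker = Callable[[str, str], List[str]]  # (original, translation) -> warnings


def apply_fixes(original: str, submitted: str,
                fixers: List[Fixer], check: Checker) -> str:
    """Run every fixer, but keep the result only if no warnings remain."""
    candidate = submitted
    for fix in fixers:
        candidate = fix(original, candidate)
    # Any warning left over (old or new) means we bail and keep the submission.
    if check(original, candidate):
        return submitted
    return candidate


def newline_warnings(original: str, translation: str) -> List[str]:
    """Toy check: warn when only one of the strings ends in a newline."""
    if original.endswith("\n") != translation.endswith("\n"):
        return ["trailing newline mismatch"]
    return []


def match_trailing_newline(original: str, translation: str) -> str:
    """Toy fixer: copy the original's trailing newline (or lack of one)."""
    stripped = translation.rstrip("\n")
    return stripped + "\n" if original.endswith("\n") else stripped
```

With these toy helpers, `apply_fixes("Blue\n", "Blauw", [match_trailing_newline], newline_warnings)` yields `"Blauw\n"`, while a fixer that fails to clear the warning leaves the submitted string untouched.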
In 9741: