WordPress.org

Making WordPress.org

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#2280 closed defect (fixed)

Broken encoding in some posts on support forums

Reported by: SergeyBiryukov
Owned by: pento
Priority: high
Component: Support Forums

Description

Since yesterday (November 28th, Monday), some posts on the support forums are occasionally displayed in the wrong encoding (Windows-1252 instead of UTF-8), which breaks Cyrillic or Japanese characters.
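To make "Windows-1252 instead of UTF-8" concrete: the symptom is what you get when the UTF-8 bytes of a string are decoded as Windows-1252. A minimal sketch using Python's built-in `cp1252` codec; the helper names `as_mojibake` and `repair` are illustrative only and have nothing to do with WordPress code.

```python
def as_mojibake(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the misreading; only works if every byte had a cp1252 mapping."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for original in ["\u2019", "какой"]:
        broken = as_mojibake(original)
        print(repr(original), "->", repr(broken), "->", repr(repair(broken)))
```

`as_mojibake("\u2019")` returns `"’"` (the classic apostrophe breakage), and `as_mojibake("какой")` returns `"ÐºÐ°ÐºÐ¾Ð¹"`, the same fragment that appears in the broken ru.wordpress.org reply quoted later in this ticket.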

A workaround is to click Edit and re-save the affected post, which makes it readable. One user reported that the email notification for the post was readable as well; only the post itself in the forum thread had the wrong encoding.

First reported in #meta by @rmccue, regarding a post on make/core:

rmccue: https://make.wordpress.org/core/2016/11/25/preferred-languages-research/ has an encoding bug
rmccue uploaded an image: Screenshot 2016-11-28 11.47.38.png 

dd32: Calling @pento is ^ related to your HyperDB change?

rmccue: Also seeing the classic ’ -> ’

pento: Ugh, possibly. Google’s cached version from yesterday isn’t showing it, so I assume it’s related.

Then by @takahashi_fumiki, regarding a thread on support forums:

takahashi_fumiki: Every multibyte characters were broken on Forum. 
https://wordpress.org/support/topic/proxy-error%e3%81%8c%e8%a1%a8%e7%a4%ba%e3%81%95%e3%82%8c%e3%82%8b/#post-8486994
But now it seemed fixed. Ping for just confirmation.

pento: Thanks @takahashi_fumiki, we were using the wrong character set for a little bit, it’s fixed now.
Please let us know if you see any other problems. :)

At this point, the issue was supposedly fixed, but our moderators still encountered it:

yui: https://ru.wordpress.org/support/topic/%d0%be%d1%88%d0%b8%d0%b1%d0%ba%d0%b8-%d0%bf%d1%80%d0%b8-%d0%b8%d0%bc%d0%bf%d0%be%d1%80%d1%82%d0%b5-xml-%d0%bd%d0%b5-%d1%83%d0%b4%d0%b0%d0%bb%d0%be%d1%81%d1%8c-%d0%b8%d0%bc%d0%bf%d0%be%d1%80%d1%82/#post-228350
cyrillic chars are broken here
It seems to be in CP 1252 instead of UTF8
> we were using the wrong character set for a little bit
message in topic is fresh

A bit later, I encountered it myself:

sergey: @pento @dd32 The encoding issue still happens on https://ru.wordpress.org/support/,
Cyrillic chars are randomly broken in some posts. Just happened again with this reply (1 min ago):
https://ru.wordpress.org/support/topic/не-работает-встроенная-галлерея/#post-228395
> Ñ ÐºÐ°ÐºÐ¾Ð¹ верÑии было обновление? в админку заходили?
> Обновление базы данных предлагалоÑÑŒ (Сделали?) ?
Clicking Edit and re-saving appears to have fixed it.
Could it be that the fix has not propagated to all servers yet?
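Not every misread byte survives. Windows-1252 leaves five byte values unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D), and some Cyrillic letters use one of them: "с" is `D1 81` in UTF-8. That is consistent with the lone "Ñ" in the quoted reply above, where an "с" would be. A small sketch, again with Python's `cp1252` codec and illustrative helper names, that finds such bytes and shows what a lenient decoder does with them:

```python
# Byte values Windows-1252 leaves unassigned; Python's cp1252 codec rejects them.
UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def undecodable_bytes(text: str) -> list[int]:
    """Return the UTF-8 bytes of `text` that have no Windows-1252 mapping."""
    return [b for b in text.encode("utf-8") if b in UNDEFINED_IN_CP1252]


def lenient_mojibake(text: str) -> str:
    """Misread UTF-8 as Windows-1252, substituting U+FFFD for unmapped bytes."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    for word in ["какой", "версии"]:
        print(word, [hex(b) for b in undecodable_bytes(word)], repr(lenient_mojibake(word)))
```

Once U+FFFD has taken the place of 0x81, encoding back to Windows-1252 cannot restore it, so that copy of the letter is simply gone; the correct text has to come from somewhere that still holds the original bytes.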

pento: @sergey: I’ve re-deployed everything, just to check.
I’m off to bed, but if there are still problems, get someone to roll it back and I’ll try again some other time.

sergey: @pento: Thanks, will keep an eye on the forums :)

It still happens 12 hours later; see the latest screenshots.

I don't see any related commits on https://meta.trac.wordpress.org/log/, so it was apparently a dotorg-specific change related to HyperDB (per one of the first messages).

Could someone track it down or roll it back pending further investigation?

Attachments (3)

Screenshot 2016-11-28 11.47.38.png (66.3 KB) - added by SergeyBiryukov 3 years ago.
Ryan's screenshot
meta-2280.png (56.6 KB) - added by SergeyBiryukov 3 years ago.
meta-2280.2.png (26.8 KB) - added by SergeyBiryukov 3 years ago.


Change History (18)

@SergeyBiryukov
3 years ago

Ryan's screenshot

#1 @SergeyBiryukov
3 years ago

meta-2280.png is one of the affected posts, and meta-2280.2.png is the same post in editing mode.

(The post since then has been edited to make it readable for the user I was replying to.)

Last edited 3 years ago by SergeyBiryukov

#2 @dd32
3 years ago

  • Owner set to pento
  • Status changed from new to assigned

@pento is currently offline on planes, but I'm going to leave this for him. The change in question was an update to HyperDB; reverting it isn't really something I want to do at this point, for fear of breaking something else.

As it's not happening consistently, I'm guessing this has something to do with whether the DB is connected by the post-write action or by something else prior to it (possibly to a different DB entirely?), although I'm not sure why that should matter.

If I had to guess, I'd suspect this is caused by _real_escape() when another DB is connected pre-write.

This ticket was mentioned in Slack in #forums by pixolin.
3 years ago

#4 @pixolin
3 years ago

Greetings from Germany. 👋
Just to confirm: we are currently experiencing the same issue in our German support forums (https://de.wordpress.org/support): umlauts (Ää, Öö, Üü) and other special characters (ß, €, …) are displayed with the wrong encoding. With my superpowers (ahem) as a moderator, I'm currently fixing posts by editing them in the backend.

Last edited 3 years ago by pixolin

#5 @SergeyBiryukov
3 years ago

https://plugins.trac.wordpress.org/changeset/1410864/hyperdb seems related:

Various improvements:

  • Add MySQLi support with MySQL fallback
  • Handle and retry MySQL timeouts
  • Add support for track server states
  • Add callbacks for: pre/post connect queries, pre/post query statements, adding query comments
  • Better integration with the latest core wp-db features

#6 in reply to: ↑ description ; follow-up: @SergeyBiryukov
3 years ago

Replying to SergeyBiryukov:

> A workaround is to click Edit and re-save the affected post, which makes it readable.

Just noticed that this only works when using the front-end editing form. When trying to edit an affected reply in the admin, the encoding is still wrong.

#7 follow-up: @SergeyBiryukov
3 years ago

Here's a fresh unedited reply (archived, requires admin or moderator access to view):
https://ru.wordpress.org/support/topic/не-могу-русифицировать-тему/?view=all#post-228646

#8 @pento
3 years ago

I've been doing some more digging on this: the posts are all being saved correctly to the database, but occasionally a corrupt version is being saved to Memcache.

Given the random and occasional nature of the corrupt version appearing, I'm thinking that it's likely a race condition somewhere. Tracking down the cause will be... tricky. :-)

(This also explains why editing from the front end works but editing from wp-admin doesn't: the former retrieves the post from the DB, while the latter retrieves it from the cache.)
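The two read paths described here can be modelled with a tiny cache-aside store. This is a hypothetical, simplified model, not WordPress's object cache or Memcache client: `Database`, `Cache`, `read_for_admin_edit` and `read_for_frontend_edit` are invented names, and the corrupt cache write is simulated directly rather than reproducing the actual race.

```python
class Database:
    """Stands in for MySQL: always stores the post text correctly."""

    def __init__(self) -> None:
        self.rows: dict[int, str] = {}

    def save(self, post_id: int, content: str) -> None:
        self.rows[post_id] = content

    def load(self, post_id: int) -> str:
        return self.rows[post_id]


class Cache:
    """Stands in for Memcache: a plain key/value store with no validation."""

    def __init__(self) -> None:
        self.items: dict[int, str] = {}


def read_for_admin_edit(db: Database, cache: Cache, post_id: int) -> str:
    """Cache-aside read: serve from the cache if present, else fill it from the DB."""
    if post_id not in cache.items:
        cache.items[post_id] = db.load(post_id)
    return cache.items[post_id]


def read_for_frontend_edit(db: Database, post_id: int) -> str:
    """Bypasses the cache entirely and reads straight from the DB."""
    return db.load(post_id)


if __name__ == "__main__":
    db, cache = Database(), Cache()
    db.save(1, "какой")
    # Simulate the bad write: a mis-decoded copy lands in the cache.
    cache.items[1] = "какой".encode("utf-8").decode("cp1252")
    print(read_for_admin_edit(db, cache, 1))  # corrupted copy from the cache
    print(read_for_frontend_edit(db, 1))      # correct text from the DB
```

The point of the model is that nothing on the cache path ever re-checks the DB, so a single bad cache write keeps being served until the entry is replaced.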

#10 in reply to: ↑ 6 @fierevere
3 years ago

Replying to SergeyBiryukov:

> Just noticed that this only works when using the front-end editing form. When trying to edit an affected reply in the admin, the encoding is still wrong.

https://ru.wordpress.org/support/reply/228912/edit/
Front-end editing of this archived reply does not fix the encoding.

#11 follow-up: @pento
3 years ago

Okay, I think I've tracked this down. Notifications change the character set, but sometimes it's possible that it won't be changed back.

I've pushed out a fix for this; please let me know if any new posts or comments show this bug.
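The root cause described here (notifications switch the character set and, on some paths, never switch it back) is a general shared-state pattern: anything that changes connection-level state should restore it on every exit path. A minimal sketch, assuming a hypothetical `Connection` object with a mutable `charset` attribute; this is not HyperDB's or bbPress's code, the actual fix is not shown in this ticket, and `"latin1"` is a placeholder since the ticket does not say which character set notifications switch to.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class Connection:
    """Hypothetical DB connection whose character set is shared state."""

    def __init__(self, charset: str = "utf8mb4") -> None:
        self.charset = charset

    def set_charset(self, charset: str) -> None:
        self.charset = charset


@contextmanager
def temporary_charset(conn: Connection, charset: str) -> Iterator[Connection]:
    """Switch the charset for the duration of a block, then always restore it."""
    previous = conn.charset
    conn.set_charset(charset)
    try:
        yield conn
    finally:
        # Runs on normal exit, early return and exceptions alike.
        conn.set_charset(previous)


def send_notifications_buggy(conn: Connection, fail: bool) -> None:
    conn.set_charset("latin1")
    if fail:
        raise RuntimeError("mail delivery failed")  # the restore below is skipped
    conn.set_charset("utf8mb4")


def send_notifications_fixed(conn: Connection, fail: bool) -> None:
    with temporary_charset(conn, "latin1"):
        if fail:
            raise RuntimeError("mail delivery failed")


def charset_after(send: Callable[[Connection, bool], None], fail: bool) -> str:
    """Run a notification sender on a fresh connection; report the charset left behind."""
    conn = Connection()
    try:
        send(conn, fail)
    except RuntimeError:
        pass
    return conn.charset


if __name__ == "__main__":
    for send in (send_notifications_buggy, send_notifications_fixed):
        print(send.__name__, charset_after(send, fail=True))
```

With the failure path taken, the buggy version leaves the connection on `latin1` for whatever query runs next, while the context-manager version always puts `utf8mb4` back.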

#12 @fierevere
3 years ago

Yes, it seems to be fixed; I haven't seen any badly encoded messages today.

#13 @pixolin
3 years ago

Works on our German forums, too. Thank you.

#14 @pento
3 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

Thanks for the confirmation (and for your patience while we worked through this). :-)

I'll close this ticket as fixed. Feel free to reopen it, or create a new ticket, if you see any new instances of this problem crop up.

#15 in reply to: ↑ 11 @SergeyBiryukov
3 years ago

Replying to pento:

> Okay, I think I've tracked this down. Notifications change the character set, but sometimes it's possible that it won't be changed back.

Thank you! Any chance to fix #354 while we're at it? :) I still get notifications for ??????, including this topic.
