Opened 6 years ago
Closed 4 years ago
#3728 closed defect (bug) (fixed)
Some city names shown incorrectly in Events API
Reported by: | Presskopp | Owned by: | dd32 |
---|---|---|---|
Milestone: | Priority: | normal | |
Component: | Events API | Keywords: | |
Cc: |
Description
Hi,
I just found that entering the German city of Gießen shows the events correctly, but not the name of the city: it is shown as Giesen.
Taking it a step further, the city of Bad Gottleuba-Berggießhübel is shown only as Bad, which is bad ;)
I talked to @obenland, who said the API will recognize it as Bad Gottleuba-Berggießhübel.
I tested some more cities having the ß character and such, but they were ok so far.
Attachments (2)
Change History (21)
This ticket was mentioned in Slack in #meta by tellyworth.
6 years ago
#5
6 years ago
I found another glitch using 5.1-beta3-44723, local installation:
Entering Hamburg I get results for Hambûrg!
#7
6 years ago
IIRC this is mostly expected: the response uses the "primary" name for a city (which is mostly English-centric AFAIK), but many lookups of non-ASCII city names actually match against the "alternative names", so the name returned differs from the name searched for.
To fix this I believe we'd want to change the data-source it's looking up against.
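The primary-name behaviour described above can be sketched roughly as follows. This is only an illustration, not the API's actual code: the `CITIES` table, its single row, and the `lookup` function are hypothetical names made up for the example.

```python
# Hypothetical lookup table: (primary_name, [alternate names]).
# The row is illustrative and not copied from the real dataset.
CITIES = [
    ("Giesen", ["Gießen"]),
]


def lookup(query):
    """Return the primary name of the first city whose primary or
    alternate name equals the query (case-insensitive), else None."""
    q = query.lower()
    for primary, alternates in CITIES:
        names = [primary.lower()] + [alt.lower() for alt in alternates]
        if q in names:
            # The response always carries the primary name, even when
            # the match came from an alternate name.
            return primary
    return None
```

With this table, searching for Gießen matches on the alternate name and comes back as Giesen, which is the mismatch reported in the ticket.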
This ticket was mentioned in Slack in #meta by tobifjellner.
6 years ago
#10
5 years ago
- Summary changed from Showing community events is not working for some cities to Some city names shown incorrectly in Events API
#11
4 years ago
- Resolution set to fixed
- Status changed from assigned to closed
This was fixed via #5117
#12
4 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
The issue is still present
Gießen
-> Giesen
Bad Gottleuba-Berggießhübel
-> Bad
#13
follow-up: ↓ 14
4 years ago
It looks like this is no longer an encoding issue, but a data-source issue.
Gießen -> Giesen
The data-source lists both of those as separate cities, Giesen having a much larger population and an alternate-name (not primary name) of Gießen. The co-ordinates are also slightly different (but quite obviously the same place) so it's really just an issue that the data had two cities that should've been combined.
Bad Gottleuba-Berggießhübel -> Bad
That city doesn't exist in the data-set, nor does a city that starts with that, which is why it gets truncated back to Bad, which is an alt-name for Badou, Togo.
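One way such a truncation could come about is sketched below. The ticket does not say how the API actually shortens the query, so the word-by-word trimming rule here is an assumption, and `NAME_INDEX` and `lookup_with_truncation` are hypothetical names.

```python
# Hypothetical alternate-name index: lower-cased name -> city.
# Only the "Bad" -> Badou, Togo entry comes from the ticket.
NAME_INDEX = {
    "bad": "Badou, Togo",
}


def lookup_with_truncation(query):
    """Try the full query, then drop trailing words until a name matches.

    Returns (matched_text, city), or None if nothing matches.
    The trimming rule is an assumption made for illustration.
    """
    words = query.split()
    while words:
        candidate = " ".join(words)
        city = NAME_INDEX.get(candidate.lower())
        if city is not None:
            return candidate, city
        words.pop()  # drop the last word and retry with a shorter query
    return None
```

Under that assumption, "Bad Gottleuba-Berggießhübel" has no entry of its own, so the lookup falls back to "Bad" and resolves to Badou, Togo.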
I think updating the source-data might help here; I don't think it's been updated in about 2 years.
We don't use a proper GIS here, just a very bland lookup table that has ~750k city names for ~150k unique cities around the world. It 100% doesn't match every city the world has to offer (let alone even 80% of them), but it does include some obscure places such as Grytviken, which has between 2 and 30 people living there depending on the time of year.
#14
in reply to: ↑ 13; follow-up: ↓ 17
4 years ago
Replying to dd32:
I think updating the source-data might help here
Gießen -> Giesen
Looks like that's been corrected in the new dataset (which hasn't been imported yet)
Bad Gottleuba-Berggießhübel -> Bad
That doesn't exist in the smaller dataset we're using, but does exist in the full data-set.
The dataset we're using is from https://www.geonames.org/. The smaller set is ~30M and has now increased from 150k to 200k cities, whereas the larger set is ~1.5GB and covers 12 million city names (plus 6-10 alt-names per city on average).
It might be worth us running a combined dataset: the smaller set, plus the official city names from the larger dataset (excluding the alt-names), which would probably end up as a ~250M dataset.
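A rough sketch of what such a combined dataset could look like is below. It assumes each dataset has already been loaded into a dict keyed by a GeoNames id, with a `name` and a list of `alt_names` per record; that in-memory shape and the `combine` function are assumptions for illustration, not the real import code.

```python
def combine(small, full):
    """Merge two {geoname_id: {"name": str, "alt_names": [str]}} mappings.

    Cities in `small` keep their alternate names. Cities found only in
    `full` contribute just their official name, with alt-names dropped.
    """
    # Copy the small set so the inputs are left unmodified.
    combined = {
        gid: {"name": rec["name"], "alt_names": list(rec["alt_names"])}
        for gid, rec in small.items()
    }
    for gid, rec in full.items():
        if gid not in combined:
            combined[gid] = {"name": rec["name"], "alt_names": []}
    return combined
```

Keeping alt-names only for the smaller set is what would hold the combined size down, at the cost of non-primary spellings not matching for the extra cities.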
#17
in reply to: ↑ 14
4 years ago
- Keywords needs-patch removed
- Owner changed from dryanpress to dd32
- Status changed from reopened to assigned
Replying to dd32:
Replying to dd32:
I think updating the source-data might help here
Gießen -> Giesen
Bad Gottleuba-Berggießhübel -> Bad
These both now work after updating the dataset:
- https://api.wordpress.org/events/1.0/?location=Bad%20Gottleuba-Berggie%C3%9Fh%C3%BCbel
- https://api.wordpress.org/events/1.0/?location=Gie%C3%9Fen
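For anyone wanting to test other city names, the percent-encoded URLs above can be rebuilt with Python's standard library. The endpoint and the `location` parameter are the ones shown in the links; the `events_url` helper is a hypothetical name for this sketch.

```python
from urllib.parse import quote

# Endpoint as it appears in the links above.
BASE = "https://api.wordpress.org/events/1.0/"


def events_url(location):
    """Build an Events API URL with the location percent-encoded as UTF-8."""
    return f"{BASE}?location={quote(location)}"
```

For example, `events_url("Gießen")` yields the second link above, since ß is encoded as `%C3%9F`.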
I've migrated it from the limited dataset to the full 12 million cities dataset, just so there's less to update in the future.
In doing that, it's introduced some "bugs" where things that previously worked no longer do. For example, Australia previously matched the country, whereas now it matches a region in Mexico. But it does also now know where my Secret Rocket Yards are.
I'm going to leave this open to see if any reports come in of breakage from the update.
I'm happy to look at this next week. Any other major special characters to account for @obenland?