#5562 closed defect (bug) (fixed)
Events: Exact city matches fail because of state abbreviations
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | Priority: | normal | |
Component: | Events API | Keywords: | |
Cc: |
Description (last modified by )
These two searches (and probably many others) return the "wrong" Chicago.
Instead of Chicago, Illinois (Geonames ID 4887398) they return cities with an alternate name of Chicago, with a much smaller population. It may even match some buildings with population 0
that are classified as `P.PPL`?
- https://api.wordpress.org/events/1.0/?number=5&ip=127.0.0.0&locale=en_US&timezone=America/Los_Angeles&location=chicago - returns Craigmont, Idaho ( Geonames ID 5590063 )
- https://api.wordpress.org/events/1.0/?number=5&ip=127.0.0.0&locale=en_US&timezone=America/Chicago&location=chicago - returns
Chicago and Northwestern Railroad Station
in Kenosha, Wisconsin Geonames ID 5248432
I've added some PHPUnit cases in r10553 (xref r10544 and r16810-dotorg for info on the new test suite).
It looks like the problem may be related to how cities are now being imported into geonames_summary.name
as Chicago, IL
instead of the canonical Chicago
in geonames.name
. So, they fail the exact-match search, and the fuzzy search turns up the less-desirable result.
Possibly related to #3728, #5117, r16728-dotorg
I'll look into it more tomorrow, but wanted to document what I've found so far. CC @dd32
Change History (11)
#3
@
4 years ago
It looks like the problem may be related to how cities are now being imported into
geonames_summary.name
asChicago, IL
instead of the canonicalChicago
ingeonames.name
. So, they fail the exact-match search, and the fuzzy search turns up the less-desirable result.
As per r16728-dotorg the Chicago, IL
entry should be added in addition to the canonical Chicago
one. I'll review the queries and results today though to triple check that it's working how I intended it to..
#5
in reply to:
↑ description
@
4 years ago
Apologies for the rambling comment @iandunn.. I'm just working through it and noting my thought process for myself..
After looking into this, there's several things happening:
- There was a data error, in that some of the source data had invalid UTF8 data which caused only partial Chicago data to be imported.
I've re-imported the data after fixing the issues (not all committed yet) which results in a more complete data set, but still 'invalid' responses.
Replying to iandunn:
- This looks correct with the current logic. Yes, hang on, let me explain..
- Timezone trumps population, #3367 comes into play here.
- Alternate names are treated as the same level of priority as "preferred" names
I don't think timezone should be a trump here in every case, but rather only when it's a preferred name (ie. Portland, OR / Portland, ME).
The Chicago examples has come up due to the greater number of cities after #3728, it looks like most of the chicago entries are probably alternate names (There's Chicago's in every US timezone, just like there's a Portland in each timezone).
As part of that, I've rebuilt the table again, this time marking alternate names as alts, so that preferred names can be prioritised.. Unfortunately I've done something wrong, and the alt
field didn't get filled correctly, but from what I can tell, it'll fix these two examples and a few others I've been able to come up with.
I'll loop back to this tomorrow, but if you want to Ian, you can run queries against the geoname_summary_20210106082158
table to test with. Adding ORDER BY alt ASC
to the start of the order by's seems to do the trick (Test failures such as Dona ana
are due to the alt flag failure I mentioned above)
I've struggled to test it with PHPUnit, as it keeps running out of memory and requires PHPUnit8 by looks (Not 6, 7, or 9 as I had installed)..
#6
@
4 years ago
Ah, yes, complexity abounds :) Thanks for digging into it!
Weighting the preferred name sounds like a good approach. I'll play around a bit with that too, and see if I can add some more tests around the various situations.
Although, this might be a reason to revisit the idea of switching to an external API to handle this kind of thing. I remember Josepha and I looked into Google's Geocoding API after the WP 4.8 release; I'll try to find the details of where that left off.
Another idea: Instead of the API trying and pick the perfect match, maybe it should just return all the likely ones, and Core could give the user a dropdown to pick the right one. Google's Places Autocomplete API is designed for that (h/t https://developers.google.com/maps/documentation/geocoding/best-practices).
---
I initially had problems getting PHPUnit working on my sandbox, but it's working well for me now, using v9.5. If you're running into the same problems I was, disabling xdebug should fix it. Check out the new readme file in the API root if you haven't already, it has some details.
#7
@
4 years ago
--process-isolation
can also help for PHPUnit, but I've only found it necessary when using the watch task.
#8
@
4 years ago
- Owner set to dd32
- Resolution set to fixed
- Status changed from new to closed
In 10554:
#10
follow-up:
↓ 11
@
4 years ago
Kudos for the fix :)
I found one problem with the new ranking, but fixed it in r10557.
What's the reason to import entries with population = 0
?
I opened a #core52249 for the idea about showing users multiple potential locations.
#11
in reply to:
↑ 10
@
4 years ago
Replying to iandunn:
What's the reason to import entries with
population = 0
?
Quite simply, the population figures aren't available for every location, they're available for most locations, but not all.
For reference, only 10% of the current non-alternate name records have a population of greater than 0.
Previous to #3728 it looks like it was in the range of 75% had a population count > 0.
In 10553: