Opened 6 years ago

Closed 3 years ago

#3367 closed defect (bug) (fixed)

Events API: Improve city disambiguation

Reported by: iandunn Owned by:
Milestone: Priority: normal
Component: Events API Keywords: needs-patch needs-unit-tests


Moved from #wp42787, reported by @shedonist:

The Events & News widget in the admin dashboard only accepts a city name and ignores any state you add. This means it picks a state for you, and it may not be the state you want. This makes it extremely difficult to locate meetups when several cities that share a name each have meetups.

For example, I run a meetup in Portsmouth, NH. When I search for "Portsmouth, NH" or "Portsmouth", the widget shows results near Portsmouth, VA. When I search for "Boston", the Portsmouth, NH meetup does show (so it is in there!). Similarly, if you search for "Portland, OR" or "Portland", you only see the meetups near Portland, ME (Portsmouth, NH and Westbrook, ME).

When I search for "Portsmouth, NH", I get this:
[06-Dec-2017 23:00:12 UTC] debug_community_events_response: Valid response received. Details: {"request_url":"https:\/\/\/events\/1.0\/","request_args":{"number":5,"ip":"","locale":"en_US","timezone":"America\/New_York","location":"Portsmouth, NH"},"response_code":200,"response_body":{"location":{"description":"Portsmouth","latitude":"36.8354300","longitude":"-76.2982700","country":"US"},"events":"4 events trimmed."}}

If I search for "Portsmouth", what I get seems to be identical except for the location field:
[06-Dec-2017 23:00:25 UTC] debug_community_events_response: Valid response received. Details: {"request_url":"https:\/\/\/events\/1.0\/","request_args":{"number":5,"ip":"","locale":"en_US","timezone":"America\/New_York","location":"Portsmouth"},"response_code":200,"response_body":{"location":{"description":"Portsmouth","latitude":"36.8354300","longitude":"-76.2982700","country":"US"},"events":"4 events trimmed."}}

If I search for "Portland, OR", I get this:
[06-Dec-2017 23:01:35 UTC] debug_community_events_response: Valid response received. Details: {"request_url":"https:\/\/\/events\/1.0\/","request_args":{"number":5,"ip":"","locale":"en_US","timezone":"America\/New_York","location":"Portland, OR"},"response_code":200,"response_body":{"location":{"description":"Portland","latitude":"43.6614700","longitude":"-70.2553300","country":"US"},"events":"5 events trimmed."}}

This is related to #2823. It may be one of the things that's just too complex to accomplish in a homegrown API, in which case we should instead focus on the move to Google's. But it's worth some investigation first to find out.

This will need lots of unit tests to cover different variations in formats, etc.

Change History (4)

#1 @dd32
5 years ago

  • Component changed from API to Events API

#2 @dd32
3 years ago

#3728 has increased the number of cities in the database, so conflicts here are now much more likely.

One option that will probably work rather well for the USA (which I think is the primary country this really affects) is to add City, STATE entries to the lookup table as well.

For both Portsmouth and Portland, a dozen or so states across the USA have a place with that name.

Adding City, State will probably work well in those cases, as long as we also limit it to places with populations greater than, say, 1,000, to avoid adding entries that aren't really needed. That should reduce the number of extra entries that need to be added to the database.
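As a rough illustration of the idea above (not the actual Events API code), the extra "City, State" lookup entries could be generated along these lines. The `City` record, the `build_lookup_keys` helper, and the population figures are all hypothetical and only serve the example.

```python
from typing import NamedTuple


class City(NamedTuple):
    # Hypothetical record shape; the real database schema is not shown in this ticket.
    name: str
    state: str        # two-letter US state code
    population: int


def build_lookup_keys(cities, min_population=1000):
    """Map normalized "city, state" strings to records above a population floor."""
    lookup = {}
    for city in cities:
        if city.population <= min_population:
            continue  # skip very small places to keep the table compact
        key = f"{city.name}, {city.state}".lower()
        lookup[key] = city
    return lookup


# Illustrative data only; the populations are made up for the example.
cities = [
    City("Portsmouth", "NH", 21000),
    City("Portsmouth", "VA", 95000),
    City("Portsmouth", "IA", 200),
]
lookup = build_lookup_keys(cities)
```

With this sample data, `lookup` has entries for "portsmouth, nh" and "portsmouth, va", while the smallest Portsmouth is left out by the population floor.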

In most cases, just searching for "Portland" would return the right result if the timezone is set correctly, and if searching from Australia or UK it'll return the local Portland instead.
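One way such a timezone-based tiebreak might work is sketched below. This is an assumption about the general approach, not the API's real logic: the `Candidate` shape, the `pick_candidate` helper, the fallback to the most populous match, and the sample populations are all hypothetical.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    # Hypothetical shape of a matching row for an ambiguous city name.
    name: str
    country: str
    timezone: str
    population: int


def pick_candidate(candidates, timezone: Optional[str] = None):
    """Prefer a candidate in the requester's timezone; otherwise take the most populous."""
    if not candidates:
        return None
    if timezone:
        matching = [c for c in candidates if c.timezone == timezone]
        if matching:
            candidates = matching
    return max(candidates, key=lambda c: c.population)


# Illustrative data only; populations are rough placeholders.
portlands = [
    Candidate("Portland", "US", "America/Los_Angeles", 650000),
    Candidate("Portland", "US", "America/New_York", 68000),
    Candidate("Portland", "AU", "Australia/Melbourne", 10000),
]
```

Here `pick_candidate(portlands, "Australia/Melbourne")` selects the Australian entry, while a call without a timezone falls back to the largest candidate.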

You can view the various cities in the database by looking at Portland and Portsmouth

#3 @dd32
3 years ago

In 10475:

Events API: Don't strip commas when performing location lookups. This allows for disambiguation to occur better by treating commas as separators.

See #3367.
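To illustrate why keeping the comma helps, here is a minimal sketch of treating commas as separators when normalizing a query. The `split_location` and `lookup_key` helpers are hypothetical names for this example and are not taken from the changeset.

```python
def split_location(query: str):
    """Split a location query on commas into trimmed, lowercased parts."""
    return [part.strip().lower() for part in query.split(",") if part.strip()]


def lookup_key(query: str) -> str:
    """Rebuild a canonical "city, state" style key from the separated parts."""
    return ", ".join(split_location(query))
```

With the comma kept, "Portsmouth, NH" yields two distinct parts (city and state) rather than a single "Portsmouth NH" string, so it can be matched against a "city, state" entry.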

#4 @dd32
3 years ago

  • Resolution set to fixed
  • Status changed from new to closed

Decided to go ahead and add the City, State entries for the USA since I'm already looking at the code.

Added ~15k extra entries. You can see the difference with these two queries: Portsmouth, VA and Portsmouth, NH. (Removing the state code may just return Portsmouth, GB if your IP isn't in the USA or you don't specify a timezone.)

Searching for just Portland with an America/Los_Angeles or America/New_York timezone will return the West (OR) or East (ME) Portland, respectively.

There's no output to help with disambiguation in those cases, though. In other words, you won't know you've got the wrong Portland, as it'll just show "Portland". That's because the data source doesn't have structured address details for every country, and because in many cases the search term/result is not actually the "proper" name for the location in the data.

Added via r16728-dotorg plus the above API change to not remove the comma.

I haven't seen many other reports of this problem outside of the USA, so I limited it to there. If in the future we have other cases that require further disambiguation, that can be dealt with then.
