
Parsing Czech Republic addresses #83

Closed
sch3399 opened this issue Mar 6, 2020 · 4 comments · Fixed by #88


Comments


sch3399 commented Mar 6, 2020

Hi team,
I have successfully installed Pelias, but I have a problem with autocomplete.
The query [street,city] /autocomplete/?text=Nerudova 20,Praha returns the correct result.
image

But the reversed query [city,street] /autocomplete/?text=Praha,Nerudova 20 does not return any result, and the pelias parser decomposes the query incorrectly.
image

Is it possible to modify the configuration and get the same result as in the first case?

With /search?text=Praha,Nerudova 20 (that is, without autocomplete) the result is correct in both cases, but there the parser is libpostal, not pelias.
image
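
For anyone reproducing this, the two orderings can be generated with a small helper. This is only a sketch: `BASE_URL` and `build_url` are hypothetical names, the host and port are assumptions about a local install, and the paths are the ones quoted above.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a local Pelias API instance (an assumption).
BASE_URL = "http://localhost:4000"


def build_url(path: str, text: str) -> str:
    """Return a request URL with `text` percent-encoded as a query parameter."""
    return f"{BASE_URL}{path}?{urlencode({'text': text})}"


# The same address in both orderings, for both endpoints mentioned above.
for path in ("/autocomplete/", "/search"):
    for text in ("Nerudova 20,Praha", "Praha,Nerudova 20"):
        print(build_url(path, text))
```

Comparing the responses for each pair shows whether the ordering alone changes the result.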

Thank you for the advice

missinglink (Member) commented Mar 6, 2020

Unfortunately not, @sch3399. This is a difficult address to parse because it follows a couple of uncommon conventions:

  • The street name Nerudova has no prefix/suffix. If it ended with Ave or began with Rue de, it would be easier to parse.
  • Most parsers assume that the segments are written left-to-right from most to least specific: address -> city -> country. Here the city comes first.
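
To illustrate the first point, here is a toy affix-based street detector. It is not the pelias parser's actual logic; the word lists and the `looks_like_street` name are made up for this sketch.

```python
# Illustrative affix lists only; a real parser uses much larger dictionaries.
STREET_SUFFIXES = {"ave", "avenue", "st", "street", "rd", "road"}
STREET_PREFIXES = ("rue de", "via", "calle")


def looks_like_street(segment: str) -> bool:
    """Guess whether a segment is a street from its prefix or suffix alone."""
    # Ignore tokens that contain digits (house numbers such as "20" or "641/9").
    words = [w for w in segment.lower().split() if not any(c.isdigit() for c in w)]
    if not words:
        return False
    name = " ".join(words)
    has_suffix = words[-1].rstrip(".") in STREET_SUFFIXES
    has_prefix = any(name.startswith(prefix + " ") for prefix in STREET_PREFIXES)
    return has_suffix or has_prefix


print(looks_like_street("Madison Ave 20"))    # True: recognised suffix
print(looks_like_street("Rue de Rivoli 10"))  # True: recognised prefix
print(looks_like_street("Nerudova 20"))       # False: no affix to match
```

With no affix to match, `Nerudova 20` looks the same as `Praha` to this kind of check, so the parser has to fall back on segment order.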

You'll notice that the libpostal result interprets Praha as a 'query', meaning that it thinks it's a venue name or similar, not a region.
Libpostal does, however, do a better job of detecting that Nerudova is the street than the pelias native parser.

I'll move this issue to the pelias/parser repo as someone might be able to tackle this issue over there.

Some more info from you would be helpful:

  • How common are these street names with no prefix/suffix in the Czech Republic?
  • Is it a common convention for people in the Czech Republic to write their address with the city name first?
  • Please provide one or two examples for developers outside the Czech Republic

missinglink transferred this issue from pelias/pelias on Mar 6, 2020
missinglink changed the title from "Different analysis text" to "Parsing Czech Republic addresses" on Mar 6, 2020
sch3399 (Author) commented Mar 6, 2020

  1. The vast majority of streets in the Czech Republic have a one-word name without a prefix / suffix.
    Korunní 810, Praha
    Kájovská 68, Český Krumlov
    Beethovenova 641/9, Brno

  2. I don't know how often people search in the reverse order [city, street], but it's not unusual.

  3. Examples from neighboring countries:
    Divadelná 41/3, Trnava (Slovakia)
    Szewska 6, Kraków (Poland)
    Zadarska 17, Pula (Croatia)

Thank you very much for a possible solution

missinglink (Member) commented Apr 17, 2020

@sch3399 I had a look at this today and I was able to get the parser working for the cases you provided.
If possible, could you please provide some more test cases?

see #88

sch3399 (Author) commented Apr 22, 2020

@missinglink Here are more test cases for the Czech Republic:

Ostrava, U Koupaliště 1570/10
Hradec Králové, Karla Čapka 694/5
Kolín, Pražská 3
Neratovice, Jungmannova 676
Králíky, Bedřicha Smetany 561
Prachatice, Dlouhá 93
Ronov nad Doubravou, Nábřežní 180
Brno, Orlí 517/22
Nový Jičín, Dvořákova 713/11
Praha, V Šáreckém údolí 53/27
Praha, Nad Panenskou 164/4
Rožmitál pod Třemšínem, Kpt. Jaroše 403
Klatovy, Jiráskova 15
Frýdek-Místek, Radniční 1244
Zlín, Rašínova 70
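
One order-independent heuristic that fits all of the cases above is to treat whichever comma-separated segment ends in a house number as the street. This is a rough sketch only, not the change made in #88; `split_address` and the regular expression are assumptions for illustration.

```python
import re

# A trailing house number such as "20", "641/9" or "12a" (assumed pattern).
HOUSE_NUMBER = re.compile(r"\s(\d+(?:/\d+)?[a-zA-Z]?)$")


def split_address(text: str) -> dict:
    """Split "street number, city" or "city, street number" into labelled parts."""
    street = number = None
    rest = []
    for segment in (s.strip() for s in text.split(",")):
        match = HOUSE_NUMBER.search(segment)
        if match and street is None:
            # The first segment ending in a house number is taken as the street.
            street = segment[:match.start()].strip()
            number = match.group(1)
        else:
            rest.append(segment)
    return {"street": street, "housenumber": number, "locality": ", ".join(rest) or None}


print(split_address("Praha, Nerudova 20"))
print(split_address("Nerudova 20, Praha"))
print(split_address("Rožmitál pod Třemšínem, Kpt. Jaroše 403"))
```

Both orderings of the Nerudova example give the same parts. The heuristic would still fail for inputs without a house number, so it is only a starting point for test expectations.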
