Right single quotation mark in node name causes it to be unsearchable in autocomplete #169

Open
BrindusaN opened this issue Sep 14, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@BrindusaN

BrindusaN commented Sep 14, 2022

Describe the bug

When searching via autocomplete for this place (В’ячеслава Чорновола вулиця 8), no results are returned. However, reverse geocoding does return the place.

In the autocomplete request I see that the parser disregards everything in front of the right single quotation mark (’), causing the subject to be 8 ячеслава Чорновола вулиця (street: ячеслава Чорновола вулиця, housenumber: 8).

I have tested with a different place that has an apostrophe in its name instead, and it works as expected:

  • searched for this place (П'ятихатки Вулиця 11)
  • autocomplete works as expected, the place is returned
  • the parser has the subject 11 П'ятихатки Вулиця (street: П'ятихатки Вулиця, housenumber: 11).
  • reverse geocode also works

Is this only a parser issue, or is the schema also affected? I see that this character is not included in this file.

Steps to Reproduce
Search for a place that has a right single quotation mark in its name using autocomplete.

Expected behavior
Expected the place to be returned since it exists in the database.

Environment (please complete the following information):
NA

Pastebin/Screenshots
NA

Additional context
NA

References

NA

@BrindusaN BrindusaN added the bug Something isn't working label Sep 14, 2022
@missinglink
Member

Is it one of these quotes?
https://github.com/pelias/parser/blob/master/tokenization/split_funcs.js#L10

The Pelias parser treats those quotes as word boundaries, although there is a code comment below noting that this should only be for quote pairs.
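
To illustrate the effect described above, here is a minimal sketch of what treating quote characters as word boundaries does to the reported query. It is not the actual Pelias code: the `QUOTES` list and the `splitOnQuotes` helper are assumptions made up for this example, and the real list in `split_funcs.js` may differ.

```javascript
// Hypothetical quote list for illustration only; the real one lives in
// tokenization/split_funcs.js and may contain different characters.
const QUOTES = ['"', '\u00AB', '\u00BB', '\u2018', '\u2019', '\u201C', '\u201D'];

// Split text on whitespace and on any character in QUOTES.
// Hypothetical helper, not part of the Pelias parser API.
function splitOnQuotes(text) {
  const tokens = [];
  let current = '';
  for (const ch of text) {
    if (QUOTES.includes(ch) || /\s/.test(ch)) {
      if (current) tokens.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current) tokens.push(current);
  return tokens;
}

// U+2019 is in the list, so the leading letter is cut off into its own token.
console.log(splitOnQuotes('В\u2019ячеслава Чорновола вулиця 8'));

// U+0027 (plain apostrophe) is not in the list, so the word stays intact.
console.log(splitOnQuotes("П'ятихатки Вулиця 11"));
```

With this sketch the first query yields a stray one-letter token followed by `ячеслава`, which is consistent with the street name the reporter saw, while the apostrophe variant keeps `П'ятихатки` whole.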

@missinglink
Member

I'm not sure if this is a data error or a code error. Surely 'apostrophe' is the correct character to use?

a mark ' used to indicate the omission of letters or figures

The same dictionary describes a quotation mark as:

used chiefly to indicate the beginning and the end of a quotation in which the exact phraseology of another or of a text is directly cited

@BrindusaN
Author

Hi,

Yes, it is one of the characters in the split_funcs.

AFAIK, the right single quotation mark can be used in some languages as a diacritical mark that alters the sound of a letter. [Wikipedia](https://en.wikipedia.org/wiki/Right_single_quotation_mark#:~:text=The%20Unicode%20character%20'%20(U%2B,right%20(closing)%20quotation%20mark.) describes the right single quotation mark as:

The Unicode character ’ (U+2019 right single quotation mark) is used both for a typographic apostrophe and a single right (closing) quotation mark.

In this role, both the apostrophe and the right single quotation mark act as modifier letters rather than as quotation marks. This usage occurs in the Ukrainian language.
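
For anyone who wants to check which character a given record actually contains, here is a small sketch that prints the code point and Unicode general category for three look-alike characters. The `describe` helper is a name invented for this example. It relies only on standard JavaScript Unicode property escapes, which need a runtime that supports the `u` regex flag with `\p{...}`.

```javascript
// Three characters that render almost identically in many fonts.
const candidates = [
  { name: 'APOSTROPHE', ch: '\u0027' },
  { name: 'RIGHT SINGLE QUOTATION MARK', ch: '\u2019' },
  { name: 'MODIFIER LETTER APOSTROPHE', ch: '\u02BC' },
];

// Hypothetical helper: report the code point and whether Unicode
// classifies the character as punctuation or as a letter.
function describe(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  return {
    codePoint: `U+${hex}`,
    isPunctuation: /\p{P}/u.test(ch),
    isLetter: /\p{L}/u.test(ch),
  };
}

for (const { name, ch } of candidates) {
  console.log(name, describe(ch));
}
```

Unicode assigns U+0027 and U+2019 to punctuation categories, while U+02BC is the one classified as a letter, so software that goes by category alone cannot tell an intended apostrophe in U+2019 from a closing quote.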

@missinglink
Member

Agh ok, thanks for posting that link, we're definitely in this situation of "difficulty of software distinguishing which character is intended by a user's typing".

I don't have the time to work on this right now, but I'd be fine with removing it from the quotes array. The question is: will that break anything?

A more robust solution would involve splitting these quotes into opening/closing pairs and only considering them as word boundaries when both exist in the text, although this may cause issues with autocomplete.
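
A rough sketch of that pairing idea, under stated assumptions: the `QUOTE_PAIRS` table and the `activeQuoteBoundaries` function are hypothetical names invented for this example, not existing Pelias code, and the pairing rule (an opening character followed somewhere later by its closing character) is only one possible interpretation.

```javascript
// Hypothetical opening/closing pairs; not taken from the Pelias source.
const QUOTE_PAIRS = [
  ['\u2018', '\u2019'], // single curly quotes
  ['\u201C', '\u201D'], // double curly quotes
  ['\u00AB', '\u00BB'], // guillemets
];

// Return the set of quote characters that should act as word boundaries.
// A pair is active only when its opening character appears in the text
// and its closing character appears somewhere after it.
function activeQuoteBoundaries(text) {
  const active = new Set();
  for (const [open, close] of QUOTE_PAIRS) {
    const openIdx = text.indexOf(open);
    if (openIdx !== -1 && text.indexOf(close, openIdx + 1) !== -1) {
      active.add(open);
      active.add(close);
    }
  }
  return active;
}

// A lone U+2019 inside a word has no opening partner: no boundaries.
console.log(activeQuoteBoundaries('В\u2019ячеслава Чорновола вулиця 8').size);

// A complete pair activates both characters.
console.log(activeQuoteBoundaries('\u2018Main\u2019 street').size);

// Partially typed input: the closing quote has not been entered yet.
console.log(activeQuoteBoundaries('\u2018Mai').size);
```

The last call shows the autocomplete concern: while the user is still typing, the opening quote is not treated as a boundary, so the tokenization can change as soon as the closing quote arrives.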

@missinglink missinglink transferred this issue from pelias/api Sep 15, 2022