5080 false positives for "Close similar name" with roman numerals #2383

Open
Amunak opened this issue Nov 3, 2024 · 4 comments

Amunak commented Nov 3, 2024

We have a lot of numbered streets in Prague that are distinguished only by a Roman numeral, and Osmose reports them as suspicious (see issues like this).

From reading the regex in the source, it seems like this case should be skipped, but for some reason it isn't.

Can you look into it please?

Collaborator

Famlam commented Nov 3, 2024

Interesting, there are more strange errors with those names. For instance, "No street with name "Herálecká II" found around" is reported directly next to a street with that name: https://www.openstreetmap.org/way/234453526

Collaborator

Famlam commented Nov 3, 2024

Oh, I see, there's a special character used:
Ⅱ and Ⅲ instead of II and III
(The first two are each a single character, the latter are multiple uppercase i's. On OSM they look the same.)

Is this a common script thing in your alphabet / street numbering? (Especially since the addr:street tagging seems to use regular uppercase i)
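To make the difference between the two spellings visible, here is a minimal sketch using only the Python standard library. The variable names and the `fold_compat` helper are made up for illustration and are not taken from the Osmose source. It assumes the `name` tag ends in U+2161 while `addr:street` ends in two ASCII letters, as described above.

```python
import unicodedata

def fold_compat(text):
    """Apply NFKC so compatibility characters such as U+2161 become 'II'."""
    return unicodedata.normalize("NFKC", text)

# Hypothetical tag values modelled on the example in this thread.
name_tag = "Herálecká \u2161"      # ends with the single code point U+2161
addr_street_tag = "Herálecká II"   # ends with two ASCII capital letters I

# The raw strings differ even though they render almost identically.
print(name_tag == addr_street_tag)                            # False

# After compatibility normalization both sides compare equal.
print(fold_compat(name_tag) == fold_compat(addr_street_tag))  # True

# The special character is a "letter number" (general category Nl).
print(unicodedata.name("\u2161"), unicodedata.category("\u2161"))
# ROMAN NUMERAL TWO Nl
```

NFKC also rewrites other compatibility characters (ligatures, superscripts, full-width forms), so whether it is appropriate for comparing names is a separate judgment call.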

Author

Amunak commented Nov 4, 2024

Ahh, interesting. I don't think you can even have characters like that in official street names; to me this looks more like a case where an editor, or maybe even the keyboard, automatically replaced the letters with the Unicode character for the Roman numeral (or maybe the user just did it? Idk).

Like, technically it is a more correct representation, but I can also see how it could cause issues. That said, I think the solution would be to just add the "Number Forms" Unicode range to the regex.
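As a rough sketch of that idea: the pattern below is hypothetical and is not the actual Osmose regex. It only illustrates how the Number Forms block (U+2150 to U+218F) could sit next to ASCII digits and ASCII Roman numerals when deciding whether two names differ by numbering alone. The function name `differs_only_by_numbering` is likewise invented for this example.

```python
import re

# Hypothetical pattern, NOT the regex used by Osmose: a whole token made of
# ASCII digits, ASCII Roman-numeral letters, or characters from the Unicode
# "Number Forms" block (U+2150 to U+218F).
NUMBERING = re.compile(r"[0-9]+|[IVXLCDM]+|[\u2150-\u218F]+")

def differs_only_by_numbering(a, b):
    """Return True if a and b differ only in tokens that look like numbering."""
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        return False
    diffs = [(x, y) for x, y in zip(tokens_a, tokens_b) if x != y]
    # Identical names have no differing tokens and are not "numbered variants".
    return bool(diffs) and all(
        NUMBERING.fullmatch(x) and NUMBERING.fullmatch(y) for x, y in diffs
    )

print(differs_only_by_numbering("Herálecká \u2161", "Herálecká \u2162"))  # True
print(differs_only_by_numbering("Herálecká II", "Herálecká III"))         # True
print(differs_only_by_numbering("Herálecká II", "Herálecká Nová"))        # False
```

Pairs for which this returns True would be the ones to skip in a "close similar name" check, on the same reasoning as "street 1" vs "street 2".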

Collaborator

Famlam commented Nov 4, 2024

I agree that it's wrong to warn about "close similar names" here, because it's like "street 1" vs "street 2", so we should fix that.

Note to self: \p{N}?
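For reference, a hedged sketch of what `\p{N}` would cover. Python's built-in `re` module does not support `\p{...}` property classes (the third-party `regex` package does), so this example approximates the same test with `unicodedata`. The helper names are invented for illustration.

```python
import unicodedata

def is_number_char(ch):
    """True for general categories Nd, Nl and No, which together form \\p{N}."""
    return unicodedata.category(ch).startswith("N")

def is_numeric_token(token):
    """True if the token is non-empty and consists only of \\p{N} characters."""
    return bool(token) and all(is_number_char(ch) for ch in token)

print(is_numeric_token("\u2161"))  # True: ROMAN NUMERAL TWO is category Nl
print(is_numeric_token("12"))      # True: ASCII digits are category Nd
print(is_numeric_token("II"))      # False: ASCII capital I is a letter (Lu)
```

The last line shows a limit of this approach: `\p{N}` matches the single-character Roman numerals but not numerals spelled with ordinary letters, so a check based on it would presumably still need a separate rule for ASCII "II" and "III".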
