Invalid regular expression: "^[А-ЯЁ ]+$" #705

Open
Zaczero opened this issue Aug 25, 2023 · 7 comments

Comments

@Zaczero

Zaczero commented Aug 25, 2023

[out:json][timeout:60][bbox:{{bbox}}];
nwr[name~"^[А-ЯЁ ]+$"];
out body qt;
>;
out skel qt;
@mmd-osm
Contributor

mmd-osm commented Aug 26, 2023

While your query works on the overpass-api.de instance, some other instances such as kumi.systems fail with the error message above. Some versions of the C POSIX regular expression functions don't seem to handle ranges with Cyrillic characters properly.

As a quick workaround, you might try some other Overpass instance, or maybe avoid the range altogether by explicitly specifying all characters (not properly tested):

[out:json][timeout:60][bbox:{{bbox}}];
nwr[name~"^[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ ]+$"];
out body qt;
>;
out skel qt;

Minimum example https://cpp.godbolt.org/z/qz6Tn56j9 fails on some systems.

@Zaczero
Author

Zaczero commented Aug 26, 2023

Interesting! This problem impacts my Overpass instance, which I set up using the instructions from https://overpass-api.de/full_installation.html on a Debian docker image. I'm wondering if I've overlooked something.

FROM debian:bookworm-slim

# Install dependencies
RUN apt-get update && apt-get install -y \
    wget \
    g++ \
    make \
    expat \
    libexpat1-dev \
    zlib1g-dev \
    liblz4-dev \
    lighttpd \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Download, extract and compile Overpass
RUN wget https://dev.overpass-api.de/releases/osm-3s_latest.tar.gz -O osm-3s_latest.tar.gz && \
    mkdir ./src && \
    tar -xzf osm-3s_latest.tar.gz -C ./src --strip-components=1 && \
    rm osm-3s_latest.tar.gz && \
    cd src && \
    ./configure --prefix="/app" --enable-lz4 && \
    make dist install clean && \
    cp -r rules .. && \
    cd .. && \
    rm -r ./src
...

@mmd-osm
Contributor

mmd-osm commented Aug 26, 2023

By the way, I'm getting the same issue on Ubuntu 22.04, which is also based on Debian bookworm. For some reason, the previous Debian version bullseye seems to work ok.

You could try replacing the first line in your Dockerfile with FROM debian:bullseye-slim to see if it helps. We still need to figure out what exactly is causing this issue on the newer Debian version.

@Zaczero
Author

Zaczero commented Aug 26, 2023

image

I think I found the cause.
To check the currently applied locale:

std::cout << "Current Locale: " << setlocale(LC_ALL, NULL) << std::endl;

But maybe there is a better way to set the UTF-8 locale in the first place.

@Zaczero
Author

Zaczero commented Aug 26, 2023

I have read that Python officially supports systems that have at least one of the following locales installed:

  • C.UTF-8
  • C.utf8
  • UTF-8

Maybe the same could be done in the overpass-api case.

...btw, I do confirm that switching to FROM debian:bullseye-slim fixed the issue.

@drolbr
Owner

drolbr commented Nov 23, 2023

It looks like buggy regex engines in the base system are a real problem. The final solution, even if it is a workaround, should be to open an avenue to use the regex engine of choice. I don't know yet whether that choice will be made at install time or at runtime.

@Zaczero
Author

Zaczero commented Nov 23, 2023

If the app falls back to the C locale because the requested locale is not installed, I don't see this as much of a regex engine issue. The app should simply support a wider range of UTF-8 locales, as other apps do.
