Import Fails with Z_BUF_ERROR unexpected end of file #393
The bug seems to be in https://github.com/geopipes/geonames-stream, although we haven't touched that code for years. Can you please confirm that the zip file located in
Hi @missinglink, thank you for the hint. I have been there already. To answer your question: I downloaded and
We've had a lot of problems over the years with streaming zip decompressors, mainly because the ZIP format wasn't originally designed to support streaming decompression. It's been the cause of lots of bug reports, and so we've refactored a lot of code to simply perform a download step followed by an extract step, which proves to be much more reliable. I forget the specifics, but some zip files cause the error while others don't; I never quite figured out why. I'd be interested to hear if using the Docker setup yields different results, which would point to something static inside the images working when installing from scratch does not.
It's also possible this PR introduced a regression |
Agh, I remember now: the central directory record of the ZIP file is located at the end of the file, so any operation which requires that info must buffer the whole file, or stream it twice.
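The point above can be made concrete. Below is a minimal sketch (my own illustration, not code from geonames-stream or unzipper; the function name `findEOCD` and the fake buffer are invented for the demo) of locating the End of Central Directory record, whose signature is `PK\x05\x06`:

```javascript
// The EOCD record sits at the END of a ZIP file, so a reader that
// needs the authoritative entry list must scan backwards -- it cannot
// learn the full contents from a forward-only stream.
const EOCD_SIG = 0x06054b50; // "PK\x05\x06" read little-endian

function findEOCD(buf) {
  // The fixed part of the EOCD is 22 bytes; an optional trailing
  // archive comment may follow it, hence the backwards scan.
  for (let i = buf.length - 22; i >= 0; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIG) return i;
  }
  return -1; // truncated or not a ZIP at all
}

// Fake tail: 18 bytes of "entry data" followed by a minimal 22-byte EOCD.
const eocd = Buffer.alloc(22);
eocd.writeUInt32LE(EOCD_SIG, 0); // remaining fields left zeroed
const file = Buffer.concat([Buffer.from('local file data...'), eocd]);
console.log(findEOCD(file)); // prints 18: the record is at the very end
```

A streaming importer never reaches that offset until the download is complete, which is why "buffer the whole file, or stream it twice" are the only options.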
Hello fellow Julian! The Geonames importer and unzipping process have caused us lots of pain over the years, and I think much of it, like Peter said, has to do with the fact that ZIP files don't work well with streaming (which was a big goal of Pelias early on in the project, back when Node.js and streams were the hot new thing; not so much a goal anymore). I believe the Geonames servers routinely "append" to or otherwise modify (rather than create from scratch) their ZIP files, which causes issues like this. You can try to recreate the zip file from scratch with something like this:

```shell
unzip ${geonames_file}.zip
rm ${geonames_file}.zip
zip ${geonames_file}.zip ${geonames_file}.txt
```

It's also not uncommon for the Geonames servers to be down completely, or to produce truly broken ZIP files, but that doesn't seem to be the problem here. Overall, we absolutely should fix this by removing all the ZIP-related functionality from the Node.js code in the Geonames importer and relying on a separate download-then-extract step instead. However, to be honest, the Geonames dataset just isn't a big priority, and the architecture of this importer has the core functionality buried in a separate NPM module, making it even harder to change (#297). So realistically, hacks like the one above are the easiest path forward unless we find someone who really wants to take on some major overhauling.
Hello @orangejulius and @missinglink. The zip file therefore seems to be valid for the Linux command-line zip tools, but not for the zip library used here. I hope that some folks read this before investing additional days in environment setup. This is not really an issue of Geonames, which is why I will close the issue now.
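For anyone hitting the same wall, here is one hedged illustration (my own sketch, not the actual unzipper logic) of how an archive can satisfy the `unzip` CLI yet trip a streaming parser: the CLI trusts the central directory at the end of the file, while a sequential reader expects a local file header at byte 0, so prepended bytes or an appended-to archive break only the latter.

```javascript
// A sequential/streaming ZIP parser expects a local file header
// signature ("PK\x03\x04") at offset 0; the unzip CLI instead locates
// entries via the central directory at the end of the file.
const LOCAL_SIG = 0x04034b50;

function streamableFromStart(buf) {
  return buf.length >= 4 && buf.readUInt32LE(0) === LOCAL_SIG;
}

// "clean" starts with a local file header; "prefixed" has junk bytes
// before it, which the CLI tolerates but a streaming reader does not.
const clean = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
const prefixed = Buffer.concat([Buffer.from('junk'), clean]);
console.log(streamableFromStart(clean));    // prints true
console.log(streamableFromStart(prefixed)); // prints false
```

This is only one way the CLI and a library can disagree; the exact cause in this issue was never pinned down.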
Describe the bug
When I use `npm run start` or `yarn start`, the process ends up in an error.

Steps to Reproduce

I use:
Ubuntu Minimal: 20.04.1
node: 12.18.4
Bun: 0.0.10
geonames-stream: 2.2.0
unzipper: 0.7.6 and 0.10.11
STR:
npm install
npm run download
npm run download_metatdata
npm run start
Expected behavior
The import process should complete without any error.
Additional context
Pelias.json is very typical here:
I do not use Docker in this setup.
I found that there are two "unzipper" versions in the lock file. I am already working on removing one and will try again. Anyway, it might be good not to let Node deal with the zip file at all, but instead to unzip the downloaded file(s) once and from then on work on the txt files directly.