Import Fails with Z_BUF_ERROR unexpected end of file #393

Closed
JuStTheDev opened this issue Sep 30, 2020 · 7 comments

@JuStTheDev

Describe the bug
When I run `npm run start` or `yarn start`, the process fails with the following error:

```
Error: unexpected end of file
    at Zlib.zlibOnError [as onerror] (zlib.js:180:17) {
  errno: -5,
  code: 'Z_BUF_ERROR'
}
events.js:292
      throw er; // Unhandled 'error' event
      ^

Error: unexpected end of file
    at Zlib.zlibOnError [as onerror] (zlib.js:180:17)
Emitted 'error' event on BunWrapper instance at:
    at errorOrDestroy (internal/streams/destroy.js:108:12)
    at BunWrapper.onerror (_stream_readable.js:752:7)
    at BunWrapper.emit (events.js:315:20)
    at BunWrapper.<anonymous> (/root/pelias/geonames/node_modules/bun/lib/bun.js:31:21)
    at BunWrapper.emit (events.js:315:20)
    at Parse.<anonymous> (/root/pelias/geonames/node_modules/bun/lib/bun.js:31:21)
    at Parse.emit (events.js:327:22)
    at InflateRaw.<anonymous> (/root/pelias/geonames/node_modules/geonames-stream/node_modules/unzipper/lib/parse.js:153:44)
    at InflateRaw.emit (events.js:327:22)
    at Zlib.zlibOnError [as onerror] (zlib.js:183:8) {
  errno: -5,
  code: 'Z_BUF_ERROR'
}
error Command failed with exit code 1.
```

Steps to Reproduce
I am using:

  • Ubuntu Minimal: 20.04.1
  • node: 12.18.4
  • bun: 0.0.10
  • geonames-stream: 2.2.0
  • unzipper: 0.7.6 and 0.10.11

Steps:

  1. Follow the installation instructions in the Pelias documentation
  2. Run `npm install`
  3. Run `npm run download`
  4. Run `npm run download_metadata`
  5. Run `npm run start`
  6. After a while, the error shown above appears.

Expected behavior
The import process should complete without any error.

Environment (please complete the following information):

  • OS: Linux (Ubuntu 20.04.1)
  • Docker not used
  • Node version: 12.18.4
  • Both yarn and npm used, as runner and as installer
  • 128 GB RAM
  • 96 GB free at the time of start

Pastebin/Screenshots

Additional context
My pelias.json is quite standard:

```
...
  "geonames": {
    "datapath": "/data/pelias/geonames",
    "countryCode": "ALL"
  },
...
```

I do not use Docker in this setup.
I noticed that the lock file contains two versions of "unzipper". I am currently removing one of them and will try again. Either way, it might be nicer not to have Node.js handle the zip file at all, but instead to unzip the downloaded file(s) once and work on the .txt files directly from then on.
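If the archive has already been extracted once (for example with the `unzip` command), the resulting text file can be streamed line by line with no ZIP handling in Node.js. A minimal sketch, assuming the usual Geonames dump layout of tab-separated columns (geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, …); `GeonamesRecord`, `parseGeonamesLine` and `readGeonamesFile` are hypothetical names, not part of the Pelias importer or geonames-stream:

```typescript
import { createReadStream } from "fs";
import { createInterface } from "readline";

// Hypothetical record type holding a few of the tab-separated columns.
export interface GeonamesRecord {
  id: number;
  name: string;
  lat: number;
  lon: number;
  countryCode: string;
}

// Assumed column order: 0 geonameid, 1 name, 2 asciiname, 3 alternatenames,
// 4 latitude, 5 longitude, 6 feature class, 7 feature code, 8 country code, ...
// Further columns are ignored in this sketch.
export function parseGeonamesLine(line: string): GeonamesRecord | null {
  const cols = line.split("\t");
  if (cols.length < 9) return null; // blank or truncated line
  return {
    id: Number(cols[0]),
    name: cols[1],
    lat: Number(cols[4]),
    lon: Number(cols[5]),
    countryCode: cols[8],
  };
}

// Stream an already-extracted .txt file line by line.
// onDone is called exactly once, with an Error if the file could not be read.
export function readGeonamesFile(
  path: string,
  onRecord: (record: GeonamesRecord) => void,
  onDone: (err?: Error) => void,
): void {
  let finished = false;
  const finish = (err?: Error): void => {
    if (!finished) {
      finished = true;
      onDone(err);
    }
  };
  const input = createReadStream(path, { encoding: "utf8" });
  input.on("error", finish); // handle read errors on the underlying stream directly
  const lines = createInterface({ input, crlfDelay: Infinity });
  lines.on("line", (line: string) => {
    const record = parseGeonamesLine(line);
    if (record) onRecord(record);
  });
  lines.on("close", () => finish());
}
```

Memory use stays flat regardless of file size, because only one line is held at a time.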

JuStTheDev added the bug label Sep 30, 2020
@missinglink
Member

The bug seems to be in https://github.com/geopipes/geonames-stream, although we haven't touched that code for years.

Can you please confirm that the zip file located in /data/pelias/geonames:

  1. was downloaded correctly
  2. is a valid zip file

@JuStTheDev
Author

Hi @missinglink,

Thank you for the hint. I had already checked that. To answer your question: I downloaded the file three times on two different machines and ran md5sum on each copy; all checksums are identical. I also believe it is a valid zip file, since `zip -T [file]` reports success, and actually unzipping it produces a text file that looks sane all the way to the end.
I am now trying the docker image approach.
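The checksum comparison described above can also be done from Node.js. A minimal sketch; `md5File` and its 1 MiB default chunk size are illustrative choices, not anything the importer provides. It reads the file in fixed-size chunks, so the archive never has to fit in memory:

```typescript
import { createHash } from "crypto";
import { closeSync, openSync, readSync } from "fs";

// Compute the hex MD5 digest of a file, reading it in fixed-size chunks so
// that large archives are never loaded into memory all at once.
export function md5File(path: string, chunkSize: number = 1024 * 1024): string {
  const hash = createHash("md5");
  const buffer = Buffer.alloc(chunkSize);
  const fd = openSync(path, "r");
  try {
    let bytesRead: number;
    // position = null reads sequentially from the current file offset
    while ((bytesRead = readSync(fd, buffer, 0, chunkSize, null)) > 0) {
      hash.update(buffer.subarray(0, bytesRead));
    }
  } finally {
    closeSync(fd);
  }
  return hash.digest("hex");
}
```

Matching checksums only show that the copies are identical; whether the archive itself is well-formed is what `zip -T` or `unzip -t` checks.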

@missinglink
Member

We've had a lot of problems over the years with streaming zip decompressors, mainly because the format wasn't originally designed to support streaming decompression like you'd expect from gzip, bzip etc.

It's been the cause of lots of bug reports, so we've refactored a lot of code to simply perform a download step followed by a separate extract step, which has proven much more reliable.

I forget the specifics, but some zip files cause the error while others don't; I never quite figured out why.

I'd be interested to hear whether the Docker setup yields different results; that would suggest that something pinned inside the image works where installing from scratch does not.

@missinglink
Member

It's also possible this PR introduced a regression
geopipes/geonames-stream#19

@missinglink
Member

Agh, I remember now: the central directory record of a ZIP file is located at the end of the file, so any operation that requires that information must either buffer the whole file or stream it twice.
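To illustrate why, here is a minimal sketch of locating the End of Central Directory (EOCD) record, following the layout in PKWARE's APPNOTE: a 22-byte fixed part starting with the signature `0x06054b50`, followed by a comment of up to 65535 bytes. The record can only be found by scanning backwards from the end of the file. The function names are hypothetical, and ZIP64 archives are not handled:

```typescript
import { closeSync, fstatSync, openSync, readSync } from "fs";

// Fixed part of the End of Central Directory record (offsets in bytes):
//   0  signature 0x06054b50     4  number of this disk    6  disk where CD starts
//   8  CD entries on this disk  10 total CD entries       12 CD size in bytes
//   16 CD offset from start     20 comment length         22 comment (variable)
const EOCD_SIGNATURE = 0x06054b50;
const EOCD_FIXED_SIZE = 22;
const MAX_COMMENT_LENGTH = 0xffff;

export interface EocdInfo {
  eocdOffset: number;
  totalEntries: number;
  centralDirSize: number;
  centralDirOffset: number;
}

// Scan backwards from the end of `data`, since the record sits at the very end,
// followed only by its optional comment.
export function findEocd(data: Buffer): EocdInfo | null {
  const lowest = Math.max(0, data.length - EOCD_FIXED_SIZE - MAX_COMMENT_LENGTH);
  for (let i = data.length - EOCD_FIXED_SIZE; i >= lowest; i--) {
    if (data.readUInt32LE(i) !== EOCD_SIGNATURE) continue;
    const commentLength = data.readUInt16LE(i + 20);
    // Skip a false match whose declared comment would run past the end of the data.
    if (i + EOCD_FIXED_SIZE + commentLength > data.length) continue;
    return {
      eocdOffset: i,
      totalEntries: data.readUInt16LE(i + 10),
      centralDirSize: data.readUInt32LE(i + 12),
      centralDirOffset: data.readUInt32LE(i + 16),
    };
  }
  return null;
}

// Read only the tail of the file that can contain the record; the returned
// eocdOffset is absolute within the file.
export function locateCentralDirectory(path: string): EocdInfo | null {
  const fd = openSync(path, "r");
  try {
    const size = fstatSync(fd).size;
    if (size < EOCD_FIXED_SIZE) return null;
    const length = Math.min(size, EOCD_FIXED_SIZE + MAX_COMMENT_LENGTH);
    const start = size - length;
    const tail = Buffer.alloc(length);
    readSync(fd, tail, 0, length, start);
    const info = findEocd(tail);
    if (!info) return null;
    return { ...info, eocdOffset: info.eocdOffset + start };
  } finally {
    closeSync(fd);
  }
}
```

A forward-only streaming parser never gets to do this seek; it has to trust the local file headers as they arrive and cannot cross-check them against the central directory until the whole file has gone by.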

@orangejulius
Member

Hello fellow Julian!

The Geonames importer and unzipping process has caused us lots of pain over the years, and I think much of it, like Peter said, has to do with the fact that ZIP files don't work well with streaming (which was a big goal of Pelias early on in the project back when Node.js and streams were the hot new thing - not so much a goal anymore).

I believe the Geonames servers routinely "append" or otherwise modify (rather than create from scratch) their ZIP files which causes issues like this. You can try to recreate the zip file from scratch with something like this:

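```bash
# ${geonames_file} is the archive's base name (without the .zip extension).
# Chaining with && keeps the original archive if unzip fails.
unzip "${geonames_file}.zip" &&
  rm "${geonames_file}.zip" &&
  zip "${geonames_file}.zip" "${geonames_file}.txt"
```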

It's also not uncommon for the Geonames servers to be down completely, or to produce truly broken ZIP files, but that doesn't seem to be the problem here.

Overall, we absolutely should fix this by removing all the ZIP related functionality from the Node.js code in the Geonames importer, and relying on the unzip command line utility which is probably faster and more robust.
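A minimal sketch of that approach in TypeScript for Node.js, assuming the `unzip` binary is on the `PATH`. `extractZip` is a hypothetical helper, not part of the Pelias importer; it uses `child_process.spawnSync`, so the archive is fully extracted to disk before any parsing starts:

```typescript
import { spawnSync } from "child_process";

// Extract a whole archive with the unzip CLI instead of a streaming JS decoder.
// -o overwrites existing files without prompting; -d sets the target directory.
// stdio is inherited, so unzip's output goes straight to the terminal and is
// not buffered in memory.
export function extractZip(zipPath: string, outDir: string): void {
  const result = spawnSync("unzip", ["-o", zipPath, "-d", outDir], { stdio: "inherit" });
  if (result.error) throw result.error; // e.g. ENOENT when unzip is not installed
  if (result.status !== 0) {
    throw new Error(`unzip exited with status ${result.status}`);
  }
}
```

Afterwards the importer would only need to read the extracted .txt file line by line, and a failed extraction surfaces as a clear exit status instead of a `Z_BUF_ERROR` deep inside a stream pipeline.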

However, to be honest, the Geonames dataset just isn't a big priority, and the architecture of this importer buries the core functionality in a separate npm module, which makes it even harder to change (#297). So realistically, hacks like the one above are the easiest path forward unless we find someone who really wants to take on a major overhaul.

@JuStTheDev
Author

Hello @orangejulius and @missinglink
I tried the same from within Docker and saw exactly the same behavior. Then, today, I tried @orangejulius's unzip-and-rezip trick. With the recreated zip file, the import completed in both environments.

So the zip file appears to be valid for the Linux command-line zip tools, but not for the zip library used here.

I hope people read this before spending extra days on environment setup.

This is not really an issue with geonames, so I will close it now.


3 participants