desync tar gives up on strangely ordered tar, but gives successful exit code #210

Open
dominics opened this issue Jan 12, 2022 · 3 comments

Comments

@dominics
Contributor

dominics commented Jan 12, 2022

Setup

I have two .tar files. Both were produced by tar (GNU tar) 1.34 and have similar contents (NodeJS node_modules directories). As far as my testing shows, the only difference is the order in which the paths were passed to tar when creating them:

  • tar -c --hard-dereference -f works.tar ./node_modules ./a/b/node_modules
  • tar -c --hard-dereference -f breaks.tar ./a/b/node_modules ./node_modules

These tar files end up the same size; the only difference between them is the entry order:

$ gtar -tf breaks.tar | head
./a/b/node_modules/
./a/b/node_modules/.bin/
./a/b/node_modules/.bin/gql-gen                 # these are symlinks to ../../node_modules/<path>
./a/b/node_modules/.bin/graphql-codegen
./a/b/node_modules/.bin/graphql-code-generator
./node_modules/
./node_modules/indexes-of/
./node_modules/indexes-of/.npmignore
./node_modules/indexes-of/test.js
./node_modules/indexes-of/LICENSE
$ gtar -tf works.tar | head
./node_modules/
./node_modules/indexes-of/
./node_modules/indexes-of/.npmignore
./node_modules/indexes-of/test.js
./node_modules/indexes-of/LICENSE
./node_modules/indexes-of/index.js
./node_modules/indexes-of/README.md
./node_modules/indexes-of/package.json
./node_modules/pako/
./node_modules/pako/LICENSE

Behavior

When given to desync tar, these files both produce exit codes of zero, but breaks.tar fails to be processed:

$ time desync tar --config /tmp/config --verbose --tar-add-root --input-format tar --index --store /tmp/store breaks.caidx breaks.tar && echo "Works!"

real	0m0.090s
user	0m0.011s
sys	0m0.020s
Works!

But it's obvious in the case of breaks.tar that no work has happened: the .caidx file is only 144 bytes long, we didn't spend any time, and if we are piping the .tar file in, we don't even finish reading stdin (giving EPIPE).

It seemed at first I might be able to work around this with GNU tar's --sort=name, but that's inadequate: it still obeys the ordering given on the command line/file list (so I really need to make sure that's also relatively sorted).

  1. Is there something I might be doing wrong?
  2. Does this seem like a reasonable restriction on the input tar format, or more like unexpected behavior?
  3. Is there some extra logging we might enable to make it easier to see errors like this? (Ideally an error exit status, but some more logs than I'm getting would probably be sufficient.)

I'd be fine with just working around this myself (i.e. if the answer to question 2 is: "yes, it's a reasonable restriction, get a better .tar file!"), if it weren't for the successful exit status on failure; that seemed concerning enough to open a bug for.

dominics pushed a commit to vital-software/monofo-buildkite-plugin that referenced this issue Jan 12, 2022
@dominics
Contributor Author

dominics commented Jan 13, 2022

Hmm, thinking about it, this is probably #139 under the hood. I am using --tar-add-root, as shown above, but I guess it's a similar problem, maybe one level up from the root or something. I think this means reordering (of any sort) might be a fool's errand while some of the files in the tar stream have no parent entry.

Still trying to produce a minimal example I can share - will work on getting some debug output to do so perhaps (it'd be hugely helpful if we could detect the situation and error)
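
One way to detect the situation before handing a tar to desync is to scan its listing for entries whose parent directory has not appeared yet. The following is a minimal sketch in POSIX sh and awk. `missing_parents` is a hypothetical helper name, not part of desync or GNU tar. It assumes file names without embedded newlines and treats the root `.` as always present (as `--tar-add-root` supplies it). Missing parents may not be the only ordering requirement, so an empty result is a necessary check rather than a guarantee.

```shell
# Hypothetical helper (not part of desync or GNU tar): read a tar listing
# (one path per line, as printed by `tar -tf`) on stdin and print every entry
# whose parent directory has not been listed before it.
# Assumes no newlines in file names; "." is treated as always present.
missing_parents() {
  awk '
    {
      path = $0
      sub("/$", "", path)              # directory entries end in "/"
      if (path == "") next
      parent = path
      # Strip the last path component; sub() returns 0 if there is no "/".
      if (sub("/[^/]*$", "", parent) && parent != "." && parent != "" && !(parent in seen))
        print $0
      seen[path] = 1
    }
  '
}

# Example on the first entries of breaks.tar from this issue:
printf '%s\n' './a/b/node_modules/' './a/b/node_modules/.bin/' './node_modules/' | missing_parents
# prints: ./a/b/node_modules/
```

Run against a real archive as `tar -tf breaks.tar | missing_parents`; no output means every entry had its parent listed first.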

@folbricht
Owner

Yes, this is indeed #139 and it's really due to the catar format itself. It's meant to be ordered and stable, so you'd have to make sure that the input is as well. There definitely are features in tar that can't be mapped to catar, unfortunately. As for why it doesn't actually fail, that's a limitation of the algorithm I used to walk the tree: when there's a break like that in the input, it thinks the input is done and exits. I might be able to at least fix that part and have it fail properly (the issue is likely here: https://github.com/folbricht/desync/blob/master/tar.go#L94).

@dominics
Contributor Author

dominics commented Jan 14, 2022

I've put together a minimal test case, along with my notes on producing a tar that works, at https://github.com/dominics/desync-tar-requirements. The section at https://github.com/dominics/desync-tar-requirements#example-output shows two file lists, one working and one not. I hope that helps anyone else running into exit code 141.
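
On the number itself: shells report a process terminated by a signal as 128 plus the signal number, so 141 corresponds to signal 13, SIGPIPE. That fits the EPIPE seen when the reader stops consuming the tar stream early. A small sketch for decoding such a status follows; `signal_from_status` is a hypothetical helper name, not something from desync or tar.

```shell
# Hypothetical helper: print the name of the signal encoded in an exit status
# above 128 (the shell convention for "terminated by signal N" is 128 + N).
# Returns non-zero for ordinary exit statuses.
signal_from_status() {
  status=$1
  [ "$status" -gt 128 ] || return 1
  kill -l "$((status - 128))"
}

signal_from_status 141   # 141 - 128 = 13, i.e. SIGPIPE
```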

Because I have programmatic control of the inputs to this particular process, I can probably pretty easily do the directory-to-file recursion myself: make sure there are entries for intermediate directories, and turn on GNU tar's --no-recursion, so that I'm in charge of recursing down to the level of the files to produce a complete tree. Knowing exactly what to look for in the file list was the part I was missing.
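
The "entries for intermediate directories" step could look roughly like this. It is a sketch, not the method used in the linked repository. `with_parents` is a hypothetical helper that reads paths on stdin and prints each one preceded by any ancestor directories not emitted so far, so the result can serve as a file list for tar with `--no-recursion`. It assumes no newlines in file names, drops repeated paths, and otherwise leaves the input order unchanged.

```shell
# Hypothetical helper: read paths on stdin (one per line) and print each path
# preceded by any ancestor directories that have not been printed yet.
# The root "." (or "/" for absolute paths) is never emitted on its own behalf.
with_parents() {
  awk '
    {
      path = $0
      sub("/$", "", path)                # normalise "dir/" to "dir"
      if (path == "") next
      n = split(path, parts, "/")
      prefix = ""
      for (i = 1; i < n; i++) {          # every proper ancestor, shortest first
        prefix = (i == 1) ? parts[i] : prefix "/" parts[i]
        if (prefix == "." || prefix == "" || (prefix in seen)) continue
        seen[prefix] = 1
        print prefix
      }
      if (!(path in seen)) {
        seen[path] = 1
        print path
      }
    }
  '
}

printf '%s\n' './a/b/node_modules' './node_modules' | with_parents
# ./a
# ./a/b
# ./a/b/node_modules
# ./node_modules
```

With find doing the recursion, the intended use would be something like `find ./a/b/node_modules ./node_modules | with_parents > list.txt`, followed by `tar -c --no-recursion --hard-dereference -f out.tar --files-from=list.txt`. Whether the resulting entry order satisfies desync still needs checking against a real archive.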

Edit: I also didn't understand how powerful GNU tar's --files-from (-T) is. A file list can itself contain --files-from, as well as a bunch of other "positional arguments" such as --recursion/--no-recursion, as long as you're not setting --null in an attempt to cope with bloody newlines in file names (who uses those, right?!), which disables this. So, that makes including just the necessary non-recursive "intermediate directories" pretty easy.
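
As an illustration, a list for the two trees in this issue might look like the fragment below. It is a sketch based on the description above, not taken from the linked repository, and `list.txt` is a placeholder name. The intermediate directories are added without recursion and the real roots with it:

```
--no-recursion
./a
./a/b
--recursion
./a/b/node_modules
./node_modules
```

This would be passed to GNU tar as `tar -c --hard-dereference -f out.tar --files-from=list.txt`, without `--null`. Whether this ordering is enough for desync is worth confirming by listing the resulting archive.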

Edit: I updated the title because I briefly thought I had input files that could be converted to catar but would then fail to index. They weren't failing to index: the input just produced less than the minimum chunk size's worth of catar, so the index was extremely small. I set the title back when I realized it was a false test.

@dominics dominics changed the title desync tar gives up on strangely ordered tar, but gives successful exit code desync tar gives up on indexing tar, but gives successful exit code Jan 16, 2022
@dominics dominics changed the title desync tar gives up on indexing tar, but gives successful exit code desync tar gives up on strangely ordered tar, but gives successful exit code Jan 16, 2022