-
Notifications
You must be signed in to change notification settings - Fork 16
xdelta3.XDeltaError: Output of decoding delta longer than expected #2
Comments
I'm getting the same error. look here
Basically I assume that the size of the output ( I guess that was wrong. Perhaps b1 has lots of repeated content which xdelta 3 is cleverly encoding. In short I think xdelta3 is performing better than I assumed it could. I'm no expert C developer, but I would guess the solution is to just increase I'll create a PR for this. Do you mind if I use the two files in |
see #3 @karamanolev, would be great if you could try that branch and check it's working for you. If ok, I'll merge and deploy an update. |
@samuelcolvin Feel free to use the files. Please mind that they were retrieved from a website that I don't own, but I don't think it's ever going to be an issue. As for the output size, it can be very big compared to inputs. Take a look:
This means you can have an input of 0 bytes and 17 bytes respectively and generate an output of 10000 bytes (10k bytes). If you have 0 bytes and 1 gigabyte of repeated characters, the delta is 1923 bytes. There won't be a formula in the world that allows you to statically allocate a buffer you know is going to enough. |
Ok, but in real work #3 is better than the current value. Unless you want to submit a PR to dynamically allocate the output buffer size? |
I'll take a look. The right way to do it, I think, is to implement the streaming capabilities of xdelta3. A compromise with amortized linear complexity would be to start with a reasonably sized buffer (e.g. the old size) and then double it each time you end up with ENOSPC. That shouldn't be too hard to implement. |
Agreed doubling would be easier. The problem is it will be slow(er) since the decode will have to retry until a suitable size is reached, what's worse this performance degradation will be silent. I can see a situation along the lines of "Here's this sitation where xdelta three should be brilliant and save me loads of data but it's being a bit slow. Why is that?". But then I don't see a better solution. Perhaps it would be worth asking on the main xdelta3 repo for advice. |
It shouldn't be slower by more than 2 times (amortized linear). And certainly better than throwing errors. As for the right way - the streaming interface + allocating buffer blocks as you go. https://github.com/jmacd/xdelta/blob/wiki/ProgrammingGuide.md it's documented here, xd3_stream/xd3_config. The implementation would be a bit more tricky though. |
ok, makes sense. Could you create a PR for doubling buffer size? |
ok, I've updated #3 to double @karamanolev does that look ok to you? |
@samuelcolvin Your PR looks OK to me. I've opened another, alternative one, which also happens to fix my other big issue with the library as is (the NoDelta exception is just surprising, in a bad way). I put argumentation in the PR, but here's some more: |
Alternative solution: upon encoding, prepend in "delta" the size of the original string - that's a single integer. Upon decoding, read this value first, then allocate a buffer sized exactly as needed - you won't need reallocation nor compromise on speed. |
Makes lots of sense, then default to the current value. PR welcome. |
I'm trying to use your library. Here's a simple example of how I'm trying it:
This gives me an exception:
xdelta3.XDeltaError: Output of decoding delta longer than expected
I'm attaching a zip file with b1.bin and b2.bin. By the way, they're simple HTML files, I just made sure to treat them as binary so I don't trip over encodings. Also, when trying it in reverse (not b1 -> b2, but b2 -> b1) it all works well.
$ python --version
Python 3.6.1
Running on Mac OS, Python installed via pyenv. xdelta3==0.0.5.
files.zip
The text was updated successfully, but these errors were encountered: