xdelta3.XDeltaError: Output of decoding delta longer than expected #2

karamanolev · 2017-12-21T13:49:49Z

I'm trying to use your library. Here's a simple example of how I'm trying it:

    b1 = open('b1.bin', 'rb').read()
    b2 = open('b2.bin', 'rb').read()

    d = xdelta3.encode(b1, b2)
    b3 = xdelta3.decode(b1, d)
    print(b2 == b3)

This gives me an exception: xdelta3.XDeltaError: Output of decoding delta longer than expected

I'm attaching a zip file with b1.bin and b2.bin. By the way, they're simple HTML files, I just made sure to treat them as binary so I don't trip over encodings. Also, when trying it in reverse (not b1 -> b2, but b2 -> b1) it all works well.

$ python --version
Python 3.6.1
Running on Mac OS, Python installed via pyenv. xdelta3==0.0.5.

files.zip

The text was updated successfully, but these errors were encountered:

samuelcolvin · 2017-12-21T14:07:23Z

I'm getting the same error.

look here

output shouldn't be bigger than the original plus the delta, but give a little leeway

Basically I assume that the size of the output (b3 here) won't be bigger than len(b1) + len(d) plus some breaking space.

I guess that was wrong. Perhaps b1 has lots of repeated content which xdelta 3 is cleverly encoding. In short I think xdelta3 is performing better than I assumed it could.

I'm no expert C developer, but I would guess the solution is to just increase output_alloc to maybe input_size + source_size * 2.

I'll create a PR for this. Do you mind if I use the two files in files.zip for unit tests?

samuelcolvin · 2017-12-21T14:43:44Z

see #3 @karamanolev, would be great if you could try that branch and check it's working for you.

If ok, I'll merge and deploy an update.

karamanolev · 2017-12-21T15:45:56Z

@samuelcolvin Feel free to use the files. Please mind that they were retrieved from a website that I don't own, but I don't think it's ever going to be an issue.

As for the output size, it can be very big compared to inputs. Take a look:

import xdelta3
b1 = b''
b2 = b't' * 10000
len(xdelta3.encode(b1, b2))  # This is 17

This means you can have an input of 0 bytes and 17 bytes respectively and generate an output of 10000 bytes (10k bytes). If you have 0 bytes and 1 gigabyte of repeated characters, the delta is 1923 bytes. There won't be a formula in the world that allows you to statically allocate a buffer you know is going to enough.

samuelcolvin · 2017-12-21T15:47:50Z

Ok, but in real work #3 is better than the current value.

Unless you want to submit a PR to dynamically allocate the output buffer size?

karamanolev · 2017-12-21T15:58:42Z

I'll take a look. The right way to do it, I think, is to implement the streaming capabilities of xdelta3. A compromise with amortized linear complexity would be to start with a reasonably sized buffer (e.g. the old size) and then double it each time you end up with ENOSPC. That shouldn't be too hard to implement.

samuelcolvin · 2017-12-21T16:02:55Z

Agreed doubling would be easier. The problem is it will be slow(er) since the decode will have to retry until a suitable size is reached, what's worse this performance degradation will be silent.

I can see a situation along the lines of "Here's this sitation where xdelta three should be brilliant and save me loads of data but it's being a bit slow. Why is that?". But then I don't see a better solution.

Perhaps it would be worth asking on the main xdelta3 repo for advice.

karamanolev · 2017-12-21T16:06:50Z

It shouldn't be slower by more than 2 times (amortized linear). And certainly better than throwing errors. As for the right way - the streaming interface + allocating buffer blocks as you go. https://github.com/jmacd/xdelta/blob/wiki/ProgrammingGuide.md it's documented here, xd3_stream/xd3_config. The implementation would be a bit more tricky though.

samuelcolvin · 2017-12-21T16:11:28Z

ok, makes sense. Could you create a PR for doubling buffer size?

samuelcolvin · 2017-12-21T17:10:13Z

ok, I've updated #3 to double output_alloc until the buffer is big enough.

@karamanolev does that look ok to you?

karamanolev · 2017-12-21T17:32:56Z

@samuelcolvin Your PR looks OK to me. I've opened another, alternative one, which also happens to fix my other big issue with the library as is (the NoDelta exception is just surprising, in a bad way). I put argumentation in the PR, but here's some more:
If you get the NoDelta exception, you'll have to add application logic to handle it anyway (you can't just ignore it or something like this). If you're going to add app logic for that, you can just as well check for the delta length yourself.
Imagine you don't want to add logic and you just want to store the delta, even if it's a little larger, that's fine. That will likely be 0.001% of your cases and will probably not cause a big storage issue. In that case you're left to hang dry - you can't use the library, because it just throws an exception, no return value at all. This is what happened in my case.

mwojnars · 2020-02-17T11:06:42Z

Alternative solution: upon encoding, prepend in "delta" the size of the original string - that's a single integer. Upon decoding, read this value first, then allocate a buffer sized exactly as needed - you won't need reallocation nor compromise on speed.

samuelcolvin · 2020-02-17T17:51:35Z

Makes lots of sense, then default to the current value.

PR welcome.

samuelcolvin linked a pull request Dec 21, 2017 that will close this issue

fix for large decoded value #3

Open

SmileSydney mentioned this issue Jan 16, 2020

How to install package using branch fix-large-decode #9

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xdelta3.XDeltaError: Output of decoding delta longer than expected #2

xdelta3.XDeltaError: Output of decoding delta longer than expected #2

karamanolev commented Dec 21, 2017 •

edited

Loading

samuelcolvin commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

karamanolev commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

karamanolev commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

karamanolev commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

karamanolev commented Dec 21, 2017

mwojnars commented Feb 17, 2020

samuelcolvin commented Feb 17, 2020

xdelta3.XDeltaError: Output of decoding delta longer than expected #2

xdelta3.XDeltaError: Output of decoding delta longer than expected #2

Comments

karamanolev commented Dec 21, 2017 • edited Loading

samuelcolvin commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

karamanolev commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

karamanolev commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

karamanolev commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

samuelcolvin commented Dec 21, 2017

karamanolev commented Dec 21, 2017

mwojnars commented Feb 17, 2020

samuelcolvin commented Feb 17, 2020

karamanolev commented Dec 21, 2017 •

edited

Loading