Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't check for EOF when unpacked size is specified #26

Merged
merged 2 commits into from
Mar 5, 2020
Merged

don't check for EOF when unpacked size is specified #26

merged 2 commits into from
Mar 5, 2020

Conversation

ibaryshnikov
Copy link
Contributor

Pull Request Overview

This pull request fixes the case where the last byte is 0 with known unpacked size.
Related issue is #11, in particular #11 (comment)

Testing Strategy

This pull request was tested by...

  • Added relevant unit tests.
  • Added relevant end-to-end tests (such as .lzma, .lzma2, .xz files).

Supporting Documentation and References

The best reference I was able to find is
https://svn.python.org/projects/external/xz-5.0.3/doc/lzma-file-format.txt

Uncompressed Size is stored as unsigned 64-bit little endian
integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
that Uncompressed Size is unknown. End of Payload Marker (*)
is used if and only if Uncompressed Size is unknown.

TODO, help wanted

I'm unsure what self.rep[0] is checking here

if self.rep[0] == 0xFFFF_FFFF {
    if rangecoder.is_finished_ok()? {
        break;
    }
    return Err(error::Error::LZMAError(String::from(
        "Found end-of-stream marker but more bytes are available",
    )));
}

@dragly
Copy link
Contributor

dragly commented Feb 4, 2020

This fix works for us as well. I agree with what @gendx said in #11 (comment) and would prefer to find a test-case that reproduces the issue with a proper unpacked size. However, it would be nice if this PR could be merged and an issue created for the test case, if the code is otherwise sound.

@ibaryshnikov
Copy link
Contributor Author

ibaryshnikov commented Feb 5, 2020

@dragly

would prefer to find a test-case that reproduces the issue with a proper unpacked size

I think the test case I submitted has a proper unpacked size. If it doesn't, I'd be curious to know why

@dragly
Copy link
Contributor

dragly commented Mar 4, 2020

Sorry, I misread your example. I thought you provided the unpacked data and that the length you gave was different from the actual amount of data.

One option is to just change the round_trip_file test to also do a round-trip with unpacked size written to the header (the default is not to write it to the header):

fn round_trip_file(filename: &str) {
    use std::io::Read;

    let mut x = Vec::new();
    std::fs::File::open(filename)
        .unwrap()
        .read_to_end(&mut x)
        .unwrap();

    round_trip(x.as_slice());

    // Do another round trip, but this time also write it to the header
    let encode_options = lzma_rs::compress::Options {
        unpacked_size: lzma_rs::compress::UnpackedSize::WriteToHeader(Some(x.len() as u64)),
    };
    let decode_options = lzma_rs::decompress::Options {
        unpacked_size: lzma_rs::decompress::UnpackedSize::ReadFromHeader,
    };
    round_trip_with_options(x.as_slice(), &encode_options, &decode_options);
}

This actually trips up the range-coder-edge-case file with the following error:

called `Result::unwrap()` on an `Err` value: LZMAError("Expected unpacked size of 3040092 but decompressed to 3040057")

So I think that it is sufficient to modify the test to include the additional round-trip.

However, if you prefer to use your own data (where does it come from, by the way?), you could add the result of uncompressing your bytes as a new unpacked-size-edge-case file and change the round_trip_files test to

#[test]
fn round_trip_files() {
    let _ = env_logger::try_init();
    round_trip_file("tests/files/foo.txt");
    round_trip_file("tests/files/range-coder-edge-case");
    round_trip_file("tests/files/unpacked-size-edge-case");
}

Either way, it becomes clear that the issue is inherent to the lzma-rs library, because it shows that it is not able to properly decompress a file it has itself compressed.

Copy link
Owner

@gendx gendx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, it would be nice if this PR could be merged and an issue created for the test case, if the code is otherwise sound.

Sorry for the delay in reviewing this, things have been quite busy on my side last month.


I also agree with @dragly's comment regarding the tests. Updating round_trip tests to test the header and catch this regression would be more robust in the long run.

src/decode/lzma.rs Show resolved Hide resolved
@ibaryshnikov
Copy link
Contributor Author

ibaryshnikov commented Mar 5, 2020

@dragly the data is a part of openctm file. It appears that I'm working with wasm, too.

@dragly @gendx I like the idea of using roundtrip test instead, I will send an update shortly.

@gendx
Copy link
Owner

gendx commented Mar 5, 2020

bors r+

bors bot added a commit that referenced this pull request Mar 5, 2020
26: don't check for EOF when unpacked size is specified r=gendx a=ibaryshnikov

### Pull Request Overview

This pull request fixes the case where the last byte is 0 with known unpacked size.
Related issue is #11, in particular #11 (comment)


### Testing Strategy

This pull request was tested by...

- [x] Added relevant unit tests.
- [ ] Added relevant end-to-end tests (such as `.lzma`, `.lzma2`, `.xz` files).


### Supporting Documentation and References

The best reference I was able to find is
https://svn.python.org/projects/external/xz-5.0.3/doc/lzma-file-format.txt

```
Uncompressed Size is stored as unsigned 64-bit little endian
integer. A special value of 0xFFFF_FFFF_FFFF_FFFF indicates
that Uncompressed Size is unknown. End of Payload Marker (*)
is used if and only if Uncompressed Size is unknown.
```


### TODO, help wanted

I'm unsure what self.rep[0] is checking here
```rust
if self.rep[0] == 0xFFFF_FFFF {
    if rangecoder.is_finished_ok()? {
        break;
    }
    return Err(error::Error::LZMAError(String::from(
        "Found end-of-stream marker but more bytes are available",
    )));
}
```

Co-authored-by: ibaryshnikov <[email protected]>
@gendx gendx merged commit 2a1767a into gendx:master Mar 5, 2020
@bors
Copy link
Contributor

bors bot commented Mar 5, 2020

Timed out

bors bot added a commit that referenced this pull request Mar 5, 2020
29: Small pull-request to test Bors r=gendx a=gendx

### Pull Request Overview

This pull request does some minor cleanup and tests #28.

30: Generalize round_trip_with_options test to all tests. r=gendx a=gendx

### Pull Request Overview

This pull request generalizes the tests introduced in #26 to all round-trip tests (not only files).

Co-authored-by: G. Endignoux <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants