LZMAError("Expected unpacked size of 149198 but decompressed to 483334")' #11
Comments
An LZMA stream can include an unpacked_size hint in its header (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L61-L74), which the code then verifies to reject inconsistencies (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma.rs#L312-L320). Additionally, the LZMA2 format is a wrapper around LZMA, which can also provide an unpacked size hint on top of it (https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma2.rs#L89-L95 and https://github.com/gendx/lzma-rs/blob/master/src/decode/lzma2.rs#L161). On top of that, XZ compresses each file with an LZMA2 stream. So it looks like either your file was corrupted or there is a bug in my code due to a corner case that I didn't see before.
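To make the header hint concrete, here is a minimal sketch of reading the classic 13-byte `.lzma` header using only the standard library. It is not the lzma-rs implementation: `LzmaHeader` and `parse_lzma_header` are hypothetical names, and the example bytes in `main` are made up for illustration.

```rust
/// Fields of the classic 13-byte `.lzma` header (hypothetical struct for illustration).
#[derive(Debug, PartialEq)]
struct LzmaHeader {
    lc: u8,
    lp: u8,
    pb: u8,
    dict_size: u32,
    /// `None` when the header stores 0xFFFF_FFFF_FFFF_FFFF, meaning "size unknown".
    unpacked_size: Option<u64>,
}

fn parse_lzma_header(bytes: &[u8]) -> Result<LzmaHeader, String> {
    if bytes.len() < 13 {
        return Err(format!("header needs 13 bytes, got {}", bytes.len()));
    }
    let props = bytes[0];
    if props >= 9 * 5 * 5 {
        return Err(format!("invalid properties byte {}", props));
    }
    // The properties byte packs three values: props = (pb * 5 + lp) * 9 + lc.
    let lc = props % 9;
    let lp = (props / 9) % 5;
    let pb = props / 45;
    // Dictionary size: 32-bit little endian.
    let dict_size = u32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    // Unpacked size: 64-bit little endian, all ones means "unknown".
    let mut size = [0u8; 8];
    size.copy_from_slice(&bytes[5..13]);
    let raw = u64::from_le_bytes(size);
    let unpacked_size = if raw == u64::MAX { None } else { Some(raw) };
    Ok(LzmaHeader { lc, lp, pb, dict_size, unpacked_size })
}

fn main() {
    // Made-up header: props 0x5D, 64 KiB dictionary, unpacked size 5046.
    let header = [0x5D, 0, 0, 1, 0, 0xB6, 0x13, 0, 0, 0, 0, 0, 0];
    println!("{:?}", parse_lzma_header(&header));
}
```

A decoder that reads `Some(n)` here has a value to compare against the number of bytes it actually produced, which is where the "Expected unpacked size" error comes from.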
|
Would #17 (or a variant of it) work for this use case? |
@gendx I created a reproduction, will it help?

use std::io::BufReader;
use lzma_rs::decompress::{Options, UnpackedSize};
const DATA: &[u8] = &[
93, 0, 0, 1, 0, 0, 0, 111, 253, 255, 255, 163, 183, 255, 71, 62, 72, 21, 114, 57, 97, 81, 184,
146, 40, 230, 143, 221, 66, 251, 179, 253, 113, 133, 36, 209, 157, 136, 6, 166, 184, 144, 144,
180, 72, 27, 108, 146, 211, 153, 161, 58, 255, 52, 129, 75, 240, 91, 145, 234, 14, 20, 173, 77,
167, 21, 218, 124, 215, 37, 87, 175, 123, 84, 42, 90, 42, 15, 40, 156, 200, 228, 82, 146, 100,
78, 137, 120, 145, 121, 117, 60, 144, 172, 178, 50, 13, 116, 246, 17, 195, 181, 90, 136, 248,
128, 160, 103, 203, 131, 61, 101, 79, 13, 188, 166, 86, 177, 61, 29, 24, 147, 226, 211, 42, 16,
116, 153, 103, 9, 17, 112, 188, 159, 117, 114, 125, 209, 157, 150, 224, 44, 197, 39, 232, 193,
190, 15, 0, 4, 130, 28, 84, 73, 91, 189, 120, 8, 69, 78, 165, 182, 187, 252, 105, 241, 61, 199,
210, 26, 194, 15, 70, 225, 186, 144, 150, 195, 46, 150, 103, 144, 224, 196, 136, 25, 140, 45,
169, 29, 100, 201, 225, 234, 59, 16, 254, 147, 168, 89, 240, 42, 238, 251, 69, 135, 217, 29,
243, 218, 10, 172, 191, 192, 95, 186, 36, 117, 158, 138, 110, 8, 207, 141, 154, 9, 159, 181, 3,
71, 95, 111, 99, 247, 247, 33, 89, 114, 7, 61, 46, 250, 138, 21, 2, 105, 135, 90, 83, 215, 223,
60, 180, 69, 243, 112, 226, 228, 100, 144, 11, 167, 204, 83, 148, 112, 122, 31, 30, 71, 230,
64, 211, 22, 193, 147, 121, 76, 180, 3, 79, 198, 164, 40, 176, 206, 62, 34, 200, 114, 9, 81,
33, 129, 115, 94, 77, 166, 124, 38, 148, 20, 62, 133, 46, 21, 63, 37, 112, 202, 221, 26, 34, 4,
13, 189, 74, 75, 162, 189, 241, 123, 154, 163, 59, 7, 148, 203, 156, 18, 125, 126, 147, 209,
158, 105, 231, 27, 203, 191, 132, 50, 146, 226, 22, 201, 251, 40, 255, 101, 201, 255, 75, 201,
60, 5, 36, 246, 121, 87, 144, 239, 19, 138, 52, 229, 23, 193, 207, 4, 113, 151, 154, 147, 223,
52, 140, 114, 174, 146, 90, 0, 42, 38, 113, 62, 58, 164, 224, 122, 82, 205, 66, 43, 153, 64,
134, 64, 140, 123, 119, 237, 154, 159, 175, 94, 254, 119, 160, 234, 217, 50, 124, 84, 137, 204,
160, 36, 83, 32, 91, 171, 136, 100, 221, 214, 36, 161, 168, 31, 105, 199, 188, 91, 14, 248, 37,
175, 98, 22, 164, 68, 234, 76, 175, 144, 32, 39, 10, 60, 201, 181, 100, 52, 184, 202, 194, 77,
159, 147, 177, 98, 172, 139, 31, 185, 230, 46, 171, 105, 55, 106, 24, 254, 236, 255, 110, 189,
247, 139, 213, 200, 241, 113, 20, 28, 232, 144, 194, 54, 188, 180, 193, 196, 73, 234, 60, 111,
87, 228, 113, 186, 65, 174, 66, 219, 80, 167, 249, 36, 43, 57, 144, 101, 25, 188, 250, 28, 217,
2, 203, 195, 217, 6, 52, 125, 206, 106, 211, 148, 190, 119, 126, 34, 100, 117, 218, 183, 135,
108, 77, 244, 54, 116, 167, 24, 113, 104, 211, 29, 14, 143, 255, 124, 241, 74, 135, 140, 131,
196, 245, 234, 245, 213, 189, 35, 139, 127, 212, 247, 0,
];
const PACKED_SIZE: u64 = 566;
const UNPACKED_SIZE: u64 = 5048;
fn main() {
    let mut input = BufReader::new(DATA);
    let mut output = vec![];
    let options = Options {
        unpacked_size: UnpackedSize::UseProvided(Some(UNPACKED_SIZE)),
    };
    let result = lzma_rs::lzma_decompress_with_options(&mut input, &mut output, &options);
    println!("The result is {:?}", result);
}

It prints: "Expected unpacked size of 5048 but decompressed to 5046". |
Thanks @ibaryshnikov for your example. However, I don't see how it's not behaving as expected. You provide an expected unpacked size of 5048 bytes, but the decompressed output is only 5046 bytes. When I set the expected size to 5046 your example stream decompresses fine. So to me this works as intended - if the decompressed size doesn't match the expected one you provided, an error should be reported instead of returning any partial and/or potentially corrupted result. If you don't know the expected size, you can use |
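The behaviour described in this comment (report an error rather than return a partial result when the sizes disagree) can be sketched as a small standalone check. `check_unpacked_size` is a hypothetical helper, not a function from lzma-rs; only the message format is taken from the error quoted in this thread.

```rust
/// Compare the caller's expected size (if any) with what the decoder produced.
/// Hypothetical helper for illustration only.
fn check_unpacked_size(expected: Option<u64>, actual: u64) -> Result<(), String> {
    match expected {
        Some(n) if n != actual => Err(format!(
            "Expected unpacked size of {} but decompressed to {}",
            n, actual
        )),
        // Either the sizes match or no expectation was given.
        _ => Ok(()),
    }
}

fn main() {
    println!("{:?}", check_unpacked_size(Some(5048), 5046));
    println!("{:?}", check_unpacked_size(Some(5046), 5046));
    println!("{:?}", check_unpacked_size(None, 5046));
}
```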
@gendx thanks for checking this example. It's a bit tricky to tell when the input has ended. We can have one code and iterate over it several times using different ranges. In my example, the second-to-last code is 1063818487, and there are two different valid ranges for it: first 2663792640, then 1320009537. Then there's a switch to the last code, which is 0. Again, we can iterate over this code using different ranges. After removing the break on

pub fn is_finished_ok(&mut self) -> io::Result<bool> {
    Ok(self.code == 0 && util::is_eof(self.stream)?)
}

I got three ranges for code 0: 2212886016, 1089365498 and 547851036 (before, there was only 2212886016). That's how we can find the last two bytes and get 5048 in total. I've compared the results with a library from another language and it seems correct. I don't think it's related to the original issue, where the difference between the unpacked sizes is much larger (149198 vs 483334), but it may be a separate issue. @gendx what do you think? |
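The observation above, that a code of 0 at end of input can still be decoded with several successive ranges, can be illustrated with a toy range decoder. This is a sketch under assumptions, not the lzma-rs code: `RangeDecoder`, `looks_finished` and the choice to normalize before each bit are all illustrative, and the five-byte stream is made up.

```rust
const TOP: u32 = 1 << 24;

/// Toy range decoder modelled on the usual LZMA scheme (illustrative only).
struct RangeDecoder<'a> {
    input: &'a [u8],
    pos: usize,
    range: u32,
    code: u32,
}

impl<'a> RangeDecoder<'a> {
    /// The stream starts with a zero byte followed by a 32-bit big-endian code.
    fn new(input: &'a [u8]) -> Option<Self> {
        if input.len() < 5 || input[0] != 0 {
            return None;
        }
        let code = u32::from_be_bytes([input[1], input[2], input[3], input[4]]);
        Some(RangeDecoder { input, pos: 5, range: u32::MAX, code })
    }

    /// Same shape as the check quoted above: code is 0 and no input is left.
    fn looks_finished(&self) -> bool {
        self.code == 0 && self.pos == self.input.len()
    }

    /// Decode one bit with an adaptive 11-bit probability.
    /// Returns `None` only when normalization needs a byte that is not there.
    fn decode_bit(&mut self, prob: &mut u16) -> Option<u32> {
        if self.range < TOP {
            let byte = *self.input.get(self.pos)?;
            self.pos += 1;
            self.range <<= 8;
            self.code = (self.code << 8) | u32::from(byte);
        }
        let bound = (self.range >> 11) * u32::from(*prob);
        if self.code < bound {
            self.range = bound;
            *prob += (2048 - *prob) >> 5;
            Some(0)
        } else {
            self.code -= bound;
            self.range -= bound;
            *prob -= *prob >> 5;
            Some(1)
        }
    }
}

fn main() {
    // Made-up stream: the initial zero byte plus a code of 0, then end of input.
    let stream = [0u8, 0, 0, 0, 0];
    let mut dec = RangeDecoder::new(&stream).expect("stream has the 5 init bytes");
    println!("looks finished before decoding: {}", dec.looks_finished());
    let mut prob: u16 = 1024;
    let mut decoded = 0usize;
    while let Some(bit) = dec.decode_bit(&mut prob) {
        decoded += 1;
        if decoded <= 3 {
            println!("bit {} decoded, range is now {}", bit, dec.range);
        }
    }
    println!("decoded {} bits although the check had already passed", decoded);
}
```

In this toy model the range shrinks with every decoded bit while the code stays at 0, so "code is 0 and the input is exhausted" holds both before and after several more symbols are decoded. That is why such a check alone cannot decide that the stream is complete when a known unpacked size says more bytes are due.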
We are seeing the same issue, although with a very tiny difference:
Unfortunately, it is again in a file that I cannot share. I have also so far been unable to reproduce the issue with other files. However, the fix in #26 works for us as well. |
This fixes the error Expected unpacked size of x but decompressed to y gendx/lzma-rs#11
As mentioned in #26 (comment) the issue can also be reproduced by compressing the

use lzma_rs;
use std::io::prelude::*;

fn main() {
    let mut x = Vec::new();
    std::fs::File::open("tests/files/range-coder-edge-case")
        .unwrap()
        .read_to_end(&mut x)
        .unwrap();
    let encode_options = lzma_rs::compress::Options {
        unpacked_size: lzma_rs::compress::UnpackedSize::WriteToHeader(Some(x.len() as u64)),
    };
    let decode_options = lzma_rs::decompress::Options {
        unpacked_size: lzma_rs::decompress::UnpackedSize::ReadFromHeader,
    };
    let mut compressed: Vec<u8> = Vec::new();
    lzma_rs::lzma_compress_with_options(
        &mut std::io::BufReader::new(x.as_slice()),
        &mut compressed,
        &encode_options,
    )
    .unwrap();
    let mut bf = std::io::BufReader::new(compressed.as_slice());
    let mut decomp: Vec<u8> = Vec::new();
    lzma_rs::lzma_decompress_with_options(&mut bf, &mut decomp, &decode_options).unwrap();
}
|
26: don't check for EOF when unpacked size is specified r=gendx a=ibaryshnikov

### Pull Request Overview

This pull request fixes the case where the last byte is 0 with known unpacked size. Related issue is #11, in particular #11 (comment)

### Testing Strategy

This pull request was tested by...

- [x] Added relevant unit tests.
- [ ] Added relevant end-to-end tests (such as `.lzma`, `.lzma2`, `.xz` files).

### Supporting Documentation and References

The best reference I was able to find is https://svn.python.org/projects/external/xz-5.0.3/doc/lzma-file-format.txt

```
Uncompressed Size is stored as unsigned 64-bit little endian integer.
A special value of 0xFFFF_FFFF_FFFF_FFFF indicates that Uncompressed
Size is unknown. End of Payload Marker (*) is used if and only if
Uncompressed Size is unknown.
```

### TODO, help wanted

I'm unsure what self.rep[0] is checking here

```rust
if self.rep[0] == 0xFFFF_FFFF {
    if rangecoder.is_finished_ok()? {
        break;
    }
    return Err(error::Error::LZMAError(String::from(
        "Found end-of-stream marker but more bytes are available",
    )));
}
```

Co-authored-by: ibaryshnikov <[email protected]>
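One way to read the quoted rule is as a stop condition for the decode loop: with a known size, stop on the byte count; with an unknown size, stop on the end marker. The sketch below takes that strict reading as an assumption. `Step` and `next_step` are hypothetical names, this is not the code from the pull request, and real decoders may be more lenient about a marker appearing alongside a known size.

```rust
#[derive(Debug, PartialEq)]
enum Step {
    Continue,
    Finished,
}

/// Decide whether the decode loop should go on, given the size from the
/// header (if any), the bytes written so far, and whether the end-of-payload
/// marker was just seen. Hypothetical helper for illustration only.
fn next_step(unpacked_size: Option<u64>, written: u64, saw_end_marker: bool) -> Result<Step, String> {
    match unpacked_size {
        // Known size: the byte count alone decides, no marker or EOF check.
        Some(size) if written > size => {
            Err(format!("wrote {} bytes, header promised {}", written, size))
        }
        Some(size) if written == size => Ok(Step::Finished),
        Some(size) if saw_end_marker => {
            Err(format!("end marker after {} of {} bytes", written, size))
        }
        Some(_) => Ok(Step::Continue),
        // Unknown size: only the end marker terminates the stream.
        None if saw_end_marker => Ok(Step::Finished),
        None => Ok(Step::Continue),
    }
}

fn main() {
    println!("{:?}", next_step(Some(5048), 5046, false));
    println!("{:?}", next_step(Some(5048), 5048, false));
    println!("{:?}", next_step(None, 5046, true));
}
```

With this shape, a stream whose range coder happens to reach code 0 at end of input two bytes early would still keep decoding until the promised size is reached.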
I'm having the same issue with this file:

fn main() {
    let mut file = std::io::BufReader::new(std::fs::File::open("Unity.tar.xz").unwrap());
    let mut decomp: Vec<u8> = Vec::new();
    lzma_rs::xz_decompress(&mut file, &mut decomp).unwrap();
}

This code produces the following error:
|
Any ideas on what could cause this? Code: