-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reject all DEFLATE streams that zlib rejects #288
Comments
There are several edge cases where for performance reasons, libdeflate is intentionally more accepting than zlib, in a safe way. The specific case that your example triggers is the case where the encoded codeword lengths expand to more than the number of codewords. But there are a few others too. There isn't any real problem with doing this, since in general corruption in a DEFLATE stream can only be detected by a checksum anyway. Can you elaborate on why you consider this to be a problem? |
Hi, @ebiggers , thanks for your explantion. Generally, any corruption of the compressed data should be timely notified to the users because it may lead to severe effects and difficult to make recovery. This is different from uncompressed text, which may probably still be useful despite the presence of some corrupted bytes. It seems unlikely a real problem because there is a very low possibility that both checksum and corrupted data are satisfied at the same time. However, considering its wide usage, including some critical systems, the situation will be changed when a stealthy attacker is involved (e.g., an attacker may combine other bugs to hijack the checksum function, which can be used to correct the checksum maliciously). |
Yes, which is why people who want to detect data corruption need to use a checksum (e.g. as the gzip and zlib wrapper formats for DEFLATE do), and not rely on the incidental built-in redundancies of the DEFLATE format which are much, much less effective at detecting data corruption. Corrupting a DEFLATE stream will very often create another valid DEFLATE stream. In contrast, just a 32-bit checksum will detect 99.99999997% of corruptions. The question of when the DEFLATE decompressor should report an error when it's given an invalid stream, vs. remap it to a valid stream, is really just a minor quality-of-implementation question. I'd argue that reporting DEFLATE decompression errors is actually sort of bad, because it misleads people into thinking that DEFLATE has built-in error detection, which it doesn't. You need a checksum if you want to detect data corruption. That being said, I do understand that zlib is the standard implementation of DEFLATE, and it's generally better to be consistent with it... if only so that people running fuzzers that compare the implementations aren't confused. It's easy to make libdeflate return an error when "the encoded codeword lengths expand to more than the number of codewords", so I'll do that. That handles two of your examples. However, that still leaves the fact that DEFLATE streams can contain invalid litlen and offset symbols. Those are really hard to handle efficiently, other than by remapping them to valid symbols (as libdeflate does). I am probably not going to change what libdeflate does for those, as it does not seem worth it...
Detecting malicious modifications is totally irrelevant here, as a cryptographic MAC would be needed for that. |
When there are more encoded litlen and offset codeword lengths than there are supposed to be based on the earlier fields, return LIBDEFLATE_BAD_DATA instead of ignoring the extra lengths. This isn't very important, but this aligns the behavior with zlib. Also add a regression test for this change. Update #288
When there are more encoded litlen and offset codeword lengths than there are supposed to be based on the earlier fields, return LIBDEFLATE_BAD_DATA instead of ignoring the extra lengths. This isn't very important, but this aligns the behavior with zlib. Also add a regression test for this change. Update #288
When there are more encoded litlen and offset codeword lengths than there are supposed to be based on the earlier fields, return LIBDEFLATE_BAD_DATA instead of ignoring the extra lengths. This isn't very important, but this aligns the behavior with zlib. Also add a regression test for this change. Update #288
65da376 handled 2574818 would handle |
thanks for your explanation and bug fixing! |
Hi, I recently fuzz the libdeflate for parsing zlib format file and found some interesting cases.
Specifically, libdeflate accepts the file without any issue while another parser, the zlib rejects it, and I also contacted the zlib authors.
To check it whether valid or not, I use the following code (mainly from
zlib_decompress/fuzz.c
)These interesting files are attached!
pocs.zip
The text was updated successfully, but these errors were encountered: