Reject all DEFLATE streams that zlib rejects #288

Dongmuliang · 2023-01-09T17:09:50Z

Hi, I recently fuzzed libdeflate's zlib-format decompressor and found some interesting cases.
Specifically, libdeflate accepts these files without any issue, while another parser, zlib, rejects them. I have also contacted the zlib authors.

To check whether each file is valid, I used the following code (mostly taken from zlib_decompress/fuzz.c):

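#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#include <libdeflate.h>

int main(int argc, char **argv)
{
	struct libdeflate_decompressor *d;
	struct stat stbuf;
	int ret;

	assert(argc == 2);
	int fd = open(argv[1], O_RDONLY);
	assert(fd >= 0);
	ret = fstat(fd, &stbuf);
	assert(!ret);

	/* Read the whole input file onto the heap so large inputs can't overflow the stack. */
	size_t in_size = stbuf.st_size;
	char *in = malloc(in_size + 1);
	assert(in != NULL);
	ssize_t n = read(fd, in, in_size);
	assert(n == (ssize_t)in_size);
	close(fd);

	/* Allow the output to be up to 30x the input size. */
	size_t out_avail = in_size * 30;
	char *out = malloc(out_avail + 1);
	assert(out != NULL);

	d = libdeflate_alloc_decompressor();
	assert(d != NULL);
	size_t out_size = 0;

	enum libdeflate_result res = libdeflate_zlib_decompress(d, in, in_size,
								 out, out_avail, &out_size);
	printf("decode res:%d\n", res);

	libdeflate_free_decompressor(d);
	free(in);
	free(out);
	return 0;
}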

These interesting files are attached!
pocs.zip


ebiggers · 2023-01-09T18:08:24Z

There are several edge cases where, for performance reasons, libdeflate is intentionally more accepting than zlib, in a safe way. The specific case that your example triggers is the case where the encoded codeword lengths expand to more than the number of codewords. There are a few others too.
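For context: in a dynamic-Huffman DEFLATE block (RFC 1951, section 3.2.7), the HLIT + 257 litlen codeword lengths and HDIST + 1 offset codeword lengths are sent as one sequence of precode symbols, where symbols 16, 17, and 18 expand into runs. A run near the end can claim more lengths than remain. The sketch below uses hypothetical names (`struct precode_item`, `expand_lens`) and assumes the precode symbols and their extra bits have already been read from the bitstream; it is not libdeflate's code. It shows both the strict, zlib-like check and the lenient alternative of truncating the run.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One precode symbol plus its already-read extra bits (hypothetical type). */
struct precode_item {
	uint8_t sym;   /* 0..15: a literal length; 16, 17, 18: run codes */
	uint8_t extra; /* value of the symbol's extra bits */
};

/*
 * Expand precode items into num_lens codeword lengths
 * (num_lens = HLIT + 257 + HDIST + 1).  Returns the number of items
 * consumed, or -1 for invalid data.  With strict == true, a run that would
 * write past num_lens is rejected (zlib's behavior); otherwise the excess
 * lengths are silently dropped.
 */
int expand_lens(const struct precode_item *items, size_t num_items,
		uint8_t *lens, unsigned num_lens, bool strict)
{
	unsigned i = 0;
	size_t k = 0;

	while (i < num_lens) {
		if (k >= num_items)
			return -1; /* ran out of input */
		struct precode_item it = items[k++];
		unsigned rep;
		uint8_t val;

		if (it.sym < 16) {
			lens[i++] = it.sym;
			continue;
		} else if (it.sym == 16) {
			if (i == 0)
				return -1; /* no previous length to repeat */
			val = lens[i - 1];
			rep = 3 + (it.extra & 3);    /* 2 extra bits: 3..6 */
		} else if (it.sym == 17) {
			val = 0;
			rep = 3 + (it.extra & 7);    /* 3 extra bits: 3..10 */
		} else if (it.sym == 18) {
			val = 0;
			rep = 11 + (it.extra & 127); /* 7 extra bits: 11..138 */
		} else {
			return -1; /* the precode only has symbols 0..18 */
		}

		if (rep > num_lens - i) {
			if (strict)
				return -1;       /* run overshoots the declared count */
			rep = num_lens - i;      /* lenient: drop the extra lengths */
		}
		while (rep--)
			lens[i++] = val;
	}
	return (int)k;
}
```

For example, with `num_lens = 5`, the items `{8, 0}, {18, 0}` encode one length followed by a run of 11 zeros, overshooting by 7: the strict mode returns -1, and the lenient mode fills the remaining 4 slots with zeros.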

There isn't any real problem with doing this, since in general corruption in a DEFLATE stream can only be detected by a checksum anyway.

Can you elaborate on why you consider this to be a problem?

Dongmuliang · 2023-01-09T23:34:06Z

Hi @ebiggers, thanks for your explanation. In general, users should be promptly notified of any corruption in compressed data, because it can have severe effects and be difficult to recover from. This is different from uncompressed text, which may still be useful despite a few corrupted bytes.
Therefore, silently accepting such input is not a good choice.

This seems unlikely to be a real problem, because there is a very low probability that corrupted data still matches its checksum. However, given libdeflate's wide usage, including in some critical systems, the situation changes when a stealthy attacker is involved (e.g., an attacker could combine this with other bugs to hijack the checksum function and use it to maliciously fix up the checksum).

ebiggers · 2023-01-10T03:32:34Z

> Hi, @ebiggers , thanks for your explantion. Generally, any corruption of the compressed data should be timely notified to the users because it may lead to severe effects and difficult to make recovery.

Yes, which is why people who want to detect data corruption need to use a checksum (e.g. as the gzip and zlib wrapper formats for DEFLATE do), and not rely on the incidental built-in redundancies of the DEFLATE format, which are much, much less effective at detecting data corruption. Corrupting a DEFLATE stream will very often create another valid DEFLATE stream. In contrast, just a 32-bit checksum will detect 99.99999997% of random corruptions.
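For reference, the zlib wrapper (RFC 1950) stores an Adler-32 checksum of the uncompressed data after the DEFLATE stream, and that is what actually catches corruption here. The 99.99999997% figure corresponds to 1 − 2^−32 for an ideal 32-bit check value; Adler-32 falls somewhat short of that ideal on very short inputs. A minimal sketch of the checksum itself, with `adler32` as an illustrative name:

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 as defined for the zlib format (RFC 1950). */
uint32_t adler32(const unsigned char *buf, size_t len)
{
	const uint32_t MOD_ADLER = 65521; /* largest prime below 2^16 */
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % MOD_ADLER; /* running sum of bytes */
		b = (b + a) % MOD_ADLER;      /* running sum of the sums */
	}
	return (b << 16) | a;
}
```

For example, `adler32("Wikipedia", 9)` is `0x11E60398`, and flipping a single bit of the input changes the result.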

The question of when the DEFLATE decompressor should report an error when it's given an invalid stream, vs. remap it to a valid stream, is really just a minor quality-of-implementation question.

I'd argue that reporting DEFLATE decompression errors is actually sort of bad, because it misleads people into thinking that DEFLATE has built-in error detection, which it doesn't. You need a checksum if you want to detect data corruption.

That being said, I do understand that zlib is the standard implementation of DEFLATE, and it's generally better to be consistent with it... if only so that people running fuzzers that compare the implementations aren't confused.

It's easy to make libdeflate return an error when "the encoded codeword lengths expand to more than the number of codewords", so I'll do that. That handles two of your examples.

However, that still leaves the fact that DEFLATE streams can contain invalid litlen and offset symbols. Those are really hard to handle efficiently, other than by remapping them to valid symbols (as libdeflate does). I am probably not going to change what libdeflate does for those, as it does not seem worth it...
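Background on "invalid" symbols: RFC 1951 defines 288 litlen symbols and 32 offset symbols, but states that litlen symbols 286-287 and offset symbols 30-31 never occur in compressed data, even though the transmitted codeword lengths can still assign codewords to them. The sketch below (hypothetical names, and an arbitrary remap target chosen only for illustration) contrasts the two policies: rejecting needs a compare-and-branch on every decoded symbol, while remapping can be decided once per block when the decode table is built.

```c
/* Number of symbols RFC 1951 allows to actually appear in the data. */
#define NUM_VALID_LITLEN_SYMS 286 /* 0..285; 286-287 never occur */
#define NUM_VALID_OFFSET_SYMS 30  /* 0..29;  30-31 never occur */

enum sym_policy {
	SYM_REJECT, /* zlib-like: report bad data */
	SYM_REMAP,  /* lenient: decode as some valid symbol instead */
};

/*
 * Hypothetical helper: return the symbol the decoder will act on, or -1
 * for bad data.  Under SYM_REJECT this check runs on every symbol in the
 * hot loop.  Under SYM_REMAP the same mapping could instead be baked into
 * the decode table once per block, so the hot loop never checks.
 * Remapping to the last valid symbol is an illustrative choice, not
 * necessarily what libdeflate does.
 */
int resolve_sym(unsigned sym, unsigned num_valid, enum sym_policy policy)
{
	if (sym < num_valid)
		return (int)sym;
	if (policy == SYM_REJECT)
		return -1;
	return (int)(num_valid - 1);
}
```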

> the situation will be changed when a stealthy attacker is involved (e.g., an attacker may combine other bugs to hijack the checksum function, which can be used to correct the checksum maliciously).

Detecting malicious modifications is totally irrelevant here, as a cryptographic MAC would be needed for that.

When there are more encoded litlen and offset codeword lengths than there are supposed to be based on the earlier fields, return LIBDEFLATE_BAD_DATA instead of ignoring the extra lengths. This isn't very important, but this aligns the behavior with zlib. Also add a regression test for this change.

Update #288

ebiggers · 2023-01-10T06:49:08Z

65da376 handled poc1 and poc4.

2574818 would handle poc2 and poc3, but it would be a more complex change. I'm not sure it's worth merging, given that the existing behavior is safe and acceptable too. It also would only detect invalid offset symbols when they occur less than 4 GiB into the stream. I haven't found a way to detect them at positions greater than 4 GiB without adding overhead to the decompression inner loop. I don't want to slow down decompression for everyone just because of some artificial test.
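Purely as an illustration of how a free check could stop working past 4 GiB (an assumption about the general approach, not a description of what 2574818 actually does): suppose invalid offset symbols 30-31 are given a sentinel base offset of `UINT32_MAX`, so that they fail the "match reaches back before the start of the output" check every match already goes through. That check compares against the number of bytes written so far, so once roughly 4 GiB have been output, the sentinel no longer exceeds it. The names `BAD_OFFSET_BASE`, `base_for_sym`, and `match_ok` are hypothetical; the base offsets are the ones from RFC 1951, section 3.2.5.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base offsets for the valid offset symbols 0..29 (RFC 1951, 3.2.5). */
static const uint32_t offset_base[30] = {
	1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
	257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
	8193, 12289, 16385, 24577,
};

/* Hypothetical sentinel for the invalid offset symbols 30 and 31. */
#define BAD_OFFSET_BASE UINT32_MAX

static uint32_t base_for_sym(unsigned sym)
{
	return sym < 30 ? offset_base[sym] : BAD_OFFSET_BASE;
}

/*
 * The existing bounds check: a match may not reach back past the start of
 * the output.  'pos' is the number of bytes written so far.  Invalid
 * symbols fail this check at no extra cost, but only while
 * pos < BAD_OFFSET_BASE + extra, i.e. in roughly the first 4 GiB.
 */
bool match_ok(unsigned offset_sym, uint32_t extra, uint64_t pos)
{
	uint64_t offset = (uint64_t)base_for_sym(offset_sym) + extra;
	return offset <= pos;
}
```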

Dongmuliang · 2023-01-10T08:21:58Z

Thanks for your explanation and the bug fix!

ebiggers changed the title from "accepting invalid inputs" to "Reject all DEFLATE streams that zlib rejects" Jan 10, 2023

ebiggers added the enhancement label Jan 10, 2023

ebiggers mentioned this issue Jan 10, 2023

deflate_decompress: return error when too many codeword lengths #289

Merged

ebiggers added a commit that referenced this issue Jan 10, 2023

deflate_decompress: return error when invalid symbol used

2574818

Update #288
