Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write dict_id in zlib header in big-endian style. #207

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ZhaiMo15
Copy link

@ZhaiMo15 ZhaiMo15 commented Mar 18, 2022

To use a preset dictionary, a dict_id should be written in zlib header. That information should be written in big-endian style but not little-endian.

Mo Zhai [email protected]

@gbtucker
Copy link
Contributor

Looks like it should be written as bit-endian. However, how about int isal_read_zlib_header()?

int isal_read_zlib_header(struct inflate_state *state, struct isal_zlib_header *zlib_hdr)
{
...
                if (zlib_hdr->dict_flag) {
        case ISAL_ZLIB_DICT:
                        ret = fixed_size_read(state, &next_in, ZLIB_DICT_LEN);
...

The fixed_size_read() uses a cpy so maybe this works. Would we great to test on a big-endian machine.

@ZhaiMo15
Copy link
Author

I don't have a big-endian machine, I can't prove whether it works or not. However, zlib trailer is the same as zlib dict, if zlib trailer worked fine on a big-endian machine, I think zlib dict would work as well.

@rhpvorderman
Copy link
Contributor

Does ISA-L support any big-endian machines? x86_64 is little-endian and aarch64 is mostly little-endian, if I remember correctly.

@gbtucker
Copy link
Contributor

I don't have a big-endian machine, I can't prove whether it works or not. However, zlib trailer is the same as zlib dict, if zlib trailer worked fine on a big-endian machine, I think zlib dict would work as well.

Good point but the trailer is read as trailer = load_le_u32(next_in); in igzip/igzip_inflate.c:2004. check_zlib_checksum().

Does ISA-L support any big-endian machines? x86_64 is little-endian and aarch64 is mostly little-endian, if I remember correctly.

@iii-i did a lot of work to ensure s390, big-endian was correct but we don't have one in the test pool.

@ZhaiMo15
Copy link
Author

I still think it's the same.

  1. zlib dict read, in igzip/igzip_inflate.c:2166-2173:
case ISAL_ZLIB_DICT:
	ret = fixed_size_read(state, &next_in, ZLIB_DICT_LEN);
	if (ret) {
		state->block_state = ISAL_ZLIB_DICT;
		break;
	}

	zlib_hdr->dict_id = load_le_u32(next_in);
  1. zlib trailer read, in igzip/igzip_inflate.c:1998-2004:
ret = fixed_size_read(state, &next_in, ZLIB_TRAILER_LEN);
if (ret) {
	state->block_state = ISAL_CHECKSUM_CHECK;
	return ret;
}

trailer = load_le_u32(next_in);

@iii-i
Copy link
Contributor

iii-i commented Mar 22, 2022

You can get a big-endian machine for testing here: https://www.ibm.com/community/z/linuxone-cc/login-vm/

@rhpvorderman
Copy link
Contributor

Is anything blocking this? This seems like a major bug fix (current behavior not in line with zlib format specification).

@gbtucker
Copy link
Contributor

gbtucker commented Nov 5, 2022

@rhpvorderman, I don't think this one is correct. I haven't seen any confirmation that this is an actual bug and this fix may introduce a bug if it is currently correct. As I understood it, @MoZhai15 added this from observation in an unrelated issue without any BE testing.

I created a PR, #207, if it is a bug indeed.

Also this line was changed explicitly by Ilya to store_le_u32() in the port to BE as were all the header writes.

There is currently an issue with the igzip_wrapper_hdr_test but if you skip these it reports Error Code 17: Incorrect dictionary id found showing that this fix is at least inconsistent between how the dictid is written and read.

So I would label this as waiting on test and confirmation.

@rhpvorderman
Copy link
Contributor

I added some tests to python-isal. (https://github.com/pycompression/python-isal/blob/a6c2dcff4e2386d7c9b5a6406ccf399ad3a4d3de/tests/test_compat.py#L170)

  • deflate block with zlib wrapper created by zlib using a dict. decompression by isa-l.
  • deflate block with zlib wrapper created by isal using a dict. decompressed by zlib.

The later test succeeds. So it looks like ISA-L writes the dict correctly. The former test fails however. So ISA-L apparently cannot parse the zlib dict correctly? Or something goes wrong in the python wrapper code. That is also a possibility.

@pablodelara
Copy link
Contributor

Hi @rhpvorderman. Is this still an issue? Thanks!

@ZhaiMo15
Copy link
Author

ZhaiMo15 commented Jan 4, 2024

I still believe so, but it's been a long time, I cannot find my demo.
As I mentioned in #206.
My old case:

Moreover, I'd like to ask a question about isal_write_zlib_header(). Suppose dict_id is 0x62c0215, isal zlib header looks like below:

00000000  78 7d 15 02 2c 06 f3 54  
             5 |<- d id  ->|<- deflate ...

However default zlib header looks below:

00000000  78 7d 06 2c 02 15 f3 54  
             5 |<- d id  ->|<- deflate ...

Should dict_id be written in zlib header in big endian? Namely, replace

if (dict_flag)
	store_le_u32(out_buf + 2, z_hdr->dict_id);

by

if (dict_flag)
	store_be_u32(out_buf + 2, z_hdr->dict_id);

And after this patch, it works fine.

Consider of zlib trailer, write_trailer in BE
igzip/igzip.c:2016

store_be_u32(stream->next_out,
	(crc & 0xFFFF0000) | ((crc & 0xFFFF) + 1) % ADLER_MOD);

read_trailer in LE
igzip/igzip_inflate.c:2151

zlib_hdr->dict_id = load_le_u32(next_in);

Then zlib read dict in LE.
igzip/igzip.c:1130

store_le_u32(out_buf + 2, z_hdr->dict_id);

zlib write dict in LE.
igzip/igzip_inflate.c:1982

trailer = load_le_u32(next_in);

My point is trailer and dict are similar, so the read dict should be BE.

@rhpvorderman
Copy link
Contributor

Hi @rhpvorderman. Is this still an issue? Thanks!

The test cases in python-isal work. So not an issue anymore.

@pablodelara
Copy link
Contributor

Thanks! Actually, with this patch, the "igzip_wrapper_hdr_test" unit test fails, so unless the read has to be in big endian, this will break the tests. Since python-isal works, it looks like the header produced by isa-l is the same as the one produced by zlib?

@rhpvorderman
Copy link
Contributor

Looks like it. Python-isal tests compatibility against zlib. Maybe the best course of action is to keep the code as is until someone files a bug report about wrong headers?

@pablodelara
Copy link
Contributor

Looks like it. Python-isal tests compatibility against zlib. Maybe the best course of action is to keep the code as is until someone files a bug report about wrong headers?

Thanks. Looking into those tests, it looks like the header is not actually read, so it won't matter how the dict_id was written, the dictionary is passed directly to both interfaces. Am I right?

@rhpvorderman
Copy link
Contributor

Yes, but the header is also parsed by zlib (the zstream is initiated with wbits=15). I don't know if it ignores the dict id if inflateSetDict or deflateSetDict functions have been used beforehand. I am not that knowledgeable on zlib internals.

@pablodelara
Copy link
Contributor

@rhpvorderman It might be ignoring the header, because I checked and isal is not writing the dict id. As Greg Tucker pointed out here #206 (comment), this is expected and isal_write_zlib_header should be called.

@rhpvorderman
Copy link
Contributor

@pablodelara Ah, ok. Well in that case python-isal might do something wrong. It is not a common use case at all for python-isal, so the issue is certainly not a blocker for me. I also do not have time to investigate this further on short notice. Sorry for that.

@pablodelara
Copy link
Contributor

Thanks for the input. Let's defer this for the next release. It is not clear there is something wrong, as RFC1950 doesn't state the endianness of this dict_id field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants