Zlib.gunzip should not fail with utf-8 strings
zstream_discard_input was encoding- and character-aware when the given input is user-provided, so it discarded `len` characters instead of `len` bytes.
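
To illustrate the difference at the Ruby level, here is a minimal sketch. It assumes that `String#[]` and `String#byteslice` behave roughly like the character-aware and byte-based C functions involved; the `input` string and `len` value are hypothetical and not taken from zlib itself.

```ruby
# Hypothetical input: one 2-byte UTF-8 character (U+0080) followed by ASCII.
input = "\u0080hello"
len = 2 # number of bytes we intend to discard

# Character-aware slicing skips `len` characters (U+0080 and "h").
by_chars = input[len, input.bytesize - len]

# Byte-based slicing skips exactly `len` bytes (the two bytes of U+0080).
by_bytes = input.byteslice(len, input.bytesize - len)

p input.length   # 6 characters
p input.bytesize # 7 bytes
p by_chars       # "ello"
p by_bytes       # "hello"
```

With character-aware slicing, one extra byte is dropped for every multibyte character in the discarded prefix, so the remaining stream no longer lines up with what the decompressor expects.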

Also, the rdoc for Zlib.gunzip says it is equivalent to the following code, but that code does not fail for a UTF-8 String.

```ruby
string = %w[1f8b0800c28000000003cb48cdc9c9070086a6103605000000].pack("H*").force_encoding('UTF-8')
sio = StringIO.new(string)
gz = Zlib::GzipReader.new(sio, encoding: Encoding::ASCII_8BIT)
p gz.read
gz&.close
```
sorah committed Aug 10, 2023
1 parent a68a1f7 commit 68a944f
Showing 2 changed files with 8 additions and 1 deletion.
2 changes: 1 addition & 1 deletion ext/zlib/zlib.c
@@ -923,7 +923,7 @@ zstream_discard_input(struct zstream *z, long len)
z->input = Qnil;
}
else {
- z->input = rb_str_substr(z->input, len,
+ z->input = rb_str_subseq(z->input, len,
RSTRING_LEN(z->input) - len);
}
}
7 changes: 7 additions & 0 deletions test/zlib/test_zlib.rb
@@ -1457,6 +1457,13 @@ def test_gunzip
assert_raise(Zlib::GzipFile::Error){ Zlib.gunzip(src) }
end

# Zlib.gunzip input is always considered a binary string, regardless of its String#encoding.
def test_gunzip_encoding
#                vvvvvvvv = mtime, but valid UTF-8 string of U+0080
src = %w[1f8b0800c28000000003cb48cdc9c9070086a6103605000000].pack("H*").force_encoding('UTF-8')
assert_equal 'hello', Zlib.gunzip(src.freeze)
end

def test_gunzip_no_memory_leak
assert_no_memory_leak(%[-rzlib], "#{<<~"{#"}", "#{<<~'};'}")
d = Zlib.gzip("data")
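
The test input above can be inspected without decompressing it. The following is a small sketch, assuming the fixed 10-byte gzip header layout from RFC 1952 (ID1, ID2, CM, FLG, 4-byte little-endian MTIME, XFL, OS); the variable names are illustrative only.

```ruby
# Same bytes as in test_gunzip_encoding, kept as a binary string here.
raw = %w[1f8b0800c28000000003cb48cdc9c9070086a6103605000000].pack("H*")

# Unpack the fixed header: four single bytes, a 32-bit little-endian mtime,
# then two more single bytes.
id1, id2, cm, flg, mtime, xfl, os = raw.unpack("CCCCVCC")

# The mtime field occupies bytes 4..7 of the header.
mtime_bytes = raw.byteslice(4, 4)

# The first two mtime bytes form a well-formed UTF-8 sequence for U+0080.
as_utf8 = mtime_bytes.byteslice(0, 2).force_encoding('UTF-8')

p [id1, id2, cm, flg, xfl, os] # [31, 139, 8, 0, 0, 3]
p mtime                        # 32962
p mtime_bytes.unpack1("H*")    # "c2800000"
p as_utf8.valid_encoding?      # true
p as_utf8.codepoints           # [128]
```

Because `c2 80` decodes as a single character, a character-aware discard of the header consumes one byte more than intended, which is the situation the new test exercises.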
