
How about a paired web minifier? #65

Open
JoeUX opened this issue May 2, 2020 · 2 comments

JoeUX commented May 2, 2020

Enhancement request/idea: Consider writing a minifier to pair with libdeflate, since a major use case of libdeflate is probably to compress web content – HTML, JS, and CSS files.

It might be possible to leverage libdeflate's parsing logic in a web minifier's parser. Relatedly, if you knew that the input to the gzip compressor was minified HTML, CSS, and JS, could you accelerate the compression? Maybe in the parser or the match finder? Or, what if you knew that you weren't going to have any matches longer than, say, 40 bytes?
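To make that last idea concrete, here is a rough sketch of a greedy match-extension loop with a hypothetical `max_match_len` cap. Nothing like this function exists in libdeflate; it just shows where such a cap would short-circuit the work:

```c
#include <stddef.h>

/*
 * Illustrative only: libdeflate's real match finder is far more
 * sophisticated. This just shows where a cap like "40 bytes" would
 * let the loop stop early.
 */
static size_t
match_len_capped(const unsigned char *cur, const unsigned char *prev,
                 size_t remaining, size_t max_match_len)
{
    size_t limit = remaining < max_match_len ? remaining : max_match_len;
    size_t len = 0;

    while (len < limit && cur[len] == prev[len])
        len++;
    return len; /* never bother extending past the cap */
}
```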

Some of this might be easier if the input weren't just arbitrary minified web content, but content minified by your own minifier, following its rules, standards, or spec. The output would then have predictable features: line endings would be normalized, certain forms of whitespace would never appear in code areas, and so on. We could even ship metadata alongside these HTML, JS, and CSS files reporting the file length, the length of the longest repeated string, the maximum number of repeats, etc. Could that metadata significantly help libdeflate? (A sketch of what such metadata might look like follows.)
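Purely hypothetically, the side-channel metadata might be something like this; none of these fields exist in libdeflate or any minifier today, this is just the idea written down as a struct:

```c
#include <stddef.h>

/* Hypothetical metadata a cooperating minifier could emit per file. */
struct minified_metadata {
    size_t file_len;           /* exact input length in bytes */
    size_t longest_repeat_len; /* length of the longest repeated string */
    size_t max_repeat_count;   /* most times any single string repeats */
};
```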

I'm not aware of a minifier written in C. It would probably be the fastest minifier on earth, by a wide margin, especially if it reused the SIMD parsing and matching techniques from libdeflate, and it would be an interesting and popular project in its own right. I'm not good with C, but I could write a spec for the minifier and/or its output. Some minifiers are unsafe and break websites, so it would be nice to have one that actually had a spec and was safe.
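As a taste of the kind of SIMD scanning I mean, here is a minimal sketch that finds the first space or newline 16 bytes at a time. It assumes x86 with SSE2 and a GCC/Clang builtin, and a real minifier would of course need full HTML/CSS/JS tokenization on top of this:

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Return the index of the first ' ' or '\n' in p[0..n), or n if none. */
static size_t
find_whitespace_sse2(const unsigned char *p, size_t n)
{
    const __m128i space = _mm_set1_epi8(' ');
    const __m128i nl    = _mm_set1_epi8('\n');
    size_t i = 0;

    for (; i + 16 <= n; i += 16) {
        __m128i v   = _mm_loadu_si128((const __m128i *)(p + i));
        __m128i hit = _mm_or_si128(_mm_cmpeq_epi8(v, space),
                                   _mm_cmpeq_epi8(v, nl));
        int mask = _mm_movemask_epi8(hit);

        if (mask) /* lowest set bit = first matching byte */
            return i + __builtin_ctz(mask);
    }
    for (; i < n; i++) /* scalar tail for the last < 16 bytes */
        if (p[i] == ' ' || p[i] == '\n')
            return i;
    return n;
}
```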

@HansBrende

@ebiggers I am also curious to know whether there is a possible performance enhancement if you knew in advance that there are no matches longer than some configurable number of bytes (use case: I'm compressing streams of JSON objects, and each object contains a UUID, so no match could possibly be longer than the maximum distance between UUIDs).
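For reference, what I do today uses libdeflate's stock gzip API, which as far as I can tell has no such knob; the cap would have to be a new option:

```c
#include <libdeflate.h>
#include <stddef.h>

/* Compress a JSON buffer with libdeflate's actual gzip API. */
static size_t
gzip_json(const void *json, size_t json_len, void *out, size_t out_avail)
{
    struct libdeflate_compressor *c = libdeflate_alloc_compressor(6);
    size_t n = 0;

    if (c != NULL) {
        /* Returns 0 if the result doesn't fit in out_avail bytes;
         * libdeflate_gzip_compress_bound() gives a safe buffer size. */
        n = libdeflate_gzip_compress(c, json, json_len, out, out_avail);
        libdeflate_free_compressor(c);
    }
    return n;
}
```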

ebiggers (Owner) commented Jun 2, 2023

> I am also curious to know if there is a possible performance enhancement if you knew in advance that there are no matches longer than some configurable number of bytes

In principle, sure. In practice, the gain wouldn't be much, and it would be hard to implement, considering that the limit presumably would need to be dynamic rather than known at compile time.
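For illustration (a toy contrast, not libdeflate's actual code): DEFLATE already fixes the maximum match length at 258 bytes, so the hot loops can be built around a compile-time constant, whereas a user-configurable cap would have to flow through as a runtime value:

```c
#include <stddef.h>

#define MAX_MATCH_LEN 258 /* fixed by the DEFLATE format */

/* Constant bound: the compiler can fold, unroll, and specialize. */
static size_t
extend_static(const unsigned char *a, const unsigned char *b)
{
    size_t len = 0;

    while (len < MAX_MATCH_LEN && a[len] == b[len])
        len++;
    return len;
}

/* Runtime bound: an extra value threaded through every call site,
 * with no specialization possible. */
static size_t
extend_dynamic(const unsigned char *a, const unsigned char *b,
               size_t user_cap)
{
    size_t len = 0;

    while (len < user_cap && a[len] == b[len])
        len++;
    return len;
}
```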
