
[Engine] Evaluation of ZSTD (Z-standard) compression algorithm for log data #19

Open
Superskyyy opened this issue Sep 11, 2022 · 7 comments
Assignees
Labels
Core Core functionality that impacts the engine design Engine The work is on the engine side

Comments

@Superskyyy
Member

Superskyyy commented Sep 11, 2022

The AIOps engine will receive a large amount of log data from SkyWalking, and we decided to use a Redis stream as the buffer before stream processing. One noticeable issue is that standard Zlib cannot compress logs well when they arrive one by one (each small message gives the compressor too little context to exploit), according to how-compression-algorithm-works; this costs extra memory/disk.
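The one-by-one limitation is easy to reproduce with the stdlib (a quick sketch; the log line is invented):

```python
import zlib

# An invented, fairly repetitive log line, similar to service output.
log = b'2022-09-11 12:00:00 INFO service-a handled request in 3ms status=200\n'
logs = [log] * 1000

# Compressing each message on arrival: the compressor only ever sees
# ~70 bytes, so there is almost no redundancy to exploit.
per_message = sum(len(zlib.compress(m)) for m in logs)

# Compressing the same messages as one batch exposes the redundancy.
whole_batch = len(zlib.compress(b''.join(logs)))

print(per_message, whole_batch)
```

On repetitive log traffic the batch figure is a tiny fraction of the per-message sum, which is why a stream of individually compressed messages barely shrinks.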

Note that we delete logs from the stream immediately after processing, but it is still worth compressing them to save network bandwidth and avoid overloading Redis.

So here comes ZSTD, which can facilitate our flow in two directions: (I) simply replacing Zlib with ZSTD to achieve roughly 2x average compression speed; (II) using the dictionary compressor, that is, training on a small sample batch of logs and then using that knowledge to further boost compression, which could save extra memory/disk. (TODO: evaluate how to run the training phase: do we train one dictionary per service? one unified dictionary? retrain periodically? etc.)

Some public discussions that prove its feasibility:
https://groups.google.com/g/redis-db/c/slk-c33EZ7U/m/tx81gCMDDQAJ - adoption case
http://facebook.github.io/zstd/ - performance comparison
https://github.com/animalize/pyzstd - target python lib for implementation

=======================================
Initial experimentation results below; suggestions are welcome:

The results below show that ZSTD with a dictionary trained on a very small amount of log data (the first 1k samples; increasing to 5k doesn't help) from the same service saves about 33% more memory/disk when storing the remaining 500k messages.

(Further experiments are needed to see whether this is generally applicable.)
An additional idea: if we can compose a good dataset that represents what a normal log looks like, it could serve as universal training data and push the compression ratio even further.

Note: My docker Redis bandwidth is slow.

ZLIB
size of log in Megabyte 86.237173MB
Time taken to send 500k messages with batch 2000: 12.09048318862915 seconds
92MB used in actual Redis key

ZSTD with dict training
done training dict on first 1000 log samples
func:train_zstd took: 0.06115330 sec
size of log in Megabyte 54.717921MB
Time taken to send 500k messages with batch 2000: 8.131911993026733 seconds
58MB used in actual Redis key

ZSTD with basic compressor [default level]
size of log in Megabyte 88.285950MB
Time taken to send 500k messages with batch 2000: 9.860241889953613 seconds

ZSTD with rich memory compressor [default level] (slightly lower compression ratio)
size of log in Megabyte 88.386098MB
Time taken to send 500k messages with batch 2000: 9.413931131362915 seconds

@Superskyyy Superskyyy added Engine The work is on the engine side Core Core functionality that impacts the engine design labels Sep 11, 2022
@Superskyyy Superskyyy self-assigned this Sep 11, 2022
@wu-sheng
Member

Note: Redis is not allowed as a dependency in the ASF, due to its license.
It is OK to choose it for now.

@Superskyyy
Member Author

Superskyyy commented Sep 11, 2022

> Note: Redis is not allowed as a dependency in the ASF, due to its license.
> It is OK to choose it for now.

I checked: Redis core itself is BSD-3, and we do not use any extensions/modules that carry the RSAL license. Would that still be a problem? I'm a bit confused about these things and hope to learn more. Also, in skywalking-python, we have a docker-compose.yaml that deploys Redis during tests. Does that mean Redis can be used in dev and testing as long as the final release artifact doesn't involve it?

In the future, we could switch to shipping with Kvrocks, but unfortunately it doesn't yet fully support the stream consumer-group commands that we heavily rely on.

@wu-sheng
Member

Are you only using Redis core? Many modules are AGPL, or even Commons Clause.

I didn't check the features you are going to use, so this is just a reminder.

Also, you mentioned it works as a buffer, which is usually a queue server's role - why did you choose Redis as a queue?

@Superskyyy
Member Author

Superskyyy commented Sep 12, 2022

> Are you only using Redis core? Many modules are AGPL, or even Commons Clause.
>
> I didn't check the features you are going to use, so this is just a reminder.
>
> Also, you mentioned it works as a buffer, which is usually a queue server's role - why did you choose Redis as a queue?

Thanks for the clarification! I just rechecked, and it's strictly Redis core only; the screenshot below shows the Streams engine is part of core. I don't plan to use anything beyond core.
image

There are two main reasons why I choose Redis over a full-size MQ:

  1. We also use Redis to store machine-learning model snapshots and other metadata, so this avoids introducing yet another dependency, which would be too much for a secondary system (the AIOps engine) of a secondary system (SkyWalking).
  2. Redis Streams provides the same functionality and speed that Kafka would offer for our use case, but is easier to work with and maintain than a full MQ.

I plan to add support for queue-based storage (Kafka) in the long run. For now, I think Redis Streams works best.

@wu-sheng
Member

OK, as I said, even an AGPL module is fine for now, until you want to move this into the ASF.

@Superskyyy
Member Author

> OK, as I said, even an AGPL module is fine for now, until you want to move this into the ASF.

Understood, thank you!

@Superskyyy
Member Author

TODO: implement a self-optimizer that monitors the compression-ratio metric; if it degrades significantly, retrain the dictionary and propagate it to each consumer to restore compression performance.
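The monitoring half of that TODO can be sketched with the stdlib (zlib stands in for the zstd dictionary compressor here, and `RatioMonitor` is a made-up name; the real trigger would kick off `pyzstd.train_dict` and push the new dictionary to every consumer):

```python
import zlib
from collections import deque

class RatioMonitor:
    """Tracks a moving average of compression ratio and flags degradation."""

    def __init__(self, baseline_ratio, window=100, tolerance=0.8):
        self.baseline = baseline_ratio   # ratio measured right after training
        self.tolerance = tolerance       # retrain if we fall below 80% of it
        self.recent = deque(maxlen=window)

    def observe(self, raw, compressed):
        self.recent.append(len(raw) / len(compressed))

    def needs_retrain(self):
        # Wait for a full window before judging, to avoid noisy decisions.
        if len(self.recent) < self.recent.maxlen:
            return False
        avg = sum(self.recent) / len(self.recent)
        return avg < self.baseline * self.tolerance

# Usage with zlib as a stand-in compressor on compressible traffic:
mon = RatioMonitor(baseline_ratio=3.0, window=10)
for msg in [b'INFO request ok ' * 8] * 10:
    mon.observe(msg, zlib.compress(msg))
print(mon.needs_retrain())
```

If incoming logs drift away from what the dictionary was trained on, the observed ratio drops below the threshold and the retrain path fires.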
