About the On-stream analytics and storage cost of Fluss #177

Answered by wuchong
JohnZp asked this question in Q&A

Hi @JohnZp, thanks for the detailed testing! I will answer your questions below:

Why is it so much slower than reading just the data on the lake? Isn't the incremental part of Fluss all in memory?

The incremental part is the changelog written between the lake table snapshot time and now. Log and changelog data are stored on local disk (and tiered to remote storage if configured), not in memory. The current implementation of union read is very basic, with many optimizations planned for future versions. Currently, even a simple count() over the incremental part of a union read has to read all of the incremental changelog data into the query engine. This is inefficient and can be optimized to …
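To illustrate why a count over a changelog is more work than a count over a snapshot, here is a minimal conceptual sketch in Python. It is not Fluss code: the change kinds (`+I`, `+U`, `-D`), the `union_read_count` function, and the in-memory dict standing in for the lake snapshot are all hypothetical names chosen for illustration, and it assumes a primary-key table whose changelog carries inserts, updates, and deletes.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change kinds for illustration only (not a Fluss API).
INSERT, UPDATE, DELETE = "+I", "+U", "-D"

Change = Tuple[str, str, dict]  # (kind, primary key, row)


def union_read_count(snapshot: Dict[str, dict], changelog: Iterable[Change]) -> int:
    """Count rows visible after applying the changelog on top of the snapshot.

    Every changelog entry has to be inspected, because an update must not
    add a row and a delete must remove one.
    """
    merged = dict(snapshot)  # copy so the caller's snapshot is left untouched
    for kind, key, row in changelog:
        if kind == DELETE:
            merged.pop(key, None)
        else:
            # An insert adds a key; an update overwrites an existing key.
            merged[key] = row
    return len(merged)


snapshot = {"a": {"v": 1}, "b": {"v": 2}}
changelog = [
    (INSERT, "c", {"v": 3}),
    (UPDATE, "a", {"v": 10}),
    (DELETE, "b", {}),
    (INSERT, "d", {"v": 4}),
]

count = union_read_count(snapshot, changelog)
naive = len(snapshot) + len(changelog)  # simply adding the two sizes overcounts
```

In this toy example the merged count is 3 (keys a, c, d), while adding the snapshot size to the number of changelog entries would give 6. Under these assumptions, a correct count depends on the content of each change, which is consistent with the current behavior of shipping the whole incremental changelog to the query engine.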

Answer selected by polyzos