[Enhancement]: Compression job should process chunks in order of range_start #6755

Open
RobAtticus opened this issue Mar 8, 2024 · 3 comments · May be fixed by #7148
Labels
enhancement (An enhancement to an existing feature for functionality), internal-team-ask

Comments

@RobAtticus
Member

What type of enhancement is this?

User experience

What subsystems and features will be improved?

Compression

What does the enhancement do?

The compression job should process chunks in order of their range_start so that the experimental rollup functionality is more effective. Without an enforced order, chunks can be processed in a sequence that prevents full rollups: the job may start rolling up a chunk "later" in the timeline, then go back in the timeline, but by then the partially rolled-up chunk is too large to roll up into the one further back.
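To make the ordering problem concrete, here is a minimal toy model in Python. It is a sketch only and not TimescaleDB's actual rollup algorithm: the `roll_up` function, the `max_span` limit, and the rule "merge a chunk into the compressed chunk that ends exactly where it starts, if the result stays within `max_span`" are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (range_start, range_end), end exclusive, in days.
Chunk = Tuple[int, int]


def roll_up(chunks_in_processing_order: List[Chunk], max_span: int) -> List[Chunk]:
    """Toy rollup: merge each chunk into the already-compressed chunk that
    immediately precedes it, as long as the merged span fits in max_span."""
    compressed: List[Chunk] = []
    for start, end in chunks_in_processing_order:
        for i, (c_start, c_end) in enumerate(compressed):
            # Only roll up into a directly preceding chunk that still has room.
            if c_end == start and end - c_start <= max_span:
                compressed[i] = (c_start, end)
                break
        else:
            # No suitable predecessor: the chunk is compressed on its own.
            compressed.append((start, end))
    return sorted(compressed)


# Six one-day chunks, with a hypothetical rollup limit of three days.
daily = [(day, day + 1) for day in range(6)]

in_order = roll_up(sorted(daily), max_span=3)
skipping = roll_up([daily[i] for i in (2, 3, 0, 1, 4, 5)], max_span=3)

print(in_order)   # two full three-day chunks
print(skipping)   # three chunks of uneven size
```

Under these assumptions, processing in range_start order produces two full three-day chunks, while the order that skips around leaves three chunks of uneven size, because the chunk started later in the timeline can no longer be combined with the one further back.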

Implementation challenges

No response

@RobAtticus RobAtticus added the enhancement An enhancement to an existing feature for functionality label Mar 8, 2024
@nikkhils
Contributor

@RobAtticus the current show_chunks logic uses the hypertable_id and table_id numbering values to sort the returned chunks. Typically, if we consider append-only data insertions, that order should be in sync with the time ranges.

We could return the chunks in dimension slice order, though.

@RobAtticus
Member Author

Is show_chunks used as part of the compression policy job? Basically, what I've found is that the compression job will sometimes skip around in the set of chunks to be compressed, which leads to inefficient rollups. So this issue was about that, although I also think show_chunks should enforce dimension slice order rather than rely on the IDs (given backfills, untiering a chunk, etc.).
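As a rough illustration of why ID order and time order can diverge, the Python sketch below sorts the same set of chunks both ways. The `ChunkInfo` type and its fields are hypothetical stand-ins and are not TimescaleDB catalog structures; the only assumption is that IDs are handed out in creation order, so a backfilled chunk gets a high ID despite covering an old time range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkInfo:
    chunk_id: int     # hypothetical: assigned in creation order
    range_start: int  # hypothetical: start of the chunk's time range, in days


chunks = [
    ChunkInfo(chunk_id=1, range_start=10),
    ChunkInfo(chunk_id=2, range_start=11),
    ChunkInfo(chunk_id=3, range_start=12),
    # Backfill: created last, but covers the oldest time range.
    ChunkInfo(chunk_id=4, range_start=8),
]

# Ordering by ID matches time order only for append-only inserts.
by_id = sorted(chunks, key=lambda c: c.chunk_id)

# Ordering by range_start follows the timeline regardless of creation order.
by_range_start = sorted(chunks, key=lambda c: c.range_start)

print([c.range_start for c in by_id])
print([c.range_start for c in by_range_start])
```

With the backfilled chunk present, the ID-based order visits the oldest range last, which is the kind of skipping around described above.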

@nikkhils
Contributor

@RobAtticus yeah, show_chunks is used in the compression policy logic.

Yeah, maybe dimension_slice-based sorting is the way to go. We will also need documentation changes if we go this route.
