Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify compressed pathkey lookup #5960

Merged
merged 1 commit into from
Aug 15, 2023
Merged

Conversation

akuzm
Copy link
Member

@akuzm akuzm commented Aug 10, 2023

When looking up a pathkey for compressed scan, we used to do a lot of work, including a quadratic lookup through all the equivalence members, to always arrive at the same canonical pathkey we started from. Just remove this useless code for a significant planning speedup.

This uncovers two bugs, fix them as well (see comments below).

Disable-check: force-changelog-file

@codecov
Copy link

codecov bot commented Aug 10, 2023

Codecov Report

Merging #5960 (a02935e) into main (04ce1bc) will decrease coverage by 0.03%.
The diff coverage is 90.24%.

@@            Coverage Diff             @@
##             main    #5960      +/-   ##
==========================================
- Coverage   87.03%   87.01%   -0.03%     
==========================================
  Files         243      243              
  Lines       56023    55959      -64     
  Branches    12402    12382      -20     
==========================================
- Hits        48760    48692      -68     
+ Misses       4924     4905      -19     
- Partials     2339     2362      +23     
Files Changed Coverage Δ
src/planner/planner.h 85.71% <ø> (ø)
tsl/src/nodes/decompress_chunk/decompress_chunk.c 89.30% <90.24%> (-0.61%) ⬇️

... and 34 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Comment on lines +1497 to +1519
/*
* Note that we have to separately generate the parameterized path info
* for decompressed chunk path. The compressed parameterized path only
* checks the clauses on segmentby columns, not on the compressed
* columns.
*/
path->custom_path.path.param_info =
get_baserel_parampathinfo(root,
info->chunk_rel,
compressed_path->param_info->ppi_req_outer);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug was hidden because in add_segmentby_to_equivalence_class, besides the proper EquivalenceMembers for segmentby columns, we also generated nonsensical EMs for compressed columns. They translated to impossible join clauses and prevented the formation of parameterized paths where the decompressed and compressed parameterizations would be different.

@@ -1352,6 +1303,17 @@ add_segmentby_to_equivalence_class(EquivalenceClass *cur_ec, CompressionInfo *in
if (context->current_col_info == NULL)
continue;

if (!COMPRESSIONCOL_IS_SEGMENT_BY(context->current_col_info))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug, we didn't check if the column is a segmentby, and generated nonsensical EquivalenceMembers for compressed columns.

@akuzm akuzm changed the title Check if the compressed pathkey lookup function is just useless Simplify compressed pathkey lookup Aug 11, 2023
Filter: (("time" > 'Wed Jan 19 16:00:00 2000 PST'::timestamp with time zone) AND ("time" < 'Thu Jan 20 17:00:00 2000 PST'::timestamp with time zone))
Rows Removed by Filter: 1
-> Custom Scan (DecompressChunk) on _hyper_1_3_chunk met (actual rows=1 loops=2)
Filter: (("time" > 'Wed Jan 19 16:00:00 2000 PST'::timestamp with time zone) AND ("time" < 'Thu Jan 20 17:00:00 2000 PST'::timestamp with time zone) AND ("*VALUES*".column1 = device_id) AND ("*VALUES*".column2 = v0))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redundant filter on device_id (duplicate of index filter) is another bug, will be fixed here: #5962

@akuzm akuzm marked this pull request as ready for review August 11, 2023 12:22
@github-actions
Copy link

@mkindahl, @sb230132: please review this pull request.

Powered by pull-review


if (compressed_em == NULL)
{
/* Not a segmentby column, rest of pathkeys should be handled by compress_orderby */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to test this condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double-checked and seems like it is redundant with the loop condition, so I changed it to an assertion.

* then other ec classes */
prepend_ec_for_seqnum(root, info, sort_info, var, sortop, nulls_first);
/*
* Create the EquivalenceClass for the sequence number, so that we can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if sequence number is the right term here. Wouldn't compressed chunk be more appropriate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update it to say "the sequence number column". The rest of the pathkeys is fulfilled by sorting on this column, so here we create a EC for it.

@jnidzwetzki
Copy link
Contributor

Thank you for this PR. I assume this change fixes the planner regression we discussed. Do we have a tsbench run for this PR?

@akuzm
Copy link
Member Author

akuzm commented Aug 15, 2023

Thank you for this PR. I assume this change fixes the planner regression we discussed. Do we have a tsbench run for this PR?

I didn't run tsbench, but locally the result was good, about 2x improvement in planning speed for the queries like the one you added to tsbench.

When looking up a pathkey for compressed scan, we used to do a lot of
work, including a quadratic lookup through all the equivalence members,
to always arrive at the same canonical pathkey we started from. Just
remove this useless code for a significant planning speedup.

This uncovers two bugs in parameterization of decompressed paths and
generation of equivalence members for segmentby columns, fix them as
well.
@akuzm akuzm enabled auto-merge (rebase) August 15, 2023 12:30
@akuzm akuzm merged commit 3a27669 into timescale:main Aug 15, 2023
34 checks passed
@akuzm akuzm deleted the compressed-pathkey branch August 15, 2023 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants