Always use index for insert conflict resolution #7529
base: main
Conversation
Force-pushed from 39d7f54 to 4decd26
Maybe it makes sense to merge it to the 2.17.x directly. Approving since this is an urgent fix.
I think the proper fix would need more changes, e.g. the key_columns is not needed anymore. It also looks suboptimal that we're going through every index for every input tuple again.
Approving since this is an important fix.
Still, I think this function needs refactoring. As @akuzm points out, key_columns is no longer needed, and it looks like this is more complicated than it needs to be, without delving too deep into it now.
Could we at least get rid of the key_columns arg in this fix, since it is a dead argument?
Also, I wonder whether the other, similar functions that build scan keys have this problem too, since they also use key_columns? (I didn't actually check how it is used, so I cannot tell whether it is an issue or not.)
Force-pushed from 4decd26 to 75d728e
100% agree. Finding the index should be separated from building the scan key. Then we should open the index in the chunk insert state or similar and keep it open until the insert is finished. Only the scankey needs to be updated for the next tuple.
Actually, key columns are necessary; we just need to skip over columns which are not in key columns. The previous implementation could potentially give wrong results. As far as index caching goes, I don't disagree.
Ok, I guess I need to take another look. How could it give wrong results? Did it decompress the wrong segments, or?
```diff
 if (!bms_is_member(column_attno, key_columns))
 {
-    break;
+    continue;
 }
```
Will this really work? Don't you need at least the first index column as a scan key?
There is a check later for the number of scan keys a particular index can use; we only use the index if we can generate at least one scan key.
But we can't use an index if we only match, e.g., the last index key, right?
With this check you might generate a suffix. Let's say an index includes a, b, c: then you can use it if you match <a>, <a,b>, <a,b,c>, or <a,c>, right? But you cannot use the index if you match <b>, <c>, or <b,c>... or am I missing something?
Here it looks like you can generate a match for the latter examples too, since you can skip any column, including the first one.
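The prefix concern can be made concrete with a small sketch. For a btree index, only scan keys on an unbroken run of leading index columns act as search keys; any matched column after the first gap can at best filter rows. The helper below is hypothetical (not the actual TimescaleDB code), purely to illustrate the check:

```c
#include <stdbool.h>

/* Length of the leading run of index columns that have a scan key.
 * For a btree, only this prefix narrows the scan; matched columns
 * after a gap can only act as filters. (Hypothetical helper.) */
static int
usable_prefix_len(const bool *has_key, int nindexcols)
{
	int len = 0;

	while (len < nindexcols && has_key[len])
		len++;
	return len;
}
```

For an index on (a, b, c): keys on <a,c> give a usable prefix of 1, while keys on <b,c> give 0, i.e. the index cannot be seeked at all.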
Ideally, we should also look for the longest prefix match. Here it looks like we are happy with the first matching index, even if there is a better one that matches more keys.
Maybe the simple approach works for our case because we know what indexes exist, but I am not sure the code is really doing the right thing here.
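The longest-prefix selection suggested here could look roughly like the sketch below (hypothetical names, not the real code): compute the usable prefix length for each candidate index and keep the best one, rejecting candidates whose first column has no key.

```c
/* Among ncandidates indexes, pick the one whose leading run of
 * keyed columns is longest. Returns -1 if no index matches even
 * its first column, in which case an index scan cannot seek.
 * (Hypothetical sketch, illustration only.) */
static int
pick_longest_prefix(const int *prefix_len, int ncandidates)
{
	int best = -1;
	int best_len = 0;

	for (int i = 0; i < ncandidates; i++)
	{
		if (prefix_len[i] > best_len)
		{
			best_len = prefix_len[i];
			best = i;
		}
	}
	return best;
}
```

This is a purely structural choice; as noted further down in the thread, Postgres proper would weigh candidates with statistics rather than prefix length alone.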
Even though it's not optimal, there are circumstances where picking an index without the leading column is actually done in Postgres:
```
feike=# \d a
                 Table "public.a"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 odd    | boolean |           |          |
 j      | integer |           |          |
 t      | text    |           |          |
Indexes:
    "a_odd_j_idx" btree (odd, j)

feike=# explain select * from a where j=4;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using a_odd_j_idx on a  (cost=0.42..18484.43 rows=1 width=1909)
   Index Cond: (j = 4)
(2 rows)
```
It's more a statistics-based algorithm that should pick the index though?
> It's more a statistics-based algorithm that should pick the index though?

Also, this index-matching code isn't using stats at all.
FWIW, If you are confident that the current code is OK despite the concerns/questions I raised, then feel free to go ahead.
> But we can't use an index if we only match, e.g., the last index key right?

It's going to return the correct results. It'll be a kind of "index seq scan", where you scan an index but don't actually have any search keys, only plain scan keys ("index cond" vs "index filter" in explain). We actually had a bug like this in our catalog lookups where we used a wrong index.
> But we can't use an index if we only match, e.g., the last index key right?

> It's going to return the correct results. It'll be a kind of "index seq scan", where you scan an index but don't actually have any search keys, only plain scan keys ("index cond" vs "index filter" in explain). We actually had a bug like this in our catalog lookups where we used a wrong index.

Sure, but this isn't only about correctness, but also about performance. Mostly wondering if falling back to a seq scan isn't faster in that case... 🤷♂️
I highly doubt using a seq scan is ever faster, unless there is a very small number of batches, just for the fact that we are looking for a specific tuple.
I think it's an OK assumption that we want to do an index scan here whenever we can.
Force-pushed from 75d728e to d7278e6
What's the current status of this fix? Is it ready to be merged?
Force-pushed from d7278e6 to 05752e0
During insert conflict resolution, we find the index on the compressed chunk based on the unique constraint on the chunk. However, we can/should always use the index, since we have all the index column values in the slot that's being inserted, so we create scan keys until we find a column that doesn't exist in the uncompressed chunk.
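The approach described above can be sketched as follows. The names are hypothetical and this is only the shape of the idea, not the real scan-key-building code in the PR:

```c
#include <stdbool.h>

/* Walk the index columns in order and emit an equality scan key for
 * each column whose value is present in the inserted slot; stop at
 * the first missing column so the generated keys remain a prefix
 * that a btree can actually seek on. (Hypothetical sketch.) */
static int
build_prefix_scankeys(const int *index_attnos, int nindexcols,
					  const bool *slot_has_value, /* indexed by attno */
					  int *scankey_attnos /* out, up to nindexcols */)
{
	int nkeys = 0;

	for (int i = 0; i < nindexcols; i++)
	{
		int attno = index_attnos[i];

		if (!slot_has_value[attno])
			break; /* later columns could only act as filters */
		scankey_attnos[nkeys++] = attno;
	}
	return nkeys;
}
```

Stopping at the first missing column (rather than skipping it) is what keeps the generated keys a usable btree prefix, which is exactly the concern raised in the review discussion above.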
Force-pushed from 05752e0 to 0e3600f