make Distributed single shard table use logic similar to SingleShardTableShard #7572
base: main
Conversation
Codecov Report
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7572   +/-   ##
=======================================
  Coverage   89.68%   89.68%
=======================================
  Files         283      283
  Lines       60427    60431    +4
  Branches     7525     7526    +1
=======================================
+ Hits        54194    54199    +5
+ Misses       4079     4078    -1
  Partials     2154     2154
@@ -1916,8 +1921,19 @@ CreateHashDistributedTableShards(Oid relationId, int shardCount,
	 * tables which will not be part of an existing colocation group. Therefore,
	 * we can directly use ShardReplicationFactor global variable here.
	 */
	CreateShardsWithRoundRobinPolicy(relationId, shardCount, ShardReplicationFactor,
									 useExclusiveConnection);
	if (shardCount == 1 && localTableEmpty && ShardReplicationFactor == 1)
Is the localTableEmpty check needed? It seems a bit unexpected.
Had to change this up (still testing it), but no, it's not needed here anymore; I pushed this check down to a lower layer.
The existing single shard table path sets the shard's min/max values to NULL, but a distributed table with a single shard still has min/max spanning the entire hash range, so we can't fully leverage the existing logic. I had to move this down into the node-selection logic so it is handled when picking the nodes for the shards.
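For illustration, a minimal sketch of that difference as it shows up in pg_dist_shard; the table names here are hypothetical and this assumes a Citus version that supports null-key single shard tables:

-- A null-key "single shard table" stores NULL shard bounds:
CREATE TABLE null_key_table (a int);
SELECT create_distributed_table('null_key_table', NULL);

-- A hash-distributed table created with a single shard instead keeps
-- shardminvalue/shardmaxvalue spanning the whole hash range:
SET citus.shard_count TO 1;
CREATE TABLE one_shard_hash (a int);
SELECT create_distributed_table('one_shard_hash', 'a');

-- Compare the shard bounds of the two:
SELECT logicalrelid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid IN ('null_key_table'::regclass, 'one_shard_hash'::regclass);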
86aa9ee to 6b8df85
@@ -774,15 +774,64 @@ SELECT a.author_id as first_author, b.word_count as second_word_count
FROM articles_hash a, articles_single_shard_hash b
WHERE a.author_id = 10 and a.author_id = b.author_id
ORDER BY 1,2 LIMIT 3;
DEBUG: Creating router plan
DEBUG: query has a single distribution column value: 10
first_author | second_word_count
@JelteF scenarios like this can change behavior because the single shard is now on a different node.
I don't think that's a big issue (although we should modify the test accordingly). The reason I don't think it's a big issue is that it requires setting citus.enable_non_colocated_router_query_pushdown, which is already a not-recommended GUC because it basically requires that you never do rebalancing.
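For context, a sketch of the scenario being discussed, reusing the regression-test query above; whether it still produces a single router plan depends on where the single shard now lives, which is exactly why the test output can change:

-- Hedged sketch: with this GUC on (not recommended), a join between
-- non-colocated tables is only pushed down as one router query when the
-- relevant shards happen to sit on the same node.
SET citus.enable_non_colocated_router_query_pushdown TO on;
SET client_min_messages TO debug2;
SELECT a.author_id AS first_author, b.word_count AS second_word_count
FROM articles_hash a, articles_single_shard_hash b
WHERE a.author_id = 10 AND a.author_id = b.author_id
ORDER BY 1, 2 LIMIT 3;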
" all nodes of the cluster"),
NULL,
&EnableSingleShardTableMultiNodePlacement,
false,
Since we want to backport this, I think it indeed makes sense for this to be false by default. But let's create a PR right after merging this one to change the default to true for future releases.
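As a usage note, a sketch of how a deployment could opt in ahead of that default change; this assumes the new GUC can be set via ALTER SYSTEM or per session like other Citus settings:

-- Hypothetical opt-in while the shipped (backported) default stays off:
ALTER SYSTEM SET citus.enable_single_shard_table_multi_node_placement TO on;
SELECT pg_reload_conf();
-- Or just for the current session, as the regression tests below do:
SET citus.enable_single_shard_table_multi_node_placement TO on;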
(1 row)

-- Now check placement:
SELECT (SELECT colocation_id FROM citus_tables WHERE table_name = 'shard_count_table_1_inst_1'::regclass) != (SELECT colocation_id FROM citus_tables WHERE table_name = 'shard_count_table_1_inst_2'::regclass);
This is not testing the intended thing, afaict. You want to test that the placement of the shard is different, not that the colocation id is different. The colocation id would also be different if citus.enable_single_shard_table_multi_node_placement was set to off. The placement of the shards can be checked easily in pg_dist_shard_placement or citus_shards.
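A sketch of the placement assertion being suggested, reusing the table names from the test above; comparing (nodename, nodeport) from citus_shards is an assumption about how the test would be written:

-- Check that the two single-shard tables landed on different nodes,
-- rather than merely having different colocation ids.
SELECT (SELECT (nodename, nodeport) FROM citus_shards
        WHERE table_name = 'shard_count_table_1_inst_1'::regclass)
       IS DISTINCT FROM
       (SELECT (nodename, nodeport) FROM citus_shards
        WHERE table_name = 'shard_count_table_1_inst_2'::regclass)
       AS placed_on_different_nodes;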
CREATE TABLE articles_single_shard_hash_mx_partition_inst1 (LIKE articles_single_shard_hash_mx);
CREATE TABLE articles_single_shard_hash_mx_partition_inst2 (LIKE articles_single_shard_hash_mx);
SET citus.shard_count TO 1;
SET citus.enable_single_shard_table_multi_node_placement to on;
SELECT create_distributed_table('articles_single_shard_hash_mx_partition_inst1', 'author_id', colocate_with => 'none');
SELECT create_distributed_table('articles_single_shard_hash_mx_partition_inst2', 'author_id', colocate_with => 'none');
SET citus.enable_single_shard_table_multi_node_placement to off;
I don't think this test adds much in its current form. What was the intent here?
When creating distributed tables with a single shard (i.e. ones that still have a shard key value), today's placement logic puts all of those tables on a single node. This change gives single-shard distributed tables better placement by using the colocationId (similar to schema_based_sharding) to spread these tables across all the nodes in the cluster.
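As a hedged end-to-end sketch of the intended behavior; the table names are hypothetical and the exact node assignment depends on the cluster:

-- With the feature enabled, each single-shard distributed table gets its own
-- colocation group, and its placement is derived from that colocation id
-- (similar to schema-based sharding), so consecutive tables should spread
-- across workers instead of all landing on the first node.
SET citus.enable_single_shard_table_multi_node_placement TO on;
SET citus.shard_count TO 1;
CREATE TABLE tenant_events_1 (tenant_id bigint, payload jsonb);
CREATE TABLE tenant_events_2 (tenant_id bigint, payload jsonb);
CREATE TABLE tenant_events_3 (tenant_id bigint, payload jsonb);
SELECT create_distributed_table('tenant_events_1', 'tenant_id', colocate_with => 'none');
SELECT create_distributed_table('tenant_events_2', 'tenant_id', colocate_with => 'none');
SELECT create_distributed_table('tenant_events_3', 'tenant_id', colocate_with => 'none');
-- Expect the shards to be spread over the workers rather than stacked on one node:
SELECT nodename, nodeport, count(*) AS single_shard_tables
FROM citus_shards
WHERE table_name::text LIKE 'tenant_events_%'
GROUP BY nodename, nodeport
ORDER BY nodename, nodeport;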