make Distributed single shard table use logic similar to SingleShardTableShard #7572

Draft · wants to merge 6 commits into main from users/visridha/SingleShardHashDistributed

Conversation


@visridha visridha commented Apr 3, 2024

When creating a distributed table with a single shard (one that still has a shard key value), today's placement logic puts every such table on the same node. This PR adds better placement for single-shard distributed tables by using the colocationId (similar to schema_based_sharding) to spread these tables across all the nodes in the cluster.
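For illustration only (not part of the PR), the behavior can be observed by creating two single-shard distributed tables and checking which worker each shard lands on. The table names below are hypothetical; citus.shard_count, create_distributed_table, and the citus_shards view are existing Citus objects:

SET citus.shard_count TO 1;
CREATE TABLE orders_a (id bigint, payload text);
CREATE TABLE orders_b (id bigint, payload text);
SELECT create_distributed_table('orders_a', 'id', colocate_with => 'none');
SELECT create_distributed_table('orders_b', 'id', colocate_with => 'none');
-- Before this change both shards end up on the same worker; with the new
-- placement logic they can land on different nodes.
SELECT table_name, nodename, nodeport
FROM citus_shards
WHERE table_name IN ('orders_a'::regclass, 'orders_b'::regclass);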


codecov bot commented Apr 3, 2024

Codecov Report

Merging #7572 (d2fc258) into main (3929a5b) will increase coverage by 0.00%.
Report is 1 commit behind head on main.
The diff coverage is 100.00%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7572   +/-   ##
=======================================
  Coverage   89.68%   89.68%           
=======================================
  Files         283      283           
  Lines       60427    60431    +4     
  Branches     7525     7526    +1     
=======================================
+ Hits        54194    54199    +5     
+ Misses       4079     4078    -1     
  Partials     2154     2154           

@@ -1916,8 +1921,19 @@ CreateHashDistributedTableShards(Oid relationId, int shardCount,
	 * tables which will not be part of an existing colocation group. Therefore,
	 * we can directly use ShardReplicationFactor global variable here.
	 */
	CreateShardsWithRoundRobinPolicy(relationId, shardCount, ShardReplicationFactor,
	                                 useExclusiveConnection);
	if (shardCount == 1 && localTableEmpty && ShardReplicationFactor == 1)
Collaborator

Is the localTableEmpty check needed? It seems a bit unexpected.

Author

Had to change this (still testing it), but I ended up pushing this down to a lower layer.

Author

The existing single shard distributed tables set the shard min/max values to NULL, but a distributed table with a single shard still has min/max set to Long.Min/Long.Max, so we can't fully leverage the existing code path. I had to move this down into the selection logic that picks the nodes for the shards.
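For context, an illustrative query (not from the PR) that shows the difference described above via pg_dist_shard; 'orders_a' is a hypothetical hash-distributed table with a single shard:

-- A hash-distributed single shard typically reports the full hash range
-- (-2147483648 .. 2147483647), whereas a schema-based single shard table
-- reports NULL for both bounds.
SELECT logicalrelid, shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'orders_a'::regclass;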

@visridha visridha force-pushed the users/visridha/SingleShardHashDistributed branch from 86aa9ee to 6b8df85 on April 3, 2024 19:28
@visridha visridha changed the title make Distributed single shard table use SingleShardTableShard make Distributed single shard table use logic similar to SingleShardTableShard Apr 3, 2024
@@ -774,15 +774,64 @@ SELECT a.author_id as first_author, b.word_count as second_word_count
FROM articles_hash a, articles_single_shard_hash b
WHERE a.author_id = 10 and a.author_id = b.author_id
ORDER BY 1,2 LIMIT 3;
DEBUG: Creating router plan
DEBUG: query has a single distribution column value: 10
first_author | second_word_count
Author

@JelteF scenarios like this can change behavior because the single shard is now on a different node.

Contributor

I don't think that's a big issue (although we should modify the test accordingly).

The reason I don't think it's a big issue is that it requires setting citus.enable_non_colocated_router_query_pushdown, which is already a not-recommended GUC because it basically requires that you never do rebalancing.

" all nodes of the cluster"),
NULL,
&EnableSingleShardTableMultiNodePlacement,
false,
Contributor

Since we want to backport this, I think it indeed makes sense for this to be false by default. But let's create a PR right after merging this one to change the default to true for future releases.

(1 row)

-- Now check placement:
SELECT (SELECT colocation_id FROM citus_tables WHERE table_name = 'shard_count_table_1_inst_1'::regclass) != (SELECT colocation_id FROM citus_tables WHERE table_name = 'shard_count_table_1_inst_2'::regclass);
Contributor

This is not testing the intended thing afaict. You want to test that the placement of the shard is different, not that the colocation id is different. The colocation id would also be different if citus.enable_single_shard_table_multi_node_placement was set to off.

The placement of the shards can be checked easily in pg_dist_shard_placement or citus_shards.
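A hedged sketch of what such a check could look like (illustrative, not taken from the PR), comparing the node each shard was placed on via the citus_shards view:

-- Assert that the two single-shard tables were placed on different nodes.
SELECT (SELECT nodename || ':' || nodeport FROM citus_shards
        WHERE table_name = 'shard_count_table_1_inst_1'::regclass)
    != (SELECT nodename || ':' || nodeport FROM citus_shards
        WHERE table_name = 'shard_count_table_1_inst_2'::regclass);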

Comment on lines +386 to +392
CREATE TABLE articles_single_shard_hash_mx_partition_inst1 (LIKE articles_single_shard_hash_mx);
CREATE TABLE articles_single_shard_hash_mx_partition_inst2 (LIKE articles_single_shard_hash_mx);
SET citus.shard_count TO 1;
SET citus.enable_single_shard_table_multi_node_placement to on;
SELECT create_distributed_table('articles_single_shard_hash_mx_partition_inst1', 'author_id', colocate_with => 'none');
SELECT create_distributed_table('articles_single_shard_hash_mx_partition_inst2', 'author_id', colocate_with => 'none');
set citus.enable_single_shard_table_multi_node_placement to off;
Contributor

@JelteF JelteF commented Apr 11, 2024

I don't really think this test currently adds much in its current form. What was the intent here?
