From b2936808b8b205566d2bc4ccbb9c3afaf9598b9d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=B5=B5=E7=A9=BA=E4=BA=8B=E3=82=B9=E3=83=94=E3=83=AA?= =?UTF-8?q?=E3=83=83=E3=83=88?= Date: Tue, 25 Jun 2024 20:39:41 +0800 Subject: [PATCH 1/3] [Doc] Doc for shared-data tablet parallel Scan (#47458) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: 絵空事スピリット (cherry picked from commit 17c2eded7366c79d3759e3c1eaedc1bf030261ac) # Conflicts: # docs/en/reference/System_variable.md # docs/zh/reference/System_variable.md --- docs/en/reference/System_variable.md | 63 ++++++++++++++++++++++++++++ docs/zh/reference/System_variable.md | 20 +++++++++ 2 files changed, 83 insertions(+) diff --git a/docs/en/reference/System_variable.md b/docs/en/reference/System_variable.md index 746638bc30357..00ca800bf074e 100644 --- a/docs/en/reference/System_variable.md +++ b/docs/en/reference/System_variable.md @@ -294,7 +294,70 @@ This variable is supported from v2.5.18 and v3.1.7. However, if there are some hotspot tablets, this feature may degrade the query performance because it directs the queries to the same BE, making it unable to fully use the resources of multiple BEs in high-concurrency scenarios. +<<<<<<< HEAD Default value: `false`, which means the system selects a replica for each query. This feature is supported since 2.5.6, 3.0.8, and 3.1.4. +======= +* **Default**: false, which means the system selects a replica for each query. +* **Introduced in**: v2.5.6, v3.0.8, v3.1.4, and v3.2.0. + + +### enable_lake_tablet_internal_parallel + +* **Description**: Whether to enable Parallel Scan for Cloud-native tables in a shared-data cluster. +* **Default**: false +* **Data type**: Boolean +* **Introduced in**: v3.3.0 + +### tablet_internal_parallel_mode + +* **Description**: Internal Parallel Scan strategy of tablets. Valid Values: + * `auto`: When the number of Tablets to be scanned on BE or CN nodes is less than the Degree of Parallelism (DOP), the system automatically determines whether Parallel Scan is needed based on the estimated size of the Tablets. + * `force_split`: Forces the splitting of Tablets and performs Parallel Scan. +* **Default**: auto +* **Data type**: String +* **Introduced in**: v2.5.0 + +### enable_scan_datacache + +* **Description**: Specifies whether to enable the Data Cache feature. After this feature is enabled, StarRocks caches hot data read from external storage systems into blocks, which accelerates queries and analysis. For more information, see [Data Cache](../data_source/data_cache.md). In versions prior to 3.2, this variable was named as `enable_scan_block_cache`. +* **Default**: false +* **Introduced in**: v2.5 + +### enable_populate_datacache + +* **Description**: Specifies whether to cache data blocks read from external storage systems in StarRocks. If you do not want to cache data blocks read from external storage systems, set this variable to `false`. Default value: true. This variable is supported from 2.5. In versions prior to 3.2, this variable was named as `enable_scan_block_cache`. +* **Default**: true +* **Introduced in**: v2.5 + +### enable_tablet_internal_parallel + +* **Description**: Whether to enable adaptive parallel scanning of tablets. After this feature is enabled, multiple threads can be used to scan one tablet by segment, increasing the scan concurrency. +* **Default**: true +* **Introduced in**: v2.3 + +### enable_query_cache + +* **Description**: Specifies whether to enable the Query Cache feature. Valid values: true and false. `true` specifies to enable this feature, and `false` specifies to disable this feature. When this feature is enabled, it works only for queries that meet the conditions specified in the application scenarios of [Query Cache](../using_starrocks/query_cache.md#application-scenarios). +* **Default**: false +* **Introduced in**: v2.5 + +### enable_adaptive_sink_dop + +* **Description**: Specifies whether to enable adaptive parallelism for data loading. After this feature is enabled, the system automatically sets load parallelism for INSERT INTO and Broker Load jobs, which is equivalent to the mechanism of `pipeline_dop`. For a newly deployed v2.5 StarRocks cluster, the value is `true` by default. For a v2.5 cluster upgraded from v2.4, the value is `false`. +* **Default**: false +* **Introduced in**: v2.5 + +### enable_pipeline_engine + +* **Description**: Specifies whether to enable the pipeline execution engine. `true` indicates enabled and `false` indicates the opposite. Default value: `true`. +* **Default**: true + +### enable_sort_aggregate + +* **Description**: Specifies whether to enable sorted streaming. `true` indicates sorted streaming is enabled to sort data in data streams. +* **Default**: false +* **Introduced in**: v2.5 +>>>>>>> 17c2eded73 ([Doc] Doc for shared-data tablet parallel Scan (#47458)) ### enable_global_runtime_filter diff --git a/docs/zh/reference/System_variable.md b/docs/zh/reference/System_variable.md index e902573944fcf..43e9265c171b5 100644 --- a/docs/zh/reference/System_variable.md +++ b/docs/zh/reference/System_variable.md @@ -251,7 +251,27 @@ group-by-count-distinct 查询中为 count distinct 列设置的分桶数。该 如果待查询的表中存在大量 tablet,开启该特性会对性能有提升,因为会更快的将 tablet 的元信息以及数据缓存在内存中。但是,如果查询存在一些热点 tablet,开启该特性可能会导致性能有所退化,因为该特性倾向于将一个热点 tablet 的查询调度到相同的 BE 上,在高并发的场景下无法充分利用多台 BE 的资源。 +<<<<<<< HEAD 默认值:`false`,表示使用原来的机制,即每次查询会从多个副本中选择一个。自 2.5.6、3.0.8、3.1.4 版本起,StarRocks 支持该参数。 +======= +### enable_lake_tablet_internal_parallel + +* 描述:是否开启存算分离集群内云原生表的 Tablet 并行 Scan. +* 默认值:false +* 类型:Boolean +* 引入版本:v3.3.0 + +### tablet_internal_parallel_mode + +* 描述:Tablet 内部并行 Scan 策略。有效值: + * `auto`: 在 BE 或 CN 节点需要扫描的 Tablet 数小于 DOP 时,系统根据预估的 Tablet 大小自动判断是否需要并行 Scan。 + * `force_split`: 强制对 Tablet 进行拆分和并行扫描。 +* 默认值:auto +* 类型:String +* 引入版本:v2.5.0 + +### enable_scan_datacache +>>>>>>> 17c2eded73 ([Doc] Doc for shared-data tablet parallel Scan (#47458)) ### enable_scan_block_cache(2.5 及以后) From cf09db0699d8baba1092d4a80fcf0624da2289b2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=B5=B5=E7=A9=BA=E4=BA=8B=E3=82=B9=E3=83=94=E3=83=AA?= =?UTF-8?q?=E3=83=83=E3=83=88?= Date: Wed, 26 Jun 2024 10:00:31 +0800 Subject: [PATCH 2/3] Update System_variable.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: 絵空事スピリット Signed-off-by: 絵空事スピリット --- docs/en/reference/System_variable.md | 42 ---------------------------- 1 file changed, 42 deletions(-) diff --git a/docs/en/reference/System_variable.md b/docs/en/reference/System_variable.md index 00ca800bf074e..be719611f2054 100644 --- a/docs/en/reference/System_variable.md +++ b/docs/en/reference/System_variable.md @@ -294,19 +294,7 @@ This variable is supported from v2.5.18 and v3.1.7. However, if there are some hotspot tablets, this feature may degrade the query performance because it directs the queries to the same BE, making it unable to fully use the resources of multiple BEs in high-concurrency scenarios. -<<<<<<< HEAD Default value: `false`, which means the system selects a replica for each query. This feature is supported since 2.5.6, 3.0.8, and 3.1.4. -======= -* **Default**: false, which means the system selects a replica for each query. -* **Introduced in**: v2.5.6, v3.0.8, v3.1.4, and v3.2.0. - - -### enable_lake_tablet_internal_parallel - -* **Description**: Whether to enable Parallel Scan for Cloud-native tables in a shared-data cluster. -* **Default**: false -* **Data type**: Boolean -* **Introduced in**: v3.3.0 ### tablet_internal_parallel_mode @@ -329,36 +317,6 @@ This variable is supported from v2.5.18 and v3.1.7. * **Default**: true * **Introduced in**: v2.5 -### enable_tablet_internal_parallel - -* **Description**: Whether to enable adaptive parallel scanning of tablets. After this feature is enabled, multiple threads can be used to scan one tablet by segment, increasing the scan concurrency. -* **Default**: true -* **Introduced in**: v2.3 - -### enable_query_cache - -* **Description**: Specifies whether to enable the Query Cache feature. Valid values: true and false. `true` specifies to enable this feature, and `false` specifies to disable this feature. When this feature is enabled, it works only for queries that meet the conditions specified in the application scenarios of [Query Cache](../using_starrocks/query_cache.md#application-scenarios). -* **Default**: false -* **Introduced in**: v2.5 - -### enable_adaptive_sink_dop - -* **Description**: Specifies whether to enable adaptive parallelism for data loading. After this feature is enabled, the system automatically sets load parallelism for INSERT INTO and Broker Load jobs, which is equivalent to the mechanism of `pipeline_dop`. For a newly deployed v2.5 StarRocks cluster, the value is `true` by default. For a v2.5 cluster upgraded from v2.4, the value is `false`. -* **Default**: false -* **Introduced in**: v2.5 - -### enable_pipeline_engine - -* **Description**: Specifies whether to enable the pipeline execution engine. `true` indicates enabled and `false` indicates the opposite. Default value: `true`. -* **Default**: true - -### enable_sort_aggregate - -* **Description**: Specifies whether to enable sorted streaming. `true` indicates sorted streaming is enabled to sort data in data streams. -* **Default**: false -* **Introduced in**: v2.5 ->>>>>>> 17c2eded73 ([Doc] Doc for shared-data tablet parallel Scan (#47458)) - ### enable_global_runtime_filter Whether to enable global runtime filter (RF for short). RF filters data at runtime. Data filtering often occurs in the Join stage. During multi-table joins, optimizations such as predicate pushdown are used to filter data, in order to reduce the number of scanned rows for Join and the I/O in the Shuffle stage, thereby speeding up the query. From 7d3c73615b7cef5798dbd0786db071b84587a695 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=B5=B5=E7=A9=BA=E4=BA=8B=E3=82=B9=E3=83=94=E3=83=AA?= =?UTF-8?q?=E3=83=83=E3=83=88?= Date: Wed, 26 Jun 2024 10:03:40 +0800 Subject: [PATCH 3/3] Update System_variable.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: 絵空事スピリット --- docs/zh/reference/System_variable.md | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/docs/zh/reference/System_variable.md b/docs/zh/reference/System_variable.md index 43e9265c171b5..34f7b426dd0c1 100644 --- a/docs/zh/reference/System_variable.md +++ b/docs/zh/reference/System_variable.md @@ -251,15 +251,7 @@ group-by-count-distinct 查询中为 count distinct 列设置的分桶数。该 如果待查询的表中存在大量 tablet,开启该特性会对性能有提升,因为会更快的将 tablet 的元信息以及数据缓存在内存中。但是,如果查询存在一些热点 tablet,开启该特性可能会导致性能有所退化,因为该特性倾向于将一个热点 tablet 的查询调度到相同的 BE 上,在高并发的场景下无法充分利用多台 BE 的资源。 -<<<<<<< HEAD 默认值:`false`,表示使用原来的机制,即每次查询会从多个副本中选择一个。自 2.5.6、3.0.8、3.1.4 版本起,StarRocks 支持该参数。 -======= -### enable_lake_tablet_internal_parallel - -* 描述:是否开启存算分离集群内云原生表的 Tablet 并行 Scan. -* 默认值:false -* 类型:Boolean -* 引入版本:v3.3.0 ### tablet_internal_parallel_mode @@ -270,9 +262,6 @@ group-by-count-distinct 查询中为 count distinct 列设置的分桶数。该 * 类型:String * 引入版本:v2.5.0 -### enable_scan_datacache ->>>>>>> 17c2eded73 ([Doc] Doc for shared-data tablet parallel Scan (#47458)) - ### enable_scan_block_cache(2.5 及以后) 是否开启 Data Cache 特性。该特性开启之后,StarRocks 通过将外部存储系统中的热数据缓存成多个 block,加速数据查询和分析。更多信息,参见 [Data Cache](../data_source/data_cache.md)。该特性从 2.5 版本开始支持。