[Improvement](docs) refine query acceleration doc (apache#1591)
## Versions 

- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0

## Languages

- [x] Chinese
- [x] English

## Docs Checklist

- [ ] Checked by AI
- [ ] Test Cases Built
xzj7019 authored and echo-hhj committed Jan 6, 2025
1 parent 708de6e commit 01dc26c
Showing 7 changed files with 12 additions and 12 deletions.
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@ For cases of using Doris Explain output to perform plan-level tuning, please ref

The Explain tool described above outlines the execution plan for an SQL query, such as planning a join operation between tables t1 and t2 as a Hash Join, with t1 designated as the build side and t2 as the probe side. When the SQL query is actually executed, understanding how much time each specific execution step takes—for instance, how long the build phase lasts and how long the probe phase lasts—is crucial for performance analysis and tuning. The Profile tool provides detailed execution information for this purpose. The following section first gives an overview of the Profile file structure and then introduces the meanings of execution times in Merged Profile, Execution Profile, and PipelineTask.

-## Profile File Structure
+### Profile File Structure

A Profile file contains several main sections:

@@ -59,7 +59,7 @@ A Profile file contains several main sections:

5. The detailed information about the execution side is mainly contained in the last part. Next, we will mainly introduce what information the Profile can provide for performance analysis.

-## Merged Profile
+### Merged Profile

To help users more accurately analyze performance bottlenecks, Doris provides aggregated profile results for each operator. Taking the EXCHANGE_OPERATOR as an example:

@@ -96,7 +96,7 @@ In Doris, each operator executes concurrently based on the concurrency level set

WaitForDependencyTime varies for each Operator, as the execution dependencies differ. For instance, in the case of an EXCHANGE_OPERATOR, the dependency is on data being sent by upstream operators via RPC. Thus, WaitForDependencyTime in this context specifically refers to the time spent waiting for upstream operators to send data.

-## Execution Profile
+### Execution Profile

Unlike the Merged Profile, the Execution Profile displays detailed metrics for a specific concurrent execution. Taking the exchange operator with id=4 as an example:

@@ -125,7 +125,7 @@ EXCHANGE_OPERATOR (id=4):(ExecTime: 706.351us)

In this profile, for instance, LocalBytesReceived is a metric specific to the exchange operator and not found in other operators, hence it is not included in the Merged Profile.

-## PipelineTask Execution Time
+### PipelineTask Execution Time

In Doris, a PipelineTask consists of multiple operators. When analyzing the execution time of a PipelineTask, several key aspects need to be focused on:

Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
{
"title": "Adjusting Join Shuffle with Hint",
-"language": "zh-CN"
+"language": "en"
}
---

Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
{
-"title": "Controlling Hints with CBO Rule",
-"language": "zh-CN"
+"title": "Controlling CBO Rule with Hint",
+"language": "en"
}
---

Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
{
"title": "DML Tuning Plan",
-"language": "zh-CN"
+"language": "en"
}
---

Original file line number Diff line number Diff line change
@@ -24,10 +24,10 @@ specific language governing permissions and limitations
under the License.
-->

-Defining colocate group is an efficient way of Join, through which the execution engine can effectively avoid the transmission overhead of input data in Join operations (for an introduction to Colocate Group, see [JOIN](../../../query-data/join))
+Defining colocate group is an efficient way of Join, through which the execution engine can effectively avoid the transmission overhead of input data in Join operations (for an introduction to Colocate Group, see [Colocation Join](../../colocation-join.md))

However, in some use cases, even if a Colocate Group has been successfully established, the execution plan may still show as Shuffle Join or Bucket Shuffle Join. This situation typically occurs when Doris is in the process of data organization, for instance, it may be migrating tablets between BE to ensure a more balanced distribution of data across multiple BE.

-You can view the Colocate Group status using the command `show proc "/colocation_group"`; as shown in the figure below: If `IsStable` appears as false, it indicates that there are unavailable `colocation_group` instances.
+You can view the Colocate Group status using the command `show proc "/colocation_group"`; as shown in the figure below: If `IsStable` appears as false, it indicates that there are unavailable `colocation group` instances.

![Optimizing Join with Colocate Group](/images/use-colocate-group.jpg)
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
{
"title": "Transparent Rewriting by Async-Materialized View",
-"language": "zh-CN"
+"language": "en"
}
---

Original file line number Diff line number Diff line change
@@ -24,7 +24,7 @@ specific language governing permissions and limitations
under the License.
-->

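-Colocate Group is an efficient way to perform Joins, allowing the execution engine to avoid the data shuffle overhead of Join operations. For an introduction to the underlying principles and reference examples, see [Colocate-join](../../colocation-join.md)
+Colocate Group is an efficient way to perform Joins, allowing the execution engine to avoid the data shuffle overhead of Join operations. For an introduction to the underlying principles and reference examples, see [Colocation Join](../../colocation-join.md)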

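:::tip Note
- In some scenarios, even if a Colocate Group has been successfully established, the execution plan may still show `Shuffle Join` or `Bucket Shuffle Join`. This typically happens while Doris is reorganizing data, for example when it is migrating tablets between BEs so that data is distributed more evenly across multiple BEs.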
