[Doc] reconstruct bitmap index #46061

Merged
merged 5 commits into from
May 29, 2024

hellolilyliuyi
Contributor

@hellolilyliuyi hellolilyliuyi commented May 22, 2024

Why I'm doing:

What I'm doing:

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@github-actions github-actions bot added the documentation Improvements or additions to documentation label May 22, 2024
@hellolilyliuyi hellolilyliuyi changed the title [Doc] Update bitmap index [Doc] reconstrcut bitmap index May 22, 2024
@hellolilyliuyi hellolilyliuyi changed the title [Doc] reconstrcut bitmap index [Doc] Reconstruct bitmap index May 22, 2024
@hellolilyliuyi hellolilyliuyi changed the title [Doc] Reconstruct bitmap index [Doc] reconstruct bitmap index May 22, 2024
@github-actions github-actions bot added the 3.3 label May 22, 2024
trueeyu
trueeyu previously approved these changes May 23, 2024

### How to design bitmap indexes to speed up queries

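The primary considerations when choosing a bitmap index are **the cardinality of the column and how well the bitmap index filters data for queries.** Contrary to common belief, bitmap indexes are better suited to **queries on relatively high-cardinality columns and combined queries on multiple low-cardinality columns, where the bitmap index filters well** and can filter out more pages of data.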
Contributor

I think that in general we don't need to expose the concept of a page.

Contributor

Without exposing the concept of a page, it is hard to explain why bitmap indexes don't work well on low-cardinality columns.
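To make the page argument in this thread concrete, here is a minimal Python sketch, not StarRocks code. It assumes rows are stored in fixed-size pages (the hypothetical `PAGE_SIZE = 1024`), that a page can be skipped only when none of its rows match the predicate, and it uses made-up random columns. With a 2-value column, `col = 0` still touches every page; with a 100,000-value column it touches only a handful.

```python
import random

PAGE_SIZE = 1024  # hypothetical number of rows per page


def pages_touched(column, value, page_size=PAGE_SIZE):
    """Count pages that contain at least one row where column == value.

    Only these pages must be read; all other pages can be skipped.
    """
    return len({row // page_size for row, v in enumerate(column) if v == value})


def total_pages(num_rows, page_size=PAGE_SIZE):
    """Number of pages needed to hold num_rows rows (ceiling division)."""
    return (num_rows + page_size - 1) // page_size


rng = random.Random(42)  # fixed seed so the made-up data is reproducible
num_rows = 200_000
low_card = [rng.randrange(2) for _ in range(num_rows)]         # e.g. a gender-like column
high_card = [rng.randrange(100_000) for _ in range(num_rows)]  # e.g. an ID-like column

print(total_pages(num_rows))         # 196
print(pages_touched(low_card, 0))    # every page still has to be read
print(pages_touched(high_card, 0))   # only a few pages have to be read
```

Even though the bitmap on the low-cardinality column marks exactly which rows match, those rows are spread across every page, so no I/O is saved.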



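However, if the cardinality is too high, other problems arise, such as **occupying more disk space**. In addition, because bitmap indexes must be built during data loading, **loading performance suffers when loads are frequent**.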
Contributor

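How large is the impact on "loading performance"? If it is small, there is no need to mention it. The focus here is query performance.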

Contributor

It will have a greater impact on import performance and capacity.



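You also need to consider **the overhead of loading bitmap indexes at query time**. At query time, bitmap indexes are loaded only on demand, that is, `column values involved in the query conditions / cardinality x bitmap index`. The larger this value, the greater the overhead of loading bitmap indexes during queries.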
Contributor

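What does "column values involved in the query conditions" mean? The number of column values?
What does "x bitmap index" mean? Its size? Its row count?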

Contributor

Number of values involved in the query conditions / cardinality * size of a single bitmap index
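The rule of thumb above can be written as a small function. This is a sketch under stated assumptions: it reads "size of a single bitmap index" as the on-disk size of the column's whole bitmap index and assumes every distinct value's bitmap is about the same size; `estimated_bitmap_index_load` is a hypothetical name, not a StarRocks API.

```python
def estimated_bitmap_index_load(num_predicate_values: int, cardinality: int,
                                index_size_bytes: int) -> float:
    """Approximate bytes of bitmap index loaded for one query:
    (values in the query condition / cardinality) * bitmap index size.

    Hypothetical helper; assumes each distinct value's bitmap has equal size.
    """
    if cardinality <= 0:
        raise ValueError("cardinality must be positive")
    # A predicate cannot touch more distinct values than the column has.
    values = min(num_predicate_values, cardinality)
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return values * index_size_bytes / cardinality


# e.g. `col IN (a, b, c)` on a column with 1,000 distinct values and a 100 MB index
print(estimated_bitmap_index_load(3, 1_000, 100_000_000))  # 300000.0
# e.g. an equality predicate on a 2-value column: half of the index is loaded
print(estimated_bitmap_index_load(1, 2, 100_000_000))      # 50000000.0
```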



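Therefore, to determine which column cardinalities and queries suit bitmap indexes, we recommend that you refer to [Bitmap index performance test](#Bitmap 索引性能测试) in this topic and run performance tests with your actual business data and queries: **use bitmap indexes on columns of different cardinalities, then analyze and weigh the filtering effect of the bitmap index on queries against its additional costs, such as disk space usage, the impact on loading performance, and the overhead of loading bitmap indexes at query time.**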
Contributor

Could you give a relatively concrete range, such as a cardinality range? Otherwise, this paragraph doesn't really tell users anything.

Contributor

A bitmap index should be able to filter out at least 999/1000 of the data.
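A minimal sketch of the 999/1000 threshold mentioned above, assuming you obtain `matched_rows` yourself (for example by running `SELECT count(*)` on the table with and without the predicate). `filters_enough` is a hypothetical helper; `Fraction` keeps the comparison exact instead of relying on floating point.

```python
from fractions import Fraction


def filters_enough(total_rows: int, matched_rows: int,
                   min_filtered: Fraction = Fraction(999, 1000)) -> bool:
    """True if the predicate filters out at least `min_filtered` of the rows.

    Hypothetical helper encoding the rule of thumb that a bitmap index
    should filter out at least 999/1000 of the data.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    filtered = Fraction(total_rows - matched_rows, total_rows)
    return filtered >= min_filtered


print(filters_enough(1_000_000, 1_000))    # True: exactly 999/1000 filtered out
print(filters_enough(1_000_000, 500_000))  # False: e.g. one value of a 2-value column
```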


## Usage notes
- Can quickly locate the row numbers where 1 value is located; suitable for point queries or small-range queries.
Contributor

Why emphasize "1 value"?

Contributor

"1 value" can be removed.


![figure](../../assets/3.6.1-2.png)
Total time is about 0.91 s, of which **loading data took 0.47 s**, dictionary decoding for low-cardinality optimization took 0.31 s, and predicate filtering took 0.23 s.
Contributor

This ** seems to be a bit off.


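1. Build a dictionary: StarRocks builds a dictionary from the values of the `Gender` column, mapping `female` and `male` to the INT codes `0` and `1`, respectively.
2. Generate bitmaps: StarRocks generates bitmaps based on the dictionary codes. Because `female` appears in the first three rows, the bitmap for `female` is `1110`; `male` appears in row 4, so the bitmap for `male` is `0001`.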
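The two steps above can be modelled in a few lines of Python. This is an illustrative sketch only: it assigns codes in sorted order (which happens to reproduce `female` -> `0`, `male` -> `1`), stores each bitmap as a plain Python integer whose bit `i` stands for row `i + 1`, and does not reflect how StarRocks actually encodes or compresses bitmaps on disk. The lookup helper also shows how a bitmap yields the row numbers of a value directly, which is what makes point queries fast.

```python
def build_bitmap_index(values):
    """Return (dictionary, bitmaps) for a list of column values.

    dictionary maps each distinct value to an INT code (sorted order, an
    assumption); bitmaps maps each code to an int whose bit i is set when
    row i + 1 holds that value.
    """
    # Step 1: build the dictionary of distinct values -> INT codes.
    dictionary = {value: code for code, value in enumerate(sorted(set(values)))}
    # Step 2: build one bitmap per code by setting the bit of each matching row.
    bitmaps = {code: 0 for code in dictionary.values()}
    for row, value in enumerate(values):
        bitmaps[dictionary[value]] |= 1 << row
    return dictionary, bitmaps


def bitmap_to_str(bitmap, num_rows):
    """Render a bitmap with row 1 as the leftmost character, as in the text."""
    return "".join("1" if (bitmap >> row) & 1 else "0" for row in range(num_rows))


def rows_for_value(value, dictionary, bitmaps, num_rows):
    """Return the 1-based row numbers holding `value` by reading its bitmap."""
    code = dictionary.get(value)
    if code is None:
        return []  # value is not in the dictionary, so no rows match
    return [row + 1 for row in range(num_rows) if (bitmaps[code] >> row) & 1]


gender = ["female", "female", "female", "male"]
dictionary, bitmaps = build_bitmap_index(gender)
print(dictionary)                                       # {'female': 0, 'male': 1}
print(bitmap_to_str(bitmaps[dictionary["female"]], 4))  # 1110
print(bitmap_to_str(bitmaps[dictionary["male"]], 4))    # 0001
print(rows_for_value("male", dictionary, bitmaps, 4))   # [4]
```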
```SQL
```
Contributor

Let's not use SQL syntax highlighting for this; it looks too colorful. Use bash instead?

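```
DictDecode: 329.696ms // Because the number of output rows is the same, dictionary decoding for low-cardinality optimization takes about the same time.
BitmapIndexFilter: 419.308ms // Time spent filtering data with the bitmap index.
BitmapIndexFilterRows: 123.433M (123432975) // Number of rows filtered out by the bitmap index.
ZoneMapIndexFilter: 171.580ms // Filtering data with the ZoneMap index took 0.17 s.
```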
Contributor

Why is there a ZoneMap time here that the one above doesn't have?

Contributor

This is rather complicated. Let's keep it as written for now; I'll think later about how to explain it.

**Query statement**:

```SQL
SELECT count(1)
FROM lineorder_without_index
WHERE lo_shipmode = "MAIL"
  AND lo_quantity = 10
  AND lo_discount = 9
  AND lo_tax = 8;
```
Contributor

I suggest breaking this into multiple lines; otherwise it's tiring to read.


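**Query performance analysis**: Because this is a combined query on multiple low-cardinality columns, the bitmap index is effective and can filter out some pages, so the time spent reading data drops significantly.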

Total time is 0.68 s, of which **loading data and bitmap indexes took 0.54 s**, and bitmap index filtering took 0.14 s.
Contributor

This ** also seems to have some problems.
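For a combined query such as the `lineorder_without_index` example above, each equality predicate selects one bitmap per column, and the bitmaps are intersected with a bitwise AND. The sketch below uses four made-up rows and builds each bitmap on the fly as a stand-in for reading it from a precomputed index; `column_bitmap` and `count_matching` are hypothetical names. Each predicate alone matches three or four of the rows, but their intersection matches only one, which is why several low-cardinality columns together can filter well.

```python
def column_bitmap(rows, column, value):
    """Bitmap (as an int) of rows where rows[i][column] == value.

    Stand-in for reading a precomputed bitmap from a bitmap index.
    """
    bitmap = 0
    for i, row in enumerate(rows):
        if row[column] == value:
            bitmap |= 1 << i
    return bitmap


def count_matching(rows, predicates):
    """Emulate count(1) for ANDed equality predicates by intersecting bitmaps."""
    result = (1 << len(rows)) - 1  # start with every row selected
    for column, value in predicates.items():
        result &= column_bitmap(rows, column, value)
    return bin(result).count("1")  # population count = number of matching rows


rows = [  # made-up sample rows, not real SSB data
    {"lo_shipmode": "MAIL", "lo_quantity": 10, "lo_discount": 9, "lo_tax": 8},
    {"lo_shipmode": "MAIL", "lo_quantity": 10, "lo_discount": 9, "lo_tax": 7},
    {"lo_shipmode": "AIR", "lo_quantity": 10, "lo_discount": 9, "lo_tax": 8},
    {"lo_shipmode": "MAIL", "lo_quantity": 11, "lo_discount": 9, "lo_tax": 8},
]
predicates = {"lo_shipmode": "MAIL", "lo_quantity": 10, "lo_discount": 9, "lo_tax": 8}
print(count_matching(rows, predicates))  # 1
```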

Signed-off-by: hellolilyliuyi <[email protected]>

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)


[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@hellolilyliuyi hellolilyliuyi merged commit ae01369 into main May 29, 2024
44 checks passed
@hellolilyliuyi hellolilyliuyi deleted the update-bitmap-index branch May 29, 2024 06:35

@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label May 29, 2024
Contributor

mergify bot commented May 29, 2024

backport branch-3.3

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request May 29, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
(cherry picked from commit ae01369)
@hellolilyliuyi
Contributor Author

@mergify backport branch-3.2

Contributor

mergify bot commented May 30, 2024

backport branch-3.2

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request May 30, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
(cherry picked from commit ae01369)
wanpengfei-git pushed a commit that referenced this pull request May 30, 2024
@hellolilyliuyi
Contributor Author

@mergify backport branch-3.1 branch-3.0

Contributor

mergify bot commented May 30, 2024

backport branch-3.1 branch-3.0

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request May 30, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
(cherry picked from commit ae01369)

# Conflicts:
#	docs/zh/using_starrocks/Bitmap_index.md
mergify bot pushed a commit that referenced this pull request May 30, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
(cherry picked from commit ae01369)

# Conflicts:
#	docs/zh/using_starrocks/Bitmap_index.md
wanpengfei-git pushed a commit that referenced this pull request May 30, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
Co-authored-by: hellolilyliuyi <[email protected]>
wanpengfei-git pushed a commit that referenced this pull request May 30, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
Co-authored-by: hellolilyliuyi <[email protected]>
wanpengfei-git pushed a commit that referenced this pull request Jun 1, 2024
@hellolilyliuyi
Contributor Author

@mergify backport branch-2.5

Contributor

mergify bot commented Jun 3, 2024

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jun 3, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
(cherry picked from commit ae01369)

# Conflicts:
#	docs/zh/using_starrocks/Bitmap_index.md
wanpengfei-git pushed a commit that referenced this pull request Jun 3, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
Co-authored-by: hellolilyliuyi <[email protected]>

4 participants