[Doc] add bitmap index en #46759

Merged
merged 5 commits into from
Jun 7, 2024

Conversation

hellolilyliuyi
Contributor

@hellolilyliuyi hellolilyliuyi commented Jun 7, 2024

Why I'm doing:

What I'm doing:

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jun 7, 2024
@github-actions github-actions bot added the 3.3 label Jun 7, 2024

StarRocks can adaptively choose whether to use a bitmap index based on column cardinality and query conditions. If a bitmap index does not effectively filter out many Pages or the overhead of loading bitmap indexes during queries is high, StarRocks will not use the bitmap index by default to avoid degrading query performance.

StarRocks determines whether to use a bitmap index based on the ratio of the number of values involved in the query condition to the column cardinality. Generally, the smaller this ratio, the better the filtering effect of the bitmap index. Thus, StarRocks uses `bitmap_max_filter_ratio/1000` as the threshold. When the number of values in the filter condition / column cardinality < `bitmap_max_filter_ratio/1000`, the bitmap index will be used. The default value of `bitmap_max_filter_ratio` is `1`.
Contributor

When the number of values in the filter condition/column cardinality is less than bitmap_max_filter_ratio/1000

Contributor

The same applies below.


Additionally, the **overhead of loading bitmap indexes during queries** should be considered. During a query, bitmap indexes are loaded on demand, and the larger the value of `number of column values involved in query conditions/cardinality x bitmap index`, the greater the overhead of loading bitmap indexes during queries.

To determine the appropriate cardinality and query conditions for bitmap indexes, it is recommended to refer to the [Performance test on bitmap index](#performance-test-on-bitmap-index) in this topic to conduct performance tests. You can use actual business data and queries to **create bitmap indexes on columns of different cardinalities, to analyze the filtering effect of bitmap indexes on queries (at least filtering out 999/1000 of the data),the disk space usage, the impact on loading performance, and the overhead of loading bitmap indexes during queries.**
Contributor

Add a space before "the disk space usage".


Bitmap indexes are generally suitable for columns with high cardinality. Bitmap indexes are a good choice when Bitmap indexes can exhibit high selectivity, and its filtering effect (number of data rows filtered by the Bitmap index/total number of data rows) is lower than one in ten thousand.
Take a query based on a single column as example, such as `SELECT * FROM employees WHERE gender = 'male';`. The `gender` column in the `employees` table has values 'male' and 'female', so the cardinality is 2 (two distinct values). The query condition involves one value, so the ratio is 1/2, which is greater than 1/1000. Therefore, this query will not use the bitmap index.
Contributor

as an example


To evaluate the performance improvement of Bitmap indexes in StarRocks, queries are performed on a 100 GB SSB dataset in StarRocks. The test results are as follows:
Take another query based on combination of multiple columns as example, such as `SELECT * FROM employees WHERE gender = 'male' AND city IN ('Beijing', 'Shanghai');`. The cardinality of the `city` column is of 10,000, and the query condition involves two values, so the ratio is calculated as `(1*2)/(2*10000)`, which is less than 1/1000. Therefore, this query will use the bitmap index.
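The ratio rule used in the two examples above can be sketched in Python. This only illustrates the arithmetic described in the text and is not StarRocks' implementation; the function name `uses_bitmap_index` and the list of (values, cardinality) pairs are assumptions made for the sketch.

```python
from fractions import Fraction
from math import prod


def uses_bitmap_index(conditions, bitmap_max_filter_ratio=1):
    """Hypothetical helper: apply the threshold rule from the text.

    conditions: list of (values_in_filter, column_cardinality) pairs,
    one pair per filtered column.
    """
    # For multiple columns the text multiplies the value counts and the
    # cardinalities, e.g. (1*2)/(2*10000).
    values = prod(v for v, _ in conditions)
    cardinality = prod(c for _, c in conditions)
    ratio = Fraction(values, cardinality)
    # The bitmap index is used only when the ratio is strictly below
    # bitmap_max_filter_ratio/1000.
    return ratio < Fraction(bitmap_max_filter_ratio, 1000)


# gender = 'male': 1 value out of cardinality 2
single = uses_bitmap_index([(1, 2)])
# gender = 'male' AND city IN ('Beijing', 'Shanghai'): city cardinality 10,000
combined = uses_bitmap_index([(1, 2), (2, 10000)])
```

Here `single` evaluates to `False` (1/2 is not below 1/1000) and `combined` to `True` (1/10000 is below 1/1000), matching the two examples. Exact fractions are used so that the comparison does not depend on floating-point rounding.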
Contributor

a combination?

Contributor

as an example


- `Gender`: The bitmap of `female` is `1110` and the bitmap of `male` is `0001`.
- `Producer`: The bitmap of `level_1` is `1010`, the bitmap of `level_2` is `0100`, and the bitmap of `level_3` is `0001`.
**Query performance analysis**: Since the queried column is of high cardinality, the bitmap index is effective, allowing for filtering out a portion of the pages and significantly reducing time for reading data.
Contributor

reducing the time for reading data


After the query is sent, StarRocks search for the dictionaries of `Gender` and `Income_level` at the same time to get the following information:
**Query performance analysis**: According to StarRocks' default configuration, Bitmap Index is used when the number of distinct values/column cardinality < `bitmap_max_filter_ratio/1000` (default 1/1000). Since this condition is met, the query uses the Bitmap Index, and the performance is similar to use the Bitmap Index compulsorily.
Contributor

the performance is similar to that when the Bitmap Index is compulsorily used.


github-actions bot commented Jun 7, 2024

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)


github-actions bot commented Jun 7, 2024

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@hellolilyliuyi hellolilyliuyi merged commit 8b437bf into main Jun 7, 2024
44 checks passed
@hellolilyliuyi hellolilyliuyi deleted the add-bitmap-index-en branch June 7, 2024 07:52

github-actions bot commented Jun 7, 2024

@Mergifyio backport branch-3.3


github-actions bot commented Jun 7, 2024

@Mergifyio backport branch-3.2


github-actions bot commented Jun 7, 2024

@Mergifyio backport branch-3.1

@github-actions github-actions bot removed the 3.1 label Jun 7, 2024

github-actions bot commented Jun 7, 2024

@Mergifyio backport branch-3.0

@github-actions github-actions bot removed the 3.0 label Jun 7, 2024

github-actions bot commented Jun 7, 2024

@Mergifyio backport branch-2.5

@github-actions github-actions bot removed the 2.5 label Jun 7, 2024
Contributor

mergify bot commented Jun 7, 2024

backport branch-3.3

✅ Backports have been created

Contributor

mergify bot commented Jun 7, 2024

backport branch-3.2

✅ Backports have been created

Contributor

mergify bot commented Jun 7, 2024

backport branch-3.1

✅ Backports have been created

Contributor

mergify bot commented Jun 7, 2024

backport branch-3.0

✅ Backports have been created

Contributor

mergify bot commented Jun 7, 2024

backport branch-2.5

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jun 7, 2024
(cherry picked from commit 8b437bf)
mergify bot pushed a commit that referenced this pull request Jun 7, 2024
(cherry picked from commit 8b437bf)
mergify bot pushed a commit that referenced this pull request Jun 7, 2024
(cherry picked from commit 8b437bf)

# Conflicts:
#	docs/en/using_starrocks/Bitmap_index.md
@mergify mergify bot mentioned this pull request Jun 7, 2024
42 tasks
mergify bot pushed a commit that referenced this pull request Jun 7, 2024
(cherry picked from commit 8b437bf)

# Conflicts:
#	docs/en/using_starrocks/Bitmap_index.md
mergify bot pushed a commit that referenced this pull request Jun 7, 2024
(cherry picked from commit 8b437bf)

# Conflicts:
#	docs/en/using_starrocks/Bitmap_index.md
wanpengfei-git pushed a commit that referenced this pull request Jun 7, 2024
wanpengfei-git pushed a commit that referenced this pull request Jun 7, 2024
wanpengfei-git pushed a commit that referenced this pull request Jun 7, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
Co-authored-by: hellolilyliuyi <[email protected]>
wanpengfei-git pushed a commit that referenced this pull request Jun 7, 2024
Signed-off-by: hellolilyliuyi <[email protected]>
Co-authored-by: hellolilyliuyi <[email protected]>
wanpengfei-git pushed a commit that referenced this pull request Jun 12, 2024
3 participants