[Doc] add bitmap index en #46759
StarRocks can adaptively choose whether to use a bitmap index based on column cardinality and query conditions. If a bitmap index does not effectively filter out many Pages or the overhead of loading bitmap indexes during queries is high, StarRocks will not use the bitmap index by default to avoid degrading query performance.
StarRocks determines whether to use a bitmap index based on the ratio of the number of values involved in the query condition to the column cardinality. Generally, the smaller this ratio, the better the filtering effect of the bitmap index. Thus, StarRocks uses `bitmap_max_filter_ratio/1000` as the threshold. When the number of values in the filter condition / column cardinality < `bitmap_max_filter_ratio/1000`, the bitmap index will be used. The default value of `bitmap_max_filter_ratio` is `1`.
When the number of values in the filter condition/column cardinality is less than bitmap_max_filter_ratio/1000
Same below.
Additionally, the **overhead of loading bitmap indexes during queries** should be considered. During a query, bitmap indexes are loaded on demand, and the larger the value of `number of column values involved in query conditions/cardinality x bitmap index`, the greater the overhead of loading bitmap indexes during queries.
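A rough sketch of the expression above, for discussion only. It assumes "bitmap index" in the expression means the total size of the column's bitmap index, and that every distinct value's bitmap is about the same size. The function name and the 50 MB index size are hypothetical; they are not StarRocks APIs or measured numbers.

```python
def estimated_bitmap_load_bytes(values_in_filter, cardinality, index_size_bytes):
    """Estimate how many bytes of bitmap index a query would load on demand.

    Assumption (not from StarRocks source): the index is split evenly across
    the column's distinct values, so the loaded share is values / cardinality.
    """
    if cardinality <= 0:
        raise ValueError("cardinality must be positive")
    return values_in_filter * index_size_bytes / cardinality


# Hypothetical 50 MB bitmap index on each column.
index_size = 50_000_000
print(estimated_bitmap_load_bytes(1, 2, index_size))      # 25000000.0
print(estimated_bitmap_load_bytes(2, 10000, index_size))  # 10000.0
```

Under these assumptions, a low-cardinality column loads a large share of its index for even a single value, while a high-cardinality column loads only a small slice.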
To determine the appropriate cardinality and query conditions for bitmap indexes, it is recommended to refer to the [Performance test on bitmap index](#performance-test-on-bitmap-index) in this topic to conduct performance tests. You can use actual business data and queries to **create bitmap indexes on columns of different cardinalities, to analyze the filtering effect of bitmap indexes on queries (at least filtering out 999/1000 of the data),the disk space usage, the impact on loading performance, and the overhead of loading bitmap indexes during queries.**
Add a space before "the disk space usage".
Bitmap indexes are generally suitable for columns with high cardinality. Bitmap indexes are a good choice when they exhibit high selectivity, and their filtering effect (number of data rows filtered by the Bitmap index/total number of data rows) is lower than one in ten thousand.
Take a query based on a single column as example, such as `SELECT * FROM employees WHERE gender = 'male';`. The `gender` column in the `employees` table has values 'male' and 'female', so the cardinality is 2 (two distinct values). The query condition involves one value, so the ratio is 1/2, which is greater than 1/1000. Therefore, this query will not use the bitmap index.
as an example
To evaluate the performance improvement of Bitmap indexes in StarRocks, queries are performed on a 100 GB SSB dataset in StarRocks. The test results are as follows:
Take another query based on combination of multiple columns as example, such as `SELECT * FROM employees WHERE gender = 'male' AND city IN ('Beijing', 'Shanghai');`. The cardinality of the `city` column is 10,000, and the query condition involves two values, so the ratio is calculated as `(1*2)/(2*10000)`, which is less than 1/1000. Therefore, this query will use the bitmap index.
a combination?
as an example
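The two `employees` examples above can be checked with a minimal sketch of the rule as the doc text describes it. The function name and the `(values_in_filter, cardinality)` tuple shape are hypothetical, and this is not how StarRocks implements the check internally; it only reproduces the arithmetic `(1*2)/(2*10000)` from the doc.

```python
from fractions import Fraction
from math import prod


def would_use_bitmap_index(conditions, bitmap_max_filter_ratio=1):
    """Apply the documented threshold to a list of filtered columns.

    conditions: list of (values_in_filter, column_cardinality) pairs,
    one pair per column that appears in the WHERE clause.
    """
    values = prod(n for n, _ in conditions)
    cardinality = prod(c for _, c in conditions)
    ratio = Fraction(values, cardinality)
    # Documented rule: use the index when ratio < bitmap_max_filter_ratio/1000.
    return ratio < Fraction(bitmap_max_filter_ratio, 1000)


# WHERE gender = 'male'  ->  1/2, above the 1/1000 threshold
print(would_use_bitmap_index([(1, 2)]))              # False
# WHERE gender = 'male' AND city IN ('Beijing', 'Shanghai')  ->  2/20000
print(would_use_bitmap_index([(1, 2), (2, 10000)]))  # True
```

`Fraction` keeps the comparison exact, so a ratio that equals the threshold is not treated as being below it.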
- `Gender`: The bitmap of `female` is `1110` and the bitmap of `male` is `0001`.
- `Producer`: The bitmap of `level_1` is `1010`, the bitmap of `level_2` is `0100`, and the bitmap of `level_3` is `0001`.
**Query performance analysis**: Since the queried column is of high cardinality, the bitmap index is effective, allowing for filtering out a portion of the pages and significantly reducing time for reading data.
reducing the time for reading data
After the query is sent, StarRocks searches for the dictionaries of `Gender` and `Income_level` at the same time to get the following information:
**Query performance analysis**: According to StarRocks' default configuration, Bitmap Index is used when the number of distinct values/column cardinality < `bitmap_max_filter_ratio/1000` (default 1/1000). Since this condition is met, the query uses the Bitmap Index, and the performance is similar to use the Bitmap Index compulsorily.
the performance is similar to that when the Bitmap Index is compulsorily used.
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%)
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%)
@Mergifyio backport branch-3.3
@Mergifyio backport branch-3.2
@Mergifyio backport branch-3.1
@Mergifyio backport branch-3.0
@Mergifyio backport branch-2.5
✅ Backports have been created
(cherry picked from commit 8b437bf)
(cherry picked from commit 8b437bf) # Conflicts: # docs/en/using_starrocks/Bitmap_index.md
Co-authored-by: hellolilyliuyi <[email protected]>
Signed-off-by: hellolilyliuyi <[email protected]> Co-authored-by: hellolilyliuyi <[email protected]>