Support applying parquet bloom filters to StringView columns #12499

Closed
Tracked by #11752
alamb opened this issue Sep 17, 2024 · 2 comments · Fixed by #12503
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@alamb
Contributor

alamb commented Sep 17, 2024

Is your feature request related to a problem or challenge?

Part of #11752

While working to enable StringView in #12092, I found that columns read as StringView or BinaryView do not take advantage of bloom filters.

Specifically, this code doesn't handle StringView:

ScalarValue::Utf8(Some(v)) => sbbf.check(&v.as_str()),
ScalarValue::Binary(Some(v)) => sbbf.check(v),
ScalarValue::FixedSizeBinary(_size, Some(v)) => sbbf.check(v),
ScalarValue::Boolean(Some(v)) => sbbf.check(v),
ScalarValue::Float64(Some(v)) => sbbf.check(v),
ScalarValue::Float32(Some(v)) => sbbf.check(v),
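
To make the gap concrete, here is a rough, self-contained sketch of what handling the view variants could look like. It is not the DataFusion code: `ToyFilter` is a hypothetical stand-in for the parquet `Sbbf` (it stores exact bytes in a set, so it never reports false positives), the `ScalarValue` enum is a cut-down imitation with only four variants, and `may_contain` is an illustrative helper name. The only point is that the view variants carry the same string or byte payload as their non-view counterparts, so they can presumably be checked the same way.

```rust
use std::collections::HashSet;

/// Hypothetical, cut-down imitation of DataFusion's `ScalarValue`.
/// Only the variants relevant to this issue are modeled.
#[derive(Debug, Clone)]
enum ScalarValue {
    Utf8(Option<String>),
    Utf8View(Option<String>),
    Binary(Option<Vec<u8>>),
    BinaryView(Option<Vec<u8>>),
}

/// Toy stand-in for a bloom filter (assumption: NOT the parquet `Sbbf`).
/// It keeps the exact bytes in a set, so it has no false positives.
struct ToyFilter {
    values: HashSet<Vec<u8>>,
}

impl ToyFilter {
    /// Build a filter from anything that can be viewed as bytes.
    fn new<I, T>(values: I) -> Self
    where
        I: IntoIterator<Item = T>,
        T: AsRef<[u8]>,
    {
        let values = values.into_iter().map(|v| v.as_ref().to_vec()).collect();
        ToyFilter { values }
    }

    /// Returns true if the bytes may be present.
    fn check(&self, bytes: &[u8]) -> bool {
        self.values.contains(bytes)
    }
}

/// Returns true if the row group may contain `value` and must be read,
/// false if the filter proves the value is absent.
fn may_contain(filter: &ToyFilter, value: &ScalarValue) -> bool {
    match value {
        // View and non-view string variants share the same payload.
        ScalarValue::Utf8(Some(v)) | ScalarValue::Utf8View(Some(v)) => {
            filter.check(v.as_bytes())
        }
        // Likewise for the binary variants.
        ScalarValue::Binary(Some(v)) | ScalarValue::BinaryView(Some(v)) => {
            filter.check(v)
        }
        // Nulls cannot be checked here, so conservatively keep the row group.
        _ => true,
    }
}
```

With a filter built from `["foo", "bar"]`, a `Utf8View` literal `"baz"` would let the row group be skipped, while a null literal falls through to the conservative default and keeps it.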

Describe the solution you'd like

Support applying parquet bloom filters to StringView columns

Describe alternatives you've considered

Basically:

  1. Make the code changes for bloom filters in #12092 (Enable reading StringViewArray by default from Parquet)
  2. Write a test

In terms of testing, I think the easiest thing to do would be to follow the model of the existing tests for Utf8/Binary columns and pass the schema_force_view_types config flag.

Additional context

No response

@alamb alamb added the enhancement New feature or request label Sep 17, 2024
@alamb alamb added the good first issue Good for newcomers label Sep 17, 2024
@alamb
Contributor Author

alamb commented Sep 17, 2024

I think this is a clearly defined need, so I am marking it as a good first issue.

@my-vegetable-has-exploded
Contributor

take
