Parquet statistics missing when reading Utf8 as Utf8View
#12123
Labels: bug (Something isn't working)
Comments
This was referenced Aug 22, 2024
take
I unassigned myself because I'm not very familiar with this topic (StringView). I'll keep digging into the issue, but if anyone has an idea for a solution, feel free to take over.
I have an alternative solution, developed while fixing #12119. PR up shortly.
take
Part of #11752
Describe the bug
One of the last remaining issues causing test failures when we enable reading StringView by default in #12092 is as follows:
To Reproduce
Check out #12092, then run:
cargo test -p datafusion --lib -- file_format::parquet
Expected behavior
The tests should pass.
Additional context
The problem is that the table schema is configured to use Utf8View, but the file schema uses Utf8 (so the statistics are returned as Utf8), and the accumulators cannot update a Utf8View value from a Utf8 one.
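To make the mismatch concrete, here is a minimal, std-only Rust sketch. `DataType`, `ScalarValue`, and `MinAccumulator` are hypothetical simplified stand-ins that borrow names from Arrow and DataFusion but are not their real APIs. The sketch assumes the accumulator is created from the table schema's column type and rejects values of any other type.

```rust
// Hypothetical, simplified stand-ins for Arrow's DataType and DataFusion's
// ScalarValue; only the two string types matter for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Utf8,
    Utf8View,
}

#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Utf8(Option<String>),
    Utf8View(Option<String>),
}

impl ScalarValue {
    fn data_type(&self) -> DataType {
        match self {
            ScalarValue::Utf8(_) => DataType::Utf8,
            ScalarValue::Utf8View(_) => DataType::Utf8View,
        }
    }

    /// Re-tags the value as `target`; the string payload is unchanged.
    fn cast_to(&self, target: DataType) -> ScalarValue {
        let s = match self {
            ScalarValue::Utf8(s) | ScalarValue::Utf8View(s) => s.clone(),
        };
        match target {
            DataType::Utf8 => ScalarValue::Utf8(s),
            DataType::Utf8View => ScalarValue::Utf8View(s),
        }
    }
}

/// Hypothetical min accumulator built for the *table* column type.
struct MinAccumulator {
    data_type: DataType,
    min: Option<String>,
}

impl MinAccumulator {
    fn new(data_type: DataType) -> Self {
        Self { data_type, min: None }
    }

    /// Rejects values whose type differs from the accumulator's type,
    /// which models the Utf8 vs Utf8View failure described above.
    fn update(&mut self, value: &ScalarValue) -> Result<(), String> {
        if value.data_type() != self.data_type {
            return Err(format!(
                "cannot update {:?} accumulator with {:?}",
                self.data_type,
                value.data_type()
            ));
        }
        if let ScalarValue::Utf8(Some(s)) | ScalarValue::Utf8View(Some(s)) = value {
            // Keep the lexicographically smallest string seen so far.
            if self.min.as_ref().map_or(true, |m| s < m) {
                self.min = Some(s.clone());
            }
        }
        Ok(())
    }
}

fn main() {
    // Table schema says Utf8View; the Parquet file's statistic comes back as Utf8.
    let table_type = DataType::Utf8View;
    let file_stat = ScalarValue::Utf8(Some("apple".to_string()));

    let mut acc = MinAccumulator::new(table_type);
    assert!(acc.update(&file_stat).is_err());

    // Casting the statistic to the table type first avoids the error.
    acc.update(&file_stat.cast_to(table_type)).unwrap();
    assert_eq!(acc.min.as_deref(), Some("apple"));
}
```

Casting each statistic before the update is one way around the error; casting the file schema up front, as in #11862, avoids the per-value conversion.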
@XiangpengHao solved the Utf8/Utf8View mismatch in #11862 (comment) by threading the parameter through and then casting the file schema appropriately.
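A schema-level version of that cast might look like the following sketch. The function name `coerce_file_schema_to_view` and the `Field`/`DataType` types are hypothetical simplifications, not the code from #11862 or Arrow's `Schema` API. The sketch assumes a file column is retyped only when it is Utf8 in the file and the table declares the same column as Utf8View.

```rust
// Hypothetical, simplified schema model; not Arrow's Schema/Field types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Utf8View,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Hypothetical helper: returns a copy of `file_schema` in which every Utf8
/// column that `table_schema` declares as Utf8View is retyped to Utf8View.
/// Returns `None` when no column needs changing.
fn coerce_file_schema_to_view(table_schema: &[Field], file_schema: &[Field]) -> Option<Vec<Field>> {
    let mut changed = false;
    let coerced: Vec<Field> = file_schema
        .iter()
        .map(|f| {
            // Look up the table's type for the column with the same name.
            let table_type = table_schema
                .iter()
                .find(|t| t.name == f.name)
                .map(|t| t.data_type);
            if f.data_type == DataType::Utf8 && table_type == Some(DataType::Utf8View) {
                changed = true;
                Field { name: f.name.clone(), data_type: DataType::Utf8View }
            } else {
                f.clone()
            }
        })
        .collect();
    if changed { Some(coerced) } else { None }
}

fn main() {
    let field = |name: &str, data_type| Field { name: name.to_string(), data_type };
    let table = vec![field("id", DataType::Int64), field("name", DataType::Utf8View)];
    let file = vec![field("id", DataType::Int64), field("name", DataType::Utf8)];

    let coerced = coerce_file_schema_to_view(&table, &file).expect("`name` should be coerced");
    assert_eq!(coerced[0].data_type, DataType::Int64);
    assert_eq!(coerced[1].data_type, DataType::Utf8View);

    // A file schema that already matches the table needs no coercion.
    assert!(coerce_file_schema_to_view(&table, &table).is_none());
}
```

Returning `None` when nothing changes lets the caller keep using the original file schema as-is.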
The code isn't great to begin with, and adding a new parameter makes it worse.
I also suspect there are some bugs lurking there that we could find and fix if the code were more testable.