-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43594: [C++] Remove std::optional from arrow::ArrayStatistics::is_{min,max}_exact #43595
Conversation
|
@@ -47,14 +47,14 @@ struct ARROW_EXPORT ArrayStatistics { | |||
/// \brief The minimum value, may not be set | |||
std::optional<ValueType> min = std::nullopt; | |||
|
|||
/// \brief Whether the minimum value is exact or not, may not be set | |||
std::optional<bool> is_min_exact = std::nullopt; | |||
/// \brief Whether the minimum value is exact or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but IMO is_min_exact = false
might still has exact statistics but the reader cannot gurantee that, since apache/parquet-format#216 is new in 2.10 :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the followings correct?
- Parquet 2.9 or earlier data don't have
is_min_value_exact
/is_max_value_exact
- Parquet 2.9 or earlier data use only exact min/max
- Parquet 2.10 or later data use exact min/max or non-exact min/max
- Parquet 2.10 or later data may use exact min/max without
is_min_value_exact
/is_max_value_exact
You're focusing on the 2. case, right? Can our Parquet reader detect Parquet version? If so, can we always set true
to is_min_exact
/is_max_exact
for Parquet 2.9 or earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late replying
Parquet 2.9 or earlier data don't have is_min_value_exact/is_max_value_exact
Yes
Parquet 2.9 or earlier data use only exact min/max
I guess no, I'll send a mail to maillist to make it sure
Parquet 2.10 or later data use exact min/max or non-exact min/max
Yes
Parquet 2.10 or later data may use exact min/max without is_min_value_exact/is_max_value_exact
right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, can we always set true to is_min_exact/is_max_exact for Parquet 2.9 or earlier?
Hmmm I'll try to make it clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway we can mention "exact=false" can also means is exact, lol
Or we can denote that the parquet-c++ output is exact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's use exact=true
for Apache Parquet C++ output.
…s::is_{min,max}_exact We don't need "unknown" state. If they aren't set, we can process they are not exact.
abcfb72
to
8331bf5
Compare
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit b80a51a. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
We don't need "unknown" state. If they aren't set, we can process they are not exact.
What changes are included in this PR?
Remove
std::optional
fromarrow::ArrayStatistics::is_{min,max}_exact
.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.
std::optional
fromarrow::ArrayStatistics::is_{min,max}_exact
#43594