Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Min/Max references from AggregateStatistics #11153

Closed
Tracked by #11151
alamb opened this issue Jun 28, 2024 · 11 comments
Closed
Tracked by #11151

Remove Min/Max references from AggregateStatistics #11153

alamb opened this issue Jun 28, 2024 · 11 comments
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jun 28, 2024

Is your feature request related to a problem or challenge?

Part of #11151 (where we are removing special case uses of Min/Max)

Describe the solution you'd like

Reemove these cases:

if let Some(casted_expr) =
agg_expr.as_any().downcast_ref::<expressions::Min>()

if let Some(casted_expr) =
agg_expr.as_any().downcast_ref::<expressions::Min>()

if let Some(casted_expr) =
agg_expr.as_any().downcast_ref::<expressions::Max>()

if let Some(casted_expr) =
agg_expr.as_any().downcast_ref::<expressions::Max>()

Specifically, the idea is to remove the pattern

       expr.as_any().downcast_ref::<Max>()

and

       expr.as_any().downcast_ref::<Min>()

Describe alternatives you've considered

I suggest adding a general purpose function to AggregateExec and then implementing it for Min and Max (so we can do the same for Min/Max UDAF)

impl AggregateExpr {

 /// Return the value of the aggregate function, if known, given the number of input rows.
 /// 
 /// Return None if the value can not be determined solely from the input.
 /// 
 /// # Examples
 /// * The `COUNT` aggregate would return `Some(11)` given `num_rows = 11`
 /// * The `MIN` aggregate would return `Some(Null)` given `num_rows = 0
 /// * The `MIN` aggregate would return `None` given num_rows = 11
 fn output_from_rows(&self, num_rows: usize) -> Option<ScalarValue> { None }
...
}

### Additional context

_No response_
@Rachelint
Copy link
Contributor

take

@alamb
Copy link
Contributor Author

alamb commented Jul 1, 2024

Thanks @Rachelint -- I also have some time later this week to help with this issue (if it might help to have one or two examples as we work through #11151)

@Rachelint
Copy link
Contributor

Rachelint commented Jul 1, 2024

Hi @alamb , I have some questions about the alternatives mentioned in the issue.
It seems the interface should be like:

trait AggregateExpr {
	fn output_from_stats(&self, stats: &Statistics) -> Option<ScalarValue> { None }
	...
}

As I understand, it is a optimization takes effect when specific aggregate functions(such as max, min, count...) can get the results directly from stats?

@alamb
Copy link
Contributor Author

alamb commented Jul 1, 2024

As I understand, it is a optimization takes effect when specific aggregate functions(such as max, min, count...) can get the results directly from stats?

Yes that is the case

I think output_from_stats would also be reasonable and potentially more general. One challenge is that the AggregateExpr might not have access to its argument exprs (and thus it might not know what the argument is) 🤔

@Rachelint
Copy link
Contributor

Rachelint commented Jul 1, 2024

As I understand, it is a optimization takes effect when specific aggregate functions(such as max, min, count...) can get the results directly from stats?

Yes that is the case

I think output_from_stats would also be reasonable and potentially more general. One challenge is that the AggregateExpr might not have access to its argument exprs (and thus it might not know what the argument is) 🤔

It seems a function expressions exists, and can help?

fn expressions(&self) -> Vec<Arc<dyn PhysicalExpr>>;

and it is used to get min/max from stats now?

if casted_expr.expressions().len() == 1 {
// TODO optimize with exprs other than Column
if let Some(col_expr) = casted_expr.expressions()[0]
.as_any()
.downcast_ref::<expressions::Column>()
{
if let Precision::Exact(val) =
&col_stats[col_expr.index()].max_value

@alamb
Copy link
Contributor Author

alamb commented Jul 1, 2024

Sounds like it might work -- I think the only way to really know for sure would be to try it out

@Rachelint
Copy link
Contributor

Sounds like it might work -- I think the only way to really know for sure would be to try it out

Thanks, I try it now.

@edmondop
Copy link
Contributor

edmondop commented Jul 3, 2024

Sounds like it might work -- I think the only way to really know for sure would be to try it out

Thanks, I try it now.

@Rachelint if you stop working on this could you please let me know?

@Rachelint
Copy link
Contributor

Rachelint commented Jul 4, 2024

Sounds like it might work -- I think the only way to really know for sure would be to try it out

Thanks, I try it now.

@Rachelint if you stop working on this could you please let me know?

Sorry for delay, still working on now, will try to push codes today...

@edmondop
Copy link
Contributor

edmondop commented Sep 30, 2024

@alamb we can close this too as fixed via #12296 right?

@alamb alamb closed this as completed Sep 30, 2024
@alamb
Copy link
Contributor Author

alamb commented Sep 30, 2024

Thanks @edmondop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants