Use `struct` instead of `named_struct` when there are no aliases #9897

alamb · 2024-04-01T12:50:36Z

Which issue does this PR close?

Rationale for this change

The names of the columns created by calling `struct` changed in #9743

Before #9743

select struct(a, b) from t;
+-----------------+
| struct(t.a,t.b) |
+-----------------+
| {c0: 1, c1: 2}  |
| {c0: 3, c1: 4}  |
+-----------------+

After #9743

(note how the column name is different)

select struct(a as field_a, b) from t;
+--------------------------------------------------+
| named_struct(Utf8("field_a"),t.a,Utf8("c1"),t.b) |
+--------------------------------------------------+
| {field_a: 1, c1: 2}                              |
| {field_a: 3, c1: 4}                              |
+--------------------------------------------------+

This change caused #9891 and also removed test coverage for `struct`
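To make the difference in column names concrete, here is a rough sketch of how the two names shown above could be derived from the same arguments. The `Arg` type and both helper functions are hypothetical and written only for illustration; they are not the planner's actual code, and the zero-based default names (`c0`, `c1`) are assumed from the query output above.

```rust
/// Hypothetical model of one argument to `struct(...)`: an optional alias
/// plus the expression text as it appears in the column name.
struct Arg<'a> {
    alias: Option<&'a str>,
    expr: &'a str,
}

/// Column name when the call is planned as plain `struct`.
fn struct_name(args: &[Arg]) -> String {
    let exprs: Vec<&str> = args.iter().map(|a| a.expr).collect();
    format!("struct({})", exprs.join(","))
}

/// Column name when the call is planned as `named_struct`: every field name
/// becomes an explicit Utf8 literal argument (assumed default: c0, c1, ...).
fn named_struct_name(args: &[Arg]) -> String {
    let parts: Vec<String> = args
        .iter()
        .enumerate()
        .map(|(i, a)| {
            let name = a
                .alias
                .map(str::to_string)
                .unwrap_or_else(|| format!("c{i}"));
            format!("Utf8(\"{name}\"),{}", a.expr)
        })
        .collect();
    format!("named_struct({})", parts.join(","))
}

fn main() {
    let plain = [
        Arg { alias: None, expr: "t.a" },
        Arg { alias: None, expr: "t.b" },
    ];
    let aliased = [
        Arg { alias: Some("field_a"), expr: "t.a" },
        Arg { alias: None, expr: "t.b" },
    ];
    println!("{}", struct_name(&plain));
    println!("{}", named_struct_name(&aliased));
}
```

The point of the sketch is that the `named_struct` form leaks the field-name literals into the column name, which is why queries that referred to the old `struct(...)` column name stopped working.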

What changes are included in this PR?

Use `struct` when there are no named fields
Revert the test change in [CI] Use alias for table.struct #9894

Are these changes tested?

Yes, by existing tests

Are there any user-facing changes?

The column names are now the same as they were prior to #9894

This reverts commit 9487ca0.

alamb · 2024-04-01T12:51:31Z

datafusion/sqllogictest/test_files/expr.slt

@@ -2288,39 +2288,39 @@ select struct(time,load1,load2,host) from t1;

 # can have an aggregate function with an inner coalesce
 query TR
-select t2.info['c3'] as host, sum(coalesce(t2.info)['c1']) from (select struct(time,load1,load2,host) as info from t1) t2 where t2.info['c3'] IS NOT NULL group by t2.info['c3'] order by host;
+select t2."struct(t1.time,t1.load1,t1.load2,t1.host)"['c3'] as host, sum(coalesce(t2."struct(t1.time,t1.load1,t1.load2,t1.host)")['c1']) from (select struct(time,load1,load2,host) from t1) t2 where t2."struct(t1.time,t1.load1,t1.load2,t1.host)"['c3'] IS NOT NULL group by t2."struct(t1.time,t1.load1,t1.load2,t1.host)"['c3'] order by host;


This reverts the change from #9894

yyy1000 · 2024-04-01T17:48:14Z

datafusion/sql/src/expr/mod.rs

+            .iter()
+            .any(|value| matches!(value, SQLExpr::Named { .. }))
+        {
+            self.create_named_struct(values, input_schema, planner_context)


I think the values in create_named_struct will be SQLExpr::Named in this PR, so maybe we can also make a change here?
https://github.com/apache/arrow-datafusion/blob/d8d521ac8b90002fa0ba1f91456051a9775ae193/datafusion/sql/src/expr/mod.rs#L610-L631

Oh, ignore this. In one case only some of the values are Named, so no change is needed.
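For readers following the thread, the check in the diff above can be sketched in isolation as follows. `Value` and `Planned` are hypothetical stand-ins (the real code matches on `SQLExpr::Named` and calls planner methods); the sketch only shows that a single named value is enough to select the `named_struct` path, including the mixed case mentioned above.

```rust
/// Hypothetical stand-in for the parser's expression type; only the shape
/// needed for the check is modelled.
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    Plain(&'static str),
    Named { name: &'static str, expr: &'static str },
}

/// Which function the call would be planned as (illustrative only).
#[derive(Debug, PartialEq)]
enum Planned {
    Struct,
    NamedStruct,
}

/// Pick `named_struct` as soon as any value carries an alias,
/// otherwise fall back to plain `struct`.
fn choose(values: &[Value]) -> Planned {
    if values.iter().any(|v| matches!(v, Value::Named { .. })) {
        Planned::NamedStruct
    } else {
        Planned::Struct
    }
}

fn main() {
    let unnamed = [Value::Plain("a"), Value::Plain("b")];
    let mixed = [
        Value::Named { name: "field_a", expr: "a" },
        Value::Plain("b"),
    ];
    println!("{:?}", choose(&unnamed));
    println!("{:?}", choose(&mixed));
}
```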

comphead

Thanks @alamb. One improvement I can see here is that we iterate over the values more than once.
I mean one iteration identifies named/not_named, and creating struct/named_struct is another one. It can probably be optimized.

alamb · 2024-04-02T18:27:33Z

Thanks @alamb. One improvement I can see here is that we iterate over the values more than once. I mean one iteration identifies named/not_named, and creating struct/named_struct is another one. It can probably be optimized.

I agree it does two iterations over the values. I think the current approach basically requires two iterations because if we are doing named_struct we need to add a name for each column that doesn't have one.

So in other words, struct(1, 2, 3 as other_col) needs to become struct(1 as c0, 2 as c1, 3 as other_col), but we don't know until we see 3 as other_col that we need to add in c0 and c1 🤔

I would like to spend my optimization budget on bigger things initially if that is ok (specifically #9637)
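A minimal sketch of the two passes described above, assuming zero-based default names (`c0`, `c1`, ...) as in the query output in the PR description. `resolve_names` is a hypothetical helper written for illustration, not a function in the codebase.

```rust
/// Resolve field names for a `struct(...)` call.
/// Returns `None` when no argument has an alias (plain `struct` suffices),
/// otherwise one name per argument, with positional defaults filled in.
fn resolve_names(aliases: &[Option<&str>]) -> Option<Vec<String>> {
    // Pass 1: find out whether any argument is aliased at all.
    if !aliases.iter().any(|a| a.is_some()) {
        return None;
    }
    // Pass 2: name every field, now that we know names are required.
    Some(
        aliases
            .iter()
            .enumerate()
            .map(|(i, a)| match a {
                Some(name) => name.to_string(),
                None => format!("c{i}"),
            })
            .collect(),
    )
}

fn main() {
    // struct(1, 2, 3 as other_col): the alias appears only at the end.
    println!("{:?}", resolve_names(&[None, None, Some("other_col")]));
    // struct(1, 2): no alias, so no names are generated.
    println!("{:?}", resolve_names(&[None, None]));
}
```

A single pass is possible in principle (generate default names eagerly and discard them if no alias shows up), but that trades the second iteration for extra allocation, so the gain is unlikely to be large.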

alamb · 2024-04-02T21:21:14Z

Thanks again for the reviews 🙏

alamb added 3 commits April 1, 2024 07:19

Revert "use alias (apache#9894)"

c33e2aa

This reverts commit 9487ca0.

Use struct instead of named_struct when there are no aliases

c0895b5

Update docs

10a40dd

github-actions bot added the sql (SQL Planner) and sqllogictest (SQL Logic Tests (.slt)) labels Apr 1, 2024

fmt

792212c

yyy1000 reviewed Apr 1, 2024

comphead approved these changes Apr 1, 2024

Merge remote-tracking branch 'apache/main' into alamb/struct_followup

d562321

alamb merged commit a6ff1fe into apache:main Apr 2, 2024
25 checks passed
