Support Utf8View for string function bit_length #13195

Closed
Tracked by #11790
jayzhan211 opened this issue Oct 31, 2024 · 3 comments · Fixed by apache/arrow-rs#6671 or #13221
Labels
enhancement New feature or request

Comments

@jayzhan211
Contributor

Is your feature request related to a problem or challenge?

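bit_length, for example, doesn't support Utf8View yet. Calling it on a literal cast to Utf8View fails in the optimizer with a schema mismatch, and calling it on a Utf8View column fails in the arrow kernel: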

query error
select bit_length(arrow_cast('a', 'Utf8View'));
----
DataFusion error: Optimizer rule 'optimize_projections' failed
caused by
optimize_projections
caused by
Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "bit_length(arrow_cast(Utf8(\"a\"),Utf8(\"Utf8View\")))", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "bit_length(arrow_cast(Utf8(\"a\"),Utf8(\"Utf8View\")))", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None], functional_dependencies: FunctionalDependencies { deps: [] } }.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker


statement ok
create table test_source as values
  ('Andrew', 'X'),
  ('Xiangpeng', 'Xiangpeng'),
  ('Raphael', 'R'),
  (NULL, 'R');

# Table with the different combination of column types
statement ok
create table test as
SELECT
  arrow_cast(column1, 'Utf8View') as column1_utf8view
FROM test_source;

query error DataFusion error: Arrow error: Compute error: bit_length not supported for Utf8View
select bit_length(column1_utf8view) from test;
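
For reference, here is a minimal std-only Rust sketch of the results the last query would be expected to return once Utf8View is supported. It assumes bit_length means "number of UTF-8 bytes times 8, with NULL passed through" and an Int32 result (the type shown in the original schema of the error above). The helper names `bit_length` and `expected_for_test_table` are hypothetical and are not the arrow-rs or DataFusion API.

```rust
/// Hypothetical helper: bit length of a string value.
/// Assumption: bit length = UTF-8 byte length * 8, and NULL stays NULL.
fn bit_length(value: Option<&str>) -> Option<i32> {
    value.map(|s| (s.len() * 8) as i32)
}

/// Hypothetical helper: expected output for column1 of the `test` table above.
fn expected_for_test_table() -> Vec<Option<i32>> {
    // Same values as column1 in test_source, including the NULL row.
    let column1 = [Some("Andrew"), Some("Xiangpeng"), Some("Raphael"), None];
    column1.into_iter().map(bit_length).collect()
}
```

Under these assumptions the four rows would come back as 48, 72, 56 and NULL, and the literal query on 'a' would return 8.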

Describe the solution you'd like

I hope we can support Utf8View for all the string functions.

Describe alternatives you've considered

No response

Additional context

No response

@jayzhan211 jayzhan211 added the enhancement New feature or request label Oct 31, 2024
@austin362667
Contributor

take

@austin362667 austin362667 removed their assignment Nov 1, 2024
@alamb
Contributor

alamb commented Nov 1, 2024

Related epic is here: #11790

I renamed this ticket to be about supporting bit_length rather than all string functions.

If we find other missing function implementations, let's add them as items to #11790.

@alamb alamb changed the title Support Utf8View for string function Support Utf8View for string function bit_length Nov 1, 2024
@alamb
Contributor

alamb commented Nov 5, 2024

The upstream fix in arrow didn't actually fix this bug (yet), so I'm reopening this.

@alamb alamb reopened this Nov 5, 2024