Implement arrow_cast support for StringView and BinaryView #10920

Closed
Tracked by #10918
alamb opened this issue Jun 14, 2024 · 1 comment · Fixed by #10925
Labels: enhancement (New feature or request)

alamb commented Jun 14, 2024

Is your feature request related to a problem or challenge?

Part of #10918, [StringViewArray](https://docs.rs/arrow/latest/arrow/array/type.StringViewArray.html) support in DataFusion

[arrow_cast](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#arrow-cast) is a function widely used in DataFusion tests to produce values of specific Arrow data types.

Under the covers, it simply calls the appropriate arrow-cast kernel.

Here is an example showing how this works:

> select arrow_cast('foo', 'Dictionary(Int32, Utf8)');
+---------------------------------------------------------+
| arrow_cast(Utf8("foo"),Utf8("Dictionary(Int32, Utf8)")) |
+---------------------------------------------------------+
| foo                                                     |
+---------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

> select arrow_typeof(arrow_cast('foo', 'Dictionary(Int32, Utf8)'));
+-----------------------------------------------------------------------+
| arrow_typeof(arrow_cast(Utf8("foo"),Utf8("Dictionary(Int32, Utf8)"))) |
+-----------------------------------------------------------------------+
| Dictionary(Int32, Utf8)                                               |
+-----------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Here is how to make a table with dictionary-encoded values:

> create table foo as values ('Andrew', 'Xiangpeng', 'Raphael');
0 row(s) fetched.
Elapsed 0.002 seconds.

> create table dict_table as select arrow_cast(column1, 'Dictionary(Int32, Utf8)') column1 from foo;
0 row(s) fetched.
Elapsed 0.008 seconds.

> select column1, arrow_typeof(column1) from dict_table;
+---------+----------------------------------+
| column1 | arrow_typeof(dict_table.column1) |
+---------+----------------------------------+
| Andrew  | Dictionary(Int32, Utf8)          |
+---------+----------------------------------+
1 row(s) fetched.
Elapsed 0.002 seconds.

Describe the solution you'd like

I would like to be able to use arrow_cast to create StringView and BinaryView arrays for testing.

This does not yet work:

> select arrow_cast('foo', 'StringView');
Error during planning: Unsupported type 'StringView'. Must be a supported arrow type name such as 'Int32' or 'Timestamp(Nanosecond, None)'. Error unrecognized word: StringView
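
The "unrecognized word" in the error above suggests that the type string is parsed into a data type before any cast happens, and that the parser does not yet know the new names. As a rough illustration only, here is a minimal std-only Rust sketch of that idea. `SketchType`, `parse_type_name`, and `split_top_level_comma` are hypothetical names made up for this example; this is not DataFusion's actual parser, and the accepted spellings ('StringView', 'BinaryView') are simply the ones used in this issue.

```rust
// Hypothetical, simplified stand-in for an Arrow data type.
// This is NOT the real arrow `DataType` enum, which has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum SketchType {
    Int32,
    Utf8,
    Binary,
    StringView, // spelling taken from this issue (assumption)
    BinaryView, // spelling taken from this issue (assumption)
    Dictionary(Box<SketchType>, Box<SketchType>),
}

// Parse a type name such as "Int32" or "Dictionary(Int32, Utf8)".
// Unknown names produce an error similar in spirit to the one shown above.
fn parse_type_name(name: &str) -> Result<SketchType, String> {
    let name = name.trim();

    // Handle the one nested form this sketch supports: Dictionary(key, value).
    if let Some(inner) = name
        .strip_prefix("Dictionary(")
        .and_then(|rest| rest.strip_suffix(')'))
    {
        let (key, value) = split_top_level_comma(inner)
            .ok_or_else(|| format!("expected 'Dictionary(key, value)', got '{name}'"))?;
        return Ok(SketchType::Dictionary(
            Box::new(parse_type_name(key)?),
            Box::new(parse_type_name(value)?),
        ));
    }

    match name {
        "Int32" => Ok(SketchType::Int32),
        "Utf8" => Ok(SketchType::Utf8),
        "Binary" => Ok(SketchType::Binary),
        // The two arms this issue asks for.
        "StringView" => Ok(SketchType::StringView),
        "BinaryView" => Ok(SketchType::BinaryView),
        other => Err(format!("unrecognized word: {other}")),
    }
}

// Split at the first comma that is not inside nested parentheses.
// Returns None if there is no such comma or the parentheses are unbalanced.
fn split_top_level_comma(s: &str) -> Option<(&str, &str)> {
    let mut depth = 0usize;
    for (i, c) in s.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => depth = depth.checked_sub(1)?,
            ',' if depth == 0 => return Some((&s[..i], &s[i + 1..])),
            _ => {}
        }
    }
    None
}
```

In this sketch, supporting the new types comes down to adding two match arms; calling `parse_type_name("StringView")` then succeeds, while a name the parser does not know still returns an "unrecognized word" error. The real change would also need the underlying cast kernel to support the target type.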

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added the enhancement New feature or request label Jun 14, 2024
@alamb alamb changed the title Implement arrow_cast support for StringView Implement arrow_cast support for StringView and BinaryView Jun 14, 2024
@XiangpengHao
Let me try it. Can you assign me? @alamb
