Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Dictionary String (UTF8) type to String sqllogictests #12621

Merged
merged 5 commits into from
Sep 25, 2024

Conversation

goldmedal
Copy link
Contributor

Which issue does this PR close?

Part of #12415 .

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Add the test for dictionary Utf8

Are there any user-facing changes?

no

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 25, 2024
@goldmedal
Copy link
Contributor Author

goldmedal commented Sep 25, 2024

I tried adding another test file for Dictionary(Int32,_ LargeUtf8), but I got a panic from arrow-cast. It's hard to trace which case failed 🤔. I might submit a PR to improve the error message on the Arrow side.

@goldmedal
Copy link
Contributor Author

I tried adding another test file for Dictionary(Int32,_ LargeUtf8), but I got a panic from arrow-cast. It's hard to trace which case failed 🤔. I might submit a PR to improve the error message on the Arrow side.

submit apache/arrow-rs#6456 for it.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great -- thank you @goldmedal

@@ -275,6 +279,17 @@ pub(crate) fn convert_schema_to_types(columns: &Fields) -> Vec<DFColumnType> {
| DataType::Time32(_)
| DataType::Time64(_) => DFColumnType::DateTime,
DataType::Timestamp(_, _) => DFColumnType::Timestamp,
DataType::Dictionary(key_type, value_type) => {
if key_type.is_integer() {
// mapping dictionary string types to Text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

pub fn cell_to_string(
col: &ArrayRef,
row: usize,
data_type: &DataType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit is that since this data type comes from col it seems like we could keep the signature of this function the same and call let data_type = col.data_type() in the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops... my bad. It's my other experimental change. I forgot to roll back it. I'll recover it. Thanks for mentioning it.

#
# common test for string-like functions and operators
#
include ./string_query.slt.part
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really nice

@alamb
Copy link
Contributor

alamb commented Sep 25, 2024

🚀

@alamb alamb merged commit 12f55d8 into apache:main Sep 25, 2024
24 checks passed
bgjackma pushed a commit to bgjackma/datafusion that referenced this pull request Sep 25, 2024
* mapping DictionaryString to text

* disable and move out the fail case for dictionary string

* fix the schema for dictionary string

* rollback the unnecessary change

* cargo fmt
@goldmedal goldmedal deleted the chore/12415-dictionary-string branch September 26, 2024 01:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants