Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make supported types in dialect yaml files as a map #81

Merged
merged 5 commits into from
Sep 17, 2024

Conversation

scgkiran
Copy link
Contributor

@scgkiran scgkiran commented Jul 24, 2024

This change is needed to build a FunctionRegistry in substrait-go PR

@scgkiran
Copy link
Contributor Author

@richtia @westonpace can you please review this change?

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces the concept of a "native typename string" which I feel doesn't really exist in all systems (e.g. those that don't speak SQL like velox? / cuda / pandas / etc.). However, I see how it can be useful.

I kind of follow what is happening here but what are you using these values for? Are you trying to create SQL statements? Are you creating native plans from strings?

Comment on lines 37 to 38
ts: datetime64[s]
date: datetime64[s]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is datetime64[s] to cuda?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the mapping here, I thought this would be useful.

what should be the mapping for cuda case? Should we leave the mapping empty for now?

Comment on lines 37 to 40
date: timestamp
time: timestamp
ts: timestamp
tstz: timestamp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can these all map to the same type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked the mapping from here. Based on this should it be ?

date: date32
time: Time64
ts: timestamp
tstz: timestamp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with these all mapping to timestamp for now since that's what the runner does. I think the other date/time types weren't working properly with the date/time functions, so I added some handling based on timestamp length since timestamp could represent the others. We can open up an issue to fix this and do the proper mapping in a subsequent PR.

@scgkiran
Copy link
Contributor Author

I kind of follow what is happening here but what are you using these values for? Are you trying to create SQL statements? Are you creating native plans from strings?

Once I have a registry of functions, I want to be able to do two things,

  1. convert substrait plan to native sql and
  2. native sql to a substrait plan.

@@ -83,7 +83,7 @@ class DialectFile(NamedTuple):
scalar_functions: List[DialectFunction]
aggregate_functions: List[DialectFunction]
uri_to_func_prefix: Dict[str, str]
supported_types: List[str]
supported_types: Dict[str, str]
Copy link
Member

@EpsilonPrime EpsilonPrime Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the key is made up of a key and a value itself is its type more like a tuple or dict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the value to dict

- interval
- dec
i8:
sql_type_name: TINYINT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the trailing comma needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it

fp64:
sql_type_name: float64
bool:
sql_type_name: bool_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the underscore in SQL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from runner.py

str:
sql_type_name: text
date:
sql_type_name: date
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these type names lowercase while most of the other dialects are capitalized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed everything to lower case.

Copy link
Member

@richtia richtia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scgkiran can you look into the 2 duckdb test failures?

@richtia
Copy link
Member

richtia commented Sep 17, 2024

@scgkiran can you look into the 2 duckdb test failures?

The duckdb failures aren't relevant to this PR. We can make a fix in a separate one.

@richtia richtia merged commit 65351e5 into substrait-io:main Sep 17, 2024
5 of 6 checks passed
@scgkiran
Copy link
Contributor Author

@scgkiran can you look into the 2 duckdb test failures?

The duckdb failures aren't relevant to this PR. We can make a fix in a separate one.

Thanks @richtia
I was not hitting this failure locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants