fix: duplicate declaration of min:timestamp & max:timestamp #631
ACTION NEEDED: Substrait follows the Conventional Commits specification. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
The duplicated names prevent substrait-java from being updated. However, there is a question around whether names need to be unique across ALL extensions, or just within a single extension file. While these functions were added in error (I think), you could make the case that:
should be treated as different functions.
That might make sense from a certain point of view, but it would make it very hard to resolve function invocations.
If you have to parse
I think that introducing namespaces is not as simple as allowing duplicated declarations and requires a dedicated discussion.
The case I'm thinking of is more like:
Which is a name collision with names outside of the core spec, because users can choose to provide their own functions. My engine might provide 2 and disallow 1 because it differs from the Substrait semantics for add somehow.
Within a Substrait plan, these two functions are distinguishable because we have access to the extension information. Outside of Substrait, less so. You're right that for something like
I agree with this: figuring out duplicate declarations is out of scope for this PR.
* I say relatively because the mapping doesn't include the name of the extension, yet.
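To make the point about distinguishability concrete, here is a minimal sketch, assuming a lookup keyed by extension URI plus function name. The URIs and the two `add` entries are hypothetical and are not taken from the Substrait spec or from this PR.

```python
from collections import defaultdict

# Hypothetical declarations: (extension URI, function name).
# Both URIs are made up for illustration only.
DECLARATIONS = [
    ("https://example.com/core/functions_arithmetic.yaml", "add"),
    ("https://example.com/my_engine/functions.yaml", "add"),
]


def build_index(declarations):
    """Index the same declarations by bare name and by (URI, name)."""
    by_name = defaultdict(list)
    by_qualified = {}
    for uri, name in declarations:
        by_name[name].append(uri)
        by_qualified[(uri, name)] = f"{uri}#{name}"
    return by_name, by_qualified


by_name, by_qualified = build_index(DECLARATIONS)

# Names that map to more than one extension are ambiguous without the URI.
ambiguous = sorted(n for n, uris in by_name.items() if len(uris) > 1)
```

With only the bare name, `add` is ambiguous; with the extension URI included in the key, each declaration resolves to exactly one entry.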
nullability: DECLARED_OUTPUT
decomposable: MANY
intermediate: timestamp_tz?
return: timestamp_tz?
This is technically a breaking change. Plans may exist that use this version of the function, especially because it predates the version in functions_datetime.yaml.
API wise, it's probably better to remove these versions because that leaves all the time-related min and max functions in functions_datetime.yaml.
In terms of minimising breakage, it's probably better to remove the definition of these functions in functions_datetime.yaml, as those have been around for less time.
I have a preference for removing them from functions_datetime, but it would be good to have other folks weigh in on this.
I've added my thoughts around function names and uniqueness in #634
From the Substrait sync, we've decided to remove the old ones and keep the new ones.
Addresses a duplication of min and max function overloads for timestamp types.
The functions are declared in arithmetic extensions:
min -> https://github.com/amol-/substrait/blob/main/extensions/functions_arithmetic.yaml#L1217-L1230
max -> https://github.com/amol-/substrait/blob/main/extensions/functions_arithmetic.yaml#L1217-L1230
but are also declared in datetime extensions:
min -> https://github.com/amol-/substrait/blob/main/extensions/functions_datetime.yaml#L807-L820
max -> https://github.com/amol-/substrait/blob/main/extensions/functions_datetime.yaml#L852-L865
This seems to be a source of confusion for a system loading those extension definitions: which one of the two should be considered valid?
The PR addresses this by preserving only the definitions in datetime for those argument types.
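The kind of check a loader could run to catch this duplication can be sketched as follows. This is a minimal illustration, not the substrait-java implementation: the `EXTENSIONS` dictionary is a hand-written, simplified stand-in for the parsed YAML files, and the `name:argtypes` key format is an assumption modelled on the PR title.

```python
from collections import defaultdict

# Simplified, hypothetical stand-in for parsed extension files.
# Real extension YAML carries more fields (nullability, return type, ...).
EXTENSIONS = {
    "functions_arithmetic.yaml": [
        {"name": "min", "args": ["timestamp"]},
        {"name": "min", "args": ["i32"]},
        {"name": "max", "args": ["timestamp"]},
    ],
    "functions_datetime.yaml": [
        {"name": "min", "args": ["timestamp"]},
        {"name": "max", "args": ["timestamp"]},
    ],
}


def find_duplicates(extensions):
    """Return signatures declared in more than one extension file."""
    seen = defaultdict(list)
    for filename, functions in extensions.items():
        for fn in functions:
            # Assumed key format: function name, colon, argument types.
            key = f"{fn['name']}:{'_'.join(fn['args'])}"
            seen[key].append(filename)
    return {key: files for key, files in seen.items() if len(files) > 1}


duplicates = find_duplicates(EXTENSIONS)
```

In this sketch only the timestamp overloads are reported, each listed under both files, while `min:i32` is left alone because it is declared once.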