Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Parameterized type #52

Merged

Conversation

anshuldata
Copy link
Contributor

@anshuldata anshuldata commented Aug 30, 2024

Enhancements:

  1. Added interface "FuncDefArgType". This represents argument of a function in substrait extension

  2. Function implementations input argument are feched from parser via method ArgType method. ArgType returns interface "FuncDefArgType". Added another method to parser expression "RetType" to return Type (concrete) as return type. This is temporary and should be fixed with Type Derivation. To keep behavior same for return type I kept "RetType" implementations same to return "Type"

  3. Added support for parameterized type

    • ParameterizedFixedChar/VarChar/FixedBinary/Decimal
    • Parameterized/PrecisionTimestamp/PrecisionTimestampTz
    • Parameterized/Decimal/List/Struct/Map Type
  4. Added interface "LeafParameter". This indicates parameter which are not nested
    * These can be of two type, concrete and abstract (name in code is "ConcreteIntParam" and "VariableIntParam")
    * This allows representation of a leaf parameter. This also helps in evaluating if a given parameter is leaf or not (a parameter can be FuncDefArgType too)

  5. Renamed "ParameterizedType" to "CompositeType". This was representing concrete serializable type so changed name to reflect so

  6. Fixed string representation of type to be uniform for fields having more than one type (map, struct) to have space between different parameters

  7. Enhanced existing PrecisionTimestampType, PrecisionTimestampTzType, StructType, ListType, MapType to indicate they are concrete parameterized type (added BaseString and ParameterString) method

  8. Separate "AnyType" from existing nonParamType. This will be helpful in match method in follow up PR to match argument against parameter

  9. Added API "HasSyncParams". This takes []FuncDefArgType as argument and returns true if any abstract leaf parameter is shared across different arguments. This will be used in Match API in follow up PR to act on such FunctionVariants

Copy link

codecov bot commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 76.26667% with 89 lines in your changes missing coverage. Please review.

Project coverage is 54.93%. Comparing base (58e4ba0) to head (7beaaeb).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
types/parser/type_parser.go 71.17% 40 Missing and 9 partials ⚠️
types/types.go 42.85% 16 Missing ⚠️
expr/functions.go 0.00% 7 Missing ⚠️
types/any_type.go 55.55% 4 Missing ⚠️
types/integer_parameters/variable_int_param.go 69.23% 4 Missing ⚠️
extensions/simple_extension.go 0.00% 3 Missing ⚠️
extensions/variants.go 90.00% 3 Missing ⚠️
types/precison_timestamp_types.go 66.66% 2 Missing ⚠️
types/integer_parameters/concrete_int_param.go 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   54.53%   54.93%   +0.39%     
==========================================
  Files          27       35       +8     
  Lines        6355     6697     +342     
==========================================
+ Hits         3466     3679     +213     
- Misses       2697     2829     +132     
+ Partials      192      189       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anshuldata anshuldata force-pushed the anshul/SupportParameterizedType branch from ff46f15 to 1a64c57 Compare August 30, 2024 10:43
* Separate AnyType. This will be helpful in match method
* Added support for ParameterizedFixedChar/VarChar/FixedBinary/Decimal
* Added parser support for Parameterized/PrecisionTimestamp/PrecisionTimestampTz
@anshuldata anshuldata force-pushed the anshul/SupportParameterizedType branch from 1a64c57 to 5beb08c Compare August 30, 2024 12:45
types/parameterized_decimal_type.go Show resolved Hide resolved
types/any_type.go Outdated Show resolved Hide resolved
Comment on lines 17 to 19
func (m *ParameterizedDecimalType) WithNullability(n Nullability) Type {
m.Nullability = n
return m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previous comment, this should probably be a value receiver, not a pointer receiver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this got converted back to a pointer receiver, please change it to a value receiver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deliberately converted it to pointer receiver. This is to catch error of using wrong type at compile time instead of run time.
Basically, if a interface is implemented with a pointer receiver. Value receiver won't implement this.
Since pointer of ParameterizedDecimalType implements "FuncDefArgType" so value receiver won't satisfy interface "FuncDefArgType"
I feel this is helpful if we have "FuncDefArgType" and need to switch case on "type". We can get compile error than
(sample program is here)

Let me know if this sounds ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that now, unlike with other types, calling WithNullability actually changes ParameterizedDecimalType as opposed to nearly all other cases where calling that just returns a copy with the nullability updated.

This inconsistency will be a problem for users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In update PR, parameterized type don't implement WithNullability

Basically parametrised params implements separate interface "FundDefArgType" and som don't overload Type interface

types/parameterized_decimal_type.go Outdated Show resolved Hide resolved
types/parameterized_types.go Outdated Show resolved Hide resolved
types/parameterized_types.go Outdated Show resolved Hide resolved
types/parameterized_types.go Outdated Show resolved Hide resolved
types/any_type.go Outdated Show resolved Hide resolved
types/any_type.go Outdated Show resolved Hide resolved
@anshuldata anshuldata force-pushed the anshul/SupportParameterizedType branch from 2887a3c to 2a81d60 Compare September 3, 2024 08:02
Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling with how this pattern extends to things like List, LIST, STRUCT<T1, T2>, etc.

types/parameterized_types.go Outdated Show resolved Hide resolved
@@ -3,6 +3,7 @@
package parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally think we're just creating more debt by maintaining multiple parsers in the project rather than just using antlr everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will fix it in follow up PR to rely on antlr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an antlr implementation for Go that we can use which is easy to utilize and plug into the types?

Copy link
Contributor

@jacques-n jacques-n Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Antlr has generators for most languages, including go. Have used it extensively elsewhere and find it much more reliable/easy to maintain than writing parsers in multiple languages.

types/parameterized_types.go Outdated Show resolved Hide resolved
extensions/simple_extension.go Outdated Show resolved Hide resolved
@anshuldata anshuldata marked this pull request as draft September 10, 2024 05:21
@anshuldata anshuldata changed the title Support for Parameterized type <DRAFT>Support for Parameterized type Sep 10, 2024
@anshuldata anshuldata force-pushed the anshul/SupportParameterizedType branch 2 times, most recently from 647ad12 to a46c537 Compare September 10, 2024 10:09
@anshuldata anshuldata force-pushed the anshul/SupportParameterizedType branch from a46c537 to a6ef7c8 Compare September 10, 2024 11:24
@anshuldata
Copy link
Contributor Author

@jacques-n @zeroshade
Kindly re-review. I have overhauled this PR to support

  • nested parameterized type (struct, map, list)
  • support for parameterized decimal type where one of precision and scale may be concrete

I have captured changes in PR description. Kindly read it first to get an high level idea of changes.

Thanks

@anshuldata anshuldata changed the title <DRAFT>Support for Parameterized type Support for Parameterized type Sep 10, 2024
@anshuldata anshuldata marked this pull request as ready for review September 10, 2024 11:38
Copy link
Contributor

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a new pass on this, left a bunch of comments

types/parameter_types/abstract_parameter_type.go Outdated Show resolved Hide resolved
types/parameter_types/concrete_parameter_type.go Outdated Show resolved Hide resolved
types/parameter_types/leaf_parameter_type.go Outdated Show resolved Hide resolved
types/parameter_types/concrete_parameter_type.go Outdated Show resolved Hide resolved
types/parameter_types/leaf_parameter_type.go Outdated Show resolved Hide resolved
types/parameterized_single_integer_param_type.go Outdated Show resolved Hide resolved
types/parameterized_struct_type.go Outdated Show resolved Hide resolved
types/parameterized_struct_type.go Outdated Show resolved Hide resolved
types/parameterized_struct_type.go Outdated Show resolved Hide resolved
@@ -3,6 +3,7 @@
package parser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an antlr implementation for Go that we can use which is easy to utilize and plug into the types?

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overloading of Type here is wrong. You're upgrading Type to mean both concrete or parameterized. I think that has undesirable side effects and should be avoided.

A better way to approach is to keep Type to mean concrete types only. That's what users are going to be using in most of substrait-go. That's the only that exists in plans. Let's keep it that narrow.

We then have ParameterizedType. It could also be called FunctionDefinitionArgumentType since that it is the only place this concept needs to exist. This is a type that can be used in the definition of function arguments and can be parameterized or concrete. FunctionDefinitionArgumentType can't be used in plans and doesn't need to be converted to Protobuf binary values. In the protobuf files, this type is called ParameterizedType. Note that Parameterized types should not be confused with Compound Types which are types that are concrete types that have multiple components.

The third thing is what we call type expressions. Each function variant should have a type expression. This is mostly missing from substrait-go. That is the expression that is handed the concrete types and works with the FunctionDefinitionArgumentType values to figure things out.

Please update so that Type means what it means today and let's upgrade FunctionVariant to be defined using a new ParametertizedType or FunctionDefintitionArugmentType (or another name if you prefer).

A good test is all existing substrait-go type construction/manipulation code (outside of function variant construction) should have no changes. For example, this shouldn't be needed.

types/parameterized_struct_type.go Outdated Show resolved Hide resolved
* Made parameter of function argument as separate interface "FuncDefArgType"
@anshuldata
Copy link
Contributor Author

Please update so that Type means what it means today and let's upgrade FunctionVariant to be defined using a new ParametertizedType or FunctionDefintitionArugmentType (or another name if you prefer).

Updated PR to use interface "FuncDefArgType" for function parameters. Kindly check @jacques-n

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few small comments but looks pretty good from my pov.

extensions/simple_extension.go Outdated Show resolved Hide resolved
types/parameterized_single_integer_param_type.go Outdated Show resolved Hide resolved
types/leaf_parameters/leaf_parameter_type.go Outdated Show resolved Hide resolved
types/types.go Show resolved Hide resolved
@@ -369,14 +371,24 @@ type (
WithNullability(Nullability) Type
}

ParameterizedType interface {
// CompositeType this represents a concrete type having components
CompositeType interface {
Type
ParameterString() string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docs to these methods

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, how important is it that these methods are exported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added doc

these are used in function registry which is in separate package. Basically, to convert substrait type to local name and to initialize local type registry.

types/types.go Show resolved Hide resolved
@anshuldata anshuldata force-pushed the anshul/SupportParameterizedType branch from 60f3ad0 to e9e3cb7 Compare September 12, 2024 10:03
@anshuldata anshuldata force-pushed the anshul/SupportParameterizedType branch from e9e3cb7 to 7beaaeb Compare September 12, 2024 10:08
Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the iterations!

@anshuldata
Copy link
Contributor Author

Kindly check updated PR @zeroshade.
If it gets merged I would like to rebase #53 with this PRs changes

Thanks

@jacques-n jacques-n merged commit 4265a21 into substrait-io:main Sep 12, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants