API: Add Variant data type #11324
@@ -534,6 +534,7 @@ private static int estimateSize(Type type) {
       case FIXED:
         return ((Types.FixedType) type).length();
       case BINARY:
+      case VARIANT:
What is the rationale for this size?
As with Binary, we can't know the exact size of a Variant value, so I use the same estimate as Binary. I do wonder how we came up with 80 for Binary.
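For readers following along, here is a minimal sketch of the fall-through being discussed. The enum and class names are hypothetical stand-ins rather than Iceberg's real types, and the value 80 is taken only from the comment above.

```java
public class SizeEstimateSketch {
  // Hypothetical stand-in for the type IDs involved; not Iceberg's real enum.
  enum TypeId { FIXED, BINARY, VARIANT }

  // Flat estimate for variable-length values; 80 is the figure quoted in the thread.
  static final int VARIABLE_LENGTH_ESTIMATE = 80;

  // Fixed-length values report their declared length; binary and variant
  // share one flat estimate because their real size is not known up front.
  static int estimateSize(TypeId id, int fixedLength) {
    switch (id) {
      case FIXED:
        return fixedLength;
      case BINARY:
      case VARIANT:
        return VARIABLE_LENGTH_ESTIMATE;
      default:
        throw new IllegalArgumentException("Unsupported type: " + id);
    }
  }
}
```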
@@ -562,7 +563,7 @@ private static String sanitize(Literal<?> literal, long now, int today) {
     } else if (literal instanceof Literals.DoubleLiteral) {
       return sanitizeNumber(((Literals.DoubleLiteral) literal).value(), "float");
     } else {
-      // for uuid, decimal, fixed, and binary, match the string result
+      // for uuid, decimal, fixed, variant, and binary, match the string result
This is okay, but we may be losing information by not sanitizing based on the variant's type (e.g. date), and it would be nice to have some idea of the structure in the future.
Are you suggesting something like
{
foo:
bar: 3
baz:
bozz: "flew"
}
to
{
(hash-foo):
(hash-bar) : (1 digit number)
(hash-baz) :
(hash-bozz) : (hash-xaxa)
}
I think that would be a nice feature, but it's probably OK to leave it for its own issue.
Thanks for helping me understand the concept. I have filed a follow-up issue, #11479, and will work on it separately.
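To make the structure-preserving idea from this thread concrete, here is a rough sketch. It assumes a variant object can be modeled as a nested java.util.Map, which is only an illustration; the class, the hash helper, and the placeholder formats are all hypothetical and not part of this PR or of Iceberg's sanitizer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VariantSanitizeSketch {
  // Hypothetical helper: replaces a field name or string with a hash placeholder.
  static String hash(String value) {
    return String.format("(hash-%08x)", value.hashCode());
  }

  // Recursively sanitizes a nested map standing in for a variant object:
  // keys and strings are hashed, integers keep only their digit count,
  // and anything else is reduced to its class name.
  static Object sanitize(Object value) {
    if (value instanceof Map) {
      Map<String, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        out.put(hash(String.valueOf(entry.getKey())), sanitize(entry.getValue()));
      }
      return out;
    } else if (value instanceof Integer || value instanceof Long) {
      String digits = Long.toString(((Number) value).longValue()).replace("-", "");
      return "(" + digits.length() + "-digit-int)";
    } else if (value instanceof String) {
      return hash((String) value);
    }
    return "(" + (value == null ? "null" : value.getClass().getSimpleName()) + ")";
  }
}
```

The nesting of the input survives, so a reader of the sanitized output can still see the shape of the value without seeing any field names or data.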
… full implementation. Block Transforms, SortOrder
@@ -38,6 +38,8 @@ class Identity<T> implements Transform<T, T> {
    */
   @Deprecated
   public static <I> Identity<I> get(Type type) {
+    Preconditions.checkArgument(!type.isVariantType(), "Unsupported type for identity: %s", type);
don't we need to fix "canTransform" as well?
Nvm I see that's already covered since "variant" is not considered a primitive
Preconditions.checkArgument(type.typeId() != Type.TypeID.VARIANT);
So we can avoid adding isVariantType to the interface
Since Variant is not a primitive type, canTransform() will be false.
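The two points in this thread can be sketched together as below. This is a toy model under stated assumptions: the Type class, the TypeId enum, and the method names checkIdentitySupported and canTransform here are hypothetical stand-ins, not Iceberg's actual classes. It only illustrates that a type-ID comparison needs no extra helper on the type, and that a primitive-only check already excludes variant.

```java
public class IdentityCheckSketch {
  // Hypothetical type IDs; only enough to illustrate the check.
  enum TypeId { LONG, STRING, STRUCT, VARIANT }

  // Hypothetical minimal type carrying just an ID.
  static final class Type {
    private final TypeId id;

    Type(TypeId id) {
      this.id = id;
    }

    TypeId typeId() {
      return id;
    }

    // In this toy model only LONG and STRING count as primitive.
    boolean isPrimitiveType() {
      return id == TypeId.LONG || id == TypeId.STRING;
    }
  }

  // Identity applies to primitive types only, so variant is rejected implicitly.
  static boolean canTransform(Type type) {
    return type.isPrimitiveType();
  }

  // Explicit rejection by comparing type IDs, with no helper method on the type.
  static void checkIdentitySupported(Type type) {
    if (type.typeId() == TypeId.VARIANT) {
      throw new IllegalArgumentException("Unsupported type for identity: " + type.typeId());
    }
  }
}
```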
@@ -92,6 +93,10 @@ default boolean isListType() {
     return false;
   }

+  default boolean isVariantType() {
This is only used in a deprecated method; do we have any other reason to add it? I think it probably doesn't need to be a part of the type interface. We could always just check whether the type is VARIANT.
Yeah. This is more of a helper function, as an alternative to checking whether the type is VARIANT. It will also be used in some other places later.
@@ -1687,6 +1687,44 @@ public void testV3TimestampNanoTypeSupport() {
         3);
   }

+  @Test
+  public void testV3VariantTypeSupport() {
I know this is copying a lot of tests in this class but we should start future proofing a bit more imho. We also have tests around this sort of thing in TestSchema.java. I think it is probably ok to just keep all of our schema validation tests there, but it wouldn't hurt to have some redundancy here as well.
Refactoring the whole suite can come in another pr but I think we should build a templated test that's something like
@ParameterizedTest
@ValueSource(types = {Types.TimestampNanos, Types.Variant, ....})
testTypeSupport(Type type) {
  Schema schemaWithType = new Schema(
      Types.NestedField.required(1, "id", Types.LongType.get()),
      Types.NestedField.optional(2, type.name, type),
      Types.NestedField.optional(3, "arr", Types.ListType.ofRequired(4, type)),
      Types.NestedField.required(5, "struct",
          Types.StructType.of(
              Types.NestedField.optional(6, "inner_" + type.name, type),
              Types.NestedField.required(7, "data", Types.StringType.get()))),
      Types.NestedField.optional(8, "struct_arr",
          Types.StructType.of(
              Types.NestedField.optional(9, "ts", type))));
  // Pseudocode here
  from 0 -> MIN_FORMAT_VERSION.get(type)
    fail to make metadata
  from MIN_FORMAT_VERSION.get(type) -> SUPPORTED_TABLE_VERSION
    succeed
}
The most important part about this is that we wouldn't have to continually update tests every time a new valid metadata version is added. It also would be much easier to test type compatibility. (I'm thinking that Geo is going to need the exact same thing soon)
For this PR I think it is enough to write a parameterized version for just Variant, then we could raise another PR to add in nanos and remove the redundant tests
Note that this is part of my goal to remove all tests that have V3 or V2 in their title.
Also, TODO: add Variant to Schema.MIN_FORMAT_VERSION
Thanks @RussellSpitzer for already creating a PR to address that. I will include that when it's merged.
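As a rough illustration of the version loop proposed in this thread, the sketch below replaces real metadata creation with a simple compatibility check. Everything in it is an assumption made for the example: the TypeId enum, the contents of MIN_FORMAT_VERSIONS, the value of SUPPORTED_TABLE_FORMAT_VERSION, and the method names are hypothetical, and a real test would use JUnit parameterization against Iceberg's own classes.

```java
import java.util.EnumMap;
import java.util.Map;

public class FormatVersionSketch {
  // Hypothetical stand-ins; the real type IDs and constants live in Iceberg.
  enum TypeId { LONG, TIMESTAMP_NANO, VARIANT }

  // Assumed highest supported table format version.
  static final int SUPPORTED_TABLE_FORMAT_VERSION = 3;

  // Assumed minimum format version per type; unlisted types are valid from v1.
  static final Map<TypeId, Integer> MIN_FORMAT_VERSIONS = new EnumMap<>(TypeId.class);

  static {
    MIN_FORMAT_VERSIONS.put(TypeId.TIMESTAMP_NANO, 3);
    MIN_FORMAT_VERSIONS.put(TypeId.VARIANT, 3);
  }

  // Stand-in for building table metadata: throws when the type is too new.
  static void checkCompatibility(TypeId type, int formatVersion) {
    int min = MIN_FORMAT_VERSIONS.getOrDefault(type, 1);
    if (formatVersion < min) {
      throw new IllegalStateException(
          String.format("%s requires format version %d, got %d", type, min, formatVersion));
    }
  }

  // True when every version below the minimum is rejected and every version
  // from the minimum up to the supported maximum is accepted.
  static boolean versionsBehaveAsExpected(TypeId type) {
    int min = MIN_FORMAT_VERSIONS.getOrDefault(type, 1);
    for (int version = 1; version <= SUPPORTED_TABLE_FORMAT_VERSION; version++) {
      boolean rejected = false;
      try {
        checkCompatibility(type, version);
      } catch (IllegalStateException e) {
        rejected = true;
      }
      if (rejected != (version < min)) {
        return false;
      }
    }
    return true;
  }
}
```

Because the loop is driven by the map and the maximum version, raising the supported version or adding a new type only changes data, not the test body.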
Add the Variant data type to the API module. For now we add only the limited set of required types to the API; after the full implementation, we will promote stable methods to the API through the interface VariantLike. Fixes #11178.