-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Add Variant data type #11324
base: main
Are you sure you want to change the base?
API: Add Variant data type #11324
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,6 +38,8 @@ class Identity<T> implements Transform<T, T> { | |
*/ | ||
@Deprecated | ||
public static <I> Identity<I> get(Type type) { | ||
Preconditions.checkArgument(!type.isVariantType(), "Unsupported type for identity: %s", type); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't we need to fix "canTransform" as well? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nvm I see that's already covered since "variant" is not considered a primitive There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Preconditions.checkArgument(type.typeId() != Types.VariantType); So we can avoid adding isVariantType to the interface There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since Variant is not a primitive type, canTransform() will be false. |
||
|
||
return new Identity<>(type); | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,7 +45,8 @@ enum TypeID { | |
DECIMAL(BigDecimal.class), | ||
STRUCT(StructLike.class), | ||
LIST(List.class), | ||
MAP(Map.class); | ||
MAP(Map.class), | ||
VARIANT(Object.class); | ||
|
||
private final Class<?> javaClass; | ||
|
||
|
@@ -92,6 +93,10 @@ default boolean isListType() { | |
return false; | ||
} | ||
|
||
default boolean isVariantType() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is only used in a deprecated method, do we have any other reason to add this? I think it probably doesn't need to be apart of the type interface. We could always just check if the type is VARIANT There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah. This is more a helper function instead of checking if the type is VARIANT. Actually it will be used in some other places later. |
||
return false; | ||
} | ||
|
||
default boolean isMapType() { | ||
return false; | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -534,6 +534,7 @@ private static int estimateSize(Type type) { | |
case FIXED: | ||
return ((Types.FixedType) type).length(); | ||
case BINARY: | ||
case VARIANT: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the rationale for this size? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't have the accurate size for Variant similar to Binary. So I use the same value as Binary. I'm wondering how we come up with 80 for Binary. |
||
return 80; | ||
case DECIMAL: | ||
// 12 (header) + (12 + 12 + 4) (BigInteger) + 4 (scale) = 44 bytes | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1687,6 +1687,44 @@ public void testV3TimestampNanoTypeSupport() { | |
3); | ||
} | ||
|
||
@Test | ||
public void testV3VariantTypeSupport() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know this is copying a lot of tests in this class but we should start future proofing a bit more imho. We also have tests around this sort of thing in TestSchema.java. I think it is probably ok to just keep all of our schema validation tests there, but it wouldn't hurt to have some redundancy here as well. Refactoring the whole suite can come in another pr but I think we should build a templated test that's something like @ParameterizedTest
@ValueSource(types = {Types.TimetstampNanos, Types.Variant, ....})
testTypeSupport(Type type) {
Schema schemaWithType = new Schema(
Types.NestedField.required(1, "id", Types.LongType.get()),
Types.NestedField.optional(2, type.name, type),
Types.NestedField.optional(3, "arr", Types.ListType.ofRequired(4, type)),
Types.NestedField.required(5, "struct",
Types.StructType.of(
Types.NestedField.optional(6, "inner_" + type.name, type),
Types.NestedField.required(7, "data", Types.StringType.get()))),
Types.NestedField.optional(8, "struct_arr",
Types.StructType.of(
Types.NestedField.optional(9, "ts", type))));
//Psuedo code here
from 0 -> MIN_FORMAT_VERSION.get(type)
fail to make metadata
from MIN_FORMAT_VERSION.get(type) -> SUPPORTED_TABLE_VERSION
succeed
} The most important part about this is that we wouldn't have to continually update tests every time a new valid metadata version is added. It also would be much easier to test type compatibility. (I'm thinking that Geo is going to need the exact same thing soon) For this PR I think it is enough to write a parameterized version for just Variant, then we could raise another PR to add in nanos and remove the redundant tests There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not this is part of my goal to remove all tests that have V3 or V2 in their title. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also todo add Variant to Schema.MIN_FORMAT_VERSION There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @RussellSpitzer for already creating a PR to address that. I will include that when it's merged. |
||
Schema v3Schema = | ||
new Schema( | ||
Types.NestedField.required(3, "id", Types.LongType.get()), | ||
Types.NestedField.required(4, "data", Types.StringType.get()), | ||
Types.NestedField.required( | ||
5, | ||
"struct", | ||
Types.StructType.of(Types.NestedField.optional(6, "v", Types.VariantType.get())))); | ||
|
||
for (int unsupportedFormatVersion : ImmutableList.of(1, 2)) { | ||
assertThatThrownBy( | ||
() -> | ||
TableMetadata.newTableMetadata( | ||
v3Schema, | ||
PartitionSpec.unpartitioned(), | ||
SortOrder.unsorted(), | ||
TEST_LOCATION, | ||
ImmutableMap.of(), | ||
unsupportedFormatVersion)) | ||
.isInstanceOf(IllegalStateException.class) | ||
.hasMessage( | ||
"Invalid schema for v%s:\n" | ||
+ "- Invalid type for struct.v: variant is not supported until v3", | ||
unsupportedFormatVersion); | ||
} | ||
|
||
// should be allowed in v3 | ||
TableMetadata.newTableMetadata( | ||
v3Schema, | ||
PartitionSpec.unpartitioned(), | ||
SortOrder.unsorted(), | ||
TEST_LOCATION, | ||
ImmutableMap.of(), | ||
3); | ||
} | ||
|
||
@Test | ||
public void onlyMetadataLocationIsUpdatedWithoutTimestampAndMetadataLogEntry() { | ||
String uuid = "386b9f01-002b-4d8c-b77f-42c3fd3b7c9b"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is okay but we may be missing information by not sanitizing based on the variant's type (i.e. date) and it would be nice to have some idea of the structure in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting something like
to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be a nice feature but probably ok for it's own issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping me understand the concept. I have filed a followup issue #11479 and will work on separately.