[Bug Fix] Allow Partition data to be nullable in ManifestEntry #509
Thanks for fixing this!
pyiceberg/manifest.py
Outdated
@@ -308,6 +308,7 @@ def data_file_with_partition(partition_type: StructType, format_version: Literal
 field_id=field.field_id,
 name=field.name,
 field_type=partition_field_to_data_file_partition_field(field.field_type),
+required=False,
What do you think of carrying this over from the field?
-required=False,
+required=field.required,
I would prefer this because Avro schemas are translated to Iceberg schemas, which are then used to read the files. In Avro, reading an optional field differs from reading a required field: for an optional field, the reader first reads a boolean that indicates whether the value is present.
I checked locally in tests, and there the fields are not required.
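To illustrate why the `required` flag matters when reading, here is a minimal sketch of Avro's binary encoding for a long, written as a plain value and as an optional (`["null", "long"]` union) value. The helper names (`write_long`, `read_long`, `write_optional_long`, `read_optional_long`, `encode_optional`) are made up for this example and are not pyiceberg's reader or writer classes. In the Avro encoding, a union value is prefixed with the index of the branch that follows, so for a `["null", T]` union that prefix acts as the "is the value there" check.

```python
import io
from typing import BinaryIO, Optional


def write_long(out: BinaryIO, n: int) -> None:
    """Write a long using Avro's zigzag + variable-length encoding."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned ints
    while z > 0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))  # 7 payload bits, continuation bit set
        z >>= 7
    out.write(bytes([z]))


def read_long(inp: BinaryIO) -> int:
    """Read a zigzag + variable-length encoded long."""
    shift, result = 0, 0
    while True:
        byte = inp.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo zigzag


def write_optional_long(out: BinaryIO, value: Optional[int]) -> None:
    """Optional field: write the union branch index, then the value if present."""
    if value is None:
        write_long(out, 0)  # branch 0 = null, nothing follows
    else:
        write_long(out, 1)  # branch 1 = long
        write_long(out, value)


def read_optional_long(inp: BinaryIO) -> Optional[int]:
    """Optional field: read the branch index first, then the value if present."""
    return None if read_long(inp) == 0 else read_long(inp)


def encode_optional(value: Optional[int]) -> bytes:
    buf = io.BytesIO()
    write_optional_long(buf, value)
    return buf.getvalue()


encoded = encode_optional(7)
# An optional reader recovers the value; a required reader would consume the
# branch index as if it were the value itself.
as_optional = read_optional_long(io.BytesIO(encoded))
as_required = read_long(io.BytesIO(encoded))
```

Here `as_optional` is the original value, while `as_required` is the branch index misread as data, which is why the schema used for reading has to agree with how the field was written.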
Sounds good @Fokko. I've updated partition_type so that this is carried over from the partition field, as suggested 👍
One small suggestion, apart from that, this looks great! Thanks for catching this @syun64 and @jqin61 and splitting it out 👍
Similar to how the partition fields are already initialized with `required=False` in `partition_type`, we should set `required=False` in the manifest_entry schema within the ManifestFile.

This PR introduces this fix first from @jqin61's WIP PR, so other PRs that add partition values to manifest files can be unblocked.
Without this change, we cannot serialize None partition values into the ManifestEntry in the Avro file.
The following WIP PRs require this change:
Partitioned Write: #353
Add Files: #506
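The serialization problem described above can be sketched in a few lines. This is a simplified, hypothetical model and not pyiceberg's actual writer: `Field`, `encode_long`, and `write_partition_value` are names invented for this example. It assumes a writer that has no encoding for null when the field is required, whereas an optional field can emit the null branch of an Avro union.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not pyiceberg's NestedField."""
    name: str
    required: bool


def encode_long(n: int) -> bytes:
    """Avro zigzag + variable-length encoding of a long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def write_partition_value(field: Field, value: Optional[int]) -> bytes:
    if field.required:
        # A required field is written as a bare value, so None has no encoding.
        if value is None:
            raise ValueError(f"cannot write None for required field {field.name!r}")
        return encode_long(value)
    # An optional field is a ["null", "long"] union: branch index, then the value.
    if value is None:
        return encode_long(0)
    return encode_long(1) + encode_long(value)


optional_bytes = write_partition_value(Field("part", required=False), None)

try:
    write_partition_value(Field("part", required=True), None)
    required_failed = False
except ValueError:
    required_failed = True
```

With the field marked optional, a None partition value is written as the null branch; with it marked required, the sketch has nothing valid to write and raises.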