Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Fix] Allow Partition data to be nullable in ManifestEntry #509

Merged
merged 2 commits into from
Mar 11, 2024

Conversation

sungwy
Copy link
Collaborator

@sungwy sungwy commented Mar 9, 2024

Similar to how the partition fields are already initialized with required=False in partition_type, we should set required=False in the manifest_entry schema within the ManifestFile.

This PR introduces this fix first from @jqin61 's WIP PR, so other PRs that add partition values to manifest files can be unblocked.

Without this change, we cannot serialize None partition values into the ManifestEntry in the avro file.

Below WIP PRs require this change
Partitioned Write: #353
Add Files: #506

@sungwy sungwy requested a review from Fokko March 9, 2024 22:52
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

@@ -308,6 +308,7 @@ def data_file_with_partition(partition_type: StructType, format_version: Literal
field_id=field.field_id,
name=field.name,
field_type=partition_field_to_data_file_partition_field(field.field_type),
required=False,
Copy link
Contributor

@Fokko Fokko Mar 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of carrying this over from the field?

Suggested change
required=False,
required=field.required,

I would prefer this because Avro schema's are translated to Iceberg, and then used in reading the files. In Avro reading an optional field is different than reading a required field (in the case of an optional field, it will first read a boolean checking if the value is there).

I checked locally in tests, and there the fields are not required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @Fokko . I've made the corresponding change to partition_type to carry this over from the partition field as suggested 👍

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small suggestion, apart from that, this looks great! Thanks for catching this @syun64 and @jqin61 and splitting it out 👍

@Fokko Fokko merged commit 0ab3262 into apache:main Mar 11, 2024
7 checks passed
@sungwy sungwy deleted the data-file-partition-nullable branch March 11, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants