
TableMetadataBuilder #587

Open · wants to merge 12 commits into base: main
Conversation

@c-thiel (Collaborator) commented Aug 26, 2024

This PR is now ready for first reviews.
Some Remarks:

  • In add_sort_order and add_partition_spec, the Java code re-builds the added sort order against the current schema by matching column names. This implementation currently does not do this. Adding this feature would require PartitionSpec (bound) to store the schema it was bound against (probably a good idea anyway) and splitting SortOrder into bound and unbound, where the bound SortOrder also stores the schema it was bound against. Instead, this implementation assumes that provided sort orders and partition specs are valid for the current schema. Compatibility with the current schema is tested.
  • In contrast to Java, our add_schema method does not require a new_last_column_id argument. In Java there is a TODO to achieve the same. I put my reasoning in a comment in the code; feel free to comment on it.
  • I have a few specifics that would be good to discuss with a Java maintainer. It's not ready to be merged yet - please wait for my OK too :)
  • I had to change a few tests that used the builder under the hood - mostly due to the changed new() behaviour that now re-assigns field ids to start from 0. Some tests started from 1 before. Re-assigning the ids, just like in Java, ensures that fresh metadata always has fresh and correct ids, even if they are created manually or re-used from another metadata.
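The field-id re-assignment described above can be sketched roughly as follows. This is a simplified stand-in, not the actual iceberg-rust code: `Field` and `reassign_field_ids` are hypothetical, and the real builder walks nested struct/list/map types rather than a flat slice.

```rust
use std::collections::HashMap;

// Hypothetical flat stand-in for a schema field (the real type is nested).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
}

/// Re-assign field ids to a fresh, dense sequence starting from 0 and
/// return the mapping from old to new ids, so partition specs and sort
/// orders could be re-pointed at the new ids.
fn reassign_field_ids(fields: &mut [Field]) -> HashMap<i32, i32> {
    let mut mapping = HashMap::new();
    for (next_id, field) in fields.iter_mut().enumerate() {
        mapping.insert(field.id, next_id as i32);
        field.id = next_id as i32;
    }
    mapping
}

fn main() {
    // Ids re-used from another metadata: 7, 3, 9.
    let mut fields = vec![
        Field { id: 7, name: "a".into() },
        Field { id: 3, name: "b".into() },
        Field { id: 9, name: "c".into() },
    ];
    let mapping = reassign_field_ids(&mut fields);
    assert_eq!(fields.iter().map(|f| f.id).collect::<Vec<_>>(), vec![0, 1, 2]);
    assert_eq!(mapping[&7], 0);
}
```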

@c-thiel c-thiel marked this pull request as draft August 26, 2024 18:48
@c-thiel (Collaborator, Author) commented Aug 30, 2024

ToDos:

  • First PR as discussion basis
  • Reserved Property handling
  • Finish discussion about SortOrder and PartitionSpec re-binding
  • Review from 1-2 Java Maintainers
  • Review from 1-2 Rust Maintainers

@c-thiel c-thiel marked this pull request as ready for review August 30, 2024 11:33
@c-thiel (Collaborator, Author) commented Sep 3, 2024

Fixes #232

@c-thiel c-thiel changed the title WIP: TableMetadataBuilder TableMetadataBuilder Sep 3, 2024
@liurenjie1024 (Contributor):

Thanks @c-thiel for this PR, I've skimmed through it and it looks great to me. However, this PR is too huge to review (3k lines); would you mind splitting it into smaller ones? For example, we could have one PR for the methods involved in one TableUpdate action, with enough tests for it. Also, would it be better to move the TableMetadataBuilder refactoring into a standalone module in a separate PR?

@c-thiel (Collaborator, Author) commented Sep 6, 2024

Thanks for your feedback @liurenjie1024. This isn't really a refactoring of the builder; it's closer to a complete rewrite. The old builder allowed corrupt metadata to be created in various ways. Splitting it up by TableUpdate would not be straightforward - many tests in other modules also depend on building metadata. Creating metadata now always goes through builder methods - it's a different architecture that requires basic support for all methods from the beginning just to keep the tests running.

I would currently prefer to keep it as a larger block mainly because:

  • I don't have much time currently and it's going to be more effort
  • We would need to write auxiliary code to provide non-checked methods so that crate tests don't fail
  • The total timespan of merging 10 or so PRs is expected to be much larger than a Rust maintainer and a Java maintainer each putting in ~2 full days of effort to review it as a block.
  • Patterns are repetitive and can be reviewed together in many cases
  • A lot of it is tests - the core builder is 1145 lines. It's long - but doable :)

We now have a vision of what it could look like in the end. Before putting any more effort in, we should answer the following questions:

  • Is the overall structure OK or should we head in a different direction?
  • My first point of the opening statement: Do we re-write our SortOrder and add the schema to PartitionSpec so that we can match on names like Java does or not?
  • My second point from the opening statement: How do we handle new_last_column_id?

Those points might change the overall design quite a bit and might require a re-write of SortOrder first (split to bound and unbound).

After we answered those questions, and we still think splitting makes sense, I can try to find time to build stacked-PRs. Maybe just splitting normalization / validation in table_metadata.rs from the actual builder would be a leaner option than splitting every single TableUpdate?

@c-thiel (Collaborator, Author) commented Sep 8, 2024

@liurenjie1024 I tried to cut a few things out - but not along the lines of TableUpdate. I hope that's OK?

  1. Feat: Normalize TableMetadata #611
  2. feat: partition compatibility #612
  3. feat: SortOrder methods should take schema ref if possible #613
  4. fix: Less Panics for Snapshot timestamps #614
  5. feat: Reassign field ids for schema #615

After they are all merged, I'll rebase this PR for the actual builder.

@liurenjie1024 (Contributor):

Hi @c-thiel, sorry for the late reply.

Is the overall structure OK or should we head in a different direction?

I've gone through the new builder and I think your design is the right direction.

My first point of the opening statement: Do we re-write our SortOrder and add the schema to PartitionSpec so that we can match on names like Java does or not?

To be honest, I don't quite understand the use case. We can ask for the background of this in the dev channel, but I think this is not a blocker for this PR; we can always add it later.

My second point from the opening statement: How do we handle new_last_column_id

I've taken a look at the comments of these two PRs: apache/iceberg#6701 apache/iceberg#7445

I think the reasonable behavior is that last_column_id is optional, and we calculate it from the highest field id when missing. Allowing the user to pass last_column_id should still be supported to stay compatible with the current behavior, but that is a feature which could be added later.

@liurenjie1024 (Contributor):

Those points might change the overall design quite a bit and might require a re-write of SortOrder first (split to bound and unbound).

I agree that this should be required, as I mentioned in #550

@liurenjie1024 (Contributor):

After we answered those questions, and we still think splitting makes sense, I can try to find time to build stacked-PRs. Maybe just splitting normalization / validation in table_metadata.rs from the actual builder would be a leaner option than splitting every single TableUpdate?

That sounds reasonable to me. If one PR per table update is too much of a burden, could we split them by component, for example sort order, partition spec, and schema changes?

@c-thiel (Collaborator, Author) commented Sep 9, 2024

@liurenjie1024 thanks for the feedback!

My first point of the opening statement: Do we re-write our SortOrder and add the schema to PartitionSpec so that we can match on names like Java does or not?

To be honest, I don't quite understand the use case. We can ask for background of this in dev channel, but I think this is not a blocker of this pr, we can always add this later.

The problem with changing it later is that it changes the semantics of the function. Right now we expect source_id to match current_schema() (which is also the reason why I expose it during build). Java doesn't do this; instead, it looks up the name by id in the schema bound to a SortOrder or PartitionSpec, and then searches for the same name in the new schema to use it.

In my opinion ids are much cleaner than names (we might have dropped and re-added a column with the same name in the meantime), so I am OK with going forward. However, moving over to Java semantics would require new endpoints (i.e. add_migrate_partition_spec or so) which take a bound partition spec, in contrast to the unbound spec we currently pass in.

Give me a thumbs up if that's OK for you. I'll also open a discussion in the dev channel to get some more opinions.
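The difference between the two semantics can be illustrated with a small sketch. All types here are simplified stand-ins (not the real iceberg-rust API), and `rebind_by_name` is hypothetical; it mimics the Java-style lookup described above.

```rust
use std::collections::HashMap;

// Simplified stand-in for a schema: just id<->name lookups.
struct Schema {
    name_by_id: HashMap<i32, String>,
    id_by_name: HashMap<String, i32>,
}

impl Schema {
    fn new(fields: &[(i32, &str)]) -> Self {
        Self {
            name_by_id: fields.iter().map(|(i, n)| (*i, n.to_string())).collect(),
            id_by_name: fields.iter().map(|(i, n)| (n.to_string(), *i)).collect(),
        }
    }
}

/// Java-style: look the source field up by name in the schema the spec was
/// bound against, then resolve that name in the new schema.
fn rebind_by_name(source_id: i32, bound: &Schema, new: &Schema) -> Option<i32> {
    let name = bound.name_by_id.get(&source_id)?;
    new.id_by_name.get(name).copied()
}

fn main() {
    let old = Schema::new(&[(1, "a"), (2, "b")]);
    // "a" was dropped and re-added: same name, new id 4.
    let new = Schema::new(&[(2, "b"), (4, "a")]);

    // Name-based rebinding silently follows the *new* column named "a" ...
    assert_eq!(rebind_by_name(1, &old, &new), Some(4));
    // ... while id-based binding would reject source_id 1 (no longer present).
    assert!(!new.name_by_id.contains_key(&1));
}
```

This is exactly the hazard mentioned above: with name matching, a spec can end up bound to a semantically different column that merely shares the old name.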

@c-thiel (Collaborator, Author) commented Sep 9, 2024

My second point from the opening statement: How do we handle new_last_column_id

I've taken a look at the comments of these two PRs: apache/iceberg#6701 apache/iceberg#7445

I think the reasonable behavior is that last_column_id is optional, and we calculate it from the highest field id when missing. Allowing the user to pass last_column_id should still be supported to stay compatible with the current behavior, but that is a feature which could be added later.

I don't think we should add the argument, to be honest. My reasoning is as follows:
If specified, it could be a way to artificially increase last_assigned_field_id of a TableMetadata. I can't see any motivation behind that. It just adds a source of confusion about what to specify here - and what to do if it's wrong.
The only useful thing to do with it is to check for outdated TableMetadata at the time of constructing the schema.
I added this check here:
https://github.com/apache/iceberg-rust/pull/587/files#diff-04f26c83b3c614be6f6d6cfb6c4cefef9e01ec2d31395ac487cdcdff2dbae729R442-R451
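The linked check can be sketched roughly like this. This is a hypothetical simplified form (`check_last_column_id` is not the real function name; the actual check lives in the builder):

```rust
/// Reject a user-provided last_column_id that is lower than the highest
/// field id actually in use: the caller's TableMetadata may be outdated.
fn check_last_column_id(provided: i32, highest_field_id: i32) -> Result<i32, String> {
    if provided < highest_field_id {
        Err(format!(
            "last_column_id {provided} is lower than the highest assigned \
             field id {highest_field_id}; the provided metadata may be outdated"
        ))
    } else {
        Ok(provided)
    }
}

fn main() {
    // Equal or higher than the highest field id is fine ...
    assert!(check_last_column_id(10, 7).is_ok());
    // ... lower means the caller computed it from stale metadata.
    assert!(check_last_column_id(5, 7).is_err());
}
```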

Maybe @nastra or @Fokko could add some comments on the intention of that parameter?

@Xuanwo (Member) commented Sep 9, 2024

I have reviewed most of the PRs, which I am confident can be merged. The only one left is #615, for which I need more input.

@c-thiel
Copy link
Collaborator Author

c-thiel commented Nov 7, 2024

@Xuanwo, @liurenjie1024 this PR is ready for another round of review. It's now rebased on the 6 PRs we merged during the last months. The core logic is ~1100 lines of code, including quite a few comments.
It's not very straightforward to split it up further, as even the new function already requires a lot of the overall logic.

@Fokko (Contributor) left a comment:

@c-thiel I left a few comments and suggestions for improvement, LMKWYT. Apart from that it looks great and is good to go 👍

crates/iceberg/src/spec/table_metadata_builder.rs - several review threads (outdated, resolved)
return Ok(self);
}

// ToDo Discuss: Java builds a fresh spec here:
Contributor:

We just do this once, after that the schema is evolved using the UpdateSchema, that's implemented by the SchemaUpdate. There you pass in the last-column-id, and new fields will increment from that.

Collaborator Author:

That is actually a very important point, and I think we should fix Rust.
Currently in Rust we use the SchemaBuilder for everything.
To get to it, we take an existing Schema and use the into_builder(self) method.

I believe we should change this to into_builder(self, last_column_id: Option<i32>) so that users are forced to think about last_column_id. Most likely they will want to use the last_column_id of the metadata they got the Schema from.


@Fokko (Contributor) Nov 11, 2024:

I'm leaning toward the other way, where the last-field-id is tracked by iceberg-rust itself and not exposed to the user. The last-column-id is a monotonically increasing ID that keeps track of which field IDs are already being used.

Collaborator Author:

If we do it completely internally, then we should probably not allow, or at least discourage, into_builder on Schema.

I am unsure what an alternative API should look like. @liurenjie1024, some input would be very helpful.
My thoughts:

  1. Introduce TableMetadata.update_schema, which returns the current SchemaBuilder, but with a fixed last-column-id (set to the TableMetadata value). We document that add_schema should only be used for schemas modified in this way, to ensure that last_column_id is correct.
  2. If documentation is not enough, we could introduce a new type SchemaUpdate that can only be created by calling TableMetadata.update_schema. It would hold the TableMetadataBuilder internally and have methods like add_field or remove_field. Upon build, it would return the TableMetadataBuilder.
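Option 2 could be sketched roughly like this. All types here are simplified stand-ins (not the real iceberg-rust API); the point is only the ownership flow: the guard type is the sole way to mutate the schema, so last_column_id never leaves the builder.

```rust
// Simplified stand-in for the builder: only the parts relevant to the sketch.
struct TableMetadataBuilder {
    last_column_id: i32,
    schema_fields: Vec<(i32, String)>,
}

// Guard type: can only be created via TableMetadataBuilder::update_schema.
struct SchemaUpdate {
    builder: TableMetadataBuilder,
}

impl TableMetadataBuilder {
    /// The only way to get a SchemaUpdate; last_column_id is fixed to the
    /// builder's value, so users never have to supply it.
    fn update_schema(self) -> SchemaUpdate {
        SchemaUpdate { builder: self }
    }
}

impl SchemaUpdate {
    /// Adding a field bumps the internally tracked last_column_id.
    fn add_field(mut self, name: &str) -> Self {
        self.builder.last_column_id += 1;
        let id = self.builder.last_column_id;
        self.builder.schema_fields.push((id, name.to_string()));
        self
    }

    /// On build, hand the mutated builder back.
    fn apply(self) -> TableMetadataBuilder {
        self.builder
    }
}

fn main() {
    let builder = TableMetadataBuilder { last_column_id: 2, schema_fields: vec![] };
    let builder = builder.update_schema().add_field("c").apply();
    assert_eq!(builder.last_column_id, 3);
    assert_eq!(builder.schema_fields, vec![(3, "c".to_string())]);
}
```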

Collaborator Author:

When we create a new table, we also reassign ids with this builder (and also in Lakekeeper).

I think we agree then that we should have a SchemaUpdate in iceberg-rust as well. The issue has already been created by Fokko. Because we don't have SchemaUpdate now, we should keep the into_builder() method of schemas for now.

REST catalogs will need to use add_schema anyway, even if the schema might be incompatible. This is why I need this method to stay public.

Going forward I propose:

  • Keep add_schema for now as there is no alternative
  • Implement Add SchemaUpdate logic to Iceberg-Rust #697, add corresponding better methods to the TableMetadataBuilder to mutate schema. Discourage the use of add_schema in comments, but leave the method public.

What do you think?

Contributor:

  1. I agree that we should keep add_schema for now.
  2. I agree that we should implement the issue proposed by @Fokko, which will be part of our table transaction API.
  3. I don't think we should discourage the use of add_schema, since they have different responsibilities. TableMetadataBuilder does some necessary checks to avoid broken table metadata, but it doesn't guarantee validity. We should be careful with our transaction API, which should guarantee to produce valid table metadata.

In fact, I'm thinking about making all TableMetadataBuilder methods crate-private. Java makes them public since Java's access modifiers are limited and don't support complex path visibility like Rust. But I think these methods should only be called by the transaction API or when applying a TableUpdate. What do you think?

Contributor:

When adding a word of caution to the method, I'm fine with keeping it public for now.

I know that this is quite late in the process here, but I think it might be complementary to the builder we're seeing here. For PyIceberg we've taken a different approach, where we apply the MetadataUpdates (AddSchema, SetCurrentSchema, etc.) to the metadata, rather than having a builder where you can modify the TableMetadata. Code can be found here: https://github.com/apache/iceberg-python/blob/60800d88c7e44fe2ed58a25f1b0fcd5927156adf/pyiceberg/table/update/__init__.py#L215-L494
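That apply-updates style can be sketched in Rust roughly as follows. The types here are simplified stand-ins: the real TableUpdate enum in iceberg-rust and the PyIceberg apply functions carry far more variants and validation.

```rust
// Simplified stand-in for table metadata: only schema bookkeeping.
#[derive(Default)]
struct TableMetadata {
    schema_ids: Vec<i32>,
    current_schema_id: i32,
}

// Simplified stand-in for the update enum (the real one has many variants).
enum TableUpdate {
    AddSchema { schema_id: i32 },
    SetCurrentSchema { schema_id: i32 },
}

/// Apply one update to the metadata, validating as we go, and return the
/// new metadata instead of mutating through builder methods.
fn apply(mut metadata: TableMetadata, update: TableUpdate) -> Result<TableMetadata, String> {
    match update {
        TableUpdate::AddSchema { schema_id } => {
            metadata.schema_ids.push(schema_id);
            Ok(metadata)
        }
        TableUpdate::SetCurrentSchema { schema_id } => {
            if !metadata.schema_ids.contains(&schema_id) {
                return Err(format!("schema {schema_id} not found"));
            }
            metadata.current_schema_id = schema_id;
            Ok(metadata)
        }
    }
}

fn main() {
    let metadata = TableMetadata::default();
    let metadata = apply(metadata, TableUpdate::AddSchema { schema_id: 1 }).unwrap();
    let metadata = apply(metadata, TableUpdate::SetCurrentSchema { schema_id: 1 }).unwrap();
    assert_eq!(metadata.current_schema_id, 1);
}
```

In this style the update list doubles as the change log, whereas the builder style has to record changes as a side effect of each method call.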

Collaborator Author:

The TableUpdate is currently only part of the REST catalog - not of the core spec.
I think we should keep them public unless we switch to the Python approach.

I added a word of warning that we do not check compatibility. Let me know if this is sufficient.
We should extend it with a link to the new method that is implemented in #697.

Contributor:

The TableUpdate is currently only part of the rest catalog - not of the core spec.

Just to mention that it's part of the core crate in Rust:

pub enum TableUpdate { ... }

I think we should keep them public unless we go switch to the python approach.

In fact, the original design was following the Java/Python approach, see

pub struct Transaction<'a> { ... }

I'm hesitant to make it public, as making it crate-private in the future would be a breaking change. This method would be a foundation for implementing the update-schema transaction API.

Fokko added a commit to Fokko/iceberg that referenced this pull request Nov 11, 2024
Okay, I've added this to the spec a while ago:

apache#7445

But I think this was a mistake, and we should not expose this
to the public APIs, as it is much better to track this internally.

I noticed this while reviewing apache/iceberg-rust#587

Removing this from the APIs in Java, and the OpenAPI update,
makes it much more resilient and doesn't require the clients
to compute this value. For example, when there are two conflicting
schema changes, the last-column-id must be recomputed correctly when doing
the retry operation.
@liurenjie1024 (Contributor) left a comment:

Thanks @c-thiel for this great PR, generally LGTM! I've left some comments on things to improve, but overall it looks great! Last but not least, I would highly suggest splitting this into smaller PRs so that we can review it more carefully. As you can see, we have some discussions around it which don't seem to have a clear answer.

Comment on lines +191 to +205
pub fn current_schema(&self) -> &SchemaRef {
self.metadata.current_schema()
}

/// Get the current last column id
#[inline]
pub fn last_column_id(&self) -> i32 {
self.metadata.last_column_id
}

/// Get the current last updated timestamp
#[inline]
pub fn last_updated_ms(&self) -> i64 {
self.metadata.last_updated_ms
}
Contributor:

As this is a TableMetadataBuilder, I don't think we should expose these functions as public APIs.

@c-thiel (Collaborator, Author) Nov 15, 2024:

My motivation here was actually add_schema.
It's very hard to build a compatible schema if there is no way to determine the current schema. We could argue, however, that users should just track the schema externally.

I am happy with making them private; I just wanted to give the reason why I added them.
If you still think private is better, let me know and I'll remove them.

Contributor:

I think we should make them private. Users are not supposed to call them directly, and are expected to go through the transaction API; see the Java example here: https://github.com/apache/iceberg/blob/9aeea63f8b74a6e0a468baaf7c705c19df0ebfb7/core/src/main/java/org/apache/iceberg/SchemaUpdate.java#L48

I think pyiceberg follows a similar approach.

crates/iceberg/src/spec/table_metadata_builder.rs - several review threads (outdated, resolved)
self.metadata.snapshot_log.clear();
}

if self.metadata.refs.remove(ref_name).is_some() || ref_name == MAIN_BRANCH {
Contributor:

Suggested change:
- if self.metadata.refs.remove(ref_name).is_some() || ref_name == MAIN_BRANCH {
+ if self.metadata.refs.remove(ref_name).is_some() {

Though it's correct, I think we should not add this check?

@c-thiel (Collaborator, Author) Nov 15, 2024:

I think we should keep it actually.

A few lines above we have if ref_name == MAIN_BRANCH {}. Thus, if ref_name is main, then we mutate the internal state of the builder - even if the branch does not exist. Because we mutate the state of the builder, we should record the change.

So I think we have two choices:

  1. Leave the || ref_name == MAIN_BRANCH
  2. Only clear snapshot log & set last snapshot to None if main branch actually exists. To do this we modify the if above and do the change you proposed.

@Fokko I guess it's a nit, but Java has the same problem:
If main does not exist, the internal state of the builder is mutated but the change is not recorded in changes:
https://github.com/apache/iceberg/blob/daa24f9c3a56e18d188097deb2dd79cc991c9a78/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1269-L1271

Contributor:

I agree that if ref_name == MAIN_BRANCH, we have changed it. My assumption is that we always have MAIN_BRANCH since it's the default branch? Please correct me if I'm wrong.

Contributor:

Let me check on the Java side, thanks for the pointer 👍

crates/iceberg/src/spec/table_metadata_builder.rs - several review threads (outdated, resolved)
@liurenjie1024 (Contributor) left a comment:

Finished the second round of review. There are some files I haven't checked yet; they need another review.

Comment on lines +192 to +206
pub fn current_schema(&self) -> &SchemaRef {
self.metadata.current_schema()
}

/// Get the current last column id
#[inline]
pub fn last_column_id(&self) -> i32 {
self.metadata.last_column_id
}

/// Get the current last updated timestamp
#[inline]
pub fn last_updated_ms(&self) -> i64 {
self.metadata.last_updated_ms
}
Contributor:

I think we should make all three of these methods private.

Collaborator Author:

See my comment here: #587 (comment)

/// # Errors
/// - The ref is unknown.
/// - Any of the preconditions of `self.add_snapshot` are not met.
pub fn append_snapshot(self, snapshot: Snapshot, ref_name: Option<&str>) -> Result<Self> {
Contributor:

  1. In fact, that method is a wrong design in my view. The implementation lacks some necessary checks, and it's inconsistent with other parts; e.g. TableMetadata is designed to be immutable, and all modifications should go through TableMetadataBuilder. I think we should deprecate that API and remove it in the future.
  2. Yes, but it's weird when it has a branch argument. The reason I suggest this is that I want to keep the API consistent with the Java implementation.
  3. Similar to 2, I want to keep the API name consistent with Java. And you are right, the name could be either branch or tag. I'm OK with the argument name, but I prefer to keep the function name consistent with the Java API.


/// Append a snapshot to the specified branch.
/// If branch is not specified, the snapshot is appended to the main branch.
/// The `ref` must already exist. Retention settings from the `ref` are re-used.
Contributor:

I think in Iceberg's terms a tag is a special branch, while ref typically means SnapshotRef, which includes more things like retention time, etc. I've updated it to "branch or tag", what do you think?

crates/iceberg/src/spec/table_metadata_builder.rs - several review threads (outdated, resolved)


@c-thiel (Collaborator, Author) commented Nov 15, 2024

@liurenjie1024 for set_branch_snapshot I was missing an easy way to determine the type of a ref.
I introduced an enum ReferenceType, but if you don't like that, I can also just implement is_branch for SnapshotReference.

@c-thiel (Collaborator, Author) commented Nov 16, 2024

@liurenjie1024 for set_branch_snapshot I was missing an easy way to determine the type of a ref. I introduced an enum ReferenceType, but if you don't like that, I can also just implement is_branch for SnapshotReference.

Slept over it and ReferenceType felt wrong - removed it.
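The is_branch helper mentioned above could look roughly like this. The types are simplified stand-ins for SnapshotReference and its retention settings, not the real iceberg-rust definitions:

```rust
// Simplified stand-in for snapshot retention: a ref is either a branch or a tag.
enum SnapshotRetention {
    Branch { min_snapshots_to_keep: Option<i32> },
    Tag { max_ref_age_ms: Option<i64> },
}

// Simplified stand-in for SnapshotReference.
struct SnapshotReference {
    snapshot_id: i64,
    retention: SnapshotRetention,
}

impl SnapshotReference {
    /// A predicate instead of a separate ReferenceType enum.
    fn is_branch(&self) -> bool {
        matches!(self.retention, SnapshotRetention::Branch { .. })
    }
}

fn main() {
    let branch = SnapshotReference {
        snapshot_id: 1,
        retention: SnapshotRetention::Branch { min_snapshots_to_keep: None },
    };
    let tag = SnapshotReference {
        snapshot_id: 2,
        retention: SnapshotRetention::Tag { max_ref_age_ms: None },
    };
    assert!(branch.is_branch());
    assert!(!tag.is_branch());
    let _ = (branch.snapshot_id, tag.snapshot_id);
}
```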

/// Remove snapshots by its ids from the table metadata.
/// Does nothing if a snapshot id is not present.
/// Keeps as changes only the snapshots that were actually removed.
pub fn remove_snapshots(mut self, snapshot_ids: &[i64]) -> Self {
Collaborator Author:

Currently, when removing snapshots, we might leave parent_snapshot_ids of other snapshots pointing to non-existing snapshots. This follows the behavior in Java.

Is this desirable? Or would it be more correct to set them to None, as there is no parent available anymore?
@Fokko
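The alternative raised in this question could look roughly like this (simplified stand-in types, not the real iceberg-rust API; the actual remove_snapshots, per the thread, leaves dangling parent ids like Java does):

```rust
use std::collections::HashSet;

// Simplified stand-in for a snapshot: only the lineage fields.
#[derive(Debug, PartialEq)]
struct Snapshot {
    snapshot_id: i64,
    parent_snapshot_id: Option<i64>,
}

/// Remove the given snapshot ids and, unlike the Java behavior, clear
/// parent_snapshot_id on surviving snapshots whose parent was removed.
fn remove_snapshots(snapshots: &mut Vec<Snapshot>, remove: &[i64]) {
    let remove: HashSet<i64> = remove.iter().copied().collect();
    snapshots.retain(|s| !remove.contains(&s.snapshot_id));
    for s in snapshots.iter_mut() {
        if s.parent_snapshot_id.map_or(false, |p| remove.contains(&p)) {
            s.parent_snapshot_id = None; // parent no longer exists
        }
    }
}

fn main() {
    let mut snaps = vec![
        Snapshot { snapshot_id: 1, parent_snapshot_id: None },
        Snapshot { snapshot_id: 2, parent_snapshot_id: Some(1) },
    ];
    remove_snapshots(&mut snaps, &[1]);
    // The surviving snapshot no longer points at a removed parent.
    assert_eq!(snaps, vec![Snapshot { snapshot_id: 2, parent_snapshot_id: None }]);
}
```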

@Fokko mentioned this pull request Nov 18, 2024
/// # Errors
/// - The ref is unknown.
/// - Any of the preconditions of `self.add_snapshot` are not met.
pub fn append_snapshot(self, snapshot: Snapshot, ref_name: Option<&str>) -> Result<Self> {
Contributor:

Sorry for the unclear wording; I meant deprecating the append_snapshot method on TableMetadata and planning to remove it in a future release. I think adding the #[deprecated] attribute would be enough, since the compiler will warn about it. Also, I think we should remove this method from TableMetadataBuilder?

@@ -912,14 +950,14 @@ mod tests {
#[test]
fn test_check_last_assigned_partition_id() {
let metadata = metadata();

println!("{:?}", metadata.last_partition_id);
Contributor:

Remove this debug statement?

@@ -376,6 +375,24 @@ impl Schema {
pub fn accessor_by_field_id(&self, field_id: i32) -> Option<Arc<StructAccessor>> {
self.field_id_to_accessor.get(&field_id).cloned()
}

/// Check if this schema is identical to another schema semantically - excluding schema id.
pub(crate) fn is_same_schema(&self, other: &SchemaRef) -> bool {
Contributor:

Why not implement the PartialEq and Eq traits?
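The doc comment above says the comparison is semantic and excludes the schema id, which a derived PartialEq would include. A minimal sketch of that distinction, with simplified stand-in types (not the real iceberg-rust Schema):

```rust
// Simplified stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
}

// Simplified stand-in for a schema: a derived PartialEq would also
// compare schema_id, which is exactly what is_same_schema must ignore.
struct Schema {
    schema_id: i32,
    fields: Vec<Field>,
}

impl Schema {
    /// Semantically identical: same fields; schema_id is ignored.
    fn is_same_schema(&self, other: &Schema) -> bool {
        self.fields == other.fields
    }
}

fn main() {
    let fields = vec![Field { id: 1, name: "a".into() }];
    let a = Schema { schema_id: 0, fields: fields.clone() };
    let b = Schema { schema_id: 5, fields };
    // Same schema despite different schema ids.
    assert!(a.is_same_schema(&b));
    let _ = (a.schema_id, b.schema_id);
}
```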

}
}

impl From<TableMetadataBuildResult> for TableMetadata {
Contributor:

Why do we need this? The metadata field is already exposed as a public field.
