feat: support create partition table for non REST catalog #577

FANNG1 · 2024-08-23T15:20:13Z

fixes: #578

liurenjie1024 · 2024-08-30T02:36:00Z

Hi @FANNG1, thanks for your contribution. The reason we use UnboundPartitionSpec rather than PartitionSpec is to simplify the usage of this method. PartitionSpec is bound to a schema, and its spec id and field ids are supposed to be meaningful. These ids are not supposed to be passed by the user, but assigned by the catalog implementation.
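
To illustrate the distinction, here is a minimal sketch of binding an unbound spec on the catalog side. The types `UnboundField`, `BoundField`, `BoundSpec` and the function `bind` are simplified stand-ins invented for this example, not the actual iceberg-rust types. The starting partition field id of 1000 follows the Iceberg table spec convention.

```rust
/// What a user supplies: no spec id and no partition field ids.
/// (Hypothetical simplified type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
pub struct UnboundField {
    pub source_id: i32,
    pub name: String,
    pub transform: String,
}

/// What ends up in table metadata: ids are filled in.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundField {
    pub source_id: i32,
    pub field_id: i32,
    pub name: String,
    pub transform: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BoundSpec {
    pub spec_id: i32,
    pub fields: Vec<BoundField>,
}

/// Catalog-side binding: checks each source column id against the schema
/// and assigns the spec id and the partition field ids.
pub fn bind(
    schema_field_ids: &[i32],
    unbound: &[UnboundField],
    spec_id: i32,
) -> Result<BoundSpec, String> {
    // Partition field ids conventionally start at 1000.
    let mut next_field_id = 1000;
    let mut fields = Vec::with_capacity(unbound.len());
    for f in unbound {
        if !schema_field_ids.contains(&f.source_id) {
            return Err(format!("source column id {} is not in the schema", f.source_id));
        }
        fields.push(BoundField {
            source_id: f.source_id,
            field_id: next_field_id,
            name: f.name.clone(),
            transform: f.transform.clone(),
        });
        next_field_id += 1;
    }
    Ok(BoundSpec { spec_id, fields })
}
```

The caller never chooses `spec_id` or `field_id` values for the fields it describes; both are decided where the table metadata is built.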

FANNG1 · 2024-08-30T10:36:54Z

I agree that PartitionSpec being bound to a schema is not user friendly, but building an UnboundPartitionSpec requires the source column id rather than the column name, which does not seem friendly either. How about using a separate TableRequestBuilder to build the TableCreation, in which the PartitionSpec is generated through a more user-friendly interface? WDYT?

pub struct TableCreation {
    /// The name of the table.
    pub name: String,
    /// The location of the table.
    #[builder(default, setter(strip_option))]
    pub location: Option<String>,
    /// The schema of the table.
    pub schema: Schema,
    /// The partition spec of the table, could be None.
    #[builder(default, setter(strip_option, into))]
    pub partition_spec: Option<PartitionSpec>,
    /// The sort order of the table.
    #[builder(default, setter(strip_option))]
    pub sort_order: Option<SortOrder>,
    /// The properties of the table.
    #[builder(default)]
    pub properties: HashMap<String, String>,
}
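
A rough sketch of what a name-based interface could look like is shown below. `NamedSpecBuilder`, `Transform`, and `UnboundField` are hypothetical names made up for this example and are not part of iceberg-rust; the schema is reduced to a map from column name to field id.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily reduced set of transforms.
#[derive(Debug, Clone, PartialEq)]
pub enum Transform {
    Identity,
    Bucket(u32),
    Day,
}

/// A partition field that still carries no partition field id.
#[derive(Debug, Clone, PartialEq)]
pub struct UnboundField {
    pub source_id: i32,
    pub name: String,
    pub transform: Transform,
}

/// Builder that lets the user refer to columns by name and resolves
/// them to source column ids using the schema.
pub struct NamedSpecBuilder<'a> {
    columns: &'a HashMap<String, i32>,
    fields: Vec<UnboundField>,
}

impl<'a> NamedSpecBuilder<'a> {
    pub fn new(columns: &'a HashMap<String, i32>) -> Self {
        Self { columns, fields: Vec::new() }
    }

    /// Adds a partition field sourced from `column`, named `partition_name`.
    pub fn add(
        mut self,
        column: &str,
        partition_name: &str,
        transform: Transform,
    ) -> Result<Self, String> {
        let source_id = *self
            .columns
            .get(column)
            .ok_or_else(|| format!("unknown column `{column}`"))?;
        self.fields.push(UnboundField {
            source_id,
            name: partition_name.to_string(),
            transform,
        });
        Ok(self)
    }

    pub fn build(self) -> Vec<UnboundField> {
        self.fields
    }
}
```

With a schema containing `ts` as field id 2, `NamedSpecBuilder::new(&columns).add("ts", "ts_day", Transform::Day)` resolves the source id without the user ever looking it up.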

liurenjie1024 · 2024-09-02T02:31:28Z

The reason we used UnboundPartitionSpec was that some fields, such as the spec id and partition field ids, should not be passed by the user when creating a table. But I think you are right: when a user wants to create the partition spec of a table, it is more natural to build it against a schema. This may be confusing to users because their spec id and partition field ids will be lost. There seems to be no type-safe way to enforce this, so we need to add some documentation to explain it.
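
The loss of ids mentioned above can be pictured with a small sketch. The types and the function `normalize_for_create` are hypothetical simplifications, and the policy shown (spec id 0, field ids from 1000) is an assumption for illustration, not a statement of what any particular catalog does.

```rust
/// Hypothetical simplified bound partition field.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionField {
    pub source_id: i32,
    pub field_id: i32,
    pub name: String,
}

/// Hypothetical simplified bound partition spec.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionSpec {
    pub spec_id: i32,
    pub fields: Vec<PartitionField>,
}

/// Assumed catalog-side behavior on table creation: keep the source
/// columns and names, but discard the user-provided ids and reassign them.
pub fn normalize_for_create(user_spec: &PartitionSpec) -> PartitionSpec {
    let fields = user_spec
        .fields
        .iter()
        .enumerate()
        .map(|(i, f)| PartitionField {
            source_id: f.source_id,
            field_id: 1000 + i as i32,
            name: f.name.clone(),
        })
        .collect();
    PartitionSpec { spec_id: 0, fields }
}
```

A user who built a spec with `spec_id: 7` and `field_id: 42` would get back a table whose spec uses different ids, which is the surprise the documentation would need to explain.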

liurenjie1024

Thanks @FANNG1 for this PR. However, I think there is some preparation work to do before we can actually finish this PR. If we can narrow down the goal of this PR to only changing the type from UnboundPartitionSpec to PartitionSpec, we can merge this.

liurenjie1024 · 2024-09-02T08:06:49Z

crates/iceberg/src/spec/table_metadata.rs

-                    "Can't create table with partition spec now",
-                ))
-            }
+            Some(p) => HashMap::from([(p.spec_id(), Arc::new(p))]),


This is incorrect; we can't simply add a spec. We should use the AddPartitionSpec transaction API.

FANNG1 · 2024-09-02

Sorry, I don't understand the AddPartitionSpec transaction API. Could you provide more context?

liurenjie1024 · 2024-09-04

You can refer to iceberg-python's implementation. The whole table creation process is treated as a transaction, similar to transactions.

liurenjie1024 · 2024-09-02T08:09:16Z

crates/iceberg/tests/partition_table_test.rs

+    }
+
+    #[tokio::test]
+    async fn test_partition_table() {


The tests directory holds the integration tests for the iceberg crate. I think creating a partitioned table should apply to all catalogs, rather than to the memory catalog only.

FANNG1 · 2024-09-02

Do you mean we should test against all catalogs, like Hive, REST, JDBC?

liurenjie1024 · 2024-09-04

Yes, I think it would be easier if we could finish #519 first.

liurenjie1024 · 2024-09-02T08:11:29Z

crates/examples/src/rest_catalog_table.rs

Some suggestions:

Please don't remove the ANCHOR comments; they are used by the website.

It would be better to add a separate example for creating a partitioned table rather than modifying the current one.

FANNG1 · 2024-09-02T10:55:03Z

@liurenjie1024, thanks for your review. Let's make sure we agree on what "change the type from UnboundPartitionSpec to PartitionSpec only" means. The PR consists of several parts:

1. Change UnboundPartitionSpec to PartitionSpec in TableCreation
2. Use PartitionSpec to build the table metadata
3. Integration tests for creating and showing a partitioned table
4. Examples for creating a partitioned table and reading data from the table

Do you mean keeping just the first part?

liurenjie1024 · 2024-09-04T09:26:24Z

Yes, I think keeping only part 1 is a good start. Adding a PartitionSpec is somewhat similar to a transaction with two actions:

1. Add partition spec
2. Set default partition spec

I think it's better to implement them in the transaction API.
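
The two actions above can be sketched as updates applied atomically to table metadata. This is a minimal illustration under stated assumptions: `TableUpdate`, `TableMetadata`, and `commit` here are simplified types written for this example, and they do not reproduce the iceberg-rust or iceberg-python transaction APIs.

```rust
use std::collections::HashMap;

/// Hypothetical simplified partition spec: just an id and source columns.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionSpec {
    pub spec_id: i32,
    pub source_ids: Vec<i32>,
}

/// The two actions from the discussion, modeled as metadata updates.
#[derive(Debug, Clone)]
pub enum TableUpdate {
    AddSpec { source_ids: Vec<i32> },
    SetDefaultSpec { spec_id: i32 },
}

#[derive(Debug, Default, Clone)]
pub struct TableMetadata {
    pub specs: HashMap<i32, PartitionSpec>,
    pub default_spec_id: Option<i32>,
}

impl TableMetadata {
    fn apply(&mut self, update: &TableUpdate) -> Result<(), String> {
        match update {
            TableUpdate::AddSpec { source_ids } => {
                // The spec id is assigned here, not supplied by the caller.
                let spec_id = self.specs.keys().max().map_or(0, |id| id + 1);
                self.specs.insert(
                    spec_id,
                    PartitionSpec { spec_id, source_ids: source_ids.clone() },
                );
                Ok(())
            }
            TableUpdate::SetDefaultSpec { spec_id } => {
                if self.specs.contains_key(spec_id) {
                    self.default_spec_id = Some(*spec_id);
                    Ok(())
                } else {
                    Err(format!("unknown partition spec id {spec_id}"))
                }
            }
        }
    }

    /// Applies all updates to a staged copy and returns it only if every
    /// update succeeds, so a failed transaction leaves `self` untouched.
    pub fn commit(&self, updates: &[TableUpdate]) -> Result<TableMetadata, String> {
        let mut staged = self.clone();
        for update in updates {
            staged.apply(update)?;
        }
        Ok(staged)
    }
}
```

Creating a partitioned table then amounts to committing `AddSpec` followed by `SetDefaultSpec` against empty metadata, instead of inserting the spec into the map directly.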

FANNG1 marked this pull request as draft August 23, 2024 15:20

FANNG1 added 2 commits August 23, 2024 23:46

support partition table

f56bb4f

xx

5309d25

FANNG1 force-pushed the partition branch from 022c953 to 5309d25 Compare August 24, 2024 04:06

FANNG1 added 3 commits August 24, 2024 15:03

xx

1580194

xx

44e9292

xx

62a6559

FANNG1 marked this pull request as ready for review August 24, 2024 09:00

FANNG1 changed the title ~~[SIP] support create partition table for non REST catalog~~ feat: support create partition table for non REST catalog Aug 24, 2024

liurenjie1024 reviewed Sep 2, 2024
