Support custom partitioning schemes based on "patterns" in the OPTIONS files parameter of CREATE FOREIGN TABLE #79

pdpark · 2024-08-14T21:11:41Z

What feature are you requesting?

The ability to specify a custom partitioning scheme through the use of a pattern in the files option when creating foreign tables, like this:

CREATE FOREIGN TABLE my_table ()
SERVER parquet_server
OPTIONS (files 's3://bucket/data_{id_1}_{id_2}.parquet')

Why are you requesting this feature?

To support an existing custom partitioning scheme.

What is your proposed implementation for this feature?

Foreign tables could be created like this:

CREATE FOREIGN TABLE my_table ()
SERVER parquet_server
OPTIONS (files 's3://bucket/data_{id_1}_{id_2}.parquet')

...or this:

CREATE FOREIGN TABLE my_table ()
SERVER parquet_server
OPTIONS (files 's3://bucket/data_{id_1}_*.parquet')

The names in curly braces must correspond to column names defined in the referenced Parquet files, or the statement will fail.
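The check described above could be sketched roughly as follows. This is only an illustration in Python, not the extension's code: `validate_pattern`, the `PLACEHOLDER` regex, and the way the column set is obtained are all hypothetical, and the sketch assumes placeholders are simple identifiers wrapped in curly braces.

```python
import re

# Hypothetical placeholder syntax: an identifier wrapped in curly braces, e.g. {id_1}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def validate_pattern(pattern: str, columns: set[str]) -> list[str]:
    """Return the placeholder names found in `pattern`.

    Raises ValueError if any placeholder does not name a known column.
    `columns` stands in for the column names read from the Parquet schema.
    """
    names = PLACEHOLDER.findall(pattern)
    unknown = [name for name in names if name not in columns]
    if unknown:
        raise ValueError(f"placeholders not found in Parquet schema: {unknown}")
    return names


# Assumed schema for the example files; not taken from a real dataset.
schema_columns = {"id_1", "id_2", "amount"}
print(validate_pattern("s3://bucket/data_{id_1}_{id_2}.parquet", schema_columns))
```

Running the check at CREATE FOREIGN TABLE time, as the request suggests, would surface a misspelled placeholder immediately instead of at query time.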

When running a query like this on the first table defined above:

select *
from my_table
where id_1 = '1234'
and id_2 = '0987'

...the id_1 and id_2 column values from the SQL WHERE clause will be substituted into the files pattern, producing a string that must correspond to an actual Parquet file at the specified S3 location:

s3://bucket/data_1234_0987.parquet
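A minimal sketch of that substitution step, in Python for illustration only. The function name `resolve_files_pattern` and the `predicates` mapping (column name to the literal from an equality predicate) are hypothetical. The fallback to `*` for a placeholder with no matching predicate is an assumption, not something the request specifies.

```python
import re

# Hypothetical placeholder syntax: an identifier wrapped in curly braces, e.g. {id_1}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_files_pattern(pattern: str, predicates: dict[str, str]) -> str:
    """Substitute equality-predicate values into a files pattern."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Assumption: a column with no equality predicate becomes a glob wildcard.
        return predicates.get(name, "*")

    return PLACEHOLDER.sub(substitute, pattern)


# Values taken from the WHERE clause of the example query.
print(resolve_files_pattern(
    "s3://bucket/data_{id_1}_{id_2}.parquet",
    {"id_1": "1234", "id_2": "0987"},
))
# prints: s3://bucket/data_1234_0987.parquet
```

Only equality predicates are handled here; how range or IN predicates should map onto a pattern is left open by the request.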

A query on the second table defined above:

select *
from my_table
where id_1 = '1234'

...will produce a files pattern after substitution that looks like this:

s3://bucket/data_1234_*.parquet
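To illustrate what such a resolved pattern would select, here is a small Python sketch that filters a list of object keys with the standard library's `fnmatch`. The key list and the `matching_files` helper are made up for the example, and `fnmatch` semantics (where `*` also matches `/`) may differ from how the underlying engine expands globs.

```python
from fnmatch import fnmatchcase


def matching_files(resolved_pattern: str, object_keys: list[str]) -> list[str]:
    """Return the keys that match the resolved glob pattern (case-sensitive)."""
    return [key for key in object_keys if fnmatchcase(key, resolved_pattern)]


# Hypothetical bucket listing, for illustration only.
keys = [
    "s3://bucket/data_1234_0987.parquet",
    "s3://bucket/data_1234_5555.parquet",
    "s3://bucket/data_9999_0987.parquet",
]

print(matching_files("s3://bucket/data_1234_*.parquet", keys))
# prints: ['s3://bucket/data_1234_0987.parquet', 's3://bucket/data_1234_5555.parquet']
```

With this scheme, only files whose id_1 segment is 1234 would be scanned, which is the pruning benefit the request is after.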

Full Name:

Patrick Park

Affiliation:

Payzer

Weijun-H · 2024-09-22T11:06:32Z

~~It seems that we could introduce hive_partitioning setting to fix this ticket.~~

philippemnoel · 2024-09-22T11:28:57Z

> ~~It seems that we could introduce hive_partitioning setting to fix this ticket.~~

@shamb0 has made a PR to document hive partitioned, we just need to review and merge it. As for custom partitioning scheme that is not Hive, ~~I'm not convinced we want to expose that as it is probably an edge case. Unless you have an idea that we haven't considered~~

EDIT: We're still open to considering this, but are waiting for more user requests

Weijun-H · 2024-09-22T12:50:26Z

> > ~~It seems that we could introduce hive_partitioning setting to fix this ticket.~~
>
> @shamb0 has made a PR to document hive partitioned, we just need to review and merge it. As for custom partitioning scheme that is not Hive, I'm not convinced we want to expose that as it is probably an edge case. Unless you have an idea that we haven't considered

I see, it makes sense to me.
Btw we should add an example in https://docs.paradedb.com/ingest/import/parquet#parquet-options for the hive partitioned

philippemnoel · 2024-09-22T13:07:05Z

> > > ~~It seems that we could introduce hive_partitioning setting to fix this ticket.~~
> >
> > @shamb0 has made a PR to document hive partitioned, we just need to review and merge it. As for custom partitioning scheme that is not Hive, I'm not convinced we want to expose that as it is probably an edge case. Unless you have an idea that we haven't considered
>
> I see, it makes sense to me. Btw we should add an example in https://docs.paradedb.com/ingest/import/parquet#parquet-options for the hive partitioned

Agreed

philippemnoel added the feature, good first issue, and priority-high labels on Aug 23, 2024

philippemnoel added the priority-low, question, and user-request labels and removed the good first issue and priority-high labels on Sep 22, 2024
