
Tracking: Ad-hoc(batch) ingestion #18583

Open
3 of 27 tasks
st1page opened this issue Sep 18, 2024 · 7 comments
st1page commented Sep 18, 2024

We will enhance the ad-hoc ingestion capability in subsequent releases, with the expectation that users will eventually be able to read data ad hoc as long as it is persisted on an external system.

Streaming storage

For streaming storage, predicate pushdown on the "offset" is required.

  • kafka
    • select from source
    • TVF
  • pulsar
    • select from source
    • TVF
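As a sketch of what these could look like (hypothetical syntax; the metadata column name `_rw_kafka_offset` and the TVF name `read_kafka` are assumptions, not a settled API):

```sql
-- Hypothetical: batch-scan an existing Kafka source, pushing the offset
-- predicate down so only the required offset range is fetched.
SELECT payload
FROM kafka_source
WHERE _rw_kafka_offset BETWEEN 1000 AND 2000;

-- Hypothetical TVF form, reading the topic directly without a pre-declared source.
SELECT *
FROM read_kafka('broker:9092', 'topic_name')
WHERE _rw_kafka_offset >= 1000;
```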

Lake

File source (object store)

  • select from source
  • TVF
    • only support S3 currently
  • optimization
    • column pruning
    • predicate pushdown
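A hedged sketch of the TVF path (the function name follows the DuckDB-style read_parquet discussed later in this thread; only S3 is supported currently):

```sql
-- Hypothetical: scan Parquet files on S3 via a TVF. Selecting two columns
-- and filtering gives the planner the chance to apply column pruning and
-- predicate pushdown against the underlying files.
SELECT col_a, col_b
FROM read_parquet('s3://bucket/path/*.parquet')
WHERE col_a > 100;
```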

Database

Currently we only support CREATE TABLE with a primary key on the CDC connector. To support ad-hoc ingestion, we need to design and introduce new syntax: CREATE SOURCE with a CDC connector. In that case, the source can only be queried ad hoc.

  • PG
  • MySQL
    • select from source
    • TVF
    • optimization
      • column pruning
      • range predicate pushdown
      • lookup
  • MongoDB
    • select from source
    • TVF
    • optimization
      • column pruning
      • range predicate pushdown
      • lookup
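For the database connectors, the TVF form might look like the following (hypothetical, modeled on DuckDB's postgres_query; the range-predicate pushdown and lookup optimizations would forward the filter to the remote database):

```sql
-- Hypothetical: run an ad-hoc query against an external PostgreSQL table,
-- pushing the range predicate down to the remote database.
SELECT *
FROM postgres_query(
    'host=pg_host dbname=mydb',
    'SELECT * FROM t WHERE id BETWEEN 1 AND 100'
);
```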
@github-actions github-actions bot added this to the release-2.1 milestone Sep 18, 2024
kwannoel (Contributor) commented:

Hi, I will help with this issue, starting with TVFs.


xxchan commented Sep 27, 2024

Have we reached consensus to support TVFs? To me, their use cases duplicate those of Sources, so they seem unnecessary.

I'd like to see the rationale and examples where they are more useful than sources before adding them.


st1page commented Sep 27, 2024

> Have we reached consensus to support TVFs? To me, their use cases are duplicated with Sources, so they seem to be unnecessary.
>
> I’d like to see rationales and examples where they are more useful than sources before adding them


xxchan commented Sep 27, 2024

Thanks for the explanation!

> Currently we only support the CDC table and cannot create a source on an external database's table.

This makes me wonder whether this is also related to other shared sources, e.g., Kafka?

We can refer to DuckDB's grammar for these cases.

Compared with DuckDB:

  • They don't have sources at all, so it might be a little different.
  • Their syntax contains an ATTACH, which looks like the CREATE CONNECTION we might have in the future. So maybe we should design that first.
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES);
SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM cars LIMIT 3');


st1page commented Sep 27, 2024

> Currently we only support the CDC table and cannot create a source on an external database's table.
>
> This makes me wonder whether this is also related to other shared sources, e.g., Kafka?

The issue is not related to "shared"; it is because the CDC source contains multiple tables' changes. Actually, that is a "CONNECTION".


st1page commented Sep 27, 2024

> Compared with DuckDB:
>
>   • They don't have sources at all, so it might be a little different.
>   • Their syntax contains an ATTACH, which looks like the CREATE CONNECTION we might have in the future. So maybe we should design that first.
>
> ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE POSTGRES);
> SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM cars LIMIT 3');

Agreed. cc @chenzl25, do we have a plan to simplify the TVF syntax with connections?


chenzl25 commented Sep 27, 2024

After connections are supported, in my mind a connection can be used in a TVF directly, like:

  • read_parquet(s3_connection, 's3://bucket/path/xxxx.parquet')
  • read_csv(s3_connection, 's3://bucket/path/xxxx.csv')
  • read_json(s3_connection, 's3://bucket/path/xxxx.json')
  • iceberg_scan(iceberg_connection, 'database_name.table_name')
  • postgres_query(pg_connection, 'select * from t')
  • mysql_query(my_connection, 'select * from t')

Connections contain the necessary information to allow a TVF to query the external system.
I think @tabVersion will add Connection support this quarter.
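Putting the two pieces together, a connection-based flow might look like this (hypothetical syntax, assuming a future CREATE CONNECTION statement; all names and parameters are placeholders):

```sql
-- Hypothetical: declare the connection once, then reuse it in TVFs.
CREATE CONNECTION pg_connection WITH (
    type = 'postgres',
    host = 'localhost',
    port = '5432',
    database = 'mydb'
);

SELECT * FROM postgres_query(pg_connection, 'SELECT * FROM t LIMIT 10');
```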
