Description
In ETL pipelines, updating existing records in a data warehouse is a critical requirement. Currently, the ibis.TableDataset connector in Kedro does not support upsert into Ibis backends such as Postgres, which is the backend we use. After discussions in the community, I found that Ibis does not offer upsert() natively for any of its backends.
Context
Why is this change important to me?
We are developing ETL pipelines in our organization, and updating existing records in a SQL backend like Postgres is an essential requirement. At present, without support for upsert(), we must bypass the Kedro DataCatalog and rely on external ORM tools, such as SQLAlchemy, dataset, etc., to handle native data storage operations.
How would I use it?
Supporting upsert() in ibis.TableDataset would allow us to maintain a clean and consistent pipeline, avoiding the need for custom load operations within nodes. This would simplify the workflow and allow Kedro to manage the complete I/O process.
How can it benefit other users?
By enabling this feature, users could avoid writing custom loading logic for update operations, thereby keeping their pipelines cleaner and more efficient. This would enhance Kedro's usability in scenarios where heavy I/O operations are involved, particularly for teams working with data warehouses or similar storage backends.
Makes sense! Cleanly supporting this will definitely require ibis-project/ibis#5391 (I will see if this can be prioritized on the Ibis side in the coming quarter, as there have definitely been a number of requests for this functionality now); otherwise, we will need to do something hacky with raw_sql in the interim.
If somebody needs this functionality before Ibis implements it, we can probably find a way to hack it in with raw_sql, as mentioned above, or at least share instructions for creating a custom version of this dataset to do so.
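As a rough illustration of the raw_sql route mentioned above, here is a minimal sketch of a helper that builds a Postgres `INSERT ... ON CONFLICT` statement. The function name `build_upsert_sql`, the table names, and the staging-table approach are all hypothetical and are not part of Kedro or Ibis. The sketch assumes the new rows have already been written to a staging table and that the target table has a unique constraint on the key columns.

```python
from typing import Sequence


def build_upsert_sql(
    target: str,
    staging: str,
    columns: Sequence[str],
    key_columns: Sequence[str],
) -> str:
    """Build a Postgres upsert statement (hypothetical helper, not an Ibis API).

    Assumes `staging` holds the new rows and `target` has a unique
    constraint or primary key covering `key_columns`.
    """

    def quote(name: str) -> str:
        # Double-quote identifiers, escaping any embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    column_list = ", ".join(quote(c) for c in columns)
    key_list = ", ".join(quote(c) for c in key_columns)

    # Only non-key columns are overwritten when a conflict occurs.
    update_columns = [c for c in columns if c not in key_columns]
    if update_columns:
        assignments = ", ".join(
            f"{quote(c)} = EXCLUDED.{quote(c)}" for c in update_columns
        )
        action = f"DO UPDATE SET {assignments}"
    else:
        action = "DO NOTHING"

    return (
        f"INSERT INTO {quote(target)} ({column_list}) "
        f"SELECT {column_list} FROM {quote(staging)} "
        f"ON CONFLICT ({key_list}) {action}"
    )


sql = build_upsert_sql(
    target="customers",
    staging="customers_staging",
    columns=["id", "name"],
    key_columns=["id"],
)
print(sql)
```

The resulting string could then be passed to the backend connection's raw_sql method inside a custom dataset's save logic. How the staging table is created and how the transaction is committed depend on the Ibis version and backend, so those steps are left out here.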