feat(datasets): Created `table_args` to pass to `create_table`, `create_view`, and `table` methods #909
base: main
Conversation
…o avoid breaking changes
Signed-off-by: Mark Druffel <[email protected]>
Force-pushed from 47331ff to ef3712e
Just leaving initial comments; happy to review later once it's ready.
```diff
 def save(self, data: ir.Table) -> None:
     if self._table_name is None:
         raise DatasetError("Must provide `table_name` for materialization.")

     writer = getattr(self.connection, f"create_{self._materialized}")
-    writer(self._table_name, data, **self._save_args)
+    writer(self._table_name, data, **self._table_args)
```
Is this right? I think the table args should only apply to the `table` call, but haven't looked into it deeply before commenting now.
@deepyaman Sorry, this is a little confusing, so just adding a bit more context.

This PR
The `table` method takes the `database` argument, but the `create_table` & `create_view` methods both take the `database` and `overwrite` arguments. The `overwrite` argument is already in `save_args`, but I'm assuming `save_args` will be removed from `TableDataset` in version 6. To avoid breaking changes, but also to minimize churn between this release and version 6, I just added the new parameter (`database`) to `table_args` and left the old parameters alone.
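For reference, the three ibis calls in question can be exercised like this. This is a minimal runnable sketch using the DuckDB backend just for illustration; the connection, table, and column names are made up:

```python
import ibis

con = ibis.duckdb.connect()  # in-memory DuckDB, just for illustration
data = ibis.memtable({"artist": ["a", "b"], "plays": [1, 2]})

con.create_table("tracks", data, overwrite=True)  # takes database= & overwrite=
con.create_view("top_tracks", con.table("tracks"), overwrite=True)  # same kwargs
tracks = con.table("tracks")  # read side; takes database= but no overwrite=
print(tracks.count().execute())
```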
To avoid breaking changes but still allow `create_table` and `create_view` arguments to flow through, I combined `_save_args` and `_table_args` here.
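To be concrete, the behavior I have in mind is a plain dict merge where `table_args` wins on duplicate keys. A standalone sketch of that semantics (not the exact PR code):

```python
# Demo of the assumed merge semantics: table_args overrides save_args
# when the same key appears in both dicts.
save_args = {"overwrite": True}
table_args = {"database": "spotify.silver", "overwrite": False}
combined = {**save_args, **table_args}
print(combined)  # {'overwrite': False, 'database': 'spotify.silver'}
```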
Version 6
I am assuming that `save_args` & `load_args` will be dropped from `TableDataset` in version 6. In that change, I'd assume the arguments still used from `load_args` and `save_args` would be added to `table_args`. To make `TableDataset` and `FileDataset` look / feel similar, we could consider just making a commensurate `file_args`. I've not used 5.1 enough yet to say with certainty, but I can't think of a reason a user would want different values in `load_args` than in `save_args` now that it's split from `TableDataset` (i.e. the `filepath`, `file_type`, `sep`, etc. would be the same for load and save). I may be totally overlooking some things though 🤷♂️
```yaml
bronze_tracks:
  type: ibis.FileDataset  # use `to_<file_format>` (write) & `read_<file_format>` (read)
  connection:
    backend: pyspark
  file_args:
    filepath: hf://datasets/maharshipandya/spotify-tracks-dataset/dataset.csv
    file_format: csv
    materialized: view
    overwrite: True
    table_name: tracks  # `to_<file_format>` in ibis has no database parameter, so there's no ability to write to a specific catalog / db schema atm; `to_<file_format>` just writes to whatever is active
    sep: ","

silver_tracks:
  type: ibis.TableDataset  # would use `create_<materialized>` (write) & `table` (read)
  connection:
    backend: pyspark
  table_args:
    name: tracks
    database: spotify.silver
    overwrite: True
```
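If it helps, here's roughly how I'd expect the `silver_tracks` entry above to translate to direct usage. This is a hedged sketch: the `table_args` keyword is what this PR proposes, the `connection` dict follows the existing kedro-datasets ibis convention, and I've swapped in duckdb so the snippet stays self-contained:

```python
# Hypothetical direct-usage sketch; `table_args` is the parameter this PR
# proposes, and its keys mirror the YAML entry above.
from kedro_datasets.ibis import TableDataset

dataset = TableDataset(
    table_name="tracks",
    connection={"backend": "duckdb", "database": "spotify.db"},
    table_args={"database": "spotify.silver", "overwrite": True},
)
table = dataset.load()  # an ibis table expression, via con.table(...)
```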
Signed-off-by: Mark Druffel <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Mark Druffel <[email protected]>
Signed-off-by: Mark Druffel <[email protected]>
…ark-druffel/kedro-plugins into fix/datasets/ibis-TableDataset
@deepyaman I changed this to ready for review, but I'm failing a bunch of steps. I tried to follow the guidelines, but when I run the … Aside from the failing checks, I tested this version of `table_dataset.py` on a duckdb pipeline, a pyspark pipeline, and a pyspark pipeline on databricks, and it seems to be working. My only open question relates to my musing above about the expected format of …
@jakepenzak For visibility
Signed-off-by: Mark Druffel <[email protected]>
Sorry, I saw this yesterday and started drafting an apology. 🙈
I will review it later today. 🤞
> On Wed, Nov 13, 2024, 6:16 AM Merel Theisen wrote:
> @merelcht requested your review on: #909 feat(datasets): Created table_args to pass to create_table, create_view, and table methods.
No worries @deepyaman, really appreciate your help! Let me know what I can do to support; just trying to make sure the yaml changes I'm introducing make sense and figure out how to get through the PR process :) Regarding my issues with …: for testing, unfortunately I don't think the tests will work on my personal machine, because I'm on an old processor that doesn't support …
Description
Development notes
Checklist
RELEASE.md file