New dataset databricks.ExternalTableDataset #349

Closed
KrzysztofDoboszInpost opened this issue Sep 26, 2023 · 4 comments
Labels
Community Issue/PR opened by the open-source community

Comments

@KrzysztofDoboszInpost

Description

The existing databricks.ManagedTableDataset doesn't allow specifying the location of the stored files, which is crucial in some setups. PR #251 already addresses this, but it seems to be stale.

Context

I develop a number of Kedro projects that are deployed to Databricks. A single dataset that handles both pandas and Spark DataFrames, and can write to (and read from) a Databricks database, would be a lifesaver, as long as I could specify the path.

Possible Implementation

In Spark, adding the path option is enough to make a table external. I'm not sure it would be as simple here, though.
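
For illustration, here is a minimal sketch of the Spark behaviour described above, assuming `df` is a `pyspark.sql.DataFrame`. The helper name `save_table` and its parameters are hypothetical and not part of kedro-datasets. It only shows that setting the `path` option before `saveAsTable` is what separates an external table from a managed one.

```python
from typing import Any, Optional


def save_table(
    df: Any,
    table: str,
    path: Optional[str] = None,
    mode: str = "overwrite",
    fmt: str = "delta",
) -> None:
    """Write a Spark DataFrame as a managed or an external table.

    `df` is expected to be a pyspark.sql.DataFrame. The function name and
    its parameters are hypothetical, used only for this illustration.
    """
    writer = df.write.format(fmt).mode(mode)
    if path is not None:
        # With an explicit path, Spark registers the table as external:
        # the files are stored at `path` instead of the metastore's
        # default warehouse location.
        writer = writer.option("path", path)
    # Without the path option this creates a regular managed table.
    writer.saveAsTable(table)
```

On a cluster this might be called as `save_table(spark_df, "my_schema.my_table", path="/mnt/some/location")`, where the table name and location are placeholders.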

Possible Alternatives

Adding an argument to ManagedTableDataset is also an option, but then the table wouldn't really be managed, which might cause some confusion.

@astrojuanlu
Member

Hi @KrzysztofDoboszInpost, thanks for opening this issue. Do you want to take over #251? Checking out the branch and opening a new PR should suffice.

@KrzysztofDoboszInpost
Author

Sure, as soon as I'm able to :)
In the meantime: would you rather create a separate ExternalTableDataset that shares a lot of code with ManagedTableDataset (possibly via inheritance), or just add an option to set the path (as in the current PR) and risk a little confusion among Databricks users?

@astrojuanlu
Member

This deserves some investigation indeed :) Let's continue the discussion here until we're clear on the path forward. I'll add this to our backlog.

@astrojuanlu astrojuanlu added the Community Issue/PR opened by the open-source community label Sep 29, 2023
@merelcht
Member

merelcht commented Nov 1, 2024

An experimental databricks.ExternalTableDataset was added in #885

@merelcht merelcht closed this as completed Nov 1, 2024

3 participants