
[Feature] support inline session python submission method #1062

Open
dkruh1 opened this issue Jul 17, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

dkruh1 commented Jul 17, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-spark functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Summary:
Introduce an option to run Python models within an existing session, similar to the session option available for SQL models.

Description:
Currently, users must choose between an all-purpose cluster or a job cluster to run Python models (see docs). This requirement limits the ability to execute dbt models inline within an existing notebook, forcing model execution to be triggered outside of Databricks.

In contrast, SQL models in dbt can leverage the session connection method, allowing them to be executed as part of an existing session. This separation of model logic from job cluster definitions enables orchestration systems to define clusters based on different considerations.
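For reference, this is roughly what the existing `session` connection method looks like for SQL models in a dbt-spark profile. Field values here are illustrative placeholders, not a working configuration:

```yaml
# profiles.yml -- sketch of the existing dbt-spark `session` method,
# which runs SQL models against the already-active Spark session.
my_profile:
  target: dev
  outputs:
    dev:
      type: spark
      method: session        # connect to the current SparkSession
      schema: analytics      # illustrative schema name
      host: NA               # unused by the session method; placeholder
```

Nothing in the profile references a cluster: the session method inherits whatever cluster the surrounding process (e.g. a notebook or job) is already running on.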

Request:
We propose introducing a similar session option for Python models. This feature would allow users to submit Python models to be executed within a given session, thereby decoupling model definitions from job cluster specifications.

Describe alternatives you've considered

For job clusters, there isn't a viable alternative that leverages the same Databricks API and costs. A possible, but problematic, option is to create an all-purpose cluster, provide the model with its cluster ID, and destroy the cluster after use. However, this approach is significantly more expensive (due to the cost difference between all-purpose clusters and job clusters) and disrupts the existing architecture that uses the session method to execute models within a job cluster.
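The workaround above can be sketched against the public Databricks Clusters API (`/api/2.0/clusters/create` and `/api/2.0/clusters/delete`). Host, token, and cluster sizing values below are placeholders; the requests are built but intentionally not sent:

```python
# Sketch of the costly workaround: create a temporary all-purpose cluster,
# run dbt against its cluster_id, then tear it down. Host/token/sizing are
# illustrative placeholders only.
import json
import urllib.request

DATABRICKS_HOST = "https://example.cloud.databricks.com"  # placeholder
TOKEN = "dapi-example"                                    # placeholder PAT


def create_cluster_payload(name: str, workers: int = 2) -> dict:
    # Minimal all-purpose cluster spec; real specs carry many more fields.
    return {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": workers,
    }


def api_request(path: str, payload: dict) -> urllib.request.Request:
    # Build (but do not send) an authenticated POST to the Clusters API.
    return urllib.request.Request(
        f"{DATABRICKS_HOST}{path}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {TOKEN}"},
        method="POST",
    )


# Lifecycle: create -> run dbt pointing at the returned cluster_id -> delete.
create_req = api_request("/api/2.0/clusters/create",
                         create_cluster_payload("dbt-tmp"))
# ... run `dbt run` with the model configured to use the returned id ...
delete_req = api_request("/api/2.0/clusters/delete",
                         {"cluster_id": "<returned-id>"})
```

Even as a sketch, this shows the drawback: the orchestrator must now manage cluster lifecycle and pay all-purpose rates, which is exactly what a session submission method would avoid.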

Who will this benefit?

All dbt users who currently leverage the session method and are considering adopting dbt Python models will benefit from this feature. Additionally, users who rely on third-party tools to derive job cluster specifications (whether AI-driven or otherwise) will be able to decouple model logic from cluster configuration, allowing for greater flexibility and efficiency.

Are you interested in contributing this feature?

Yes - I'm preparing a pull request.

Anything else?

No response

@amychen1776
Contributor

@dkruh1 are you using the adapter with Databricks? If so, is there a reason why you're not using the dbt-databricks adapter?
