
[Feature] support inline session python submission method #1062

Open
dkruh1 opened this issue Jul 17, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

dkruh1 commented Jul 17, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-spark functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Summary:
Introduce an option to run Python models within an existing session, similar to the session option available for SQL models.

Description:
Currently, users must choose between an all-purpose cluster or a job cluster to run Python models (see docs). This requirement limits the ability to execute dbt models inline within an existing notebook, forcing model execution to be triggered outside of Databricks.

In contrast, SQL models in dbt can leverage the session connection method, allowing them to be executed as part of an existing session. This separation of model logic from job cluster definitions enables orchestration systems to define clusters based on different considerations.
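For reference, this is roughly what the existing `session` connection method looks like for SQL models in a dbt-spark profile. Field values here are illustrative placeholders, not a working configuration:

```yaml
# profiles.yml -- sketch of the existing dbt-spark `session` method,
# which runs SQL models against the already-active Spark session.
my_profile:
  target: dev
  outputs:
    dev:
      type: spark
      method: session        # connect to the current SparkSession
      schema: analytics      # illustrative schema name
      host: NA               # unused by the session method; placeholder
```

Nothing in the profile references a cluster: the session method inherits whatever cluster the surrounding process (e.g. a notebook or job) is already running on.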

Request:
We propose introducing a similar session option for Python models. This feature would allow users to submit Python models to be executed within a given session, thereby decoupling model definitions from job cluster specifications.

Describe alternatives you've considered

For job clusters, there isn't a viable alternative that leverages the same Databricks API and costs. A possible, but problematic, option is to create an all-purpose cluster, provide the model with its cluster ID, and destroy the cluster after use. However, this approach is significantly more expensive (due to the cost difference between all-purpose clusters and job clusters) and disrupts the existing architecture that uses the session method to execute models within a job cluster.
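The workaround above can be sketched against the public Databricks Clusters API (`/api/2.0/clusters/create` and `/api/2.0/clusters/delete`). Host, token, and cluster sizing values below are placeholders; the requests are built but intentionally not sent:

```python
# Sketch of the costly workaround: create a temporary all-purpose cluster,
# run dbt against its cluster_id, then tear it down. Host/token/sizing are
# illustrative placeholders only.
import json
import urllib.request

DATABRICKS_HOST = "https://example.cloud.databricks.com"  # placeholder
TOKEN = "dapi-example"                                    # placeholder PAT


def create_cluster_payload(name: str, workers: int = 2) -> dict:
    # Minimal all-purpose cluster spec; real specs carry many more fields.
    return {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": workers,
    }


def api_request(path: str, payload: dict) -> urllib.request.Request:
    # Build (but do not send) an authenticated POST to the Clusters API.
    return urllib.request.Request(
        f"{DATABRICKS_HOST}{path}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {TOKEN}"},
        method="POST",
    )


# Lifecycle: create -> run dbt pointing at the returned cluster_id -> delete.
create_req = api_request("/api/2.0/clusters/create",
                         create_cluster_payload("dbt-tmp"))
# ... run `dbt run` with the model configured to use the returned id ...
delete_req = api_request("/api/2.0/clusters/delete",
                         {"cluster_id": "<returned-id>"})
```

Even as a sketch, this shows the drawback: the orchestrator must now manage cluster lifecycle and pay all-purpose rates, which is exactly what a session submission method would avoid.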

Who will this benefit?

All dbt users who currently leverage the session method and are considering adopting dbt Python models will benefit from this feature. Additionally, users who rely on third-party tools to derive job cluster specifications (whether AI-driven or otherwise) will be able to decouple model logic from cluster configuration, allowing for greater flexibility and efficiency.

Are you interested in contributing this feature?

Yes - I'm preparing a pull request.

Anything else?

No response

@amychen1776
Contributor

@dkruh1 are you using the adapter with Databricks? If so, is there a reason why you're not using the dbt-databricks adapter?
