

databricks workflow failing with 'too many 503 error responses' #892

Open
mkjain1982 opened this issue Dec 27, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@mkjain1982

Describe the bug

A workflow dbt job terminates with the error Max retries exceeded with url: ... (Caused by ResponseError('too many 503 error responses')) just as it starts sending SQL commands to the cluster.
No changes have been made to the code or YAML files.
This occurs only with a Pro SQL Warehouse, not with a Serverless SQL Warehouse.

Steps To Reproduce

The problem occurs in a workflow in a Databricks workspace with the following settings:
Running on Azure, Databricks Premium, no Unity Catalog
Job cluster: single node, Standard_DS3_v2
Warehouse: SQL Warehouse Pro X-Small, Cluster count: Active 0, Min 1, Max 1, Channel: Current, Cost optimized
Git source: Azure DevOps
Library setting: dbt-databricks>=1.0.0,<2.0.0

Start the workflow. After the job cluster has been created and the SQL Warehouse has been started, the following error is shown in the log:

  • dbt deps --profiles-dir ../misc/misc/ -t prod
    10:14:06 Running with dbt=1.9.1
    10:14:07 Updating lock file in file path: /tmp/tmp-dbt-run-126728701395951/piab/dbt/package-lock.yml
    10:14:07 Installing calogica/dbt_expectations
    10:14:07 Installed from version 0.10.4
    10:14:07 Up to date!
    10:14:07 Installing dbt-labs/dbt_utils
    10:14:08 Installed from version 1.1.1
    10:14:08 Updated version available: 1.3.0
    10:14:08 Installing calogica/dbt_date
    10:14:08 Installed from version 0.10.1
    10:14:08 Up to date!
    10:14:08
    10:14:08 Updates available for packages: ['dbt-labs/dbt_utils']
    Update your versions in packages.yml, then run dbt deps

  • dbt build --profiles-dir ../misc/misc/ -t prod -f
    10:14:10 Running with dbt=1.9.1
    10:14:11 Registered adapter: databricks=1.9.1
    10:14:12 Unable to do partial parsing because saved manifest not found. Starting full parse.

    10:14:31 Found 435 models, 103 snapshots, 1 analysis, 8 seeds, 1559 data tests, 123 sources, 8 exposures, 999 macros
    10:14:32
    10:14:32 Concurrency: 12 threads (target='prod')
    10:14:32
    10:14:58
    10:14:58 Finished running in 0 hours 0 minutes and 26.49 seconds (26.49s).
    10:14:58 Encountered an error:
    Database Error
    HTTPSConnectionPool(host='adb-130132662866554.14.azuredatabricks.net', port=443): Max retries exceeded with url: /sql/1.0/warehouses/660a2880f1cab4fb (Caused by ResponseError('too many 503 error responses'))

Expected behavior

The dbt-databricks workflow starts without any error, as shown below:

  • dbt deps --profiles-dir ../misc/misc/ -t prod
    10:20:37 Running with dbt=1.9.1
    10:20:37 Updating lock file in file path: /tmp/tmp-dbt-run-355636934123336/piab/dbt/package-lock.yml
    10:20:37 Installing calogica/dbt_expectations
    10:20:38 Installed from version 0.10.4
    10:20:38 Up to date!
    10:20:38 Installing dbt-labs/dbt_utils
    10:20:38 Installed from version 1.1.1
    10:20:38 Updated version available: 1.3.0
    10:20:38 Installing calogica/dbt_date
    10:20:38 Installed from version 0.10.1
    10:20:38 Up to date!
    10:20:38
    10:20:38 Updates available for packages: ['dbt-labs/dbt_utils']
    Update your versions in packages.yml, then run dbt deps

  • dbt build --profiles-dir ../misc/misc/ -t prod -f
    10:20:41 Running with dbt=1.9.1
    10:20:42 Registered adapter: databricks=1.9.1
    10:20:42 Unable to do partial parsing because saved manifest not found. Starting full parse.
    10:21:01 Found 435 models, 103 snapshots, 1 analysis, 8 seeds, 1559 data tests, 123 sources, 8 exposures, 999 macros
    10:21:02
    10:21:02 Concurrency: 12 threads (target='prod')
    10:21:02
    10:21:19 1 of 1986 START sql table model staging.rollup12helper ......................... [RUN]
    10:21:19 2 of 1986 START sql table model staging.rollup24helper ......................... [RUN]

Screenshots and log output

NA

System information

The output of dbt --version:

dbt=1.9.1
Registered adapter: databricks=1.9.1


The operating system you're using:
NA
The output of python --version:
NA

Additional context

NA

@mkjain1982 mkjain1982 added the bug Something isn't working label Dec 27, 2024
@KristoRSparkle

We have the same issue with dbt-databricks==1.9.1.
Downgrading to 1.8.7 works.
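In case it helps anyone else, the downgrade can be expressed as an exact pin in the job's library settings (version number taken from this thread; where exactly you put the pin depends on how your workflow installs libraries):

```text
# requirements-style pin (hypothetical requirements.txt entry)
dbt-databricks==1.8.7
```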

@mkjain1982
Author

It was working fine until last week; suddenly we started getting this error. As a temporary workaround we have switched the SQL Warehouse from Pro to Serverless, and that works. However, I want to know the root cause of the issue.

@spenaustin

My team has been noticing this too. Here's what we found:

This only occurs when our SQL Warehouse is in the "Stopped" state.

This is extremely similar to #570, which was solved by pull request #578, which pinned the databricks-sql-connector package back to an older version. I think this is likely a similar problem: databricks-sql-connector was upgraded to version 3.7 on December 23rd, and we started seeing this issue on December 24th.

Looking into it further, version 3.7 altered the library's retry backoff behavior, which was also the issue in #570. Pinning our version of databricks-sql-connector to 3.6 seems to have solved the problem for us, but leaving it unspecified lets pip default to installing the newest version.
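For context on where the error message itself comes from: it is urllib3's Retry machinery giving up after repeated 503 responses (the warehouse answers 503 while it is still starting). A minimal sketch that reproduces the message without a real Databricks endpoint, using a hypothetical fake response object (the URL below is illustrative only):

```python
from urllib3.util.retry import Retry
from urllib3.exceptions import MaxRetryError

class Fake503Response:
    """Stand-in for an HTTP response whose status is 503."""
    status = 503

    def get_redirect_location(self):
        return False

def exhaust_retries(total=2):
    """Keep 'receiving' 503s until urllib3's Retry gives up."""
    retry = Retry(total=total, status_forcelist=[503], backoff_factor=0)
    try:
        while True:
            retry = retry.increment(
                method="GET",
                url="/sql/1.0/warehouses/example",
                response=Fake503Response(),
            )
    except MaxRetryError as exc:
        # exc.reason is ResponseError('too many 503 error responses')
        return str(exc.reason)

print(exhaust_retries())  # → too many 503 error responses
```

If 3.7 shortened the backoff between these retries, a warehouse that is still waking up would exhaust them before it becomes ready, which matches the "Stopped" observation above.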
