The alter data step of the new image data refresh process could not finish #5290
Labels
💻 aspect: code
Concerns the software code in the repository
🛠 goal: fix
Bug fix
🟧 priority: high
Stalls work on the project or its dependents
python
Pull requests that update Python code
🧱 stack: catalog
Related to the catalog and Airflow DAGs
🔧 tech: airflow
Involves Apache Airflow
Description
The `alter_data_batch` of the new `staging_image_data_refresh` DAG causes the Airflow instance to crash. #5145 attempted to fix it by increasing the size of batches, creating fewer tasks to expand, but the result didn't change. This process consumes a lot of the Airflow instance memory, even with the number of active tasks restricted to 2 (#5125), so it needs to be optimized. An alternative is to convert it to an iterative task, similar to how the `batched_update` operates.

We know the rest of the steps work, given that the `staging_audio_data_refresh` DAG ran successfully, but the alter process is exclusive to the image table.
Reproduction
Take special care and monitor closely when testing this DAG.
Trigger the `staging_image_data_refresh` DAG.
Additional context
Part of #3925.
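
To make the iterative alternative from the description more concrete, here is a minimal sketch of the general idea: one task walks the ID range batch by batch in a plain loop, instead of expanding into one mapped task per batch. It is not the project's actual code. `iter_batches`, `alter_data_iteratively`, and the `alter_batch` callback are hypothetical names, and the sketch assumes the alter step can be expressed as a function over an inclusive row-ID range.

```python
from typing import Callable, Iterator


def iter_batches(min_id: int, max_id: int, batch_size: int) -> Iterator[tuple[int, int]]:
    """Lazily yield inclusive (start, end) ID ranges covering min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def alter_data_iteratively(
    min_id: int,
    max_id: int,
    batch_size: int,
    alter_batch: Callable[[int, int], int],
) -> int:
    """Process every batch sequentially inside a single call.

    `alter_batch` is a hypothetical callback that alters the rows in
    [start, end] and returns the number of rows it touched. Only one
    batch is in flight at a time, and no per-batch task is created.
    """
    total_rows = 0
    for start, end in iter_batches(min_id, max_id, batch_size):
        total_rows += alter_batch(start, end)
    return total_rows
```

In a DAG, something like `alter_data_iteratively` would be the body of a single task. Because the batches come from a generator, the full list of ranges is never materialized. Whether this actually lowers memory use on the Airflow instance would still need to be measured.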