Investigate and remediate the cause of the Airflow scheduling daemon not starting up with the stack #20522
Labels
Impact: 2-Major
Causes a major interruption of TPW service delivery
Need: 2-Should Have
May be painful to leave out, but the solution is still viable
Service: Dev
Infrastructure and engineering
Type: Bug Report
Something is not right
Type: DevOps
Continuous integration pipeline operations and infrastructure
Workgroup: DTS
Data and Technology Services
Semi-reliably, the scheduling daemon portion of the airflow stack does not come up as expected. This leaves airflow dead in the water until that slice of the stack is manually restarted. Doing so is pretty straight forward, as shown below, but ...
..., until this is resolved, we are not in a position where we can trust airflow to come back up after a CTM automated update or other outage.
This ticket is to track the investigation and remediation of this problem so that we can move forward on being comfortable with automated updates.
@mddilley -- can we bring this to our next refinement meeting? I'd venture to call it a genuine
3
(at least) due to the uncertainty around the investigation portion of the task. The remediation, I expect, will be straight forward once we understand what's going on.The text was updated successfully, but these errors were encountered: