Investigate and remediate the cause of the Airflow scheduling daemon not starting up with the stack #20522

Open
frankhereford opened this issue Jan 7, 2025 · 5 comments
Labels
Impact: 2-Major (Causes a major interruption of TPW service delivery)
Need: 2-Should Have (May be painful to leave out, but the solution is still viable)
Service: Dev (Infrastructure and engineering)
Type: Bug Report (Something is not right)
Type: DevOps (Continuous integration pipeline operations and infrastructure)
Workgroup: DTS (Data and Technology Services)

Comments

@frankhereford
Member

frankhereford commented Jan 7, 2025


Semi-reliably, the scheduling daemon portion of the Airflow stack does not come up as expected. This leaves Airflow dead in the water until that slice of the stack is manually restarted. Doing so is pretty straightforward, as shown below, but ...

docker compose -f docker-compose.yaml -f docker-compose-production.yaml restart airflow-scheduler

..., until this is resolved, we are not in a position where we can trust airflow to come back up after a CTM automated update or other outage.
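
As a stopgap while the root cause is investigated, the manual restart could be wrapped in a small watchdog. The sketch below is only an illustration under assumptions, not the remediation this ticket tracks: it assumes the Airflow webserver's `/health` endpoint is reachable at `http://localhost:8080/health` (the host and port are guesses about this stack), and the function names are hypothetical. It reuses the restart command quoted above unchanged.

```python
import json
import subprocess
import urllib.request

# Assumption: the Airflow webserver is reachable at this address.
# Adjust the host and port to match the real stack.
HEALTH_URL = "http://localhost:8080/health"

# The restart command quoted in this issue, split into argv form.
RESTART_CMD = [
    "docker", "compose",
    "-f", "docker-compose.yaml",
    "-f", "docker-compose-production.yaml",
    "restart", "airflow-scheduler",
]


def scheduler_is_healthy(payload: dict) -> bool:
    """Return True only if the health payload reports a healthy scheduler."""
    scheduler = payload.get("scheduler") or {}
    return scheduler.get("status") == "healthy"


def fetch_health(url: str = HEALTH_URL, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON health document; return {} on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # Network errors and malformed JSON are both treated as "unknown",
        # which the caller interprets as not healthy.
        return {}


def check_and_restart() -> bool:
    """Restart the scheduler if it is not healthy; return True if restarted."""
    if scheduler_is_healthy(fetch_health()):
        return False
    subprocess.run(RESTART_CMD, check=True)
    return True
```

Calling `check_and_restart()` from a cron job or systemd timer, run from the directory holding the compose files, would automate the manual step. It would not explain why the scheduler fails to start, so it is at best a bridge until the investigation lands.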

This ticket is to track the investigation and remediation of this problem so that we can move forward on being comfortable with automated updates.

@mddilley -- can we bring this to our next refinement meeting? I'd venture to call it a genuine 3 (at least) due to the uncertainty around the investigation portion of the task. The remediation, I expect, will be straightforward once we understand what's going on.

@frankhereford added the Workgroup: DTS, Type: Bug Report, Impact: 2-Major, Service: Dev, Need: 2-Should Have, Type: DevOps, and [Type] 🚨 MISSING labels Jan 7, 2025
@frankhereford
Member Author

Additionally - I tagged it with the Missing tag. Just because it's Airflow, you could make a case that there are about 20 tags in here that are applicable -- but all this to say, I think it'd be really helpful if we had a tag for Airflow itself -- like the stack, not the stuff it does.

@chiaberry
Member

maybe the label "Type: Integration" ?

@frankhereford
Member Author

@chiaberry wrote:

maybe the label "Type: Integration" ?

Yea ... I'm digging that; it's a world better than what we have. I guess what I'm really wanting is something that represents the Airflow project itself, and not the integrations that it provides. Like, if Type: Integration and Type: DevOps could be mashed into one. Project: Airflow perhaps?

@mddilley

mddilley commented Jan 8, 2025

@frankhereford thanks for creating this 🙏, and I added it to the refinement doc yesterday so we won't let it slide by.

I'm open to a new label, but I want to make sure we have a good reason to create one, since we have almost 300 already and only 63 open issues fall under the DevOps label as of today. DevOps has been really useful for me to track Airflow + other infrastructure-related work, but I'm happy to propose a new one if we are finding that it doesn't cover what we need as a team.

Also, this is the description for the Type: DevOps label, which I think covers a lot of what we're talking about in this thread:

Continuous integration pipeline operations and infrastructure

@frankhereford
Member Author

frankhereford commented Jan 8, 2025

Thanks @mddilley -- passing on the label sounds reasonable, esp given the clear reasoning and understandable hesitation to add to our bajillion labels. Thanks for giving that some thought. 🙏

@frankhereford removed the [Type] 🚨 MISSING label Jan 8, 2025
3 participants