Replies: 7 comments
-
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
-
@pvieito -
-
Hi @kunaljubce!
# Imports reconstructed for context (the original snippet omitted them):
import pendulum
from airflow import DAG
from airflow.providers.amazon.aws.operators.batch import BatchOperator

# dag_id, dag_schedule, job_name, etc. are defined elsewhere in our deployment.
with DAG(
    dag_id=dag_id,
    schedule=dag_schedule,
    start_date=pendulum.parse(dag_schedule_start_date),
    tags=dag_deployment_tags,
    catchup=False,
) as dag:
    batch_job = BatchOperator(
        task_id="batch_job",
        job_name=job_name,
        job_definition=job_definition_arn,
        job_queue=job_queue_name,
        tags=job_tags,
        aws_conn_id=job_aws_connection_id,
        container_overrides=dict(
            environment=aws_batch_job_environment_variables,
        ),
    )
    # log_batch_job is a task defined elsewhere in the same file.
    log_batch_job() >> batch_job
-
@pvieito - Thanks for these; nothing apparent jumps out from these logs. Would it be possible for you to look at all the logs for the triggerer, scheduler, etc.? Maybe try clearing the tasks and then follow the system logs to see if they print anything referring to your deferred tasks. If you can add those logs below, we can look for a possible issue, because given that nothing changed on the Airflow platform side, there isn't much for us to go on. cc: @andrewgodwin - I see a similar issue reported a while back (#25630), hence tagging you here.
-
Hi @kunaljubce! I'm attaching the full logs from the tasks, workers & scheduler from
-
@pvieito I looked through the logs, and they don't give us an awful lot to go on in terms of an actual issue. I would probably point you back to the platform/devops team that maintains your Airflow environment to check whether this is memory related, because from the logs I can see some of the tasks being submitted, executed, and completed. Only after a period of time does your triggerer start behaving oddly and indefinitely deferring the tasks.
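For context on what "stuck in DEFERRED" means mechanically: a deferrable operator hands a trigger (an async generator) to the triggerer process, which runs many triggers on a single asyncio event loop and fires an event back to resume each task. The sketch below is a toy illustration of that pattern in plain asyncio, not the actual Airflow API; the names `toy_trigger` and `run_deferred_task` are invented for the example.

```python
import asyncio

async def toy_trigger(check, poll_interval=0.01):
    """Toy stand-in for an Airflow trigger: poll a condition
    cooperatively, then yield a single completion event."""
    while not check():
        # Must await, never block: one blocking trigger can starve
        # the shared event loop and leave every task DEFERRED.
        await asyncio.sleep(poll_interval)
    yield {"status": "success"}

async def run_deferred_task(check):
    # The task stays "deferred" until the trigger yields its event.
    async for event in toy_trigger(check):
        return event

async def main():
    state = {"done": False}
    task = asyncio.create_task(run_deferred_task(lambda: state["done"]))
    await asyncio.sleep(0.05)
    state["done"] = True  # external condition becomes true
    return await task

result = asyncio.run(main())
print(result)  # {'status': 'success'}
```

If the event loop is healthy, the event fires and the task resumes; if the loop is starved or the triggerer dies, nothing ever fires, which matches the symptom described above.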
-
This is difficult to reproduce and debug. There are optimizations related to triggers in Airflow 2.10.3, so try upgrading and check whether any issues persist. Moving this to discussions.
-
Apache Airflow version
2.10.1
If "Other Airflow 2 version" selected, which one?
No response
What happened?
We are trying to migrate all our DAGs to use deferrable operators. I set deferral as the default in our MWAA configuration:
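(The configuration screenshot did not survive. The override meant here is presumably Airflow's `operators.default_deferrable` flag, available since Airflow 2.7, which in airflow.cfg form would look like the following; in MWAA it would be set as the configuration option `operators.default_deferrable`.)

```ini
[operators]
# Assumed setting: make operators that support deferral
# run in deferrable mode by default.
default_deferrable = True
```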
This seemed to work properly for a week, but today all jobs started being stuck in the DEFERRED status. For example:
Typical / expected run:
All runs since today at around 10:00 UTC:
Without the expected logs from waiter_with_logging etc., the tasks seem to be indefinitely stuck at DEFERRED.
What you think should happen instead?
No response
How to reproduce
Use Airflow 2.10 with deferrable operators.
Operating System
MWAA
Versions of Apache Airflow Providers
No response
Deployment
Amazon (AWS) MWAA
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct