Web UI task logging with CloudWatch has a 60 second time delay #45554

walter9388 opened this issue Jan 10, 2025 · 1 comment
Labels
area:core area:logging area:UI Related to UI/UX. For Frontend Developers. kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet provider:amazon AWS/Amazon - related issues

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.10.1

What happened?

When using CloudWatch logging, there seems to be a 60-second delay between updates of the log output in the UI. Please see the video below and observe:

  1. Initially Airflow can't find the remote logs (as there are none).
  2. Airflow detects local logs.
  3. Nothing happens in the UI for 60 seconds.
  4. After 60 seconds logging appears, and the top of the printout states it is from CloudWatch logs.
  5. The logs then update every 60 seconds until the task is completed.

Please skip ahead in the video as most of it is static!

Recording.2025-01-10.134419.mp4

As a second, minor point, you can also see that log grouping no longer works with the logs read from CloudWatch. However, this doesn't concern me as much.

What you think should happen instead?

I'm not sure if I have configured something incorrectly, but I expected the same behaviour as local logging, i.e. tailing of the log file.

I struggled to find the default behaviour documented, but I expected Airflow to use the local logs if they were available and to fall back to the remote logs only if no local logs were found.
I found this logic in previous documentation (<2.0), although it may now be outdated:

In the Airflow Web UI, remote logs take precedence over local logs when remote logging is enabled. If remote logs can not be found or accessed, local logs will be displayed. Note that logs are only sent to remote storage once a task is complete (including failure); In other words, remote logs for running tasks are unavailable (but local logs are available).

Can you confirm that this is the expected behaviour and what is in the video above is a bug?

Separately, I can see in the browser that a request is made every second to update the logs, and I can confirm that the logs are only written to CloudWatch every 60 seconds or when the task completes.
Is this expected behaviour, or should logs be written to CloudWatch at a higher rate?
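
To make the observed pattern concrete, here is a toy model of it. It assumes the remote handler buffers records and ships them in batches on a fixed interval, which is a guess at the cause and not something confirmed from the Airflow or provider source. `IntervalBufferedHandler` and `simulate` are hypothetical names, and the `shipped` list merely stands in for CloudWatch.

```python
import logging
import time


class IntervalBufferedHandler(logging.Handler):
    """Hypothetical handler: buffers records and ships them at most once per interval."""

    def __init__(self, send_interval=60.0, clock=time.monotonic):
        super().__init__()
        self.send_interval = send_interval
        self.clock = clock
        self.buffer = []   # records waiting to be sent
        self.shipped = []  # stand-in for the remote log store
        self.last_send = clock()

    def emit(self, record):
        self.buffer.append(self.format(record))
        # Only ship once the interval has elapsed since the last batch.
        if self.clock() - self.last_send >= self.send_interval:
            self.flush()

    def flush(self):
        self.shipped.extend(self.buffer)
        self.buffer.clear()
        self.last_send = self.clock()


def simulate(send_interval=60, log_every=10, duration=130):
    """Return {time: number of records visible remotely} using a fake clock."""
    now = [0.0]
    handler = IntervalBufferedHandler(send_interval, clock=lambda: now[0])
    logger = logging.getLogger("buffer_demo")
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    visible = {}
    for t in range(0, duration, log_every):
        now[0] = float(t)
        logger.info(t)
        visible[t] = len(handler.shipped)
    logger.removeHandler(handler)
    return visible


if __name__ == "__main__":
    for t, count in simulate().items():
        print(f"t={t:>3}s  records visible remotely: {count}")
```

Under this model, a task that logs every 10 seconds shows nothing remotely until the 60-second mark, then everything buffered so far appears at once, which matches what the video shows. If that is the mechanism, polling the UI every second cannot help, because the records have not left the worker yet.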

If the behaviour in the video is actually what is expected, I would like to suggest one of the following options as we need a <60 second refresh window in our logging setup:

  1. A configuration variable to use local logging first if available (e.g. local_logging_prefer = True).
  2. A configuration variable for the update frequency of the logging when using remote logging (e.g. remote_logging_refresh_period = 60).
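
As a rough illustration of option 1, the selection logic could look something like the sketch below. `local_logging_prefer` is the hypothetical option proposed above and does not exist in Airflow; `LogSourceConfig` and `pick_log_source` are likewise made-up names, not Airflow APIs.

```python
from dataclasses import dataclass


@dataclass
class LogSourceConfig:
    remote_logging: bool = True
    # Hypothetical option from suggestion 1; not a real Airflow setting.
    local_logging_prefer: bool = False


def pick_log_source(cfg: LogSourceConfig, local_available: bool, remote_available: bool) -> str:
    """Decide which log source the UI should read for a task attempt."""
    # With the proposed flag set, local logs win whenever they exist.
    if cfg.local_logging_prefer and local_available:
        return "local"
    # Otherwise remote logs take precedence when remote logging is enabled.
    if cfg.remote_logging and remote_available:
        return "remote"
    # Fall back to local logs if nothing was found remotely.
    if local_available:
        return "local"
    return "none"


if __name__ == "__main__":
    running_task = dict(local_available=True, remote_available=True)
    print(pick_log_source(LogSourceConfig(), **running_task))
    print(pick_log_source(LogSourceConfig(local_logging_prefer=True), **running_task))
```

With the flag off, the sketch reproduces the "remote takes precedence" rule; with it on, a running task whose local log file is readable would be tailed locally, regardless of how often the remote store is updated.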

Let me know your thoughts.

How to reproduce

The remote logging config was copied from here:

[logging]
# Airflow can store logs remotely in AWS Cloudwatch. Users must supply a log group
# ARN (starting with 'cloudwatch://...') and an Airflow connection
# id that provides write and read access to the log location.
remote_logging = True
remote_base_log_folder = cloudwatch://arn:aws:logs:<region name>:<account id>:log-group:<group name>
remote_log_conn_id = MyCloudwatchConn
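
For anyone reproducing this, a quick sanity check of the `remote_base_log_folder` value can rule out a malformed ARN. The helper below is a hypothetical sketch, not Airflow's own parsing code, and the region, account id, and group name in the example are placeholders.

```python
from typing import NamedTuple


class CloudWatchLogTarget(NamedTuple):
    region: str
    account_id: str
    log_group: str


def parse_remote_base_log_folder(value: str) -> CloudWatchLogTarget:
    """Split 'cloudwatch://arn:<partition>:logs:<region>:<account>:log-group:<name>'."""
    prefix = "cloudwatch://"
    if not value.startswith(prefix):
        raise ValueError("remote_base_log_folder must start with 'cloudwatch://'")
    parts = value[len(prefix):].split(":")
    # Expected layout: arn, partition, logs, region, account id, log-group, group name
    if len(parts) < 7 or parts[0] != "arn" or parts[2] != "logs" or parts[5] != "log-group":
        raise ValueError(f"not a CloudWatch log group ARN: {value!r}")
    return CloudWatchLogTarget(region=parts[3], account_id=parts[4], log_group=parts[6])


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    example = "cloudwatch://arn:aws:logs:eu-west-1:123456789012:log-group:airflow-task-logs"
    print(parse_remote_base_log_folder(example))
```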

The demo DAG used in the video above prints to logging every 10 seconds and is as follows:

import logging
from datetime import datetime
from time import sleep

# airflow.decorators is the documented home of the TaskFlow @task decorator
from airflow import DAG
from airflow.decorators import task

with DAG(
    dag_id="dev__cloudwatch_logging_testing",
    start_date=datetime(2024, 1, 1),
    schedule=None,
):

    @task
    def task1():
        sleeptime = 10
        for i in range(0, 300, sleeptime):
            logging.info(i)
            sleep(sleeptime)

    task1()

Operating System

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Versions of Apache Airflow Providers

apache-airflow==2.10.1
apache-airflow-providers-amazon==8.28.0
apache-airflow-providers-celery==3.8.1
apache-airflow-providers-cncf-kubernetes==8.4.1
apache-airflow-providers-common-compat==1.2.0
apache-airflow-providers-common-io==1.4.0
apache-airflow-providers-common-sql==1.16.0
apache-airflow-providers-docker==3.13.0
apache-airflow-providers-elasticsearch==5.5.0
apache-airflow-providers-fab==1.3.0
apache-airflow-providers-ftp==3.11.0
apache-airflow-providers-google==10.22.0
apache-airflow-providers-grpc==3.6.0
apache-airflow-providers-hashicorp==3.8.0
apache-airflow-providers-http==4.13.0
apache-airflow-providers-imap==3.7.0
apache-airflow-providers-microsoft-azure==10.4.0
apache-airflow-providers-mysql==5.7.0
apache-airflow-providers-odbc==4.7.0
apache-airflow-providers-openlineage==1.11.0
apache-airflow-providers-postgres==5.12.0
apache-airflow-providers-redis==3.8.0
apache-airflow-providers-sendgrid==3.6.0
apache-airflow-providers-sftp==4.11.0
apache-airflow-providers-slack==8.9.0
apache-airflow-providers-smtp==1.8.0
apache-airflow-providers-snowflake==5.7.0
apache-airflow-providers-sqlite==3.9.0
apache-airflow-providers-ssh==3.13.1

Deployment

Other Docker-based deployment

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@walter9388 walter9388 added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Jan 10, 2025

boring-cyborg bot commented Jan 10, 2025

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added area:logging area:UI Related to UI/UX. For Frontend Developers. provider:amazon AWS/Amazon - related issues labels Jan 10, 2025