Fix ContinuousTimetable false triggering when last run ends in future #45175

Open
wants to merge 2 commits into
base: main
12 changes: 8 additions & 4 deletions airflow/timetables/simple.py
@@ -137,14 +137,18 @@ def next_dagrun_info(
     ) -> DagRunInfo | None:
         if restriction.earliest is None:  # No start date, won't run.
             return None
+
+        current_time = timezone.coerce_datetime(timezone.utcnow())
+
         if last_automated_data_interval is not None:  # has already run once
             start = last_automated_data_interval.end
-            end = timezone.coerce_datetime(timezone.utcnow())
+            end = current_time
+
+            if start > end:  # Skip scheduling if the last run ended in the future
+                return None
Comment on lines 144 to +148
Member

if last_automated_data_interval.end > current_time:
    return None
start = ...
end = ...

Exactly the same logic, but more readable IMO.
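To make the equivalence concrete, here is a minimal standalone sketch comparing the two shapes. It uses the standard library's `datetime` instead of Airflow's `timezone` helpers, and the function names `interval_check_after` and `interval_guard_first` are hypothetical, chosen only for this illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def interval_check_after(last_end: datetime, now: datetime) -> tuple[datetime, datetime] | None:
    """Shape used in the diff: assign start/end first, then compare them."""
    start = last_end
    end = now
    if start > end:  # last run ended in the future
        return None
    return start, end


def interval_guard_first(last_end: datetime, now: datetime) -> tuple[datetime, datetime] | None:
    """Shape suggested in the review: guard clause first, assignments after."""
    if last_end > now:  # last run ended in the future
        return None
    start = last_end
    end = now
    return start, end


# Hypothetical sample times: one in the past, one equal to now, one in the future.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
samples = [now - timedelta(hours=1), now, now + timedelta(days=1)]
results_match = all(
    interval_check_after(s, now) == interval_guard_first(s, now) for s in samples
)
print(results_match)
```

Both functions return the same value for every sample, so the choice between them is about readability only.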

Member

Is returning None the right thing to do here? It would make this DAG not run at all in the future, even after the previous run is no longer in the future.

Contributor Author

I missed that returning None prevents the DAG from ever running again. How about setting start and end as shown below when the last run's end is in the future (instead of returning None)?

        if last_automated_data_interval is not None:  # has already run once
            if last_automated_data_interval.end > current_time:  # last run ended in the future
                start = restriction.earliest
                # length of the previous data interval
                elapsed = last_automated_data_interval.end - last_automated_data_interval.start
                end = start + elapsed.as_timedelta()
            else:
                start = last_automated_data_interval.end
                end = current_time

Here start is set to restriction.earliest, and end is calculated by adding the length of the previous data interval to start. This way, I expect the DAG to keep running even when the last run ended in the future. Could this approach cause any problems? (Also, since date calculations are involved, would it be better to put this in a separate function?)
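For readers who want to try the proposal outside Airflow, here is a rough standalone sketch of the same arithmetic. It uses plain `datetime` and `timedelta` rather than pendulum objects, so `.as_timedelta()` is not needed. The function name `proposed_interval` and the sample dates are assumptions made for illustration only; they are not part of the PR.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def proposed_interval(
    last_start: datetime,
    last_end: datetime,
    earliest: datetime,
    now: datetime,
) -> tuple[datetime, datetime]:
    """Mirror the proposed branch logic for a timetable that has already run once."""
    if last_end > now:
        # Last run ended in the future: restart from the earliest allowed date
        # and reuse the length of the previous data interval.
        start = earliest
        elapsed = last_end - last_start
        end = start + elapsed
    else:
        # Normal case: continue from where the last run stopped.
        start = last_end
        end = now
    return start, end


# Hypothetical sample values.
earliest = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 2, tzinfo=timezone.utc)

# Case 1: the previous interval started at `earliest` and ends one day after `now`.
future_case = proposed_interval(earliest, now + timedelta(days=1), earliest, now)

# Case 2: the previous interval ended one hour before `now`.
past_case = proposed_interval(earliest, now - timedelta(hours=1), earliest, now)

print(future_case)
print(past_case)
```

In the first sample the previous interval began at `earliest`, so the computed interval is identical to the previous one and its end is still later than `now`. That may be one of the cases worth checking when answering the question above.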

         else:  # first run
             start = restriction.earliest
-            end = max(
-                restriction.earliest, timezone.coerce_datetime(timezone.utcnow())
-            )  # won't run any earlier than start_date
+            end = max(restriction.earliest, current_time)

         if restriction.latest is not None and end > restriction.latest:
             return None
12 changes: 12 additions & 0 deletions tests/timetables/test_continuous_timetable.py
@@ -89,3 +89,15 @@ def test_no_runs_after_end_date(timetable, restriction):
     )

     assert next_info is None
+
+
+@time_machine.travel(DURING_DATE)
+def test_no_false_triggering_with_future_start_date_after_run(timetable, restriction):
+    FUTURE_DATE = DURING_DATE.add(days=1)
+
+    next_info = timetable.next_dagrun_info(
+        last_automated_data_interval=DataInterval(START_DATE, FUTURE_DATE),
+        restriction=restriction,
+    )
+
+    assert next_info is None