
Overhauling retry machinery #10

Merged: 14 commits from cleaner-retries-on-publish into main, Dec 13, 2024
Conversation

@erewok (Member) commented Dec 12, 2024:

One element of this library that we haven't evaluated closely enough is the retry machinery.

Boilermaker allows specifying a default retry policy when registering a task, but how does that default policy get used? Does it get used? If we raise a RetryException with a different policy than the task was registered with, does the different policy get used?

Current problems are:

  • If we raise a RetryException with a different policy than the task was registered with, this new policy will get ignored in favor of the existing default on the task.
  • We can't publish a task with a different policy than it was registered with.
  • The apply_async method has a confusing retries kwarg: this has nothing to do with the RetryPolicy on the task.

This library is confused and confusing in this area, so this PR is attempting to sort this out.

There are three scenarios now where retry policies can be set and respected by the app. The following is excerpted from the Updated Readme (see files changed):

Retries

Retries are configured in Boilermaker using a RetryPolicy...

Retries are scheduled only by raising a RetryException...

Note: all tasks must be registered with a RetryPolicy, but Boilermaker will only retry tasks that have raised a RetryException.

If you do not want your tasks to be retried, either:

  • Do not throw a RetryException from inside your task, or
  • Register them with a NoRetry policy: worker.register_async(..., policy=retries.NoRetry())

NoRetry() is simply a special policy where max_tries=1.

Note: on_failure callbacks will only be run after all retries have been exhausted.

There are three places where retries for a task can be configured (a sketch follows this list):

  • When initially registering a task (the task default).
  • When scheduling a task (override the default).
  • When raising a RetryException in order to trigger a retry (also overrides the default).
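A minimal sketch tying these three together, based on the calls shown elsewhere in this PR and assuming a worker instance as in the README; the policy= kwarg on apply_async and the retries.RetryPolicy constructor shape are assumptions for illustration, not confirmed API:

from boilermaker import retries

async def a_background_task(state, param: str):
    # 3. Raising a RetryException subclass overrides the policy from inside the task
    if param == "throwing":
        raise retries.RetryExceptionDefaultExponential("We are thrown", max_tries=42)
    return "OK"

# 1. The task default, set at registration time
worker.register_async(a_background_task, policy=retries.RetryPolicy(max_tries=3))

# 2. Overriding the default when scheduling (hypothetical `policy=` kwarg)
await worker.apply_async(a_background_task, "some-param", policy=retries.NoRetry())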

Potential Sources of Confusion in the Future

It's possible that a task could set a different max_tries each time it raises, continually raising the ceiling for max_tries. (The accumulated attempts should always be respected and preserved; see the sketch below.)
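A hedged sketch of that footgun; attempt_count is an illustrative parameter, not a real Boilermaker attribute:

async def moving_target_task(state, attempt_count: int):
    # Each raise requests a higher ceiling, so max_tries keeps climbing;
    # the accumulated attempts should still be respected and preserved.
    raise retries.RetryExceptionDefaultExponential(
        "try again", max_tries=attempt_count + 10
    )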

Additional Changes

  • Rename the apply_async(..., retries=3, ...) kwarg to apply_async(..., publish_retries=3, ...): it controls the number of publish attempts and has nothing to do with the task's RetryPolicy (sketch below).
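A sketch of the renamed kwarg (only the rename itself is confirmed by this PR; the surrounding call shape is assumed):

# publish_retries controls how many attempts are made to *publish* the
# message; it is unrelated to the task's RetryPolicy.
await worker.apply_async(a_background_task, "some-param", publish_retries=3)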

(Resolved review threads: tests/test_app.py, boilermaker/app.py)
Comment on lines +288 to +291
if retry.policy and retry.policy != task.policy:
# This will publish the *next* instance of the task using *this* policy
task.policy = retry.policy
logger.warning(f"Task policy updated to retry policy {retry.policy}")

Review comment:

This is surprising usage. I can't imagine a scenario in which a variable retry like this would be necessary. Was there a use case that motivated this? This feels like a footgun waiting to happen, but I might be missing something.

@erewok (Member, Author) replied Dec 13, 2024:

This is specifically for scenarios like this:

async def a_background_task(state, param: str):
    # In some scenarios, from within the function I want a different policy
    if param == "throwing":
        raise retries.RetryExceptionDefaultExponential("We are thrown", max_tries=42)
    return "OK"

The above function will use the policy associated with the exception thrown rather than the one it was registered with. I had originally intended this behavior, and in one project I'm trying to throw with custom retries, but before this PR there was no way to respect that (it doesn't actually work in my existing project 😞).

@erewok (Member, Author) replied Dec 13, 2024:

One other note: this assignment doesn't change the value in the task registry. It changes only this published task to have a different policy, which should carry through for all future retry-invocations of this task, unless someone does something weird like dynamically shifting the policy within the same handler on each raised RetryException.

Review comment:

I'm trying to wrap my head around when one would want a different retry policy than the retry policy in the original task. Do you have a concrete example where you needed it? That might help me understand this scenario better. In my mind it still feels like it would be more sensible to spawn a new task with the new retry policy rather than dynamically change it from within the existing task?

async def my_task_with_policy_1(state, param: str):
    result = do_normal_thing()
    if result.retry_state:
        raise RetryException() # Just retry according to policy for this task
    elif result.weird_state:
        # For some reason here, we need a different retry than original 
        publish_task(new_task_with_new_policy)

@erewok (Member, Author) replied:

Oh, it's possible that there are different types of errors we can hit from within a task:

def some_task():
    # (illustrative helper names)
    if sorta_reliable_thing_failed():
        # This path fails routinely; retry normally
        raise RetryException(...)
    if highly_reliable_thing_failed():
        # If *this* is failing, something super weird must have occurred,
        # so escalate with a different policy
        raise RetryExceptionOfADifferentKind(...)

The task has knowledge and may wish to adjust how it's responding to conditions around it.

@erewok (Member, Author) replied:

Okay, I'm going to merge for now: we can revisit how retries work in the future. We may want to stop using exceptions for control flow at that time...

(Three resolved review threads on boilermaker/retries.py)
Code Coverage

Package    Line Rate
.          85%
Summary    85% (307 / 362)

@erewok merged commit 1ca7b62 into main on Dec 13, 2024; 4 checks passed.
@erewok deleted the cleaner-retries-on-publish branch on December 13, 2024 at 18:04.