
Overhauling retry machinery #10

Merged: 14 commits from cleaner-retries-on-publish into main, Dec 13, 2024
Conversation

@erewok (Member) commented Dec 12, 2024:

One element of this library that we haven't evaluated closely enough is the retry machinery.

Boilermaker allows specifying a default retry policy when registering a task, but how does that default policy get used? Does it get used? If we raise a RetryException with a different policy than the task was registered with, does the different policy get used?

Current problems are:

  • If we raise a RetryException with a different policy than the task was registered with, this new policy will get ignored in favor of the existing default on the task.
  • We can't publish a task with a different policy than it was registered with.
  • The apply_async method has a confusing retries kwarg: this has nothing to do with the RetryPolicy on the task.

This library is confused and confusing in this area, so this PR is attempting to sort this out.

There are three scenarios now where retry policies can be set and respected by the app. The following is excerpted from the Updated Readme (see files changed):

Retries

Retries are configured in Boilermaker using a RetryPolicy...

Retries are scheduled only by raising a RetryException...

Note: all tasks must be registered with a RetryPolicy, but Boilermaker will only retry tasks that have raised a RetryException.

If you do not want your tasks to be retried, either:

  • Do not throw a RetryException from inside your task, or
  • Register them with a NoRetry policy: worker.register_async(..., policy=retries.NoRetry())

NoRetry() is simply a special policy where max_tries=1.

Note: on_failure callbacks will only be run after all retries have been exhausted.

There are three places where retries for a task can be configured (a sketch follows this list):

  • When initially registering a task (the task default).
  • When scheduling a task (override the default).
  • When raising a RetryException in order to trigger a retry (also overrides the default).
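A minimal sketch tying these three together, based on the calls shown elsewhere in this PR and assuming a worker instance as in the README; the policy= kwarg on apply_async and the retries.RetryPolicy constructor shape are assumptions for illustration, not confirmed API:

from boilermaker import retries

async def a_background_task(state, param: str):
    # 3. Raising a RetryException subclass overrides the policy from inside the task
    if param == "throwing":
        raise retries.RetryExceptionDefaultExponential("We are thrown", max_tries=42)
    return "OK"

# 1. The task default, set at registration time
worker.register_async(a_background_task, policy=retries.RetryPolicy(max_tries=3))

# 2. Overriding the default when scheduling (hypothetical `policy=` kwarg)
await worker.apply_async(a_background_task, "some-param", policy=retries.NoRetry())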

Potential Sources of Confusion in the Future

It's possible that a task could set a different max_tries each time it raises, continually raising the ceiling for max_tries. (The accumulated attempts should always be respected and preserved; see the sketch below.)
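A hedged sketch of that footgun; attempt_count is an illustrative parameter, not a real Boilermaker attribute:

async def moving_target_task(state, attempt_count: int):
    # Each raise requests a higher ceiling, so max_tries keeps climbing;
    # the accumulated attempts should still be respected and preserved.
    raise retries.RetryExceptionDefaultExponential(
        "try again", max_tries=attempt_count + 10
    )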

Additional Changes

  • Rename the apply_async(..., retries=3, ...) kwarg to apply_async(..., publish_retries=3, ...): it controls the number of publish attempts and has nothing to do with the task's RetryPolicy (sketch below).
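A sketch of the renamed kwarg (only the rename itself is confirmed by this PR; the surrounding call shape is assumed):

# publish_retries controls how many attempts are made to *publish* the
# message; it is unrelated to the task's RetryPolicy.
await worker.apply_async(a_background_task, "some-param", publish_retries=3)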

(Resolved review threads: tests/test_app.py, boilermaker/app.py)
Comment on lines +288 to +291
if retry.policy and retry.policy != task.policy:
# This will publish the *next* instance of the task using *this* policy
task.policy = retry.policy
logger.warning(f"Task policy updated to retry policy {retry.policy}")

Review comment:

This is surprising usage. I can't imagine a scenario in which a variable retry like this would be necessary. Was there a use case that motivated this? This feels like a footgun waiting to happen, but I might be missing something.

@erewok (Member, Author) replied Dec 13, 2024:

This is specifically for scenarios like this:

async def a_background_task(state, param: str):
    # In some scenarios, from within the function I want a different policy
    if param == "throwing":
        raise retries.RetryExceptionDefaultExponential("We are thrown", max_tries=42)
    return "OK"

The above function will use the policy associated with the exception thrown rather than the one it was registered with. I had originally intended this behavior, and in one project I'm trying to throw with custom retries, but before this PR there was no way to respect that (it doesn't actually work in my existing project 😞).

@erewok (Member, Author) replied Dec 13, 2024:

One other note: this assignment doesn't change the value in the task registry. It changes only this published task to have a different policy, which should carry through for all future retry-invocations of this task, unless someone does something weird like dynamically shifting the policy within the same handler on each raised RetryException.

Review comment:

I'm trying to wrap my head around when one would want a different retry policy than the retry policy in the original task. Do you have a concrete example where you needed it? That might help me understand this scenario better. In my mind it still feels like it would be more sensible to spawn a new task with the new retry policy rather than dynamically change it from within the existing task?

async def my_task_with_policy_1(state, param: str):
    result = do_normal_thing()
    if result.retry_state:
        raise RetryException() # Just retry according to policy for this task
    elif result.weird_state:
        # For some reason here, we need a different retry than original 
        publish_task(new_task_with_new_policy)

@erewok (Member, Author) replied:

Oh, it's possible that there are different types of errors we can hit from within a task:

def some_task():
    # (illustrative helper names)
    if sorta_reliable_thing_failed():
        # This path fails routinely; retry normally
        raise RetryException(...)
    if highly_reliable_thing_failed():
        # If *this* is failing, something super weird must have occurred,
        # so escalate with a different policy
        raise RetryExceptionOfADifferentKind(...)

The task has knowledge and may wish to adjust how it's responding to conditions around it.

@erewok (Member, Author) replied:

Okay, I'm going to merge for now: we can revisit how retries work in the future. We may want to stop using exceptions for control flow at that time...

(Three resolved review threads on boilermaker/retries.py)
Code Coverage

Package    Line Rate
.          85%
Summary    85% (307 / 362)

@erewok merged commit 1ca7b62 into main on Dec 13, 2024; 4 checks passed.
@erewok deleted the cleaner-retries-on-publish branch on December 13, 2024 at 18:04.