Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry functionality #3135

Draft
wants to merge 45 commits into
base: series/3.x
Choose a base branch
from
Draft

Retry functionality #3135

wants to merge 45 commits into from

Conversation

SystemFw
Copy link
Contributor

@SystemFw SystemFw commented Sep 4, 2022

Fixes #1459

I had originally started with a direct port of cats-retry, but since then I've changed the design significantly enough to warrant a proper review of the api before I port tests and docs.
Everything is in Retry.scala, here are some of the relevant decisions:

  • The time scheduling part of the problem is mostly the same as cats-retry, except with fewer types.
  • The retry api itself has changed significantly, cats-retry has several combinators, each of which takes several callbacks, whereas I've folded that into the Retry abstraction itself. I've found the deluge a similar combinators in cats-retry to be a bit hard to navigate, plus you can't have default implementations for the callbacks because they are of shape X => F[Y].
  • I've removed the pretty printing of retries, we don't do that anywhere else in cats-effect.
  • I've removed the ability to retry with a predicate on the result of the action, with the rationale that you can use failures and ensure instead.
  • The errors are typed at Throwable (could be given the Gen* treatment though I'm not especially keen).

TODO:

  • Syntax. Should be minimal given the reduction to a single retry combinator. I'm afraid usage with IO would require a syntax import since this lives in std.
  • Pass at the scaladoc.
  • Porting tests.

@armanbilge
Copy link
Member

armanbilge commented Sep 5, 2022

I'm afraid usage with IO would require a syntax import since this lives in std.

The core module depends on std, so syntax imports shouldn't be necessary, we can add methods directly to IO. Unless I misunderstood what you are getting at :)

baseDelay: FiniteDuration
): Retry[F] =
Retry.lift[F] { status =>
val delay = safeMultiply(baseDelay, Math.pow(2, status.retriesSoFar).toLong)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make the base (2) configurable? It could still be the default as many people will want to use it as their base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably add it as a param with a default, yeah

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An overload is better for bincompat 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a builder? Overloads tend to lead to combinatorial explosion

@SystemFw
Copy link
Contributor Author

SystemFw commented Sep 5, 2022

By the way, I'm explicitly seeking feedback on the interaction between selectError and and/or with timed retry policies. I think in common usage it's fine, but I also think it's possible to build confusing behaviour in certain cases. However, the alternative is writing lots of overloads for retry combinators with callbacks, and forcing the user to bundle timing logic and error selection logic manually

@armanbilge
Copy link
Member

I also think it's possible to build confusing behaviour in certain cases

would you mind showing an example of this?

@SystemFw
Copy link
Contributor Author

SystemFw commented Sep 6, 2022

would you mind showing an example of this?

For example capDelay used to be implemented as _.or(constantDelay(cap)). But if you use that with a policy that works on a specific error, e.g.

exponentialBackoff(1.ms)
  .selectError { 
       case NullPointerException => true
       case _ => false
    }.capDelay(500.ms) // with the old implementation of capDelay

the resulting policy will retry all errors, not just NullPointerException. Another example would be anding policies that select non-overlapping errors, like constantDelay.selectError(..ExA..).and(constantDelay.selectError(...ExB...), which means you never retry (though this one should be quite obvious with the rename from join to and).
All in all I think common usage is fine, since I'd imagine you first write the code that specifies the "scheduling" , and then selectError at the end, and also by following the simple rule of always having non-overlapping error selection for or, and overlapping error selection for and, but I think it's worth getting other people to think about it.

def mapK[G[_]: Monad](f: F ~> G): Retry[G]
}
object Retry {
final case class Status(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case classes tend to be fiddly from a bincompat perspective - could this be package-private and abstracted behind a sealed trait?

Copy link
Member

@djspiewak djspiewak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General thoughts…

I like where this is headed. It does feel slightly odd to me that we have all these combinators on Retry itself, but at the same time the only way to make a stateful decision chain is to do so using Retry.lift (which should be renamed but that's another topic). It feels like the set of combinators is incomplete somehow. Maybe the decision calculus is what's incomplete? I haven't given it too much thought.

@@ -0,0 +1,666 @@
package cats.effect.std
package retry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming OldRetry is only here for comparison purposes?

Comment on lines +104 to +109
def liftF[F[_]: Monad](
nextRetry: Retry.Status => F[Retry.Decision]
): Retry[F] =
Retry((status, _) => nextRetry(status))

def lift[F[_]: Monad](
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an odd use of the lift nomenclature. I probably would have gone with apply or something like that, though I'm not sure what to do about the non-functor variant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe def byStatus?

/**
* Each delay is twice as long as the previous one. Never give up.
*/
def exponentialBackoff[F[_]: Monad](
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL people actually use exponential backoff without jitter. I had no idea that was even a useful thing to do…

* "Full jitter" backoff algorithm. See
* https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
*/
def fullJitter[F[_]: Monad](baseDelay: FiniteDuration): Retry[F] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing on my thought from above… It feels like this should be the algorithm that we're encouraging people to reach for first, but the name definitely does not hint in that direction. I had to actually look up the AWS Engineering blog to figure out that this is actually the algorithm that I generally identify with "exponential backoff".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little odd to me that these are methods rather than some kind of Strategy that encapsulates the (previousBackoff: FiniteDuration) => (nextBackoff: F[FiniteDuration])

If they were structured in that way, we could provide easier guidance by having the strategies be available on object Strategy with def default being this recommended one


val noRetriesYet = Retry.Status(0, Duration.Zero, None)

def retry[F[_]: Temporal, A](r: Retry[F], action: F[A]): F[A] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a member of Retry?

Comment on lines +13 to +14
def retry(r: Retry[F])(implicit F: Temporal[F]): F[A] =
Retry.retry(r, wrapped)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add some convenience syntax for exponential backoff?

@djspiewak djspiewak modified the milestones: v3.4.0, v3.5.0 Oct 29, 2022
Retry.lift[F] { status =>
val e = Math.pow(2, status.retriesSoFar.toDouble).toLong
val maxDelay = safeMultiply(baseDelay, e)
val delayNanos = (maxDelay.toNanos * Random.nextDouble()).toLong
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use cats.effect.std.Random?

@djspiewak djspiewak modified the milestones: v3.5.0, v3.6.0 Feb 14, 2023
@djspiewak djspiewak modified the milestones: v3.6.0, v3.7.0 May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fold cats-retry into CE 3?
7 participants