Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle CancellationToken for Retry #2396

Closed

Conversation

peter-csala
Copy link
Contributor

@peter-csala peter-csala commented Nov 18, 2024

Pull Request

The issue or feature being addressed

#2375

Details on the issue fix or feature implementation

  • Retry: handle cancelled CancellationToken as OperationCanceledException
  • Assessment
    • Hedging: handles cancelled CancellationToken as OperationCanceledException
    • Timeout: does not handle cancelled CancellationToken as OperationCanceledException
      • We might want to tackle that as a separate PR

Confirm the following

  • I started this PR by branching from the head of the default branch
  • I have targeted the PR to merge into the default branch
  • I have included unit tests for the issue/feature
  • I have successfully run a local build

@peter-csala
Copy link
Contributor Author

It seems like the timeout strategy also does not handle the linked CancellationToken's cancellation as OperationCanceledException

The following code prints 1 as a result

var cts = new CancellationTokenSource();
var p = new ResiliencePipelineBuilder<int>().AddTimeout(TimeSpan.FromSeconds(1)).Build();
var r = await p.ExecuteAsync(async token =>
{
	await Task.Delay(500, token);
	cts.Cancel();
	return 1;
}, cts.Token);

Console.WriteLine(r); // 1

@peter-csala
Copy link
Contributor Author

peter-csala commented Nov 18, 2024

For hedging it seems like we are getting OperationCanceledException at the next attempt.

var options = new HedgingStrategyOptions<int>
{
	ShouldHandle = _ => PredicateResult.True(),
	MaxHedgedAttempts = 3,
	Delay = TimeSpan.FromMilliseconds(100),
	OnHedging = static args =>
	{
		Console.WriteLine("Hedging...");
		return default;
	}
};

var cts = new CancellationTokenSource();
var p = new ResiliencePipelineBuilder<int>().AddHedging(options).Build();
var r = await p.ExecuteAsync(async token =>
{
	await Task.Delay(200);
	cts.Cancel();
	return 3;
}, cts.Token);

Console.WriteLine(r);

Output:

Hedging...
Unhandled exception. System.OperationCanceledException: The operation was canceled.
   at Polly.Utils.ExceptionUtilities.TrySetStackTrace[T](T exception) in /_/src/Polly.Core/Utils/ExceptionUtilities.cs:line 23
   ...

@peter-csala peter-csala marked this pull request as ready for review November 18, 2024 14:36
@peter-csala peter-csala changed the title [DRAFT] Handle CancellationToken for Retry Handle CancellationToken for Retry Nov 18, 2024
Copy link

codecov bot commented Dec 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.40%. Comparing base (543cd50) to head (802eb93).
Report is 19 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #2396    +/-   ##
========================================
  Coverage   85.39%   85.40%            
========================================
  Files         312      312            
  Lines        7464     7466     +2     
  Branches     1121     1122     +1     
========================================
+ Hits         6374     6376     +2     
- Misses        745      905   +160     
+ Partials      345      185   -160     
Flag Coverage Δ
linux 85.40% <100.00%> (+<0.01%) ⬆️
macos 85.37% <100.00%> (-0.03%) ⬇️
windows 85.28% <100.00%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@martincostello martincostello added this to the v8.5.1 milestone Dec 3, 2024
@@ -53,6 +53,11 @@ protected internal override async ValueTask<Outcome<T>> ExecuteCore<TState>(Func
{
var startTimestamp = _timeProvider.GetTimestamp();
var outcome = await StrategyHelper.ExecuteCallbackSafeAsync(callback, context, state).ConfigureAwait(context.ContinueOnCapturedContext);
if (context.CancellationToken.IsCancellationRequested)
{
outcome = Outcome.FromException<T>(new OperationCanceledException(context.CancellationToken));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to acknowledge cancellation only when the retry policy would handle the outcome. It's important that a valid outcome be surfaced by the policy, since the choice to continue processing may well have been deliberate (cf. item three of MS recommendations).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR:

  • If the cancellation is requested before the execution of a given attempt (either the original or any handled retry attempt) then it will short cut the execution with an OCE.
  • If the cancellation is requested during the OnRetry then it will short cut the execution with an OCE.
  • If the cancellation is requested during the waiting of the retry delay then it will short cut the execution with an OCE.

This PR handles the following case:

  • If the cancellation is requested during the execution of the user callback then regardless of the outcome then it will short cut the execution with an OCE.

Just to clarify: are you asking to

  • return the outcome of the user callback regardless of the cancellation was requested (or not) if the strategy won't handle the outcome of the user callback
  • return the outcome of the user callback if the strategy would handle the outcome and it was the last attempt
  • short cut the execution with an OCE if the strategy would handle the outcome it was not the last attempt

Is my understanding correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Only a handled outcome that is not the last outcome should throw, since this is the only scenario where more work remains to be done by the policy. Under no circumstances should a handled outcome that is not the last outcome be returned (as is currently possible).

@@ -53,6 +53,11 @@ protected internal override async ValueTask<Outcome<T>> ExecuteCore<TState>(Func
{
var startTimestamp = _timeProvider.GetTimestamp();
var outcome = await StrategyHelper.ExecuteCallbackSafeAsync(callback, context, state).ConfigureAwait(context.ContinueOnCapturedContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to be aware. If the outcome with disposable result is produced and then you are replacing it with exception, it might lead to a memory leak.

I think it might be good idea to encapsulate all this handling into ExecuteCallbackSafeAsync method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we would move the code inside the ExecuteCallbackSafeAsync then we don't have access to the isLastAttempt. @kmcclellan suggested to respect the cancellation request only under certain circumstances.

Please see the other comment section for more details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we would move the code inside the ExecuteCallbackSafeAsync then we don't have access to the isLastAttempt.

The ExecuteCallbackSafeAsync is internal, we could pass that information to the function. Wdyt?

Copy link
Contributor Author

@peter-csala peter-csala Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That function is used by almost all strategies, where the "retry" concept may or may not make any sense. Of course we can pass true as a default for isLastAttempt but it feels a bit awkward to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you still suggesting to add a new isLastAttempt parameter with default value? I'm fine with either approach, I just want to close this PR this year if possible 😁

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah sorry about this, got busy with other stuff. I would say meh, we can add it, it's internal detail anyway. At least this logic will be encapsulated there.

So a new default value with isLastAttempt = true. Retry strategy will provide its own value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I am the reporter of the bug. Hoping I can help move this forward to the right solution!

The concern about disposable outcomes being thrown away is not really relevant to this change. Abiding by the rules of cancellation, we should not replace an outcome unless we have more work to do (in which case it was probably an exception and won't have a result to dispose).

Don’t throw OperationCanceledException after you’ve completed the work, just because the token was signaled

It is only handled outcomes that are not the last outcome which could have a disposable result thrown away. This is true whether we attempt another execution or acknowledge cancellation. It's fair to say that retry policies currently don't support handling disposable outcomes.

I'm not even sure what ExecuteCallbackSafeAsync would supposedly do with isLastAttempt. In order to dispose, it would also need to know that the strategy would handle the outcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strategy requires the operation's outcome to decide whether to perform a retry attempt or not.

If we want to respect the cancellation request after the strategy has decided to handle the outcome or not and whether this was a last attempt or not then we have the following situation:

  • ShouldHandle's args receives the outcome of the operation
  • Then we might replace the outcome with an OCE depending on the conditions

That means inside the ShouldHandle delegate users would see the operation's outcome but inside the telemetry and at the Execute{Async}'s result they might see an OCE. This also feels a bit inconsistent behavior for me.

@peter-csala
Copy link
Contributor Author

It seems like the conversation get stuck. If we can't find a solution which is acceptable for everyone then I will close this PR at the end of this week and let others give it a try to tackle the issue

@peter-csala
Copy link
Contributor Author

The discussion got stuck so, I'm closing this PR to let others give a try.

@martincostello martincostello removed this from the v8.5.1 milestone Jan 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants