[Internal] Per Partition Automatic Failover: Fixes Gateway 503 Cold Start Issue #4073

Merged
Changes from 3 commits
37 commits
53ef5f3
Code changes to add retry logic for GW returned 503.9002.
kundadebdatta Sep 7, 2023
ef78559
Revert "Code changes to add retry logic for GW returned 503.9002."
kundadebdatta Sep 7, 2023
fdffdd8
Code changes to clean up the PPAF retry logic fix.
kundadebdatta Sep 7, 2023
9cd3d97
Code changes to add retry logic for GW returned 503.9002.
kundadebdatta Sep 7, 2023
9165885
Revert "Code changes to add retry logic for GW returned 503.9002."
kundadebdatta Sep 7, 2023
9d61eae
Code changes to clean up the PPAF retry logic fix.
kundadebdatta Sep 7, 2023
d05bc19
Code changes to revert location cache changes.
kundadebdatta Sep 8, 2023
4c68e31
Merge branch 'users/kundadebdatta/fix_gateway_503_cold_start_issue' o…
kundadebdatta Sep 8, 2023
cea2b17
Code changes to revert location cache changes.
kundadebdatta Sep 8, 2023
5355d42
Merge branch 'master' into users/kundadebdatta/fix_gateway_503_cold_s…
kundadebdatta Sep 8, 2023
e23c50c
Code changes to fix some of the failing tests.
kundadebdatta Sep 15, 2023
12d1193
Code changes to fix unit tests.
kundadebdatta Sep 18, 2023
757ca01
Code changes to add unit tests for client options.
kundadebdatta Sep 19, 2023
20b3d21
Code changes to draft docs for PPAF design approach.
kundadebdatta Sep 20, 2023
5f86260
Code changes to add SDK side design docs for PPAF.
kundadebdatta Sep 26, 2023
f740c9d
Code changes to modify the PPAF design.
kundadebdatta Sep 26, 2023
237acc0
Code changes to fix unit test.
kundadebdatta Sep 27, 2023
3dd5678
Code changes to rename test name.
kundadebdatta Sep 27, 2023
045f337
Merge branch 'master' into users/kundadebdatta/fix_gateway_503_cold_s…
kundadebdatta Sep 27, 2023
0d59eb8
Code changes to add some cosmetic changes.
kundadebdatta Sep 27, 2023
07c644a
Code changes to enable retry on write for all regions in single maste…
kundadebdatta Oct 9, 2023
48a1348
Code changes to add code comments.
kundadebdatta Oct 11, 2023
232a950
Code changes to clean up and handle endpoints in location cache.
kundadebdatta Oct 12, 2023
c32809b
Code changes to fix unit tests. Added detailed code comments.
kundadebdatta Oct 12, 2023
3e2aba2
Code changes to clean up the account read endpoints generation logic.
kundadebdatta Oct 13, 2023
466545c
Code changes to fix unit tests.
kundadebdatta Oct 16, 2023
b8b8b0b
Merge branch 'master' into users/kundadebdatta/fix_gateway_503_cold_s…
kundadebdatta Oct 16, 2023
aabbe5e
Code changes to disable retry when ppaf is not enabled. Also validate…
kundadebdatta Oct 20, 2023
a7c17fe
Code changes to fix unit tests.
kundadebdatta Oct 23, 2023
9215f0c
Code changes to update md file.
kundadebdatta Oct 23, 2023
7760f22
Merge branch 'master' into users/kundadebdatta/fix_gateway_503_cold_s…
kundadebdatta Oct 23, 2023
813ea2d
Code changes to remove cache expiry check for account read endpoints.
kundadebdatta Oct 23, 2023
3cb1053
Code changes to fix unit test.
kundadebdatta Oct 24, 2023
8f782fc
Code changes to fix more tests.
kundadebdatta Oct 24, 2023
1e94fc1
Code changes to address review comments.
kundadebdatta Oct 24, 2023
4180263
Code changes to fix verbiage in design document.
kundadebdatta Oct 26, 2023
c6ee981
Merge branch 'master' into users/kundadebdatta/fix_gateway_503_cold_s…
kundadebdatta Oct 26, 2023
18 changes: 2 additions & 16 deletions Microsoft.Azure.Cosmos/src/ClientRetryPolicy.cs
@@ -247,8 +247,7 @@ private async Task<ShouldRetryResult> ShouldRetryInternalAsync(
}

// Received 503 due to client connect timeout or Gateway
- if (statusCode == HttpStatusCode.ServiceUnavailable
- && ClientRetryPolicy.IsRetriableServiceUnavailable(subStatusCode))
+ if (statusCode == HttpStatusCode.ServiceUnavailable)
{
DefaultTrace.TraceWarning("ClientRetryPolicy: ServiceUnavailable. Refresh cache and retry. Failed Location: {0}; ResourceAddress: {1}",
this.documentServiceRequest?.RequestContext?.LocationEndpointToRoute?.ToString() ?? string.Empty,
@@ -265,12 +264,6 @@ private async Task<ShouldRetryResult> ShouldRetryInternalAsync(
return null;
}

- private static bool IsRetriableServiceUnavailable(SubStatusCodes? subStatusCode)
- {
- return subStatusCode == SubStatusCodes.Unknown ||
- (subStatusCode.HasValue && subStatusCode.Value.IsSDKGeneratedSubStatus());
- }
-
private async Task<ShouldRetryResult> ShouldRetryOnEndpointFailureAsync(
bool isReadRequest,
bool markBothReadAndWriteAsUnavailable,
@@ -390,7 +383,7 @@ private ShouldRetryResult ShouldRetryOnSessionNotAvailable()

/// <summary>
/// For a ServiceUnavailable (503.0) we could be having a timeout from Direct/TCP locally or a request to Gateway request with a similar response due to an endpoint not yet available.
- /// We try and retry the request only if there are other regions available.
+ /// We try and retry the request only if there are other regions available. The retry logic is applicable for single master write accounts as well.
/// </summary>
private ShouldRetryResult ShouldRetryOnServiceUnavailable()
{
@@ -400,13 +393,6 @@ private ShouldRetryResult ShouldRetryOnServiceUnavailable()
return ShouldRetryResult.NoRetry();
}

- if (!this.canUseMultipleWriteLocations
- && !this.isReadRequest)
- {
- // Write requests on single master cannot be retried, no other regions available
- return ShouldRetryResult.NoRetry();
- }
-
int availablePreferredLocations = this.globalEndpointManager.PreferredLocationCount;

if (availablePreferredLocations <= 1)
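
For readers skimming this hunk, the effect of dropping the single-master write guard can be sketched as a before/after predicate. This is a simplified, hypothetical model rather than the SDK code: the class and method names are invented for illustration, the other checks in `ShouldRetryOnServiceUnavailable` are left out, and it assumes the truncated `availablePreferredLocations <= 1` branch ends in `NoRetry`, as the doc comment ("only if there are other regions available") suggests.

```csharp
using System;

// Hypothetical, simplified model of the 503 retry decision.
// None of these names exist in the SDK; they only mirror the diff above.
public static class ServiceUnavailableRetrySketch
{
    // Before the change: writes on single-master accounts were never retried.
    public static bool ShouldRetryBefore(
        bool isReadRequest,
        bool canUseMultipleWriteLocations,
        int preferredLocationCount)
    {
        if (!canUseMultipleWriteLocations && !isReadRequest)
        {
            // The guard that this change removes.
            return false;
        }

        // Assumption: with one or zero preferred locations there is nowhere to retry.
        return preferredLocationCount > 1;
    }

    // After the change: only the number of preferred locations matters.
    // The first two parameters are kept only so both signatures match.
    public static bool ShouldRetryAfter(
        bool isReadRequest,
        bool canUseMultipleWriteLocations,
        int preferredLocationCount)
    {
        // Assumption: same location-count rule as above.
        return preferredLocationCount > 1;
    }
}
```

Under these assumptions, a write on a single-master account with two preferred locations was not retried before the change (`ShouldRetryBefore(false, false, 2)` is `false`) and is retried after it (`ShouldRetryAfter(false, false, 2)` is `true`).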
2 changes: 1 addition & 1 deletion Microsoft.Azure.Cosmos/src/Routing/LocationCache.cs
@@ -276,7 +276,7 @@ public Uri ResolveServiceEndpoint(DocumentServiceRequest request)
}
else
{
- ReadOnlyCollection<Uri> endpoints = request.OperationType.IsWriteOperation() ? this.WriteEndpoints : this.ReadEndpoints;
+ ReadOnlyCollection<Uri> endpoints = this.ReadEndpoints;
locationEndpointToRoute = endpoints[locationIndex % endpoints.Count];
}
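
As this hunk reads, the fallback branch now indexes into the read endpoints regardless of operation type, wrapping around with a modulo. A minimal sketch of that selection, assuming a non-negative `locationIndex` and a non-empty endpoint list; `EndpointSelectionSketch` and `SelectEndpoint` are hypothetical names that do not exist in the SDK:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring `endpoints[locationIndex % endpoints.Count]`.
public static class EndpointSelectionSketch
{
    public static Uri SelectEndpoint(IReadOnlyList<Uri> readEndpoints, int locationIndex)
    {
        if (readEndpoints == null || readEndpoints.Count == 0)
        {
            throw new ArgumentException("At least one endpoint is required.", nameof(readEndpoints));
        }

        if (locationIndex < 0)
        {
            // In C#, % keeps the sign of the dividend, so a negative index
            // would produce a negative (invalid) list position.
            throw new ArgumentOutOfRangeException(nameof(locationIndex));
        }

        // Wrap around so successive retry indexes cycle through the endpoints.
        return readEndpoints[locationIndex % readEndpoints.Count];
    }
}
```

With two endpoints in the list, indexes 0 and 2 resolve to the first endpoint and indexes 1 and 3 to the second, so each retry that increments the index moves to the next region and eventually wraps.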
