Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Retry Logic to Only Retry on Server-Side HTTP Errors #1390

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

VishalGawade1
Copy link

Changes Implemented

Fixes #861

Selective Retrying in osv.go:

Before: The retry logic did not differentiate between server-side and client-side HTTP errors, potentially leading to unnecessary retries on HTTP 4xx responses.
After: Updated the retry mechanism to only retry when the response status code is in the 500 range (HTTP 5xx). This prevents the system from retrying requests that are likely to fail due to client-side issues, thereby optimizing performance and reducing redundant network calls.

osv_test.go:
Verified that the updated retry logic correctly differentiates between HTTP 5xx and HTTP 4xx responses.
Ensured that retries are only attempted for HTTP 5xx errors by running and passing the TestRetryOn5xx test case.

image

Copy link

google-cla bot commented Nov 9, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Collaborator

@G-Rath G-Rath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've not reviewed it in full as I'm on mobile, but first we should refactor the tests to not use a third party assertion library

"testing"
"time"

"github.com/stretchr/testify/assert"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the std libs for asserting rather than pull in a new package

Copy link
Author

@VishalGawade1 VishalGawade1 Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've switched to standard library assertions for simplicity. Let me know if there’s anything more to address. Thanks!

Copy link
Collaborator

@G-Rath G-Rath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a good start - can you also make sure you've run scripts/run_lints.sh and address anything it brings up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this needs to be reverted

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the scripts/run_lints.sh script, and the only issues flagged were spacing warnings outside of the code I wrote. Let me know if you'd like me to address those too.

func makeRetryRequest(action func() (*http.Response, error)) (*http.Response, error) {
var resp *http.Response
var err error

for i := range maxRetryAttempts {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: I think you've changed this loop more than you need to

  • for i := range maxRetryAttempts should be equivalent to what you've got currently
  • you've removed the random jitter, which is intentionally present to smooth the back-off

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you were right. Sorry for removing Jitter in the function. Added back!

pkg/osv/osv.go Outdated
@@ -158,7 +158,7 @@ func chunkBy[T any](items []T, chunkSize int) [][]T {

// checkResponseError checks if the response has an error.
func checkResponseError(resp *http.Response) error {
if resp.StatusCode == http.StatusOK {
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I think we shouldn't change this line, since it's not needed for addressing #861

even though 2xx are success codes, we're only expecting the API currently to return a 200 and since this change isn't strictly needed for the rest of the change to work, I would lean towards not changing it for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been cargo-culting the HTTP retry logic from the examples in https://github.com/sethvargo/go-retry/blob/main/retry_test.go, for what it's worth, which treats 5xx errors as retryable, only. That said, I also recently saw https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429 in the wild, which could be retried with an exponential backoff approach.

Comment on lines 19 to 22
// Override the QueryEndpoint for testing
originalQueryEndpoint := QueryEndpoint
QueryEndpoint = server.URL
defer func() { QueryEndpoint = originalQueryEndpoint }()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: there's no reason to do this, because QueryEndpoint isn't used directly in the function you're testing

(that'll then mean you can make this test a parallel one)

"time"
)

func TestRetryOn5xx(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: we should be testing for a couple of different status codes and orders

I'd recommend refactoring this into a table-based test, and we should have cases for at least:

  • 200
  • 4xx
  • 5xx
  • 5xx, then a 2xx


log.Printf("TestRetryOn5xx: resp = %v, err = %v", resp, err)

// Assertion: resp should be nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these comments feel pretty redundant

@VishalGawade1
Copy link
Author

Hi @G-Rath,
While running run_lints.sh, I found couple of issues, so I made some updates to the makeRetryRequest function and adjusted the osv_test.go file based on your feedback. Now, the script runs without any issues on my end, aside from a few spacing warnings in other functions. Just let me know if you’d like me to fix those up too.

@G-Rath
Copy link
Collaborator

G-Rath commented Nov 10, 2024

aside from a few spacing warnings in other functions

would you mind posting a couple of those here? as there shouldn't be any output, and my local doesn't give any

@VishalGawade1
Copy link
Author

I'm not sure if this is what you are doing but If you're trying to run the script in PowerShell, it likely won’t work and will just return a blank output. Try using Git Bash instead.
here is the command from script:

go install github.com/golangci/golangci-lint/cmd/[email protected]
golangci-lint run ./... --max-same-issues 0

and here are the warnings:
pkg\osv\osv.go:160:6: func checkResponseError is unused (unused)
pkg\osv\osv.go:322:28: G404: Use of weak random number generator (math/rand or math/rand/v2 instead of crypto/rand)
pkg\osv\osv_test.go:53:3: The copy of the 'for' variable "tc" can be deleted (Go 1.22+) (copyloopvar)
pkg\spdx\verify.go:1: File is not gofmt-ed with -s (gofmt)

I'm fairly certain that running go fmt should resolve these warnings. I'll go ahead and fix the warnings related to osv_test.go. Let me know if you'd like me to address the other warnings as well.

@G-Rath
Copy link
Collaborator

G-Rath commented Nov 11, 2024

@VishalGawade1 only the last one is about spacing, the others are legitimate issues to be addressed in your pull request, along with other changes from my review that you're yet to make such as restoring the for i := range loop.

The pkg\spdx\verify.go warning is interesting - I'm happy if you want to just apply that and push it up so we can take a look at what actually gets changed, even though it won't belong in this PR

If you're trying to run the script in PowerShell

I'm not since the script is a shell script not a powershell script 😅

edit: right I misread - yeah I develop in WSL so it's all going through linux; by "blank output" I mean it will give you this:

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 2s
❯ scripts/run_lints.sh
+ go run github.com/golangci/golangci-lint/cmd/[email protected] run ./... --max-same-issues 0

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 10s
❯

@VishalGawade1
Copy link
Author

@VishalGawade1 only the last one is about spacing, the others are legitimate issues to be addressed in your pull request, along with other changes from my review that you're yet to make such as restoring the for i := range loop.

The pkg\spdx\verify.go warning is interesting - I'm happy if you want to just apply that and push it up so we can take a look at what actually gets changed, even though it won't belong in this PR

If you're trying to run the script in PowerShell

I'm not since the script is a shell script not a powershell script 😅

@VishalGawade1 only the last one is about spacing, the others are legitimate issues to be addressed in your pull request, along with other changes from my review that you're yet to make such as restoring the for i := range loop.

The pkg\spdx\verify.go warning is interesting - I'm happy if you want to just apply that and push it up so we can take a look at what actually gets changed, even though it won't belong in this PR

If you're trying to run the script in PowerShell

I'm not since the script is a shell script not a powershell script 😅

edit: right I misread - yeah I develop in WSL so it's all going through linux; by "blank output" I mean it will give you this:

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 2s
❯ scripts/run_lints.sh
+ go run github.com/golangci/golangci-lint/cmd/[email protected] run ./... --max-same-issues 0

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 10s
❯

yeah, I assumed just to be safe 😄
Sure, I will take look at other warnings and will update you.

@VishalGawade1
Copy link
Author

@VishalGawade1 only the last one is about spacing, the others are legitimate issues to be addressed in your pull request, along with other changes from my review that you're yet to make such as restoring the for i := range loop.
The pkg\spdx\verify.go warning is interesting - I'm happy if you want to just apply that and push it up so we can take a look at what actually gets changed, even though it won't belong in this PR

If you're trying to run the script in PowerShell

I'm not since the script is a shell script not a powershell script 😅

@VishalGawade1 only the last one is about spacing, the others are legitimate issues to be addressed in your pull request, along with other changes from my review that you're yet to make such as restoring the for i := range loop.
The pkg\spdx\verify.go warning is interesting - I'm happy if you want to just apply that and push it up so we can take a look at what actually gets changed, even though it won't belong in this PR

If you're trying to run the script in PowerShell

I'm not since the script is a shell script not a powershell script 😅
edit: right I misread - yeah I develop in WSL so it's all going through linux; by "blank output" I mean it will give you this:

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 2s
❯ scripts/run_lints.sh
+ go run github.com/golangci/golangci-lint/cmd/[email protected] run ./... --max-same-issues 0

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 10s
❯

yeah, I assumed just to be safe 😄 Sure, I will take look at other warnings and will update you.

@G-Rath
I need your suggestion, what should I do with the checkResponse function? Should I just remove it? Is that a good Idea?

@G-Rath
Copy link
Collaborator

G-Rath commented Nov 11, 2024

I need your suggestion, what should I do with the checkResponse function? Should I just remove it? Is that a good Idea?

Generally speaking I think your current implementation is a bit ... lets say nil-y (I don't want to say "messy" as I wouldn't go that far, but.. there are a lot of != nil checks in there), and I would recommend you review that to see how you could clean it up.

I believe that's why the checkResponse function existed in the first place, and it's why you've added getStatusCode - ultimately, I think the end solution will probably benefit from one utility function that is likely a combo of both of those functions, and that's why I recommend you review the whole implementation of the loop.

(that is to say, technically "yes it should go because it's not being used anymore", but I might as well frontfoot the introduction of a new similar function while we're at it 🙂)

@VishalGawade1
Copy link
Author

I need your suggestion, what should I do with the checkResponse function? Should I just remove it? Is that a good Idea?

Generally speaking I think your current implementation is a bit ... lets say nil-y (I don't want to say "messy" as I wouldn't go that far, but.. there are a lot of != nil checks in there), and I would recommend you review that to see how you could clean it up.

I believe that's why the checkResponse function existed in the first place, and it's why you've added getStatusCode - ultimately, I think the end solution will probably benefit from one utility function that is likely a combo of both of those functions, and that's why I recommend you review the whole implementation of the loop.

(that is to say, technically "yes it should go because it's not being removed anymore", but I might as well frontfoot the introduction of a new similar function while we're at it 🙂)

I totally understand, sorry about that, and I really appreciate your feedback. I'll work on making the code cleaner and more readable. I'll remove getStatusCode and checkResponse and incorporate error handling directly into makeRetryRequest. I’ll also make the test cases a bit more comprehensive and address the warnings. I'll update you soon!

@VishalGawade1
Copy link
Author

Hi @G-Rath ,
I’ve removed both checkResponse and getStatusCode functions and consolidated error handling directly within makeRetryRequest. Additionally, there are no warnings when running run_lints. Let me know if this approach works for you. 😄

go.mod Outdated
@@ -26,6 +26,7 @@ require (
github.com/package-url/packageurl-go v0.1.3
github.com/pandatix/go-cvss v0.6.2
github.com/spdx/tools-golang v0.5.5
github.com/stretchr/testify v1.9.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still necessary here? It looks like this is no longer being used.

pkg/osv/osv.go Outdated
@@ -16,7 +16,7 @@ import (
"golang.org/x/sync/errgroup"
)

const (
var (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed these from const to var so previously I could adjust the endpoint, I will revert back to const

pkg/osv/osv.go Outdated
func makeRetryRequest(action func() (*http.Response, error)) (*http.Response, error) {
var resp *http.Response
var err error
const maxRetryAttempts = 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two variables already exist at the top const section

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I kept changing the variable so i wanted to use it locally. I will revert these changes as well

pkg/osv/osv.go Outdated
break
}
backoff := time.Duration(i*i) * time.Second
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear why jitter is changed here, the new implementation requires duplicating the jitter code? Can you revert this to the previous implementation

…ormat, and restored jitter implementation to avoid code duplication in makeRetryRequest function
@VishalGawade1
Copy link
Author

VishalGawade1 commented Nov 11, 2024

Please review the changes. Due to the merge conflict, some modifications were added back during the restore in previous commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve retry logic to only retry for appropriate HTTP error codes
4 participants