Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Retry Logic to Only Retry on Server-Side HTTP Errors #1390

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
3 changes: 3 additions & 0 deletions go.mod
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this needs to be reverted

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the scripts/run_lints.sh script, and the only issues flagged were spacing warnings outside of the code I wrote. Let me know if you'd like me to address those too.

Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ require (
github.com/package-url/packageurl-go v0.1.3
github.com/pandatix/go-cvss v0.6.2
github.com/spdx/tools-golang v0.5.5
github.com/stretchr/testify v1.9.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still necessary here? It looks like this is no longer being used.

github.com/tidwall/gjson v1.18.0
github.com/tidwall/pretty v1.2.1
github.com/tidwall/sjson v1.2.5
Expand Down Expand Up @@ -57,6 +58,7 @@ require (
github.com/containerd/stargz-snapshotter/estargz v0.15.1 // indirect
github.com/cpuguy83/go-md2man/v2 v2.0.5 // indirect
github.com/cyphar/filepath-securejoin v0.2.5 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/dlclark/regexp2 v1.11.0 // indirect
github.com/docker/distribution v2.8.3+incompatible // indirect
github.com/docker/docker-credential-helpers v0.8.1 // indirect
Expand Down Expand Up @@ -84,6 +86,7 @@ require (
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/image-spec v1.1.0-rc3 // indirect
github.com/pjbgf/sha1cd v0.3.0 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/rogpeppe/go-internal v1.12.0 // indirect
github.com/russross/blackfriday/v2 v2.1.0 // indirect
Expand Down
43 changes: 26 additions & 17 deletions pkg/osv/osv.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ import (
"encoding/json"
"fmt"
"io"
"math/rand"

"net/http"
"time"

Expand All @@ -16,7 +16,7 @@ import (
"golang.org/x/sync/errgroup"
)

const (
var (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed these from const to var so previously I could adjust the endpoint, I will revert back to const

// QueryEndpoint is the URL for posting queries to OSV.
QueryEndpoint = "https://api.osv.dev/v1/querybatch"
// GetEndpoint is the URL for getting vulnerabilities from OSV.
Expand Down Expand Up @@ -158,7 +158,7 @@ func chunkBy[T any](items []T, chunkSize int) [][]T {

// checkResponseError checks if the response has an error.
func checkResponseError(resp *http.Response) error {
if resp.StatusCode == http.StatusOK {
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I think we shouldn't change this line, since it's not needed for addressing #861

even though 2xx are success codes, we're only expecting the API currently to return a 200 and since this change isn't strictly needed for the rest of the change to work, I would lean towards not changing it for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been cargo-culting the HTTP retry logic from the examples in https://github.com/sethvargo/go-retry/blob/main/retry_test.go, for what it's worth, which treats 5xx errors as retryable, only. That said, I also recently saw https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429 in the wild, which could be retried with an exponential backoff approach.

return nil
}

Expand Down Expand Up @@ -306,26 +306,35 @@ func HydrateWithClient(resp *BatchedResponse, client *http.Client) (*HydratedBat
return &hydrated, nil
}

// makeRetryRequest will return an error on both network errors, and if the response is not 200
// makeRetryRequest will return an error on both network errors, and if the response is not 2xx
func makeRetryRequest(action func() (*http.Response, error)) (*http.Response, error) {
var resp *http.Response
var err error

for i := range maxRetryAttempts {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: I think you've changed this loop more than you need to

  • for i := range maxRetryAttempts should be equivalent to what you've got currently
  • you've removed the random jitter, which is intentionally present to smooth the back-off

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you were right. Sorry for removing Jitter in the function. Added back!

// rand is initialized with a random number (since go1.20), and is also safe to use concurrently
// we do not need to use a cryptographically secure random jitter, this is just to spread out the retry requests
// #nosec G404
jitterAmount := (rand.Float64() * float64(jitterMultiplier) * float64(i))
time.Sleep(time.Duration(i*i)*time.Second + time.Duration(jitterAmount*1000)*time.Millisecond)

for i := 0; i < maxRetryAttempts; i++ {
resp, err = action()
if err == nil {
// Check the response for HTTP errors
err = checkResponseError(resp)
if err == nil {
break
}
if err != nil {

sleepDuration := time.Duration(i*jitterMultiplier) * time.Second
time.Sleep(sleepDuration)
continue
}

if resp.StatusCode >= 500 {

resp.Body.Close()
sleepDuration := time.Duration(i*jitterMultiplier) * time.Second
time.Sleep(sleepDuration)
continue
}

// Success or client error, do not retry
break
}

if resp != nil && resp.StatusCode >= 500 {
resp.Body.Close()
return nil, fmt.Errorf("received %d status code after %d attempts", resp.StatusCode, maxRetryAttempts)
}

return resp, err
Expand Down
50 changes: 50 additions & 0 deletions pkg/osv/osv_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
package osv

import (
"log"
"net/http"
"net/http/httptest"
"testing"
"time"
)

func TestRetryOn5xx(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: we should be testing for a couple of different status codes and orders

I'd recommend refactoring this into a table-based test, and we should have cases for at least:

  • 200
  • 4xx
  • 5xx
  • 5xx, then a 2xx

attempt := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempt++
w.WriteHeader(http.StatusInternalServerError) // 500
}))
defer server.Close()

// Override the QueryEndpoint for testing
originalQueryEndpoint := QueryEndpoint
QueryEndpoint = server.URL
defer func() { QueryEndpoint = originalQueryEndpoint }()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: there's no reason to do this, because QueryEndpoint isn't used directly in the function you're testing

(that'll then mean you can make this test a parallel one)


client := &http.Client{
Timeout: 2 * time.Second,
}

resp, err := makeRetryRequest(func() (*http.Response, error) {
req, _ := http.NewRequest(http.MethodPost, QueryEndpoint, nil)
req.Header.Set("Content-Type", "application/json")
return client.Do(req)
})

log.Printf("TestRetryOn5xx: resp = %v, err = %v", resp, err)

// Assertion: resp should be nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these comments feel pretty redundant

if resp != nil {
t.Errorf("Expected response to be nil after retries on 5xx errors, but got: %v", resp)
}

// Assertion: err should not be nil
if err == nil {
t.Errorf("Expected an error after retries on 5xx errors, but got none")
}

// Assertion: number of attempts should equal maxRetryAttempts
if attempt != maxRetryAttempts {
t.Errorf("Expected number of attempts to equal maxRetryAttempts (%d), but got: %d", maxRetryAttempts, attempt)
}
}