uploader: Improve upload retry policy #61
Conversation
Force-pushed from 4c56dc9 to 4a43662
Force-pushed from 4f56725 to d509819
Force-pushed from c37cc5a to b4c376a
Have 2 tiers of retries:
- a shorter cycle when trying to upload to primary or backup storage, where we retry a couple of times for up to 1m
- a longer cycle wrapping those 2, to sustain a potentially long-running crisis; we wait longer between each retry and keep trying for up to 1h

The first loop solves for transient errors when saving to primary or backup storage, while the second loop solves for longer-running incidents, for which it's probably better to keep trying for a long time instead of dropping the process and losing the recording.
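Not the PR's actual code, but a minimal sketch of how the two tiers could be wired together with github.com/cenkalti/backoff/v4 (which the diff below appears to use, given the backoff.BackOff return type). The helper expBackOff, the writePrimary/writeBackup callbacks, and the exact intervals are illustrative assumptions:

```go
package uploader

import (
	"fmt"
	"time"

	"github.com/cenkalti/backoff/v4"
)

// expBackOff builds an exponential backoff capped at a total elapsed time.
// Hypothetical helper name; the repo uses its own constructor.
func expBackOff(initial, maxInterval, maxElapsed time.Duration) backoff.BackOff {
	b := backoff.NewExponentialBackOff()
	b.InitialInterval = initial
	b.MaxInterval = maxInterval
	b.MaxElapsedTime = maxElapsed
	return b
}

// uploadWithRetries shows the two tiers: a short inner cycle per storage
// target, wrapped by an outer cycle that keeps trying for up to an hour.
func uploadWithRetries(writePrimary, writeBackup func() error) error {
	outer := expBackOff(1*time.Minute, 10*time.Minute, 1*time.Hour)
	return backoff.Retry(func() error {
		// Inner tier: a few quick attempts against primary storage.
		if err := backoff.Retry(writePrimary, expBackOff(5*time.Second, 10*time.Second, 30*time.Second)); err == nil {
			return nil
		}
		// Primary failed; fall back to backup storage with its own short cycle.
		if err := backoff.Retry(writeBackup, expBackOff(5*time.Second, 10*time.Second, 30*time.Second)); err != nil {
			return fmt.Errorf("primary and backup upload failed: %w", err)
		}
		return nil
	}, outer)
}
```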
Force-pushed from ce8d72c to 8e1577c
core/uploader.go
}

func SingleRequestRetryBackoff() backoff.BackOff {
	return newExponentialBackOffExecutor(5*time.Second, 10*time.Second, 30*time.Second)
}

const segmentWriteTimeout = 5 * time.Minute
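newExponentialBackOffExecutor itself isn't shown in this hunk; a plausible shape for it, inferred from the call site above and assuming github.com/cenkalti/backoff/v4 (a sketch, not the repo's real implementation):

```go
package core

import (
	"time"

	"github.com/cenkalti/backoff/v4"
)

// Plausible body for the helper called above; inferred, not copied from the repo.
func newExponentialBackOffExecutor(initial, maxInterval, maxElapsed time.Duration) *backoff.ExponentialBackOff {
	b := backoff.NewExponentialBackOff()
	b.InitialInterval = initial
	b.MaxInterval = maxInterval
	b.MaxElapsedTime = maxElapsed // stop retrying once this much time has passed in total
	return b
}
```

A caller would then pass the result to backoff.Retry, e.g. backoff.Retry(uploadOnce, SingleRequestRetryBackoff()).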
Reminder that we need to lower this value.
Should take this into consideration. Looks like Storj writes are really slow.
https://eu-metrics-monitoring.livepeer.live/grafana/d/JZNaFMv4z/vod-monitoring?orgId=1&refresh=5s&from=now-24h&to=now&viewPanel=7
I've added a timetaken field to the "succeeded" log line so that we can get an idea of how long these uploads are taking. I'll then lower it in a further PR, does that sound ok?
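Roughly what that looks like, as a sketch using the standard library logger (the repo's actual logging call, field names, and wrapper function are assumptions here):

```go
package core

import (
	"context"
	"log"
	"time"
)

// uploadAndLog wraps a hypothetical single-upload call and records how long it took.
func uploadAndLog(ctx context.Context, upload func(context.Context) error, uri string) error {
	start := time.Now()
	if err := upload(ctx); err != nil {
		return err
	}
	// The "succeeded" log line, now with a time-taken field.
	log.Printf("Uploaded segment uri=%s timetaken=%s", uri, time.Since(start))
	return nil
}
```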
WDYT of making this an env var config, like the backup storage? So at least we can tune it better without a new code change
ah good idea 👍
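A sketch of the env-var idea, with a made-up variable name (SEGMENT_WRITE_TIMEOUT and the helper are illustrative; the actual name and default would be decided in the follow-up):

```go
package core

import (
	"os"
	"time"
)

// segmentWriteTimeoutFromEnv returns the segment write timeout, overridable via
// an environment variable. The variable name is hypothetical.
func segmentWriteTimeoutFromEnv() time.Duration {
	const defaultTimeout = 5 * time.Minute
	if v := os.Getenv("SEGMENT_WRITE_TIMEOUT"); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return defaultTimeout
}
```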
@victorges I've just tweaked the outer retries down a bit, does that look ok to you?
LGTM!
Implement 2 tiers of retries:
- a shorter cycle, when trying to upload to primary or backup storage, where we retry a couple of times for up to 30s
- a longer cycle wrapping those 2, to sustain a potentially long-running crisis; we wait longer between each retry and keep trying for up to 1h

The first loop solves for transient errors when saving to primary or backup storage, while the second loop solves for longer-running incidents, for which it's probably better to keep trying for a long time instead of dropping the process and losing the segment.
The inner retry loop also avoids waiting too long before falling back to the backup storage. We need to avoid waiting too long, otherwise the recordings could start to get processed while segments were still being uploaded to storage.
In order to be able to hold the upload attempt for so long, I also changed the logic to save the segment to a temporary local file instead of keeping it in memory. This keeps memory usage low in case we have a long-running incident.
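A minimal sketch of that temporary-file buffering, assuming os.CreateTemp; the helper name and cleanup strategy are illustrative, not the PR's actual code:

```go
package uploader

import (
	"io"
	"os"
)

// bufferToTempFile copies the incoming segment to a temp file on disk so that
// long retry cycles don't have to hold the whole segment in memory. It returns
// the file positioned at the start, ready to be re-read on each attempt.
func bufferToTempFile(input io.Reader) (*os.File, error) {
	f, err := os.CreateTemp(os.TempDir(), "upload-*")
	if err != nil {
		return nil, err
	}
	if _, err := io.Copy(f, input); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, err
	}
	// Rewind so each upload attempt can read the segment from the beginning.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, err
	}
	return f, nil
}
```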