
feat: Add grace period to uptime check #167

Merged: 1 commit merged on Apr 30, 2024

@Redm4x commented Apr 23, 2024

Improvements to the provider uptime tracking system #152

  • Added a grace period before a provider is treated as completely offline. This affects the following places:

    • /internal/gpu-prices endpoint used by the GPU Pricing page
    • /internal/gpu & /internal/provider-versions endpoints used for internal tracking
    • Network capacity tiles + historical graph (shown on https://stats.akash.network/)
    • Provider detail page in Cloudmos / Console

    The default grace period is 3h; it can be changed with the ProviderUptimeGracePeriodMinutes environment variable in the API.

  • When a provider fails an uptime check, retry every minute and gradually increase the interval according to the following table:

| Downtime duration | Retry frequency |
|-------------------|-----------------|
| < 15m             | 1m              |
| 15m - 1h          | 5m              |
| 1h - 6h           | 15m             |
| 6h - 24h          | 30m             |
| 24h - 7d          | 1h              |
| 7d+               | 24h             |
  • Merge uptime checks that occur in the same 15m window (during retries) into a single bar so that every bar represents the same interval. An orange bar indicates that a provider was both up and down within the same 15m window.
  • Uptime percentages (1d, 7d, 30d) now take into account the interval between checks. If a provider fails a check but the retry succeeds 1m later, only 1 minute of downtime is counted.
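The retry schedule and grace period described above can be sketched as a few small helpers. This is a hypothetical illustration, not the code merged in this PR: the names getRetryIntervalMinutes, getGracePeriodMinutes and isConsideredOffline are assumptions, the environment is passed in as a plain object instead of being read from process.env, and treating each table boundary as an exclusive upper bound is an assumption the table does not settle.

```typescript
const MS_PER_MINUTE = 60 * 1000;
const MS_PER_HOUR = 60 * MS_PER_MINUTE;
const MS_PER_DAY = 24 * MS_PER_HOUR;

// [downtime upper bound in ms (exclusive, assumed), retry interval in minutes], from the table above.
const RETRY_SCHEDULE: Array<[number, number]> = [
  [15 * MS_PER_MINUTE, 1],
  [MS_PER_HOUR, 5],
  [6 * MS_PER_HOUR, 15],
  [MS_PER_DAY, 30],
  [7 * MS_PER_DAY, 60],
];

/** Hypothetical helper: retry interval (minutes) for a provider that has been down for downtimeMs. */
function getRetryIntervalMinutes(downtimeMs: number): number {
  for (const [upperBoundMs, intervalMinutes] of RETRY_SCHEDULE) {
    if (downtimeMs < upperBoundMs) return intervalMinutes;
  }
  return 24 * 60; // 7d+: retry once a day
}

// Default grace period of 3h, overridable through ProviderUptimeGracePeriodMinutes.
const DEFAULT_GRACE_PERIOD_MINUTES = 180;

/** Hypothetical helper: reads the grace period, falling back to the default on missing or invalid values. */
function getGracePeriodMinutes(env: Record<string, string | undefined>): number {
  const raw = env["ProviderUptimeGracePeriodMinutes"];
  const parsed = raw ? Number(raw) : NaN; // empty or missing string falls back to the default
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_GRACE_PERIOD_MINUTES;
}

/** A provider is treated as offline only once its last successful check is older than the grace period. */
function isConsideredOffline(lastSuccessfulCheck: Date, now: Date, gracePeriodMinutes: number): boolean {
  return now.getTime() - lastSuccessfulCheck.getTime() > gracePeriodMinutes * MS_PER_MINUTE;
}
```

For example, under these assumptions a provider whose last successful check was 2h ago is retried every 15m and, with the default 3h grace period, is not yet treated as completely offline.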

@Redm4x marked this pull request as ready for review April 23, 2024 15:51
function groupUptimeChecksByPeriod(uptimeChecks: { isOnline: boolean; checkDate: string }[] = []) {
const groupedSnapshots: { checkDate: Date; checks: boolean[] }[] = [];

uptimeChecks.toSorted((a, b) => new Date(a.checkDate).getTime() - new Date(b.checkDate).getTime());
Contributor

Didn't know about this method :) After reading the docs and testing it, it seems it doesn't mutate the original array, so this usage might not be intended.

Contributor Author

Good catch. I replaced sort with toSorted to make use of the new immutable array methods, which I prefer, but forgot to assign the result to a new variable. 😄 There are some other cool ones like toSpliced and toReversed.

Contributor

nice, I like that too!
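For reference, a minimal sketch of how the grouping could look once the sorted copy is assigned to a variable, as discussed in this thread. Only the input shape and the function name come from the snippet above; the epoch-aligned 15m bucketing, the status field and the "partial" value (standing in for the orange bar) are assumptions for illustration. It uses [...uptimeChecks].sort(...), which, like toSorted, leaves the input array untouched and also compiles for targets older than ES2023.

```typescript
type UptimeCheck = { isOnline: boolean; checkDate: string };
type UptimePeriod = { checkDate: Date; checks: boolean[]; status: "online" | "offline" | "partial" };

const WINDOW_MS = 15 * 60 * 1000; // each bar covers one 15m window (assumed epoch-aligned)

function groupUptimeChecksByPeriod(uptimeChecks: UptimeCheck[] = []): UptimePeriod[] {
  // Non-mutating sort: same effect as uptimeChecks.toSorted(...), and the result must be kept.
  const sortedChecks = [...uptimeChecks].sort(
    (a, b) => new Date(a.checkDate).getTime() - new Date(b.checkDate).getTime()
  );

  const groupedSnapshots: { checkDate: Date; checks: boolean[] }[] = [];
  for (const check of sortedChecks) {
    const time = new Date(check.checkDate).getTime();
    const windowStart = new Date(time - (time % WINDOW_MS)); // floor to the 15m boundary
    const last = groupedSnapshots[groupedSnapshots.length - 1];
    if (last && last.checkDate.getTime() === windowStart.getTime()) {
      last.checks.push(check.isOnline); // retry in the same window: merge into the same bar
    } else {
      groupedSnapshots.push({ checkDate: windowStart, checks: [check.isOnline] });
    }
  }

  // A window with both successful and failed checks becomes "partial" (the orange bar).
  return groupedSnapshots.map((snapshot): UptimePeriod => ({
    ...snapshot,
    status: snapshot.checks.every(c => c) ? "online" : snapshot.checks.some(c => c) ? "partial" : "offline",
  }));
}
```

With checks at 10:00 (up), 10:16 (down) and 10:17 (up), this sketch yields two bars: an "online" one for 10:00 and a "partial" one for 10:15, and the caller's array keeps its original order.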

- const uptime1d = provider.total1d > 0 ? provider.online1d / provider.total1d : 0;
- const uptime7d = provider.total7d > 0 ? provider.online7d / provider.total7d : 0;
- const uptime30d = provider.total30d > 0 ? provider.online30d / provider.total30d : 0;
+ const uptime1d = Math.max(0, 1 - offline_seconds_1d / (24 * 60 * 60));
ygrishajev (Contributor), Apr 23, 2024

I don't mind these numbers, just dropping this here: https://www.npmjs.com/package/time-constants. I used to use it; the lib is quite simple but improves readability quite a bit.

Contributor Author

Good idea, I found out that date-fns has similar constants so I used those instead. Way more readable 👍
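A hedged sketch of the interval-based uptime calculation described in the PR: each failed check counts as downtime until the next check, and the 1d uptime is then derived with the same formula as the diff above, 1 - offline_seconds_1d / seconds in a day. The helper names computeOfflineSeconds and computeUptime1d, the TimedCheck type, and the choice to count the latest failed check up to now are all assumptions. SECONDS_IN_DAY is defined locally so the snippet has no dependencies; date-fns exports an equivalent secondsInDay constant.

```typescript
// Local stand-in for date-fns's secondsInDay, so the sketch runs without dependencies.
const SECONDS_IN_DAY = 24 * 60 * 60;

type TimedCheck = { isOnline: boolean; checkDate: Date };

/**
 * Hypothetical helper: sums downtime inside [windowStart, now]. A failed check is counted
 * as offline from its own timestamp until the next check (or until now for the latest one).
 */
function computeOfflineSeconds(checks: TimedCheck[], windowStart: Date, now: Date): number {
  const sorted = [...checks].sort((a, b) => a.checkDate.getTime() - b.checkDate.getTime());
  let offlineMs = 0;
  for (let i = 0; i < sorted.length; i++) {
    if (sorted[i].isOnline) continue;
    const start = Math.max(sorted[i].checkDate.getTime(), windowStart.getTime());
    const end = i + 1 < sorted.length ? sorted[i + 1].checkDate.getTime() : now.getTime();
    offlineMs += Math.max(0, Math.min(end, now.getTime()) - start); // clip to the window
  }
  return offlineMs / 1000;
}

function computeUptime1d(checks: TimedCheck[], now: Date): number {
  const windowStart = new Date(now.getTime() - SECONDS_IN_DAY * 1000);
  const offlineSeconds1d = computeOfflineSeconds(checks, windowStart, now);
  // Same formula as the diff, clamped at 0.
  return Math.max(0, 1 - offlineSeconds1d / SECONDS_IN_DAY);
}
```

Under these assumptions, a provider that fails a check at 11:30 and passes the retry at 11:31 accumulates 60 seconds of downtime, so its 1d uptime is 1 - 60 / 86400 rather than being penalized for a full check interval.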

}));
}

const uptimePeriods = groupUptimeChecksByPeriod(provider?.uptime || []);
Contributor

I think this could be extracted into a hook using useMemo, since it runs on every render and does a fair amount of computation.

@Redm4x force-pushed the refactor/improve-provider-uptime-tracking branch from ac7c09d to eafb3ec on April 25, 2024 17:35
@Redm4x force-pushed the refactor/improve-provider-uptime-tracking branch from eafb3ec to 2758a90 on April 29, 2024 13:41
@Redm4x merged commit 8980154 into main on Apr 30, 2024
5 checks passed
@Redm4x deleted the refactor/improve-provider-uptime-tracking branch on April 30, 2024 06:40
3 participants