Fix usage of the Metrics Timer. #135

Merged
merged 1 commit into from
Sep 12, 2024

maciejtrybilo
Contributor

@maciejtrybilo maciejtrybilo commented Sep 12, 2024

These changes are now available in 1.16.1

When using the Timer, aggregate the measurements by status and job name rather than by status and job id.

The job id is unique to each job run, so the current implementation creates a new summary for every run. That collects no useful metrics data and causes excessive memory usage.
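
To make the cardinality problem concrete, here is a minimal sketch of a registry that keeps one series per distinct label and dimension set. `SeriesKey`, `Registry`, the run ids, and the durations are all hypothetical; this models the behaviour described above and is not the swift-metrics or Queues API.

```swift
// Hypothetical model of a metrics backend: one series per (label, dimensions) pair.
// This is NOT the swift-metrics or Queues API, only an illustration of cardinality.
struct SeriesKey: Hashable {
    let label: String
    let dimensions: [String: String]
}

struct Registry {
    // Recorded durations (in seconds) for each distinct series.
    var series: [SeriesKey: [Double]] = [:]

    mutating func record(_ seconds: Double, label: String, dimensions: [String: String]) {
        let key = SeriesKey(label: label, dimensions: dimensions)
        series[key, default: []].append(seconds)
    }
}

// Three runs of the same job, each with its own made-up job id.
let runs: [(id: String, seconds: Double)] = [
    ("run-1", 0.0021), ("run-2", 0.0020), ("run-3", 0.0023),
]

var byJobID = Registry()
var byJobName = Registry()
for run in runs {
    // Keyed by job id: every run produces a brand-new series.
    byJobID.record(run.seconds, label: "jobDurationTimer",
                   dimensions: ["success": "true", "id": run.id])
    // Keyed by job name: every run lands in the same series.
    byJobName.record(run.seconds, label: "jobDurationTimer",
                     dimensions: ["success": "true", "jobName": "EmailJob"])
}

print(byJobID.series.count)   // one series per run
print(byJobName.series.count) // a single series holding every run
```

In this model the number of series keyed by job id grows with the number of runs, while the number keyed by job name stays bounded by the number of job types.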

Member

@ptoffy ptoffy left a comment

Thanks! I do think it might make sense to keep the job id in the dimensions. What do you say?

codecov bot commented Sep 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.76%. Comparing base (be4ac72) to head (0e88458).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #135      +/-   ##
==========================================
+ Coverage   84.48%   84.76%   +0.28%     
==========================================
  Files          22       22              
  Lines         709      709              
==========================================
+ Hits          599      601       +2     
+ Misses        110      108       -2     
Files with missing lines Coverage Δ
Sources/Queues/QueueWorker.swift 96.66% <100.00%> (ø)

... and 1 file with indirect coverage changes

@maciejtrybilo
Contributor Author

> Thanks! I do think it might make sense to keep the job id in the dimensions. What do you say?

The matching is on the dimensions, so that wouldn't fix the issue. Apart from the memory issue (which, I've just found out, affects only Prometheus 1, I think), if you use the job id in the dimensions, you're not aggregating anything. You're getting a separate summary made out of one data point for each run of a job.

Like this:

eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer_count 1
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer_sum 0.002279166
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer{success="true", quantile="0.01", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 0.002279166
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer{success="true", quantile="0.05", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 0.002279166
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer{success="true", quantile="0.5", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 0.002279166
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer{success="true", quantile="0.9", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 0.002279166
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer{success="true", quantile="0.95", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 0.002279166
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer{success="true", quantile="0.99", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 0.002279166
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer{success="true", quantile="0.999", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 0.002279166
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer_count{success="true", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 1
eeae9ddd_b279_4bb2_8d86_a993e1b3ae71_jobdurationtimer_sum{success="true", id="EEAE9DDD-B279-4BB2-8D86-A993E1B3AE71"} 0.002279166
# TYPE 17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer summary
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{quantile="0.01"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{quantile="0.05"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{quantile="0.5"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{quantile="0.9"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{quantile="0.95"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{quantile="0.99"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{quantile="0.999"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer_count 1
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer_sum 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{success="true", quantile="0.01", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{success="true", quantile="0.05", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{success="true", quantile="0.5", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{success="true", quantile="0.9", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{success="true", quantile="0.95", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{success="true", quantile="0.99", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer{success="true", quantile="0.999", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 0.002005
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer_count{success="true", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 1
17df1d6d_3c71_4db5_9bda_80cd8ecfb013_jobdurationtimer_sum{success="true", id="17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013"} 0.002005
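
The output above reports the same value for every quantile because each summary holds a single observation. The sketch below shows why, using a nearest-rank quantile over raw samples. That estimator is an assumption chosen for simplicity and may differ from what a given Prometheus client library uses; `quantile(_:of:)` is a hypothetical helper.

```swift
// Nearest-rank quantile over raw samples; a simplification chosen for illustration,
// not necessarily the estimator used by any Prometheus client library.
func quantile(_ q: Double, of samples: [Double]) -> Double? {
    guard !samples.isEmpty, (0.0...1.0).contains(q) else { return nil }
    let sorted = samples.sorted()
    // Nearest rank is ceil(q * n), clamped to 1...n, then converted to a 0-based index.
    let rank = Int((q * Double(sorted.count)).rounded(.up))
    let index = min(max(rank, 1), sorted.count) - 1
    return sorted[index]
}

let quantiles = [0.01, 0.05, 0.5, 0.9, 0.95, 0.99, 0.999]

// The one duration recorded for job id 17DF1D6D-3C71-4DB5-9BDA-80CD8ECFB013 above.
let singleRun = [0.002005]

for q in quantiles {
    // With a single sample, every quantile resolves to that sample.
    print(q, quantile(q, of: singleRun) ?? .nan)
}
```

With only one sample there is no distribution to describe, so a per-run summary carries no more information than the raw duration.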

@maciejtrybilo
Contributor Author

I think this is the kind of output that's useful?

emailjob_jobdurationtimer_count 1002
emailjob_jobdurationtimer_sum 0.233929109
emailjob_jobdurationtimer{success="true", quantile="0.01", jobName="EmailJob"} 0.000174042
emailjob_jobdurationtimer{success="true", quantile="0.05", jobName="EmailJob"} 0.00017925
emailjob_jobdurationtimer{success="true", quantile="0.5", jobName="EmailJob"} 0.000211583
emailjob_jobdurationtimer{success="true", quantile="0.9", jobName="EmailJob"} 0.000280563
emailjob_jobdurationtimer{success="true", quantile="0.95", jobName="EmailJob"} 0.000302521
emailjob_jobdurationtimer{success="true", quantile="0.99", jobName="EmailJob"} 0.0003743125
emailjob_jobdurationtimer{success="true", quantile="0.999", jobName="EmailJob"} 0.000685083
emailjob_jobdurationtimer_count{success="true", jobName="EmailJob"} 1002
emailjob_jobdurationtimer_sum{success="true", jobName="EmailJob"} 0.233929109
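
As a small worked example of what the aggregated output makes possible, the mean job duration can be derived from the `_sum` and `_count` lines above. The variable names are illustrative only.

```swift
// Values copied from the emailjob_jobdurationtimer_count and _sum lines above.
let count = 1002.0
let sumSeconds = 0.233929109

// A summary's _sum divided by its _count gives the mean observed duration.
let meanSeconds = sumSeconds / count
let meanMicroseconds = meanSeconds * 1_000_000

print(meanMicroseconds) // roughly 233 microseconds per job
```

The mean sits slightly above the reported median of 0.000211583 seconds, which is consistent with the longer tail visible in the 0.99 and 0.999 quantiles.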

Member

@ptoffy ptoffy left a comment

cool - I thought keeping the id around would make sense for debugging issues etc but TIL!

@ptoffy
Member

ptoffy commented Sep 12, 2024

@0xTim not sure if you want to give this a once over too

@0xTim 0xTim added the semver-patch Internal changes only label Sep 12, 2024
Member

@0xTim 0xTim left a comment

LGTM

@0xTim 0xTim merged commit 2d38cd2 into vapor:main Sep 12, 2024
8 checks passed