Feature/prometheusv2 #478

V3ckt0r · 2018-01-19T15:23:01Z

This is an alternative PR to #415 that uses the k6 engine directly to transpose metrics to Prometheus.

codecov-io · 2018-01-19T15:30:24Z

Codecov Report

Merging #478 into master will decrease coverage by 9.9%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #478      +/-   ##
==========================================
- Coverage   72.31%   62.41%   -9.91%     
==========================================
  Files         132       93      -39     
  Lines        9703     6667    -3036     
==========================================
- Hits         7017     4161    -2856     
- Misses       2272     2273       +1     
+ Partials      414      233     -181

| Impacted Files | Coverage Δ | |
|---|---|---|
| api/server.go | `73.33% <100%> (+7.81%)` | ⬆️ |
| stats/sink.go | `5.63% <0%> (-94.37%)` | ⬇️ |
| lib/models.go | `20.98% <0%> (-73.54%)` | ⬇️ |
| stats/cloud/collector.go | `0% <0%> (-70.39%)` | ⬇️ |
| lib/runner.go | `0% <0%> (-67.65%)` | ⬇️ |
| lib/netext/dialer.go | `34.48% <0%> (-59.89%)` | ⬇️ |
| stats/cloud/config.go | `0% <0%> (-56.87%)` | ⬇️ |
| stats/cloud/errors.go | `0% <0%> (-54.84%)` | ⬇️ |
| stats/influxdb/util.go | `34.78% <0%> (-43.48%)` | ⬇️ |
| cmd/config.go | `33.33% <0%> (-41.5%)` | ⬇️ |
| ... and 100 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3fc300c...43c6ea4.

jonathonlacher · 2018-01-22T15:21:38Z

Could you add a gauge for p99 as well?

V3ckt0r · 2018-01-22T16:56:22Z

@jonathon-L, yeah, sure, I'll add that in.

Before I do further work, I'll wait for some feedback.

V3ckt0r · 2018-01-29T14:43:18Z

hey @liclac @robingustafsson,

Any thoughts on this and #415?

Thanks,
B

jonathonlacher · 2018-01-29T15:34:56Z

I'm not sure if this is in scope for this PR, but ultimately exposing these metrics in histogram format would provide the most value. Gauges are fine for graphing an individual test. However, comparing tests over time requires storing the metrics for each request, or at least storing histogram data for each request.

robingustafsson · 2018-01-30T11:40:32Z

@V3ckt0r Great, thanks for contributing this! Apologies for taking so long to review. I think this PR looks preferable to #415 for the reasons discussed in that PR (avoiding an extra HTTP call to /v1/metrics and exposing metrics more directly in a Prometheus-native way).

There are some issues we need to solve, though, in terms of the metrics data. First, the statistics that k6 outputs through the Sink interface are global: avg, med, p(X), etc. are computed across the entire set of samples for a metric. They are not time-bucketed, per-stage, or periodized since the last call (the idea being that the result storage system, such as InfluxDB, Prometheus, or Load Impact Insights, would handle that).
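
To illustrate the first point, here is a minimal sketch with made-up latency numbers; `mean` is a hypothetical helper and not part of k6. It shows how an aggregate over all samples can hide a change that a windowed aggregate would reveal.

```go
package main

import "fmt"

// mean is a hypothetical helper for this illustration, not k6 code.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Made-up response times in ms: the second half of the test is much slower.
	samples := []float64{100, 100, 100, 100, 500, 500, 500, 500}

	// Global aggregate, computed over every sample seen so far.
	fmt.Println("global avg:", mean(samples)) // 300

	// Windowed aggregate over the most recent four samples only.
	fmt.Println("last-window avg:", mean(samples[4:])) // 500
}
```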

The second issue relates to Trend metrics: before calling sink.Format() on a Trend metric, we need to make sure that the sink.Calc() method has been called. If not, the samples in the float64 TrendSink.Values slice will be in chronological (or worse) order, which means that the med and all p(X) values will be wrong.

Given this, I think the most appropriate approach would be, as @jonathon-L said above, to expose Trend metrics in histogram format. I've only had a quick look at the Prometheus Go client docs and code, so I have no real sense of how much work that would take. I'd like to hear ideas on how to solve the issues mentioned above. Thoughts?
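
As a rough sketch of what histogram format means here, the stdlib-only example below folds samples into cumulative buckets in the Prometheus style, where each bucket counts observations less than or equal to its upper bound. The `histogram` type, its methods, and the bucket bounds are hypothetical; a real implementation would presumably use the histogram type from the Prometheus Go client instead.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram is a hypothetical stand-in for a Prometheus-style histogram.
type histogram struct {
	bounds []float64 // sorted upper bounds ("le"); +Inf is implied by count
	counts []uint64  // counts[i] = number of observations <= bounds[i]
	count  uint64    // total observations (the +Inf bucket)
	sum    float64   // sum of all observed values
}

func newHistogram(bounds ...float64) *histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &histogram{bounds: b, counts: make([]uint64, len(b))}
}

// observe records one sample. Buckets are cumulative, so every bucket whose
// upper bound is >= v is incremented.
func (h *histogram) observe(v float64) {
	for i := sort.SearchFloat64s(h.bounds, v); i < len(h.bounds); i++ {
		h.counts[i]++
	}
	h.count++
	h.sum += v
}

func main() {
	h := newHistogram(100, 250, 500) // made-up bounds in ms
	for _, v := range []float64{120, 35, 400, 50, 80, 900} {
		h.observe(v)
	}
	for i, b := range h.bounds {
		fmt.Printf("le=%g: %d\n", b, h.counts[i]) // 3, 4, 5
	}
	fmt.Printf("le=+Inf: %d, sum: %g\n", h.count, h.sum) // 6, 1585
}
```

Because only counters are kept, no sample slice has to be stored or sorted, and quantiles can be estimated later on the storage side.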

V3ckt0r · 2018-05-08T19:23:17Z

Whoops, sorry, I must admit I dropped the ball on this @robingustafsson @jonathon-L 😵

I see what you mean about the way the Sink interface handles the percentiles, avg, med, etc. I didn't appreciate this before, so I thought taking them as-is was correct. I agree that letting Prometheus handle this is better and that using histograms is the way forward, as @jonathon-L said.

Correct me if I'm wrong, but I think the second point you raised about sink.Calc() wouldn't be an issue given the above, as the percentiles, med, avg, etc. will be calculated in Prometheus.

As for the use case for this functionality, how do you envisage it working? Are you thinking that the metrics will be tagged with the particular endpoint being tested? For instance, something like http_reqs{endpoint="https://google.com"} for a k6 test of https://google.com.
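
To make the question concrete, here is a minimal sketch of how such a labelled sample could be rendered in the Prometheus text exposition format. `sampleLine` is a hypothetical helper written only for this illustration, and the value 42 is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// labelEscaper escapes backslash, double quote, and newline, which is what
// the Prometheus text format requires inside label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// sampleLine is a hypothetical helper that renders one sample with a single
// label as a text exposition line.
func sampleLine(name, labelName, labelValue string, value float64) string {
	return fmt.Sprintf("%s{%s=\"%s\"} %g",
		name, labelName, labelEscaper.Replace(labelValue), value)
}

func main() {
	fmt.Println(sampleLine("http_reqs", "endpoint", "https://google.com", 42))
	// prints: http_reqs{endpoint="https://google.com"} 42
}
```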

mstoykov · 2019-01-16T11:53:16Z

Thanks a lot for this PR (and the one before it ;)).
We would like to support Prometheus and I think we can push this along. I am going to try to look into the specifics of the implementation by the end of the week and come up with an updated list of issues/things to do. @V3ckt0r Do you still feel like you can continue pushing it along?

V3ckt0r · 2019-01-21T11:58:39Z

Yeah, I'll brush off the cobwebs and try to pick this back up 😅

…et the metrics, instead of make use of the internal api as 415 does.

CLAassistant · 2019-04-21T21:24:56Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

golangcibot · 2019-04-21T21:25:54Z