Implementing Prometheus Exporter and documentation #534

Closed
wants to merge 2 commits into from

Conversation

flaviostutz
Contributor

Background

This is a complete initial implementation of Prometheus Exporter support as discussed in #477

Checklist

  • Git commit messages conform to community standards.
  • Each Git commit represents meaningful milestones or atomic units of work.
  • Changed or added code is covered by appropriate tests.

@flaviostutz flaviostutz requested review from tsenart and xla as code owners July 24, 2020 20:30
@flaviostutz
Contributor Author

This is a continuation of PR #526. I decided to open a new PR to clean things up for a fresh review.

@flaviostutz flaviostutz mentioned this pull request Jul 24, 2020
3 tasks
@flaviostutz
Contributor Author

@tsenart Could you please take a look at this PR?

@tsenart
Owner

tsenart commented Aug 6, 2020

@flaviostutz: I've been swamped with work but saved some time to look at this on Saturday! Sorry.

@flaviostutz
Contributor Author

No worries... I know how it is... here with me it's the same... thanks!

Owner

@tsenart tsenart left a comment

First round of review done! Thanks for working on this 🙇

Review comments (all outdated and resolved): Dockerfile (3), README.md (2), lib/prom/prom.go (4), lib/results_test.go (1)
flaviostutz added a commit to flaviostutz/vegeta that referenced this pull request Aug 28, 2020
@flaviostutz
Copy link
Contributor Author

@tsenart could you please take a look at this PR when you have some time? I think I managed to do all the requested changes.

@xla xla removed their request for review September 3, 2020 18:06
tsenart
tsenart previously approved these changes Sep 13, 2020
Owner

@tsenart tsenart left a comment

Looks good! Just a few leftover changes / inconsistencies. Thank you 🙇

Resolved review comments: Dockerfile (1, outdated), README.md (3, two outdated), lib/prom/prom_test.go (1, outdated), lib/prom/prom.go (1, outdated)
lib/prom/prom.go Outdated
vegeta "github.com/tsenart/vegeta/v12/lib"
)

//PrometheusMetrics vegeta metrics observer with exposition as Prometheus metrics endpoint
Owner

Please do this.

flaviostutz added a commit to flaviostutz/vegeta that referenced this pull request Oct 9, 2020
@flaviostutz
Contributor Author

@tsenart please take a look at the requested changes.
Sorry for the late reply...

go.sum Outdated
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/objx v0.1.1/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
Owner

@flaviostutz: Why is this still here?

Contributor Author

I removed all testify dependencies from the prom lib, but go.sum on master already contains a testify entry and I don't know where it comes from. go.mod doesn't declare it, so it is probably an indirect dependency somewhere (it existed before my PR).

@@ -803,6 +807,62 @@ $ ulimit -u # processes / threads

Just pass a new number as the argument to change it.

## Prometheus Support

Vegeta has a built-in Prometheus Exporter that may be enabled during "attacks" so that you can point any Prometheus instance to Vegeta instances and get some metrics about http requests performance and about the Vegeta process itself.

While I like this for a lot of reasons, I think vegeta should export metrics to a Prometheus Pushgateway rather than being scraped directly by Prometheus.

Since vegeta isn't a long lived process you'll have race conditions where Prometheus might not scrape the last bit of attack data before vegeta shuts down.

Prometheus best practices docs say you should use push gateway for "service-level batch jobs" which is what I think vegeta would qualify as:

Usually, the only valid use case for the Pushgateway is for capturing the outcome of a service-level batch job. A "service-level" batch job is one which is not semantically related to a specific machine or job instance (for example, a batch job that deletes a number of users for an entire service). Such a job's metrics should not include a machine or instance label to decouple the lifecycle of specific machines or instances from the pushed metrics. This decreases the burden for managing stale metrics in the Pushgateway. See also the best practices for monitoring batch jobs.

I bring this up because I think it should instead have, or at least additionally offer, a report type of prom that exports metrics from a report to a Prometheus Pushgateway.

I am currently writing that for datadog rather than Prometheus but the idea is the same. The advantage of the report is that one could take an existing Vegeta test result and push into a metrics store rather than having to run a new test.

And the user could publish the results to a Prometheus and a DataDog and wherever else they needed with the reporting decoupled from the attacking.

Owner

Note to self: When -prometheus-addr is set and we start an HTTP server for Prometheus to scrape, make sure that all metrics are scraped before the program exits, even after the attack itself has finished.

@ghost

ghost commented Dec 3, 2020

Hi guys,

Hope you are all well !

Just wanted to know the current status of this PR as it sounds awesome for load testing any webapp.

What is missing to merge it ?

Cheers,
Luc

@leosunmo

leosunmo commented Dec 16, 2020

As someone who uses Vegeta as a library for testing/benchmarking/loadtesting, this is fantastic! Having to manually implement OpenCensus middleware for HTTP is a bit of a pain.

Just a note, is there a way to customise the latency histogram buckets? It looked like they were hardcoded to me, and since requestSecondsHistogram is not exposed I can't change it.

I can think of two immediate ways of making it customisable. Exposing the prom metrics types to the user, or enabling the user to set it as a "param" upon creating the PrometheusMetrics using NewPrometheusMetricsWithParams.
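
The second option could look roughly like the sketch below, which passes bucket bounds as a construction-time option. It is a standard-library stand-in, not the lib/prom API: latencyHistogram, WithBuckets, and the default bounds are all hypothetical. With the real Prometheus client, the same bounds would go into the Buckets field of prometheus.HistogramOpts.

```go
package main

import "sort"

// defaultBuckets are illustrative upper bounds in seconds.
// They are NOT Vegeta's actual defaults.
var defaultBuckets = []float64{0.01, 0.05, 0.1, 0.5, 1, 5}

// latencyHistogram is a tiny stand-in for a Prometheus histogram.
type latencyHistogram struct {
	bounds []float64 // sorted upper bounds ("le" semantics)
	counts []uint64  // per-bucket counts; the last slot is the +Inf bucket
}

// Option configures a latencyHistogram at construction time.
type Option func(*latencyHistogram)

// WithBuckets overrides the default bucket bounds.
func WithBuckets(bounds ...float64) Option {
	return func(h *latencyHistogram) {
		b := append([]float64(nil), bounds...)
		sort.Float64s(b)
		h.bounds = b
	}
}

// newLatencyHistogram applies the options, then sizes the counters.
func newLatencyHistogram(opts ...Option) *latencyHistogram {
	h := &latencyHistogram{bounds: defaultBuckets}
	for _, opt := range opts {
		opt(h)
	}
	h.counts = make([]uint64, len(h.bounds)+1)
	return h
}

// Observe records a latency in the first bucket whose bound is >= seconds.
func (h *latencyHistogram) Observe(seconds float64) {
	i := sort.SearchFloat64s(h.bounds, seconds)
	h.counts[i]++
}
```

A caller who needs finer resolution below 100ms would then write newLatencyHistogram(WithBuckets(0.005, 0.01, 0.025, 0.05, 0.1)), while existing callers keep the defaults.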

@dntosas

dntosas commented Jan 23, 2021

hola people!

any news on this one? seems this functionality is widely needed and a lot of thanks for your nice job cc @flaviostutz @tsenart

@spacez320

@hfuss @flaviostutz I'm wondering what the state of the PR is and if it's still possible to implement Prometheus support.

I agree that normally an ad-hoc job like a Vegeta process could benefit from just delivering results to Pushgateway. I also think creating an exporter could be useful if you're going to run Vegeta for a long time and want a simpler integration.

I'm preparing to use Vegeta for some long-running (hours-days) load-testing jobs and wondered if I could help at all with Prometheus integration one way or the other.

  • Do we think it's still possible to create the exporter?
  • Should we instead (or also) make it easy to transmit to Pushgateway via vegeta report or something?

@daluu

daluu commented Jul 11, 2023

Depending on the use case, I would advise against the Pushgateway approach except as a last resort. In my efforts to use it, I observed that metrics pushed to the Pushgateway have no TTL and persist forever, and a metric's value only updates if you send new values with exactly the same labels/dimensions. If the labels change over time (different instance, host, etc.), you just get new metrics instead, and both old and new metrics persist in the system.

So when viewing the metrics in Grafana, you get a never-ending set of time series, rather than time series with filled lines for the periods when metrics were sent to the gateway and gaps for the intervals when nothing was sent. I wanted and expected the latter but got the former. For ad-hoc and periodic load testing, I'd assume people want the latter. The former is more suitable if, say, you run vegeta to monitor a test instance that never changes, i.e. the same service monitored over time.

For what I want to do, if you go with the Pushgateway, you may need something like this fork, which adds a TTL to Pushgateway metrics: https://github.com/dinumathai/pushgateway

So I think the exporter route is good. I don't think it has the TTL issue that the Pushgateway has.

Another alternative to the exporter and the Pushgateway is the Prometheus remote write protocol, as a way to push metrics to Prometheus. We're looking into and testing remote write at our organization, but I'm not aware of the specific implementation or how well it's performing at the moment. I also wouldn't know which option is best for vegeta from a performance standpoint, since vegeta needs to both generate the load and expose or push the metrics. https://last9.io/blog/what-is-prometheus-remote-write/

@tsenart
Owner

tsenart commented Jul 11, 2023

@daluu: Makes sense. I think the Prometheus exporter should then be implemented as a vegeta sub command, like vegeta prom-export, which you'd use with vegeta attack ... | tee results.bin | vegeta prom-export

@flaviostutz
Contributor Author

As discussed in #477, we decided to instrument attack because then we don't need to parse stdin, which is simpler and more performant.

If we decide to change this at this point, we should actually cancel this PR and start a new branch, because a lot of things will be different.

I would recommend that we finish this PR with the initial requirements, because it's almost there, and if necessary discuss creating a reporter/push gateway/direct write version of it in the future.

What do you think?

@tsenart
Owner

tsenart commented Jul 19, 2023

Ah, I had forgotten that discussion. It was a long time ago!

Thinking about it now, having the attack command expose a web server handler that a Prometheus instance can scrape is good for performance and reducing moving pieces in distributed attacks.

But for interactive use and debugging past attacks I think we should still have a sub command that exports saved results as Prometheus metrics.

So, the answer is I think we need both, and they should share as much code as possible. The only difference between doing it in attack and doing it in another sub-command is that in attack we can observe the metric without decoding the result first.

I suggest we introduce a lib/prom package where all of this is encapsulated and modularized.

Also, very much up for hacking on this together. So let me know how you'd want to go about it.

@flaviostutz
Contributor Author

This PR is already creating a lib/prom package. I just reviewed and fixed all comments that were open (after 3y lol!). Please do another review round and mark comments as resolved if they are ok.

@flaviostutz flaviostutz force-pushed the issue/#477-promlib branch 2 times, most recently from 44c66f3 to 7962a87 Compare July 22, 2023 16:00
@flaviostutz
Contributor Author

I squashed the messy commits (various upstream merges and small changes) from my branch into one and applied the PGP signatures so you can merge it to master 😁

Owner

@tsenart tsenart left a comment

Thank you for moving this forward! Out of all my comments, let me know what you want to work on. I'm happy to address my own feedback on top of your changes. I'll also implement the prom sub-command that I mentioned.

Review comments (all outdated and resolved): docker-compose.yml (1), grafana.json (1), lib/prom/prom.go (5), prometheus-sample.png (1), lib/results_test.go (1), lib/prom/prom_test.go (1)
…es with P90; minor changes

Signed-off-by: Flávio Stutz <[email protected]>
@flaviostutz
Contributor Author

@tsenart I pushed everything I changed. In summary, I tried to resolve all comments, so please do another round of checks.

As this PR is already too big, I suggest we merge it as soon as possible and build the "prom" command in another one.

tsenart pushed a commit that referenced this pull request Jul 24, 2023
Closes #477, #534

Signed-off-by: Flávio Stutz <[email protected]>
Signed-off-by: Tomás Senart <[email protected]>
tsenart added a commit that referenced this pull request Jul 24, 2023
Signed-off-by: Tomás Senart <[email protected]>
@tsenart
Owner

tsenart commented Jul 24, 2023

Thank you for your great effort on this <3 Landed in 81403a6. Opened #637 which I'll work on next!

@tsenart tsenart closed this Jul 24, 2023
@fasibio

fasibio commented Feb 28, 2024

@daluu #534 (comment)
to this point:
have no TTL and persist forever, and the metric's value only updates if you send new values over time with the exact same labels/dimensions

For information: I use a Pushgateway and, at the end, delete all metrics created by vegeta, so I see no problem with the Pushgateway:

import (
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
	vegeta "github.com/tsenart/vegeta/v12/lib"
	"github.com/tsenart/vegeta/v12/lib/prom"
)

// PushGatewayRegister adapts a push.Pusher to the prometheus.Registerer
// interface so that prom.Metrics can register its collectors on the pusher.
type PushGatewayRegister struct {
	Pusher *push.Pusher
}

// MustRegister implements prometheus.Registerer.
func (p *PushGatewayRegister) MustRegister(c ...prometheus.Collector) {
	for _, v := range c {
		p.Pusher = p.Pusher.Collector(v)
	}
}

// Register implements prometheus.Registerer.
func (p *PushGatewayRegister) Register(c prometheus.Collector) error {
	p.Pusher = p.Pusher.Collector(c)
	return nil
}

// Unregister implements prometheus.Registerer. It is a no-op here.
func (p *PushGatewayRegister) Unregister(c prometheus.Collector) bool {
	return true
}

// LoadTest runs the attack and pushes metrics after every result.
// The attacker, targeter, rate, and duration are passed in by the caller;
// "url" and "job" are placeholders for the Pushgateway address and job name.
func LoadTest(attacker *vegeta.Attacker, targeter vegeta.Targeter, rate vegeta.Pacer, duration time.Duration) {
	metrics := prom.NewMetrics()

	pusher := push.New("url", "job")
	if err := metrics.Register(&PushGatewayRegister{Pusher: pusher}); err != nil {
		log.Println(fmt.Errorf("register error %w", err))
		return
	}
	for res := range attacker.Attack(targeter, rate, duration, "Big Bang!") {
		metrics.Observe(res)
		if err := pusher.Push(); err != nil {
			log.Println(fmt.Errorf("push error %w", err))
		}
	}
	// This is the point: all metrics created by this pusher are removed
	// from the Pushgateway.
	_ = pusher.Delete()
}

but change my mind

@daluu

daluu commented Feb 28, 2024

@fasibio

As Info I use a PushGateway and delete at the end all metrics based by vegta so i see no problem with pushgateway:

As long as that works out in the end, then yes, it should be fine. I would advise testing that this works as anticipated: send some metrics, then stop sending for some time (e.g. 15 minutes to an hour or more), and verify that Grafana (or an equivalent data viewer) renders the metric as discrete data points for when metrics were sent, rather than as an extrapolated line that continues to the current time even though no more metrics were sent.

Expecting how something works in theory is different from actually validating it with testing. So as long as this special handling of the Pushgateway has been confirmed to work as expected, it sounds good.

Note: maybe this approach works better when you have more access to the Pushgateway interface. I'm not aware whether the library/interface we were using at the time had a way to "delete" metrics on the Pushgateway; we only sent metrics to it.

but change my mind

I'm not sure what is meant here; it is a little vague. Did you mean that, unless someone can convince you otherwise, the Pushgateway works fine for you? Or did you mean that, despite what you mentioned, you have changed your mind about using the Pushgateway approach?

@fasibio

fasibio commented Mar 6, 2024

@daluu
I use "github.com/prometheus/client_golang/prometheus/push" and call pusher.Delete() at the end of the test.
That removes all metrics created by the same pusher object.

My Grafana graph looks like this:
image

As you can see, there is data only during the attack. So I think advising against the Pushgateway (see README) is incorrect.

@daluu

daluu commented Mar 6, 2024

So i think to advise against pushgateway (see readme) is incorrect.

Yes, that makes sense. I take back my prior advice, with the caveat that the user must use the Pushgateway properly: if you omit the delete step at the end, I believe you will run into the concern I mentioned previously (one can try to confirm it). So if we're building a solution and documentation here, we need to account for that to ensure a successful deployment.

@fasibio, I'm curious what made you issue a delete after pushing metrics. Were you already aware of the need for it (or of the issue when you don't delete), did you find it by trial and error, or did you find it documented somewhere? Unless I overlooked it, I don't recall the documentation or example code telling the user to issue a delete after pushing metrics. The Pushgateway's design is not intuitive to me: one would think the push (and pull) model is discrete, so you send or poll data and get data, and when no push or poll is occurring there should be no data. But the Pushgateway holds on to the last pushed values and keeps exposing them to Prometheus unless you clear them out explicitly.

@fasibio

fasibio commented Mar 21, 2024

@daluu Simple: I followed the "Use it" section of the Pushgateway README (not Go-specific), and the curl delete command is part of it.
https://github.com/prometheus/pushgateway?tab=readme-ov-file#use-it .
And that is all.

@tsenart see discussion, might make sense to update readme.
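
For readers not using the Go push package, the delete step discussed above is an HTTP DELETE on the job's group URL. Below is a minimal standard-library sketch. The helper names groupURL and deleteGroup are hypothetical, and it assumes a job name without slashes and no extra grouping labels.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// groupURL builds the Pushgateway URL for a job's metric group:
// <base>/metrics/job/<job>. It assumes the job name contains no slashes.
func groupURL(base, job string) string {
	return strings.TrimRight(base, "/") + "/metrics/job/" + url.PathEscape(job)
}

// deleteGroup asks the Pushgateway to drop every metric in the job's group.
func deleteGroup(client *http.Client, base, job string) error {
	req, err := http.NewRequest(http.MethodDelete, groupURL(base, job), nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Any 2xx status is treated as success in this sketch.
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("pushgateway delete failed: %s", resp.Status)
	}
	return nil
}
```

Calling deleteGroup(http.DefaultClient, "http://localhost:9091", "vegeta") at the end of a run would clear the group, where the address and job name are placeholders for your own setup.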


8 participants