feat (Telemetry) Use Summary to record percentile metrics of time (#101)
* Refactor tags to be vararg

* Use summary to record time percentile metrics

* Adding README.md

* linting

* Update readme

* readme

* more readme

* formatting

* linting

* formatting

* Update telemetry/README.md

Co-authored-by: Cal Bera <[email protected]>
Signed-off-by: gordonbear <[email protected]>

* Update telemetry/README.md

Co-authored-by: Cal Bera <[email protected]>
Signed-off-by: gordonbear <[email protected]>

* Update telemetry/README.md

Co-authored-by: Cal Bera <[email protected]>
Signed-off-by: gordonbear <[email protected]>

* update readme

* Add TODO

---------

Signed-off-by: gordonbear <[email protected]>
Co-authored-by: Gordon <[email protected]>
Co-authored-by: Cal Bera <[email protected]>
3 people authored Jun 18, 2024
1 parent 0f9d5cd commit 058a314
Showing 7 changed files with 160 additions and 73 deletions.
75 changes: 75 additions & 0 deletions telemetry/README.md
@@ -0,0 +1,75 @@
# Telemetry

The metrics utility for offchain-sdk.

[types.go](./types.go) defines the interface for the supported metrics methods.

By specifying the configuration, the metrics can be emitted via Datadog and/or Prometheus.
Please see the following subsections for detailed configurations.

## Datadog

### Configuration

The first step is to add a section in your config file. See the following subsection for details.
The source code defining these configs can be found in [config.go](./datadog/config.go).

#### Datadog Configs

* `Enabled`: Set to `true` to enable metrics emission to Datadog.

* `StatsdAddr`: The address used by the Datadog StatsD client. This is required if the metrics
should be emitted to Datadog.

* `Namespace`: This will appear as the `Namespace` tag in Datadog.

### Datadog Methods

[metrics.go](./datadog/metrics.go) implements the Datadog version of the supported metrics methods
defined in [types.go](./types.go). All implementations are simple wrappers around the native methods
provided by the Datadog `statsd` client.

## Prometheus

### Configuration

The first step is to add a section in your config file. The source code defining these configs can
be found in [config.go](./prometheus/config.go).

#### Prometheus Configs

* `Enabled`: Set to `true` to enable metrics emission to Prometheus.

* `Namespace` and `Subsystem`: These fields are added as prefixes to the metric name.
For example, if `Namespace` is `app` and `Subsystem` is `api`, then the full name of the metric
`request_success` will be `app_api_request_success`.

* `HistogramBucketCount`: The number of buckets used for Histogram typed metrics. Default is 10.
  * Note: Each bucket is exported as a separate time series that Prometheus scrapes. Therefore,
    it's recommended to keep the number of buckets within a manageable scale, typically in the tens.
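
The prefixing rule for `Namespace` and `Subsystem` can be sketched in a few lines of Go. This is
only an illustration: `fullName` is a hypothetical helper and is not taken from the offchain-sdk
source, and the assumption that empty fields are skipped is ours. The Prometheus Go client ships
a `prometheus.BuildFQName(namespace, subsystem, name)` function that joins names in a similar
way, though this README does not say whether it is used here.

```go
package main

import (
	"fmt"
	"strings"
)

// fullName is a hypothetical helper (not part of offchain-sdk). It joins the
// non-empty parts with underscores, following the naming rule described above.
func fullName(namespace, subsystem, name string) string {
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" { // assumption: optional fields left empty add no prefix
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// Both optional fields set, as in the README example.
	fmt.Println(fullName("app", "api", "request_success")) // app_api_request_success
	// Only Subsystem set.
	fmt.Println(fullName("", "api", "request_success")) // api_request_success
}
```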

### Prometheus Methods

Unlike Datadog, Prometheus provides only
[four basic metric types](https://prometheus.io/docs/concepts/metric_types/). As a result,
[metrics.go](./prometheus/metrics.go) implements the metrics methods defined in
[types.go](./types.go) using these four basic Prometheus metric types. The following list documents
the methods with implementation notes. For more information on the four basic Prometheus metric
types, please see [here](https://prometheus.io/docs/tutorials/understanding_metric_types/).

* `Gauge`: This method wraps the `Gauge` metrics of Prometheus.

* `Decr` and `Incr`: Implemented using the `Gauge` metrics of Prometheus.

* `Count`: This method wraps the `Counter` metrics of Prometheus. Note that after a deployment or
instance restart, the counter resets to 0. This is by design in Prometheus.

* `IncMonotonic` and `Error`: Implemented using the `Counter` metrics of Prometheus.

* `Histogram`: This method wraps the `Histogram` metrics of Prometheus with linear buckets.
  * Note: The maximum value covered is determined by the product of `HistogramBucketCount` and the
    `rate` parameter.
  * TODO: Support different types of buckets beyond linear buckets in future implementations.

* `Time` and `Latency`: Implemented using the `Summary` metrics of Prometheus, with pre-defined
quantile observations: p50, p90, and p99.
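
To make the note on linear buckets concrete, here is a minimal sketch of how linear bucket upper
bounds are laid out. It assumes, for illustration only, that the first bound and the bucket width
both equal `rate`; the actual parameters used in [metrics.go](./prometheus/metrics.go) may differ.
`linearBuckets` is a hypothetical stand-in with the same shape as the Prometheus client's
`prometheus.LinearBuckets(start, width, count)`.

```go
package main

import "fmt"

// linearBuckets returns count upper bounds: start, start+width, ...,
// start+(count-1)*width. It is a stdlib-only stand-in for illustration.
func linearBuckets(start, width float64, count int) []float64 {
	if count < 1 {
		return nil
	}
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start + float64(i)*width
	}
	return buckets
}

func main() {
	const bucketCount = 10 // the default HistogramBucketCount
	const rate = 0.5       // assumed here to act as both first bound and width

	bounds := linearBuckets(rate, rate, bucketCount)
	fmt.Println(bounds)
	// Under these assumptions the largest bound is bucketCount * rate.
	fmt.Println(bounds[len(bounds)-1]) // 5
}
```

In Prometheus, observations above the largest bound are still counted, but only in the implicit
`+Inf` bucket, so they carry no further resolution.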
22 changes: 11 additions & 11 deletions telemetry/datadog/metrics.go
@@ -38,59 +38,59 @@ func (m *metrics) Close() error {
return m.client.Close()
}

func (m *metrics) Gauge(name string, value float64, tags []string, rate float64) {
func (m *metrics) Gauge(name string, value float64, rate float64, tags ...string) {
if !m.enabled {
return
}
//#nosec:G104 // handled by m.client.Gauge()
m.client.Gauge(name, value, tags, rate) //nolint:errcheck // handled by m.client.Gauge()
}

func (m *metrics) Count(name string, value int64, tags []string) {
func (m *metrics) Count(name string, value int64, tags ...string) {
if !m.enabled {
return
}
//#nosec:G104 // handled by m.client.Count()
m.client.Count(name, value, tags, 1) //nolint:errcheck // handled by m.client.Count()
}

func (m *metrics) IncMonotonic(name string, tags []string) {
m.Incr(name, tags)
func (m *metrics) IncMonotonic(name string, tags ...string) {
m.Incr(name, tags...)
}

func (m *metrics) Incr(name string, tags []string) {
func (m *metrics) Incr(name string, tags ...string) {
if !m.enabled {
return
}
//#nosec:G104 // handled by m.client.Incr()
m.client.Incr(name, tags, 1) //nolint:errcheck // handled by m.client.Incr()
}

func (m *metrics) Decr(name string, tags []string) {
func (m *metrics) Decr(name string, tags ...string) {
if !m.enabled {
return
}
//#nosec:G104 // handled by m.client.Decr()
m.client.Decr(name, tags, 1) //nolint:errcheck // handled by m.client.Decr()
}

func (m *metrics) Set(name string, value string, tags []string) {
func (m *metrics) Set(name string, value string, tags ...string) {
if !m.enabled {
return
}
//#nosec:G104 // handled by m.client.Set()
m.client.Set(name, value, tags, 1) //nolint:errcheck // handled by m.client.Set()
}

func (m *metrics) Histogram(name string, value float64, tags []string, rate float64) {
func (m *metrics) Histogram(name string, value float64, rate float64, tags ...string) {
if !m.enabled {
return
}
//#nosec:G104 // handled by m.client.Histogram()
m.client.Histogram(name, value, tags, rate) //nolint:errcheck // handled by m.client.Histogram()
}

func (m *metrics) Time(name string, value time.Duration, tags []string) {
func (m *metrics) Time(name string, value time.Duration, tags ...string) {
if !m.enabled {
return
}
@@ -99,10 +99,10 @@ func (m *metrics) Time(name string, value time.Duration, tags []string) {
}

func (m *metrics) Error(errName string) {
m.Incr("stats.errors", []string{fmt.Sprintf("type:%s", errName)})
m.Incr("stats.errors", fmt.Sprintf("type:%s", errName))
}

// Latency is a helper function to measure the latency of a routine.
func (m *metrics) Latency(jobName string, start time.Time, tags ...string) {
m.Time("stats.latency", time.Since(start), append(tags, fmt.Sprintf("job:%s", jobName)))
m.Time("stats.latency", time.Since(start), append(tags, fmt.Sprintf("job:%s", jobName))...)
}
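
The signature change in this file moves `tags` to a trailing variadic parameter. The sketch below
shows the three call forms this allows. `recorder` is a hypothetical stand-in for the `Metrics`
interface, not code from this commit; it only records what it was called with.

```go
package main

import "fmt"

// recorder is a hypothetical stand-in for the Metrics interface. It stores a
// formatted copy of every call so that the call forms can be compared.
type recorder struct {
	calls []string
}

// Incr mirrors the new variadic shape: name first, then zero or more tags.
func (r *recorder) Incr(name string, tags ...string) {
	r.calls = append(r.calls, fmt.Sprintf("%s %v", name, tags))
}

func main() {
	r := &recorder{}

	r.Incr("request.count")               // no tags at all
	r.Incr("request.count", "method:GET") // tags passed inline

	// An existing slice must be expanded with "..." at the call site.
	tags := []string{"method:GET", "path:/health"}
	r.Incr("request.count", tags...)

	for _, c := range r.calls {
		fmt.Println(c)
	}
}
```

Passing `tags...` hands the callee the same backing array as the caller's slice, so a callee that
appends to `tags` can overwrite elements the caller still holds if the slice has spare capacity.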
12 changes: 6 additions & 6 deletions telemetry/handler_wrapper.go
@@ -39,13 +39,13 @@ func WrapHTTPHandler(m Metrics, log log.Logger) func(http.Handler) http.Handler
metricsTags := getHTTPRequestTags(r)

// Increment request count metric under `request.count`
m.IncMonotonic("request.count", metricsTags)
m.IncMonotonic("request.count", metricsTags...)

start := time.Now()
next.ServeHTTP(customWriter, r)

// Record latency metric under `response.latency`
m.Time("request.latency", time.Since(start), metricsTags)
m.Time("request.latency", time.Since(start), metricsTags...)

// Separately record errors under `request.errors`
if customWriter.statusCode >= http.StatusBadRequest {
@@ -59,7 +59,7 @@ func WrapHTTPHandler(m Metrics, log log.Logger) func(http.Handler) http.Handler
}

metricsTags = append(metricsTags, fmt.Sprintf("code:%d", customWriter.statusCode))
m.IncMonotonic("request.errors", metricsTags)
m.IncMonotonic("request.errors", metricsTags...)
}
})
}
@@ -73,13 +73,13 @@ func WrapMicroServerHandler(m Metrics, log log.Logger) server.HandlerWrapper {
metricsTags := getMicroRequestTags(req)

// Increment request count metric under `request.count`
m.IncMonotonic("request.count", metricsTags)
m.IncMonotonic("request.count", metricsTags...)

start := time.Now()
err := next(c, req, rsp)

// Record latency metric under `response.latency`
m.Time("request.latency", time.Since(start), metricsTags)
m.Time("request.latency", time.Since(start), metricsTags...)

// Separately record errors under `request.errors`
if err != nil {
@@ -89,7 +89,7 @@ func WrapMicroServerHandler(m Metrics, log log.Logger) server.HandlerWrapper {
}

metricsTags = append(metricsTags, fmt.Sprintf("code:%s", code.String()))
m.IncMonotonic("request.errors", metricsTags)
m.IncMonotonic("request.errors", metricsTags...)
}

return err
42 changes: 21 additions & 21 deletions telemetry/metrics.go
@@ -42,48 +42,48 @@ type metrics struct {
prometheus Metrics
}

func (m *metrics) Gauge(name string, value float64, tags []string, rate float64) {
func (m *metrics) Gauge(name string, value float64, rate float64, tags ...string) {
if m.datadog != nil {
m.datadog.Gauge(name, value, tags, rate)
m.datadog.Gauge(name, value, rate, tags...)
}
if m.prometheus != nil {
m.prometheus.Gauge(name, value, tags, rate)
m.prometheus.Gauge(name, value, rate, tags...)
}
}

func (m *metrics) Incr(name string, tags []string) {
func (m *metrics) Incr(name string, tags ...string) {
if m.datadog != nil {
m.datadog.Incr(name, tags)
m.datadog.Incr(name, tags...)
}
if m.prometheus != nil {
m.prometheus.Incr(name, tags)
m.prometheus.Incr(name, tags...)
}
}

func (m *metrics) Decr(name string, tags []string) {
func (m *metrics) Decr(name string, tags ...string) {
if m.datadog != nil {
m.datadog.Decr(name, tags)
m.datadog.Decr(name, tags...)
}
if m.prometheus != nil {
m.prometheus.Decr(name, tags)
m.prometheus.Decr(name, tags...)
}
}

func (m *metrics) Count(name string, value int64, tags []string) {
func (m *metrics) Count(name string, value int64, tags ...string) {
if m.datadog != nil {
m.datadog.Count(name, value, tags)
m.datadog.Count(name, value, tags...)
}
if m.prometheus != nil {
m.prometheus.Count(name, value, tags)
m.prometheus.Count(name, value, tags...)
}
}

func (m *metrics) IncMonotonic(name string, tags []string) {
func (m *metrics) IncMonotonic(name string, tags ...string) {
if m.datadog != nil {
m.datadog.IncMonotonic(name, tags)
m.datadog.IncMonotonic(name, tags...)
}
if m.prometheus != nil {
m.prometheus.IncMonotonic(name, tags)
m.prometheus.IncMonotonic(name, tags...)
}
}

@@ -96,21 +96,21 @@ func (m *metrics) Error(errName string) {
}
}

func (m *metrics) Histogram(name string, value float64, tags []string, rate float64) {
func (m *metrics) Histogram(name string, value float64, rate float64, tags ...string) {
if m.datadog != nil {
m.datadog.Histogram(name, value, tags, rate)
m.datadog.Histogram(name, value, rate, tags...)
}
if m.prometheus != nil {
m.prometheus.Histogram(name, value, tags, rate)
m.prometheus.Histogram(name, value, rate, tags...)
}
}

func (m *metrics) Time(name string, value time.Duration, tags []string) {
func (m *metrics) Time(name string, value time.Duration, tags ...string) {
if m.datadog != nil {
m.datadog.Time(name, value, tags)
m.datadog.Time(name, value, tags...)
}
if m.prometheus != nil {
m.prometheus.Time(name, value, tags)
m.prometheus.Time(name, value, tags...)
}
}

9 changes: 3 additions & 6 deletions telemetry/prometheus/config.go
@@ -6,18 +6,15 @@ import (
)

const (
// Default bucket count 1000 can satisfy the precision of p99 for most histogram stats.
DefaultBucketCount = 1000
// Default bucket count for histogram metrics.
DefaultBucketCount = 10
)

type Config struct {
Enabled bool
Namespace string // optional
Subsystem string // optional
HistogramBucketCount int // Number of buckets for histogram, default to 1000
// Number of buckets for time buckets, default to 1000.
// The bucket size is 0.01s(10ms), so the maximum covered time range is 10ms * TimeBucketCount.
TimeBucketCount int
HistogramBucketCount int // Number of linear buckets for histogram, default to 10
}

func (c *Config) Validate() error {