- Metrics Interfaces and Example Usage
- Example for Starting a Metric Service
- Metric Registries and Configurations
- Creating Own Metrics
- Metrics Naming Conventions
- Available Metrics and Their Names
- Resources
In the Pravega Metrics Framework, we use Micrometer Metrics as the underlying library, and provide our own API to make it easier to use.
- StatsProvider: The Statistics Provider, which provides the whole Metric service.
- StatsLogger: The Statistics Logger, where the required Metrics (Counter/Gauge/Timer/Distribution Summary) are registered.
- OpStatsLogger: The Operation Statistics Logger, a sub-metric for the complex metric types (Timer/Distribution Summary). It is included in StatsLogger and DynamicLogger.
The Pravega Metrics Framework is initialized through the StatsProvider interface, which provides the start and stop methods for the Metric service. It also provides startWithoutExporting() for testing purposes; this variant only stores metrics in memory and does not export them to external systems. Currently we support InfluxDB, Prometheus, and StatsD registries.
- start(): Initializes the MetricRegistry and Reporters for our Metric service.
- startWithoutExporting(): Initializes a SimpleMeterRegistry that holds the latest value of each Meter in memory and does not export the data anywhere, typically for unit tests.
- close(): Shuts down the Metric Service.
- createStatsLogger(): Creates a StatsLogger instance, which is used to register and return metric objects. Application code can then perform metric operations directly with the returned metric objects.
- createDynamicLogger(): Creates a Dynamic Logger.
The StatsLogger interface can be used to register the required metrics of simple types, such as Counter and Gauge, as well as complex statistics types such as OpStatsLogger, through which we provide Timer and Distribution Summary.
- createStats(): Registers and returns an OpStatsLogger, which is used for complex types of metrics. Note the optional metric tags.
- createCounter(): Registers and returns a Counter Metric.
- createMeter(): Creates and registers a Meter Metric.
- registerGauge(): Registers a Gauge Metric.
- createScopeLogger(): Creates a StatsLogger under the given scope name.
OpStatsLogger can be used if the user is interested in measuring the latency of operations such as CreateSegment and ReadSegment. It also records the number of operations and the time/duration of each operation.
- reportSuccessEvent(): Tracks the Timer of a successful operation and records its latency in nanoseconds in the required metric.
- reportFailEvent(): Tracks the Timer of a failed operation and records its latency in nanoseconds in the required metric.
- reportSuccessValue(): Tracks the Histogram of a success value.
- reportFailValue(): Tracks the Histogram of a failed value.
- toOpStatsData(): Supports the JMX Reporters and unit tests.
- clear(): Clears the stats for this operation.
The DynamicLogger is a simple interface that exposes only the simple metric types (Counter/Gauge/Meter):
- incCounterValue(): Increases the Counter by the given value. Note the optional metric tags.
- updateCounterValue(): Updates the Counter with the given value.
- freezeCounter(): Notifies that the Counter will no longer be updated.
- reportGaugeValue(): Reports the Gauge value.
- freezeGaugeValue(): Notifies that the Gauge value will no longer be updated.
- recordMeterEvents(): Records the occurrence of a given number of events in a Meter.
This example is from io.pravega.segmentstore.server.host.ServiceStarter. The code for this example can be found here. It starts the Pravega Segment Store service, and the Metrics Service is started as a sub service.
This is an example from io.pravega.segmentstore.server.host.stat.SegmentStatsRecorderImpl.java
. The code for this example can be found here. In the class PravegaRequestProcessor
, we have registered two metrics:
- one Timer (createStreamSegment)
- one dynamic counter (dynamicLogger)
From the above example, we can see the required steps to register and use a dynamic counter:
- Get a dynamic logger from MetricsProvider:

    DynamicLogger dynamicLogger = MetricsProvider.getDynamicLogger();

- Increase the counter by providing the metric base name and the optional tags associated with the metric:

    DynamicLogger dl = getDynamicLogger();
    dl.incCounterValue(globalMetricName(SEGMENT_WRITE_BYTES), dataLength);
    ...
    dl.incCounterValue(SEGMENT_WRITE_BYTES, dataLength, segmentTags(streamSegmentName));
Here SEGMENT_WRITE_BYTES
is the base name of the metric. Below are the two metrics associated with it:
- The global Counter, which has no tags.
- A Segment-specific Counter, which has a list of Segment tags.

Note that segmentTags is a method that generates tags based on the fully qualified Segment name.
The following are the required steps to register and use an OpStatsLogger (Timer):
- Get a StatsLogger from MetricsProvider:

    StatsLogger STATS_LOGGER = MetricsProvider.getStatsLogger("segmentstore");

- Register all the desired metrics through StatsLogger:

    @Getter(AccessLevel.PROTECTED)
    final OpStatsLogger createStreamSegment = STATS_LOGGER.createStats(SEGMENT_CREATE_LATENCY);

- Use these metrics within the code at the appropriate places where the values should be collected and recorded:

    getCreateStreamSegment().reportSuccessEvent(elapsed);
Here SEGMENT_CREATE_LATENCY is the name of this metric, and createStreamSegment is the metric object that tracks createSegment operations. For each createSegment operation, it records the time taken, and further statistics are computed from those times.
This is an example from io.pravega.controller.metrics.StreamMetrics. In this class, we report a Dynamic Gauge that represents the open Transactions of a Stream. The code for this example can be found here.
This is an example from io.pravega.segmentstore.server.SegmentStoreMetrics. The code for this example can be found here. In the class SegmentStoreMetrics, we report a Dynamic Meter that represents the Segments created in a particular Container.
With Micrometer, each meter registry is responsible for both storing and exporting metric objects. To provide a unified interface, Micrometer offers the CompositeMeterRegistry for the application to interact with. A CompositeMeterRegistry forwards metric operations to all the concrete registries bound to it.
Note that when the metrics service starts (start()), initially only a global registry (of type CompositeMeterRegistry) is provided. It then binds concrete registries (e.g., StatsD, InfluxDB) based on the configuration. If no registry is enabled in the config, the metrics service throws an error to prevent the global registry from running in no-op mode.
Mainly for testing purposes, the metrics service can also be started with startWithoutExporting(), which binds a SimpleMeterRegistry to the global registry. SimpleMeterRegistry keeps metrics in memory only and does not export them, which makes it ideal for tests that verify metric objects.
Currently Pravega supports the following:
- StatsD registry in Telegraf flavor.
    - Dimensional metrics data model (or metric tags).
    - UDP as the communication protocol.
- Direct InfluxDB connection.
The reporter can be configured using MetricsConfig. Please refer to the example.
- When starting a Segment Store/Controller Service, start a Metric Service as a sub service. Please check ServiceStarter.start():

    // Initialize the provider from the metrics configuration, then start the service
    MetricsProvider.initialize(Config.METRICS_CONFIG);
    StatsProvider statsProvider = MetricsProvider.getMetricsProvider();
    statsProvider.start();

- Create a new StatsLogger instance through MetricsProvider.createStatsLogger(String loggerName), register metrics by name (e.g., STATS_LOGGER.createCounter(String name)), and then update the metric objects as appropriate in the code:

    public class AddMetrics {
        // Logger used to register OpStatsLogger/Counter metrics
        static final StatsLogger STATS_LOGGER = MetricsProvider.createStatsLogger("segmentstore");
        // Logger used for dynamic (tagged) Counters and Gauges
        final DynamicLogger dynamicLogger = MetricsProvider.getDynamicLogger();

        static class Metrics {
            // Using StatsLogger
            static final String CREATE_STREAM = "stream_created";
            static final OpStatsLogger createStream = STATS_LOGGER.createStats(CREATE_STREAM);
            static final String SEGMENT_CREATE_LATENCY = "segmentstore.segment.create_latency_ms";
            static final OpStatsLogger createStreamSegment = STATS_LOGGER.createStats(SEGMENT_CREATE_LATENCY);

            // Using DynamicLogger
            static final String SEGMENT_READ_BYTES = "segmentstore.segment.read_bytes"; // Dynamic Counter
            static final String OPEN_TRANSACTIONS = "controller.transactions.opened";  // Dynamic Gauge
            // ...
        }

        void recordMetrics(Timer timer, boolean succeeded) {
            if (succeeded) {
                // Report success or increment
                Metrics.createStream.reportSuccessValue(1);
                Metrics.createStreamSegment.reportSuccessEvent(timer.getElapsed());
                dynamicLogger.incCounterValue(Metrics.SEGMENT_READ_BYTES, 1);
                dynamicLogger.reportGaugeValue(Metrics.OPEN_TRANSACTIONS, 0);
            } else {
                // In case of failure
                Metrics.createStream.reportFailValue(1);
                Metrics.createStreamSegment.reportFailEvent(timer.getElapsed());
            }
        }

        void close() {
            // Freeze dynamic metrics that will no longer be updated
            dynamicLogger.freezeCounter(Metrics.SEGMENT_READ_BYTES);
            dynamicLogger.freezeGaugeValue(Metrics.OPEN_TRANSACTIONS);
        }
    }
All metric names are in the following format:
Metrics Prefix + Component Origin + Sub-Component (or Abstraction) + Metric Base Name
- Metric Prefix: pravega by default; the prefix is configurable.
- Component Origin: Indicates which component generates the metric, such as segmentstore or controller.
- Sub-Component (or Abstraction): Indicates the second-level component or abstraction, such as cache, transaction, or storage.
- Metric Base Name: Indicates the specific quantity being measured, such as read_latency_ms or create_count.
For example:
pravega.segmentstore.segment.create_latency_ms
The following are some common combinations of components and sub-components (or abstractions):
- segmentstore.segment: Metrics for individual Segments
- segmentstore.storage: Metrics related to long-term storage (Tier 2)
- segmentstore.bookkeeper: Metrics related to BookKeeper (Tier 1)
- segmentstore.container: Metrics for Segment Containers
- segmentstore.thread_pool: Metrics for the Segment Store thread pool
- segmentstore.cache: Cache-related metrics
- controller.stream: Metrics for operations on Streams (e.g., number of Streams created)
- controller.segments: Metrics about Segments, per Stream (e.g., count, splits, merges)
- controller.transactions: Metrics related to Transactions (e.g., created, committed, aborted)
- controller.retention: Metrics related to data retention, per Stream (e.g., frequency, size of truncated data)
- controller.hosts: Metrics related to Pravega servers in the cluster (e.g., number of servers, failures)
- controller.container: Metrics related to Container lifecycle (e.g., failovers)
The following are the two types of metrics:
- Global Metric: _global metrics report global values per component (Segment Store or Controller) instance, so further aggregation logic is needed to obtain Pravega cluster-wide values. For instance, STORAGE_READ_BYTES can be classified as a Global metric.
- Object-based Metric: Sometimes we need to report metrics only for specific objects, such as Streams or Segments. This kind of metric uses the metric name in the file as a base name and is "dynamically" created based on the objects being measured. For instance, for CONTAINER_APPEND_COUNT we actually report multiple metrics, one per measured containerId, each with a different container tag (e.g., ["containerId", "3"]).
There are cases in which we may want both Global and Object-based versions of the same metric. For example, for SEGMENT_READ_BYTES we publish the Global version by adding the _global suffix to the base name, to track the total number of bytes read:

    segmentstore.segment.read_bytes_global

We also publish the per-Segment version, which uses the same base name and supplies additional Segment tags, to report the bytes read per Segment at a finer granularity:

    segmentstore.segment.read_bytes, ["scope", "...", "stream", "...", "segment", "...", "epoch", "..."]
- JVM Metrics:
    jvm_gc_live_data_size
    jvm_gc_max_data_size
    jvm_gc_memory_allocated
    jvm_gc_memory_promoted
    jvm_gc_pause
    jvm_memory_committed
    jvm_memory_max
    jvm_memory_used
    jvm_threads_daemon
    jvm_threads_live
    jvm_threads_peak
    jvm_threads_states
- Segment Store Read/Write latency of storage operations (Histograms):
    segmentstore.segment.create_latency_ms
    segmentstore.segment.read_latency_ms
    segmentstore.segment.write_latency_ms
- Segment Store global and per-Segment Read/Write Metrics (Counters):
    // Global counters
    segmentstore.segment.read_bytes_global
    segmentstore.segment.write_bytes_global
    segmentstore.segment.write_events_global
    // Per-segment counters - all with tags {"scope", $scope, "stream", $stream, "segment", $segment, "epoch", $epoch}
    segmentstore.segment.write_bytes
    segmentstore.segment.read_bytes
    segmentstore.segment.write_events
- Segment Store cache Read/Write latency Metrics (Histogram):
    segmentstore.cache.insert_latency_ms
    segmentstore.cache.get_latency
- Segment Store cache Read/Write Metrics (Counters):
    segmentstore.cache.write_bytes
    segmentstore.cache.read_bytes
- Segment Store cache size (Gauge) and generation spread (Histogram) Metrics:
    segmentstore.cache.size_bytes
    segmentstore.cache.gen
- Tier 1 Storage DurableDataLog Read/Write latency and queuing Metrics (Histogram):
    segmentstore.bookkeeper.total_write_latency_ms
    segmentstore.bookkeeper.write_latency_ms
    segmentstore.bookkeeper.write_queue_size
    segmentstore.bookkeeper.write_queue_fill
- Tier 1 Storage DurableDataLog Read/Write (Counter) and per-container ledger count Metrics (Gauge):
    segmentstore.bookkeeper.write_bytes
    segmentstore.bookkeeper.bookkeeper_ledger_count - with tag {"container", $containerId}
- Tier 2 Storage Read/Write latency Metrics (Histogram):
    segmentstore.storage.read_latency_ms
    segmentstore.storage.write_latency_ms
- Tier 2 Storage Read/Write data and file creation Metrics (Counters):
    segmentstore.storage.read_bytes
    segmentstore.storage.write_bytes
    segmentstore.storage.create_count
- Segment Store container-specific operation Metrics:
    // Histograms - all with tags {"container", $containerId}
    segmentstore.container.process_operations.latency_ms
    segmentstore.container.process_operations.batch_size
    segmentstore.container.operation_queue.size
    segmentstore.container.operation_processor.in_flight
    segmentstore.container.operation_queue.wait_time
    segmentstore.container.operation_processor.delay_ms
    segmentstore.container.operation_commit.latency_ms
    segmentstore.container.operation.latency_ms
    segmentstore.container.operation_commit.metadata_txn_count
    segmentstore.container.operation_commit.memory_latency_ms
    // Gauge
    segmentstore.container.operation.log_size
- Segment Store operation processor (Counter) Metrics - all with tags {"container", $containerId}:
    // Counters/Meters
    segmentstore.container.append_count
    segmentstore.container.append_offset_count
    segmentstore.container.update_attributes_count
    segmentstore.container.get_attributes_count
    segmentstore.container.read_count
    segmentstore.container.get_info_count
    segmentstore.container.create_segment_count
    segmentstore.container.delete_segment_count
    segmentstore.container.merge_segment_count
    segmentstore.container.seal_count
    segmentstore.container.truncate_count
- Segment Store active Segments (Gauge) and thread pool status (Histogram) Metrics:
    // Gauge - with tags {"container", $containerId}
    segmentstore.active_segments
    // Histograms
    segmentstore.thread_pool.queue_size
    segmentstore.thread_pool.active_threads
- Controller Stream operation latency Metrics (Histograms):
    controller.stream.created_latency_ms
    controller.stream.sealed_latency_ms
    controller.stream.deleted_latency_ms
    controller.stream.updated_latency_ms
    controller.stream.truncated_latency_ms
- Controller global and per-Stream operation Metrics (Counters):
    controller.stream.created
    controller.stream.create_failed_global
    controller.stream.create_failed - with tags {"scope", $scope, "stream", $stream}
    controller.stream.sealed
    controller.stream.seal_failed_global
    controller.stream.seal_failed - with tags {"scope", $scope, "stream", $stream}
    controller.stream.deleted
    controller.stream.delete_failed_global
    controller.stream.delete_failed - with tags {"scope", $scope, "stream", $stream}
    controller.stream.updated_global
    controller.stream.updated - with tags {"scope", $scope, "stream", $stream}
    controller.stream.update_failed_global
    controller.stream.update_failed - with tags {"scope", $scope, "stream", $stream}
    controller.stream.truncated_global
    controller.stream.truncated - with tags {"scope", $scope, "stream", $stream}
    controller.stream.truncate_failed_global
    controller.stream.truncate_failed - with tags {"scope", $scope, "stream", $stream}
- Controller Stream retention frequency (Counter) and truncated size (Gauge) Metrics:
    controller.retention.frequency - with tags {"scope", $scope, "stream", $stream}
    controller.retention.truncated_size - with tags {"scope", $scope, "stream", $stream}
- Controller Stream Segment operations (Counters) and open/timed out Transactions on a Stream (Gauge) Metrics - all with tags {"scope", $scope, "stream", $stream}:
    controller.transactions.opened
    controller.transactions.timedout
    controller.segments.count
    controller.segment.splits
    controller.segment.merges
- Controller Transaction operation latency Metrics:
    controller.transactions.created_latency_ms
    controller.transactions.committed_latency_ms
    controller.transactions.aborted_latency_ms
- Controller Transaction operation counter Metrics:
    controller.transactions.created_global
    controller.transactions.created - with tags {"scope", $scope, "stream", $stream}
    controller.transactions.create_failed_global
    controller.transactions.create_failed - with tags {"scope", $scope, "stream", $stream}
    controller.transactions.committed_global
    controller.transactions.committed - with tags {"scope", $scope, "stream", $stream}
    controller.transactions.commit_failed_global
    controller.transactions.commit_failed - with tags {"scope", $scope, "stream", $stream}
    controller.transactions.commit_failed - with tags {"scope", $scope, "stream", $stream, "transaction", $txnId}
    controller.transactions.aborted_global
    controller.transactions.aborted - with tags {"scope", $scope, "stream", $stream}
    controller.transactions.abort_failed_global
    controller.transactions.abort_failed - with tags {"scope", $scope, "stream", $stream}
    controller.transactions.abort_failed - with tags {"scope", $scope, "stream", $stream, "transaction", $txnId}
- Controller hosts available (Gauge) and host failure (Counter) Metrics:
    controller.hosts.count
    controller.hosts.failures_global
    controller.hosts.failures - with tags {"host", $host}
- Controller Container count per host (Gauge) and failover (Counter) Metrics:
    controller.hosts.container_count
    controller.container.failovers_global
    controller.container.failovers - with tags {"container", $containerId}
- Controller ZooKeeper session expiration (Counter) Metrics:
    controller.zookeeper.session_expiration