Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyze naming conventions for (monotonic) counter metrics #260

Open
trask opened this issue Aug 15, 2023 · 8 comments
Open

Analyze naming conventions for (monotonic) counter metrics #260

trask opened this issue Aug 15, 2023 · 8 comments
Assignees

Comments

@trask
Copy link
Member

trask commented Aug 15, 2023

Related to #211 and #212

Here are existing (monotonic) counter metric names:

  • faas.coldstarts
  • faas.errors
  • faas.invocations
  • faas.timeouts
  • db.client.connection.timeouts
  • messaging.kafka.messages
  • messaging.kafka.requests.failed
  • messaging.kafka.isr.operations
  • messaging.kafka.leader.elections
  • messaging.kafka.leader.unclean-elections
  • hw.energy
  • hw.errors
  • hw.host.energy
  • hw.network.packets
  • hw.tape_drive.operations
  • hw.*.errors
  • process.context_switches
  • process.paging.faults
  • jvm.classes.loaded
  • jvm.classes.unloaded
  • system.paging.faults
  • system.paging.operations
  • system.disk.operations
  • system.network.dropped
  • system.network.packets
  • system.network.errors

.io

  • messaging.kafka.network.io
  • hw.gpu.io
  • hw.network.io
  • process.disk.io
  • process.network.io
  • system.disk.io
  • system.network.io

.time

  • process.cpu.time
  • jvm.cpu.time
  • system.cpu.time

.*_time

  • system.disk.io_time
  • system.disk.operation_time
@trask
Copy link
Member Author

trask commented Aug 15, 2023

An open question for me is whether we want to rename the first group to be something like:

  • faas.coldstart.count
  • faas.error.count
  • faas.invocation.count
  • faas.timeout.count
  • ...

the main(?) advantage of doing this is that it opens up the (implicit) namespaces for metric attributes on these metrics, e.g.

  • faas.coldstart.*
  • faas.error.*
  • faas.invocation.*
  • faas.timeout.*
  • ...

@mateuszrzeszutek
Copy link
Member

An open question for me is whether we want to rename the first group to be something like:

  • faas.coldstart.count
  • faas.error.count
  • faas.invocation.count
  • faas.timeout.count

Aren't we already using the .count suffix for UpDownCounters representing the current state of the system? E.g. jvm.cpu.count, jvm.class.count

@jack-berg
Copy link
Member

Aren't we already using the .count suffix for UpDownCounters representing the current state of the system?

Yes, but this is in line with the current advice for UpDownCounters. Counters don't follow that but IMO it means that naming of counters is inconsistent and lacks guidance.

Metric names follow the general form: {NAMESPACE}.{DESCRIPTIVE_NOUN}

Where {NAMESPACE} hierarchical, delimited by ..

And where {DESCRIPTIVE_NOUN} is a noun describing what values the metric captures, relative to the namespace.

My hope is that we could have a well defined set of {DESCRIPTIVE_NOUN} options that we can pick from for almost all situations. We have the start of that dictionary in the instrument naming section, but there are some notable exceptions. For example, duration is omitted, count is omitted (even tho its recommended for up down counters), and we say the following which makes counters the wild west:

Other instruments that do not fit the above descriptions may be named more freely. For example, system.paging.faults and system.network.packets. Units do not need to be specified in the names since they are included during instrument creation, but can be added if there is ambiguity.

@jack-berg
Copy link
Member

Oh just saw #211 where this is already being discussed.

@trask
Copy link
Member Author

trask commented Aug 16, 2023

yeah, this issue is specifically to explore, as you said above, "the wild west of counter naming"

@trask
Copy link
Member Author

trask commented Aug 17, 2023

a couple more on their way in #163:

  • messaging.publish.messages
  • messaging.receive.messages

@jack-berg
Copy link
Member

jack-berg commented Aug 17, 2023

Any reason these shouldn't all just have a suffix *.count?

I would think the counter argument would be that the prometheus compatibility document states that monotonic sums have the _total suffix added, and obviously *_count_total is weird. But we should be able to adjust that rule to say strip "count", and add "total".

@joaopgrassi
Copy link
Member

This was closed by mistake by the stale bot. Re-opening

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants