Add system uptime metric #648

Open
andrzej-stencel opened this issue Sep 20, 2022 · 13 comments
Labels
help wanted Extra attention is needed

Comments

@andrzej-stencel
Member

What are you trying to achieve?

I want to add a metric to the semantic conventions that will describe the system uptime. How about system.uptime?

Additional context.

This is reported by Telegraf as the uptime field of the system metric (in seconds).

Here's a related proposal on the hostmetrics receiver to add this metric: open-telemetry/opentelemetry-collector-contrib#14130.

@tigrannajaryan
Member

Is this duplicate of open-telemetry/opentelemetry-specification#1273 ?

@andrzej-stencel
Member Author

andrzej-stencel commented Sep 20, 2022

Thanks Tigran 🤦, I didn't see this issue. It is closely related: it talks about the process namespace rather than system, but I think the discussion applies to system too. If I understand correctly (at least from the perspective of this issue), it boils down to adding the attributes process.start_time and system.start_time.
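A start-time attribute and an uptime metric carry the same information, related by uptime = now - start_time. A minimal Python sketch of the conversion in both directions, assuming timestamps are Unix epoch seconds; the function names are hypothetical and not part of any OpenTelemetry API.

```python
import time
from typing import Optional


def uptime_from_start_time(start_time_s: float, now_s: Optional[float] = None) -> float:
    """Uptime in seconds, given a start time in Unix epoch seconds."""
    if now_s is None:
        now_s = time.time()  # default to the current wall-clock time
    return now_s - start_time_s


def start_time_from_uptime(uptime_s: float, now_s: Optional[float] = None) -> float:
    """Start time in Unix epoch seconds, given an uptime in seconds."""
    if now_s is None:
        now_s = time.time()  # default to the current wall-clock time
    return now_s - uptime_s
```

For example, a system that started at epoch second 1000 and is observed at epoch second 1600 has an uptime of 600 seconds. One practical difference between the two forms: a start time stays constant for the lifetime of the entity, while an uptime value grows with every observation.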

In fact, I can see there's a process namespace for attributes, but I cannot see a system namespace for attributes - only an os namespace. Would the attribute become os.start_time then?

Also, when running the OpenTelemetry Collector with the hostmetrics receiver, I don't see any attributes from the os. namespace being reported (this is of course out of scope for this issue and repository).

@andrzej-stencel
Member Author

We discussed this briefly during today's SIG Spec call, let's see where the conversation in open-telemetry/opentelemetry-specification#1273 takes us.

@jamesmoessis

I would support system.uptime as a metric that measures the uptime of the system. The process.uptime is a different concern.

On Linux, system.uptime would be the value read from /proc/uptime; other operating systems have analogous sources.
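A minimal Python sketch of the Linux case, assuming the standard /proc/uptime format of two space-separated numbers: seconds since boot, then cumulative idle seconds summed over all CPUs. Only the first number is the uptime. The function names are hypothetical, and the sketch makes no claim about how the hostmetrics receiver obtains the value.

```python
from pathlib import Path


def parse_proc_uptime(text: str) -> float:
    """Return system uptime in seconds from the contents of /proc/uptime.

    The file holds two numbers: seconds since boot and cumulative idle
    seconds (summed over all CPUs). Only the first one is the uptime.
    """
    fields = text.split()
    if not fields:
        raise ValueError("empty /proc/uptime content")
    return float(fields[0])


def read_system_uptime(path: str = "/proc/uptime") -> float:
    """Read and parse /proc/uptime (Linux only)."""
    return parse_proc_uptime(Path(path).read_text())
```

On a Linux host, `read_system_uptime()` returns the current uptime as a float number of seconds; other operating systems need their own source.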

@jmacd
Contributor

jmacd commented Sep 21, 2022

I support both system.uptime and process.uptime semantic conventions.

@reyang
Member

reyang commented Sep 23, 2022

These all make sense, but please pause for now: we are considering refactoring the existing semantic conventions. Please join the ongoing discussion in open-telemetry/opentelemetry-specification#2753.

@mx-psi
Member

mx-psi commented Jan 18, 2024

@jsuereth Can we transfer this to the semantic-conventions repository?

@minuk-dev

minuk-dev commented Jun 5, 2024

Is there any plan to do this? I'm interested in it.

@mx-psi
Member

mx-psi commented Jun 6, 2024

dmitryax added the help wanted label on Aug 8, 2024
@dmitryax
Member

dmitryax commented Aug 8, 2024

Looks like we have an agreement here. We just need someone to submit a PR.

@tigrannajaryan
Member

Is there any way we can generalize this to be an "uptime" of any entity, not just of "system"? What if we make this an uptime metric with the Resource describing what it is about (e.g. Resource can have "host.name=foo" to indicate that it is an uptime of a host).

@andrzej-stencel
Copy link
Member Author

> Is there any way we can generalize this to be an "uptime" of any entity, not just of "system"? What if we make this an uptime metric with the Resource describing what it is about (e.g. Resource can have "host.name=foo" to indicate that it is an uptime of a host).

I suppose we could do it, I wonder what others think.

The uptime attribute name would not be namespaced, unlike system.uptime that is namespaced to system. Looking at the Attributes Registry, it doesn't look like we currently have any non-namespaced attributes in the semantic conventions. Is that true?

@rogercoll
Contributor

> Is there any way we can generalize this to be an "uptime" of any entity, not just of "system"?

I think this is part of a broader discussion taking place in #1161 (system.uptime vs process.uptime vs container.uptime). The main benefits of using the metric without a namespace seem to be easier dashboard correlation and avoiding duplication. But it comes at the cost of implicitly tying resource attributes to the corresponding metrics (I'm not sure this is possible in semconv): for example, the uptime metric would always have to be linked to either host.name, process.pid or container.id.
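To make that cost concrete, here is a minimal Python sketch of the check a consumer of a namespace-free uptime metric would need. The attribute list comes from the comment above; the function name and the idea of a consumer-side check are hypothetical. Note that a resource carrying more than one of these attributes (for example both host.name and process.pid) is ambiguous under this scheme.

```python
from typing import List, Mapping

# Hypothetical rule, paraphrasing the comment above: a namespace-free "uptime"
# point is only meaningful if its resource identifies the entity it describes.
IDENTIFYING_ATTRIBUTES = ("host.name", "process.pid", "container.id")


def uptime_subjects(resource: Mapping[str, object]) -> List[str]:
    """Return the identifying resource attributes present on a generic 'uptime' point.

    An empty result means the point cannot be attributed to any entity; more
    than one result means the metric name alone does not say whose uptime it is.
    """
    return [key for key in IDENTIFYING_ATTRIBUTES if key in resource]
```

With namespaced names such as system.uptime, the metric name itself answers this question and no such check is needed.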

During the System Semantic Conventions SIG (20/06/2024) we agreed to keep the metrics in namespaces (even if that means some duplication), because:

- There is potential for minute differences in meaning between seemingly identical metrics in different contexts.
- The namespaces also semantically represent the reporting source, which makes query scenarios clearer (e.g. "I want all my operating system process metrics" and "I want all my jvm metrics" are clearly separated, because the metrics reported by each source all carry their respective namespaces).
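The second point can be illustrated with a minimal Python sketch: when every metric carries its source namespace, "all my system metrics" is a plain prefix match on the metric name, with no need to inspect resource attributes. The metric names below are illustrative only and the helper is hypothetical.

```python
from typing import Iterable, List


def metrics_in_namespace(names: Iterable[str], namespace: str) -> List[str]:
    """Select the metric names that belong to one namespace, e.g. "system" or "process"."""
    prefix = namespace + "."
    return [name for name in names if name.startswith(prefix)]


# Illustrative metric names for this example only.
reported = ["system.uptime", "process.uptime", "container.uptime", "system.cpu.time"]
```

Here `metrics_in_namespace(reported, "system")` returns `["system.uptime", "system.cpu.time"]`, and the uptime of each source stays distinguishable by name alone.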

joaopgrassi removed their assignment on Aug 30, 2024
10 participants