Merge branch 'main' into ff-system-back-to-provider-name
dyladan authored Nov 27, 2024
2 parents c4b78a5 + 68629b9 commit e23379a
Showing 14 changed files with 383 additions and 3 deletions.
4 changes: 4 additions & 0 deletions .chloggen/1560.yaml
@@ -0,0 +1,4 @@
change_type: enhancement
component: db
note: Specify how to set span status for database operations.
issues: [1536, 1560]
22 changes: 22 additions & 0 deletions .chloggen/add-cli-spans.yaml
@@ -0,0 +1,22 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: 'new_component'

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: 'cli'

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Define span describing CLI application execution

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [1577]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
22 changes: 22 additions & 0 deletions .chloggen/add_k8s_uptime_metrics.yaml
@@ -0,0 +1,22 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: k8s

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add uptime metrics for container, K8s Pod and K8s Node

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [1486]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
4 changes: 4 additions & 0 deletions .chloggen/fix-typo-messaging-schema.yaml
@@ -0,0 +1,4 @@
change_type: 'bug_fix'
component: messaging
note: Fix typo in schemas for messaging attribute changes
issues: [1595]
1 change: 1 addition & 0 deletions README.md
@@ -20,6 +20,7 @@ Approvers ([@open-telemetry/specs-semconv-approvers](https://github.com/orgs/ope

- [Alexandra Konrad](https://github.com/trisch-me), Elastic
- [Christian Neumüller](https://github.com/Oberon00), Dynatrace
- [Daniel Dyla](https://github.com/dyladan), Dynatrace
- [James Moessis](https://github.com/jamesmoessis), Atlassian
- [Sean Marciniak](https://github.com/MovieStoreGuy), Atlassian
- [Ted Young](https://github.com/tedsuo), Lightstep
122 changes: 122 additions & 0 deletions docs/cli/cli-spans.md
@@ -0,0 +1,122 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: CLI
--->

# Semantic Conventions for CLI (Command Line Interface) programs

**Status**: [Experimental][DocumentStatus]

This document defines semantic conventions to apply when instrumenting CLI programs, both as caller and as callee. This document is intended for short-lived programs that end their execution, i.e., not daemons or long-running background tasks.

Span kind SHOULD be `INTERNAL` when the traced program is the callee or `CLIENT` when the caller is tracing another program.

The span name SHOULD be set to `{process.executable.name}`.
Instrumentations that have additional context about executed commands MAY use a different low-cardinality span name format and SHOULD document it.

Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0.
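
The rules above can be sketched without any telemetry SDK. The helper below is a minimal illustration, not part of the convention: `describe_cli_span` and its parameters are hypothetical names, the result is a plain dictionary rather than a real span, reporting `Unset` for a zero exit code is an assumption, and `_OTHER` is used for `error.type` only because this sketch defines no custom error values.

```python
import os


def describe_cli_span(executable_path, command_args, pid, exit_code, is_caller=False):
    """Describe a CLI span as a plain dict (hypothetical helper, not an SDK call)."""
    executable_name = os.path.basename(executable_path)
    attributes = {
        # Required attributes
        "process.executable.name": executable_name,
        "process.exit.code": exit_code,
        "process.pid": pid,
        # Recommended attributes
        "process.command_args": list(command_args),
        "process.executable.path": executable_path,
    }
    failed = exit_code != 0
    if failed:
        # error.type is required if and only if the exit code is not 0.
        # "_OTHER" is the fallback value; a real instrumentation may
        # document and use more specific values.
        attributes["error.type"] = "_OTHER"
    return {
        "name": executable_name,
        "kind": "CLIENT" if is_caller else "INTERNAL",
        # Assumption: status stays "Unset" when the program exits with 0.
        "status": "Error" if failed else "Unset",
        "attributes": attributes,
    }
```

For example, `describe_cli_span("/usr/bin/otelcol", ["otelcol", "--config=config.yaml"], 1234, 127)` describes an `INTERNAL` span named `otelcol` with status `Error`.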

<!-- TODO: context propagation https://github.com/open-telemetry/semantic-conventions/issues/1612 -->

## Execution (callee) spans

<!-- semconv span.cli.internal -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`process.executable.name`](/docs/attributes-registry/process.md) | string | The name of the process executable. On Linux based systems, can be set to the `Name` in `proc/[pid]/status`. On Windows, can be set to the base name of `GetProcessImageFileNameW`. | `otelcol` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`process.exit.code`](/docs/attributes-registry/process.md) | int | The exit code of the process. | `127` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`process.pid`](/docs/attributes-registry/process.md) | int | Process identifier (PID). | `1234` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if and only if process.exit.code is not 0 | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`process.command_args`](/docs/attributes-registry/process.md) | string[] | All the command arguments (including the command/executable itself) as received by the process. On Linux-based systems (and some other Unixoid systems supporting procfs), can be set according to the list of null-delimited strings extracted from `proc/[pid]/cmdline`. For libc-based executables, this would be the full argv vector passed to `main`. | `["cmd/otelcol", "--config=config.yaml"]` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`process.executable.path`](/docs/attributes-registry/process.md) | string | The full path to the process executable. On Linux based systems, can be set to the target of `proc/[pid]/exe`. On Windows, can be set to the result of `GetProcessImageFileNameW`. | `/usr/bin/cmd/otelcol` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1] `error.type`:** The `error.type` SHOULD be predictable, and SHOULD have low cardinality.

When `error.type` is set to a type (e.g., an exception type), its
canonical class name identifying the type within the artifact SHOULD be used.

Instrumentations SHOULD document the list of errors they report.

The cardinality of `error.type` within one instrumentation library SHOULD be low.
Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
should be prepared for `error.type` to have high cardinality at query time when no
additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
it's RECOMMENDED to:

- Use a domain-specific attribute
- Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not.

---

`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

## Client (caller) spans

<!-- semconv span.cli.client -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`process.executable.name`](/docs/attributes-registry/process.md) | string | The name of the process executable. On Linux based systems, can be set to the `Name` in `proc/[pid]/status`. On Windows, can be set to the base name of `GetProcessImageFileNameW`. | `otelcol` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`process.exit.code`](/docs/attributes-registry/process.md) | int | The exit code of the process. | `127` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`process.pid`](/docs/attributes-registry/process.md) | int | Process identifier (PID). | `1234` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if and only if process.exit.code is not 0 | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`process.command_args`](/docs/attributes-registry/process.md) | string[] | All the command arguments (including the command/executable itself) as received by the process. On Linux-based systems (and some other Unixoid systems supporting procfs), can be set according to the list of null-delimited strings extracted from `proc/[pid]/cmdline`. For libc-based executables, this would be the full argv vector passed to `main`. | `["cmd/otelcol", "--config=config.yaml"]` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`process.executable.path`](/docs/attributes-registry/process.md) | string | The full path to the process executable. On Linux based systems, can be set to the target of `proc/[pid]/exe`. On Windows, can be set to the result of `GetProcessImageFileNameW`. | `/usr/bin/cmd/otelcol` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1] `error.type`:** The `error.type` SHOULD be predictable, and SHOULD have low cardinality.

When `error.type` is set to a type (e.g., an exception type), its
canonical class name identifying the type within the artifact SHOULD be used.

Instrumentations SHOULD document the list of errors they report.

The cardinality of `error.type` within one instrumentation library SHOULD be low.
Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
should be prepared for `error.type` to have high cardinality at query time when no
additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
it's RECOMMENDED to:

- Use a domain-specific attribute
- Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not.

---

`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
60 changes: 60 additions & 0 deletions docs/database/database-spans.md
@@ -11,6 +11,8 @@ linkTitle: Client Calls
<!-- toc -->

- [Name](#name)
- [Status](#status)
- [Recording exception events](#recording-exception-events)
- [Common attributes](#common-attributes)
- [Notes and well-known identifiers for `db.system`](#notes-and-well-known-identifiers-for-dbsystem)
- [Sanitization of `db.query.text`](#sanitization-of-dbquerytext)
@@ -85,6 +87,62 @@ and SHOULD adhere to one of the following values, provided they are accessible:
If a corresponding `{target}` value is not available for a specific operation, the instrumentation SHOULD omit the `{target}`.
For example, for an operation describing a SQL query on an anonymous table such as `SELECT * FROM (SELECT * FROM table) t`, the span name should be `SELECT`.

## Status

[Span Status Code][SpanStatus] MUST be left unset if the operation has ended without any errors.

Instrumentation SHOULD consider the operation as failed if any of the following is true:

- the `db.response.status_code` value indicates an error

> [!NOTE]
>
> The classification of a status code as an error depends on the context.
> For example, a SQLSTATE `02000` (`no_data`) indicates an error when the application
> expected the data to be available. However, it is not an error when the
> application is simply checking whether the data exists.
>
> Instrumentations that have additional context about a specific operation MAY use
> this context to set the span status more precisely.
> Instrumentations that don't have any additional context MUST follow the
> guidelines in this section.

- an exception is thrown by the instrumented method call
- the instrumented method returns an error in another way

When the operation ends with an error, instrumentation:

- SHOULD set the span status code to `Error`
- SHOULD set the `error.type` attribute
- SHOULD set the span status description when it has additional information
about the error that is not expected to contain sensitive details and aligns
with the [Span Status Description][SpanStatus] definition.

It's NOT RECOMMENDED to duplicate `db.response.status_code` or `error.type`
in the span status description.

When the operation fails with an exception, the span status description SHOULD be set to
the exception message.
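
The decision logic in this section can be sketched as a small pure function. This is an illustration under stated assumptions rather than a reference implementation: `db_span_outcome` and its parameters are hypothetical, the caller is assumed to have already decided whether `db.response.status_code` indicates an error, the fallback `error.type` value `_OTHER` is used when nothing more specific is supplied, and the class-name formatting for exceptions is one possible choice.

```python
def db_span_outcome(status_code_is_error=False, exception=None, error_type="_OTHER"):
    """Return the span status fields for a finished database operation.

    Hypothetical helper: it returns a plain dict instead of touching a real span.
    """
    if exception is not None:
        cls = type(exception)
        # Assumption: report built-in exceptions by bare class name and
        # everything else by module-qualified class name.
        if cls.__module__ == "builtins":
            name = cls.__qualname__
        else:
            name = f"{cls.__module__}.{cls.__qualname__}"
        return {
            "status_code": "Error",
            "error.type": name,
            # The description is the exception message when an exception was thrown.
            "description": str(exception),
        }
    if status_code_is_error:
        # No description here: repeating the status code or error.type
        # in the description is not recommended.
        return {"status_code": "Error", "error.type": error_type, "description": None}
    # The operation ended without errors: leave the status unset.
    return {"status_code": "Unset", "error.type": None, "description": None}
```

Keeping the classification of the status code outside this function reflects the note above: whether a given code is an error can depend on context that only the caller has.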

### Recording exception events

**Status**: [Experimental][DocumentStatus]

When the operation fails with an exception, instrumentation SHOULD record
an [exception event](../exceptions/exceptions-spans.md) by default if, and only if,
the span being recorded is a local root span (does not have a local parent).

> [!NOTE]
>
> Exception stack traces could be very long and are expensive to capture and store.
> Exceptions which are not handled by instrumented libraries are likely to be handled
> and logged by the caller.
> Exceptions that are not handled will be recorded by the outermost (local root)
> instrumentation such as an HTTP or gRPC server.

Instrumentation MAY provide a configuration option to record exceptions that
escape the surface of the instrumented API.
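
As a rough sketch of the default described above, the check reduces to one boolean expression. The function name and the `record_escaped_exceptions` option are hypothetical; the option stands in for whatever configuration switch an instrumentation chooses to expose.

```python
def should_record_exception_event(has_local_parent, record_escaped_exceptions=False):
    """Decide whether to record an exception event for a failed operation.

    By default, record only on local root spans (no local parent).
    The hypothetical opt-in flag also records on non-root spans.
    """
    return (not has_local_parent) or record_escaped_exceptions
```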

## Common attributes

These attributes will usually be the same for all operations performed over the same database connection.
@@ -309,6 +367,7 @@ The `db.query.summary` attribute captures a shortened representation of a query
which SHOULD have low-cardinality and SHOULD NOT contain any dynamic or sensitive data.

> [!NOTE]
>
> The `db.query.text` attribute is intended to identify individual queries. Even though
> it is sanitized if captured by default, it could still have high cardinality and
> might reach hundreds of lines.
@@ -418,3 +477,4 @@ More specific Semantic Conventions are defined for the following database techno
* [SQL](sql.md): Semantic Conventions for *SQL* databases.

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
[SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.37.0/specification/trace/api.md#set-status
24 changes: 24 additions & 0 deletions docs/system/container-metrics.md
@@ -12,6 +12,29 @@ This document describes instruments and attributes for common container level
metrics in OpenTelemetry. These metrics are collected from technology-specific,
well-defined APIs (e.g. Kubelet's API or container runtimes).

### Metric: `container.uptime`

This metric is [recommended][MetricRecommended].

<!-- semconv metric.container.uptime -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `container.uptime` | Gauge | `s` | The time the container has been running [1] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1]:** Instrumentations SHOULD use a gauge with type `double` and measure uptime in seconds as a floating point number with the highest precision available.
The actual accuracy would depend on the instrumentation and operating system.

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
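
One way to produce a value for this gauge is to subtract the start time from the current time and report the difference as a floating-point number of seconds. The sketch below assumes the start time is available as a Unix timestamp in seconds; `uptime_seconds` is a hypothetical helper, and clamping negative results to zero (for example, after a clock adjustment) is a design choice of this sketch, not something the convention requires.

```python
import time


def uptime_seconds(start_time_unix_s, now_unix_s=None):
    """Return uptime in seconds as a float (hypothetical helper).

    start_time_unix_s: start time as a Unix timestamp in seconds.
    now_unix_s: current time; defaults to the system clock.
    """
    if now_unix_s is None:
        now_unix_s = time.time()
    # Assumption: never report a negative uptime if clocks disagree.
    return max(0.0, float(now_unix_s) - float(start_time_unix_s))
```

The returned value would be recorded on a gauge of type `double` with unit `s`.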

### Metric: `container.cpu.time`

This metric is [opt-in][MetricOptIn].
@@ -198,3 +221,4 @@ This metric is [opt-in][MetricOptIn].

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
[MetricOptIn]: /docs/general/metric-requirement-level.md#opt-in
[MetricRecommended]: /docs/general/metric-requirement-level.md#recommended
46 changes: 46 additions & 0 deletions docs/system/k8s-metrics.md
@@ -15,6 +15,29 @@ well-defined APIs (e.g. Kubelet's API).
Metrics in `k8s.` instruments SHOULD be attached to a [K8s Resource](/docs/resource/k8s.md)
and therefore inherit its attributes, like `k8s.pod.name` and `k8s.pod.uid`.

### Metric: `k8s.pod.uptime`

This metric is [recommended][MetricRecommended].

<!-- semconv metric.k8s.pod.uptime -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `k8s.pod.uptime` | Gauge | `s` | The time the Pod has been running [1] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1]:** Instrumentations SHOULD use a gauge with type `double` and measure uptime in seconds as a floating point number with the highest precision available.
The actual accuracy would depend on the instrumentation and operating system.

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Metric: `k8s.pod.cpu.time`

This metric is [recommended][MetricRecommended].
@@ -149,6 +172,29 @@ This metric is [recommended][MetricRecommended].
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Metric: `k8s.node.uptime`

This metric is [recommended][MetricRecommended].

<!-- semconv metric.k8s.node.uptime -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

| Name | Instrument Type | Unit (UCUM) | Description | Stability |
| -------- | --------------- | ----------- | -------------- | --------- |
| `k8s.node.uptime` | Gauge | `s` | The time the Node has been running [1] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1]:** Instrumentations SHOULD use a gauge with type `double` and measure uptime in seconds as a floating point number with the highest precision available.
The actual accuracy would depend on the instrumentation and operating system.

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Metric: `k8s.node.cpu.time`

This metric is [recommended][MetricRecommended].