Collector docs on single-writer principle #4433

Closed
Changes from 12 commits
Commits
69 commits
b6db3c4
Add some additional collector deployment docs
michael2893 May 7, 2024
b85b4c9
tweak a couple of the formatting issues
michael2893 May 7, 2024
c7c192e
one last small change to the header title (added a 'when')
michael2893 May 7, 2024
2d205ab
update documentation on multiple collectors. move to a new section
michael2893 May 8, 2024
dfb03f2
Revert gateway document to original
michael2893 May 8, 2024
7517654
small gramatical nitpick
michael2893 May 8, 2024
c5cae81
update some of the formatting under the links
michael2893 May 8, 2024
198e937
update scaling docs to include SWP, and update link references
michael2893 May 8, 2024
ca2b41a
move scaling section to proper directory
michael2893 May 8, 2024
57400a8
remove redundant reference to tail-sampling. It's already covered ear…
michael2893 May 9, 2024
f1a7d4b
revert accidental change to sampling doc
michael2893 May 9, 2024
af60cba
Merge branch 'main' into michael2893-update-collector-documentation
michael2893 Jun 8, 2024
b46d3b6
Merge branch 'main' into michael2893-update-collector-documentation
michael2893 Aug 7, 2024
8abfd04
update section in gateway docs, remove separate SWP doc
Aug 23, 2024
2676b5c
Merge branch 'main' into michael2893-update-collector-documentation
michael2893 Aug 23, 2024
ec0aa3d
merge main
michael2893 Aug 24, 2024
71f6c65
update heading
michael2893 Aug 24, 2024
26501cb
Merge branch 'main' of https://github.com/michael2893/opentelemetry.i…
Sep 8, 2024
4962ccf
Auto-update registry versions (c7b7b764a9d7f7e68118c6e4af5442b3730bc0…
opentelemetrybot Aug 24, 2024
bd4aa08
Create instrumentation-go-kafka.yml (#5096)
jurabek Aug 25, 2024
ec32cf1
Auto-update registry versions (e834e85c8e142adda812f685081f8cc9fdbc65…
opentelemetrybot Aug 26, 2024
f983cdf
[pt] Add content/pt/docs/concepts/instrumentation/code-based.md (#5093)
janssenlima Aug 26, 2024
c445f36
Fix name of jaeger_remote polling interval property (#5100)
drewhammond Aug 26, 2024
11bebfa
[pt] Add /content/pt/docs/concepts/semantic-conventions.md (#5047)
igorestevanjasinski Aug 26, 2024
4685784
Update NPM packages, including Hugo to 0.133.0 (#5106)
chalin Aug 26, 2024
66a17a5
Add context import to examples that use "context" (#5105)
MrAlias Aug 26, 2024
940db05
Add Go OTLP over gRPC exporter section (#5104)
MrAlias Aug 26, 2024
6dc98d3
Add blog post for go.opentelemetry.io switch (#5087)
damemi Aug 27, 2024
be09012
Revert "Fix name of jaeger_remote polling interval property" (#5103)
drewhammond Aug 27, 2024
7765fb0
Add Checkmk App Integration Into the Registry (#5079)
LiraLemur Aug 27, 2024
14eec43
Update integrations.md (#5110)
tedder Aug 27, 2024
64227d3
Auto-update registry versions (3b751c457df60d2dd89d1c99ad5edf6eddd1e3…
opentelemetrybot Aug 27, 2024
ca1fd1c
[es] feature: contributing - spanish translation (#5058)
krol3 Aug 27, 2024
259489e
Update opentelemetry-collector-releases version to v0.108.0 (#5113)
opentelemetrybot Aug 27, 2024
1a83914
[zh] add blog docs-localized.md (#5112)
shalk Aug 28, 2024
187e763
[es] feature: demo structure - spanish translation (#5053)
krol3 Aug 28, 2024
83472b4
Auto-update registry versions (49a605df5a62461493877154c3bc9591c2969c…
opentelemetrybot Aug 29, 2024
666dd5a
Custom collector page: link to tagged releases (#5118)
chalin Aug 29, 2024
7f45f12
[CI] After a `fix:*` command, tell user to rerun full checks (#5122)
chalin Aug 29, 2024
85371c7
[CI] PR-actions: escape PR comment special char (#5126)
chalin Aug 30, 2024
060998c
Mention sync gauge in Otel Go metrics docs (#5116)
cijothomas Aug 30, 2024
633d862
[CI] PR-actions: multiline comment fix (#5127)
chalin Aug 30, 2024
9e8961f
Update vendor list to include Arize Phoenix (#5129)
Jgilhuly Aug 30, 2024
45b87dc
Create Haystack OpenInference Registry Entry (#5128)
Jgilhuly Aug 30, 2024
c0e34fa
Add MercadoLibre as adopter (#5117)
vitorvasc Aug 30, 2024
ba50f9c
add last9 to otel vendors.yaml (#5121)
sahilk Aug 31, 2024
503caee
[pt] Add /pt/docs/concepts/signals/logs.md (#5062)
EzzioMoreira Sep 2, 2024
5f8fff1
Bump @opentelemetry/exporter-trace-otlp-http from 0.52.1 to 0.53.0 (#…
dependabot[bot] Sep 2, 2024
54751a0
Bump @opentelemetry/instrumentation from 0.52.1 to 0.53.0 (#5136)
dependabot[bot] Sep 2, 2024
4c03ab7
correct metric type for http req/res sizes (#5141)
jamesmoessis Sep 3, 2024
787837e
Auto-update registry versions (1922be899a97dc57f05399208a33a78d76ee6c…
opentelemetrybot Sep 3, 2024
993b0eb
Replace "dynamically injects bytecode" in Python zero-code instrument…
CFly17 Sep 3, 2024
5e9db1c
Add GC Election 2024 announcement blog post (#5133)
danielgblanco Sep 3, 2024
0589ca7
Add operator runbooks (#5131)
bogdan-at-adobe Sep 3, 2024
1d19cb1
Bump markdownlint from 0.34.0 to 0.35.0 (#5135)
dependabot[bot] Sep 3, 2024
49bb772
Auto-update registry versions (afeb3dc251b2b6eadef7f4e1baeebc44e49846…
opentelemetrybot Sep 4, 2024
3cfd724
[CI] Report an error when URLs are missing from an integrations regis…
chalin Sep 4, 2024
8a72552
[cleanup] Remove unnecessary aliases (#5145)
chalin Sep 4, 2024
5d629bb
chore(docs): Update getting started to not use Bandit as it is not re…
pdgonzalez872 Sep 4, 2024
9b36c17
Create blog post about Prometheus and OpenTelemetry (#4119)
reese-lee Sep 4, 2024
3f43ad7
[chore] Remove references to the logging exporter (#5143)
TylerHelmuth Sep 4, 2024
8f38791
Auto-update registry versions (a01432fe2d99316cf2bf6aa4f60979441edf2a…
opentelemetrybot Sep 7, 2024
5528e7c
Collector internal telemetry updates (#4867)
danelson Sep 7, 2024
7a11378
[ux] Rework OTel-highlights ribbon, move Integrations to ribbon (#5156)
chalin Sep 8, 2024
a4dcb49
Merge branch 'michael2893-update-collector-documentation' of https://…
michael2893 Sep 8, 2024
81ebef9
Merge branch 'main' into michael2893-update-collector-documentation
michael2893 Sep 8, 2024
3736b0a
Merge branch 'michael2893-update-collector-documentation' of https://…
michael2893 Sep 8, 2024
06ad689
Merge branch 'main' into michael2893-update-collector-documentation
michael2893 Sep 8, 2024
a6c9b65
Merge branch 'michael2893-update-collector-documentation' of https://…
michael2893 Sep 8, 2024
71 changes: 71 additions & 0 deletions content/en/docs/collector/deployment/multiple-collectors.md
@@ -0,0 +1,71 @@
---
title: Multiple Collectors
description:
  Considerations for single-writer responsibility when deploying multiple
  collectors in a gateway configuration.
weight: 3
---

## Deploying Multiple Collectors

When deploying multiple collectors in a gateway configuration, it's important to
ensure that all metric data streams have a single writer and a globally unique
identity.

### The Single-Writer Principle

The Single-Writer Principle refers to employing a single logical writer for a
particular resource. Concurrent access from multiple applications that modify or
report on the same data can lead to data loss or, at least, degraded data
quality. In gateway collector deployments, applying this principle guards
against sending inconsistent data to the backend. All metric data streams within
OTLP must have a
[single writer](/docs/specs/otel/metrics/data-model/#single-writer).
In a system with multiple collectors, the single-writer principle is most
relevant for receivers that create their own metrics, such as pull-based scrapers
or a host metrics receiver.

### Deployment Considerations

#### Host Metrics Receiver

When creating metrics related to the host system via the
[host metrics receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver),
it is important to have only one host metrics receiver instance per host. A
violation of the single-writer principle in this scenario would mean deploying
more than one host metrics receiver on the same host. If both try to collect
system data at the same time, this may result in inconsistent data or data loss.
Collisions resulting from inconsistent timestamps may lead to an unstable or
inconsistent representation of metrics, such as CPU usage.

### Detection

Certain patterns in the data can indicate that the single-writer principle is
being violated. For example, unexplained gaps or jumps within a single series
may be a clue that multiple collectors are sending the same samples. Unexplained
behavior in a time series could also point to the backend scraping data from
multiple sources.

There are also more direct errors that could surface in the backend.

With a Prometheus backend, an example error is:
`Error on ingesting out-of-order samples`.

This could indicate that identical targets exist in two jobs, and the order of
the timestamps is incorrect.

For example:

- Metric T2 received at time 13:56:04
- Metric T1 received at time 13:56:07 for the same state as T2

I am unsure if this is a good example. I am not very familiar with Prometheus, but this seems like a different issue: you can have out of order points (I believe the OTLP data model does not explicitly forbid this) while still not running into single-writer issues.



I think a better example would be:

Suggested change
- Metric T2 received at time 13:56:04
- Metric T1 received at time 13:56:07 for the same state as T2
- Metric `M1` received at time 13:56:04 with value `100`
- Metric `M1` received at time 13:56:24 with value `120`
- Metric `M1` received at time 13:56:04 with value `110`
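
A duplicate writer shows up in this kind of data as two different values for the same series at the same timestamp. The following is a minimal Python sketch of that check, not a definitive implementation. The `(series, timestamp, value)` tuple layout and the `find_conflicts` helper are assumptions made for illustration and are not part of any collector or Prometheus API.

```python
from collections import defaultdict


def find_conflicts(samples):
    """Return {(series, timestamp): sorted values} for every point that
    was reported with more than one distinct value."""
    seen = defaultdict(set)
    for series, timestamp, value in samples:
        seen[(series, timestamp)].add(value)
    # A single writer can only produce one value per series and timestamp.
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Hypothetical samples mirroring the three `M1` points listed above.
samples = [
    ("M1", "13:56:04", 100),
    ("M1", "13:56:24", 120),
    ("M1", "13:56:04", 110),
]

conflicts = find_conflicts(samples)
print(conflicts)  # {('M1', '13:56:04'): [100, 110]}
```

Out-of-order arrival alone does not trigger this check; only conflicting values for the same point do.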

### Prevention

All metric streams produced by OTel SDKs should have a globally unique
[Metric Identity](/docs/specs/otel/metrics/data-model/#opentelemetry-protocol-data-model-producer-recommendations).
This lowers the risk of duplication and ensures that writers send unique data to
the backend.
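
To make the idea of identity more concrete, here is a minimal Python sketch that groups data points by a simplified stream identity and reports any identity claimed by more than one writer. It rests on several assumptions: identity is reduced to resource attributes, scope name, metric name, and point attributes; the writer names (`collector-1` and so on) are hypothetical bookkeeping that OTLP itself does not carry; and both helper functions are illustrative only.

```python
from collections import defaultdict


def stream_identity(resource, scope, name, attributes):
    """Build a hashable key from the parts assumed here to identify a stream."""
    return (
        tuple(sorted(resource.items())),
        scope,
        name,
        tuple(sorted(attributes.items())),
    )


def streams_with_multiple_writers(points):
    """Map each stream identity to its writers, keeping only shared streams."""
    writers = defaultdict(set)
    for writer, resource, scope, name, attributes in points:
        writers[stream_identity(resource, scope, name, attributes)].add(writer)
    return {key: sorted(names) for key, names in writers.items() if len(names) > 1}


cpu_attributes = {"cpu": "cpu0", "state": "user"}
points = [
    # Two collectors run a host metrics receiver on the same host (violation).
    ("collector-1", {"host.name": "node-a"}, "hostmetrics", "system.cpu.time", cpu_attributes),
    ("collector-2", {"host.name": "node-a"}, "hostmetrics", "system.cpu.time", cpu_attributes),
    # A third collector on a different host produces a distinct stream.
    ("collector-3", {"host.name": "node-b"}, "hostmetrics", "system.cpu.time", cpu_attributes),
]

shared = streams_with_multiple_writers(points)
print(list(shared.values()))  # [['collector-1', 'collector-2']]
```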

### References

- [Understanding Duplicate Samples and Out-of-order Timestamp Errors in Prometheus](https://promlabs.com/blog/2022/12/15/understanding-duplicate-samples-and-out-of-order-timestamp-errors-in-prometheus)
60 changes: 60 additions & 0 deletions content/en/docs/collector/scaling.md
@@ -382,3 +382,63 @@ service:
exporters:
- loadbalancing
```

### The Single-Writer Principle

The Single-Writer Principle refers to employing a single logical writer for a
particular resource. When scaling collectors horizontally, it's important to
distinguish between targets using unique identities.


#### Pull-based Scraping

In pull-based metric reporting, every metric must keep a unique identity. When
scaling scrapers, ensure that the targets have globally unique references. In
simpler environments, sharding based on namespace or workload alone may be
enough. As the system increases in complexity, consider adding custom labels
related to the application or service to better delineate the targets.

Here is an example of how to configure target labels in a collector
configuration that uses a Prometheus receiver.

```yaml
# config.yaml example
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector-dev-test-service-frontend'
          static_configs:
            - targets: ['test-service:metrics-port']
          relabel_configs:
            # `replace` copies the source label value into `target_label`;
            # `keep` would only filter targets and ignore `target_label`.
            # `__meta_kubernetes_*` labels are only populated when targets
            # come from Kubernetes service discovery.
            - source_labels: ['__meta_kubernetes_namespace']
              target_label: 'namespace'
              action: replace
            - source_labels: ['__meta_kubernetes_service_label_tier']
              target_label: 'tier'
              action: replace

        - job_name: 'otel-collector-dev-test-service-backend'
          static_configs:
            - targets: ['test-service:metrics-port']
          relabel_configs:
            - source_labels: ['__meta_kubernetes_namespace']
              target_label: 'namespace'
              action: replace
            - source_labels: ['__meta_kubernetes_service_label_tier']
              target_label: 'tier'
              action: replace

exporters:
  otlp:
    endpoint: my.sample:4317
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```
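
Labels make targets distinguishable, but each target should also be scraped by exactly one collector. One way to get there is to shard targets by a stable hash, similar in spirit to the `hashmod` relabel action in Prometheus. The Python sketch below only illustrates the idea; the target names, the collector count, and the `assign_collector` and `shard_targets` helpers are all hypothetical.

```python
import hashlib


def assign_collector(target, collector_count):
    """Map a target to a collector index using a hash that is stable across
    processes (unlike Python's built-in hash() for strings)."""
    digest = hashlib.sha256(target.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % collector_count


def shard_targets(targets, collector_count):
    """Return {collector index: [targets]} with every target assigned once."""
    shards = {index: [] for index in range(collector_count)}
    for target in targets:
        shards[assign_collector(target, collector_count)].append(target)
    return shards


# Hypothetical scrape targets and a pool of three collectors.
targets = [
    "frontend-0:9090",
    "frontend-1:9090",
    "backend-0:9090",
    "backend-1:9090",
]
shards = shard_targets(targets, 3)
for index, assigned in shards.items():
    print(index, assigned)
```

Because the assignment depends only on the target name and the collector count, every collector can compute it independently and arrive at the same split. Changing the collector count reshuffles most targets, which a production setup would need to account for.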