Collector docs on single-writer principle #4433
Closed · michael2893 wants to merge 69 commits into open-telemetry:main from michael2893:michael2893-update-collector-documentation
content/en/docs/collector/deployment/multiple-collectors.md (71 additions, 0 deletions)

---
title: Multiple Collectors
description:
  Considerations for single-writer responsibility when deploying multiple
  collectors in a gateway configuration.
weight: 3
---

## Deploying Multiple Collectors

When deploying multiple collectors in a gateway configuration, it's important to
ensure that all metric data streams have a single writer and a globally unique
identity.

### The Single-Writer Principle

The Single-Writer Principle refers to employing a single logical writer for a
particular resource. Concurrent access from multiple applications that modify or
report on the same data can lead to data loss or, at the very least, degraded
data quality. In gateway collector deployments, applying this principle guards
against sending inconsistent data to the backend. All metric data streams within
OTLP must have a
[single writer](/docs/specs/otel/metrics/data-model/#single-writer).
In a system with multiple collectors, the single-writer principle is most
relevant for receivers that create their own metrics, such as pull-based
scrapers or the host metrics receiver.
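
To make the data-loss risk concrete, the following minimal Python sketch models a backend that keeps only the latest point per stream identity. Everything in it is hypothetical: the `key` and `ingest` helpers and the writer names are illustrative, and the identity is simplified (the OTLP data model also includes the resource and instrumentation scope). Real backends behave differently, but any store keyed by stream identity cannot tell two writers for that identity apart.

```python
from typing import Dict, List, Tuple

# Simplified, hypothetical stream identity: (metric name, sorted attribute pairs).
# The OTLP data model also includes resource, scope, and other fields.
StreamKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def key(name: str, attrs: Dict[str, str]) -> StreamKey:
    """Build a hashable identity from a metric name and its point attributes."""
    return (name, tuple(sorted(attrs.items())))


def ingest(
    points: List[Tuple[str, StreamKey, float]]
) -> Dict[StreamKey, Tuple[str, float]]:
    """Keep only the last received point per stream (last writer wins)."""
    latest: Dict[StreamKey, Tuple[str, float]] = {}
    for writer, stream, value in points:
        # A second writer for the same identity silently replaces the first.
        latest[stream] = (writer, value)
    return latest


cpu = key("system.cpu.time", {"cpu": "cpu0", "state": "user"})
result = ingest([
    ("collector-a", cpu, 120.0),
    ("collector-b", cpu, 95.0),  # same identity, different writer and value
])
print(result[cpu])  # ('collector-b', 95.0): collector-a's point is gone
```

Because both writers map to one key, the backend keeps a single series and one writer's data is lost.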

### Deployment Considerations

#### Host Metrics Receiver

When creating metrics related to the host system via the
[host metrics receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver),
it is important to run only one host metrics receiver instance per host.
Deploying more than one host metrics receiver on the same host violates the
single-writer principle: if two instances collect the same system data at the
same time, the result may be inconsistent data or data loss. Collisions caused
by inconsistent timestamps may also lead to an unstable or inconsistent
representation of metrics such as CPU usage.
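
The timestamp collisions described above can be illustrated with a small Python simulation. It assumes two hypothetical host metrics receiver instances on one host that sample the same cumulative CPU counter 5 seconds apart, with one instance's export path adding 8 seconds of latency. A backend that stores points in arrival order then sees a cumulative counter that appears to decrease, which is typically interpreted as a counter reset. All numbers are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (timestamp in seconds, cumulative CPU seconds)


def true_cpu_seconds(t: float) -> float:
    """Hypothetical host: one core busy 50% of the time."""
    return 0.5 * t


def scrape(
    start: float, interval: float, count: int, delay: float
) -> List[Tuple[float, Point]]:
    """Return (arrival_time, point) pairs for one hypothetical receiver instance."""
    out: List[Tuple[float, Point]] = []
    for i in range(count):
        ts = start + i * interval
        out.append((ts + delay, (ts, true_cpu_seconds(ts))))
    return out


# Two receivers on the same host write the same stream, offset by 5 s;
# receiver B's export path is assumed to add 8 s of latency.
a = scrape(start=0, interval=10, count=4, delay=0)
b = scrape(start=5, interval=10, count=4, delay=8)

# The backend appends points in arrival order.
arrived = [p for _, p in sorted(a + b, key=lambda x: x[0])]
values = [v for _, v in arrived]
decreases = sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)
print(values)     # [0.0, 5.0, 2.5, 10.0, 7.5, 15.0, 12.5, 17.5]
print(decreases)  # 3 apparent counter resets in a series that never reset
```

Each receiver's own points are correct; the instability comes only from merging two writers into one stream.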

### Detection

Some patterns in the data may indicate whether this is happening. For example,
unexplained gaps or jumps in a series, visible on inspection, may be a clue that
multiple collectors are sending samples for the same series. More generally,
unexplained behavior in a time series could point to the backend scraping the
same data from multiple sources.
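
The following Python sketch turns these visual clues into simple heuristics for a cumulative series. The function name, the thresholds, and the notion of an `expected_interval` are assumptions for illustration, not part of any OpenTelemetry or backend API, and each signal can also have benign causes such as process restarts or network delays.

```python
from typing import Dict, List, Tuple


def single_writer_warnings(
    points: List[Tuple[float, float]], expected_interval: float
) -> Dict[str, int]:
    """Count patterns that may hint at multiple writers for one cumulative series.

    `points` are (timestamp, value) pairs in the order the backend received them.
    These heuristics are illustrative only.
    """
    counts = {"backwards_timestamps": 0, "value_drops": 0, "short_intervals": 0}
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t1 < t0:
            counts["backwards_timestamps"] += 1
        elif t1 - t0 < expected_interval / 2:
            # Points arriving much denser than one writer should produce them.
            counts["short_intervals"] += 1
        if v1 < v0:
            # A cumulative value going down looks like a counter reset.
            counts["value_drops"] += 1
    return counts


healthy = [(0, 0.0), (10, 5.0), (20, 10.0), (30, 15.0)]
interleaved = [(0, 0.0), (10, 5.0), (5, 2.5), (20, 10.0), (15, 7.5)]
print(single_writer_warnings(healthy, expected_interval=10))
# {'backwards_timestamps': 0, 'value_drops': 0, 'short_intervals': 0}
print(single_writer_warnings(interleaved, expected_interval=10))
# {'backwards_timestamps': 2, 'value_drops': 2, 'short_intervals': 0}
```

A non-zero count is a reason to check how many writers report the series, not proof of a violation.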

Some backends also surface more direct errors.

With a Prometheus backend, an example error is:
`Error on ingesting out-of-order samples`.

This error could indicate that identical targets exist in two jobs, so samples
for the same series arrive with their timestamps out of order.

For example:

- Metric T2 received at time 13:56:04
- Metric T1 received at time 13:56:07 for the same state as T2
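
A simplified Python model of the per-series check behind this kind of error is sketched below. It loosely follows Prometheus's default behavior, where a sample older than the newest stored sample for a series is rejected, and a second sample at the same timestamp is rejected only if its value differs. The `Series` class and its messages are hypothetical and are not Prometheus code; newer Prometheus versions can also be configured to accept out-of-order samples within a time window. The timestamps and values are illustrative.

```python
from datetime import time
from typing import List, Tuple


class Series:
    """Hypothetical per-series store that accepts samples only in timestamp order."""

    def __init__(self) -> None:
        self.samples: List[Tuple[time, float]] = []

    def append(self, ts: time, value: float) -> str:
        if self.samples:
            last_ts, last_value = self.samples[-1]
            if ts < last_ts:
                return "rejected: out-of-order sample"
            if ts == last_ts:
                if value == last_value:
                    return "ignored: identical duplicate"
                return "rejected: duplicate timestamp with a different value"
        self.samples.append((ts, value))
        return "ok"


series = Series()
print(series.append(time(13, 56, 4), 42.0))  # ok
print(series.append(time(13, 56, 1), 41.0))  # rejected: out-of-order sample
print(series.append(time(13, 56, 4), 43.0))  # rejected: duplicate timestamp ...
```

Two writers for one series make both rejection paths likely, but as noted in review, out-of-order points can also occur without a single-writer violation.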

### Prevention

All metric streams produced by OTel SDKs should have a globally unique
[Metric Identity](/docs/specs/otel/metrics/data-model/#opentelemetry-protocol-data-model-producer-recommendations).
This lowers the risk of duplication and ensures that each writer sends unique
data to the backend.
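
As a sketch of how a globally unique identity prevents collisions, the Python example below builds a simplified stream identity from resource attributes, metric name, and point attributes. It assumes two replicas of a hypothetical `checkout` service; without a per-instance resource attribute such as `service.instance.id` (a semantic-convention attribute), both replicas produce the same identity. The `identity`, `make_resource`, and `collides` helpers are hypothetical, and the real OTLP identity includes further fields such as the instrumentation scope.

```python
import uuid
from typing import Dict, FrozenSet, Tuple

Attrs = FrozenSet[Tuple[str, str]]
# Simplified identity: (resource attributes, metric name, point attributes).
Identity = Tuple[Attrs, str, Attrs]


def identity(resource: Dict[str, str], name: str, attrs: Dict[str, str]) -> Identity:
    return (frozenset(resource.items()), name, frozenset(attrs.items()))


def make_resource(with_instance_id: bool) -> Dict[str, str]:
    resource = {"service.name": "checkout"}  # hypothetical service
    if with_instance_id:
        # service.instance.id tells instances of one service apart;
        # a random UUID is one common choice of value.
        resource["service.instance.id"] = str(uuid.uuid4())
    return resource


def collides(with_instance_id: bool) -> bool:
    """Do two replicas of the same service produce the same stream identity?"""
    name = "http.server.request.duration"
    point_attrs = {"http.request.method": "GET"}
    first = identity(make_resource(with_instance_id), name, point_attrs)
    second = identity(make_resource(with_instance_id), name, point_attrs)
    return first == second


print(collides(with_instance_id=False))  # True: both replicas write one stream
print(collides(with_instance_id=True))   # False: each replica owns its stream
```

In practice, resource attributes are usually set through SDK configuration (for example the `OTEL_RESOURCE_ATTRIBUTES` environment variable) or resource detectors rather than in application code like this.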

### References

- [Understanding Duplicate Samples and Out-of-order Timestamp Errors in Prometheus](https://promlabs.com/blog/2022/12/15/understanding-duplicate-samples-and-out-of-order-timestamp-errors-in-prometheus)

Review comment: I am unsure whether this is a good example. I am not very familiar with Prometheus, but this seems like a different issue: you can have out-of-order points (I believe the OTLP data model does not explicitly forbid this) while still not running into single-writer issues.