Collector docs on single-writer principle #4433

michael2893
Summary

This change addresses the request in #4368 for documentation on the single-writer principle.

Description

  • add section on multiple collector deployments in deployment/gateway
  • define single writer principle
  • provide examples and context

Open questions

  • Can I provide examples from open issues to help better capture this problem?

@michael2893 michael2893 changed the title from "Michael2893 update collector documentation" to "#4368 - update collector deployment documentation" on May 7, 2024
@michael2893 michael2893 marked this pull request as ready for review May 7, 2024 01:12
@michael2893 michael2893 requested review from a team and atoulme and removed request for a team May 7, 2024 01:12
@svrnm
Member

svrnm commented May 7, 2024

@open-telemetry/collector-approvers ptal


There is a gateway deployment configured to handle all traffic for three other collectors in the same system.
If the collectors are not uniquely identified and the SDK fails to distinguish between them, they may
send identical data to the gateway collector from different points in time. In this scenario,
Member

Can you give a more concrete example here? Running multiple collector instances behind a load balancer is certainly common practice, and there's no inherent problem with an SDK sending data through that load balancer, even though different data points for the same workload then land on different collector instances.

There are a few situations that need to be accounted for when scaling, like using the target allocator for pull-based scraping (nothing to do with OTLP though), or tail sampling (because that component is stateful).
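
To illustrate the statefulness point: a tail sampler can only decide on a trace if every span of that trace reaches the same collector instance. The sketch below routes spans by hashing the trace ID. It is a toy under stated assumptions, not how any collector component is implemented: `pick_instance` and the instance names are hypothetical, and plain modulo hashing is used for brevity even though it reshuffles traces whenever the instance list changes.

```python
import hashlib
from typing import Sequence


def pick_instance(trace_id: str, instances: Sequence[str]) -> str:
    """Map a trace ID to one instance so all spans of a trace stay together."""
    # Hash the trace ID and reduce it to an index into the instance list.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Hypothetical instance names, for illustration only.
instances = ["collector-0", "collector-1", "collector-2"]

# Two spans that belong to the same trace.
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
first_span_target = pick_instance(trace_id, instances)
second_span_target = pick_instance(trace_id, instances)
```

Both spans resolve to the same instance because the routing key is the trace ID rather than the connection or the individual span.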

Author

Yeah, what I have here isn't really specific enough. I can certainly provide an example here.

content/en/docs/collector/deployment/gateway.md Outdated
There are patterns in the data that may provide some insight into whether this is happening or not.
For example, upon visual inspection, unexplained gaps or jumps in a series may be a clue that
multiple collectors are sending the same samples. Unexplained behavior in a time series could potentially
point to the backend scraping data from multiple sources.
Member

Another common way to find this out is when the backend complains about "out of order samples": if a data point for the state of a counter at T2 was received, and a data point for the state of the same counter at T1 arrives later, a backend might discard the late data point.
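
A minimal sketch of that T2-then-T1 case, assuming a toy backend that keeps only the newest timestamp per series. `ToyBackend`, `series_id`, the metric name, and the `collector` attribute are all hypothetical and do not reflect any real backend's API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """Hypothetical backend that tracks the newest timestamp per series."""
    last_seen: dict = field(default_factory=dict)
    rejected: list = field(default_factory=list)

    def write(self, series: tuple, timestamp: int, value: float) -> bool:
        # Reject a point that is older than the newest one stored for this series.
        newest = self.last_seen.get(series)
        if newest is not None and timestamp < newest:
            self.rejected.append((series, timestamp, value))
            return False
        self.last_seen[series] = timestamp
        return True


def series_id(name: str, attributes: dict) -> tuple:
    # A series is identified by its name plus its sorted attributes.
    return (name, tuple(sorted(attributes.items())))


backend = ToyBackend()

# Two collectors export the same counter with identical attributes,
# so the backend sees a single series with two writers.
shared = series_id("http_requests_total", {"service": "checkout"})
backend.write(shared, timestamp=2, value=20.0)  # one collector delivers T2 first
accepted = backend.write(shared, timestamp=1, value=10.0)  # the other delivers T1 late

# A distinguishing attribute gives each collector its own series.
a = series_id("http_requests_total", {"service": "checkout", "collector": "a"})
b = series_id("http_requests_total", {"service": "checkout", "collector": "b"})
ok_a = backend.write(a, timestamp=2, value=20.0)
ok_b = backend.write(b, timestamp=1, value=10.0)
```

With identical attributes the late T1 point is rejected. Once each collector writes to its own series, both points are accepted, which is the intent of keeping a single writer per series.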

@michael2893 michael2893 changed the title from "#4368 - update collector deployment documentation" to "#4368 - update collector documentation on Single-writer principle" on May 8, 2024
@michael2893 michael2893 requested a review from jpkrohling May 8, 2024 12:40
content/en/docs/collector/scaling.md Outdated
@michael2893 michael2893 requested a review from jpkrohling May 9, 2024 12:16
@chalin chalin force-pushed the michael2893-update-collector-documentation branch from 18efeda to f1a7d4b Compare June 8, 2024 09:38
@chalin chalin changed the title from "#4368 - update collector documentation on Single-writer principle" to "Collector docs on single-writer principle" on Jun 8, 2024
@chalin
Contributor

chalin commented Jun 8, 2024

/fix:all

@opentelemetrybot
Collaborator

You triggered fix:all action run at https://github.com/open-telemetry/opentelemetry.io/actions/runs/9427840190

Jgilhuly and others added 27 commits September 8, 2024 09:25
@michael2893
Author

There was an issue with squashing the commits from the invalid email here, so I moved the change to #5166.
