From 22821f42a549be12bb46b6242877512f46093564 Mon Sep 17 00:00:00 2001 From: Severin Neumann Date: Fri, 21 Jun 2024 16:12:43 +0200 Subject: [PATCH 1/5] Update and fix textlinter (#4728) Signed-off-by: svrnm --- .textlintrc.yml | 4 ++-- content/en/blog/2023/end-user-q-and-a-01.md | 2 +- content/en/docs/collector/management.md | 4 ++-- content/en/docs/demo/_index.md | 2 +- content/en/docs/demo/feature-flags.md | 2 +- content/en/docs/languages/js/serverless.md | 2 +- content/en/docs/languages/php/instrumentation.md | 2 +- content/en/docs/languages/python/_index.md | 4 ++-- content/en/docs/languages/ruby/exporters.md | 2 +- content/en/docs/languages/ruby/getting-started.md | 2 +- package.json | 2 +- 11 files changed, 14 insertions(+), 14 deletions(-) diff --git a/.textlintrc.yml b/.textlintrc.yml index 95125fda5a7d..e70a4a84d709 100644 --- a/.textlintrc.yml +++ b/.textlintrc.yml @@ -8,8 +8,8 @@ filters: enablingComment: prettier-ignore-end allowlist: allow: - # Don't check registry .yml file fields for language, repo and tags: - - '/^\s*(?:language|repo|name|docs): .*$/m' + # Don't check registry .yml file fields for language, repo, url and tags: + - '/^\s*(?:language|repo|name|docs|url): .*$/m' - /^(?:tags):(\s*-.+$)*/m # Hugo template syntax: - /{{.*?}}/ diff --git a/content/en/blog/2023/end-user-q-and-a-01.md b/content/en/blog/2023/end-user-q-and-a-01.md index b0fae1f5b62a..ccb51f0cb0df 100644 --- a/content/en/blog/2023/end-user-q-and-a-01.md +++ b/content/en/blog/2023/end-user-q-and-a-01.md @@ -169,7 +169,7 @@ They are hoping to leverage [OpenTelemetry’s Exemplars](/docs/specs/otel/metrics/data-model/#exemplars) to link traces and metrics. -### How is the organization sending telemetry data to various observability back-ends? +### How is the organization sending telemetry data to various observability backends? J’s team uses a combination of the proprietary backend agent and the OpenTelemetry Collector (for metrics). They are one of the primary users of diff --git a/content/en/docs/collector/management.md b/content/en/docs/collector/management.md index d41987eea02b..7e4a7ce2247b 100644 --- a/content/en/docs/collector/management.md +++ b/content/en/docs/collector/management.md @@ -90,7 +90,7 @@ We will set up a simple OpAMP control plane consisting of an example OpAMP server and let an OpenTelemetry Collector connect to it via an example OpAMP supervisor. -First, clone the `open-telemetry/opamp-go` repo: +First, clone the `open-telemetry/opamp-go` repository: ```sh git clone https://github.com/open-telemetry/opamp-go.git @@ -159,7 +159,7 @@ service: ``` Now it's time to launch the supervisor (which in turn will launch your -OpenTelemetry collector): +OpenTelemetry Collector): ```console $ go run . diff --git a/content/en/docs/demo/_index.md b/content/en/docs/demo/_index.md index f4ed9dd79043..cf0cdf49607c 100644 --- a/content/en/docs/demo/_index.md +++ b/content/en/docs/demo/_index.md @@ -63,7 +63,7 @@ solve them. We'll be adding more scenarios over time. - Generate a [Product Catalog error](feature-flags) for `GetProduct` requests - with product id: `OLJCESPC7Z` using the Feature Flag service + with product ID: `OLJCESPC7Z` using the Feature Flag service - Discover a memory leak and diagnose it using metrics and traces. 
[Read more](scenarios/recommendation-cache/) diff --git a/content/en/docs/demo/feature-flags.md b/content/en/docs/demo/feature-flags.md index f7a798f4b625..4b9fd1c72999 100644 --- a/content/en/docs/demo/feature-flags.md +++ b/content/en/docs/demo/feature-flags.md @@ -19,7 +19,7 @@ change the `defaultVariant` value in the config file for a given flag to "on". | `adServiceManualGc` | Ad Service | Trigger full manual garbage collections in the ad service | | `adServiceHighCpu` | Ad Service | Trigger high cpu load in the ad service. If you want to demo cpu throttling, set cpu resource limits | | `cartServiceFailure` | Cart Service | Generate an error for `EmptyCart` 1/10th of the time | -| `productCatalogFailure` | Product Catalog | Generate an error for `GetProduct` requests with product id: `OLJCESPC7Z` | +| `productCatalogFailure` | Product Catalog | Generate an error for `GetProduct` requests with product ID: `OLJCESPC7Z` | | `recommendationServiceCacheFailure` | Recommendation | Create a memory leak due to an exponentially growing cache. 1.4x growth, 50% of requests trigger growth. | | `paymentServiceFailure` | Payment Service | Generate an error when calling the `charge` method. | | `paymentServiceUnreachable` | Checkout Service | Use a bad address when calling the PaymentService to make it seem like the PaymentService is unavailable. | diff --git a/content/en/docs/languages/js/serverless.md b/content/en/docs/languages/js/serverless.md index 629d26fe292d..c418b95e00c3 100644 --- a/content/en/docs/languages/js/serverless.md +++ b/content/en/docs/languages/js/serverless.md @@ -25,7 +25,7 @@ If you are interested in a plug and play user experience, see ### Dependencies -First, create an empty package.json: +First, create an empty `package.json`: ```sh npm init -y diff --git a/content/en/docs/languages/php/instrumentation.md b/content/en/docs/languages/php/instrumentation.md index f4e70fbfd8b5..a983898532c8 100644 --- a/content/en/docs/languages/php/instrumentation.md +++ b/content/en/docs/languages/php/instrumentation.md @@ -690,7 +690,7 @@ See [Exporters](/docs/languages/php/exporters) OpenTelemetry can be used to measure and record different types of metrics from an application, which can then be [pushed](/docs/specs/otel/metrics/sdk/#push-metric-exporter) to a metrics -service such as the OpenTelemetry collector: +service such as the OpenTelemetry Collector: - counter - async counter diff --git a/content/en/docs/languages/python/_index.md b/content/en/docs/languages/python/_index.md index 08ed02503454..072d7c3cfcad 100644 --- a/content/en/docs/languages/python/_index.md +++ b/content/en/docs/languages/python/_index.md @@ -61,8 +61,8 @@ pip install -e ./opentelemetry-sdk ## Repositories and benchmarks -- Main repo: [opentelemetry-python][] -- Contrib repo: [opentelemetry-python-contrib][] +- Main repository: [opentelemetry-python][] +- Contrib repository: [opentelemetry-python-contrib][] [opentelemetry-python]: https://github.com/open-telemetry/opentelemetry-python [opentelemetry-python-contrib]: diff --git a/content/en/docs/languages/ruby/exporters.md b/content/en/docs/languages/ruby/exporters.md index f5dd3779ec6e..42099925237d 100644 --- a/content/en/docs/languages/ruby/exporters.md +++ b/content/en/docs/languages/ruby/exporters.md @@ -115,7 +115,7 @@ end ``` If you now run your application, set the environment variable -`OTEL_TRACES_EXPORTER` to zipkin: +`OTEL_TRACES_EXPORTER` to Zipkin: ```sh env OTEL_TRACES_EXPORTER=zipkin rails server diff --git 
a/content/en/docs/languages/ruby/getting-started.md b/content/en/docs/languages/ruby/getting-started.md index 16430e6a8f6f..1839617fe3a1 100644 --- a/content/en/docs/languages/ruby/getting-started.md +++ b/content/en/docs/languages/ruby/getting-started.md @@ -34,7 +34,7 @@ For more elaborate examples, see [examples](/docs/languages/ruby/examples/). ### Dependencies -To begin, install rails: +To begin, install Rails: ```sh gem install rails diff --git a/package.json b/package.json index e07bc6d524f1..af4ab40adbc6 100644 --- a/package.json +++ b/package.json @@ -115,7 +115,7 @@ "textlint": "^14.0.4", "textlint-filter-rule-allowlist": "^4.0.0", "textlint-filter-rule-comments": "^1.2.2", - "textlint-rule-terminology": "^5.0.10", + "textlint-rule-terminology": "^5.0.15", "through2": "^4.0.2", "yargs": "^17.7.2" }, From a47ab10d1ab3631d9797fbc81bfcab47867f3f38 Mon Sep 17 00:00:00 2001 From: Yoshi Yamaguchi Date: Fri, 21 Jun 2024 23:19:11 +0900 Subject: [PATCH 2/5] [ja] translate OpenTelemetry primer (#4694) --- content/ja/docs/concepts/_index.md | 11 ++ .../ja/docs/concepts/observability-primer.md | 118 ++++++++++++++++++ 2 files changed, 129 insertions(+) create mode 100644 content/ja/docs/concepts/_index.md create mode 100644 content/ja/docs/concepts/observability-primer.md diff --git a/content/ja/docs/concepts/_index.md b/content/ja/docs/concepts/_index.md new file mode 100644 index 000000000000..a624699333d7 --- /dev/null +++ b/content/ja/docs/concepts/_index.md @@ -0,0 +1,11 @@ +--- +title: OpenTelemetryの概念 +linkTitle: 概念 +description: OpenTelemetryの重要概念 +aliases: [concepts/overview] +weight: 170 +default_lang_commit: ebd92bb +--- + +このセクションでは、OpenTelemetryプロジェクトのデータソースと主要な要素について説明します。 +これらのドキュメントを読めば、OpenTelemetryが動作原理について理解できるでしょう。 diff --git a/content/ja/docs/concepts/observability-primer.md b/content/ja/docs/concepts/observability-primer.md new file mode 100644 index 000000000000..5e25387c9f74 --- /dev/null +++ b/content/ja/docs/concepts/observability-primer.md @@ -0,0 +1,118 @@ +--- +title: Observability入門 +description: 重要なオブザーバビリティに関する概念 +weight: 9 +cSpell:ignore: webshop +default_lang_commit: ebd92bb +--- + +## オブザーバビリティとは何か {#what-is-observability} + +オブザーバビリティは、システムの内部構造を知らなくても、そのシステムについて質問することで、システムを外側から理解することを可能にします。 +さらに、真新しい問題、つまり「未知の未知」のトラブルシューティングや対処が容易になります。 +また、「なぜこのようなことが起こるのか」という疑問に答えるのにも役立ちます。 + +システムに関してこれらの質問をするためには、アプリケーションが適切に計装されていなければなりません。 +つまり、アプリケーションのコードは、[トレース](/docs/concepts/signals/traces/)、[メトリクス](/docs/concepts/signals/metrics/)、[ログ](/docs/concepts/signals/logs/)などの[シグナル](/docs/concepts/signals/)を発しなければなりません。 +開発者が問題をトラブルシュートするために計装を追加する必要がないとき、アプリケーションは適切に計装されていると言えます。 +なぜなら開発者が必要な情報をすべて持っているということになるからです。 + +[OpenTelemetry](/docs/what-is-opentelemetry/)は、システムをオブザーバビリティがある状態にするために、アプリケーションコードの計装を手助けする仕組みです。 + +## 信頼性とメトリクス + +**テレメトリー** とは、システムやその動作から送出されるデータのことです。 +データは[トレース](/docs/concepts/signals/traces/)、[メトリクス](/docs/concepts/signals/metrics/)、[ログ](/docs/concepts/signals/logs/)などの形式で得られます。 + +**信頼性** は「サービスがユーザーの期待通りに動いているでしょうか」といった疑問に答えてくれます。 +システムは常に100%稼働していても、ユーザーがショッピングカートに黒い靴を追加するために「カートに追加」をクリックしたときに、システムが常に黒い靴を追加するとは限らない場合、システムは **信頼性がない** と言えるでしょう。 + +**メトリクス** とは、インフラやアプリケーションに関する数値データを一定期間にわたって集計したものです。 +たとえば、システムエラー率、CPU使用率、あるサービスのリクエスト率などです。 +メトリクスとOpenTelemetryとの関係については、[メトリクス](/docs/concepts/signals/metrics/) のページを参照してください。 + +**SLI**(サービスレベル指標)は、サービスの動作の計測値を表します。 +優れたSLIは、ユーザーの視点からサービスを計測します。 +SLIの例として、ウェブページの読み込み速度が挙げられます。 + +**SLO**(サービスレベル目標)は、信頼性を組織や他のチームに伝達する手段を表します。 +これは、1つ以上のSLIをビジネス価値に付加することで達成されます。 + +## 分散トレースを理解する + 
+分散トレースにより、複雑な分散システムを通してリクエストが伝搬する様子を観察できます。 +分散トレースはアプリケーションやシステムの健全性の可視性を向上させ、ローカルで再現するのが困難な挙動をデバッグできます。 +これは、一般的に非決定論的な問題があったり、ローカルで再現するには複雑すぎる分散システムには不可欠です。 + +分散トレースを理解するには、ログ、スパン、トレースといった各要素の役割を理解する必要があります。 + +### ログ + +**ログ**は、サービスや他のコンポーネントが発するタイムスタンプ付きのメッセージです。 +[トレース](#distributed-traces)とは異なり、ログは必ずしも特定のユーザーリクエストやトランザクションに関連付けられているわけではありません。 +ログは、ソフトウェアのあらゆる場所で見られます。 +ログは、開発者と運用者の両方がシステムの挙動を理解するのに役立つため、これまで大いに利用されてきました。 + +次にあるのはログの例です。 + +```text +I, [2021-02-23T13:26:23.505892 #22473] INFO -- : [6459ffe1-ea53-4044-aaa3-bf902868f730] Started GET "/" for ::1 at 2021-02-23 13:26:23 -0800 +``` + +ログはコードの実行を追跡するには十分ではありません。 +ログには通常、どこから呼び出されたかといったコンテキスト情報が欠けているからです。 + +ログは、[スパン](#span)の一部として含まれるとき、あるいはトレースやスパンと相関があるときに、はるかに有用になります。 + +ログの詳細とOpenTelemetryとの関係については、[ログ](/docs/concepts/signals/logs/)のページを参照してください。 + +### スパン {#span} + +**スパン** は作業または操作の単位を表します。 +スパンは、リクエストが行う特定の操作を追跡し、その操作が実行された時間に何が起こったかを説明してくれます。 + +スパンには、名前、時間関連データ、[構造化ログメッセージ](/docs/concepts/signals/traces/#span-events)、[その他のメタデータ(つまり属性)](/docs/concepts/signals/traces/#attributes)が含まれ、追跡する操作に関する情報を提供します。 + +#### スパン属性 + +スパン属性はスパンに紐づけられたメタデータです。 + +次の表はスパン属性の例を列挙しています。 + +| キー | 値 | +| :-------------------------- | :--------------------------------------------------------------------------------- | +| `http.request.method` | `"GET"` | +| `network.protocol.version` | `"1.1"` | +| `url.path` | `"/webshop/articles/4"` | +| `url.query` | `"?s=1"` | +| `server.address` | `"example.com"` | +| `server.port` | `8080` | +| `url.scheme` | `"https"` | +| `http.route` | `"/webshop/articles/:article_id"` | +| `http.response.status_code` | `200` | +| `client.address` | `"192.0.2.4"` | +| `client.socket.address` | `"192.0.2.5"` (クライアントはプロキシ経由) | +| `user_agent.original` | `"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"` | + +スパンと OpenTelemetry との関係については、[スパン](/docs/concepts/signals/traces/#spans)の節を参照してください。 + +### 分散トレース {#distributed-traces} + +一般的に**トレース**として知られている**分散トレース**は、マイクロサービスやサーバーレスアプリケーションのようなマルチサービスアーキテクチャを伝播するリクエスト(アプリケーションまたはエンドユーザーによって行われる)が辿った経路を記録します。 + +トレースは1つ以上のスパンで構成されるます。 +最初のスパンはルートスパンを表します。 +各ルートスパンは、リクエストの開始から終了までを表します。 +親の下にあるスパンは、リクエスト中に発生すること(またはリクエストを構成するステップ)について、より詳細なコンテキストを提供します。 + +トレースなしでは、分散システムのパフォーマンス問題の根本的な原因を見つけることは困難です。 +トレースは、分散システムを流れるリクエストの中で何が起こっているのかを分解することで、分散システムのデバッグと理解をしやすくします。 + +多くのオブザーバビリティバックエンドは、トレースをこのようなウォーターフォール図として視覚化しています。 + +![トレースの例](/img/waterfall-trace.svg 'トレースのウォーターフォール図') + +ウォーターフォール図は、ルートスパンとその子スパンの親子関係を示しています。 +スパンが別のスパンを含む場合も、入れ子関係を表します。 + +トレースとOpenTelemetryとの関係については、[トレース](/docs/concepts/signals/traces/)のページを参照してください。 From 84946567f39c26964b9b81497652d8ba23a278c3 Mon Sep 17 00:00:00 2001 From: OpenTelemetry Bot <107717825+opentelemetrybot@users.noreply.github.com> Date: Sat, 22 Jun 2024 14:40:01 +0200 Subject: [PATCH 3/5] Auto-update registry versions (2def409ec4c2dd6fb8f31aef2b3bc75588ff089b) (#4732) --- data/registry/exporter-dotnet-geneva.yml | 2 +- data/registry/exporter-js-instana.yml | 2 +- data/registry/exporter-js-jaeger.yml | 2 +- data/registry/exporter-js-prometheus.yml | 2 +- data/registry/exporter-js-zipkin.yml | 2 +- data/registry/instrumentation-js-fetch.yml | 2 +- data/registry/instrumentation-js-grpc.yml | 2 +- data/registry/instrumentation-js-http.yml | 2 +- data/registry/instrumentation-js-xml-http-request.yml | 2 +- data/registry/instrumentation-ruby-active-support.yml | 2 +- data/registry/instrumentation-ruby-faraday.yml | 2 +- data/registry/tools-ruby-rspec-matcher.yml | 2 +- 12 files changed, 12 
insertions(+), 12 deletions(-) diff --git a/data/registry/exporter-dotnet-geneva.yml b/data/registry/exporter-dotnet-geneva.yml index 965616a2e946..d6fc51aacfcf 100644 --- a/data/registry/exporter-dotnet-geneva.yml +++ b/data/registry/exporter-dotnet-geneva.yml @@ -17,4 +17,4 @@ createdAt: 2022-11-07 package: registry: nuget name: OpenTelemetry.Exporter.Geneva - version: 1.9.0-rc.2 + version: 1.9.0 diff --git a/data/registry/exporter-js-instana.yml b/data/registry/exporter-js-instana.yml index 02131f53f727..be3dfc02192a 100644 --- a/data/registry/exporter-js-instana.yml +++ b/data/registry/exporter-js-instana.yml @@ -15,4 +15,4 @@ createdAt: 2022-04-18 package: registry: npm name: '@instana/opentelemetry-exporter' - version: 3.11.0 + version: 3.12.0 diff --git a/data/registry/exporter-js-jaeger.yml b/data/registry/exporter-js-jaeger.yml index b6be8fe367f9..70e839eac0d0 100644 --- a/data/registry/exporter-js-jaeger.yml +++ b/data/registry/exporter-js-jaeger.yml @@ -14,4 +14,4 @@ createdAt: 2020-02-06 package: registry: npm name: '@opentelemetry/exporter-jaeger' - version: 1.25.0 + version: 1.25.1 diff --git a/data/registry/exporter-js-prometheus.yml b/data/registry/exporter-js-prometheus.yml index 342b5e39490f..a886b2e20509 100644 --- a/data/registry/exporter-js-prometheus.yml +++ b/data/registry/exporter-js-prometheus.yml @@ -14,4 +14,4 @@ createdAt: 2020-02-06 package: registry: npm name: '@opentelemetry/exporter-prometheus' - version: 0.52.0 + version: 0.52.1 diff --git a/data/registry/exporter-js-zipkin.yml b/data/registry/exporter-js-zipkin.yml index 7e5080ed8321..62ba4b872fc9 100644 --- a/data/registry/exporter-js-zipkin.yml +++ b/data/registry/exporter-js-zipkin.yml @@ -11,7 +11,7 @@ authors: package: name: '@opentelemetry/exporter-zipkin' registry: npm - version: 1.25.0 + version: 1.25.1 urls: repo: https://github.com/open-telemetry/opentelemetry-js/tree/main/packages/opentelemetry-exporter-zipkin docs: /docs/languages/js/exporters/#zipkin diff --git a/data/registry/instrumentation-js-fetch.yml b/data/registry/instrumentation-js-fetch.yml index c69af40c0f6d..37ab7cd5e00a 100644 --- a/data/registry/instrumentation-js-fetch.yml +++ b/data/registry/instrumentation-js-fetch.yml @@ -14,4 +14,4 @@ createdAt: 2020-11-09 package: registry: npm name: '@opentelemetry/instrumentation-fetch' - version: 0.52.0 + version: 0.52.1 diff --git a/data/registry/instrumentation-js-grpc.yml b/data/registry/instrumentation-js-grpc.yml index 58c2f88235bb..376785ac50ad 100644 --- a/data/registry/instrumentation-js-grpc.yml +++ b/data/registry/instrumentation-js-grpc.yml @@ -14,4 +14,4 @@ createdAt: 2020-11-09 package: registry: npm name: '@opentelemetry/instrumentation-grpc' - version: 0.52.0 + version: 0.52.1 diff --git a/data/registry/instrumentation-js-http.yml b/data/registry/instrumentation-js-http.yml index 898bf5ee9d02..d744c467139a 100644 --- a/data/registry/instrumentation-js-http.yml +++ b/data/registry/instrumentation-js-http.yml @@ -14,4 +14,4 @@ createdAt: 2020-11-09 package: registry: npm name: '@opentelemetry/instrumentation-http' - version: 0.52.0 + version: 0.52.1 diff --git a/data/registry/instrumentation-js-xml-http-request.yml b/data/registry/instrumentation-js-xml-http-request.yml index b54edb78457c..830803d40f93 100644 --- a/data/registry/instrumentation-js-xml-http-request.yml +++ b/data/registry/instrumentation-js-xml-http-request.yml @@ -14,4 +14,4 @@ createdAt: 2020-11-09 package: registry: npm name: '@opentelemetry/instrumentation-xml-http-request' - version: 0.52.0 + version: 
0.52.1 diff --git a/data/registry/instrumentation-ruby-active-support.yml b/data/registry/instrumentation-ruby-active-support.yml index a9a3490a88c4..f33c7a0db5c4 100644 --- a/data/registry/instrumentation-ruby-active-support.yml +++ b/data/registry/instrumentation-ruby-active-support.yml @@ -15,4 +15,4 @@ createdAt: 2020-11-09 package: registry: gems name: opentelemetry-instrumentation-active_support - version: 0.5.1 + version: 0.5.3 diff --git a/data/registry/instrumentation-ruby-faraday.yml b/data/registry/instrumentation-ruby-faraday.yml index 104333205afa..c1446b72da40 100644 --- a/data/registry/instrumentation-ruby-faraday.yml +++ b/data/registry/instrumentation-ruby-faraday.yml @@ -15,4 +15,4 @@ createdAt: 2020-11-09 package: registry: gems name: opentelemetry-instrumentation-faraday - version: 0.24.4 + version: 0.24.5 diff --git a/data/registry/tools-ruby-rspec-matcher.yml b/data/registry/tools-ruby-rspec-matcher.yml index 5b3f49cc26f3..79b33c894563 100644 --- a/data/registry/tools-ruby-rspec-matcher.yml +++ b/data/registry/tools-ruby-rspec-matcher.yml @@ -18,4 +18,4 @@ createdAt: 2024-02-13 package: registry: gems name: rspec-otel - version: 0.0.2 + version: 0.0.3 From 9218cac655e0438e1d08fb1037a9a7bb6f8c2b7d Mon Sep 17 00:00:00 2001 From: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com> Date: Sat, 22 Jun 2024 15:03:23 -0700 Subject: [PATCH 4/5] Add section on null maps to troubleshooting page (#4731) --- content/en/docs/collector/troubleshooting.md | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/content/en/docs/collector/troubleshooting.md b/content/en/docs/collector/troubleshooting.md index e48030b648fb..2483bddb0058 100644 --- a/content/en/docs/collector/troubleshooting.md +++ b/content/en/docs/collector/troubleshooting.md @@ -2,7 +2,7 @@ title: Troubleshooting description: Recommendations for troubleshooting the Collector weight: 25 -cSpell:ignore: pprof tracez zpages +cSpell:ignore: confmap pprof tracez zpages --- On this page, you can learn how to troubleshoot the health and performance of @@ -359,3 +359,21 @@ container, producing the error message the `NO_WINDOWS_SERVICE=1` environment variable must be set to force the Collector to start as if it were running in an interactive terminal, without attempting to run as a Windows service. + +### Collector is experiencing configuration issues + +The Collector might experience problems due to configuration issues. + +#### Null maps + +During configuration resolution of multiple configs, values in earlier configs +are removed in favor of later configs, even if the later value is null. You can +fix this issue by + +- Using `{}` to represent an empty map, such as `processors: {}` instead of + `processors:`. +- Omitting empty configurations such as `processors:` from the configuration. + +See +[confmap troubleshooting](https://github.com/open-telemetry/opentelemetry-collector/blob/main/confmap/README.md#null-maps) +for more information. 
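To make the null-map behavior concrete, here is a minimal sketch. The file
names `base.yaml` and `override.yaml` are hypothetical, and the assumption is
that both files are supplied to the Collector (for example, by repeating the
`--config` flag) so that they are merged during configuration resolution:

```yaml
# base.yaml -- defines a batch processor that a pipeline depends on
processors:
  batch: {}
---
# override.yaml -- only meant to adjust something else, but the bare
# `processors:` key below resolves to a null map and removes the batch
# processor from base.yaml when the two configs are merged
processors:
# The fix is to either drop this empty key from override.yaml entirely,
# or write it as an empty map so that it merges instead of overwriting:
# processors: {}
```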
From 313e39187146d1d095ae0c267582057f73dbab64 Mon Sep 17 00:00:00 2001 From: Adriana Villela <50256412+avillela@users.noreply.github.com> Date: Sun, 23 Jun 2024 16:48:15 -0400 Subject: [PATCH 5/5] Add TA troubleshooting page (#4708) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Severin Neumann Co-authored-by: Fabrizio Ferri-Benedetti Co-authored-by: Mikołaj Świątek Co-authored-by: Jacob Aronoff Co-authored-by: Phillip Carter --- .../operator/troubleshooting/_index.md | 8 + .../troubleshooting/target-allocator.md | 515 ++++++++++++++++++ static/refcache.json | 12 + 3 files changed, 535 insertions(+) create mode 100644 content/en/docs/kubernetes/operator/troubleshooting/_index.md create mode 100644 content/en/docs/kubernetes/operator/troubleshooting/target-allocator.md diff --git a/content/en/docs/kubernetes/operator/troubleshooting/_index.md b/content/en/docs/kubernetes/operator/troubleshooting/_index.md new file mode 100644 index 000000000000..91f696066c0f --- /dev/null +++ b/content/en/docs/kubernetes/operator/troubleshooting/_index.md @@ -0,0 +1,8 @@ +--- +title: Troubleshooting the OpenTelemetry Operator for Kubernetes +linkTitle: Troubleshooting +description: + Contains a collection of tips for troubleshooting various aspects of the + OpenTelemetry Kubernetes Operator. For example, what to do when the target + allocator is failing to discover scrape targets. +--- diff --git a/content/en/docs/kubernetes/operator/troubleshooting/target-allocator.md b/content/en/docs/kubernetes/operator/troubleshooting/target-allocator.md new file mode 100644 index 000000000000..ea979bece401 --- /dev/null +++ b/content/en/docs/kubernetes/operator/troubleshooting/target-allocator.md @@ -0,0 +1,515 @@ +--- +title: Target Allocator +cSpell:ignore: bleh targetallocator +--- + +If you’ve enabled +[Target Allocator](/docs/kubernetes/operator/target-allocator/) service +discovery on the [OpenTelemetry Operator](/docs/kubernetes/operator), and the +Target Allocator is failing to discover scrape targets, there are a few +troubleshooting steps that you can take to help you understand what’s going on +and restore normal operation. + +## Troubleshooting steps + +### Did you deploy all of your resources to Kubernetes? + +As a first step, make sure that you have deployed all relevant resources to your +Kubernetes cluster. + +### Do you know if metrics are actually being scraped? + +After you’ve deployed all of your resources to Kubernetes, make sure that the +Target Allocator is discovering scrape targets from your +[`ServiceMonitor`](https://prometheus-operator.dev/docs/operator/design/#servicemonitor)(s) +or +[`PodMonitor`](https://prometheus-operator.dev/docs/user-guides/getting-started/#using-podmonitors)(s). 
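Before querying the Target Allocator itself, it can also help to confirm that
the monitor resources are actually present in the cluster. Assuming the
`opentelemetry` namespace used throughout this example, a quick check might
look like this:

```shell
kubectl get servicemonitors,podmonitors -n opentelemetry
```

If nothing is listed, the Target Allocator has nothing to discover, no matter
how it is configured.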
+ +Suppose that you have this `ServiceMonitor` definition: + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: sm-example + namespace: opentelemetry + labels: + app.kubernetes.io/name: py-prometheus-app + release: prometheus +spec: + selector: + matchLabels: + app: my-app + namespaceSelector: + matchNames: + - opentelemetry + endpoints: + - port: prom + path: /metrics + - port: py-client-port + interval: 15s + - port: py-server-port +``` + +this `Service` definition: + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: py-prometheus-app + namespace: opentelemetry + labels: + app: my-app + app.kubernetes.io/name: py-prometheus-app +spec: + selector: + app: my-app + app.kubernetes.io/name: py-prometheus-app + ports: + - name: prom + port: 8080 +``` + +and this `OpenTelemetryCollector` definition: + +```yaml +apiVersion: opentelemetry.io/v1beta1 +kind: OpenTelemetryCollector +metadata: + name: otelcol + namespace: opentelemetry +spec: + mode: statefulset + targetAllocator: + enabled: true + serviceAccount: opentelemetry-targetallocator-sa + prometheusCR: + enabled: true + podMonitorSelector: {} + serviceMonitorSelector: {} + config: + receivers: + otlp: + protocols: + grpc: {} + http: {} + prometheus: + config: + scrape_configs: + - job_name: 'otel-collector' + scrape_interval: 10s + static_configs: + - targets: ['0.0.0.0:8888'] + + processors: + batch: {} + + exporters: + logging: + verbosity: detailed + + service: + pipelines: + traces: + receivers: [otlp] + processors: [batch] + exporters: [logging] + metrics: + receivers: [otlp, prometheus] + processors: [] + exporters: [logging] + logs: + receivers: [otlp] + processors: [batch] + exporters: [logging] +``` + +First, set up a `port-forward` in Kubernetes, so that you can expose the Target +Allocator service: + +```shell +kubectl port-forward svc/otelcol-targetallocator -n opentelemetry 8080:80 +``` + +Where `otelcol-targetallocator` is the value of `metadata.name` in your +`OpenTelemetryCollector` CR concatenated with the `-targetallocator` suffix, and +`opentelemetry` is the namespace to which the `OpenTelemetryCollector` CR is +deployed. + +{{% alert title="Tip" %}} + +You can also get the service name by running + +```shell +kubectl get svc -l app.kubernetes.io/component=opentelemetry-targetallocator -n +``` + +{{% /alert %}} + +Next, get a list of jobs registered with the Target Allocator: + +```shell +curl localhost:8080/jobs | jq +``` + +Your sample output should look like this: + +```json +{ + "serviceMonitor/opentelemetry/sm-example/1": { + "_link": "/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F1/targets" + }, + "serviceMonitor/opentelemetry/sm-example/2": { + "_link": "/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F2/targets" + }, + "otel-collector": { + "_link": "/jobs/otel-collector/targets" + }, + "serviceMonitor/opentelemetry/sm-example/0": { + "_link": "/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F0/targets" + }, + "podMonitor/opentelemetry/pm-example/0": { + "_link": "/jobs/podMonitor%2Fopentelemetry%2Fpm-example%2F0/targets" + } +} +``` + +Where `serviceMonitor/opentelemetry/sm-example/0` represents one of the +`Service` ports that the `ServiceMonitor`picked up: + +- `opentelemetry` is the namespace in which the `ServiceMonitor` resource + resides. +- `sm-example` is the name of the `ServiceMonitor`. +- `0` is one of the port endpoints matched between the `ServiceMonitor` and the + `Service`. 
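If you only want the names of the discovered jobs, without the `_link` fields,
a small `jq` filter over the same endpoint (assuming the port-forward set up
earlier is still active) is enough:

```shell
curl -s localhost:8080/jobs | jq 'keys'
```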
+ +Similarly, the `PodMonitor`, shows up as `podMonitor/opentelemetry/pm-example/0` +in the `curl` output. + +This is good news, because it tells us that the scrape config discovery is +working! + +You might also be wondering about the `otel-collector` entry. This is happening +because `spec.config.receivers.prometheusReceiver` in the +`OpenTelemetryCollector` resource (named `otel-collector`) has self-scrape +enabled: + +```yaml +prometheus: + config: + scrape_configs: + - job_name: 'otel-collector' + scrape_interval: 10s + static_configs: + - targets: ['0.0.0.0:8888'] +``` + +We can take a deeper look into `serviceMonitor/opentelemetry/sm-example/0`, to +see what scrape targets are getting picked up by running `curl` against the +value of the `_link` output above: + +```shell +curl localhost:8080/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F0/targets | jq +``` + +Sample output: + +```json +{ + "otelcol-collector-0": { + "_link": "/jobs/serviceMonitor%2Fopentelemetry%2Fsm-example%2F0/targets?collector_id=otelcol-collector-0", + "targets": [ + { + "targets": ["10.244.0.11:8080"], + "labels": { + "__meta_kubernetes_endpointslice_port_name": "prom", + "__meta_kubernetes_pod_labelpresent_app_kubernetes_io_name": "true", + "__meta_kubernetes_endpointslice_port_protocol": "TCP", + "__meta_kubernetes_endpointslice_address_target_name": "py-prometheus-app-575cfdd46-nfttj", + "__meta_kubernetes_endpointslice_annotation_endpoints_kubernetes_io_last_change_trigger_time": "2024-06-21T20:01:37Z", + "__meta_kubernetes_endpointslice_labelpresent_app_kubernetes_io_name": "true", + "__meta_kubernetes_pod_name": "py-prometheus-app-575cfdd46-nfttj", + "__meta_kubernetes_pod_controller_name": "py-prometheus-app-575cfdd46", + "__meta_kubernetes_pod_label_app_kubernetes_io_name": "py-prometheus-app", + "__meta_kubernetes_endpointslice_address_target_kind": "Pod", + "__meta_kubernetes_pod_node_name": "otel-target-allocator-talk-control-plane", + "__meta_kubernetes_pod_labelpresent_pod_template_hash": "true", + "__meta_kubernetes_endpointslice_label_kubernetes_io_service_name": "py-prometheus-app", + "__meta_kubernetes_endpointslice_annotationpresent_endpoints_kubernetes_io_last_change_trigger_time": "true", + "__meta_kubernetes_service_name": "py-prometheus-app", + "__meta_kubernetes_pod_ready": "true", + "__meta_kubernetes_pod_labelpresent_app": "true", + "__meta_kubernetes_pod_controller_kind": "ReplicaSet", + "__meta_kubernetes_endpointslice_labelpresent_app": "true", + "__meta_kubernetes_pod_container_image": "otel-target-allocator-talk:0.1.0-py-prometheus-app", + "__address__": "10.244.0.11:8080", + "__meta_kubernetes_service_label_app_kubernetes_io_name": "py-prometheus-app", + "__meta_kubernetes_pod_uid": "495d47ee-9a0e-49df-9b41-fe9e6f70090b", + "__meta_kubernetes_endpointslice_port": "8080", + "__meta_kubernetes_endpointslice_label_endpointslice_kubernetes_io_managed_by": "endpointslice-controller.k8s.io", + "__meta_kubernetes_endpointslice_label_app": "my-app", + "__meta_kubernetes_service_labelpresent_app_kubernetes_io_name": "true", + "__meta_kubernetes_pod_host_ip": "172.24.0.2", + "__meta_kubernetes_namespace": "opentelemetry", + "__meta_kubernetes_endpointslice_endpoint_conditions_serving": "true", + "__meta_kubernetes_endpointslice_labelpresent_kubernetes_io_service_name": "true", + "__meta_kubernetes_endpointslice_endpoint_conditions_ready": "true", + "__meta_kubernetes_service_annotation_kubectl_kubernetes_io_last_applied_configuration": 
"{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"my-app\",\"app.kubernetes.io/name\":\"py-prometheus-app\"},\"name\":\"py-prometheus-app\",\"namespace\":\"opentelemetry\"},\"spec\":{\"ports\":[{\"name\":\"prom\",\"port\":8080}],\"selector\":{\"app\":\"my-app\",\"app.kubernetes.io/name\":\"py-prometheus-app\"}}}\n", + "__meta_kubernetes_endpointslice_endpoint_conditions_terminating": "false", + "__meta_kubernetes_pod_container_port_protocol": "TCP", + "__meta_kubernetes_pod_phase": "Running", + "__meta_kubernetes_pod_container_name": "my-app", + "__meta_kubernetes_pod_container_port_name": "prom", + "__meta_kubernetes_pod_ip": "10.244.0.11", + "__meta_kubernetes_service_annotationpresent_kubectl_kubernetes_io_last_applied_configuration": "true", + "__meta_kubernetes_service_labelpresent_app": "true", + "__meta_kubernetes_endpointslice_address_type": "IPv4", + "__meta_kubernetes_service_label_app": "my-app", + "__meta_kubernetes_pod_label_app": "my-app", + "__meta_kubernetes_pod_container_port_number": "8080", + "__meta_kubernetes_endpointslice_name": "py-prometheus-app-bwbvn", + "__meta_kubernetes_pod_label_pod_template_hash": "575cfdd46", + "__meta_kubernetes_endpointslice_endpoint_node_name": "otel-target-allocator-talk-control-plane", + "__meta_kubernetes_endpointslice_labelpresent_endpointslice_kubernetes_io_managed_by": "true", + "__meta_kubernetes_endpointslice_label_app_kubernetes_io_name": "py-prometheus-app" + } + } + ] + } +} +``` + +The query parameter `collector_id` in the `_link` field of the above output +states that these are the targets pertain to `otelcol-collector-0` (the name of +the `StatefulSet` created for the `OpenTelemetryCollector` resource). + +{{% alert title="Note" %}} + +See the +[Target Allocator readme](https://github.com/open-telemetry/opentelemetry-operator/blob/main/cmd/otel-allocator/README.md?plain=1#L128-L134) +for more information on the `/jobs` endpoint. + +{{% /alert %}} + +### Is the Target Allocator enabled? Is Prometheus service discovery enabled? + +If the `curl` commands above don’t show a list of expected `ServiceMonitor`s and +`PodMonitor`s, you need to check whether the features that populate those values +are turned on. + +One thing to remember is that just because you include the `targetAllocator` +section in the `OpenTelemetryCollector` CR doesn’t mean that it’s enabled. You +need to explicitly enable it. Furthermore, if you want to use +[Prometheus service discovery](https://github.com/open-telemetry/opentelemetry-operator/blob/main/cmd/otel-allocator/README.md#discovery-of-prometheus-custom-resources), +you must explicitly enable it: + +- Set `spec.targetAllocator.enabled` to `true` +- Set `spec.targetAllocator.prometheusCR.enabled` to `true` + +So that your `OpenTelemetryCollector` resource looks like this: + +```yaml +apiVersion: opentelemetry.io/v1beta1 +kind: OpenTelemetryCollector +metadata: + name: otelcol + namespace: opentelemetry +spec: + mode: statefulset + targetAllocator: + enabled: true + serviceAccount: opentelemetry-targetallocator-sa + prometheusCR: + enabled: true +``` + +See the full `OpenTelemetryCollector` +[resource definition in "Do you know if metrics are actually being scraped?"](#do-you-know-if-metrics-are-actually-beingscraped). + +### Did you configure a ServiceMonitor (or PodMonitor) selector? 
+ +If you configured a +[`ServiceMonitor`](https://observability.thomasriley.co.uk/prometheus/configuring-prometheus/using-service-monitors/#:~:text=The%20ServiceMonitor%20is%20used%20to,build%20the%20required%20Prometheus%20configuration.) +selector, it means that the Target Allocator only looks for `ServiceMonitors` +having a `metadata.label` that matches the value in +[`serviceMonitorSelector`](https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr-1). + +Suppose that you configured a +[`serviceMonitorSelector`](https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr-1) +for your Target Allocator, like in the following example: + +```yaml +apiVersion: opentelemetry.io/v1beta1 +kind: OpenTelemetryCollector +metadata: + name: otelcol + namespace: opentelemetry +spec: + mode: statefulset + targetAllocator: + enabled: true + serviceAccount: opentelemetry-targetallocator-sa + prometheusCR: + enabled: true + serviceMonitorSelector: + matchLabels: + app: my-app +``` + +By setting the value of +`spec.targetAllocator.prometheusCR.serviceMonitorSelector.matchLabels` to +`app: my-app`, it means that your `ServiceMonitor` resource must in turn have +that same value in `metadata.labels`: + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: sm-example + labels: + app: my-app + release: prometheus +spec: +``` + +See the full `ServiceMonitor` +[resource definition in "Do you know if metrics are actually being scraped?"](#do-you-know-if-metrics-are-actually-beingscraped). + +In this case, the `OpenTelemetryCollector` resource's +`prometheusCR.serviceMonitorSelector.matchLabels` is looking only for +`ServiceMonitors` having the label `app: my-app`, which we see in the previous +example. + +If your `ServiceMonitor` resource is missing that label, then the Target +Allocator will fail to discover scrape targets from that `ServiceMonitor`. + +{{% alert title="Tip" %}} + +The same applies if you’re using a +[PodMonitor](https://prometheus-operator.dev/docs/user-guides/getting-started/#using-podmonitors). +In that case, you would use a +[`podMonitorSelector`](https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr) +instead of a `serviceMonitorSelector`. + +{{% /alert %}} + +### Did you leave out the serviceMonitorSelector and/or podMonitorSelector configuration altogether? + +As mentioned in +["Did you configure a ServiceMonitor or PodMonitor selector"](#did-you-configure-a-servicemonitor-or-podmonitor-selector), +setting mismatched values for `serviceMonitorSelector` and `podMonitorSelector` +results in the Target Allocator failing to discover scrape targets from your +`ServiceMonitors` and `PodMonitors`, respectively. + +Similarly, in +[`v1beta1`](https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollector-1) +of the `OpenTelemetryCollector` CR, leaving out this configuration altogether +also results in the Target Allocator failing to discover scrape targets from +your `ServiceMonitors` and `PodMonitors`. 
+ +As of `v1beta1` of the `OpenTelemetryOperator`, a `serviceMonitorSelector` and +`podMonitorSelector` must be included, even if you don’t intend to use it, like +this: + +```yaml +prometheusCR: + enabled: true + podMonitorSelector: {} + serviceMonitorSelector: {} +``` + +This configuration means that it will match on all `PodMonitor` and +`ServiceMonitor` resources. See the +[full OpenTelemetryCollector definition in "Do you know if metrics are actually being scraped?"](#do-you-know-if-metrics-are-actually-beingscraped). + +### Do your labels, namespaces, and ports match for your ServiceMonitor and your Service (or PodMonitor and your Pod)? + +The `ServiceMonitor` is configured to pick up Kubernetes +[Services](https://kubernetes.io/docs/concepts/services-networking/service/) +that match on: + +- Labels +- Namespaces (optional) +- Ports (endpoints) + +Suppose that you have this `ServiceMonitor`: + +```yaml +apiVersion: monitoring.coreos.com/v1 +kind: ServiceMonitor +metadata: + name: sm-example + labels: + app: my-app + release: prometheus +spec: + selector: + matchLabels: + app: my-app + namespaceSelector: + matchNames: + - opentelemetry + endpoints: + - port: prom + path: /metrics + - port: py-client-port + interval: 15s + - port: py-server-port +``` + +The previous `ServiceMonitor` is looking for any services that have: + +- the label `app: my-app` +- reside in a namespace called `opentelemetry` +- a port named `prom`, `py-client-port`, _or_ `py-server-port` + +For example, the following `Service` resource would get picked up by the +`ServiceMonitor`, because it matches the previous criteria: + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: py-prometheus-app + namespace: opentelemetry + labels: + app: my-app + app.kubernetes.io/name: py-prometheus-app +spec: + selector: + app: my-app + app.kubernetes.io/name: py-prometheus-app + ports: + - name: prom + port: 8080 +``` + +The following `Service` resource would not be picked up, because the +`ServiceMonitor` is looking for ports named `prom`, `py-client-port`, _or_ +`py-server-port`, and this service’s port is called `bleh`. + +```yaml +apiVersion: v1 +kind: Service +metadata: + name: py-prometheus-app + namespace: opentelemetry + labels: + app: my-app + app.kubernetes.io/name: py-prometheus-app +spec: + selector: + app: my-app + app.kubernetes.io/name: py-prometheus-app + ports: + - name: bleh + port: 8080 +``` + +{{% alert title="Tip" %}} + +If you’re using `PodMonitor`, the same applies, except that it picks up +Kubernetes pods that match on labels, namespaces, and named ports. 
+ +{{% /alert %}} diff --git a/static/refcache.json b/static/refcache.json index 49330b5568dd..c90df904e258 100644 --- a/static/refcache.json +++ b/static/refcache.json @@ -6115,6 +6115,10 @@ "StatusCode": 200, "LastSeen": "2024-02-09T11:48:44.205582+01:00" }, + "https://observability.thomasriley.co.uk/prometheus/configuring-prometheus/using-service-monitors/#:~:text=The%20ServiceMonitor%20is%20used%20to,build%20the%20required%20Prometheus%20configuration.": { + "StatusCode": 206, + "LastSeen": "2024-06-18T13:27:45.202877-04:00" + }, "https://observiq.com/blog/what-are-connectors-in-opentelemetry/": { "StatusCode": 206, "LastSeen": "2024-01-30T06:06:02.410999-05:00" @@ -7703,6 +7707,14 @@ "StatusCode": 206, "LastSeen": "2024-01-30T06:01:24.93578-05:00" }, + "https://prometheus-operator.dev/docs/operator/design/#servicemonitor": { + "StatusCode": 206, + "LastSeen": "2024-06-18T16:43:08.829675-04:00" + }, + "https://prometheus-operator.dev/docs/user-guides/getting-started/#using-podmonitors": { + "StatusCode": 206, + "LastSeen": "2024-06-18T13:27:46.505689-04:00" + }, "https://prometheus.io": { "StatusCode": 206, "LastSeen": "2024-01-18T19:07:18.12399-05:00"