Native oTel Support #69

Open
chambear2809 opened this issue Aug 5, 2024 · 3 comments
@chambear2809

Problem Description
I'm always frustrated when trying to integrate various monitoring solutions due to the lack of a standardized format and semantics. OpenTelemetry (OTel) and the OpenTelemetry Protocol (OTLP) offer a standard approach, but many solutions don't natively support it. This results in redundant work and increased complexity in data integration.

Desired Solution
I would like to see a native OpenTelemetry (OTel) exporter implemented, enabling the seamless export of data (primarily Metrics from the MELT stack) to any backend that supports OTel, such as Splunk Observability Cloud, Dynatrace, Loki, etc. This would simplify data integration and enhance interoperability among observability products.

Alternatives Considered
SNMPoTel Project: While the SNMPoTel project provides a solution, SNMP itself is cumbersome and adds unnecessary complexity. A native OTel exporter would offer a more streamlined and efficient approach. More details on SNMPoTel can be found here.

Additional Context
A wide range of observability products already leverage OpenTelemetry, making it a robust standard for sending and contextualizing data across different backends. Implementing a native OTel exporter would align with this trend and provide significant benefits in standardization and ease of integration.

@thenodon
Member

thenodon commented Aug 8, 2024

Hi @chambear2809 and thanks for your request. If the aci-exporter were to speak OTLP natively, it would require a lot of additional development for things Prometheus already does, including scheduling and discovery. Both of these parts would need to be integrated into the aci-exporter. From my point of view these are different concerns and should not be part of the aci-exporter. The aci-exporter's concern is to enable execution of ACI queries against APICs, spines and leafs in the most efficient and feature-rich way, and to expose the result as metrics. But your requirement to get these metrics according to the OpenTelemetry standard is a super valid one. The solution is to use the OpenTelemetry Collector with the Prometheus receiver. The collector is like a stripped-down version of Prometheus that includes the key parts, like scheduling and discovery. Here is a simple OTel Collector configuration to achieve this:

receivers:
  prometheus:
    config:
      scrape_configs:

        # Job for APIC queries
        - job_name: 'aci'
          scrape_interval: 1m
          scrape_timeout: 30s
          metrics_path: /probe
          params:
            queries:
              - health,fabric_node_info,object_count,max_capacity

          http_sd_configs:
            # discover all fabrics
            # To discover an individual fabric use - url: "http://localhost:9643/sd?target=<fabric>"
            - url: "http://localhost:9643/sd"
              refresh_interval: 5m

          relabel_configs:
            - source_labels: [ __meta_role ]
              # Only include the aci_exporter_fabric __meta_role
              regex: "aci_exporter_fabric"
              action: "keep"

            - source_labels: [ __address__ ]
              target_label: __param_target
            - source_labels: [ __param_target ]
              target_label: instance
            - target_label: __address__
              replacement: 127.0.0.1:9643

        # Job for ACI nodes based on discovery
        - job_name: 'aci_nodes'
          scrape_interval: 1m
          scrape_timeout: 30s
          metrics_path: /probe
          params:
            # OBS: make sure to specify queries that only work for nodes AND have a correct label regex for node-based responses
            queries:
              - interface_info
              - interface_rx_stats
              - interface_tx_stats
              - interface_rx_err_stats
              - interface_tx_err_stats

          http_sd_configs:
            # discover all fabrics
            # To discover an individual fabric use - url: "http://localhost:9643/sd?target=<fabric>"
            - url: "http://localhost:9643/sd"
              refresh_interval: 5m

          relabel_configs:
            - source_labels: [ __meta_role ]
              # Only include the spine and leaf __meta_role
              regex: "(spine|leaf)"
              action: "keep"

            # Get the target param from __address__ that is <fabric>#<oobMgmtAddr> by default
            - source_labels: [ __address__ ]
              separator: "#"
              regex: (.*)#(.*)
              replacement: $$1
              target_label: __param_target

            # Get the node param from __address__ that is <fabric>#<oobMgmtAddr> by default
            - source_labels: [ __address__ ]
              separator: "#"
              regex: (.*)#(.*)
              replacement: $$2
              target_label: __param_node

            # Set instance to the ip/hostname from the __param_node
            - source_labels: [ __param_node ]
              target_label: instance

            # Add labels from discovery
            - source_labels: [ __meta_fabricDomain ]
              target_label: aci
            - source_labels: [ __meta_id ]
              target_label: nodeid
            - source_labels: [ __meta_podId ]
              target_label: podid
            - source_labels: [ __meta_role ]
              target_label: role
            - source_labels: [ __meta_name ]
              target_label: name

            - target_label: __address__
              replacement: 127.0.0.1:9643

processors:
  batch:
    timeout: 10s

exporters:
  otlp:
    endpoint: XYZ

service:
  pipelines:
    metrics:
      receivers:
        - prometheus
      processors:
        - batch
      exporters: [otlp]
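
For reference, with the relabeling in the 'aci' job above, the collector ends up scraping the aci-exporter on a probe URL roughly like the following (a sketch based on this config; <fabric> is a placeholder for a fabric name returned by the /sd discovery endpoint):

http://127.0.0.1:9643/probe?target=<fabric>&queries=health,fabric_node_info,object_count,max_capacity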

As you can see the prometheus config is almost exactly like https://github.com/opsdis/aci-exporter/blob/master/prometheus/prometheus_nodes.yml, except that the relabeling replacement variables must be escaped with an extra $, so $1 becomes $$1.
The OTel Collector is an excellent tool for managing the processing, transformation and routing of OpenTelemetry data. So to send aci-exporter metrics to Splunk, Datadog, etc., it is just a matter of adding additional exporters (see the sketch below).
If you test this against something other than Prometheus-compatible storage it would be great to hear about your experience.
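
As a minimal sketch of what adding a second destination could look like (the otlphttp exporter and the example endpoint are placeholders, assuming that exporter is available in your collector distribution), it is only an extra entry under exporters and in the pipeline:

exporters:
  otlp:
    endpoint: XYZ
  # hypothetical second destination using the OTLP/HTTP exporter
  otlphttp/backend2:
    endpoint: https://example-backend/otlp

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp, otlphttp/backend2]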

@thenodon
Member

@chambear2809 would you like to add anything to this issue?

@chambear2809
Author

@thenodon I still need to get this set up and send some data. I have been swamped post-travel for GSX.
