README: add more comprehensive instructions for running the agent
this adds more comprehensive instructions to run the agent, and
separate instructions for Kubernetes / Docker and running directly
on the host (while recommending the first two).
Gandem committed Sep 18, 2024
1 parent 054e5e3 commit e450d1e
Showing 4 changed files with 198 additions and 85 deletions.
96 changes: 11 additions & 85 deletions README.md
@@ -1,111 +1,37 @@
# Datadog Fork

This is an experimental fork of [elastic/otel-profiling-agent](https://github.com/elastic/otel-profiling-agent). The upstream project is in the process of being [donated](https://github.com/open-telemetry/community/issues/1918) to the OpenTelemetry project. Please refer to our [documentation](https://docs.datadoghq.com/profiler/) for a list of offically supported Datadog profilers.
This is an experimental fork of [elastic/otel-profiling-agent](https://github.com/elastic/otel-profiling-agent). The upstream project is in the process of being [donated](https://github.com/open-telemetry/community/issues/1918) to the OpenTelemetry project. Please refer to our [documentation](https://docs.datadoghq.com/profiler/) for a list of officially supported Datadog profilers.

Our fork adds support for sending profiling data to the Datadog backend via the Datadog Agent. We are active members of the OpenTelemetry Profiling SIG that is working on the OpenTelemetry profiling signal. However, the signal is still under active development, so this fork can be used by Datadog users until we release our support for directly ingesting the data using OTLP.

## Requirements

The otel-profiling-agent requires the following Linux kernel versions:
The otel-profiling-agent only runs on Linux, and requires the following Linux kernel versions:
* Kernel version 4.19 or newer for amd64/x86_64
* Kernel version 5.5 or newer for arm64/aarch64
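
The kernel minimums listed above can be checked before installing anything. The following is a minimal sketch, not part of the profiler: `major_of`, `minor_of`, `version_at_least`, and `kernel_supported` are hypothetical helper names, and the parsing assumes release strings of the form `MAJOR.MINOR[.PATCH][-suffix]` as printed by `uname -r`.

```bash
#!/usr/bin/env bash
# Sketch only: compare a kernel release against the documented minimums.
# All function names here are illustrative, not shipped with the profiler.

# Print the MAJOR / MINOR component of a release such as "5.15.0-91-generic".
major_of() { v="${1%%[-+]*}"; echo "${v%%.*}"; }
minor_of() { v="${1%%[-+]*}"; v="${v#*.}"; echo "${v%%.*}"; }

# Succeeds if release $1 is at least MAJOR.MINOR version $2.
version_at_least() {
  local have_major have_minor want_major want_minor
  have_major="$(major_of "$1")"; have_minor="$(minor_of "$1")"
  want_major="$(major_of "$2")"; want_minor="$(minor_of "$2")"
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
    return
  fi
  [ "$have_minor" -ge "$want_minor" ]
}

# Succeeds if architecture $1 with kernel release $2 meets the requirements.
kernel_supported() {
  case "$1" in
    x86_64|amd64)  version_at_least "$2" 4.19 ;;
    aarch64|arm64) version_at_least "$2" 5.5 ;;
    *)             return 1 ;;  # any other architecture is not listed
  esac
}

# Check the current host.
if kernel_supported "$(uname -m)" "$(uname -r)"; then
  echo "kernel $(uname -r) on $(uname -m) meets the minimum"
else
  echo "kernel $(uname -r) on $(uname -m) is too old or not a listed architecture" >&2
fi
```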

## Installation
## Running the profiler

Download pre-built amd64 and arm64 binaries for our [latest release](https://github.com/DataDog/otel-profiling-agent/releases/latest).
If the host is running workloads inside containers, it is recommended to run the profiler inside a container as well. A container image is available at https://github.com/DataDog/otel-profiling-agent/pkgs/container/otel-profiling-agent/.

Alternatively, you can build the agent from source. The following instructions assume you have docker installed.
If you're using Kubernetes, please follow the documentation here: [Running in Kubernetes](doc/running-in-kubernetes.md). If you're directly using Docker, please follow the documentation here: [Running in Docker](doc/running-in-docker.md).

<details>
<summary>Manual build instructions</summary>
<br />

To build the agent, you can use the following commands:
If you're not using any container runtime, please check this section to run the profiler directly on the host: [Running on the host](doc/running-on-host.md).

```
make docker-image
make agent
```

This will create a `otel-profiling-agent` binary in the current directory.

</details>

## Run

To run the agent, you need to make sure that debugfs is mounted. If it's not, you can run:

```
sudo mount -t debugfs none /sys/kernel/debug
```

After that, you can start the agent as shown below (make sure you run it as root):

```
sudo otel-profiling-agent -tags 'service:myservice' -collection-agent "http://localhost:8126" -reporter-interval 60s -samples-per-second 20
```

For this to work you need to run a Datadog agent that listens for APM traffic at `localhost:8126`. If your agent is reachable under a different address, you can modify the `-collection-agent` parameter accordingly.

## Running inside a container

#### Requirements

When running the agent in a container, you need to ensure the following conditions are met:
* The container is running in privileged mode.
* The container has the `SYS_ADMIN` capability.
* The container has Host PID enabled (and procMount: "Unmasked").
* The host's debugfs filesystem is mounted to the container (in read-only mode).
* The agent is running as root inside the container.

#### Container name resolution

To be able to resolve container names, the agent needs to be able to access the underlying container runtime (in read-only mode). The agent supports Docker and containerd.

To enable this feature, you need to mount the container runtime socket to the agent container in read-only mode (`/var/run/docker.sock` for Docker, `/run/containerd/containerd.sock` for containerd).

#### Pod name resolution

To be able to resolve pod names in Kubernetes, the agent needs to be able to:

1. Get the `KUBERNETS_NODE_NAME` environment variable:
```yaml
env:
  - name: KUBERNETES_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```
2. Access the underlying Kubernetes API server. This is usually done through a ClusterRole and ClusterRoleBinding with the following permissions:
```yaml
rules:
- verbs:
  - get
  - watch
  - list
  resources:
  - nodes
  - pods
  apiGroups:
  - ""
```
## Configuration
## Configuring the agent

### Local symbol upload (Experimental)

For compiled languages (C/C++/Rust/Go), the profiling-agent can upload local symbols (when available) to Datadog for symbolication. Symbols need to be available locally (unstripped binaries).
For compiled languages (C/C++/Rust/Go), the profiler can upload local symbols (when available) to Datadog for symbolication. Symbols need to be available locally (unstripped binaries).

To enable local symbol upload:
1. Set the `DD_EXPERIMENTAL_LOCAL_SYMBOL_UPLOAD` environment variable to `true`.
2. Provide a Datadog API key through the `DD_API_KEY` environment variable.
3. Set the `DD_SITE` environment variable to [your Datadog site](https://docs.datadoghq.com/getting_started/site/#access-the-datadog-site) (e.g. `datadoghq.com`).
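
As a rough illustration of the three steps above, the variables can be validated in the shell that launches the profiler. This is a sketch under assumptions: `symbol_upload_env_ok` is a hypothetical helper, and the API key value is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch only: check the three variables required for local symbol upload.
# symbol_upload_env_ok is an illustrative name, not part of the profiler.
symbol_upload_env_ok() {
  local status=0
  if [ "${DD_EXPERIMENTAL_LOCAL_SYMBOL_UPLOAD:-}" != "true" ]; then
    echo "DD_EXPERIMENTAL_LOCAL_SYMBOL_UPLOAD must be set to true" >&2
    status=1
  fi
  if [ -z "${DD_API_KEY:-}" ]; then
    echo "DD_API_KEY is not set" >&2
    status=1
  fi
  if [ -z "${DD_SITE:-}" ]; then
    echo "DD_SITE is not set (for example datadoghq.com)" >&2
    status=1
  fi
  return "$status"
}

# Example: export the variables for this shell, then check them.
export DD_EXPERIMENTAL_LOCAL_SYMBOL_UPLOAD=true
export DD_API_KEY="<your_api_key>"   # placeholder, replace with a real key
export DD_SITE="datadoghq.com"
symbol_upload_env_ok && echo "local symbol upload variables are set"
```

The check only confirms that the variables are present in the current environment; it does not verify that the API key is valid or that the profiler process inherits them.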


## Development

A `docker-compose.yml` file is provided to help run the agent in a container for local development.
A `docker-compose.yml` file is provided to help run the profiler in a container for local development.

First, create a `.env` file with the following content:

@@ -118,13 +44,13 @@ OTEL_PROFILING_AGENT_REPORTER_INTERVAL=10s # optional, defaults to 60s
```
DD_EXPERIMENTAL_LOCAL_SYMBOL_UPLOAD=true # optional, defaults to false
```

Then, you can run the agent with the following command:
Then, you can run the profiler with the following command:

```
docker-compose up
```

The agent will submit profiling data to the Datadog Agent using the value of OTEL_PROFILING_AGENT_SERVICE as the service name.
The profiler will submit profiling data to the Datadog Agent using the value of OTEL_PROFILING_AGENT_SERVICE as the service name.


The contents of the original upstream README are below.
34 changes: 34 additions & 0 deletions doc/running-in-docker.md
@@ -0,0 +1,34 @@
# Running the agent in Docker

This document is a guide to running the profiler in a Docker container.

## Prerequisites

The datadog-agent must be running on the host and configured to collect APM data (this is enabled by default in the agent, unless you explicitly disabled it). See https://docs.datadoghq.com/containers/docker/apm/ for more information.

For the purposes of this guide, we assume that the Datadog Agent is accessible from the Docker container at a specific address: `http://<agent_address>:8126`.

## Running the profiler

See https://github.com/DataDog/otel-profiling-agent/pkgs/container/otel-profiling-agent/ for a container image that can be used to run the profiler.

To run the profiler in Docker, you should ensure the following requirements are met (see example below):
1. The container has host PID enabled.
2. The container is running in privileged mode.
3. The container has the `SYS_ADMIN` capability.
4. The `OTEL_PROFILING_AGENT_COLLECTION_AGENT` environment variable is set to the address of the Datadog agent: `http://<agent_address>:8126`.

Additionally, to be able to resolve container names, the profiler needs access to the container runtime socket. This is done by mounting the container runtime socket into the profiler container.

### Example command to run the profiler in Docker

```bash
docker run \
  --pid=host \
  --privileged \
  --cap-add=SYS_ADMIN \
  -e OTEL_PROFILING_AGENT_COLLECTION_AGENT=http://<agent_address>:8126 \
  -e OTEL_PROFILING_AGENT_TAGS="service:$(hostname)" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/datadog/otel-profiling-agent:latest
```
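
The example above mounts the Docker socket. On hosts that use containerd instead, the socket path differs, so it can help to detect which one is present first. The following is a sketch, assuming the two socket paths used elsewhere in this repository's docs (`/var/run/docker.sock` and `/run/containerd/containerd.sock`); `find_runtime_socket` is a hypothetical helper, and its optional root-prefix argument exists only so it can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# Sketch only: choose the container runtime socket to mount into the profiler.
# find_runtime_socket is an illustrative name, not part of the profiler.
# It checks that the path exists; it does not verify that it is a live socket.
find_runtime_socket() {
  local root="${1:-}"
  local candidate
  for candidate in /var/run/docker.sock /run/containerd/containerd.sock; do
    if [ -e "${root}${candidate}" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Print the -v option that matches the runtime found on this host.
if sock="$(find_runtime_socket)"; then
  echo "add to docker run: -v ${sock}:${sock}"
else
  echo "no container runtime socket found; container names cannot be resolved" >&2
fi
```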
109 changes: 109 additions & 0 deletions doc/running-in-kubernetes.md
@@ -0,0 +1,109 @@
# Running the profiler in Kubernetes

This document is a guide to running the profiler in a Kubernetes cluster.

## Prerequisites

The datadog-agent must be running in the cluster and configured to collect APM data (this is enabled by default in the agent, unless you explicitly disabled it). See https://docs.datadoghq.com/containers/kubernetes/apm/ for more information.

For the purposes of this guide, we assume that the Datadog Agent is accessible at a specific address: `http://<agent_address>:8126`.

## Running the profiler

See https://github.com/DataDog/otel-profiling-agent/pkgs/container/otel-profiling-agent/ for a container image that can be used to run the profiler.

To run the profiler in a Kubernetes cluster, you should ensure the following requirements are met (see example below):
1. The container has host PID enabled.
2. The container is running in privileged mode.
3. The `procMount` security context field is set to `Unmasked`.
4. The container has the `SYS_ADMIN` capability.
5. The `OTEL_PROFILING_AGENT_COLLECTION_AGENT` environment variable is set to the address of the Datadog agent: `http://<agent_address>:8126`.

Additionally, to be able to resolve pod names in Kubernetes, the profiler needs:
* The `KUBERNETES_NODE_NAME` environment variable set to the name of the node where the profiler is running.
* A ClusterRole and ClusterRoleBinding configured (see below).

### Example spec

The profiler pod spec excerpt:
```yaml
apiVersion: apps/v1
# ...
spec:
  # ...
  template:
    # ...
    spec:
      # ...
      serviceAccountName: <my-service-account> # The service account used
      hostPID: true # Setting hostPID to true (1.)
      containers:
      - name: otel-profiling-agent
        securityContext:
          runAsUser: 0
          privileged: true # Running in privileged mode (2.)
          procMount: Unmasked # Setting procMount to Unmasked (3.)
          capabilities:
            add:
            - SYS_ADMIN # Adding SYS_ADMIN capability (4.)
        env:
        - name: OTEL_PROFILING_AGENT_COLLECTION_AGENT # The address of the Datadog agent (5.)
          value: "http://<agent_address>:8126"
        - name: KUBERNETES_NODE_NAME # this is needed to resolve pod names in Kubernetes
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: OTEL_PROFILING_AGENT_TAGS
          value: "service:$(KUBERNETES_NODE_NAME)" # will inherit the variable set above
        # ...
        volumeMounts:
        - name: containerd # Or alternatively, docker if using docker. This is required to be able to resolve container names.
          mountPath: /run/containerd/containerd.sock # Or alternatively, /var/run/docker.sock
      # ...
      volumes:
      - name: containerd # Or alternatively, docker if using docker
        hostPath:
          path: /run/containerd/containerd.sock # Or alternatively, /var/run/docker.sock
          type: Socket
      # ...
```

You will also need to create a ServiceAccount, ClusterRole, and ClusterRoleBinding for the profiler to be able to list pods in the cluster. Here is an example:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <my-service-account>
  namespace: <my-service-account-namespace>
  # ...
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: <my-cluster-role>
  # ...
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - pods
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: <my-cluster-role-binding>
  # ...
subjects:
- kind: ServiceAccount
  name: <my-service-account>
  namespace: <my-service-account-namespace>
roleRef:
  kind: ClusterRole
  name: <my-cluster-role>
  apiGroup: rbac.authorization.k8s.io
```
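
Once the manifests above are applied, one way to confirm that the binding took effect is to ask the API server with `kubectl auth can-i` while impersonating the service account. This is a sketch, not an official procedure: `service_account_user` and `check_profiler_rbac` are hypothetical helpers, the check runs against the namespace of the current kubectl context, and the caller needs permission to impersonate service accounts.

```bash
#!/usr/bin/env bash
# Sketch only: verify the get/list/watch permissions granted by the ClusterRole.
# Function names are illustrative, not part of the profiler.

# Build the user name Kubernetes assigns to a service account.
# $1 = namespace, $2 = service account name
service_account_user() {
  echo "system:serviceaccount:$1:$2"
}

# Print "yes" or "no" for each verb/resource pair from the ClusterRole rules.
check_profiler_rbac() {
  local user verb resource
  user="$(service_account_user "$1" "$2")"
  for resource in pods nodes; do
    for verb in get list watch; do
      printf '%s %s: ' "$verb" "$resource"
      kubectl auth can-i "$verb" "$resource" --as="$user"
    done
  done
}

# Usage (requires access to the cluster):
#   check_profiler_rbac <my-service-account-namespace> <my-service-account>
```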
44 changes: 44 additions & 0 deletions doc/running-on-host.md
@@ -0,0 +1,44 @@
# Running the agent directly on the host

## Prerequisites

The datadog-agent must be running on the host and configured to collect APM data (this is enabled by default in the agent, unless you explicitly disabled it). See agent installation instructions [here](https://docs.datadoghq.com/agent/) and the flag to enable APM [here](https://github.com/DataDog/datadog-agent/blob/8a80bcd1c1460ba9caa97d974568bd9d0c702f3f/pkg/config/config_template.yaml#L1036-L1042).

For the purposes of this guide, we assume that the Datadog Agent is accessible from the host at `http://localhost:8126`.

## Installation

Download pre-built amd64 and arm64 binaries for our [latest release](https://github.com/DataDog/otel-profiling-agent/releases/latest).

Alternatively, you can build the profiler from source. The following instructions assume you have docker installed.

<details>
<summary>Manual build instructions</summary>
<br />

To build the profiler, you can use the following commands:

```
make docker-image
make agent
```

This will create an `otel-profiling-agent` binary in the current directory.

</details>

## Running the profiler

To run the profiler, you need to make sure that debugfs is mounted. If it's not, you can run:

```
sudo mount -t debugfs none /sys/kernel/debug
```
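
To avoid mounting debugfs a second time, the mount table can be inspected first. The following is a minimal sketch, assuming the standard `/proc/mounts` column layout; `debugfs_mounted` is a hypothetical helper, and its optional argument lets it read a different mounts file.

```bash
#!/usr/bin/env bash
# Sketch only: report whether debugfs is mounted at /sys/kernel/debug.
# debugfs_mounted is an illustrative name, not part of the profiler.
debugfs_mounted() {
  local mounts="${1:-/proc/mounts}"
  # /proc/mounts columns: device mountpoint fstype options dump pass
  awk '$2 == "/sys/kernel/debug" && $3 == "debugfs" { found = 1 }
       END { exit !found }' "$mounts"
}

if debugfs_mounted; then
  echo "debugfs is already mounted"
else
  echo "debugfs is not mounted; run: sudo mount -t debugfs none /sys/kernel/debug"
fi
```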

After that, you can start the profiler as shown below (make sure you run it as root):

```
sudo otel-profiling-agent -tags "service:$(hostname)" -collection-agent "http://localhost:8126"
```

If your Datadog Agent is reachable at a different address, change the `-collection-agent` parameter accordingly.
