Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cgroups #29621

Merged
merged 8 commits into from
Feb 16, 2024
Merged

Cgroups #29621

Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions .chloggen/cgroups.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: hostmetricsreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add support for cgroup information as a new resource attribute on process metrics, as `process.cgroup`
atoulme marked this conversation as resolved.
Show resolved Hide resolved

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [29282]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,10 @@ type Config struct {
// the collector does not have permission for.
MuteProcessIOError bool `mapstructure:"mute_process_io_error,omitempty"`

// MuteProcessCgroupError is a flag that will mute the error encountered when trying to read the cgroup of a process
// the collector does not have permission for.
MuteProcessCgroupError bool `mapstructure:"mute_process_cgroup_error,omitempty"`
atoulme marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this option from the start? It's an optional metric and can be disabled if it results in errors

Copy link
Member

@andrzej-stencel andrzej-stencel Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need an option to mute errors even for optional metrics.

The problem is (I think) that the errors in the process scraper are usually huge - reporting an error for every process on the system, and that they're repeatable - reporting the same thing on every scrape. Perhaps going forward we could do something more clever than adding another "mute" option. Here are some thoughts:

  1. Make the error messages shorter by aggregating the duplicate error messages,
  2. Only display a specific type of error when it happens for the first time,
  3. Add metrics counting the occurrences of each type of error - especially important if we do 2. as the default behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A solution I'd come up with for this is a new type of error that a scraper can report to log errors at a different level. This combined with option 3 would be a pretty good outcome I think; there would be metrics reported for the different types of errors, and these errors could be logged at debug level by the scraper controller.

This is the issue I have open on the collector repo with accompanying PR: open-telemetry/opentelemetry-collector#8293


// ResilientProcessScraping is a flag that will let the collector continue reading a process even when
// the collector does not have permission to read it's executable path (Linux)
MuteProcessExeError bool `mapstructure:"mute_process_exe_error,omitempty"`
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,7 @@ Process threads count.

| Name | Description | Values | Enabled |
| ---- | ----------- | ------ | ------- |
| process.cgroup | cgroup associated with the process. | Any Str | false |
| process.command | The command used to launch the process (i.e. the command name). On Linux based systems, can be set to the zeroth string in proc/[pid]/cmdline. On Windows, can be set to the first parameter extracted from GetCommandLineW. | Any Str | true |
| process.command_line | The full command used to launch the process as a single string representing the full command. On Windows, can be set to the result of GetCommandLineW. Do not set this if you have to assemble it just for monitoring; use process.command_args instead. | Any Str | true |
| process.executable.name | The name of the process executable. On Linux based systems, can be set to the Name in proc/[pid]/status. On Windows, can be set to the base name of GetProcessImageFileNameW. | Any Str | true |
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,8 @@ all_set:
process.threads:
enabled: true
resource_attributes:
process.cgroup:
enabled: true
process.command:
enabled: true
process.command_line:
Expand Down Expand Up @@ -71,6 +73,8 @@ none_set:
process.threads:
enabled: false
resource_attributes:
process.cgroup:
enabled: false
process.command:
enabled: false
process.command_line:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,10 @@ resource_attributes:
description: The username of the user that owns the process.
enabled: true
type: string
process.cgroup:
description: cgroup associated with the process.
atoulme marked this conversation as resolved.
Show resolved Hide resolved
enabled: false
type: string

attributes:
direction:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,14 @@ package processscraper // import "github.com/open-telemetry/opentelemetry-collec

import (
"context"
"os"
"path/filepath"
"runtime"
"strconv"
"strings"
"time"

"github.com/shirou/gopsutil/v3/common"
"github.com/shirou/gopsutil/v3/cpu"
"github.com/shirou/gopsutil/v3/process"
"go.opentelemetry.io/collector/pdata/pcommon"
Expand All @@ -31,8 +35,9 @@ type processMetadata struct {
}

type executableMetadata struct {
name string
path string
name string
path string
cgroup string
}

type commandMetadata struct {
Expand All @@ -46,6 +51,7 @@ func (m *processMetadata) buildResource(rb *metadata.ResourceBuilder) pcommon.Re
rb.SetProcessParentPid(int64(m.parentPid))
rb.SetProcessExecutableName(m.executable.name)
rb.SetProcessExecutablePath(m.executable.path)
rb.SetProcessCgroup(m.executable.cgroup)
if m.command != nil {
rb.SetProcessCommand(m.command.command)
if m.command.commandLineSlice != nil {
Expand Down Expand Up @@ -91,14 +97,15 @@ type processHandle interface {
NumFDsWithContext(context.Context) (int32, error)
// If gatherUsed is true, the currently used value will be gathered and added to the resulting RlimitStat.
RlimitUsageWithContext(ctx context.Context, gatherUsed bool) ([]process.RlimitStat, error)
CgroupWithContext(ctx context.Context) (string, error)
}

type gopsProcessHandles struct {
handles []*process.Process
handles []wrappedProcessHandle
}

func (p *gopsProcessHandles) Pid(index int) int32 {
return p.handles[index].Pid
return p.handles[index].Process.Pid
}

func (p *gopsProcessHandles) At(index int) processHandle {
Expand All @@ -109,13 +116,51 @@ func (p *gopsProcessHandles) Len() int {
return len(p.handles)
}

type wrappedProcessHandle struct {
*process.Process
}

func (p wrappedProcessHandle) CgroupWithContext(ctx context.Context) (string, error) {
pid := p.Process.Pid
statPath := getEnvWithContext(ctx, string(common.HostProcEnvKey), "/proc", strconv.Itoa(int(pid)), "cgroup")
contents, err := os.ReadFile(statPath)
if err != nil {
return "", err
}

return strings.TrimSuffix(string(contents), "\n"), nil
}

// copied from gopsutil:
atoulme marked this conversation as resolved.
Show resolved Hide resolved
// GetEnvWithContext retrieves the environment variable key. If it does not exist it returns the default.
// The context may optionally contain a map superseding os.EnvKey.
func getEnvWithContext(ctx context.Context, key string, dfault string, combineWith ...string) string {
var value string
if env, ok := ctx.Value(common.EnvKey).(common.EnvMap); ok {
value = env[common.EnvKeyType(key)]
}
if value == "" {
value = os.Getenv(key)
}
if value == "" {
value = dfault
}
segments := append([]string{value}, combineWith...)

return filepath.Join(segments...)
}

func getProcessHandlesInternal(ctx context.Context) (processHandles, error) {
processes, err := process.ProcessesWithContext(ctx)
if err != nil {
return nil, err
}
wrapped := make([]wrappedProcessHandle, len(processes))
for i, p := range processes {
wrapped[i] = wrappedProcessHandle{Process: p}
}

return &gopsProcessHandles{handles: processes}, nil
return &gopsProcessHandles{handles: wrapped}, nil
}

func parentPid(ctx context.Context, handle processHandle, pid int32) (int32, error) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -203,8 +203,15 @@ func (s *scraper) getProcessMetadata() ([]*processMetadata, error) {
}
continue
}
cgroup, err := getProcessCgroup(ctx, handle)
if err != nil {
if !s.config.MuteProcessCgroupError {
errs.AddPartial(1, fmt.Errorf("error reading process cgroup for pid %v: %w", pid, err))
}
continue
}

executable := &executableMetadata{name: name, path: exe}
executable := &executableMetadata{name: name, path: exe, cgroup: cgroup}

// filter processes by name
if (s.includeFS != nil && !s.includeFS.Matches(executable.name)) ||
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,10 @@ func getProcessName(ctx context.Context, proc processHandle, _ string) (string,
return name, nil
}

func getProcessCgroup(_ context.Context, _ processHandle) (string, error) {
return "", nil
}

func getProcessExecutable(ctx context.Context, proc processHandle) (string, error) {
cmdline, err := proc.CmdlineWithContext(ctx)
if err != nil {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,15 @@ func getProcessExecutable(ctx context.Context, proc processHandle) (string, erro
return exe, nil
}

func getProcessCgroup(ctx context.Context, proc processHandle) (string, error) {
cgroup, err := proc.CgroupWithContext(ctx)
if err != nil {
return "", err
}

return cgroup, nil
}

func getProcessCommand(ctx context.Context, proc processHandle) (*commandMetadata, error) {
cmdline, err := proc.CmdlineSliceWithContext(ctx)
if err != nil {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,10 @@ func getProcessName(context.Context, processHandle, string) (string, error) {
return "", nil
}

func getProcessCgroup(ctx context.Context, proc processHandle) (string, error) {
return "", nil
}

func getProcessExecutable(context.Context, processHandle) (string, error) {
return "", nil
}
Expand Down
Loading