forked from open-telemetry/opentelemetry-collector-contrib
[extension/cgroupruntime]: Initial implementation (open-telemetry#35472)
**Description:** This PR adds the initial implementation of a new component that dynamically sets the values of `GOMEMLIMIT` and `GOMAXPROCS` used by the Go runtime. Those values are normally aligned by hand with the cgroup resource limits to prevent CPU throttling or out-of-memory scenarios. The component removes the manual steps of configuring these environment variables in K8s deployments (e.g. Helm [templates](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_helpers.tpl#L169)) and also allows fine-grained values (e.g. 90% of the memory resource limit).

**Link to tracking Issue:** open-telemetry#30289

**Testing:** Unit tests for the component have been added (config and extension start/stop). Ideally, an integration test that actually asserts the runtime modifications should be added as well. The extension relies on the "github.com/KimMachineGun/automemlimit/memlimit" and "go.uber.org/automaxprocs/maxprocs" packages for the runtime modifications, but they don't provide a way to mock the "cgroups" file system, which is what they read to get the resource quota limits.

- Automemlimit package tests expect to run in a cgroup environment: https://github.com/KimMachineGun/automemlimit/blob/main/memlimit/cgroups_test.go#L18
- Automaxprocs does not expose the CPU quota retrieval: https://github.com/uber-go/automaxprocs/blob/master/maxprocs/maxprocs.go#L41

Any suggestions on how to perform these integration tests in the contrib repository? One possibility is to use the https://github.com/containerd/cgroups package to set the quota, but this requires privileged permissions (also in the GHA).

---------

Co-authored-by: Pablo Baeyens <[email protected]>
1 parent 69236b0, commit 60b58fb

Showing 23 changed files with 684 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Use this changelog template to create an entry for release notes. | ||
|
||
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' | ||
change_type: new_component | ||
|
||
# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver) | ||
component: extension/cgroupruntime | ||
|
||
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). | ||
note: Initial implementation for cgroupruntime extension. | ||
|
||
# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. | ||
issues: [30289] | ||
|
||
# (Optional) One or more lines of additional information to render under the primary note. | ||
# These lines will be padded with 2 spaces and then inserted directly into the document. | ||
# Use pipe (|) for multiline entries. | ||
subtext: | ||
|
||
# If your change doesn't affect end users or the exported elements of any package, | ||
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label. | ||
# Optional: The change log or logs in which this entry should be included. | ||
# e.g. '[user]' or '[user, api]' | ||
# Include 'user' if the change is relevant to end users. | ||
# Include 'api' if there is a change to a library API. | ||
# Default: '[user]' | ||
change_logs: [] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
include ../../Makefile.Common |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Cgroup Go runtime extension | ||
|
||
|
||
<!-- status autogenerated section --> | ||
| Status | | | ||
| ------------- |-----------| | ||
| Stability | [development] | | ||
| Distributions | [contrib] | | ||
| Issues | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Aextension%2Fcgroupruntime%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Aextension%2Fcgroupruntime) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Aextension%2Fcgroupruntime%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Aextension%2Fcgroupruntime) | | ||
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner) | [@mx-psi](https://www.github.com/mx-psi), [@rogercoll](https://www.github.com/rogercoll) | | ||
|
||
[development]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-stability.md#development | ||
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib | ||
<!-- end autogenerated section --> | ||
|
||
## Overview | ||
|
||
The OpenTelemetry Cgroup Auto-Config Extension is designed to optimize Go runtime performance in containerized environments by automatically configuring GOMAXPROCS and GOMEMLIMIT based on the Linux cgroup filesystem. This extension leverages [automaxprocs](https://github.com/uber-go/automaxprocs) and [automemlimit](https://github.com/KimMachineGun/automemlimit) packages to dynamically adjust Go runtime variables, ensuring efficient resource usage aligned with container limits. | ||
|
||
## Configuration | ||
|
||
The following settings can be configured: | ||
|
||
- `gomaxprocs`: Configures how `GOMAXPROCS`, the maximum number of CPUs for the Go runtime, is set. Options: | ||
  - `enabled`: A boolean value to enable or disable automatic configuration of `GOMAXPROCS` based on the system’s cgroup settings (default: true). | ||
|
||
- `gomemlimit`: Configures how `GOMEMLIMIT`, the memory limit for the Go runtime, is set. Options: | ||
  - `enabled`: A boolean value to enable or disable automatic configuration of `GOMEMLIMIT` (default: true). | ||
  - `ratio`: A floating-point value greater than 0 and at most 1 that represents the fraction of the detected memory limit to allocate for the Go runtime (default: 0.9). | ||
|
||
## Examples | ||
|
||
```yaml | ||
extensions: | ||
  # extension name: cgroupruntime | ||
  cgroupruntime: | ||
    gomaxprocs: | ||
      enabled: true | ||
    gomemlimit: | ||
      enabled: true | ||
      ratio: 0.8 | ||
``` |
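
To make the `ratio` setting concrete, here is a minimal sketch of the arithmetic it implies. It assumes the limit is simply the detected cgroup memory limit multiplied by the ratio and truncated to whole bytes; the helper name `memLimitFromRatio` and the 512 MiB limit are hypothetical, and the rounding actually applied by automemlimit may differ.

```go
package main

import "fmt"

// memLimitFromRatio returns the soft memory limit, in bytes, obtained by
// scaling a detected cgroup memory limit by ratio.
// Assumption of this sketch: the product is truncated toward zero.
func memLimitFromRatio(cgroupLimitBytes int64, ratio float64) int64 {
	return int64(float64(cgroupLimitBytes) * ratio)
}

func main() {
	const cgroupLimit = 512 << 20 // hypothetical 512 MiB container limit

	// With ratio 0.8, roughly 410 MiB is handed to the Go runtime and the
	// rest is left as headroom for memory not managed by the Go heap.
	fmt.Println(memLimitFromRatio(cgroupLimit, 0.8))
}
```

Leaving headroom below the container limit is the reason the default ratio is below 1: the Go memory limit is a soft target, so a value equal to the cgroup limit leaves no margin before the kernel's OOM killer steps in.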
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package cgroupruntimeextension // import "github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension" | ||
|
||
import "errors" | ||
|
||
// Config contains the configuration for the cgroup runtime extension. | ||
type Config struct { | ||
GoMaxProcs GoMaxProcsConfig `mapstructure:"gomaxprocs"` | ||
GoMemLimit GoMemLimitConfig `mapstructure:"gomemlimit"` | ||
} | ||
|
||
type GoMaxProcsConfig struct { | ||
Enabled bool `mapstructure:"enabled"` | ||
} | ||
|
||
type GoMemLimitConfig struct { | ||
Enabled bool `mapstructure:"enabled"` | ||
Ratio float64 `mapstructure:"ratio"` | ||
} | ||
|
||
// Validate checks if the extension configuration is valid | ||
func (cfg *Config) Validate() error { | ||
if cfg.GoMemLimit.Ratio <= 0 || cfg.GoMemLimit.Ratio > 1 { | ||
return errors.New("gomemlimit ratio must be in the (0.0,1.0] range") | ||
} | ||
return nil | ||
} |
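
The boundary behavior of `Validate` can be sketched in isolation. The snippet below is a standalone approximation, not the extension's code: `goMemLimitConfig` and `validateRatio` are hypothetical stand-ins that copy the range check from the diff above. It also illustrates that the ratio is checked even when `enabled` is false, which matches the `invalid_ratio_disabled` test case.

```go
package main

import (
	"errors"
	"fmt"
)

// goMemLimitConfig is a hypothetical stand-in for GoMemLimitConfig.
type goMemLimitConfig struct {
	Enabled bool
	Ratio   float64
}

var errInvalidRatio = errors.New("gomemlimit ratio must be in the (0.0,1.0] range")

// validateRatio applies the same half-open range check as Config.Validate:
// 0 is rejected, 1 is accepted, and Enabled is not consulted.
func validateRatio(cfg goMemLimitConfig) error {
	if cfg.Ratio <= 0 || cfg.Ratio > 1 {
		return errInvalidRatio
	}
	return nil
}

func main() {
	for _, cfg := range []goMemLimitConfig{
		{Enabled: true, Ratio: 0.9},
		{Enabled: true, Ratio: 1},
		{Enabled: true, Ratio: 0},
		{Enabled: false, Ratio: 1.5},
	} {
		fmt.Printf("enabled=%t ratio=%v valid=%t\n",
			cfg.Enabled, cfg.Ratio, validateRatio(cfg) == nil)
	}
}
```

One consequence of this design is that a configuration which disables `gomemlimit` must still carry a valid ratio, so the default of 0.9 should be left in place rather than zeroed out.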
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package cgroupruntimeextension | ||
|
||
import ( | ||
"path/filepath" | ||
"testing" | ||
|
||
"github.com/stretchr/testify/assert" | ||
"github.com/stretchr/testify/require" | ||
"go.opentelemetry.io/collector/component" | ||
"go.opentelemetry.io/collector/confmap/confmaptest" | ||
|
||
"github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension/internal/metadata" | ||
) | ||
|
||
func TestLoadConfig(t *testing.T) { | ||
t.Parallel() | ||
|
||
tests := []struct { | ||
id component.ID | ||
expected component.Config | ||
unmarshalErrorMessage string | ||
validateErrorMessage string | ||
}{ | ||
{ | ||
id: component.NewID(metadata.Type), | ||
expected: &Config{ | ||
GoMaxProcs: GoMaxProcsConfig{Enabled: true}, | ||
GoMemLimit: GoMemLimitConfig{ | ||
Enabled: true, | ||
Ratio: 0.9, | ||
}, | ||
}, | ||
}, | ||
{ | ||
id: component.NewIDWithName(metadata.Type, "invalid_ratio"), | ||
validateErrorMessage: "gomemlimit ratio must be in the (0.0,1.0] range", | ||
}, | ||
{ | ||
id: component.NewIDWithName(metadata.Type, "invalid_ratio_disabled"), | ||
validateErrorMessage: "gomemlimit ratio must be in the (0.0,1.0] range", | ||
}, | ||
{ | ||
id: component.NewIDWithName(metadata.Type, "invalid_ratio_negative"), | ||
validateErrorMessage: "gomemlimit ratio must be in the (0.0,1.0] range", | ||
}, | ||
{ | ||
id: component.NewIDWithName(metadata.Type, "invalid_ratio_type"), | ||
unmarshalErrorMessage: "decoding failed due to the following error(s):\n\n'gomemlimit.ratio' expected type 'float64', got unconvertible type 'string', value: 'not_valid'", | ||
}, | ||
} | ||
|
||
for _, tt := range tests { | ||
t.Run(tt.id.String(), func(t *testing.T) { | ||
cm, err := confmaptest.LoadConf(filepath.Join("testdata", "config.yaml")) | ||
require.NoError(t, err) | ||
|
||
factory := NewFactory() | ||
cfg := factory.CreateDefaultConfig() | ||
|
||
sub, err := cm.Sub(tt.id.String()) | ||
require.NoError(t, err) | ||
|
||
if tt.unmarshalErrorMessage != "" { | ||
assert.ErrorContains(t, sub.Unmarshal(cfg), tt.unmarshalErrorMessage) | ||
return | ||
} | ||
require.NoError(t, sub.Unmarshal(cfg)) | ||
|
||
if tt.validateErrorMessage != "" { | ||
assert.EqualError(t, component.ValidateConfig(cfg), tt.validateErrorMessage) | ||
return | ||
} | ||
|
||
assert.NoError(t, component.ValidateConfig(cfg)) | ||
assert.Equal(t, tt.expected, cfg) | ||
}) | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
//go:generate mdatagen metadata.yaml | ||
|
||
package cgroupruntimeextension // import "github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package cgroupruntimeextension // import "github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension" | ||
|
||
import ( | ||
"context" | ||
"runtime" | ||
"runtime/debug" | ||
|
||
"go.opentelemetry.io/collector/component" | ||
"go.uber.org/zap" | ||
) | ||
|
||
type ( | ||
undoFunc func() | ||
maxProcsFn func() (undoFunc, error) | ||
memLimitWithRatioFn func(float64) (undoFunc, error) | ||
) | ||
|
||
type cgroupRuntimeExtension struct { | ||
config *Config | ||
logger *zap.Logger | ||
|
||
// runtime modifiers | ||
maxProcsFn | ||
undoMaxProcsFn undoFunc | ||
|
||
memLimitWithRatioFn | ||
undoMemLimitFn undoFunc | ||
} | ||
|
||
func newCgroupRuntime(cfg *Config, logger *zap.Logger, maxProcsFn maxProcsFn, memLimitFn memLimitWithRatioFn) *cgroupRuntimeExtension { | ||
return &cgroupRuntimeExtension{ | ||
config: cfg, | ||
logger: logger, | ||
maxProcsFn: maxProcsFn, | ||
memLimitWithRatioFn: memLimitFn, | ||
} | ||
} | ||
|
||
func (c *cgroupRuntimeExtension) Start(_ context.Context, _ component.Host) error { | ||
var err error | ||
if c.config.GoMaxProcs.Enabled { | ||
c.undoMaxProcsFn, err = c.maxProcsFn() | ||
if err != nil { | ||
return err | ||
} | ||
|
||
c.logger.Info("GOMAXPROCS has been set", | ||
zap.Int("GOMAXPROCS", runtime.GOMAXPROCS(-1)), | ||
) | ||
} | ||
|
||
if c.config.GoMemLimit.Enabled { | ||
c.undoMemLimitFn, err = c.memLimitWithRatioFn(c.config.GoMemLimit.Ratio) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
c.logger.Info("GOMEMLIMIT has been set", | ||
zap.Int64("GOMEMLIMIT", debug.SetMemoryLimit(-1)), | ||
) | ||
} | ||
return nil | ||
} | ||
|
||
func (c *cgroupRuntimeExtension) Shutdown(_ context.Context) error { | ||
if c.undoMaxProcsFn != nil { | ||
c.undoMaxProcsFn() | ||
} | ||
if c.undoMemLimitFn != nil { | ||
c.undoMemLimitFn() | ||
} | ||
|
||
return nil | ||
} |
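
The extension stores an `undoFunc` for each setter at `Start` and calls it at `Shutdown`. The following is a minimal sketch of that set-and-undo pattern using only the standard library. It is an assumption-laden illustration: the values are passed in directly instead of being read from cgroups, and `setMaxProcs`, `setMemLimit`, `currentMaxProcs` and `currentMemLimit` are hypothetical helpers, not the functions the factory actually wires in.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

type undoFunc func()

// setMaxProcs sets GOMAXPROCS to n and returns a function that restores
// the previous value. runtime.GOMAXPROCS returns the previous setting.
func setMaxProcs(n int) undoFunc {
	prev := runtime.GOMAXPROCS(n)
	return func() { runtime.GOMAXPROCS(prev) }
}

// setMemLimit sets the Go soft memory limit (Go 1.19+) and returns a
// function that restores the previous limit.
func setMemLimit(limitBytes int64) undoFunc {
	prev := debug.SetMemoryLimit(limitBytes)
	return func() { debug.SetMemoryLimit(prev) }
}

// currentMaxProcs reads GOMAXPROCS without changing it (n < 1 is a query).
func currentMaxProcs() int { return runtime.GOMAXPROCS(0) }

// currentMemLimit reads the memory limit without changing it
// (a negative input is a query).
func currentMemLimit() int64 { return debug.SetMemoryLimit(-1) }

func main() {
	undoProcs := setMaxProcs(1)       // hypothetical CPU quota of one core
	undoMem := setMemLimit(256 << 20) // hypothetical 256 MiB limit
	fmt.Println(currentMaxProcs(), currentMemLimit())

	// Equivalent of Shutdown: restore whatever was in effect before Start.
	undoMem()
	undoProcs()
	fmt.Println(currentMaxProcs(), currentMemLimit())
}
```

Capturing the previous value inside the closure is what lets `Shutdown` stay trivial: it only needs to check whether an undo function was stored, which is also why a disabled setter leaves nothing to undo.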
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package cgroupruntimeextension | ||
|
||
import ( | ||
"context" | ||
"testing" | ||
|
||
"github.com/stretchr/testify/require" | ||
"go.opentelemetry.io/collector/component/componenttest" | ||
"go.opentelemetry.io/collector/extension/extensiontest" | ||
) | ||
|
||
func TestExtension(t *testing.T) { | ||
tests := []struct { | ||
name string | ||
config *Config | ||
expectedCalls int | ||
}{ | ||
{ | ||
name: "all enabled", | ||
config: &Config{ | ||
GoMaxProcs: GoMaxProcsConfig{ | ||
Enabled: true, | ||
}, | ||
GoMemLimit: GoMemLimitConfig{ | ||
Enabled: true, | ||
Ratio: 0.5, | ||
}, | ||
}, | ||
expectedCalls: 4, | ||
}, | ||
{ | ||
name: "everything disabled", | ||
config: &Config{ | ||
GoMaxProcs: GoMaxProcsConfig{ | ||
Enabled: false, | ||
}, | ||
GoMemLimit: GoMemLimitConfig{ | ||
Enabled: false, | ||
}, | ||
}, | ||
expectedCalls: 0, | ||
}, | ||
} | ||
|
||
for _, test := range tests { | ||
t.Run(test.name, func(t *testing.T) { | ||
allCalls := 0 | ||
var _err error | ||
setterMock := func() (undoFunc, error) { | ||
allCalls++ | ||
return func() { allCalls++ }, _err | ||
} | ||
settings := extensiontest.NewNopSettings() | ||
cg := newCgroupRuntime(test.config, settings.Logger, setterMock, func(_ float64) (undoFunc, error) { return setterMock() }) | ||
ctx := context.Background() | ||
|
||
err := cg.Start(ctx, componenttest.NewNopHost()) | ||
require.NoError(t, err) | ||
|
||
require.NoError(t, cg.Shutdown(ctx)) | ||
require.Equal(t, test.expectedCalls, allCalls) | ||
}) | ||
} | ||
} |