profiler: add enable flag to control profiler activation #2840

Open
wants to merge 4 commits into
base: main

korECM commented Aug 31, 2024

What does this PR do?

This PR introduces a new environment variable DD_PROFILING_ENABLED to control the profiler's behavior in a way similar to DD_TRACE_ENABLED. By default, DD_PROFILING_ENABLED is set to true, meaning profiling will be enabled if profiler.Start() is called in the application code. If DD_PROFILING_ENABLED is set to false, profiling will be disabled even if profiler.Start() is called. This allows the application code to always call profiler.Start() while dynamically adjusting profiling through the environment variable.

Motivation

Fixes #2834

The motivation for this PR is to simplify the control of profiling behavior across multiple applications. By introducing DD_PROFILING_ENABLED, developers can avoid the cumbersome task of managing environment variables within the application code and instead control profiling through a single environment variable.

Additional Information

This PR includes the following changes:

  • Addition of enable field in profiler config struct.
  • Update to defaultConfig function to read DD_PROFILING_ENABLED environment variable.
  • Conditional check in Start function to skip profiling if enable field in profiler config is false.
  • Unit tests for DD_PROFILING_ENABLED in options_test.go and profiler_test.go.

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Add a new 'enable' field to the profiler config, controlled by the
DD_PROFILING_ENABLED environment variable. This allows users to
disable profiling even when the Start() function is called.

The enable flag defaults to true, maintaining backwards compatibility.
When set to false, the profiler will not start, providing a simple way
to toggle profiling without code changes.

Update tests to cover the new functionality and add logging for the
new configuration option.
korECM requested a review from a team as a code owner August 31, 2024 08:33

Member

felixge left a comment

LGTM. @nsrip-dd could you also take a look please? Kicking off a CI run now.

@@ -146,6 +147,7 @@ func logStartup(c *config) {
"execution_trace_size_limit": c.traceConfig.Limit,
"endpoint_count_enabled": c.endpointCountEnabled,
"custom_profiler_label_keys": c.customProfilerLabels,
"enable": c.enable,
Member

The same change needs to be added to profiler/telemetry.go.

Contributor

Actually, let's just remove this. If enable is false we won't send anything at all.

Member

Not sure I follow. Having enable in the debug log is useful for debugging, e.g. when a customer reports that profiling isn't working.

And having it in telemetry is useful for us to understand how our users use this flag?

Author

I also think keeping this information in the debug log would make it easier to handle future reports about the profiler.
I'm fine with either option, so please feel free to share your thoughts and I'll implement them accordingly!

Contributor

> Having enable in the debug log is useful for debugging, e.g. when a customer reports that profiling isn't working.

Agreed. I added this comment too hastily and didn't notice that the startup log happens before the check for DD_PROFILING_ENABLED. Let's keep this after all.

> And having it in telemetry is useful for us to understand how our users use this flag?

As this PR stands right now, the telemetry client won't start if DD_PROFILING_ENABLED=false. My gut feeling is that we shouldn't start telemetry if we don't start the profiler. IMO this is lower priority than making sure DD_PROFILING_ENABLED=false works, and we can address telemetry in a followup. WDYT @felixge?

Member

SGTM.

Author

I've also added it to profiler/telemetry.go.
3aac285 (#2840)

// So we should not have an activeProfiler
assert.Nil(t, activeProfiler)
mu.Unlock()
})
Member

Thanks for adding this and the other test case.

NIT: The existing test suite has a lot of tests like this that assert on internal state. However, these days we generally prefer tests that verify user-visible behavior, i.e. a test that checks that a disabled profiler doesn't send data would be nice. In practice this might be a difficult test to write in a non-flaky manner, so I'm okay with keeping the testing as proposed in this PR.

Author

If I understand correctly, we should keep the test that checks activeProfiler, and additionally create a test that verifies no profiling data is sent when DD_PROFILING_ENABLED=false?
As you mentioned, this might be a bit challenging to implement, especially since I'm not very familiar with the codebase, but I'll give it a try. Thank you for the suggestion.

Contributor

Here's a test for DD_PROFILING_ENABLED=false I sketched up while reviewing this:

func TestEnabledFalse(t *testing.T) {
        t.Setenv("DD_PROFILING_ENABLED", "false")
        ch := startTestProfiler(t, 1, WithPeriod(10*time.Millisecond), WithProfileTypes())
        select {
        case <-ch:
                t.Fatal("received profile when profiler should have been disabled")
        case <-time.After(time.Second):
                // This test might succeed incorrectly on an overloaded
                // CI server, but is very likely to fail locally given a
                // buggy implementation
        }
}

Feel free to add it to the PR if it makes sense to you. We can keep the other tests for now.

Author

I've implemented additional changes in f71e851 (#2840).

However, the tests you've already suggested seem sufficient, so I couldn't find any points to modify or add. If there are any areas that need improvement, please feel free to let me know!

@@ -208,6 +210,7 @@ func defaultConfig() (*config, error) {
} else {
c.agentURL = url.String() + "/profiling/v1/input"
}
c.enable = internal.BoolEnv("DD_PROFILING_ENABLED", true)
Contributor

We're going to support DD_PROFILING_ENABLED=auto set via the Datadog admission controller. Right now this will work with the value auto, but will log a warning saying it's an invalid boolean. Let's perhaps check for auto explicitly and then check the boolean? Something like:

if os.Getenv("DD_PROFILING_ENABLED") == "auto" {
    c.enable = true
} else {
    c.enable = internal.BoolEnv("DD_PROFILING_ENABLED", true)
}

Author

Applied suggested changes here
ab2acd9 (#2840)

Updated profiler options to automatically enable profiling if the environment variable "DD_PROFILING_ENABLED" is set to "auto". This change delegates the decision to the Datadog admission controller when "auto" is specified.
Implemented a new test to verify that no profiles are received when the profiler is disabled. This helps ensure the profiler respects the DD_PROFILING_ENABLED environment variable.
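
The "auto" handling described above can be sketched as a pure function over the raw variable value, which makes the accepted inputs easy to test. This is an illustrative sketch only: `profilingEnabled` is a hypothetical name, and it uses `strconv.ParseBool` rather than the library's `internal.BoolEnv`, so the exact set of accepted boolean spellings is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// profilingEnabled interprets a raw DD_PROFILING_ENABLED value.
// valid is false when the value is neither empty, "auto", nor a boolean;
// in that case enabled falls back to the default (true) and the caller
// may choose to log a warning.
func profilingEnabled(raw string) (enabled, valid bool) {
	if raw == "" || raw == "auto" {
		// Unset or "auto": treat as enabled without warning.
		return true, true
	}
	b, err := strconv.ParseBool(raw)
	if err != nil {
		return true, false
	}
	return b, true
}

func main() {
	for _, raw := range []string{"", "auto", "true", "false", "maybe"} {
		enabled, valid := profilingEnabled(raw)
		fmt.Printf("%q -> enabled=%v valid=%v\n", raw, enabled, valid)
	}
}
```

Checking for "auto" before parsing the boolean is what avoids the invalid-boolean warning mentioned in the review comment.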
Author

korECM commented Oct 14, 2024

Hi @felixge @nsrip-dd

I hope you're doing well! I noticed that my PR has been pending review for about a month. If you have some time, I would greatly appreciate it if you could take a look.

Thank you!

Successfully merging this pull request may close these issues.

proposal: adding DD_PROFILING_ENABLED environment variable to control profiling
3 participants