profiler: add enable flag to control profiler activation #2840

Open
wants to merge 4 commits into
base: main
Changes from 1 commit
3 changes: 3 additions & 0 deletions profiler/options.go
@@ -111,6 +111,7 @@ type config struct {
logStartup bool
traceConfig executionTraceConfig
endpointCountEnabled bool
enable bool
}

// logStartup records the configuration to the configured logger in JSON format
@@ -146,6 +147,7 @@ func logStartup(c *config) {
"execution_trace_size_limit": c.traceConfig.Limit,
"endpoint_count_enabled": c.endpointCountEnabled,
"custom_profiler_label_keys": c.customProfilerLabels,
"enable": c.enable,
Member

The same change needs to be added to profiler/telemetry.go.

Contributor

Actually, let's just remove this. If enable is false we won't send anything at all.

Member

Not sure I follow. Having enable in the debug log is useful for debugging, e.g. when a customer reports that profiling isn't working.

And having it in telemetry is useful for us to understand how our users use this flag?
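
To illustrate the debugging argument, here is a minimal sketch of a startup record that carries the flag. The `startupInfo` type, the `startupLogLine` helper, and the log prefix are hypothetical stand-ins, not the profiler's actual code; only the two JSON keys are taken from the diff above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// startupInfo is a hypothetical, trimmed-down stand-in for the data logged at
// startup. Only the JSON keys mirror the diff; everything else is assumed.
type startupInfo struct {
	EndpointCountEnabled bool `json:"endpoint_count_enabled"`
	Enable               bool `json:"enable"`
}

// startupLogLine renders the record as a single JSON log line.
// The "profiler startup: " prefix is made up for this example.
func startupLogLine(info startupInfo) (string, error) {
	b, err := json.Marshal(info)
	if err != nil {
		return "", err
	}
	return "profiler startup: " + string(b), nil
}

func main() {
	line, err := startupLogLine(startupInfo{EndpointCountEnabled: true, Enable: false})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	// With the flag in the record, "profiling isn't working" reports can be
	// checked against the log before looking anywhere else.
	fmt.Println(line)
}
```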

Author

I also think keeping this information in the debug log would make it easier to handle future reports about the profiler.
I'm fine with either option, so please feel free to share your thoughts and I'll implement them accordingly!

Contributor

> Having enable in the debug log is useful for debugging, e.g. when a customer reports that profiling isn't working.

Agreed. I added this comment too hastily and didn't notice that the startup log happens before the check for DD_PROFILING_ENABLED. Let's keep this after all.

> And having it in telemetry is useful for us to understand how our users use this flag?

As this PR stands right now, the telemetry client won't start if DD_PROFILING_ENABLED=false. My gut feeling is that we shouldn't start telemetry if we don't start the profiler. IMO this is lower priority than making sure DD_PROFILING_ENABLED=false works, and we can address telemetry in a followup. WDYT @felixge?
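
The ordering described in this comment can be sketched roughly as follows. This is a simplified model under stated assumptions: the `config` type and `start` function are hypothetical, and the relative order of telemetry and the profiler run loop is assumed. The comment only establishes that the startup log comes before the `enable` check and that telemetry is skipped when the flag is false.

```go
package main

import "fmt"

// config is a hypothetical stand-in holding only the flag under discussion.
type config struct {
	enable bool
}

// start returns the steps that would be executed, in order. The startup log
// is always emitted; telemetry and the profiler itself only run when enabled.
func start(c config) []string {
	steps := []string{"log startup"} // happens before the enable check
	if !c.enable {
		return steps // disabled: no telemetry, no profiler
	}
	// Assumed order; the thread does not say which of these comes first.
	return append(steps, "start telemetry", "run profiler")
}

func main() {
	fmt.Println(start(config{enable: false}))
	fmt.Println(start(config{enable: true}))
}
```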

Member

SGTM.

Author

I've also added it to profiler/telemetry.go.
3aac285 (#2840)

}
b, err := json.Marshal(info)
if err != nil {
@@ -208,6 +210,7 @@ func defaultConfig() (*config, error) {
} else {
c.agentURL = url.String() + "/profiling/v1/input"
}
c.enable = internal.BoolEnv("DD_PROFILING_ENABLED", true)
Contributor

We're going to support DD_PROFILING_ENABLED=auto set via the Datadog admission controller. Right now this will work with the value auto, but will log a warning saying it's an invalid boolean. Let's perhaps check for auto explicitly and then check the boolean? Something like:

if os.Getenv("DD_PROFILING_ENABLED") == "auto" {
    c.enable = true
} else {
    c.enable = internal.BoolEnv("DD_PROFILING_ENABLED", true)
}
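
For readers without access to `internal.BoolEnv`, the same idea can be sketched with only the standard library. `parseProfilingEnabled` is a hypothetical helper, not part of the PR, and the fallback-with-warning behaviour for invalid values is an assumption modelled on the description above.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseProfilingEnabled is a hypothetical helper. It treats "auto" as enabled,
// an empty value as "use the default", and otherwise parses a boolean.
// For invalid input it returns the default plus a warning message, which is
// the behaviour assumed here for a BoolEnv-style lookup.
func parseProfilingEnabled(raw string, def bool) (enabled bool, warn string) {
	switch raw {
	case "":
		return def, ""
	case "auto":
		return true, "" // checked before boolean parsing, so no warning
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def, fmt.Sprintf("invalid boolean %q, using default %t", raw, def)
	}
	return v, ""
}

func main() {
	for _, raw := range []string{"", "auto", "false", "maybe"} {
		enabled, warn := parseProfilingEnabled(raw, true)
		fmt.Printf("%q -> enabled=%t warn=%q\n", raw, enabled, warn)
	}
}
```

Checking `auto` first is what avoids the spurious "invalid boolean" warning while keeping the default for genuinely malformed values.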

Author

Applied suggested changes here
ab2acd9 (#2840)

if v := os.Getenv("DD_PROFILING_UPLOAD_TIMEOUT"); v != "" {
d, err := time.ParseDuration(v)
if err != nil {
15 changes: 15 additions & 0 deletions profiler/options_test.go
@@ -264,6 +264,21 @@ func TestEnvVars(t *testing.T) {
        assert.Equal(t, "http://localhost:6218/profiling/v1/input", cfg.agentURL)
    })

    t.Run("DD_PROFILING_ENABLED", func(t *testing.T) {
        t.Run("default", func(t *testing.T) {
            cfg, err := defaultConfig()
            require.NoError(t, err)
            assert.Equal(t, true, cfg.enable)
        })

        t.Run("override", func(t *testing.T) {
            t.Setenv("DD_PROFILING_ENABLED", "false")
            cfg, err := defaultConfig()
            require.NoError(t, err)
            assert.Equal(t, false, cfg.enable)
        })
    })

    t.Run("DD_PROFILING_UPLOAD_TIMEOUT", func(t *testing.T) {
        t.Setenv("DD_PROFILING_UPLOAD_TIMEOUT", "3s")
        cfg, err := defaultConfig()
3 changes: 3 additions & 0 deletions profiler/profiler.go
@@ -54,6 +54,9 @@ func Start(opts ...Option) error {
if err != nil {
return err
}
if !p.cfg.enable {
return nil
}
activeProfiler = p
activeProfiler.run()
traceprof.SetProfilerEnabled(true)
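
The effect of this early return can be modelled in isolation. The sketch below is not the library's code: `profiler`, `active`, `start`, and `stop` are hypothetical names, and the nil check in `stop` is an assumption about how a matching stop function would need to behave for `defer Stop()` to stay safe after a disabled start.

```go
package main

import (
	"fmt"
	"sync"
)

// profiler is a hypothetical stand-in for the real profiler type.
type profiler struct {
	running bool
}

var (
	mu     sync.Mutex
	active *profiler // nil whenever no profiler is running
)

// start registers and runs a profiler only when enable is true.
// A disabled start is a successful no-op, not an error.
func start(enable bool) error {
	mu.Lock()
	defer mu.Unlock()
	if !enable {
		return nil
	}
	active = &profiler{running: true}
	return nil
}

// stop is safe to call even if start was a no-op.
func stop() {
	mu.Lock()
	defer mu.Unlock()
	if active != nil {
		active.running = false
		active = nil
	}
}

func main() {
	_ = start(false)
	fmt.Println("registered after disabled start:", active != nil)
	stop() // must not panic

	_ = start(true)
	fmt.Println("registered after enabled start:", active != nil)
	stop()
}
```

Returning `nil` rather than an error lets callers keep an unconditional start call in their code and control activation purely through the environment.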
14 changes: 14 additions & 0 deletions profiler/profiler_test.go
@@ -82,6 +82,20 @@ func TestStart(t *testing.T) {
        mu.Unlock()
    })

    t.Run("dd_profiling_not_enabled", func(t *testing.T) {
        t.Setenv("DD_PROFILING_ENABLED", "false")
        if err := Start(); err != nil {
            t.Fatal(err)
        }
        defer Stop()

        mu.Lock()
        // if DD_PROFILING_ENABLED is false, the profiler should not be started even if Start() is called
        // So we should not have an activeProfiler
        assert.Nil(t, activeProfiler)
        mu.Unlock()
    })
Member

Thanks for adding this and the other test case.

NIT: The existing test suite has a lot of tests like this that assert on internal state. These days, however, we generally prefer tests that verify user-visible behavior, e.g. a test that checks that a disabled profiler doesn't send data would be nice. In practice such a test might be difficult to write in a non-flaky manner, so I'm okay with keeping the tests as proposed in this PR.

Author

If I understand correctly, should we keep the test that checks activeProfiler, and additionally create a test that verifies no profiling data is sent when DD_PROFILING_ENABLED=false?
As you mentioned, since I'm not very familiar with the codebase, implementing this might be a bit challenging, but I'll give it a try. Thank you for the suggestion.

Contributor

Here's a test for DD_PROFILING_ENABLED=false I sketched up while reviewing this:

func TestEnabledFalse(t *testing.T) {
        t.Setenv("DD_PROFILING_ENABLED", "false")
        ch := startTestProfiler(t, 1, WithPeriod(10*time.Millisecond), WithProfileTypes())
        select {
        case <-ch:
                t.Fatal("received profile when profiler should have been disabled")
        case <-time.After(time.Second):
                // This test might succeed incorrectly on an overloaded
                // CI server, but is very likely to fail locally given a
                // buggy implementation
        }
}

Feel free to add it to the PR if it makes sense to you. We can keep the other tests for now.

Author

I've implemented additional changes in f71e851 (#2840).

However, the tests you've already suggested seem sufficient, so I couldn't find any points to modify or add. If there are any areas that need improvement, please feel free to let me know!


t.Run("options", func(t *testing.T) {
if err := Start(); err != nil {
t.Fatal(err)