Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for collecting CLR event through EventPipe #1291

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

chrisnas
Copy link
Contributor

  • Add -eventpipe option to activate the EventPipe collection
  • Add -providers provider:keyword:verbosity:tags,... option to allow fine grained tuning of dotnet-trace execution
  • Add -sdk-path to point to an already installed dotnet SDK (otherwise it will be install in /tmp/dotnet_sdk_tool)
  • When the collection is done, a trace.nettrace and eventpipe.log files will be part of the resulting zip file that perfview is now able to leverage

Copy link
Member

@brianrob brianrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisnas, @gleocadie thank you very much for your contribution! I have a few questions and comments, but this is looking quite good.

@@ -642,6 +642,10 @@ usePerf=1
# Use LTTng
useLTTng=1

# Use EventPipe to collect CLR events
useEventPipe=0
sdkAndToolDir="/tmp/dotnet_sdk_tool"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we make this something like /tmp/perfcollect-dotnet-sdk so that it's clear where it came from?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@@ -1377,18 +1382,33 @@ ProcessArguments()
elif [ "-nolttng" == "$arg" ]
then
useLTTng=0
elif [ "-eventpipe" == "$arg" ]
then
useEventPipe=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this - if you enable EventPipe, LTTng is disabled.

useLTTng=0
elif [ "-providers" == "$arg" ]
then
providers=$rawvalue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to propose that if -eventpipe isn't specified then this writes a FatalError that -eventpipe must be specified. This ensures that users of LTTng don't inadvertently specify this, but get no results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one. I will add a validation step after the arguments are parsed.

elif [ "-noperf" == "$arg" ]
then
usePerf=0
elif [ "-gccollectonly" == "$arg" ]
then
gcCollectOnly=1
usePerf=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have mis-understood, but I think you were wanting a way to make it possible to capture -gccollectonly with perf enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that I would do that in a second step (PR), we are not blocked now. what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem.

BuildEventPipeArgs()
{
if [ "$collectionPid" == "" ]
then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should have an argument validation step after the arguments are parsed to handle this and some of the other possible argument-related issues that I mentioned above. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are aligned :) I will add a validation step

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

WriteStatus "Installing dotnet sdk in $sdkAndToolDir"
ResetText
RunSilent "mkdir $sdkAndToolDir"
RunSilent "curl -OL https://dot.net/v1/dotnet-install.sh"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SDK directory should be created first so that you can download dotnet-install.sh and store it in this directory as opposed to the current working directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see thank, I missed that.

then
FatalError "dotnet-trace tool was installed correctly."
fi
LogAppend 'dotnet-trace version:' `$sdkAndToolDir/dotnet trace --version`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the version information to the log unconditionally!

@@ -2153,6 +2312,11 @@ ProcessArguments $@
# Ensure prerequisites are installed.
EnsurePrereqsInstalled

if [ "$useEventPipe" == "1" ] && [ "$1" != "stop" ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like this to follow the same pattern as we do for the other prerequisites, and make download and install of the SDK part of perfcollect install. This ensures that any disk changes other than for the traces themselves are intentional on the part of the user.

Copy link
Contributor

@gleocadie gleocadie Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brianrob one question: the .NET sdk and the tool will unconditionally be installed in /tmp/perfcollect-dotnet-sdk when using perfcollect install.
The DiscoverCommands and InitializeLog functions will take use this path but not the one provided by -sdk-path option.
I was wondering if the -sdk-path was still worth. Maybe we can remove it no. What do you think ? (I might have missed something)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You bring up a good point here. I was trying to have more flexibility here, but I feel like the more I think about it, the more complicated things get, and that we should go back to a more simple plan as you are suggesting.

Here's what I think we should do, let me know what you think:

  1. If there is a global dotnet SDK installed, use it. If not, install to /tmp/perfcollect-dotnet-sdk.
  2. Install dotnet-trace using whichever SDK we have.
  3. When collecting, discover the SDK to use based on whether or not there is a global one or one in /tmp.

This ensures that if someone doesn't want to install another SDK, but already has one that they can use it. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me.

@@ -642,6 +642,10 @@ usePerf=1
# Use LTTng
useLTTng=1

# Use EventPipe to collect CLR events
useEventPipe=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to propose that instead of using the term eventpipe that we use dotnetTrace and/or dotnet-trace since the actual tool being used here is dotnet-trace. This is more of a forward-looking thing, so that it's clear what collector is being used should there be multiple, or if people don't know what eventpipe is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

then
EnsureDotNetTraceToolIsInstalled
fi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a regression test that uses dotnet-trace?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I missed the test folder. I will add a regression test.

Copy link
Contributor

@gleocadie gleocadie Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brianrob I tried to add a regression test (it allowed me to find stuff I forgot to add in the script: installing curl). But, to run the script with -dotnet-trace, we need an .NET app running in the container. I'm not sure it's possible to do that in the current state. Do you want me to add this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would be great.

@gleocadie gleocadie force-pushed the PR_perfcollect_eventpipe branch 2 times, most recently from f307984 to e8b179f Compare November 3, 2020 20:02
Base automatically changed from master to main February 2, 2021 23:16
@ezsilmar
Copy link
Contributor

Hi, I'd like to push this PR forward to be able to merge criteo-forks@747f2a8 so that we stop relying on the perfview fork internally :)

I discussed with @chrisnas and @gleocadie, and it seems that some tests were missing. I also see that 1 test in the current build failed, but the build result is long gone. @brianrob could you please re-trigger the test if there's such an option?

@brianrob
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@brianrob
Copy link
Member

Thanks @ezsilmar. Just triggered a new CI run.

@ezsilmar
Copy link
Contributor

Hello! @brianrob I got some time to come back to this PR and would be glad to get a code review.

I mainly fought test instability:

  • Disable test parallelization: this was already the case for most test projects
  • Better handling of shared directories in EtlTestBase
  • In perfcollect install for Ubuntu removed the packages that are missing, used linux-tools-generic instead
  • In container tests for dotnet-trace added a sleep after launching the test program

About the last point, something weird is happening. If I attach to the process with dotnet-trace right after the process is started, dotnet-trace hangs forever printing Stopping the trace. This may take up to minutes depending on the application being traced. If I wait for about a couple of seconds it works fine. This behavior reproduces in the github build pipeline, so there's probably a bug in dotnet-trace.

@ezsilmar
Copy link
Contributor

The test failing currently is OOM of CanReadV4EventPipeTraceBiggerThan4GB, it passes on my machine.

@ezsilmar ezsilmar force-pushed the PR_perfcollect_eventpipe branch 2 times, most recently from 02df7c7 to 65bd20b Compare February 3, 2023 13:42
@ezsilmar
Copy link
Contributor

ezsilmar commented Feb 3, 2023

Hello, this PR is hanging for almost 3 years but it is still relevant in our context, and I think it'd be beneficial for the community as well.

To remind what this is all about, we often use PerfView on Windows to analyze the behavior of dotnet apps running on Linux. Relevant to this PR, PerfView can understand:

  • perf CPU samples, collected with perfcollect
  • lttng text data file, collected with perfcollect
  • nettrace file, collected with dotnet-trace

In the days of net2, using perf+lttng was the only way. Perfcollect greatly eased the process by combining events and cpu samples into a single .trace.zip file. Later, dotnet-trace became a thing making perfcollect almost abandoned (at least that's my feeling). While for the events dotnet-trace is much more convenient than lttng, it was never intended to match capabilities of perf. Thus today when we need both cpu sampling and events we deal with two separate artifacts: a perfcollect output and a nettrace file.

This PR is a quality of life change that allows the nettrace file to be packaged in .trace.zip, alongside perf data. A nice side-effect is we can zip .nettrace file which is important for sharing long sessions. The PR also modifies perfcollect to be able to use dotnet-trace under the hood, however this part is not important for my particular usecase as we run perf and zip directly in our troubleshooting code.

If modifying perfcollect is not something you'd like to support, we could just merge the change in PerfViewData.cs that tries to read .nettrace from .trace.zip: it's small and beneficial on its own. Wdyt?

Mentioning @brianrob as the last reviewer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants