Perfevent configuration error for AMD chips #1759

Open
Osmanyasal opened this issue May 5, 2023 · 10 comments

@Osmanyasal

Hello,
We're using Performance Co-Pilot as a tool in one of our projects, but we're having some issues monitoring AMD PMU events.
I'd be very grateful if you could spare some time to help with this issue.

We list all the available PMUs on our machines and generally use "hardware-specific" PMUs to monitor some predefined events. We update the /var/lib/pcp/pmdas/perfevent/perfevent.conf file and reinstall the agent with our configuration. This works well for Intel CPUs but not for AMD CPUs. Here are the details.

This is the perfevent.conf file on our Intel machine; when we install it, it works.
image
This is one of our AMD machines. As you can see, it gives errors.
image
With PCP, we list all available PMUs along with their available events.

This is for Intel:
image
And this is for AMD:

image
PS: kernel.perf_event_paranoid is -1 on all of my machines.

I can monitor [perf] events successfully on both computers, but those are not what we're interested in.
My guess is that the AMD PMU names are not recognized by PCP, but we couldn't fix it.

Thanks in advance.
Osman

@natoscott
Member

@jpwhite4 @hkshaw1990 any clues for our friend here?

@Osmanyasal
Author

Hello again. We have been trying on different AMD machines with different architectures, but still no luck. Do you have any updates regarding this issue?

@natoscott
Member

@Osmanyasal seems like not - if you could provide a remote login to such a system, I could take a quick look for you.

@Osmanyasal
Author

Osmanyasal commented May 15, 2023

I don't think I can, because they're our school's computers.
If you can describe a starting point for us, we can check it to understand what's wrong.
@FatihTasyaran

@natoscott
Member

@Osmanyasal I was able to get a reservation on an AMD machine today.

You'll find the list of supported names for your platform gets reported by the PCP perfevent agent in the file /var/log/pcp/pmcd/perfevent.log once it's been ./Install'd for the first time.

You should be able to find the events you're interested in there and add them to a new section of perfevent.conf for your processor family (in /var/lib/pcp/pmdas/perfevent). I had no problems doing so with latest PCP code, so hopefully this is enough to get you started too.
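
The first step above, finding candidate names in the agent's log, can be sketched in Python. This is only an illustration: the helper names (`pmu_names_in_log`, `read_log`) are hypothetical, and the code assumes nothing about the log's line format beyond the names appearing somewhere as plain text.

```python
import re
from pathlib import Path
from typing import List

# Path taken from the comment above; adjust if your installation logs elsewhere.
PERFEVENT_LOG = Path("/var/log/pcp/pmcd/perfevent.log")


def pmu_names_in_log(log_text: str, prefix: str = "amd64_") -> List[str]:
    """Return the distinct identifier-like tokens in log_text that start with prefix.

    No particular line format is assumed; the function only looks for tokens
    such as "amd64_fam17h" anywhere in the text.
    """
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*")
    return sorted(set(pattern.findall(log_text)))


def read_log(path: Path = PERFEVENT_LOG) -> str:
    """Read the log, returning an empty string if it is missing or unreadable."""
    try:
        return path.read_text(errors="replace")
    except OSError:
        return ""
```

Calling `pmu_names_in_log(read_log())` on the AMD machine would then list whatever `amd64_`-prefixed names the agent logged, which are the candidates for a new section in perfevent.conf.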

cheers.

@Osmanyasal
Author

That's great, it works now.
But there is one issue: we use showevtinfo (program in pcp) to report PMU names along with their corresponding events, and this tool reports the PMU name as "amd64_fam17h_zen2 (AMD64 Fam17h Zen2)", so we took the first part as our PMU name. The log file, however, lists only amd64_fam17h (without _zen2).
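
One way to handle this mismatch in a script is to fall back from the libpfm4 name to the closest name the log actually lists. The sketch below is an assumption about how such tooling might work, not something PCP provides: `closest_supported_pmu` is a hypothetical helper that drops trailing `_suffix` parts until it finds a supported name.

```python
from typing import Iterable, Optional


def closest_supported_pmu(candidate: str, supported: Iterable[str]) -> Optional[str]:
    """Return `candidate` if it is supported, otherwise the longest supported
    name obtained by dropping trailing "_suffix" parts; None if nothing matches."""
    supported_set = set(supported)
    name = candidate
    while name:
        if name in supported_set:
            return name
        if "_" not in name:
            break
        # Strip the last "_part", e.g. "amd64_fam17h_zen2" -> "amd64_fam17h".
        name = name.rsplit("_", 1)[0]
    return None
```

With the names from this thread, `closest_supported_pmu("amd64_fam17h_zen2", ["perf", "amd64_fam17h"])` returns `"amd64_fam17h"`.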

@Osmanyasal
Author

Osmanyasal commented May 18, 2023

However, when I checked the log file at /var/log/pcp/pmcd/perfevent.log on our Zen 3 machine, it lists only perf:: events. There are no other PMUs such as amd64_fam19h. But showevtinfo says the machine supports amd64_fam19h, with many related events.
I installed PCP version 6.0.4-1; what could be the issue here?

@natoscott
Member

| [...] showevtinfo (program in pcp)

This isn't a program from PCP, so I don't know what it is listing. The PMDA logfile is the one source of truth for PCP; those are all the hardware events that the kernel tells us about.

| [...] what could be the issue here?

The only other possible thing that might be involved would be a security system like SELinux - it might be preventing events being visible from a daemon (like pmdaperfevent) that are visible in a less restricted context like an interactive shell.

Either way, I don't think there's a PCP issue here (we regularly test with SELinux here at Red Hat and there are no known issues).
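
For comparing what an interactive shell sees with what the daemon logs, one check that is independent of both PCP and libpfm4 is to list the PMUs the kernel registers in sysfs under /sys/bus/event_source/devices. The sketch below is a minimal illustration; `kernel_pmus` is a hypothetical helper, and the names it returns are the kernel's own (for example `cpu`), which need not match libpfm4 names such as amd64_fam17h.

```python
import os
from typing import List

# Standard Linux sysfs location for registered perf event sources.
SYSFS_PMU_DIR = "/sys/bus/event_source/devices"


def kernel_pmus(sysfs_dir: str = SYSFS_PMU_DIR) -> List[str]:
    """Return the sorted PMU names found in sysfs_dir, or [] if it cannot be read."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        return []
```

Running it once from a shell and once in the daemon's context would show whether the two see the same set.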

@Osmanyasal
Author

Sorry for my misleading previous entry.
showevtinfo is a demo program provided by libpfm4 that lists all available PMUs and their related events on the system.

Since PCP uses libpfm4 for PMU event monitoring (as far as I know), I expect anything reported by libpfm4 to be valid for PCP as well, which it is.

I set kernel.perf_event_paranoid to -1 to see and report PMU events. All of this works on Zen 2 but not on our Zen 3 machine: PCP doesn't display any PMUs other than perf there, and I can't see why.
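
To rule out a difference in this setting between the Zen 2 and Zen 3 machines, the current value can be read from /proc/sys/kernel/perf_event_paranoid, which backs the kernel.perf_event_paranoid sysctl. A minimal sketch, with `read_perf_event_paranoid` as a hypothetical helper name:

```python
from typing import Optional

# procfs file behind the kernel.perf_event_paranoid sysctl on Linux.
PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"


def read_perf_event_paranoid(path: str = PARANOID_PATH) -> Optional[int]:
    """Return the current setting as an int, or None if it cannot be read or parsed."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None
```

A return value of -1 on both machines would suggest the setting itself is not what differs.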

Could you elaborate on this phrase?
"The only other possible thing that might be involved would be a security system like SELinux - it might be preventing events being visible from a daemon (like pmdaperfevent) that are visible in a less restricted context like an interactive shell."

Any breadcrumbs would be appreciated.
Thank you in advance,
Osman.

@natoscott
Member

If it were an SELinux issue (unlikely), you would see lots of AVC errors in your syslog file when pmdaperfevent attempts access via the kernel interface.
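
A quick way to look for such messages is to filter log lines for AVC denials that mention the agent. The sketch below assumes the usual SELinux message shape, where a denial line contains "avc", "denied", and the process name; `avc_denials` is a hypothetical helper, and which file holds these lines depends on the system (on many systems with auditd running it is /var/log/audit/audit.log rather than syslog).

```python
from typing import Iterable, List


def avc_denials(lines: Iterable[str], process: str = "pmdaperfevent") -> List[str]:
    """Return the lines that look like SELinux AVC denials mentioning `process`."""
    needle = process.lower()
    hits = []
    for line in lines:
        lowered = line.lower()
        # A denial line contains both "avc" and "denied"; also require the process name.
        if "avc" in lowered and "denied" in lowered and needle in lowered:
            hits.append(line.rstrip("\n"))
    return hits
```

For example, `avc_denials(open("/var/log/messages"))` would return an empty list when no denials for pmdaperfevent were logged there, assuming that file exists and is readable.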
