RX7900 (gfx11) Cards Fan Control is Not Functional #140

Open
PorcelainMouse opened this issue Apr 21, 2023 · 10 comments

Comments

@PorcelainMouse

I don't actually think this is a problem with the gpu-utils code, but I'm not sure. On my 7900 card, gpu-ls and gpu-mon seem to work fine, but gpu-pac doesn't detect any writable cards, even with the feature mask set correctly. Also, I cannot manually change the /sys/.../pwm1_enable value by writing to it; the write succeeds, but the value is unchanged.
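The write-then-read-back check described above can be sketched as a small helper. This is only an illustration, not gpu-utils code: `write_and_verify` is a hypothetical name, and the path is passed in by the caller because the full sysfs path depends on the card and hwmon instance. It assumes the file accepts a plain integer written as text, which is how hwmon files such as pwm1_enable generally behave.

```python
from pathlib import Path


def write_and_verify(path, value):
    """Write `value` to a sysfs-style file, read it back, and report the result.

    Hypothetical helper: `path` is supplied by the caller, for example the
    pwm1_enable file of the card under test.
    """
    target = Path(path)
    before = target.read_text().strip()   # value before the write
    target.write_text(f"{value}\n")       # may succeed without taking effect
    after = target.read_text().strip()    # value the driver actually reports
    return {
        "before": before,
        "requested": str(value),
        "after": after,
        "applied": after == str(value),
    }
```

Run as root against the card's pwm1_enable file, a result with `applied` set to False would match the symptom reported here: the write raises no error, yet the value read back is unchanged.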

I see lots of complaints about this going all the way back to the launch date, and I think it's weird that it is still broken after all this time. Fan control seems like a core hardware capability, not just a nice-to-have software feature. The /sys interface has been really stable for a long time, and all those bits seem to be there for this card, so it seems very odd that it isn't functional.

@Ricks-Lab
Owner

Can you provide the output of gpu-ls --raw?

@PorcelainMouse
Author

Thank you! Yes, but later. So sorry. Card was horribly unstable under load; I'm very worried. I'm hoping it's just too hot due to lack of fan, so I had to remove the card. Will get another chance to test next weekend.

@Ricks-Lab
Owner

No worries. I found plenty to work on from the results for my RX6600.

@PorcelainMouse
Author

Do you want that with or without feature mask set as described by gpu-pac?

@Ricks-Lab
Owner

I think it needs to be with the feature mask set. I need the information after this header in the raw output:

### File: pp_od_clk_voltage, SensorKey: pp_od_clk_voltage, Label: read/write driver file
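For anyone who wants to pull just that section out of a saved raw dump, here is a minimal sketch. It assumes every section of the raw output starts with a line beginning `### File:` in the same shape as the header above; that layout is inferred from this one header, not from a documented format. `extract_section` is a hypothetical helper, not part of gpu-utils.

```python
def extract_section(raw_text, file_name):
    """Return the lines that follow the '### File: <file_name>,' header.

    Assumption: each section begins with a '### File:' header line, so the
    section ends at the next such header or at end of input.
    Returns None if the header is never found, '' if the section is empty.
    """
    header_prefix = f"### File: {file_name},"
    collected = []
    found = False
    in_section = False
    for line in raw_text.splitlines():
        if line.startswith("### File:"):
            if in_section:
                break  # reached the next section, stop collecting
            in_section = line.startswith(header_prefix)
            found = found or in_section
            continue
        if in_section:
            collected.append(line)
    if not found:
        return None
    return "\n".join(collected).strip()
```

Calling `extract_section(text, "pp_od_clk_voltage")` on the saved output distinguishes a missing header (None) from a header with nothing under it (an empty string).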

@Ricks-Lab
Owner

I have been making changes to better support GPUs with a Voltage Offset setting. I know this is available in RX66xx. The requested output above should confirm whether RX77xx has this capability also. Incompletely tested code can be found in the new branch: pp_feature_refine

@PorcelainMouse
Author

OMG, finally got my card back from RMA...no change, symptoms exactly as before. I don't know what I'm going to do.

But, I got the data you requested. I really hope this helps.
gpu-ls-raw-mask.txt

Since last we spoke, there's been loads of new kernels, and even a new OpenCL lib for my distro that was supposed to make things better. But, everything looks the same, AFAICT.

@Ricks-Lab
Owner

The pp_od_clk_voltage file appears empty. This could indicate that the driver doesn't support overclocking. Are you running the latest version from the repository? I suggest cloning the latest from GitHub and explicitly running that version. I did make some changes to support the latest GPUs, but I still don't have an RX7900 to try. What other issues do you have that caused you to RMA it?

@PorcelainMouse
Author

You mean the latest AMDGPU? I'm not quite sure what you mean. I'm using the latest available for my distribution, which uses a very recent kernel, more recent than most distros. The only way I know to get a different one is to build it myself from Torvalds' upstream. I've compiled many kernels back in the day, but it's been about 20 years since then; I don't relish the idea. I assume I can build it, but then I have to hack my distro's kernel management, which is yet another hurdle. I'm sure I could figure it out, but I'm avoiding it.

I see some version information, but the internal versions are different from the version numbers AMD reports, so it's hard to know how to match them up. When I first reported this issue, the internal version was 58.86.0; now it's 78.75.0, FWIW. Sure, there is clearly a newer version of MESA, although I don't think that is related. But, AFAIK, the newest AMDGPU is in Torvalds' copy and I'm relatively close to that. I cannot install AMDGPU-PRO, not that I want to, but it's not even an option.

My system is very unstable. It has frequent crashes while crunching with OpenCL. However, this is not unfamiliar, as I had similar behavior with the 5700XT before I got fan control working, and with the 6800XT when I forget to start fan control. If edge temps exceed 52 deg, I see increased computation errors, and at higher temps I get crashes. So this is consistent with prior experience, and I currently don't have any evidence that this is a different issue related specifically to the RDNA3 arch, the 7900 series hardware, or software. Perhaps the situation with the 7900xtx is worse; crashes seem to happen in 20-60 minutes, whereas my 5700xt only crashed that frequently after several months of running hot without fan control.
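A rough sketch of checking the edge temperature against the 52 degree threshold mentioned above. It assumes the reading comes from an hwmon-style temperature file that reports millidegrees Celsius (the usual convention for `temp*_input` files). The path is left to the caller, and both function names are hypothetical.

```python
from pathlib import Path


def read_temp_celsius(temp_input_path):
    """Read an hwmon-style temperature file (millidegrees C) as degrees C."""
    raw = Path(temp_input_path).read_text().strip()
    return int(raw) / 1000.0


def is_too_hot(temp_input_path, limit_c=52.0):
    """True if the reading exceeds limit_c; 52.0 is the figure from this thread."""
    return read_temp_celsius(temp_input_path) > limit_c
```

Polling `is_too_hot` during an OpenCL run and logging the timestamps would make it easier to tell whether the crashes line up with temperature or happen regardless.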

@PorcelainMouse
Author

Actually, I take that back. There is a newer version of x11-amdgpu, but I'm using Wayland. MESA is current (23.1).

2 participants