Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SB3: system total power limit at 25 watts, including display backlight #117

Open
yatli opened this issue Feb 18, 2022 · 55 comments
Open

SB3: system total power limit at 25 watts, including display backlight #117

yatli opened this issue Feb 18, 2022 · 55 comments
Labels
D: Surface Book 3 Device: Surface Book 3

Comments

@yatli
Copy link

yatli commented Feb 18, 2022

This is observed by using s-tui to continuously monitor the power consumption, core frequency and utilization while compiling the kernel.

Running a "cold start", the system power peaks at 40 watts, and then drops to 25 watts (could be a thermal trip point, or a "long time power limit"?), where the CPU package draws 16 watts, resulting in all-core frequency at 1.6GHz. Core temperature stabilizes at 71C.

Curiously, I lowered the screen brightness to 5%, and the CPU package starts to draw more power (20+ watts) and all-core frequency goes up to 2.0GHz. Core temperature stabilizes at 78C.

So this does not solely look like a thermal throttle -- but where does that 25W limit come from?

@qzed
Copy link
Member

qzed commented Feb 18, 2022

Could be some total system limit for sustained power imposed by firmware. Did you try changing the performance mode?

@yatli
Copy link
Author

yatli commented Feb 18, 2022

I don't have "performance mode" -- there's no /sys/bus/surface_aggregator/devices/01:03:01:00:01/perf_mode.
I did try all the profiles -- no difference it seems.

@qzed
Copy link
Member

qzed commented Feb 18, 2022

Yeah sorry, meant platform profiles (those and performance modes are the same).

@yatli
Copy link
Author

yatli commented Feb 18, 2022

Edit: see below.

After some more experiment with a stress tool I find it's thermal-related and not bound to a sustained loaded time. The peak power could sustain for 2 seconds, 5 seconds and so on.

And the power limit kicks in when the core goes over 90C degrees.
I'll see if I can work out a thermald configuration for this model.

@yatli
Copy link
Author

yatli commented Feb 18, 2022

Just feel this is funny that I should share the story with you guys :D

@yatli yatli closed this as completed Feb 18, 2022
@yatli yatli reopened this Feb 18, 2022
@yatli
Copy link
Author

yatli commented Feb 18, 2022

Have to reopen this because thermald/rapl is not effective against this problem. It's like a power budget, not thermal throttle -- the same 25 watts can well push the core to 80C (dimmed screen) but a thermal limit at 70C (like ~20 watts in total) already triggers the power limit!

@yatli
Copy link
Author

yatli commented Feb 18, 2022

btw, the fan is not as fast (loud) as in Windows. A profile configuration issue?

@yatli
Copy link
Author

yatli commented Feb 18, 2022

So, is there a way I can intercept SAM messages under windows?

@qzed
Copy link
Member

qzed commented Feb 18, 2022

btw, the fan is not as fast (loud) as in Windows. A profile configuration issue?

AFAIK the fan is entirely controlled by SAM, and the only influence we or Windows can have on that is the platform profile/performance mode. Try setting that to "best performance".

So, is there a way I can intercept SAM messages under windows?

Yes, although it requires a bit of configuration and the dumps are not easily readable (you're of course welcome to write a better parser for that). There's an overview on how to do that here: https://github.com/linux-surface/surface-aggregator-module/wiki/Development. A script for transforming the output into json files is here. Documentation on the commands we know so far is here. An overall documentation on the protocol can be found at https://www.kernel.org/doc/html/latest/driver-api/surface_aggregator/ssh.html.

@yatli
Copy link
Author

yatli commented Feb 18, 2022

Here are the two related entries when I toggle between Better Performance vs. Best Performance:

{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 42}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 83, "rqid_hi": 3, "cid": 3}, "payload": [3, 0, 0, 0]}
{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 46}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 87, "rqid_hi": 3, "cid": 3}, "payload": [4, 0, 0, 0]}

...which matches the documented values.
le32 though?

@yatli
Copy link
Author

yatli commented Feb 18, 2022

Just checked the module. le32 indeed no problem.

@qzed
Copy link
Member

qzed commented Feb 18, 2022

Yeah, le32 is correct.

@qzed
Copy link
Member

qzed commented Feb 18, 2022

All multi-byte values in the protocol are little-endian.

@yatli
Copy link
Author

yatli commented Feb 18, 2022

Let me check if profile get can return the correct answer. If so, the problem may be somewhere else?

@qzed
Copy link
Member

qzed commented Feb 18, 2022

I'm fairly sure that getting/setting profiles is working correctly. There may be other commands for the EC that I've missed as I've mostly developed this on the SB2 and through reverse engineering of the Windows drivers. But it's also possible that there are other factors in play besides the EC.

@yatli
Copy link
Author

yatli commented Feb 18, 2022

Yeah, I just checked. Performance profile readback is correct.
The capture log is a huge stream of category 21(0x15) plus these two 0x03, and a few 0x01.
Maybe I should also check the 0x01 ones.
(Rebooting)

@yatli
Copy link
Author

yatli commented Feb 18, 2022

oh and this: warning: expected SYN, skipping data until next SYN

@qzed
Copy link
Member

qzed commented Feb 18, 2022

Could be a bug in the parser or missing/corrupted data. Essentially this says that it expected the start of a message but got something else, so it's looking for the next start of a message.

@yatli
Copy link
Author

yatli commented Feb 18, 2022

With only cat 21 filtered:

{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 42}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 83, "rqid_hi": 3, "cid": 3}, "payload": [3, 0, 0, 0]}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 42}}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 43}}
{"ctrl": {"type": 128, "len": 8, "pad": 0, "seq": 44}, "cmd": {"type": 128, "tc": 1, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 85, "rqid_hi": 3, "cid": 51}, "payload": []}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 44}}
{"ctrl": {"type": 128, "len": 8, "pad": 0, "seq": 45}, "cmd": {"type": 128, "tc": 1, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 86, "rqid_hi": 3, "cid": 52}, "payload": []}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 45}}
{"ctrl": {"type": 128, "len": 9, "pad": 0, "seq": 226}, "cmd": {"type": 128, "tc": 1, "outgoing": 0, "incoming": 1, "iid": 0, "rqid_lo": 86, "rqid_hi": 3, "cid": 52}, "payload": [0]}
{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 46}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 87, "rqid_hi": 3, "cid": 3}, "payload": [4, 0, 0, 0]}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 46}}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 47}}

Message 43 and 47 are missing. Intriguing!

@yatli
Copy link
Author

yatli commented Feb 18, 2022

The bug is caused probably because IRPComp piggyback'ed another HID message

This is message 42:

ID = 103109
Time = 2/18/2022 7:02:29 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002959B4AA0
File object = 0xFFFF9002B1EF5DF0
IRP flags = 0x0
Length = L: 22
Key = K: 0x0
Offset = O: 0x0
Thread ID = 10608
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 22
Data (Hexer)
  00:	aa 55 80 0c 00 2a b1 a9 80 03 01 00 00 53 03 03	ªU?..*±©?....S..
  10:	03 00 00 00 1c b0                              	.....°          

And a few lines down, there's 0x2b, message 43:

ID = 103113
Time = 2/18/2022 7:02:29 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002B85D5A20
File object = 0xFFFF9002B1EF5DF0
IRP flags = 0x0
Length = L: 19
Key = K: 0x0
Offset = O: 0x0
Thread ID = 10608
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 19
Data (Hexer)
  00:	aa 55 80 09 00 2b 60 52 80 17 02 00 01 54 03 01	ªU?..+`R?....T..
  10:	00 75 42                                       	.uB             

Also, message 47 pairs with 46:

ID = 103481
Time = 2/18/2022 7:02:31 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002B12C24A0
File object = 0xFFFF9002B93E05F0
IRP flags = 0x0
Length = L: 19
Key = K: 0x0
Offset = O: 0x0
Thread ID = 13296
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 19
Data (Hexer)
  00:	aa 55 80 09 00 2f e4 12 80 17 02 00 01 58 03 01	ªU?../ä.?....X..
  10:	00 47 0d                                       	.G.             

0x17 BKL

🤔

@qzed
Copy link
Member

qzed commented Feb 18, 2022

Yeah, unfortunately the json converter script isn't very robust. I had the plan to write a better decoder at some point but never found the time for it...

@yatli
Copy link
Author

yatli commented Feb 18, 2022

Any ideas on the BKL message? It could be a random coincidence but intriguing :p

@qzed
Copy link
Member

qzed commented Feb 18, 2022

BKL could mean backlight maybe? Just a guess though.

@qzed
Copy link
Member

qzed commented Feb 18, 2022

Never seen that one before

@yatli
Copy link
Author

yatli commented Feb 18, 2022

Makes sense, could be backlight related stuff since I'm adjusting power profiles.
I'm wondering if the module handles the piggybacks and incomplete frames?

@qzed
Copy link
Member

qzed commented Feb 18, 2022

Not sure what you mean by piggybacks. It's an exclusive UART channel on the driver side (IRPMon might pick up other stuff that's not sent via that channel). If we receive an incomplete or invalid frame. the driver rejects it (by sending the appropriate packet with type 0x04) and skips to the next SYN bytes (aa 55). The EC does the same on its side.

The communication above is from host to EC though, so it's host initiated. We probably won't see any of the TC=0x17 messages unless we either send them ourselves or we set up the appropriate events (in case that subsystem can actually send events).

@qzed
Copy link
Member

qzed commented Feb 18, 2022

If you want to play around, you can load the debug module on linux (modprobe surface_aggregator_cdev) and use the ctrl.py script to talk to the EC. So you could send the messages from your log.

@yatli
Copy link
Author

yatli commented Feb 18, 2022

aha. pretend we are 0x17 experts. 😎

@qzed
Copy link
Member

qzed commented Feb 18, 2022

SurfaceOemPanel.dll has some SAM related strings. So could really have something to do with the backlight.

@yatli
Copy link
Author

yatli commented Feb 18, 2022

SAM waiting for CMD_APPLY_PROFILE_BKL ?
I don't know, need to talk to it.
Hopefully there are other SB3 users around who can confirm the 25W limit (and not a stupid setup issue on my side)

@qzed
Copy link
Member

qzed commented Feb 18, 2022

Maybe... Could definitely be some sort of callback like "power stuff has changed, run some updates please". The OEM panel driver references a PowerSettingRegisterNotification function. Any chance the command is being sent when you change the backlight?

Also I just noticed another thing... The parser doesn't seem to handle Target IDs (or at least doesn't include them in the output). And the commands above have TID=0x02 instead of the more common TID=0x01. scratch that, I'm stupid and that's outgoing/incoming.

@yatli
Copy link
Author

yatli commented Feb 20, 2022

Any chance the command is being sent when you change the backlight?

I'll check this one when I boot back to Windows.

Currently playing with /dev/surface/aggregator :p

@yatli
Copy link
Author

yatli commented Feb 20, 2022

yatli@YATAO-SB ..surface-aggregator-module/scripts/ssam (git)-[master] % cat test.sh
#!/bin/bash

sudo surface profile set performance
sudo ./ctrl.py request 0x17 0x02 0x01 0x01 0x01 0x00
yatli@YATAO-SB ..surface-aggregator-module/scripts/ssam (git)-[master] % ./test.sh
Platform profile set to 'performance'
Traceback (most recent call last):
  File "/home/yatli/git/surface-aggregator-module/scripts/ssam/./ctrl.py", line 49, in <module>
    main()
  File "/home/yatli/git/surface-aggregator-module/scripts/ssam/./ctrl.py", line 43, in main
    rsp = ctrl.request(rqst)
  File "/home/yatli/git/surface-aggregator-module/scripts/ssam/libssam.py", line 355, in request
    return _request(self.fd, request)
  File "/home/yatli/git/surface-aggregator-module/scripts/ssam/libssam.py", line 263, in _request
    raise OSError(-raw.status, errno.errorcode.get(-raw.status))
TimeoutError: [Errno 110] ETIMEDOUT

Hmm.. Guess it's a "no thanks"?

@qzed
Copy link
Member

qzed commented Feb 20, 2022

That's not necessarily an error, it just times out waiting for a response. Based on your json log above it's likely that the command doesn't have any response (besides the standard ACK frame). Can you try ./ctrl.py request 0x17 0x02 0x01 0x01 0x00 0x00? The second-to-last byte essentially tells the driver whether it should expect a response (0x01) or not (0x00).

@yatli
Copy link
Author

yatli commented Feb 20, 2022

ooh I thought it wasn't even ACK'ed. Thanks for the tip!


Unfortunately it seems the profiles are not loading correctly still.
Rebooting for more dumps;

@yatli
Copy link
Author

yatli commented Feb 20, 2022

FYI jq complements the parser script pretty well:

 ┌─┐ C:\Users\yatli\git\surface-aggregator-module\scripts\irpmon                                                                                                                    master ◇ ┆ yatli
 └─╜ ╰▷ python irpmon_to_json.py dump5.log | jq -c '(. | keys[]) as $i | [$i, .[$i]] | select(.[0] >= 401) | select(.[0] <= 513) | select(.[1].cmd.tc != 21)'
warning: expected SYN, skipping data until next SYN
data dropped: 0x2 0x0 0x1 0x39 0x20 0x1 0x0 0xb7 0xfd 0x0 0x0 0x0 0x0 0xd5 0x83 0x1 0x1 0x7e 0x57
[401,{"ctrl":{"type":128,"len":12,"pad":0,"seq":15},"cmd":{"type":128,"tc":3,"outgoing":1,"incoming":0,"iid":0,"rqid_lo":56,"rqid_hi":32,"cid":3},"payload":[3,0,0,0],"time":"2/20/2022 4:38:36 PM"}]
[402,{"ctrl":{"type":64,"len":0,"pad":0,"seq":15}}]
[404,{"ctrl":{"type":64,"len":0,"pad":0,"seq":16}}]
[508,{"ctrl":{"type":128,"len":12,"pad":0,"seq":17},"cmd":{"type":128,"tc":3,"outgoing":1,"incoming":0,"iid":0,"rqid_lo":58,"rqid_hi":32,"cid":3},"payload":[4,0,0,0],"time":"2/20/2022 4:38:37 PM"}]
[510,{"ctrl":{"type":64,"len":0,"pad":0,"seq":17}}]
[511,{"ctrl":{"type":128,"len":9,"pad":0,"seq":18},"cmd":{"type":128,"tc":23,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":59,"rqid_hi":32,"cid":1},"payload":[0],"time":"2/20/2022 4:38:37 PM"}]
[512,{"ctrl":{"type":64,"len":0,"pad":0,"seq":18}}]

Again, one tc=0x17,cid=1 per tc=3,cid=3.

@yatli
Copy link
Author

yatli commented Feb 20, 2022

Okay another interesting finding!

  • On Linux, when the power limit kicks in, core temperature is around 71C.
  • On Windows, there's no power limit whether the power profile is set to Recommended, Better Performance or Best Performance, core temperature goes up to 99C -- that's when the core thermal throttle is activated.
  • On Windows, if I disconnect power, the performance is limited. Core temperature is around 71C.

@qzed
Copy link
Member

qzed commented Feb 20, 2022

So could be battery/charger related.

I wonder if it's only pulling power directly from the battery instead of battery and charger.

@yatli
Copy link
Author

yatli commented Feb 20, 2022

Could you help take a look at this, I've hand annotated some logs when I pull the plug then plug back:

// incoming: power adapter [2] status changed.
{"ctrl":{"type":128,"len":8,"pad":0,"seq":16},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":2,"iid":1,"rqid_lo":2,"rqid_hi":0,"cid":23},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":21}
// ack
{"ctrl":{"type":64,"len":0,"pad":0,"seq":16},"time":"2/20/2022 6:15:30 PM","entry":24}
// req #176: ACPI/PSRC[2] (platform power source)
{"ctrl":{"type":128,"len":8,"pad":0,"seq":135},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":176,"rqid_hi":63,"cid":13},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":24}
// ack #176
{"ctrl":{"type":64,"len":0,"pad":0,"seq":135},"time":"2/20/2022 6:15:30 PM","entry":29}
// rsp #176: ACPI/PSRC[2]=[0,0,0,0]
{"ctrl":{"type":128,"len":12,"pad":0,"seq":17},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":2,"iid":1,"rqid_lo":176,"rqid_hi":63,"cid":13},"payload":[0,0,0,0],"time":"2/20/2022 6:15:30 PM","entry":29}
// ack #176
{"ctrl":{"type":64,"len":0,"pad":0,"seq":17},"time":"2/20/2022 6:15:30 PM","entry":33}
// req #177: ACPI/_BTP[2]=[0,0,0,0] (battery trip points)
{"ctrl":{"type":128,"len":12,"pad":0,"seq":136},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":177,"rqid_hi":63,"cid":4},"payload":[0,0,0,0],"time":"2/20/2022 6:15:30 PM","entry":33}
// ack #177
{"ctrl":{"type":64,"len":0,"pad":0,"seq":136},"time":"2/20/2022 6:15:30 PM","entry":37}
// req #178: ACPI/_BTP[2]=[226,228,0,0]
{"ctrl":{"type":128,"len":12,"pad":0,"seq":137},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":178,"rqid_hi":63,"cid":4},"payload":[226,228,0,0],"time":"2/20/2022 6:15:30 PM","entry":37}
// incoming: power adapter [1] status changed.
{"ctrl":{"type":128,"len":8,"pad":0,"seq":18},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":2,"rqid_hi":0,"cid":23},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":39}
// ack #178
{"ctrl":{"type":64,"len":0,"pad":0,"seq":137},"time":"2/20/2022 6:15:30 PM","entry":41}
// ack power adapter status changed
{"ctrl":{"type":64,"len":0,"pad":0,"seq":18},"time":"2/20/2022 6:15:30 PM","entry":43}
// req #179: ACPI/_BST[1] (battery present status)
{"ctrl":{"type":128,"len":8,"pad":0,"seq":138},"cmd":{"type":128,"tc":2,"outgoing":1,"incoming":0,"iid":1,"rqid_lo":179,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":43}
// ack #179
{"ctrl":{"type":64,"len":0,"pad":0,"seq":138},"time":"2/20/2022 6:15:30 PM","entry":45}
// incoming: TpUpdate(?)
{"ctrl":{"type":128,"len":29,"pad":0,"seq":19},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":2,"rqid_hi":0,"cid":83},"payload":[1,54,30,1,0,42,168,0,0,170,0,74,33,115,54,0,0,228,37,0,0],"time":"2/20/2022 6:15:30 PM","entry":45}
// ack 139 (missing!)
{"ctrl":{"type":64,"len":0,"pad":0,"seq":139},"time":"2/20/2022 6:15:30 PM","entry":51}
// ack TpUpdate(?)
{"ctrl":{"type":64,"len":0,"pad":0,"seq":19},"time":"2/20/2022 6:15:30 PM","entry":53}
// req #181: ACPI/_BST[2]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":140},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":181,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":53}
// ack #181
{"ctrl":{"type":64,"len":0,"pad":0,"seq":140},"time":"2/20/2022 6:15:30 PM","entry":55}
// rsp #179: ACPI/_BST[1] = [1,0,0,0,8,0,0,0,248,82,0,0,74,33,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":20},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":179,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,8,0,0,0,248,82,0,0,74,33,0,0],"time":"2/20/2022 6:15:30 PM","entry":55}
// ack #179
{"ctrl":{"type":64,"len":0,"pad":0,"seq":20},"time":"2/20/2022 6:15:30 PM","entry":59}
// rsp #180: ACPI/PSRC[1] = [0,0,0,0] (rqid 180, request missing!)
{"ctrl":{"type":128,"len":12,"pad":0,"seq":21},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":180,"rqid_hi":63,"cid":13},"payload":[0,0,0,0],"time":"2/20/2022 6:15:30 PM","entry":59}
// ack #180
{"ctrl":{"type":64,"len":0,"pad":0,"seq":21},"time":"2/20/2022 6:15:30 PM","entry":62}
// req #182: ACPI/_BTP[1]=[0,0,0,0]
{"ctrl":{"type":128,"len":12,"pad":0,"seq":141},"cmd":{"type":128,"tc":2,"outgoing":1,"incoming":0,"iid":1,"rqid_lo":182,"rqid_hi":63,"cid":4},"payload":[0,0,0,0],"time":"2/20/2022 6:15:30 PM","entry":62}
// ack #182
{"ctrl":{"type":64,"len":0,"pad":0,"seq":141},"time":"2/20/2022 6:15:30 PM","entry":65}
// rsp #181: ACPI/_BST[2]=[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":22},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":2,"iid":1,"rqid_lo":181,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0],"time":"2/20/2022 6:15:30 PM","entry":65}
// ack #181
{"ctrl":{"type":64,"len":0,"pad":0,"seq":22},"time":"2/20/2022 6:15:30 PM","entry":69}
// req #183: ACPI/_BST[2]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":142},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":183,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":69}
// ack #183
{"ctrl":{"type":64,"len":0,"pad":0,"seq":142},"time":"2/20/2022 6:15:30 PM","entry":71}
// rsp #183: ACPI/_BST[2]=[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":23},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":2,"iid":1,"rqid_lo":183,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0],"time":"2/20/2022 6:15:30 PM","entry":71}
// ack #183
{"ctrl":{"type":64,"len":0,"pad":0,"seq":23},"time":"2/20/2022 6:15:30 PM","entry":74}
// req #184: ACPI/_BTP[1] = [140,82,0,0]
{"ctrl":{"type":128,"len":12,"pad":0,"seq":143},"cmd":{"type":128,"tc":2,"outgoing":1,"incoming":0,"iid":1,"rqid_lo":184,"rqid_hi":63,"cid":4},"payload":[140,82,0,0],"time":"2/20/2022 6:15:30 PM","entry":74}
// ack #184
{"ctrl":{"type":64,"len":0,"pad":0,"seq":143},"time":"2/20/2022 6:15:30 PM","entry":79}
// req #185: profile=1
{"ctrl":{"type":128,"len":12,"pad":0,"seq":144},"cmd":{"type":128,"tc":3,"outgoing":1,"incoming":0,"iid":0,"rqid_lo":185,"rqid_hi":63,"cid":3},"payload":[1,0,0,0],"time":"2/20/2022 6:15:30 PM","entry":79}
// ack #185
{"ctrl":{"type":64,"len":0,"pad":0,"seq":144},"time":"2/20/2022 6:15:30 PM","entry":83}
// req #186: ACPI/_BST[1]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":145},"cmd":{"type":128,"tc":2,"outgoing":1,"incoming":0,"iid":1,"rqid_lo":186,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":83}
// ack #186
{"ctrl":{"type":64,"len":0,"pad":0,"seq":145},"time":"2/20/2022 6:15:30 PM","entry":86}
// rsp #186: ACPI/_BST[1]=[1,0,0,0,8,0,0,0,248,82,0,0,74,33,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":24},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":186,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,8,0,0,0,248,82,0,0,74,33,0,0],"time":"2/20/2022 6:15:30 PM","entry":86}
// ack #186
{"ctrl":{"type":64,"len":0,"pad":0,"seq":24},"time":"2/20/2022 6:15:30 PM","entry":89}
// req #187: ACPI/_BST[2]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":146},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":187,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":89}
// ack #187
{"ctrl":{"type":64,"len":0,"pad":0,"seq":146},"time":"2/20/2022 6:15:30 PM","entry":92}
// rsp #187: ACPI/_BST[2]=[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":25},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":2,"iid":1,"rqid_lo":187,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0],"time":"2/20/2022 6:15:30 PM","entry":92}
// req #188: tc=0x17, cid=1, payload=[0]
{"ctrl":{"type":128,"len":9,"pad":0,"seq":147},"cmd":{"type":128,"tc":23,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":188,"rqid_hi":63,"cid":1},"payload":[0],"time":"2/20/2022 6:15:30 PM","entry":94}
// ack #187
{"ctrl":{"type":64,"len":0,"pad":0,"seq":25},"time":"2/20/2022 6:15:30 PM","entry":98}
// ack #188
{"ctrl":{"type":64,"len":0,"pad":0,"seq":147},"time":"2/20/2022 6:15:30 PM","entry":100}
// req #189: ACPI/_BST[1]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":148},"cmd":{"type":128,"tc":2,"outgoing":1,"incoming":0,"iid":1,"rqid_lo":189,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":100}
// ack #189
{"ctrl":{"type":64,"len":0,"pad":0,"seq":148},"time":"2/20/2022 6:15:30 PM","entry":103}
// rsp #189: ACPI/_BST[1]=[1,0,0,0,8,0,0,0,248,82,0,0,74,33,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":26},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":189,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,8,0,0,0,248,82,0,0,74,33,0,0],"time":"2/20/2022 6:15:30 PM","entry":103}
// ack #189
{"ctrl":{"type":64,"len":0,"pad":0,"seq":26},"time":"2/20/2022 6:15:30 PM","entry":106}
// req #190: ACPI/_BST[2]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":149},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":190,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":106}
// ack #190
{"ctrl":{"type":64,"len":0,"pad":0,"seq":149},"time":"2/20/2022 6:15:30 PM","entry":109}
// rsp #190: ACPI/_BST[2]=[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":27},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":2,"iid":1,"rqid_lo":190,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0],"time":"2/20/2022 6:15:30 PM","entry":109}
// ack #190
{"ctrl":{"type":64,"len":0,"pad":0,"seq":27},"time":"2/20/2022 6:15:30 PM","entry":112}
// req #191: profile=1
{"ctrl":{"type":128,"len":12,"pad":0,"seq":150},"cmd":{"type":128,"tc":3,"outgoing":1,"incoming":0,"iid":0,"rqid_lo":191,"rqid_hi":63,"cid":3},"payload":[1,0,0,0],"time":"2/20/2022 6:15:30 PM","entry":112}
// ack #191
{"ctrl":{"type":64,"len":0,"pad":0,"seq":150},"time":"2/20/2022 6:15:30 PM","entry":117}
// req #192: ACPI/_BST[1]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":151},"cmd":{"type":128,"tc":2,"outgoing":1,"incoming":0,"iid":1,"rqid_lo":192,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":117}
// ack #192
{"ctrl":{"type":64,"len":0,"pad":0,"seq":151},"time":"2/20/2022 6:15:30 PM","entry":120}
// rsp #192: ACPI/_BST[1] = [1,0,0,0,8,0,0,0,248,82,0,0,74,33,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":28},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":192,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,8,0,0,0,248,82,0,0,74,33,0,0],"time":"2/20/2022 6:15:30 PM","entry":120}
// ack #192
{"ctrl":{"type":64,"len":0,"pad":0,"seq":28},"time":"2/20/2022 6:15:30 PM","entry":123}
// req #193: ACPI/_BST[2]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":152},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":193,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:30 PM","entry":123}
// ack #193
{"ctrl":{"type":64,"len":0,"pad":0,"seq":152},"time":"2/20/2022 6:15:30 PM","entry":126}
// rsp #193: ACPI/_BST[2]=[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":29},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":2,"iid":1,"rqid_lo":193,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,165,4,0,0,10,230,0,0,241,49,0,0],"time":"2/20/2022 6:15:30 PM","entry":126}
// ack #193
{"ctrl":{"type":64,"len":0,"pad":0,"seq":29},"time":"2/20/2022 6:15:30 PM","entry":129}
// req #194: tc=0x17, cid=1, payload=[0]
{"ctrl":{"type":128,"len":9,"pad":0,"seq":153},"cmd":{"type":128,"tc":23,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":194,"rqid_hi":63,"cid":1},"payload":[0],"time":"2/20/2022 6:15:30 PM","entry":129}
// ack #194
{"ctrl":{"type":64,"len":0,"pad":0,"seq":153},"time":"2/20/2022 6:15:31 PM","entry":135}
// req #195: ACPI/_BST[1]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":154},"cmd":{"type":128,"tc":2,"outgoing":1,"incoming":0,"iid":1,"rqid_lo":195,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:31 PM","entry":135}
// ack #195
{"ctrl":{"type":64,"len":0,"pad":0,"seq":154},"time":"2/20/2022 6:15:31 PM","entry":139}
// rsp #195: ACPI/_BST[1]=[1,0,0,0,37,12,0,0,248,82,0,0,25,33,0,0]
{"ctrl":{"type":128,"len":24,"pad":0,"seq":30},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":195,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,37,12,0,0,248,82,0,0,25,33,0,0],"time":"2/20/2022 6:15:31 PM","entry":139}
// ack #195
{"ctrl":{"type":64,"len":0,"pad":0,"seq":30},"time":"2/20/2022 6:15:31 PM","entry":143}
// req #196: ACPI/_BST[2]
{"ctrl":{"type":128,"len":8,"pad":0,"seq":155},"cmd":{"type":128,"tc":2,"outgoing":2,"incoming":0,"iid":1,"rqid_lo":196,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:31 PM","entry":143}
// ack #196
{"ctrl":{"type":64,"len":0,"pad":0,"seq":155},"time":"2/20/2022 6:15:31 PM","entry":147}
{"ctrl":{"type":128,"len":8,"pad":0,"seq":156},"cmd":{"type":128,"tc":2,"outgoing":1,"incoming":0,"iid":1,"rqid_lo":197,"rqid_hi":63,"cid":3},"payload":[],"time":"2/20/2022 6:15:31 PM","entry":147}
{"ctrl":{"type":64,"len":0,"pad":0,"seq":156},"time":"2/20/2022 6:15:31 PM","entry":149}
{"ctrl":{"type":128,"len":24,"pad":0,"seq":31},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":1,"iid":1,"rqid_lo":197,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,37,12,0,0,248,82,0,0,25,33,0,0],"time":"2/20/2022 6:15:31 PM","entry":149}
{"ctrl":{"type":64,"len":0,"pad":0,"seq":31},"time":"2/20/2022 6:15:31 PM","entry":153}
{"ctrl":{"type":128,"len":24,"pad":0,"seq":32},"cmd":{"type":128,"tc":2,"outgoing":0,"incoming":2,"iid":1,"rqid_lo":196,"rqid_hi":63,"cid":3},"payload":[1,0,0,0,171,9,0,0,10,230,0,0,214,48,0,0],"time":"2/20/2022 6:15:31 PM","entry":153}
{"ctrl":{"type":64,"len":0,"pad":0,"seq":32},"time":"2/20/2022 6:15:31 PM","entry":157}

@qzed
Copy link
Member

qzed commented Feb 20, 2022

Looks all fairly standard to me. It gets a "power status changed" notification and then queries the power and power adapter state (_BST/PSRC) and sets some battery trip point (used for notifications when the battery reaches a certain level, _BTP).

The only thing that I don't know much about is the TpUpdate thing. That has something to do with DPTF (Intel's Dynamic Power and Thermal Framework). So it's possible (maybe even likely) that this is related to the power limit. But that's an incoming notification, so there's no way that this can directly influence the power limit. Meaning if it has something to do with it, the driver that picks that notification up does something to change the power limit without communicating back to the EC.

At the moment we're just ignoring this message on Linux, meaning we ACK it (so for all intents and purposes the communication above should be mostly equal on Windows and Linux), but we don't do anything else with it (i.e. there's no driver to pick it up).

@yatli
Copy link
Author

yatli commented Feb 20, 2022

When setting _BTP, it always first clear it: // req #182: ACPI/_BTP[1]=[0,0,0,0]
then set it in a following request: // req #184: ACPI/_BTP[1] = [140,82,0,0]

Does that match the linux driver behavior?

edit: nvm I thought it's a safety feature but it looks like it's just a programmable battery meter

Thanks for taking a look. I'm done for the night. Recharging. :)

@qzed
Copy link
Member

qzed commented Feb 20, 2022

Yeah, _BTP is just a means to get a notification when the battery level moves over or under that programmed threshold. Basically, at that point you're guaranteed to get a "battery status changed" notification (which might also be sent regardless by the EC if the battery drops a couple of percent).

@qzed
Copy link
Member

qzed commented Feb 20, 2022

I guess since everything is implemented by the EC, targeting the Windows driver, it could have some side-effects. But I think that's probably rather unlikely. I don't think Linux really sets trip points (that's a thing that userspace tools like UPower should do, I think, and I'm not sure if that behavior is consistent with Windows), so you could try to play around with that, but again, I think it's rather unlikely that this has any side-effects.

@yatli
Copy link
Author

yatli commented Mar 8, 2022

Update: I have observed the same power clamping in Windows with AC plugged in. Just need to push it a bit harder.

This can be worked around with ThrottleStop with the following turbo power limit settings:

image

... so it turbo boosts to ~40W, and drops back to ~25W (cpu only, not system total) and stabilizes.

I'll try to figure out the equivalent for linux.

@yatli
Copy link
Author

yatli commented Mar 10, 2022

This is the best I can achieve:

{
    "MSR_PKG_POWER_LIMIT": 6053258197434505,
    "first": {
        "MSR_PKG_POWER_LIMIT": 6053258197434505,
        "power_limit": 17.125,
        "power_enabled": true,
        "power_clamping_limit": true,
        "power_limit_time_window": 3.0,
        "__types.SimpleNamespace": true
    },
    "second": {
        "MSR_PKG_POWER_LIMIT": 1409384,
        "power_limit": 45.0,
        "power_enabled": true,
        "power_clamping_limit": true,
        "power_limit_time_window": 1.0,
        "__types.SimpleNamespace": true
    },
    "lock": false,
    "__types.SimpleNamespace": true
}

TL;DR 3 secs of max turbo, then power limit the cpu package to 19W.

(in Windows I can achieve 6 secs max turbo, then down to 22W, but given that the fan is not spinning as fast, let's be conservative and not melt the machine).

This allows the system to sustain a total power of 27W without dropping to 25W.

But ideally, we should fix the fan problem.

@qzed
Copy link
Member

qzed commented Mar 10, 2022

But ideally, we should fix the fan problem.

The fan is controlled by the platform profile. Did you set that to "best performance"?

@yatli
Copy link
Author

yatli commented Mar 11, 2022

Yeah I set it with surface profile set performance, but as we discussed, it seems not working.

@qzed
Copy link
Member

qzed commented Mar 11, 2022

Well, as far as I can tell there's no way to control the fans other than setting the profile, which controls the fan curve. At least on my SB2, the dGPU fan behaves similarly with the profiles. Did you monitor temperatures when comparing by any chance?

@yatli
Copy link
Author

yatli commented Mar 11, 2022

Yeah I keep s-tui around for monitoring temperature -- not as good as ThrottleStop on Windows though because I don't see hardware throttle "reasons".

On turbo max, it ramps up quickly to 100C. Then the power limit kicks in, temperature drops back to 80C.
Depending on the cooling capacity (determined by fan speed), the temperature will then either go up or down.

@yatli
Copy link
Author

yatli commented Mar 22, 2022

Update -- I'm seeing another kind of throttling.

Yesterday I put it in sleep-then-hibernate but the battery is drained to 0%.

On boot today, I see very slow screen update in GRUB, and when the system boots up, I see the cores are throttled down to 200MHz, BAT at 0%.

Interestingly, as BAT goes up (lucky it still does!), the throttling decreases. Now I'm at BAT=18% and the throttling goes up to 900MHz all core.

That reminds me of the "original theory" you had -- it doesn't think it's plugged in so it schedules power limits as if it's on battery.

@qzed qzed added the D: Surface Book 3 Device: Surface Book 3 label Apr 28, 2022
@qzed
Copy link
Member

qzed commented Apr 28, 2022

I wonder if something similar can also be observed on other non-Surface devices. Could be some Intel related thing?

qzed pushed a commit that referenced this issue Apr 20, 2023
commit f76b3a3 upstream.

XFS allows CoW on non-shared extents to combat fragmentation[1].  The old
non-shared extent could be mwrited before, its dax entry is marked dirty.

This results in a WARNing:

[   28.512349] ------------[ cut here ]------------
[   28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390
[   28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables
[   28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117
[   28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014
[   28.516307] RIP: 0010:dax_insert_entry+0x342/0x390
[   28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48
[   28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086
[   28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b
[   28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff
[   28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8
[   28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155
[   28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8
[   28.519703] FS:  00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000
[   28.520148] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0
[   28.520863] PKRU: 55555554
[   28.521043] Call Trace:
[   28.521219]  <TASK>
[   28.521368]  dax_fault_iter+0x196/0x390
[   28.521595]  dax_iomap_pte_fault+0x19b/0x3d0
[   28.521852]  __xfs_filemap_fault+0x234/0x2b0
[   28.522116]  __do_fault+0x30/0x130
[   28.522334]  do_fault+0x193/0x340
[   28.522586]  __handle_mm_fault+0x2d3/0x690
[   28.522975]  handle_mm_fault+0xe6/0x2c0
[   28.523259]  do_user_addr_fault+0x1bc/0x6f0
[   28.523521]  exc_page_fault+0x60/0x140
[   28.523763]  asm_exc_page_fault+0x22/0x30
[   28.524001] RIP: 0033:0x7f14a0b589ca
[   28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
[   28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202
[   28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046
[   28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000
[   28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000
[   28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c
[   28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0
[   28.527449]  </TASK>
[   28.527600] ---[ end trace 0000000000000000 ]---


To be able to delete this entry, clear its dirty mark before
invalidate_inode_pages2_range().

[1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f80e166 ("fsdax: invalidate pages when CoW")
Signed-off-by: Shiyang Ruan <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
qzed pushed a commit that referenced this issue May 3, 2023
XFS allows CoW on non-shared extents to combat fragmentation[1].  The old
non-shared extent could be mwrited before, its dax entry is marked dirty. 

This results in a WARNing:

[   28.512349] ------------[ cut here ]------------
[   28.512622] WARNING: CPU: 2 PID: 5255 at fs/dax.c:390 dax_insert_entry+0x342/0x390
[   28.513050] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables
[   28.515462] CPU: 2 PID: 5255 Comm: fsstress Kdump: loaded Not tainted 6.3.0-rc1-00001-g85e1481e19c1-dirty #117
[   28.515902] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014
[   28.516307] RIP: 0010:dax_insert_entry+0x342/0x390
[   28.516536] Code: 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 45 20 48 83 c0 01 e9 e2 fe ff ff 48 8b 45 20 48 83 c0 01 e9 cd fe ff ff <0f> 0b e9 53 ff ff ff 48 8b 7c 24 08 31 f6 e8 1b 61 a1 00 eb 8c 48
[   28.517417] RSP: 0000:ffffc9000845fb18 EFLAGS: 00010086
[   28.517721] RAX: 0000000000000053 RBX: 0000000000000155 RCX: 000000000018824b
[   28.518113] RDX: 0000000000000000 RSI: ffffffff827525a6 RDI: 00000000ffffffff
[   28.518515] RBP: ffffea00062092c0 R08: 0000000000000000 R09: ffffc9000845f9c8
[   28.518905] R10: 0000000000000003 R11: ffffffff82ddb7e8 R12: 0000000000000155
[   28.519301] R13: 0000000000000000 R14: 000000000018824b R15: ffff88810cfa76b8
[   28.519703] FS:  00007f14a0c94740(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000
[   28.520148] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.520472] CR2: 00007f14a0c8d000 CR3: 000000010321c004 CR4: 0000000000770ee0
[   28.520863] PKRU: 55555554
[   28.521043] Call Trace:
[   28.521219]  <TASK>
[   28.521368]  dax_fault_iter+0x196/0x390
[   28.521595]  dax_iomap_pte_fault+0x19b/0x3d0
[   28.521852]  __xfs_filemap_fault+0x234/0x2b0
[   28.522116]  __do_fault+0x30/0x130
[   28.522334]  do_fault+0x193/0x340
[   28.522586]  __handle_mm_fault+0x2d3/0x690
[   28.522975]  handle_mm_fault+0xe6/0x2c0
[   28.523259]  do_user_addr_fault+0x1bc/0x6f0
[   28.523521]  exc_page_fault+0x60/0x140
[   28.523763]  asm_exc_page_fault+0x22/0x30
[   28.524001] RIP: 0033:0x7f14a0b589ca
[   28.524225] Code: c5 fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90
[   28.525198] RSP: 002b:00007fff1dea1c98 EFLAGS: 00010202
[   28.525505] RAX: 000000000000001e RBX: 000000000014a000 RCX: 0000000000006046
[   28.525895] RDX: 00007f14a0c82000 RSI: 000000000000001e RDI: 00007f14a0c8d000
[   28.526290] RBP: 000000000000006f R08: 0000000000000004 R09: 000000000014a000
[   28.526681] R10: 0000000000000008 R11: 0000000000000246 R12: 028f5c28f5c28f5c
[   28.527067] R13: 8f5c28f5c28f5c29 R14: 0000000000011046 R15: 00007f14a0c946c0
[   28.527449]  </TASK>
[   28.527600] ---[ end trace 0000000000000000 ]---


To be able to delete this entry, clear its dirty mark before
invalidate_inode_pages2_range().

[1] https://lore.kernel.org/linux-xfs/20230321151339.GA11376@frogsfrogsfrogs/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f80e166 ("fsdax: invalidate pages when CoW")
Signed-off-by: Shiyang Ruan <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
qzed pushed a commit that referenced this issue Oct 22, 2023
[ Upstream commit 3eef855 ]

Using netconsole netpoll_poll_dev could be called from interrupt
context, thus using disable_irq() would cause the following kernel
warning with CONFIG_DEBUG_ATOMIC_SLEEP enabled:

  BUG: sleeping function called from invalid context at kernel/irq/manage.c:137
  in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 10, name: ksoftirqd/0
  CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.15.42-00075-g816b502b2298-dirty #117
  Hardware name: aml (r1) (DT)
  Call trace:
   dump_backtrace+0x0/0x270
   show_stack+0x14/0x20
   dump_stack_lvl+0x8c/0xac
   dump_stack+0x18/0x30
   ___might_sleep+0x150/0x194
   __might_sleep+0x64/0xbc
   synchronize_irq+0x8c/0x150
   disable_irq+0x2c/0x40
   stmmac_poll_controller+0x140/0x1a0
   netpoll_poll_dev+0x6c/0x220
   netpoll_send_skb+0x308/0x390
   netpoll_send_udp+0x418/0x760
   write_msg+0x118/0x140 [netconsole]
   console_unlock+0x404/0x500
   vprintk_emit+0x118/0x250
   dev_vprintk_emit+0x19c/0x1cc
   dev_printk_emit+0x90/0xa8
   __dev_printk+0x78/0x9c
   _dev_warn+0xa4/0xbc
   ath10k_warn+0xe8/0xf0 [ath10k_core]
   ath10k_htt_txrx_compl_task+0x790/0x7fc [ath10k_core]
   ath10k_pci_napi_poll+0x98/0x1f4 [ath10k_pci]
   __napi_poll+0x58/0x1f4
   net_rx_action+0x504/0x590
   _stext+0x1b8/0x418
   run_ksoftirqd+0x74/0xa4
   smpboot_thread_fn+0x210/0x3c0
   kthread+0x1fc/0x210
   ret_from_fork+0x10/0x20

Since [0] .ndo_poll_controller is only needed if driver doesn't or
partially use NAPI. Because stmmac does so, stmmac_poll_controller
can be removed fixing the above warning.

[0] commit ac3d9dd ("netpoll: make ndo_poll_controller() optional")

Cc: <[email protected]> # 5.15.x
Fixes: 47dd7a5 ("net: add support for STMicroelectronics Ethernet controllers")
Signed-off-by: Remi Pommarel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/1c156a6d8c9170bd6a17825f2277115525b4d50f.1696429960.git.repk@triplefau.lt
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Nov 20, 2023
Using netconsole netpoll_poll_dev could be called from interrupt
context, thus using disable_irq() would cause the following kernel
warning with CONFIG_DEBUG_ATOMIC_SLEEP enabled:

  BUG: sleeping function called from invalid context at kernel/irq/manage.c:137
  in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 10, name: ksoftirqd/0
  CPU: 0 PID: 10 Comm: ksoftirqd/0 Tainted: G        W         5.15.42-00075-g816b502b2298-dirty #117
  Hardware name: aml (r1) (DT)
  Call trace:
   dump_backtrace+0x0/0x270
   show_stack+0x14/0x20
   dump_stack_lvl+0x8c/0xac
   dump_stack+0x18/0x30
   ___might_sleep+0x150/0x194
   __might_sleep+0x64/0xbc
   synchronize_irq+0x8c/0x150
   disable_irq+0x2c/0x40
   stmmac_poll_controller+0x140/0x1a0
   netpoll_poll_dev+0x6c/0x220
   netpoll_send_skb+0x308/0x390
   netpoll_send_udp+0x418/0x760
   write_msg+0x118/0x140 [netconsole]
   console_unlock+0x404/0x500
   vprintk_emit+0x118/0x250
   dev_vprintk_emit+0x19c/0x1cc
   dev_printk_emit+0x90/0xa8
   __dev_printk+0x78/0x9c
   _dev_warn+0xa4/0xbc
   ath10k_warn+0xe8/0xf0 [ath10k_core]
   ath10k_htt_txrx_compl_task+0x790/0x7fc [ath10k_core]
   ath10k_pci_napi_poll+0x98/0x1f4 [ath10k_pci]
   __napi_poll+0x58/0x1f4
   net_rx_action+0x504/0x590
   _stext+0x1b8/0x418
   run_ksoftirqd+0x74/0xa4
   smpboot_thread_fn+0x210/0x3c0
   kthread+0x1fc/0x210
   ret_from_fork+0x10/0x20

Since [0] .ndo_poll_controller is only needed if driver doesn't or
partially use NAPI. Because stmmac does so, stmmac_poll_controller
can be removed fixing the above warning.

[0] commit ac3d9dd ("netpoll: make ndo_poll_controller() optional")

Cc: <[email protected]> # 5.15.x
Fixes: 47dd7a5 ("net: add support for STMicroelectronics Ethernet controllers")
Signed-off-by: Remi Pommarel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/1c156a6d8c9170bd6a17825f2277115525b4d50f.1696429960.git.repk@triplefau.lt
Signed-off-by: Jakub Kicinski <[email protected]>
qzed pushed a commit that referenced this issue Apr 1, 2024
…terfaces

[ Upstream commit cb5942b ]

wilc_netdev_cleanup currently triggers a KASAN warning, which can be
observed on interface registration error path, or simply by
removing the module/unbinding device from driver:

echo spi0.1 > /sys/bus/spi/drivers/wilc1000_spi/unbind

==================================================================
BUG: KASAN: slab-use-after-free in wilc_netdev_cleanup+0x508/0x5cc
Read of size 4 at addr c54d1ce8 by task sh/86

CPU: 0 PID: 86 Comm: sh Not tainted 6.8.0-rc1+ #117
Hardware name: Atmel SAMA5
 unwind_backtrace from show_stack+0x18/0x1c
 show_stack from dump_stack_lvl+0x34/0x58
 dump_stack_lvl from print_report+0x154/0x500
 print_report from kasan_report+0xac/0xd8
 kasan_report from wilc_netdev_cleanup+0x508/0x5cc
 wilc_netdev_cleanup from wilc_bus_remove+0xc8/0xec
 wilc_bus_remove from spi_remove+0x8c/0xac
 spi_remove from device_release_driver_internal+0x434/0x5f8
 device_release_driver_internal from unbind_store+0xbc/0x108
 unbind_store from kernfs_fop_write_iter+0x398/0x584
 kernfs_fop_write_iter from vfs_write+0x728/0xf88
 vfs_write from ksys_write+0x110/0x1e4
 ksys_write from ret_fast_syscall+0x0/0x1c

[...]

Allocated by task 1:
 kasan_save_track+0x30/0x5c
 __kasan_kmalloc+0x8c/0x94
 __kmalloc_node+0x1cc/0x3e4
 kvmalloc_node+0x48/0x180
 alloc_netdev_mqs+0x68/0x11dc
 alloc_etherdev_mqs+0x28/0x34
 wilc_netdev_ifc_init+0x34/0x8ec
 wilc_cfg80211_init+0x690/0x910
 wilc_bus_probe+0xe0/0x4a0
 spi_probe+0x158/0x1b0
 really_probe+0x270/0xdf4
 __driver_probe_device+0x1dc/0x580
 driver_probe_device+0x60/0x140
 __driver_attach+0x228/0x5d4
 bus_for_each_dev+0x13c/0x1a8
 bus_add_driver+0x2a0/0x608
 driver_register+0x24c/0x578
 do_one_initcall+0x180/0x310
 kernel_init_freeable+0x424/0x484
 kernel_init+0x20/0x148
 ret_from_fork+0x14/0x28

Freed by task 86:
 kasan_save_track+0x30/0x5c
 kasan_save_free_info+0x38/0x58
 __kasan_slab_free+0xe4/0x140
 kfree+0xb0/0x238
 device_release+0xc0/0x2a8
 kobject_put+0x1d4/0x46c
 netdev_run_todo+0x8fc/0x11d0
 wilc_netdev_cleanup+0x1e4/0x5cc
 wilc_bus_remove+0xc8/0xec
 spi_remove+0x8c/0xac
 device_release_driver_internal+0x434/0x5f8
 unbind_store+0xbc/0x108
 kernfs_fop_write_iter+0x398/0x584
 vfs_write+0x728/0xf88
 ksys_write+0x110/0x1e4
 ret_fast_syscall+0x0/0x1c
 [...]

David Mosberger-Tan initial investigation [1] showed that this
use-after-free is due to netdevice unregistration during vif list
traversal. When unregistering a net device, since the needs_free_netdev has
been set to true during registration, the netdevice object is also freed,
and as a consequence, the corresponding vif object too, since it is
attached to it as private netdevice data. The next occurrence of the loop
then tries to access freed vif pointer to the list to move forward in the
list.

Fix this use-after-free thanks to two mechanisms:
- navigate in the list with list_for_each_entry_safe, which allows to
  safely modify the list as we go through each element. For each element,
  remove it from the list with list_del_rcu
- make sure to wait for RCU grace period end after each vif removal to make
  sure it is safe to free the corresponding vif too (through
  unregister_netdev)

Since we are in a RCU "modifier" path (not a "reader" path), and because
such path is expected not to be concurrent to any other modifier (we are
using the vif_mutex lock), we do not need to use RCU list API, that's why
we can benefit from list_for_each_entry_safe.

[1] https://lore.kernel.org/linux-wireless/[email protected]/

Fixes: 8399918 ("staging: wilc1000: use RCU list to maintain vif interfaces list")
Signed-off-by: Alexis Lothoré <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D: Surface Book 3 Device: Surface Book 3
Projects
None yet
Development

No branches or pull requests

2 participants