Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pro 7/Laptop 3: Touchscreen support #22

Open
aleksfadini opened this issue Oct 24, 2019 · 44 comments
Open

Pro 7/Laptop 3: Touchscreen support #22

aleksfadini opened this issue Oct 24, 2019 · 44 comments
Assignees
Labels
A: IPTS Area: Issue in IPTS driver C: missing feature Category: Missing feature D: Surface Laptop 3 Device: Surface Laptop 3 D: Surface Pro 7 Device: Surface Pro 7 P: medium Priority: Medium priority

Comments

@aleksfadini
Copy link

Any chance that this could work on the new Surface Pro 7 as well?
Jakeday kernel battery/acpi does not work.

I tried to run "make" in /modules and got this:

`
make -C /lib/modules/"5.3.6-1-surface"/build M= modules
make[1]: *** /lib/modules/5.3.6-1-surface/build: No such file or directory. Stop.
make: *** [Makefile:29: all] Error 2

`

@qzed
Copy link
Member

qzed commented Oct 25, 2019

We've discussed the Laptop 3 on IRC. I think the Pro 7 should be fairly similar (meaning that we would need to write some sort of stub-driver which provides a battery interface and talks to the EC, luckily the last part is the same from the previous generations). Can you upload an acpidump (sudo acpidump > acpidump.out)?

Regarding the make error: Have you installed a new kernel and did not reboot? What distro are you running? In any case, if it doesn't work with the surface-linux kernel, it will, at the moment, not work with the module either.

@aleksfadini
Copy link
Author

aleksfadini commented Oct 25, 2019

Hey man! I will post that as soon as I get home. It would mean a lot if you could implement the battery on the Surface Pro 7, it seems you are the only one who is doing this.
To answer your question, I did use the surface-linux in the arch repo. I tried to compile from source as a desperate move :)
Edit: I used your surface module on the surface pro 6 for a year and it changed my life, since the surface is great for coding apps and test them with touch in the same ecosystem in Linux. Big fan of your work, just sent you a donation brother

@aleksfadini
Copy link
Author

Here is my acpi-dump (had to add .txt to upload it to GH)
acpidump.out.txt

@aleksfadini
Copy link
Author

On a side note, if you could include this patch for the surface linux kernel, you could easily fix the touchscreen as well (and maybe it is a similar situation for the surface book 3):

-touchscreen (related to intel_lpss blacklist otherwise wont boot)

Here is more info on touchscreen issue:

First Boot
To boot fully without the boot process hanging one must blacklist the 'intel_lpss_pci' module, which can be done on a live media installation by selecting the boot option you want and pressing 'e', navigating to the end of the boot options string, and adding 'modprobe.blacklist=intel_lpss_pci'. This should allow booting into the full OS and further installation to the internal SSD.

Intel LPSS
To fix the intel_lpss_module the patch from this paste must be applied to the kernel.

https://pastebin.com/sqPv8ShP

Remember to be cautious about running code or installing patches from third party sources if you do not know what it does.

This patch should enable the system to boot without blacklisting the module, further enabling access to the touchscreen.

from https://wiki.archlinux.org/index.php/Dell_XPS_13_2-in-1_(7390)

@qzed qzed self-assigned this Oct 27, 2019
@qzed qzed changed the title Surface Pro 7 Surface Pro 7: Battery status Oct 27, 2019
@qzed
Copy link
Member

qzed commented Oct 27, 2019

Big fan of your work, just sent you a donation brother

Thank you, this is much appreciated!

I can't give you a concrete time-frame when the battery stuff will work, but I hope that it doesn't take too long. At the moment, my first priority is getting the keyboard to work on the Laptop 3, but I will work on battery status on both devices after that. In essence, all those things are fairly similar to the previous generations, it's just that some ACPI stuff has changed.

If you know anything about ACPI: the battery definitions, including _BIX and _BST have been removed and it seems to rely solely on the EC requests. Those are the same as on previous generations, though, so I think the difficult part is done. Unfortunately I don't know much about the Linux battery subsystem and thus I don't have any idea how long this will take to implement. The keyboard should be simpler to implement as it seems that this is basically just a matter of different IDs.

On a side note, if you could include this patch for the surface linux kernel, you could easily fix the touchscreen as well (and maybe it is a similar situation for the surface book 3):

Thank you for the link. Could you try the patch (I could compile a kernel for you if you want me to)? Also do you know what type of touchscreen the device uses (e.g. I2C)?

@aleksfadini
Copy link
Author

aleksfadini commented Oct 27, 2019

I would definitely try a kernel if you were so kind to compile one. Maybe it could be surface-linux-pro7 or something like that, in your arch surface repo?

I didn't know the keyboard of he sf laptop 3 doesn't work.
With touchscreen and battery I would be basically done with what I need for my surface 7, the rest doesn't matter (camera suspend etc).

@qzed
Copy link
Member

qzed commented Oct 27, 2019

@aleksfadini I've compiled and uploaded a kernel with the patch: https://github.com/qzed/linux-surface/releases/tag/v5.3-testing.1

@aleksfadini
Copy link
Author

That is awesome! Will install them with sudo pacman -U and report back asap. Infinite thanks.

@aleksfadini
Copy link
Author

aleksfadini commented Oct 27, 2019

Can confirm, with the v5.3-testing.1 patched kernel the boot does not hang, so I could remove the kernel parameter:
'modprobe.blacklist=intel_lpss_pci
And still boot normally!
Now I should be able to configure the touch screen. Looking into what commands to use for diagnostics and to figure out what type it is (I2C?)

Thanks again!!

@aleksfadini
Copy link
Author

aleksfadini commented Oct 27, 2019

By the way, installinf the v5.3-testing.1 on arch with sudo pacman -U gives this error:

`
lding image from preset: /etc/mkinitcpio.d/linux-surface-dev.preset: 'default'
15 -> -k /boot/vmlinuz-linux-surface-dev -c /etc/mkinitcpio.conf -g
14 /boot/initramfs-linux-surface-dev.img
13 ==> Starting build: 5.3.0-1-surface-dev
12 -> Running build hook: [base]
11 -> Running build hook: [udev]
10 -> Running build hook: [autodetect]
9 -> Running build hook: [modconf]
8 -> Running build hook: [block]
7 -> Running build hook: [filesystems]
6 -> Running build hook: [keyboard]
5 -> Running build hook: [fsck]
4 ==> ERROR: module not found: 'surface_acpi'
3 ==> Generating module dependencies
2 ==> Creating gzip-compressed initcpio image:
1 /boot/initramfs-linux-surface-dev.img
17 ==> WARNING: errors were encountered during the build. The image may not be
1 complete.
2 ==> Building image from preset: /etc/mkinitcpio.d/linux-surface-dev.preset:
3 'fallback'
4 -> -k /boot/vmlinuz-linux-surface-dev -c /etc/mkinitcpio.conf -g
5 /boot/initramfs-linux-surface-dev-fallback.img
6 -S autodetect
7 ==> Starting build: 5.3.0-1-surface-dev
8 -> Running build hook: [base]
9 -> Running build hook: [udev]
10 -> Running build hook: [modconf]
11 -> Running build hook: [block]
12 ==> WARNING: Possibly missing firmware for module: aic94xx
13 ==> WARNING: Possibly missing firmware for module: wd719x
14 -> Running build hook: [filesystems]
15 -> Running build hook: [keyboard]
16 -> Running build hook: [fsck]
17 ==> ERROR: module not found: 'surface_acpi'
18 ==> Generating module dependencies
19 ==> Creating gzip-compressed initcpio image:
20 /boot/initramfs-linux-surface-dev-fallback.img
21 ==> WARNING: errors were encountered during the build. The image may not be
22 complete.
23 error: command failed to execute correctly

`

However the kernel works fine!

less /proc/bus/input/devices does not show a touch screen yet.

@qzed
Copy link
Member

qzed commented Oct 28, 2019

With regards to the error: It looks like you've installed surface_acpi via dkms in the past, but something may have gone wrong. You could try and play around with dkms to remove it.

@qzed
Copy link
Member

qzed commented Oct 28, 2019

Thanks for testing, I'll include the patch then.

@aleksfadini
Copy link
Author

aleksfadini commented Oct 28, 2019

With regards to the error: It looks like you've installed surface_acpi via dkms in the past, but something may have gone wrong. You could try and play around with dkms to remove it.

Yes! that's correct. I tried things I didn't quite understand in the hope to get the battery going. Will work on it.

Output of less /proc/bus/input/devices:
`
I: Bus=0019 Vendor=0000 Product=0005 Version=0000
1 N: Name="Lid Switch"
2 P: Phys=PNP0C0D/button/input0
3 S: Sysfs=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input0
4 U: Uniq=
5 H: Handlers=event0
6 B: PROP=0
7 B: EV=21
8 B: SW=1
9
10 I: Bus=0019 Vendor=0000 Product=0006 Version=0000
11 N: Name="Video Bus"
12 P: Phys=LNXVIDEO/video/input0
13 S: Sysfs=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input1
14 U: Uniq=
15 H: Handlers=kbd event1
16 B: PROP=0
17 B: EV=3
18 B: KEY=3e000b00000000 0 0 0
19
20 I: Bus=0003 Vendor=045e Product=09c0 Version=0111
21 N: Name="Microsoft Surface Type Cover Keyboard"
22 P: Phys=usb-0000:00:14.0-8/input0
23 S: Sysfs=/devices/pci0000:00/0000:00:14.0/usb3/3-8/3-8:1.0/0003:045E:09C0.0001/input/input2
24 U: Uniq=
25 H: Handlers=sysrq kbd event2 leds
26 B: PROP=0
27 B: EV=120013
28 B: KEY=1000000000007 ff800000000007ff febeffdff3cfffff fffffffffffffffe
29 B: MSC=10
30 B: LED=7
31
32 I: Bus=0003 Vendor=045e Product=09c0 Version=0111
33 N: Name="Microsoft Surface Type Cover Consumer Control"
34 P: Phys=usb-0000:00:14.0-8/input0
35 S: Sysfs=/devices/pci0000:00/0000:00:14.0/usb3/3-8/3-8:1.0/0003:045E:09C0.0001/input/input4
36 U: Uniq=
37 H: Handlers=kbd event3
38 B: PROP=0
39 B: EV=1f
40 B: KEY=3f000301ff 0 0 483ffff17aff32d bfd4444600000000 1 130ff38b17c000 677bfad9415fed 19ed68000004400 10000002
41 B: REL=1040
42 B: ABS=100000000
43 B: MSC=10
44
45 I: Bus=0003 Vendor=045e Product=09c0 Version=0111
46 N: Name="Microsoft Surface Type Cover Touchpad"
47 P: Phys=usb-0000:00:14.0-8/input0
48 S: Sysfs=/devices/pci0000:00/0000:00:14.0/usb3/3-8/3-8:1.0/0003:045E:09C0.0001/input/input6
49 U: Uniq=
50 H: Handlers=event4 mouse0
51 B: PROP=5
52 B: EV=1b
53 B: KEY=e520 10000 0 0 0 0
54 B: ABS=260800000000003
55 B: MSC=20
56
57 I: Bus=0010 Vendor=001f Product=0001 Version=0100
58 N: Name="PC Speaker"
59 P: Phys=isa0061/input0
60 S: Sysfs=/devices/platform/pcspkr/input/input21
61 U: Uniq=
62 H: Handlers=kbd event5
63 B: PROP=0
64 B: EV=40001
65 B: SND=6
66
67 I: Bus=0000 Vendor=0000 Product=0000 Version=0000
68 N: Name="HDA Intel PCH Mic"
69 P: Phys=ALSA
70 S: Sysfs=/devices/pci0000:00/0000:00:1f.3/sound/card0/input22
71 U: Uniq=
72 H: Handlers=event6
73 B: PROP=0
74 B: EV=21
75 B: SW=10
76
77 I: Bus=0000 Vendor=0000 Product=0000 Version=0000
78 N: Name="HDA Intel PCH Headphone"
79 P: Phys=ALSA
80 S: Sysfs=/devices/pci0000:00/0000:00:1f.3/sound/card0/input23
81 U: Uniq=
82 H: Handlers=event7
83 B: PROP=0
84 B: EV=21
85 B: SW=4

`

Do you think the touchpad is still not being seen by the kernel? How do I find it if it is I2C?

EDIT: I meant touchscreen, not touchpad

@aleksfadini
Copy link
Author

I got this:

dmesg | grep i2c [ 1.631795] Modules linked in: i915(+) intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart mei hid_multitouch usbhid hid_generic i2c_hid hid_sensor_hub hid [ 1.632107] Modules linked in: i915(+) intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart mei hid_multitouch usbhid hid_generic i2c_hid hid_sensor_hub hid [ 1.633518] Modules linked in: i915(+) intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart mei hid_multitouch usbhid hid_generic i2c_hid hid_sensor_hub hid [ 10.606125] nls_cp437 intel_uncore intel_spi_pci intel_spi spi_nor vfat snd_timer fat intel_rapl_perf pcspkr cfg80211 snd idma64 mtd soundcore mei_me intel_lpss_pci intel_lpss intel_ish_ipc int3403_thermal int340x_thermal_zone rfkill intel_ishtp wmi surface_sam_san surfacepro3_button surface_sam_ssh evdev soc_button_array mac_hid int3400_thermal acpi_tad acpi_thermal_rel dptf_power ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 xhci_pci xhci_hcd crc32c_intel intel_ipts i915 intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart mei hid_multitouch usbhid hid_generic i2c_hid hid_sensor_hub hid

@qzed
Copy link
Member

qzed commented Oct 28, 2019

Yeah, looks like the touchscre

You mean touchscreen? Looks like it's not present there.

How do I find it if it is I2C?

Unfortunately, I2C devices are difficult to identify. There's no mechanism built into the specification as for example on USB or PCIe. You'd likely have to look around on Windows if you can find anything there. A good place to start would be the device manager. You could try to look for the device by its connection (View->Devices by connection) and figure out its "BIOS device name" property, if it has one (right click -> properties -> details -> search for the entry in the drop-down menu). Also other stuff like which driver it uses.

@aleksfadini
Copy link
Author

I entirely eradicated Windows. I realize this could be particularly bad for updating firmware reasons too. Maybe I can repartition and dual boot... Touchscreen is priority number 1, battery number 2. Where can I find you on IRC? If we could find some way to fix the touchscreen that would really help.

@qzed
Copy link
Member

qzed commented Oct 28, 2019

I realize this could be particularly bad for updating firmware reasons too.

Thanks to @StollD updating the firmware can also be done from Linux and is arguably better (you can also down-grade, unlike on Windows). The problem is only in getting the latest firmware files. It usually takes some time until MS uploads them to their page.

Apart form that I'd recommend you still keep Windows around for a while, it's a new device after all.

Where can I find you on IRC?

freenode/##linux-surface, when I'm at home

@Hugal31
Copy link

Hugal31 commented Oct 30, 2019

I was told to dump my Windows devices information here about the Battery and the Touchscreen.

The battery:

Device {BFD21D0B-D701-435B-AF19-4A20677898D3}\SurfaceBattery\3&c7da34a&0&SID_DEVICE_03

Driver Name: oem16.inf
Class Guid: {72631e54-78a4-11d0-bcf7-00aa00b7b32a}
Driver Date: 08/22/2019
Driver Version: 1.60.139.0
Driver Provider: Surface
Driver Section: SurfaceBattery_Device.NT
Driver Rank: 0xFF0000
Matching Device Id: {BFD21D0B-D701-435B-AF19-4A20677898D3}\SurfaceBattery
Outranked Drivers: 
Device Updated: false
Parent Device: ACPI\MSHW0116\2&daba3ff&1

Intel(R) Precise Touch Device:

Device PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

Driver Name: oem68.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 07/30/2019
Driver Version: 2.1.0.59
Driver Provider: Intel Corporation
Driver Section: precisetouch.Inst.NT
Driver Rank: 0xFF2001
Matching Device Id: PCI\VEN_8086&DEV_34E4
Outranked Drivers: 
Device Updated: false
Parent Device: ACPI\PNP0A08\0
Its children

Bluetooth OOB Coupling Channel

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col09\4&1f1b4878&0&0008

Driver Name: BthOob.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.145
Driver Provider: Microsoft
Driver Section: BthOob.Install.NT
Driver Rank: 0xFF0004
Matching Device Id: HID\VID_045E&UP:FFF4_U:0001
Outranked Drivers: input.inf:HID_DEVICE_UPR:FF00-FFFF:00FF1006 input.inf:HID_DEVICE:00FF1007
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

HID-compliant pen

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col08\4&1f1b4878&0&0007

Driver Name: input.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.175
Driver Provider: Microsoft
Driver Section: HID_Raw_Inst.NT
Driver Rank: 0xFF1005
Matching Device Id: HID_DEVICE_UP:000D_U:0002
Outranked Drivers: input.inf:HID_DEVICE:00FF1006
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

HID-compliant touch screen

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col01\4&1f1b4878&0&0000

Driver Name: input.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.175
Driver Provider: Microsoft
Driver Section: HID_Raw_Inst.NT
Driver Rank: 0xFF1006
Matching Device Id: HID_DEVICE_UPR:FF00-FFFF
Outranked Drivers: input.inf:HID_DEVICE:00FF1007
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

HID-compliant vendor-defined device

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col01\4&1f1b4878&0&0000

Driver Name: input.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.175
Driver Provider: Microsoft
Driver Section: HID_Raw_Inst.NT
Driver Rank: 0xFF1006
Matching Device Id: HID_DEVICE_UPR:FF00-FFFF
Outranked Drivers: input.inf:HID_DEVICE:00FF1007
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

HID-compliant vendor-defined device

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col06\4&1f1b4878&0&0005

Driver Name: input.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.175
Driver Provider: Microsoft
Driver Section: HID_Raw_Inst.NT
Driver Rank: 0xFF1006
Matching Device Id: HID_DEVICE_UPR:FF00-FFFF
Outranked Drivers: input.inf:HID_DEVICE:00FF1007
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

HID-compliant vendor-defined device

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col0A\4&1f1b4878&0&0009

Driver Name: input.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.175
Driver Provider: Microsoft
Driver Section: HID_Raw_Inst.NT
Driver Rank: 0xFF1006
Matching Device Id: HID_DEVICE_UPR:FF00-FFFF
Outranked Drivers: input.inf:HID_DEVICE:00FF1007
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

Surface Digitize Component Firmware Update

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col05\4&1f1b4878&0&0004

Driver Name: input.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.175
Driver Provider: Microsoft
Driver Section: HID_Raw_Inst.NT
Driver Rank: 0xFF1006
Matching Device Id: HID_DEVICE_UPR:FF00-FFFF
Outranked Drivers: input.inf:HID_DEVICE:00FF1007
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

Surface Integration Firmware Update

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col03\4&1f1b4878&0&0002
Driver Name: input.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.175
Driver Provider: Microsoft
Driver Section: HID_Raw_Inst.NT
Driver Rank: 0xFF1006
Matching Device Id: HID_DEVICE_UPR:FF00-FFFF
Outranked Drivers: input.inf:HID_DEVICE:00FF1007
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

Surface Touch Pen Processor

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col04\4&1f1b4878&0&0003

Driver Name: heat.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.1
Driver Provider: Microsoft
Driver Section: HeatHid.Inst.NT
Driver Rank: 0xFF0005
Matching Device Id: HID_DEVICE_UP:000D_U:000F
Outranked Drivers: input.inf:HID_DEVICE:00FF1006
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

Surface Virtual Function Enum Device

Device HID\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30&Col04\4&1f1b4878&0&0003

Driver Name: heat.inf
Class Guid: {745a17a0-74d3-11d0-b6fe-00a0c90f57da}
Driver Date: 06/21/2006
Driver Version: 10.0.18362.1
Driver Provider: Microsoft
Driver Section: HeatHid.Inst.NT
Driver Rank: 0xFF0005
Matching Device Id: HID_DEVICE_UP:000D_U:000F
Outranked Drivers: input.inf:HID_DEVICE:00FF1006
Device Updated: false
Parent Device: PCI\VEN_8086&DEV_34E4&SUBSYS_00391414&REV_30\3&11583659&0&B4

And the driver files:
firmware.zip

I hope it helps. Thanks for all your work!

@StollD
Copy link
Member

StollD commented Oct 30, 2019

Intel(R) Precise Touch Device:

This makes me wonder: Could the intel variants still be using IPTS, but maybe it got modified so much that the old driver from intel we are using cannot detect it anymore?

If you look at the firmware packages from Microsoft, the SP7 and the SL3-Intel still contain the itouch driver, that is also in the downloads for older surfaces with IPTS. However it has seen some updates since it's last release in a surface product as it seems.

The driver in the firmware package for SB2 references the following PCI devices:

  • 8086:9d3e (thats what I see on my SB2)
  • 8086:a13e (this might be the for SB1 / SP4 who seem to use different IPTS hardware)

The new driver has all of these:

  • 8086:9d3e
  • 8086:a13e
  • 8086:9de4
  • 8086:a364
  • 8086:34e4
  • 8086:3de4
  • 8086:02e4
  • 8086:06e4

All these new IDs are commented with stuff like CNP (cannonpoint?), ICL (icelake?) etc, so I guess they refactored IPTS somewhere after Kabylake.

%precisetouch.DeviceDesc%    = precisetouch.Inst, PCI\VEN_8086&DEV_9D3E
%precisetouch.DeviceDesc%    = precisetouch.Inst, PCI\VEN_8086&DEV_A13E

;CNP_PCH-LP HECI3 HWID
%precisetouch.DeviceDesc%    = precisetouch.Inst, PCI\VEN_8086&DEV_9DE4
;CNP_PCH-H HECI3 HWID
%precisetouch.DeviceDesc%    = precisetouch.Inst, PCI\VEN_8086&DEV_A364

;ICL_PCH-LP HECI3 HWID
%precisetouch.DeviceDesc%    = precisetouch.Inst, PCI\VEN_8086&DEV_34E4
;ICL_PCH-H HECI3 HWID
%precisetouch.DeviceDesc%    = precisetouch.Inst, PCI\VEN_8086&DEV_3DE4

;CML_PCH-LP HECI3 HWID
%precisetouch.DeviceDesc%    = precisetouch.Inst, PCI\VEN_8086&DEV_02E4
;CMP_PCH-H HECI3 HWID
%precisetouch.DeviceDesc%    = precisetouch.Inst, PCI\VEN_8086&DEV_06E4

What is missing from the firmware bundle is the driver that used to provide the driver level firmware, which makes me guess that they might have moved it into the touch firmware of the BIOS somehow. We will have to figure that out somehow.

@Hugal31 could you boot into linux and post the output of these two commands?

# lspci will not show proper device IDs by default...
$ lspci -nn
$ ls -l /sys/bus/mei/devices/

Maybe we can identify the new MEI ID for IPTS that way, and try getting the driver to probe.

Also, the IPTS driver on Windows seems to store some settings in the registry, maybe that can help us with where it gets the driver firmware from: Could you run regedit, and search for itouch or IntTouch under HKEY_LOCAL_MACHINE? And then just screenshot everything you can find under it (or export the key, IIRC that was possible but I am not sure).

The older drivers seem to use HKLM\SYSTEM\CurrentControlSet\Services\IntTouch so that might be worth checking out first (the new driver doesn't have the full path specified).

@aleksfadini
Copy link
Author

aleksfadini commented Oct 30, 2019

@StollD thank you for looking into this. Using SP7 here as well, but no Windows installed, so here are the outputs I can give:

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:8a12] (rev 03)
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:8a52] (rev 07)
00:04.0 Signal processing controller [1180]: Intel Corporation Device [8086:8a03] (rev 03)
00:0d.0 USB controller [0c03]: Intel Corporation Ice Lake Thunderbolt 3 USB Controller [8086:8a13] (rev 03)
00:12.0 Serial controller [0700]: Intel Corporation Device [8086:34fc] (rev 30)
00:14.0 USB controller [0c03]: Intel Corporation Ice Lake-LP USB 3.1 xHCI Host Controller [8086:34ed] (rev 30)
00:14.2 RAM memory [0500]: Intel Corporation Device [8086:34ef] (rev 30)
00:14.3 Network controller [0280]: Intel Corporation Device [8086:34f0] (rev 30)
00:15.0 Serial bus controller [0c80]: Intel Corporation Ice Lake-LP Serial IO I2C Controller #0 [8086:34e8] (rev 30)
00:16.0 Communication controller [0780]: Intel Corporation Device [8086:34e0] (rev 30)
00:16.4 Communication controller [0780]: Intel Corporation Device [8086:34e4] (rev 30)
00:19.0 Serial bus controller [0c80]: Intel Corporation Ice Lake-LP Serial IO I2c Controller qzed/linux-surfacegen5-acpi#4 [8086:34c5] (rev 30)
00:1d.0 PCI bridge [0604]: Intel Corporation Ice Lake-LP PCI Express Root Port linux-surface/surface-aggregator-module#9 [8086:34b0] (rev 30)
00:1e.0 Communication controller [0780]: Intel Corporation Ice Lake-LP Serial IO UART Controller #0 [8086:34a8] (rev 30)
00:1f.0 ISA bridge [0601]: Intel Corporation Ice Lake-LP LPC Controller [8086:3482] (rev 30)
00:1f.3 Audio device [0403]: Intel Corporation Device [8086:34c8] (rev 30)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Ice Lake-LP SPI Controller [8086:34a4] (rev 30)
01:00.0 Non-Volatile memory controller [0108]: SK hynix Device [1c5c:1327]

$ ls -l /sys/bus/mei/devices/
total 0
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::082ee5a7-7c25-470a-9643-0c06f0466ea1:00 -> ../../../devices/pci0000:00/0000:00:16.0/mei::082ee5a7-7c25-470a-9643-0c06f0466ea1:00
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::309dcde8-ccb1-4062-8f78-600115a34327:01 -> ../../../devices/pci0000:00/0000:00:16.0/mei::309dcde8-ccb1-4062-8f78-600115a34327:01
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::42b3ce2f-bd9f-485a-96ae-26406230b1ff:01 -> ../../../devices/pci0000:00/0000:00:16.0/mei::42b3ce2f-bd9f-485a-96ae-26406230b1ff:01
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::4fcc395c-a9e5-4647-bc68-47bad7cc6bd3:01 -> ../../../devices/pci0000:00/0000:00:16.0/mei::4fcc395c-a9e5-4647-bc68-47bad7cc6bd3:01
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::55213584-9a29-4916-badf-0fb7ed682aeb:02 -> ../../../devices/pci0000:00/0000:00:16.0/mei::55213584-9a29-4916-badf-0fb7ed682aeb:02
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::6861ec7b-d07a-4673-856c-7f22b4d55769:01 -> ../../../devices/pci0000:00/0000:00:16.0/mei::6861ec7b-d07a-4673-856c-7f22b4d55769:01
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::8c2f4425-77d6-4755-aca3-891fdbc66a58:01 -> ../../../devices/pci0000:00/0000:00:16.0/mei::8c2f4425-77d6-4755-aca3-891fdbc66a58:01
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::8e6a6715-9abc-4043-88ef-9e39c6f63e0f:02 -> ../../../devices/pci0000:00/0000:00:16.0/mei::8e6a6715-9abc-4043-88ef-9e39c6f63e0f:02
lrwxrwxrwx 1 root root 0 Oct 30 14:14 mei::b638ab7e-94e2-4ea2-a552-d1c54b627f04:01 -> ../../../devices/pci0000:00/0000:00:16.0/mei::b638ab7e-94e2-4ea2-a552-d1c54b627f04:01
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::dd17041c-09ea-4b17-a271-5b989867ec65:01 -> ../../../devices/pci0000:00/0000:00:16.0/mei::dd17041c-09ea-4b17-a271-5b989867ec65:01
lrwxrwxrwx 1 root root 0 Oct 30 14:30 mei::f908627d-13bf-4a04-b91f-a64e9245323d:01 -> ../../../devices/pci0000:00/0000:00:16.0/mei::f908627d-13bf-4a04-b91f-a64e9245323d:01

EDIT:
@Hugal31 maybe you can do the windows part?

@StollD You guys are awesome for doing this! I wanted to donate something but couldnt find your payal on your website. You could think of adding it :)

@aleksfadini aleksfadini changed the title Surface Pro 7: Battery status Surface Pro 7: Battery status / touchscreen status Oct 30, 2019
@StollD
Copy link
Member

StollD commented Oct 30, 2019

Ok, I forgot that the kernel mei module needs to know about the PCI device first (in fact IPTS still patches in the device for surface because upstream doesn't have it).

You can either recompile the kernel to add the IDs (I will prepare a patch for that later), but for the moment you can bind the driver to the device manually:

$ echo '0000:00:16.4' | sudo tee /sys/module/mei_me/driver/pci:mei_me/bind

After that please rerun ls -l /sys/bus/mei/devices/ and post the output. If we are lucky, the MEI ID for IPTS hasn't changed (just the PCI ID), and the IPTS driver would probe without modifications.

@aleksfadini
Copy link
Author

]$ echo '0000:00:16.4' | sudo tee /sys/module/mei_me/driver/pci:mei_me/bind
[sudo] password for aleks:
tee: '/sys/module/mei_me/driver/pci:mei_me/bind': No such file or directory
0000:00:16.4

Does that mean I'm not lucky?

@StollD
Copy link
Member

StollD commented Oct 30, 2019

No, it means I am too stupid to type correctly. Please try again with drivers instead of driver. Oops.

@aleksfadini
Copy link
Author

No, it means I don't know much about Linux and thankfully we have people like you.

That folder didn't seem to work either, I tried this:

$ sudo echo '0000:00:16.4' | sudo tee /sys/module/mei_me/drivers/pci:mei_me/module/drivers/bind
tee: '/sys/module/mei_me/drivers/pci:mei_me/module/drivers/bind': Permission denied
0000:00:16.4

@StollD
Copy link
Member

StollD commented Oct 30, 2019

$ sudo echo '0000:00:16.4' | sudo tee /sys/module/mei_me/drivers/pci:mei_me/module/drivers/bind you have a duplicated /module/drivers in there.

The full command should be echo '0000:00:16.4' | sudo tee /sys/module/mei_me/drivers/pci:mei_me/bind. I should have just posted that again, earlier, sorry.

@aleksfadini
Copy link
Author

Well, I tried that after your second message about replacing driver with drivers and I got this:
$ echo '0000:00:16.4' | sudo tee /sys/module/mei_me/drivers/pci:mei_me/bind
0000:00:16.4
tee: '/sys/module/mei_me/drivers/pci:mei_me/bind': No such device

@aleksfadini
Copy link
Author

I guess THAT means that I'm not lucky!

qzed pushed a commit that referenced this issue Jan 7, 2021
[ Upstream commit 4e79f02 ]

When running in BE mode on LPAE hardware with a PA-to-VA translation
that exceeds 4 GB, we patch bits 39:32 of the offset into the wrong
byte of the opcode. So fix that, by rotating the offset in r0 to the
right by 8 bits, which will put the 8-bit immediate in bits 31:24.

Note that this will also move bit #22 in its correct place when
applying the rotation to the constant #0x400000.

Fixes: d9a790d ("ARM: 7883/1: fix mov to mvn conversion in case of 64 bit phys_addr_t and BE")
Acked-by: Nicolas Pitre <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Jan 7, 2021
[ Upstream commit 4e79f02 ]

When running in BE mode on LPAE hardware with a PA-to-VA translation
that exceeds 4 GB, we patch bits 39:32 of the offset into the wrong
byte of the opcode. So fix that, by rotating the offset in r0 to the
right by 8 bits, which will put the 8-bit immediate in bits 31:24.

Note that this will also move bit #22 in its correct place when
applying the rotation to the constant #0x400000.

Fixes: d9a790d ("ARM: 7883/1: fix mov to mvn conversion in case of 64 bit phys_addr_t and BE")
Acked-by: Nicolas Pitre <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
@kasperk81
Copy link

so I guess they refactored IPTS somewhere after Kabylake.

upstream patches for kabylake IPTS torvalds/linux@4afc339 and skylake torvalds/linux@372a829 seem related.

qzed pushed a commit that referenced this issue May 21, 2021
[ Upstream commit e125fbe ]

This patch fixes an deadlock issue when dlm_lowcomms_close() is called.
When dlm_lowcomms_close() is called the clusters_root.subsys.su_mutex is
held to remove configfs items. At this time we flushing (e.g.
cancel_work_sync()) the workers of send and recv workqueue. Due the fact
that we accessing configfs items (mark values), these workers will lock
clusters_root.subsys.su_mutex as well which are already hold by
dlm_lowcomms_close() and ends in a deadlock situation.

[67170.703046] ======================================================
[67170.703965] WARNING: possible circular locking dependency detected
[67170.704758] 5.11.0-rc4+ #22 Tainted: G        W
[67170.705433] ------------------------------------------------------
[67170.706228] dlm_controld/280 is trying to acquire lock:
[67170.706915] ffff9f2f475a6948 ((wq_completion)dlm_recv){+.+.}-{0:0}, at: __flush_work+0x203/0x4c0
[67170.708026]
               but task is already holding lock:
[67170.708758] ffffffffa132f878 (&clusters_root.subsys.su_mutex){+.+.}-{3:3}, at: configfs_rmdir+0x29b/0x310
[67170.710016]
               which lock already depends on the new lock.

The new behaviour adds the mark value to the node address configuration
which doesn't require to held the clusters_root.subsys.su_mutex by
accessing mark values in a separate datastructure. However the mark
values can be set now only after a node address was set which is the
case when the user is using dlm_controld.

Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: David Teigland <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue May 21, 2021
[ Upstream commit e125fbe ]

This patch fixes an deadlock issue when dlm_lowcomms_close() is called.
When dlm_lowcomms_close() is called the clusters_root.subsys.su_mutex is
held to remove configfs items. At this time we flushing (e.g.
cancel_work_sync()) the workers of send and recv workqueue. Due the fact
that we accessing configfs items (mark values), these workers will lock
clusters_root.subsys.su_mutex as well which are already hold by
dlm_lowcomms_close() and ends in a deadlock situation.

[67170.703046] ======================================================
[67170.703965] WARNING: possible circular locking dependency detected
[67170.704758] 5.11.0-rc4+ #22 Tainted: G        W
[67170.705433] ------------------------------------------------------
[67170.706228] dlm_controld/280 is trying to acquire lock:
[67170.706915] ffff9f2f475a6948 ((wq_completion)dlm_recv){+.+.}-{0:0}, at: __flush_work+0x203/0x4c0
[67170.708026]
               but task is already holding lock:
[67170.708758] ffffffffa132f878 (&clusters_root.subsys.su_mutex){+.+.}-{3:3}, at: configfs_rmdir+0x29b/0x310
[67170.710016]
               which lock already depends on the new lock.

The new behaviour adds the mark value to the node address configuration
which doesn't require to held the clusters_root.subsys.su_mutex by
accessing mark values in a separate datastructure. However the mark
values can be set now only after a node address was set which is the
case when the user is using dlm_controld.

Signed-off-by: Alexander Aring <[email protected]>
Signed-off-by: David Teigland <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
kitakar5525 pushed a commit to kitakar5525/linux-kernel that referenced this issue Jul 26, 2021
The problem occurs between dev_get_by_index() and dev_xdp_attach_link().
At this point, dev_xdp_uninstall() is called. Then xdp link will not be
detached automatically when dev is released. But link->dev already
points to dev, when xdp link is released, dev will still be accessed,
but dev has been released.

dev_get_by_index()        |
link->dev = dev           |
                          |      rtnl_lock()
                          |      unregister_netdevice_many()
                          |          dev_xdp_uninstall()
                          |      rtnl_unlock()
rtnl_lock();              |
dev_xdp_attach_link()     |
rtnl_unlock();            |
                          |      netdev_run_todo() // dev released
bpf_xdp_link_release()    |
    /* access dev.        |
       use-after-free */  |

[   45.966867] BUG: KASAN: use-after-free in bpf_xdp_link_release+0x3b8/0x3d0
[   45.967619] Read of size 8 at addr ffff00000f9980c8 by task a.out/732
[   45.968297]
[   45.968502] CPU: 1 PID: 732 Comm: a.out Not tainted 5.13.0+ linux-surface#22
[   45.969222] Hardware name: linux,dummy-virt (DT)
[   45.969795] Call trace:
[   45.970106]  dump_backtrace+0x0/0x4c8
[   45.970564]  show_stack+0x30/0x40
[   45.970981]  dump_stack_lvl+0x120/0x18c
[   45.971470]  print_address_description.constprop.0+0x74/0x30c
[   45.972182]  kasan_report+0x1e8/0x200
[   45.972659]  __asan_report_load8_noabort+0x2c/0x50
[   45.973273]  bpf_xdp_link_release+0x3b8/0x3d0
[   45.973834]  bpf_link_free+0xd0/0x188
[   45.974315]  bpf_link_put+0x1d0/0x218
[   45.974790]  bpf_link_release+0x3c/0x58
[   45.975291]  __fput+0x20c/0x7e8
[   45.975706]  ____fput+0x24/0x30
[   45.976117]  task_work_run+0x104/0x258
[   45.976609]  do_notify_resume+0x894/0xaf8
[   45.977121]  work_pending+0xc/0x328
[   45.977575]
[   45.977775] The buggy address belongs to the page:
[   45.978369] page:fffffc00003e6600 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4f998
[   45.979522] flags: 0x7fffe0000000000(node=0|zone=0|lastcpupid=0x3ffff)
[   45.980349] raw: 07fffe0000000000 fffffc00003e6708 ffff0000dac3c010 0000000000000000
[   45.981309] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   45.982259] page dumped because: kasan: bad access detected
[   45.982948]
[   45.983153] Memory state around the buggy address:
[   45.983753]  ffff00000f997f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   45.984645]  ffff00000f998000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.985533] >ffff00000f998080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.986419]                                               ^
[   45.987112]  ffff00000f998100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.988006]  ffff00000f998180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.988895] ==================================================================
[   45.989773] Disabling lock debugging due to kernel taint
[   45.990552] Kernel panic - not syncing: panic_on_warn set ...
[   45.991166] CPU: 1 PID: 732 Comm: a.out Tainted: G    B             5.13.0+ linux-surface#22
[   45.991929] Hardware name: linux,dummy-virt (DT)
[   45.992448] Call trace:
[   45.992753]  dump_backtrace+0x0/0x4c8
[   45.993208]  show_stack+0x30/0x40
[   45.993627]  dump_stack_lvl+0x120/0x18c
[   45.994113]  dump_stack+0x1c/0x34
[   45.994530]  panic+0x3a4/0x7d8
[   45.994930]  end_report+0x194/0x198
[   45.995380]  kasan_report+0x134/0x200
[   45.995850]  __asan_report_load8_noabort+0x2c/0x50
[   45.996453]  bpf_xdp_link_release+0x3b8/0x3d0
[   45.997007]  bpf_link_free+0xd0/0x188
[   45.997474]  bpf_link_put+0x1d0/0x218
[   45.997942]  bpf_link_release+0x3c/0x58
[   45.998429]  __fput+0x20c/0x7e8
[   45.998833]  ____fput+0x24/0x30
[   45.999247]  task_work_run+0x104/0x258
[   45.999731]  do_notify_resume+0x894/0xaf8
[   46.000236]  work_pending+0xc/0x328
[   46.000697] SMP: stopping secondary CPUs
[   46.001226] Dumping ftrace buffer:
[   46.001663]    (ftrace buffer empty)
[   46.002110] Kernel Offset: disabled
[   46.002545] CPU features: 0x00000001,23202c00
[   46.003080] Memory Limit: none

Fixes: aa8d3a7 ("bpf, xdp: Add bpf_link-based XDP attachment API")
Reported-by: Abaci <[email protected]>
Signed-off-by: Xuan Zhuo <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Dust Li <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
qzed pushed a commit that referenced this issue Aug 1, 2021
[ Upstream commit 5acc7d3 ]

The problem occurs between dev_get_by_index() and dev_xdp_attach_link().
At this point, dev_xdp_uninstall() is called. Then xdp link will not be
detached automatically when dev is released. But link->dev already
points to dev, when xdp link is released, dev will still be accessed,
but dev has been released.

dev_get_by_index()        |
link->dev = dev           |
                          |      rtnl_lock()
                          |      unregister_netdevice_many()
                          |          dev_xdp_uninstall()
                          |      rtnl_unlock()
rtnl_lock();              |
dev_xdp_attach_link()     |
rtnl_unlock();            |
                          |      netdev_run_todo() // dev released
bpf_xdp_link_release()    |
    /* access dev.        |
       use-after-free */  |

[   45.966867] BUG: KASAN: use-after-free in bpf_xdp_link_release+0x3b8/0x3d0
[   45.967619] Read of size 8 at addr ffff00000f9980c8 by task a.out/732
[   45.968297]
[   45.968502] CPU: 1 PID: 732 Comm: a.out Not tainted 5.13.0+ #22
[   45.969222] Hardware name: linux,dummy-virt (DT)
[   45.969795] Call trace:
[   45.970106]  dump_backtrace+0x0/0x4c8
[   45.970564]  show_stack+0x30/0x40
[   45.970981]  dump_stack_lvl+0x120/0x18c
[   45.971470]  print_address_description.constprop.0+0x74/0x30c
[   45.972182]  kasan_report+0x1e8/0x200
[   45.972659]  __asan_report_load8_noabort+0x2c/0x50
[   45.973273]  bpf_xdp_link_release+0x3b8/0x3d0
[   45.973834]  bpf_link_free+0xd0/0x188
[   45.974315]  bpf_link_put+0x1d0/0x218
[   45.974790]  bpf_link_release+0x3c/0x58
[   45.975291]  __fput+0x20c/0x7e8
[   45.975706]  ____fput+0x24/0x30
[   45.976117]  task_work_run+0x104/0x258
[   45.976609]  do_notify_resume+0x894/0xaf8
[   45.977121]  work_pending+0xc/0x328
[   45.977575]
[   45.977775] The buggy address belongs to the page:
[   45.978369] page:fffffc00003e6600 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4f998
[   45.979522] flags: 0x7fffe0000000000(node=0|zone=0|lastcpupid=0x3ffff)
[   45.980349] raw: 07fffe0000000000 fffffc00003e6708 ffff0000dac3c010 0000000000000000
[   45.981309] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   45.982259] page dumped because: kasan: bad access detected
[   45.982948]
[   45.983153] Memory state around the buggy address:
[   45.983753]  ffff00000f997f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   45.984645]  ffff00000f998000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.985533] >ffff00000f998080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.986419]                                               ^
[   45.987112]  ffff00000f998100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.988006]  ffff00000f998180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.988895] ==================================================================
[   45.989773] Disabling lock debugging due to kernel taint
[   45.990552] Kernel panic - not syncing: panic_on_warn set ...
[   45.991166] CPU: 1 PID: 732 Comm: a.out Tainted: G    B             5.13.0+ #22
[   45.991929] Hardware name: linux,dummy-virt (DT)
[   45.992448] Call trace:
[   45.992753]  dump_backtrace+0x0/0x4c8
[   45.993208]  show_stack+0x30/0x40
[   45.993627]  dump_stack_lvl+0x120/0x18c
[   45.994113]  dump_stack+0x1c/0x34
[   45.994530]  panic+0x3a4/0x7d8
[   45.994930]  end_report+0x194/0x198
[   45.995380]  kasan_report+0x134/0x200
[   45.995850]  __asan_report_load8_noabort+0x2c/0x50
[   45.996453]  bpf_xdp_link_release+0x3b8/0x3d0
[   45.997007]  bpf_link_free+0xd0/0x188
[   45.997474]  bpf_link_put+0x1d0/0x218
[   45.997942]  bpf_link_release+0x3c/0x58
[   45.998429]  __fput+0x20c/0x7e8
[   45.998833]  ____fput+0x24/0x30
[   45.999247]  task_work_run+0x104/0x258
[   45.999731]  do_notify_resume+0x894/0xaf8
[   46.000236]  work_pending+0xc/0x328
[   46.000697] SMP: stopping secondary CPUs
[   46.001226] Dumping ftrace buffer:
[   46.001663]    (ftrace buffer empty)
[   46.002110] Kernel Offset: disabled
[   46.002545] CPU features: 0x00000001,23202c00
[   46.003080] Memory Limit: none

Fixes: aa8d3a7 ("bpf, xdp: Add bpf_link-based XDP attachment API")
Reported-by: Abaci <[email protected]>
Signed-off-by: Xuan Zhuo <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Dust Li <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Aug 1, 2021
[ Upstream commit 5acc7d3 ]

The problem occurs between dev_get_by_index() and dev_xdp_attach_link().
At this point, dev_xdp_uninstall() is called. Then xdp link will not be
detached automatically when dev is released. But link->dev already
points to dev, when xdp link is released, dev will still be accessed,
but dev has been released.

dev_get_by_index()        |
link->dev = dev           |
                          |      rtnl_lock()
                          |      unregister_netdevice_many()
                          |          dev_xdp_uninstall()
                          |      rtnl_unlock()
rtnl_lock();              |
dev_xdp_attach_link()     |
rtnl_unlock();            |
                          |      netdev_run_todo() // dev released
bpf_xdp_link_release()    |
    /* access dev.        |
       use-after-free */  |

[   45.966867] BUG: KASAN: use-after-free in bpf_xdp_link_release+0x3b8/0x3d0
[   45.967619] Read of size 8 at addr ffff00000f9980c8 by task a.out/732
[   45.968297]
[   45.968502] CPU: 1 PID: 732 Comm: a.out Not tainted 5.13.0+ #22
[   45.969222] Hardware name: linux,dummy-virt (DT)
[   45.969795] Call trace:
[   45.970106]  dump_backtrace+0x0/0x4c8
[   45.970564]  show_stack+0x30/0x40
[   45.970981]  dump_stack_lvl+0x120/0x18c
[   45.971470]  print_address_description.constprop.0+0x74/0x30c
[   45.972182]  kasan_report+0x1e8/0x200
[   45.972659]  __asan_report_load8_noabort+0x2c/0x50
[   45.973273]  bpf_xdp_link_release+0x3b8/0x3d0
[   45.973834]  bpf_link_free+0xd0/0x188
[   45.974315]  bpf_link_put+0x1d0/0x218
[   45.974790]  bpf_link_release+0x3c/0x58
[   45.975291]  __fput+0x20c/0x7e8
[   45.975706]  ____fput+0x24/0x30
[   45.976117]  task_work_run+0x104/0x258
[   45.976609]  do_notify_resume+0x894/0xaf8
[   45.977121]  work_pending+0xc/0x328
[   45.977575]
[   45.977775] The buggy address belongs to the page:
[   45.978369] page:fffffc00003e6600 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4f998
[   45.979522] flags: 0x7fffe0000000000(node=0|zone=0|lastcpupid=0x3ffff)
[   45.980349] raw: 07fffe0000000000 fffffc00003e6708 ffff0000dac3c010 0000000000000000
[   45.981309] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   45.982259] page dumped because: kasan: bad access detected
[   45.982948]
[   45.983153] Memory state around the buggy address:
[   45.983753]  ffff00000f997f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   45.984645]  ffff00000f998000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.985533] >ffff00000f998080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.986419]                                               ^
[   45.987112]  ffff00000f998100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.988006]  ffff00000f998180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.988895] ==================================================================
[   45.989773] Disabling lock debugging due to kernel taint
[   45.990552] Kernel panic - not syncing: panic_on_warn set ...
[   45.991166] CPU: 1 PID: 732 Comm: a.out Tainted: G    B             5.13.0+ #22
[   45.991929] Hardware name: linux,dummy-virt (DT)
[   45.992448] Call trace:
[   45.992753]  dump_backtrace+0x0/0x4c8
[   45.993208]  show_stack+0x30/0x40
[   45.993627]  dump_stack_lvl+0x120/0x18c
[   45.994113]  dump_stack+0x1c/0x34
[   45.994530]  panic+0x3a4/0x7d8
[   45.994930]  end_report+0x194/0x198
[   45.995380]  kasan_report+0x134/0x200
[   45.995850]  __asan_report_load8_noabort+0x2c/0x50
[   45.996453]  bpf_xdp_link_release+0x3b8/0x3d0
[   45.997007]  bpf_link_free+0xd0/0x188
[   45.997474]  bpf_link_put+0x1d0/0x218
[   45.997942]  bpf_link_release+0x3c/0x58
[   45.998429]  __fput+0x20c/0x7e8
[   45.998833]  ____fput+0x24/0x30
[   45.999247]  task_work_run+0x104/0x258
[   45.999731]  do_notify_resume+0x894/0xaf8
[   46.000236]  work_pending+0xc/0x328
[   46.000697] SMP: stopping secondary CPUs
[   46.001226] Dumping ftrace buffer:
[   46.001663]    (ftrace buffer empty)
[   46.002110] Kernel Offset: disabled
[   46.002545] CPU features: 0x00000001,23202c00
[   46.003080] Memory Limit: none

Fixes: aa8d3a7 ("bpf, xdp: Add bpf_link-based XDP attachment API")
Reported-by: Abaci <[email protected]>
Signed-off-by: Xuan Zhuo <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Dust Li <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
kitakar5525 pushed a commit to kitakar5525/linux-kernel that referenced this issue Dec 6, 2021
In commit 142639a ("drm/msm/a6xx: fix crashstate capture for
A650") we changed a6xx_get_gmu_registers() to read 3 sets of
registers. Unfortunately, we didn't change the memory allocation for
the array. That leads to a KASAN warning (this was on the chromeos-5.4
kernel, which has the problematic commit backported to it):

  BUG: KASAN: slab-out-of-bounds in _a6xx_get_gmu_registers+0x144/0x430
  Write of size 8 at addr ffffff80c89432b0 by task A618-worker/209
  CPU: 5 PID: 209 Comm: A618-worker Tainted: G        W         5.4.156-lockdep linux-surface#22
  Hardware name: Google Lazor Limozeen without Touchscreen (rev5 - rev8) (DT)
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0x128/0x1ec
   print_address_description+0x88/0x4a0
   __kasan_report+0xfc/0x120
   kasan_report+0x10/0x18
   __asan_report_store8_noabort+0x1c/0x24
   _a6xx_get_gmu_registers+0x144/0x430
   a6xx_gpu_state_get+0x330/0x25d4
   msm_gpu_crashstate_capture+0xa0/0x84c
   recover_worker+0x328/0x838
   kthread_worker_fn+0x32c/0x574
   kthread+0x2dc/0x39c
   ret_from_fork+0x10/0x18

  Allocated by task 209:
   __kasan_kmalloc+0xfc/0x1c4
   kasan_kmalloc+0xc/0x14
   kmem_cache_alloc_trace+0x1f0/0x2a0
   a6xx_gpu_state_get+0x164/0x25d4
   msm_gpu_crashstate_capture+0xa0/0x84c
   recover_worker+0x328/0x838
   kthread_worker_fn+0x32c/0x574
   kthread+0x2dc/0x39c
   ret_from_fork+0x10/0x18

Fixes: 142639a ("drm/msm/a6xx: fix crashstate capture for A650")
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lore.kernel.org/r/20211103153049.1.Idfa574ccb529d17b69db3a1852e49b580132035c@changeid
Signed-off-by: Rob Clark <[email protected]>
qzed pushed a commit that referenced this issue Dec 25, 2021
commit b4d25ab upstream.

In commit 142639a ("drm/msm/a6xx: fix crashstate capture for
A650") we changed a6xx_get_gmu_registers() to read 3 sets of
registers. Unfortunately, we didn't change the memory allocation for
the array. That leads to a KASAN warning (this was on the chromeos-5.4
kernel, which has the problematic commit backported to it):

  BUG: KASAN: slab-out-of-bounds in _a6xx_get_gmu_registers+0x144/0x430
  Write of size 8 at addr ffffff80c89432b0 by task A618-worker/209
  CPU: 5 PID: 209 Comm: A618-worker Tainted: G        W         5.4.156-lockdep #22
  Hardware name: Google Lazor Limozeen without Touchscreen (rev5 - rev8) (DT)
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0x128/0x1ec
   print_address_description+0x88/0x4a0
   __kasan_report+0xfc/0x120
   kasan_report+0x10/0x18
   __asan_report_store8_noabort+0x1c/0x24
   _a6xx_get_gmu_registers+0x144/0x430
   a6xx_gpu_state_get+0x330/0x25d4
   msm_gpu_crashstate_capture+0xa0/0x84c
   recover_worker+0x328/0x838
   kthread_worker_fn+0x32c/0x574
   kthread+0x2dc/0x39c
   ret_from_fork+0x10/0x18

  Allocated by task 209:
   __kasan_kmalloc+0xfc/0x1c4
   kasan_kmalloc+0xc/0x14
   kmem_cache_alloc_trace+0x1f0/0x2a0
   a6xx_gpu_state_get+0x164/0x25d4
   msm_gpu_crashstate_capture+0xa0/0x84c
   recover_worker+0x328/0x838
   kthread_worker_fn+0x32c/0x574
   kthread+0x2dc/0x39c
   ret_from_fork+0x10/0x18

Fixes: 142639a ("drm/msm/a6xx: fix crashstate capture for A650")
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lore.kernel.org/r/20211103153049.1.Idfa574ccb529d17b69db3a1852e49b580132035c@changeid
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
kitakar5525 pushed a commit to kitakar5525/linux-kernel that referenced this issue Dec 27, 2021
…port_id()

The array param[] in elantech_change_report_id() must be at least 3
bytes, because elantech_read_reg_params() is calling ps2_command() with
PSMOUSE_CMD_GETINFO, that is going to access 3 bytes from param[], but
it's defined in the stack as an array of 2 bytes, therefore we have a
potential stack out-of-bounds access here, also confirmed by KASAN:

[    6.512374] BUG: KASAN: stack-out-of-bounds in __ps2_command+0x372/0x7e0
[    6.512397] Read of size 1 at addr ffff8881024d77c2 by task kworker/2:1/118

[    6.512416] CPU: 2 PID: 118 Comm: kworker/2:1 Not tainted 5.13.0-22-generic linux-surface#22+arighi20211110
[    6.512428] Hardware name: LENOVO 20T8000QGE/20T8000QGE, BIOS R1AET32W (1.08 ) 08/14/2020
[    6.512436] Workqueue: events_long serio_handle_event
[    6.512453] Call Trace:
[    6.512462]  show_stack+0x52/0x58
[    6.512474]  dump_stack+0xa1/0xd3
[    6.512487]  print_address_description.constprop.0+0x1d/0x140
[    6.512502]  ? __ps2_command+0x372/0x7e0
[    6.512516]  __kasan_report.cold+0x7d/0x112
[    6.512527]  ? _raw_write_lock_irq+0x20/0xd0
[    6.512539]  ? __ps2_command+0x372/0x7e0
[    6.512552]  kasan_report+0x3c/0x50
[    6.512564]  __asan_load1+0x6a/0x70
[    6.512575]  __ps2_command+0x372/0x7e0
[    6.512589]  ? ps2_drain+0x240/0x240
[    6.512601]  ? dev_printk_emit+0xa2/0xd3
[    6.512612]  ? dev_vprintk_emit+0xc5/0xc5
[    6.512621]  ? __kasan_check_write+0x14/0x20
[    6.512634]  ? mutex_lock+0x8f/0xe0
[    6.512643]  ? __mutex_lock_slowpath+0x20/0x20
[    6.512655]  ps2_command+0x52/0x90
[    6.512670]  elantech_ps2_command+0x4f/0xc0 [psmouse]
[    6.512734]  elantech_change_report_id+0x1e6/0x256 [psmouse]
[    6.512799]  ? elantech_report_trackpoint.constprop.0.cold+0xd/0xd [psmouse]
[    6.512863]  ? ps2_command+0x7f/0x90
[    6.512877]  elantech_query_info.cold+0x6bd/0x9ed [psmouse]
[    6.512943]  ? elantech_setup_ps2+0x460/0x460 [psmouse]
[    6.513005]  ? psmouse_reset+0x69/0xb0 [psmouse]
[    6.513064]  ? psmouse_attr_set_helper+0x2a0/0x2a0 [psmouse]
[    6.513122]  ? phys_pmd_init+0x30e/0x521
[    6.513137]  elantech_init+0x8a/0x200 [psmouse]
[    6.513200]  ? elantech_init_ps2+0xf0/0xf0 [psmouse]
[    6.513249]  ? elantech_query_info+0x440/0x440 [psmouse]
[    6.513296]  ? synaptics_send_cmd+0x60/0x60 [psmouse]
[    6.513342]  ? elantech_query_info+0x440/0x440 [psmouse]
[    6.513388]  ? psmouse_try_protocol+0x11e/0x170 [psmouse]
[    6.513432]  psmouse_extensions+0x65d/0x6e0 [psmouse]
[    6.513476]  ? psmouse_try_protocol+0x170/0x170 [psmouse]
[    6.513519]  ? mutex_unlock+0x22/0x40
[    6.513526]  ? ps2_command+0x7f/0x90
[    6.513536]  ? psmouse_probe+0xa3/0xf0 [psmouse]
[    6.513580]  psmouse_switch_protocol+0x27d/0x2e0 [psmouse]
[    6.513624]  psmouse_connect+0x272/0x530 [psmouse]
[    6.513669]  serio_driver_probe+0x55/0x70
[    6.513679]  really_probe+0x190/0x720
[    6.513689]  driver_probe_device+0x160/0x1f0
[    6.513697]  device_driver_attach+0x119/0x130
[    6.513705]  ? device_driver_attach+0x130/0x130
[    6.513713]  __driver_attach+0xe7/0x1a0
[    6.513720]  ? device_driver_attach+0x130/0x130
[    6.513728]  bus_for_each_dev+0xfb/0x150
[    6.513738]  ? subsys_dev_iter_exit+0x10/0x10
[    6.513748]  ? _raw_write_unlock_bh+0x30/0x30
[    6.513757]  driver_attach+0x2d/0x40
[    6.513764]  serio_handle_event+0x199/0x3d0
[    6.513775]  process_one_work+0x471/0x740
[    6.513785]  worker_thread+0x2d2/0x790
[    6.513794]  ? process_one_work+0x740/0x740
[    6.513802]  kthread+0x1b4/0x1e0
[    6.513809]  ? set_kthread_struct+0x80/0x80
[    6.513816]  ret_from_fork+0x22/0x30

[    6.513832] The buggy address belongs to the page:
[    6.513838] page:00000000bc35e189 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1024d7
[    6.513847] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[    6.513860] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[    6.513867] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    6.513872] page dumped because: kasan: bad access detected

[    6.513879] addr ffff8881024d77c2 is located in stack of task kworker/2:1/118 at offset 34 in frame:
[    6.513887]  elantech_change_report_id+0x0/0x256 [psmouse]

[    6.513941] this frame has 1 object:
[    6.513947]  [32, 34) 'param'

[    6.513956] Memory state around the buggy address:
[    6.513962]  ffff8881024d7680: f2 f2 f2 f2 f2 00 00 f3 f3 00 00 00 00 00 00 00
[    6.513969]  ffff8881024d7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    6.513976] >ffff8881024d7780: 00 00 00 00 f1 f1 f1 f1 02 f3 f3 f3 00 00 00 00
[    6.513982]                                            ^
[    6.513988]  ffff8881024d7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    6.513995]  ffff8881024d7880: 00 f1 f1 f1 f1 03 f2 03 f2 03 f3 f3 f3 00 00 00
[    6.514000] ==================================================================

Define param[] in elantech_change_report_id() as an array of 3 bytes to
prevent the out-of-bounds access in the stack.

Fixes: e4c9062 ("Input: elantech - fix protocol errors for some trackpoints in SMBus mode")
BugLink: https://bugs.launchpad.net/bugs/1945590
Signed-off-by: Andrea Righi <[email protected]>
Reviewed-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
qzed pushed a commit that referenced this issue Jan 4, 2022
…port_id()

[ Upstream commit 1d72d9f ]

The array param[] in elantech_change_report_id() must be at least 3
bytes, because elantech_read_reg_params() is calling ps2_command() with
PSMOUSE_CMD_GETINFO, that is going to access 3 bytes from param[], but
it's defined in the stack as an array of 2 bytes, therefore we have a
potential stack out-of-bounds access here, also confirmed by KASAN:

[    6.512374] BUG: KASAN: stack-out-of-bounds in __ps2_command+0x372/0x7e0
[    6.512397] Read of size 1 at addr ffff8881024d77c2 by task kworker/2:1/118

[    6.512416] CPU: 2 PID: 118 Comm: kworker/2:1 Not tainted 5.13.0-22-generic #22+arighi20211110
[    6.512428] Hardware name: LENOVO 20T8000QGE/20T8000QGE, BIOS R1AET32W (1.08 ) 08/14/2020
[    6.512436] Workqueue: events_long serio_handle_event
[    6.512453] Call Trace:
[    6.512462]  show_stack+0x52/0x58
[    6.512474]  dump_stack+0xa1/0xd3
[    6.512487]  print_address_description.constprop.0+0x1d/0x140
[    6.512502]  ? __ps2_command+0x372/0x7e0
[    6.512516]  __kasan_report.cold+0x7d/0x112
[    6.512527]  ? _raw_write_lock_irq+0x20/0xd0
[    6.512539]  ? __ps2_command+0x372/0x7e0
[    6.512552]  kasan_report+0x3c/0x50
[    6.512564]  __asan_load1+0x6a/0x70
[    6.512575]  __ps2_command+0x372/0x7e0
[    6.512589]  ? ps2_drain+0x240/0x240
[    6.512601]  ? dev_printk_emit+0xa2/0xd3
[    6.512612]  ? dev_vprintk_emit+0xc5/0xc5
[    6.512621]  ? __kasan_check_write+0x14/0x20
[    6.512634]  ? mutex_lock+0x8f/0xe0
[    6.512643]  ? __mutex_lock_slowpath+0x20/0x20
[    6.512655]  ps2_command+0x52/0x90
[    6.512670]  elantech_ps2_command+0x4f/0xc0 [psmouse]
[    6.512734]  elantech_change_report_id+0x1e6/0x256 [psmouse]
[    6.512799]  ? elantech_report_trackpoint.constprop.0.cold+0xd/0xd [psmouse]
[    6.512863]  ? ps2_command+0x7f/0x90
[    6.512877]  elantech_query_info.cold+0x6bd/0x9ed [psmouse]
[    6.512943]  ? elantech_setup_ps2+0x460/0x460 [psmouse]
[    6.513005]  ? psmouse_reset+0x69/0xb0 [psmouse]
[    6.513064]  ? psmouse_attr_set_helper+0x2a0/0x2a0 [psmouse]
[    6.513122]  ? phys_pmd_init+0x30e/0x521
[    6.513137]  elantech_init+0x8a/0x200 [psmouse]
[    6.513200]  ? elantech_init_ps2+0xf0/0xf0 [psmouse]
[    6.513249]  ? elantech_query_info+0x440/0x440 [psmouse]
[    6.513296]  ? synaptics_send_cmd+0x60/0x60 [psmouse]
[    6.513342]  ? elantech_query_info+0x440/0x440 [psmouse]
[    6.513388]  ? psmouse_try_protocol+0x11e/0x170 [psmouse]
[    6.513432]  psmouse_extensions+0x65d/0x6e0 [psmouse]
[    6.513476]  ? psmouse_try_protocol+0x170/0x170 [psmouse]
[    6.513519]  ? mutex_unlock+0x22/0x40
[    6.513526]  ? ps2_command+0x7f/0x90
[    6.513536]  ? psmouse_probe+0xa3/0xf0 [psmouse]
[    6.513580]  psmouse_switch_protocol+0x27d/0x2e0 [psmouse]
[    6.513624]  psmouse_connect+0x272/0x530 [psmouse]
[    6.513669]  serio_driver_probe+0x55/0x70
[    6.513679]  really_probe+0x190/0x720
[    6.513689]  driver_probe_device+0x160/0x1f0
[    6.513697]  device_driver_attach+0x119/0x130
[    6.513705]  ? device_driver_attach+0x130/0x130
[    6.513713]  __driver_attach+0xe7/0x1a0
[    6.513720]  ? device_driver_attach+0x130/0x130
[    6.513728]  bus_for_each_dev+0xfb/0x150
[    6.513738]  ? subsys_dev_iter_exit+0x10/0x10
[    6.513748]  ? _raw_write_unlock_bh+0x30/0x30
[    6.513757]  driver_attach+0x2d/0x40
[    6.513764]  serio_handle_event+0x199/0x3d0
[    6.513775]  process_one_work+0x471/0x740
[    6.513785]  worker_thread+0x2d2/0x790
[    6.513794]  ? process_one_work+0x740/0x740
[    6.513802]  kthread+0x1b4/0x1e0
[    6.513809]  ? set_kthread_struct+0x80/0x80
[    6.513816]  ret_from_fork+0x22/0x30

[    6.513832] The buggy address belongs to the page:
[    6.513838] page:00000000bc35e189 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1024d7
[    6.513847] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[    6.513860] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[    6.513867] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    6.513872] page dumped because: kasan: bad access detected

[    6.513879] addr ffff8881024d77c2 is located in stack of task kworker/2:1/118 at offset 34 in frame:
[    6.513887]  elantech_change_report_id+0x0/0x256 [psmouse]

[    6.513941] this frame has 1 object:
[    6.513947]  [32, 34) 'param'

[    6.513956] Memory state around the buggy address:
[    6.513962]  ffff8881024d7680: f2 f2 f2 f2 f2 00 00 f3 f3 00 00 00 00 00 00 00
[    6.513969]  ffff8881024d7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    6.513976] >ffff8881024d7780: 00 00 00 00 f1 f1 f1 f1 02 f3 f3 f3 00 00 00 00
[    6.513982]                                            ^
[    6.513988]  ffff8881024d7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    6.513995]  ffff8881024d7880: 00 f1 f1 f1 f1 03 f2 03 f2 03 f3 f3 f3 00 00 00
[    6.514000] ==================================================================

Define param[] in elantech_change_report_id() as an array of 3 bytes to
prevent the out-of-bounds access in the stack.

Fixes: e4c9062 ("Input: elantech - fix protocol errors for some trackpoints in SMBus mode")
BugLink: https://bugs.launchpad.net/bugs/1945590
Signed-off-by: Andrea Righi <[email protected]>
Reviewed-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
kitakar5525 pushed a commit to kitakar5525/linux-kernel that referenced this issue Mar 11, 2022
When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.

    [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
    [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
    ...
    [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

    crash> bt
    ...
    PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
    ...
     linux-surface#9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
        [exception RIP: dma_pool_alloc+0x1ab]
        RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
        RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
        RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
        RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
        R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
        R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    linux-surface#10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
    linux-surface#11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
    linux-surface#12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
    linux-surface#13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
    linux-surface#14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
    linux-surface#15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
    linux-surface#16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
    linux-surface#17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
    linux-surface#18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
    linux-surface#19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
    linux-surface#20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
    linux-surface#21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
    linux-surface#22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
    linux-surface#23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
    linux-surface#24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
    linux-surface#25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
    linux-surface#26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

    crash> net_device.state ffff89443b0c0000
      state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

To prevent this scenario, we also make sure that the netdevice is present.

Signed-off-by: suresh kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
qzed pushed a commit that referenced this issue Mar 24, 2022
[ Upstream commit 4224cfd ]

When bringing down the netdevice or system shutdown, a panic can be
triggered while accessing the sysfs path because the device is already
removed.

    [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
    [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
    ...
    [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
    [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280

    crash> bt
    ...
    PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
    ...
     #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
        [exception RIP: dma_pool_alloc+0x1ab]
        RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
        RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
        RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
        RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
        R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
        R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
    #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
    #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
    #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
    #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
    #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
    #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
    #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
    #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
    #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
    #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
    #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
    #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
    #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
    #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
    #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
    #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92

    crash> net_device.state ffff89443b0c0000
      state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)

To prevent this scenario, we also make sure that the netdevice is present.

Signed-off-by: suresh kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Apr 25, 2022
commit 8d65bc9 upstream.

After waking up a suspended VM, the kernel prints the following trace
for virtio drivers which do not directly call virtio_device_ready() in
the .restore:

    PM: suspend exit
    irq 22: nobody cared (try booting with the "irqpoll" option)
    Call Trace:
     <IRQ>
     dump_stack_lvl+0x38/0x49
     dump_stack+0x10/0x12
     __report_bad_irq+0x3a/0xaf
     note_interrupt.cold+0xb/0x60
     handle_irq_event+0x71/0x80
     handle_fasteoi_irq+0x95/0x1e0
     __common_interrupt+0x6b/0x110
     common_interrupt+0x63/0xe0
     asm_common_interrupt+0x1e/0x40
     ? __do_softirq+0x75/0x2f3
     irq_exit_rcu+0x93/0xe0
     sysvec_apic_timer_interrupt+0xac/0xd0
     </IRQ>
     <TASK>
     asm_sysvec_apic_timer_interrupt+0x12/0x20
     arch_cpu_idle+0x12/0x20
     default_idle_call+0x39/0xf0
     do_idle+0x1b5/0x210
     cpu_startup_entry+0x20/0x30
     start_secondary+0xf3/0x100
     secondary_startup_64_no_verify+0xc3/0xcb
     </TASK>
    handlers:
    [<000000008f9bac49>] vp_interrupt
    [<000000008f9bac49>] vp_interrupt
    Disabling IRQ #22

This happens because we don't invoke .enable_cbs callback in
virtio_device_restore(). That callback is used by some transports
(e.g. virtio-pci) to enable interrupts.

Let's fix it, by calling virtio_device_ready() as we do in
virtio_dev_probe(). This function calls .enable_cts callback and sets
DRIVER_OK status bit.

This fix also avoids setting DRIVER_OK twice for those drivers that
call virtio_device_ready() in the .restore.

Fixes: d50497e ("virtio_config: introduce a new .enable_cbs method")
Signed-off-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
qzed pushed a commit that referenced this issue Oct 24, 2022
[ Upstream commit 8cfb085 ]

Li Huafei reports that mcount-based ftrace with module PLTs was broken
by commit:

  a625357 ("arm64: ftrace: consistently handle PLTs.")

When a module PLTs are used and a module is loaded sufficiently far away
from the kernel, we'll create PLTs for any branches which are
out-of-range. These are separate from the special ftrace trampoline
PLTs, which the module PLT code doesn't directly manipulate.

When mcount is in use this is a problem, as each mcount callsite in a
module will be initialized to point to a module PLT, but since commit
a625357 ftrace_make_nop() will assume that the callsite has
been initialized to point to the special ftrace trampoline PLT, and
ftrace_find_callable_addr() rejects other cases.

This means that when ftrace tries to initialize a callsite via
ftrace_make_nop(), the call to ftrace_find_callable_addr() will find
that the `_mcount` stub is out-of-range and is not handled by the ftrace
PLT, resulting in a splat:

| ftrace_test: loading out-of-tree module taints kernel.
| ftrace: no module PLT for _mcount
| ------------[ ftrace bug ]------------
| ftrace failed to modify
| [<ffff800029180014>] 0xffff800029180014
|  actual:   44:00:00:94
| Initializing ftrace call sites
| ftrace record flags: 2000000
|  (0)
|  expected tramp: ffff80000802eb3c
| ------------[ cut here ]------------
| WARNING: CPU: 3 PID: 157 at kernel/trace/ftrace.c:2120 ftrace_bug+0x94/0x270
| Modules linked in:
| CPU: 3 PID: 157 Comm: insmod Tainted: G           O       6.0.0-rc6-00151-gcd722513a189-dirty #22
| Hardware name: linux,dummy-virt (DT)
| pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : ftrace_bug+0x94/0x270
| lr : ftrace_bug+0x21c/0x270
| sp : ffff80000b2bbaf0
| x29: ffff80000b2bbaf0 x28: 0000000000000000 x27: ffff0000c4d38000
| x26: 0000000000000001 x25: ffff800009d7e000 x24: ffff0000c4d86e00
| x23: 0000000002000000 x22: ffff80000a62b000 x21: ffff8000098ebea8
| x20: ffff0000c4d38000 x19: ffff80000aa24158 x18: ffffffffffffffff
| x17: 0000000000000000 x16: 0a0d2d2d2d2d2d2d x15: ffff800009aa9118
| x14: 0000000000000000 x13: 6333626532303830 x12: 3030303866666666
| x11: 203a706d61727420 x10: 6465746365707865 x9 : 3362653230383030
| x8 : c0000000ffffefff x7 : 0000000000017fe8 x6 : 000000000000bff4
| x5 : 0000000000057fa8 x4 : 0000000000000000 x3 : 0000000000000001
| x2 : ad2cb14bb5438900 x1 : 0000000000000000 x0 : 0000000000000022
| Call trace:
|  ftrace_bug+0x94/0x270
|  ftrace_process_locs+0x308/0x430
|  ftrace_module_init+0x44/0x60
|  load_module+0x15b4/0x1ce8
|  __do_sys_init_module+0x1ec/0x238
|  __arm64_sys_init_module+0x24/0x30
|  invoke_syscall+0x54/0x118
|  el0_svc_common.constprop.4+0x84/0x100
|  do_el0_svc+0x3c/0xd0
|  el0_svc+0x1c/0x50
|  el0t_64_sync_handler+0x90/0xb8
|  el0t_64_sync+0x15c/0x160
| ---[ end trace 0000000000000000 ]---
| ---------test_init-----------

Fix this by reverting to the old behaviour of ignoring the old
instruction when initialising an mcount callsite in a module, which was
the behaviour prior to commit a625357.

Signed-off-by: Mark Rutland <[email protected]>
Fixes: a625357 ("arm64: ftrace: consistently handle PLTs.")
Reported-by: Li Huafei <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/[email protected]
Cc: Ard Biesheuvel <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Nov 13, 2022
Tests for races between shinfo_cache (de)activation and hypercall+ioctl()
processing.  KVM has had bugs where activating the shared info cache
multiple times and/or with concurrent users results in lock corruption,
NULL pointer dereferences, and other fun.

For the timer injection testcase (#22), re-arm the timer until the IRQ
is successfully injected.  If the timer expires while the shared info
is deactivated (invalid), KVM will drop the event.

Signed-off-by: Michal Luczaj <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
qzed pushed a commit that referenced this issue Apr 19, 2023
[ Upstream commit 4e264be ]

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Apr 20, 2023
[ Upstream commit 4e264be ]

When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue May 3, 2023
When a system with E810 with existing VFs gets rebooted the following
hang may be observed.

 Pid 1 is hung in iavf_remove(), part of a network driver:
 PID: 1        TASK: ffff965400e5a340  CPU: 24   COMMAND: "systemd-shutdow"
  #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb
  #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d
  #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc
  #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930
  #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf]
  #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513
  #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa
  #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc
  #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e
  #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429
 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4
 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice]
 #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice]
 #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice]
 #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1
 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386
 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870
 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6
 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159
 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc
 #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d
 #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169
 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b
     RIP: 00007f1baa5c13d7  RSP: 00007fffbcc55a98  RFLAGS: 00000202
     RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f1baa5c13d7
     RDX: 0000000001234567  RSI: 0000000028121969  RDI: 00000000fee1dead
     RBP: 00007fffbcc55ca0   R8: 0000000000000000   R9: 00007fffbcc54e90
     R10: 00007fffbcc55050  R11: 0000000000000202  R12: 0000000000000005
     R13: 0000000000000000  R14: 00007fffbcc55af0  R15: 0000000000000000
     ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

During reboot all drivers PM shutdown callbacks are invoked.
In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE.
In ice_shutdown() the call chain above is executed, which at some point
calls iavf_remove(). However iavf_remove() expects the VF to be in one
of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If
that's not the case it sleeps forever.
So if iavf_shutdown() gets invoked before iavf_remove() the system will
hang indefinitely because the adapter is already in state __IAVF_REMOVE.

Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE,
as we already went through iavf_shutdown().

Fixes: 9745780 ("iavf: Add waiting so the port is initialized in remove")
Fixes: a841733 ("iavf: Fix race condition between iavf_shutdown and iavf_remove")
Reported-by: Marius Cornea <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
Reviewed-by: Michal Kubiak <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
qzed pushed a commit that referenced this issue Jun 14, 2023
[ Upstream commit 37c3b9f ]

The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Sep 24, 2023
commit 0b0747d upstream.

The following processes run into a deadlock. CPU 41 was waiting for CPU 29
to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29
was hung by that spinlock with IRQs disabled.

  PID: 17360    TASK: ffff95c1090c5c40  CPU: 41  COMMAND: "mrdiagd"
  !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0
  !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0
  !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0
   # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0
   # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0
   # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0
   # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0
   # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0
   # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0
   # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0
   #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0
   #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0
   #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0
   #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0
   #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0
   #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0
   #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0
   #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0
   #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0
   #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
   #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
   #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
   #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
   #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
   #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
   #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
   #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
   #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

  PID: 17355    TASK: ffff95c1090c3d80  CPU: 29  COMMAND: "mrdiagd"
  !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0
  !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0
   # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0
   # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0
   # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0
   # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0
   # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0
   # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0
   # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0
   # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
   #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
   #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
   #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
   #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
   #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
   #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
   #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
   #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

The lock is used to synchronize different sysfs operations, it doesn't
protect any resource that will be touched by an interrupt. Consequently
it's not required to disable IRQs. Replace the spinlock with a mutex to fix
the deadlock.

Signed-off-by: Junxiao Bi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Christie <[email protected]>
Cc: [email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
qzed pushed a commit that referenced this issue Sep 24, 2023
commit 0b0747d upstream.

The following processes run into a deadlock. CPU 41 was waiting for CPU 29
to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29
was hung by that spinlock with IRQs disabled.

  PID: 17360    TASK: ffff95c1090c5c40  CPU: 41  COMMAND: "mrdiagd"
  !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0
  !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0
  !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0
   # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0
   # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0
   # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0
   # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0
   # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0
   # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0
   # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0
   #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0
   #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0
   #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0
   #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0
   #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0
   #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0
   #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0
   #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0
   #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0
   #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
   #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
   #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
   #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
   #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
   #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
   #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
   #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
   #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

  PID: 17355    TASK: ffff95c1090c3d80  CPU: 29  COMMAND: "mrdiagd"
  !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0
  !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0
   # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0
   # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0
   # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0
   # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0
   # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0
   # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0
   # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0
   # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
   #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
   #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
   #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
   #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
   #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
   #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
   #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
   #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

The lock is used to synchronize different sysfs operations, it doesn't
protect any resource that will be touched by an interrupt. Consequently
it's not required to disable IRQs. Replace the spinlock with a mutex to fix
the deadlock.

Signed-off-by: Junxiao Bi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Mike Christie <[email protected]>
Cc: [email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
iwanders pushed a commit to iwanders/linux-surface-kernel that referenced this issue Mar 2, 2024
When configuring a hugetlb filesystem via the fsconfig() syscall, there is
a possible NULL dereference in hugetlbfs_fill_super() caused by assigning
NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize
is non valid.

E.g: Taking the following steps:

     fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC);
     fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0);
     fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced
with NULL, losing its previous value, and we will print an error:

 ...
 ...
 case Opt_pagesize:
 ps = memparse(param->string, &rest);
 ctx->hstate = h;
 if (!ctx->hstate) {
         pr_err("Unsupported page size %lu MB\n", ps / SZ_1M);
         return -EINVAL;
 }
 return 0;
 ...
 ...

This is a problem because later on, we will dereference ctxt->hstate in
hugetlbfs_fill_super()

 ...
 ...
 sb->s_blocksize = huge_page_size(ctx->hstate);
 ...
 ...

Causing below Oops.

Fix this by replacing cxt->hstate value only when then pagesize is known
to be valid.

 kernel: hugetlbfs: Unsupported page size 0 MB
 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
 kernel: #PF: supervisor read access in kernel mode
 kernel: #PF: error_code(0x0000) - not-present page
 kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0
 kernel: Oops: 0000 [linux-surface#1] PREEMPT SMP PTI
 kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G            E      6.8.0-rc2-default+ linux-surface#22 5a47c3fef76212addcc6eb71344aabc35190ae8f
 kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
 kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
 kernel: Call Trace:
 kernel:  <TASK>
 kernel:  ? __die_body+0x1a/0x60
 kernel:  ? page_fault_oops+0x16f/0x4a0
 kernel:  ? search_bpf_extables+0x65/0x70
 kernel:  ? fixup_exception+0x22/0x310
 kernel:  ? exc_page_fault+0x69/0x150
 kernel:  ? asm_exc_page_fault+0x22/0x30
 kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
 kernel:  ? hugetlbfs_fill_super+0xb4/0x1a0
 kernel:  ? hugetlbfs_fill_super+0x28/0x1a0
 kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
 kernel:  vfs_get_super+0x40/0xa0
 kernel:  ? __pfx_bpf_lsm_capable+0x10/0x10
 kernel:  vfs_get_tree+0x25/0xd0
 kernel:  vfs_cmd_create+0x64/0xe0
 kernel:  __x64_sys_fsconfig+0x395/0x410
 kernel:  do_syscall_64+0x80/0x160
 kernel:  ? syscall_exit_to_user_mode+0x82/0x240
 kernel:  ? do_syscall_64+0x8d/0x160
 kernel:  ? syscall_exit_to_user_mode+0x82/0x240
 kernel:  ? do_syscall_64+0x8d/0x160
 kernel:  ? exc_page_fault+0x69/0x150
 kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 kernel: RIP: 0033:0x7ffbc0cb87c9
 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48
 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af
 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9
 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000
 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000
 kernel:  </TASK>
 kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E)
 kernel:  mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E)
 kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1
 kernel: CR2: 0000000000000028
 kernel: ---[ end trace 0000000000000000 ]---
 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
 kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3202198 ("hugetlbfs: Convert to fs_context")
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
StollD pushed a commit that referenced this issue Mar 9, 2024
[ Upstream commit c20f482 ]

We call bnxt_half_open_nic() to setup the chip partially to run
loopback tests.  The rings and buffers are initialized normally
so that we can transmit and receive packets in loopback mode.
That means page pool buffers are allocated for the aggregation ring
just like the normal case.  NAPI is not needed because we are just
polling for the loopback packets.

When we're done with the loopback tests, we call bnxt_half_close_nic()
to clean up.  When freeing the page pools, we hit a WARN_ON()
in page_pool_unlink_napi() because the NAPI state linked to the
page pool is uninitialized.

The simplest way to avoid this warning is just to initialize the
NAPIs during half open and delete the NAPIs during half close.
Trying to skip the page pool initialization or skip linking of
NAPI during half open will be more complicated.

This fix avoids this warning:

WARNING: CPU: 4 PID: 46967 at net/core/page_pool.c:946 page_pool_unlink_napi+0x1f/0x30
CPU: 4 PID: 46967 Comm: ethtool Tainted: G S      W          6.7.0-rc5+ #22
Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.3.8 08/31/2021
RIP: 0010:page_pool_unlink_napi+0x1f/0x30
Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 47 18 48 85 c0 74 1b 48 8b 50 10 83 e2 01 74 08 8b 40 34 83 f8 ff 74 02 <0f> 0b 48 c7 47 18 00 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90
RSP: 0018:ffa000003d0dfbe8 EFLAGS: 00010246
RAX: ff110003607ce640 RBX: ff110010baf5d000 RCX: 0000000000000008
RDX: 0000000000000000 RSI: ff110001e5e522c0 RDI: ff110010baf5d000
RBP: ff11000145539b40 R08: 0000000000000001 R09: ffffffffc063f641
R10: ff110001361eddb8 R11: 000000000040000f R12: 0000000000000001
R13: 000000000000001c R14: ff1100014553a080 R15: 0000000000003fc0
FS:  00007f9301c4f740(0000) GS:ff1100103fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f91344fa8f0 CR3: 00000003527cc005 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0x81/0x140
 ? page_pool_unlink_napi+0x1f/0x30
 ? report_bug+0x102/0x200
 ? handle_bug+0x44/0x70
 ? exc_invalid_op+0x13/0x60
 ? asm_exc_invalid_op+0x16/0x20
 ? bnxt_free_ring.isra.123+0xb1/0xd0 [bnxt_en]
 ? page_pool_unlink_napi+0x1f/0x30
 page_pool_destroy+0x3e/0x150
 bnxt_free_mem+0x441/0x5e0 [bnxt_en]
 bnxt_half_close_nic+0x2a/0x40 [bnxt_en]
 bnxt_self_test+0x21d/0x450 [bnxt_en]
 __dev_ethtool+0xeda/0x2e30
 ? native_queued_spin_lock_slowpath+0x17f/0x2b0
 ? __link_object+0xa1/0x160
 ? _raw_spin_unlock_irqrestore+0x23/0x40
 ? __create_object+0x5f/0x90
 ? __kmem_cache_alloc_node+0x317/0x3c0
 ? dev_ethtool+0x59/0x170
 dev_ethtool+0xa7/0x170
 dev_ioctl+0xc3/0x530
 sock_do_ioctl+0xa8/0xf0
 sock_ioctl+0x270/0x310
 __x64_sys_ioctl+0x8c/0xc0
 do_syscall_64+0x3e/0xf0
 entry_SYSCALL_64_after_hwframe+0x6e/0x76

Fixes: 294e39e ("bnxt: hook NAPIs to page pools")
Reviewed-by: Andy Gospodarek <[email protected]>
Reviewed-by: Ajit Khaparde <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
StollD pushed a commit that referenced this issue Mar 9, 2024
commit 79d72c6 upstream.

When configuring a hugetlb filesystem via the fsconfig() syscall, there is
a possible NULL dereference in hugetlbfs_fill_super() caused by assigning
NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize
is non valid.

E.g: Taking the following steps:

     fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC);
     fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0);
     fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced
with NULL, losing its previous value, and we will print an error:

 ...
 ...
 case Opt_pagesize:
 ps = memparse(param->string, &rest);
 ctx->hstate = h;
 if (!ctx->hstate) {
         pr_err("Unsupported page size %lu MB\n", ps / SZ_1M);
         return -EINVAL;
 }
 return 0;
 ...
 ...

This is a problem because later on, we will dereference ctxt->hstate in
hugetlbfs_fill_super()

 ...
 ...
 sb->s_blocksize = huge_page_size(ctx->hstate);
 ...
 ...

Causing below Oops.

Fix this by replacing cxt->hstate value only when then pagesize is known
to be valid.

 kernel: hugetlbfs: Unsupported page size 0 MB
 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
 kernel: #PF: supervisor read access in kernel mode
 kernel: #PF: error_code(0x0000) - not-present page
 kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0
 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
 kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G            E      6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f
 kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
 kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
 kernel: Call Trace:
 kernel:  <TASK>
 kernel:  ? __die_body+0x1a/0x60
 kernel:  ? page_fault_oops+0x16f/0x4a0
 kernel:  ? search_bpf_extables+0x65/0x70
 kernel:  ? fixup_exception+0x22/0x310
 kernel:  ? exc_page_fault+0x69/0x150
 kernel:  ? asm_exc_page_fault+0x22/0x30
 kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
 kernel:  ? hugetlbfs_fill_super+0xb4/0x1a0
 kernel:  ? hugetlbfs_fill_super+0x28/0x1a0
 kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
 kernel:  vfs_get_super+0x40/0xa0
 kernel:  ? __pfx_bpf_lsm_capable+0x10/0x10
 kernel:  vfs_get_tree+0x25/0xd0
 kernel:  vfs_cmd_create+0x64/0xe0
 kernel:  __x64_sys_fsconfig+0x395/0x410
 kernel:  do_syscall_64+0x80/0x160
 kernel:  ? syscall_exit_to_user_mode+0x82/0x240
 kernel:  ? do_syscall_64+0x8d/0x160
 kernel:  ? syscall_exit_to_user_mode+0x82/0x240
 kernel:  ? do_syscall_64+0x8d/0x160
 kernel:  ? exc_page_fault+0x69/0x150
 kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 kernel: RIP: 0033:0x7ffbc0cb87c9
 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48
 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af
 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9
 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000
 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000
 kernel:  </TASK>
 kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E)
 kernel:  mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E)
 kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1
 kernel: CR2: 0000000000000028
 kernel: ---[ end trace 0000000000000000 ]---
 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
 kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3202198 ("hugetlbfs: Convert to fs_context")
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
AquaBx pushed a commit to AquaBx/kernel that referenced this issue Mar 23, 2024
When the skb is reorganized during esp_output (!esp->inline), the pages
coming from the original skb fragments are supposed to be released back
to the system through put_page. But if the skb fragment pages are
originating from a page_pool, calling put_page on them will trigger a
page_pool leak which will eventually result in a crash.

This leak can be easily observed when using CONFIG_DEBUG_VM and doing
ipsec + gre (non offloaded) forwarding:

  BUG: Bad page state in process ksoftirqd/16  pfn:1451b6
  page:00000000de2b8d32 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1451b6000 pfn:0x1451b6
  flags: 0x200000000000000(node=0|zone=2)
  page_type: 0xffffffff()
  raw: 0200000000000000 dead000000000040 ffff88810d23c000 0000000000000000
  raw: 00000001451b6000 0000000000000001 00000000ffffffff 0000000000000000
  page dumped because: page_pool leak
  Modules linked in: ip_gre gre mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay zram zsmalloc fuse [last unloaded: mlx5_core]
  CPU: 16 PID: 96 Comm: ksoftirqd/16 Not tainted 6.8.0-rc4+ linux-surface#22
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x36/0x50
   bad_page+0x70/0xf0
   free_unref_page_prepare+0x27a/0x460
   free_unref_page+0x38/0x120
   esp_ssg_unref.isra.0+0x15f/0x200
   esp_output_tail+0x66d/0x780
   esp_xmit+0x2c5/0x360
   validate_xmit_xfrm+0x313/0x370
   ? validate_xmit_skb+0x1d/0x330
   validate_xmit_skb_list+0x4c/0x70
   sch_direct_xmit+0x23e/0x350
   __dev_queue_xmit+0x337/0xba0
   ? nf_hook_slow+0x3f/0xd0
   ip_finish_output2+0x25e/0x580
   iptunnel_xmit+0x19b/0x240
   ip_tunnel_xmit+0x5fb/0xb60
   ipgre_xmit+0x14d/0x280 [ip_gre]
   dev_hard_start_xmit+0xc3/0x1c0
   __dev_queue_xmit+0x208/0xba0
   ? nf_hook_slow+0x3f/0xd0
   ip_finish_output2+0x1ca/0x580
   ip_sublist_rcv_finish+0x32/0x40
   ip_sublist_rcv+0x1b2/0x1f0
   ? ip_rcv_finish_core.constprop.0+0x460/0x460
   ip_list_rcv+0x103/0x130
   __netif_receive_skb_list_core+0x181/0x1e0
   netif_receive_skb_list_internal+0x1b3/0x2c0
   napi_gro_receive+0xc8/0x200
   gro_cell_poll+0x52/0x90
   __napi_poll+0x25/0x1a0
   net_rx_action+0x28e/0x300
   __do_softirq+0xc3/0x276
   ? sort_range+0x20/0x20
   run_ksoftirqd+0x1e/0x30
   smpboot_thread_fn+0xa6/0x130
   kthread+0xcd/0x100
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x31/0x50
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork_asm+0x11/0x20
   </TASK>

The suggested fix is to introduce a new wrapper (skb_page_unref) that
covers page refcounting for page_pool pages as well.

Cc: [email protected]
Fixes: 6a5bcd8 ("page_pool: Allow drivers to hint on SKB recycling")
Reported-and-tested-by: Anatoli N.Chechelnickiy <[email protected]>
Reported-by: Ian Kumlien <[email protected]>
Link: https://lore.kernel.org/netdev/CAA85sZvvHtrpTQRqdaOx6gd55zPAVsqMYk_Lwh4Md5knTq7AyA@mail.gmail.com
Signed-off-by: Dragos Tatulea <[email protected]>
Reviewed-by: Mina Almasry <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Acked-by: Ilias Apalodimas <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
StollD pushed a commit that referenced this issue Apr 16, 2024
[ Upstream commit c319882 ]

When the skb is reorganized during esp_output (!esp->inline), the pages
coming from the original skb fragments are supposed to be released back
to the system through put_page. But if the skb fragment pages are
originating from a page_pool, calling put_page on them will trigger a
page_pool leak which will eventually result in a crash.

This leak can be easily observed when using CONFIG_DEBUG_VM and doing
ipsec + gre (non offloaded) forwarding:

  BUG: Bad page state in process ksoftirqd/16  pfn:1451b6
  page:00000000de2b8d32 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1451b6000 pfn:0x1451b6
  flags: 0x200000000000000(node=0|zone=2)
  page_type: 0xffffffff()
  raw: 0200000000000000 dead000000000040 ffff88810d23c000 0000000000000000
  raw: 00000001451b6000 0000000000000001 00000000ffffffff 0000000000000000
  page dumped because: page_pool leak
  Modules linked in: ip_gre gre mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay zram zsmalloc fuse [last unloaded: mlx5_core]
  CPU: 16 PID: 96 Comm: ksoftirqd/16 Not tainted 6.8.0-rc4+ #22
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x36/0x50
   bad_page+0x70/0xf0
   free_unref_page_prepare+0x27a/0x460
   free_unref_page+0x38/0x120
   esp_ssg_unref.isra.0+0x15f/0x200
   esp_output_tail+0x66d/0x780
   esp_xmit+0x2c5/0x360
   validate_xmit_xfrm+0x313/0x370
   ? validate_xmit_skb+0x1d/0x330
   validate_xmit_skb_list+0x4c/0x70
   sch_direct_xmit+0x23e/0x350
   __dev_queue_xmit+0x337/0xba0
   ? nf_hook_slow+0x3f/0xd0
   ip_finish_output2+0x25e/0x580
   iptunnel_xmit+0x19b/0x240
   ip_tunnel_xmit+0x5fb/0xb60
   ipgre_xmit+0x14d/0x280 [ip_gre]
   dev_hard_start_xmit+0xc3/0x1c0
   __dev_queue_xmit+0x208/0xba0
   ? nf_hook_slow+0x3f/0xd0
   ip_finish_output2+0x1ca/0x580
   ip_sublist_rcv_finish+0x32/0x40
   ip_sublist_rcv+0x1b2/0x1f0
   ? ip_rcv_finish_core.constprop.0+0x460/0x460
   ip_list_rcv+0x103/0x130
   __netif_receive_skb_list_core+0x181/0x1e0
   netif_receive_skb_list_internal+0x1b3/0x2c0
   napi_gro_receive+0xc8/0x200
   gro_cell_poll+0x52/0x90
   __napi_poll+0x25/0x1a0
   net_rx_action+0x28e/0x300
   __do_softirq+0xc3/0x276
   ? sort_range+0x20/0x20
   run_ksoftirqd+0x1e/0x30
   smpboot_thread_fn+0xa6/0x130
   kthread+0xcd/0x100
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x31/0x50
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork_asm+0x11/0x20
   </TASK>

The suggested fix is to introduce a new wrapper (skb_page_unref) that
covers page refcounting for page_pool pages as well.

Cc: [email protected]
Fixes: 6a5bcd8 ("page_pool: Allow drivers to hint on SKB recycling")
Reported-and-tested-by: Anatoli N.Chechelnickiy <[email protected]>
Reported-by: Ian Kumlien <[email protected]>
Link: https://lore.kernel.org/netdev/CAA85sZvvHtrpTQRqdaOx6gd55zPAVsqMYk_Lwh4Md5knTq7AyA@mail.gmail.com
Signed-off-by: Dragos Tatulea <[email protected]>
Reviewed-by: Mina Almasry <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Acked-by: Ilias Apalodimas <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Jun 30, 2024
[ Upstream commit 769e6a1 ]

ui_browser__show() is capturing the input title that is stack allocated
memory in hist_browser__run().

Avoid a use after return by strdup-ing the string.

Committer notes:

Further explanation from Ian Rogers:

My command line using tui is:
$ sudo bash -c 'rm /tmp/asan.log*; export
ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a
sleep 1; /tmp/perf/perf mem report'
I then go to the perf annotate view and quit. This triggers the asan
error (from the log file):
```
==1254591==ERROR: AddressSanitizer: stack-use-after-return on address
0x7f2813331920 at pc 0x7f28180
65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10
READ of size 80 at 0x7f2813331920 thread T0
    #0 0x7f2818065990 in __interceptor_strlen
../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461
    #1 0x7f2817698251 in SLsmg_write_wrapped_string
(/lib/x86_64-linux-gnu/libslang.so.2+0x98251)
    #2 0x7f28176984b9 in SLsmg_write_nstring
(/lib/x86_64-linux-gnu/libslang.so.2+0x984b9)
    #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60
    #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266
    #5 0x55c94045c776 in ui_browser__show ui/browser.c:288
    #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206
    #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458
    #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412
    #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527
    #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613
    #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661
    #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671
    #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141
    #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805
    #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374
    #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516
    #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350
    #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403
    #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447
    #20 0x55c9400e53ad in main tools/perf/perf.c:561
    #21 0x7f28170456c9 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
    #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360
    #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId:
84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93)

Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame
    #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746

  This frame has 1 object(s):
    [32, 192) 'title' (line 747) <== Memory access at offset 32 is
inside this variable
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism, swapcontext or vfork
```
hist_browser__run isn't on the stack so the asan error looks legit.
There's no clean init/exit on struct ui_browser so I may be trading a
use-after-return for a memory leak, but that seems look a good trade
anyway.

Fixes: 05e8b08 ("perf ui browser: Stop using 'self'")
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Ben Gainey <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Li Dong <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
qzed pushed a commit that referenced this issue Jun 30, 2024
…uddy pages

commit 8cf360b upstream.

When I did memory failure tests recently, below panic occurs:

page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __rmqueue_pcplist+0x23b/0x520
 get_page_from_freelist+0x26b/0xe40
 __alloc_pages_noprof+0x113/0x1120
 __folio_alloc_noprof+0x11/0xb0
 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
 __alloc_fresh_hugetlb_folio+0xe7/0x140
 alloc_pool_huge_folio+0x68/0x100
 set_max_huge_pages+0x13d/0x340
 hugetlb_sysctl_handler_common+0xe8/0x110
 proc_sys_call_handler+0x194/0x280
 vfs_write+0x387/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
 </TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---

And before the panic, there had an warning about bad page state:

BUG: Bad page state in process page-types  pfn:8cee00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
 <TASK>
 dump_stack_lvl+0x83/0xa0
 bad_page+0x63/0xf0
 free_unref_page+0x36e/0x5c0
 unpoison_memory+0x50b/0x630
 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
 debugfs_attr_write+0x42/0x60
 full_proxy_write+0x5b/0x80
 vfs_write+0xcd/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
 </TASK>

The root cause should be the below race:

 memory_failure
  try_memory_failure_hugetlb
   me_huge_page
    __page_handle_poison
     dissolve_free_hugetlb_folio
     drain_all_pages -- Buddy page can be isolated e.g. for compaction.
     take_page_off_buddy -- Failed as page is not in the buddy list.
	     -- Page can be putback into buddy after compaction.
    page_ref_inc -- Leads to buddy page with refcnt = 1.

Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning.  And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.

Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: ceaf8fb ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: IPTS Area: Issue in IPTS driver C: missing feature Category: Missing feature D: Surface Laptop 3 Device: Surface Laptop 3 D: Surface Pro 7 Device: Surface Pro 7 P: medium Priority: Medium priority
Projects
None yet
Development

No branches or pull requests

9 participants