-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NVIDIA: SAUCE: iommu/arm-smmu-v3: Allow default substream bypass with… #22
Open
jamieNguyenNVIDIA
wants to merge
128
commits into
NVIDIA:24.04_linux-nvidia
Choose a base branch
from
jamieNguyenNVIDIA:jamien/SAUCE_pasid
base: 24.04_linux-nvidia
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
NVIDIA: SAUCE: iommu/arm-smmu-v3: Allow default substream bypass with… #22
jamieNguyenNVIDIA
wants to merge
128
commits into
NVIDIA:24.04_linux-nvidia
from
jamieNguyenNVIDIA:jamien/SAUCE_pasid
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Ian May <[email protected]>
…dversion" This reverts commit 47d27f2. We need to revert this to avoid regressing any modules used in Jammy. Signed-off-by: Ian May <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Ian May <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Ian May <[email protected]>
Ignore: yes Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
Ignore: yes Signed-off-by: Ian May <[email protected]>
Ignore: yes Signed-off-by: Ian May <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2038972 Properties: no-test-build Signed-off-by: Ian May <[email protected]>
Ignore: yes Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
Ignore: yes Signed-off-by: Paolo Pisati <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2046137 Properties: no-test-build Signed-off-by: Paolo Pisati <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Paolo Pisati <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Paolo Pisati <[email protected]>
Ignore: yes Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Paolo Pisati <[email protected]>
Signed-off-by: Paolo Pisati <[email protected]>
Signed-off-by: Paolo Pisati <[email protected]>
Ignore: yes Signed-off-by: Andrea Righi <[email protected]>
Ignore: yes Signed-off-by: Andrea Righi <[email protected]>
Ignore: yes Signed-off-by: Andrea Righi <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2055128 Properties: no-test-build Signed-off-by: Andrea Righi <[email protected]>
…ain/d2024.02.07) BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Andrea Righi <[email protected]>
Ignore: yes Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Andrea Righi <[email protected]>
Ignore: yes Signed-off-by: Jacob Martin <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2068306 Properties: no-test-build Signed-off-by: Jacob Martin <[email protected]>
Signed-off-by: Jacob Martin <[email protected]>
Ignore: yes Signed-off-by: Jacob Martin <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2071967 Properties: no-test-build Signed-off-by: Jacob Martin <[email protected]>
Signed-off-by: Jacob Martin <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2069777 Use u8, u32 and u64 instead of respective uintN_t types. Remove unnecessary newlines for function argument lists. Signed-off-by: Shravan Kumar Ramani <[email protected]> Link: https://lore.kernel.org/r/39be055af3506ce6f843d11e45d71620f2a96e26.1707808180.git.shravankr@nvidia.com Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> (cherry picked from commit fd23023) Signed-off-by: David Thompson <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2069777 Use unsigned integer types for register values and array indices. Use %u instead of %d accordingly. Signed-off-by: Shravan Kumar Ramani <[email protected]> Link: https://lore.kernel.org/r/d8548c70339a29258a906b2b518e5c48f669795c.1707808180.git.shravankr@nvidia.com Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> (cherry picked from commit 1ae9ffd) Signed-off-by: David Thompson <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
…ptional BugLink: https://bugs.launchpad.net/bugs/2069777 The mlxbf_pmc_event_list() function returns a pointer to an array of supported events and the array size. The array size is returned via a pointer passed as an argument, which is mandatory. However, we want to be able to use mlxbf_pmc_event_list() just to check if a block name is implemented/supported. For this usage passing the size argument is not necessary so let's make it optional. Signed-off-by: Luiz Capitulino <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/182de8ec6b9c33152f2ba6b248c35b0311abf5e4.1708635408.git.luizcap@redhat.com Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> (cherry picked from commit 0d46439) Signed-off-by: David Thompson <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2069777 Currently, the driver has two behaviors to deal with new & unsupported performance blocks reported by the firmware: 1. For register and unknown block types, the driver will fail to load with the following error message: [ 4510.956369] mlxbf-pmc: probe of MLNXBFD2:00 failed with error -22 2. For counter and crspace blocks, the driver will load and sysfs files will be created but getting the contents of event_list or trying to setup the counter will fail Instead, let's ignore and log unsupported blocks. This means the driver will always load and unsupported blocks will never show up in sysfs. Signed-off-by: Luiz Capitulino <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/f8e2e6210b43e825b69824b420c801cd513d401d.1708635408.git.luizcap@redhat.com Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> (cherry picked from commit c0459ee) Signed-off-by: David Thompson <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2069777 These need to be signed for the error handling to work. The mlxbf_pmc_get_event_num() function returns int so int type is correct. Fixes: 1ae9ffd ("platform/mellanox: mlxbf-pmc: Cleanup signed/unsigned mix-up") Signed-off-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> (cherry picked from commit 7c8772f) Signed-off-by: David Thompson <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2071654 We enumerate devices by attempting config reads to the Vendor ID of each possible device. On conventional PCI, if no device responds, the read terminates with a Master Abort (PCI r3.0, sec 6.1). On PCIe, the config read is terminated as an Unsupported Request (PCIe r6.0, sec 2.3.2, 7.5.1.3.7). In either case, if the read addressed a device below a bridge, it is logged by setting "Received Master Abort" in the bridge Secondary Status register. Clear any errors logged in the Secondary Status register after enumeration. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vidya Sagar <[email protected]> [bhelgaas: simplify commit log] Signed-off-by: Bjorn Helgaas <[email protected]> (cherry picked from commit 7bf9d2a) Signed-off-by: Jamie Nguyen <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2071655 Start switching iomap_copy routines over to use #define and arch provided inline/macro functions instead of weak symbols. Inline functions allow more compiler optimization and this is often a driver hot path. x86 has the only weak implementation for __iowrite32_copy(), so replace it with a static inline containing the same single instruction inline assembly. The compiler will generate the "mov edx,ecx" in a more optimal way. Remove iomap_copy_64.S Link: https://lore.kernel.org/r/[email protected] Acked-by: Arnd Bergmann <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 20516d6) Signed-off-by: Jamie Nguyen <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2071655 It is trivial to implement an inline to do this, so provide it in the s390 headers. Like the 64 bit version it should just invoke zpci_memcpy_toio() with the correct size. Link: https://lore.kernel.org/r/[email protected] Acked-by: Niklas Schnelle <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 6ae798c) Signed-off-by: Jamie Nguyen <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2071655 Complete switching the __iowriteXX_copy() routines over to use #define and arch provided inline/macro functions instead of weak symbols. S390 has an implementation that simply calls another memcpy function. Inline this so the callers don't have to do two jumps. Link: https://lore.kernel.org/r/[email protected] Acked-by: Niklas Schnelle <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit e7bc47b) Signed-off-by: Jamie Nguyen <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2071655 The kernel provides driver support for using write combining IO memory through the __iowriteXX_copy() API which is commonly used as an optional optimization to generate 16/32/64 byte MemWr TLPs in a PCIe environment. iomap_copy.c provides a generic implementation as a simple 4/8 byte at a time copy loop that has worked well with past ARM64 CPUs, giving a high frequency of large TLPs being successfully formed. However modern ARM64 CPUs are quite sensitive to how the write combining CPU HW is operated and a compiler generated loop with intermixed load/store is not sufficient to frequently generate a large TLP. The CPUs would like to see the entire TLP generated by consecutive store instructions from registers. Compilers like gcc tend to intermix loads and stores and have poor code generation, in part, due to the ARM64 situation that writeq() does not codegen anything other than "[xN]". However even with that resolved compilers like clang still do not have good code generation. This means on modern ARM64 CPUs the rate at which __iowriteXX_copy() successfully generates large TLPs is very small (less than 1 in 10,000) tries), to the point that the use of WC is pointless. Implement __iowrite32/64_copy() specifically for ARM64 and use inline assembly to build consecutive blocks of STR instructions. Provide direct support for 64/32/16 large TLP generation in this manner. Optimize for common constant lengths so that the compiler can directly inline the store blocks. This brings the frequency of large TLP generation up to a high level that is comparable with older CPU generations. As the __iowriteXX_copy() family of APIs is intended for use with WC incorporate the DGH hint directly into the function. Link: https://lore.kernel.org/r/[email protected] Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: [email protected] Reviewed-by: Catalin Marinas <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit ead7911) Signed-off-by: Jamie Nguyen <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2071655 Now that the ARM64 arch implementation does the DGH as part of __iowrite64_copy() there is no reason to open code this in drivers. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jijie Shao<[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 2b7a5e1) Signed-off-by: Jamie Nguyen <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
Ignore: yes Signed-off-by: Jacob Martin <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2072186 Properties: no-test-build Signed-off-by: Jacob Martin <[email protected]>
Signed-off-by: Jacob Martin <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2073811 PCIe ACS settings control the level of isolation and the possible P2P paths between devices. With greater isolation the kernel will create smaller iommu_groups and with less isolation there is more HW that can achieve P2P transfers. From a virtualization perspective all devices in the same iommu_group must be assigned to the same VM as they lack security isolation. There is no way for the kernel to automatically know the correct ACS settings for any given system and workload. Existing command line options (e.g., disable_acs_redir) allow only for large scale change, disabling all isolation, but this is not sufficient for more complex cases. Add a kernel command-line option 'config_acs' to directly control all the ACS bits for specific devices, which allows the operator to setup the right level of isolation to achieve the desired P2P configuration. The definition is future proof; when new ACS bits are added to the spec the open syntax can be extended. ACS needs to be setup early in the kernel boot as the ACS settings affect how iommu_groups are formed. iommu_group formation is a one time event during initial device discovery, so changing ACS bits after kernel boot can result in an inaccurate view of the iommu_groups compared to the current isolation configuration. ACS applies to PCIe Downstream Ports and multi-function devices. The default ACS settings are strict and deny any direct traffic between two functions. This results in the smallest iommu_group the HW can support. Frequently these values result in slow or non-working P2PDMA. ACS offers a range of security choices controlling how traffic is allowed to go directly between two devices. Some popular choices: - Full prevention - Translated requests can be direct, with various options - Asymmetric direct traffic, A can reach B but not the reverse - All traffic can be direct Along with some other less common ones for special topologies. The intention is that this option would be used with expert knowledge of the HW capability and workload to achieve the desired configuration. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vidya Sagar <[email protected]> [bhelgaas: add example, tidy printk formats] Signed-off-by: Bjorn Helgaas <[email protected]> (cherry picked from commit 47c8846) Signed-off-by: Jamie Nguyen <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Jacob Martin <[email protected]> Acked-by: Noah Wager <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2075396 Commit 3bd786f ("mm: convert do_set_pte() to set_pte_range()") replaced do_set_pte() with set_pte_range() and that introduced a regression in the following faulting path of non-anonymous vmas which caused the PTE for the faulting address to be marked as old instead of young. handle_pte_fault() do_pte_missing() do_fault() do_read_fault() || do_cow_fault() || do_shared_fault() finish_fault() set_pte_range() The polarity of prefault calculation is incorrect. This leads to prefault being incorrectly set for the faulting address. The following check will incorrectly mark the PTE old rather than young. On some architectures this will cause a double fault to mark it young when the access is retried. if (prefault && arch_wants_old_prefaulted_pte()) entry = pte_mkold(entry); On a subsequent fault on the same address, the faulting path will see a non NULL vmf->pte and instead of reaching the do_pte_missing() path, PTE will then be correctly marked young in handle_pte_fault() itself. Due to this bug, performance degradation in the fault handling path will be observed due to unnecessary double faulting. Link: https://lkml.kernel.org/r/[email protected] Fixes: 3bd786f ("mm: convert do_set_pte() to set_pte_range()") Signed-off-by: Ram Tummala <[email protected]> Reviewed-by: Yin Fengwei <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Yin Fengwei <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> (backported from commit 4cd7ba1) [context changes] Signed-off-by: Jamie Nguyen <[email protected]> Tested-by: Carol Soto <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
… a pasid support When an iommu_domain is set to IOMMU_DOMAIN_IDENTITY, the driver would skip the allocation of a CD table and set the CONFIG field of the STE to STRTAB_STE_0_CFG_BYPASS. This works well for devices that only have one substream, i.e. PASID disabled. However, there could be a use case, for a pasid capable device, that allows bypassing the translation at the default substream while still enabling the pasid feature, which means the driver should not skip the allocation of a CD table nor simply bypass the CONFIG field. Instead, the S1DSS field should be set to STRTAB_STE_1_S1DSS_BYPASS and the SHCFG field should be set to STRTAB_STE_1_SHCFG_INCOMING. Add s1dss in struct arm_smmu_s1_cfg, to allow a configuration in the finalise() to support this use case. Also, according to "13.5 Summary of attribute/permission configuration fields" in the reference manual, the SHCFG field value is irrelevant. So, set the SHCFG field of the STE always to STRTAB_STE_1_SHCFG_INCOMING for simplification. Signed-off-by: Nicolin Chen <[email protected]> Reviewed-by: Pritesh Raithatha <[email protected]> Signed-off-by: Jamie Nguyen <[email protected]> Tested-by: Matt Ochs <[email protected]>
|
nvidia-bfigg
force-pushed
the
24.04_linux-nvidia
branch
from
August 10, 2024 15:00
29686e1
to
3200be9
Compare
nvidia-bfigg
force-pushed
the
24.04_linux-nvidia
branch
from
August 17, 2024 15:00
3200be9
to
6ad2527
Compare
nvidia-bfigg
pushed a commit
that referenced
this pull request
Aug 17, 2024
BugLink: https://bugs.launchpad.net/bugs/2073603 [ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 #20 0x55c9400e53ad in main tools/perf/perf.c:561 #21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Portia Stephens <[email protected]> Signed-off-by: Roxana Nicolescu <[email protected]>
nvidia-bfigg
pushed a commit
that referenced
this pull request
Aug 17, 2024
…uddy pages BugLink: https://bugs.launchpad.net/bugs/2073788 commit 8cf360b upstream. When I did memory failure tests recently, below panic occurs: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00 flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff) raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page)) ------------[ cut here ]------------ kernel BUG at include/linux/page-flags.h:1009! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:__del_page_from_free_list+0x151/0x180 RSP: 0018:ffffa49c90437998 EFLAGS: 00000046 RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0 RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69 R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80 R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009 FS: 00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0 Call Trace: <TASK> __rmqueue_pcplist+0x23b/0x520 get_page_from_freelist+0x26b/0xe40 __alloc_pages_noprof+0x113/0x1120 __folio_alloc_noprof+0x11/0xb0 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130 __alloc_fresh_hugetlb_folio+0xe7/0x140 alloc_pool_huge_folio+0x68/0x100 set_max_huge_pages+0x13d/0x340 hugetlb_sysctl_handler_common+0xe8/0x110 proc_sys_call_handler+0x194/0x280 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xc2/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff916114887 RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887 RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003 RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0 R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004 R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00 </TASK> Modules linked in: mce_inject hwpoison_inject ---[ end trace 0000000000000000 ]--- And before the panic, there had an warning about bad page state: BUG: Bad page state in process page-types pfn:8cee00 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00 flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff) page_type: 0xffffff7f(buddy) raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000 raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount Modules linked in: mce_inject hwpoison_inject CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22 Call Trace: <TASK> dump_stack_lvl+0x83/0xa0 bad_page+0x63/0xf0 free_unref_page+0x36e/0x5c0 unpoison_memory+0x50b/0x630 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110 debugfs_attr_write+0x42/0x60 full_proxy_write+0x5b/0x80 vfs_write+0xcd/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xc2/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f189a514887 RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887 RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003 RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8 R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040 </TASK> The root cause should be the below race: memory_failure try_memory_failure_hugetlb me_huge_page __page_handle_poison dissolve_free_hugetlb_folio drain_all_pages -- Buddy page can be isolated e.g. for compaction. take_page_off_buddy -- Failed as page is not in the buddy list. -- Page can be putback into buddy after compaction. page_ref_inc -- Leads to buddy page with refcnt = 1. Then unpoison_memory() can unpoison the page and send the buddy page back into buddy list again leading to the above bad page state warning. And bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to allocate this page. Fix this issue by only treating __page_handle_poison() as successful when it returns 1. Link: https://lkml.kernel.org/r/[email protected] Fixes: ceaf8fb ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage") Signed-off-by: Miaohe Lin <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Portia Stephens <[email protected]> Signed-off-by: Roxana Nicolescu <[email protected]>
nvidia-bfigg
force-pushed
the
24.04_linux-nvidia
branch
from
August 21, 2024 15:00
6ad2527
to
9b809a2
Compare
nvidia-bfigg
force-pushed
the
24.04_linux-nvidia
branch
from
September 3, 2024 15:01
9b809a2
to
7d88485
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
… a pasid support
When an iommu_domain is set to IOMMU_DOMAIN_IDENTITY, the driver would skip the allocation of a CD table and set the CONFIG field of the STE to STRTAB_STE_0_CFG_BYPASS. This works well for devices that only have one substream, i.e. PASID disabled.
However, there could be a use case, for a pasid capable device, that allows bypassing the translation at the default substream while still enabling the pasid feature, which means the driver should not skip the allocation of a CD table nor simply bypass the CONFIG field. Instead, the S1DSS field should be set to STRTAB_STE_1_S1DSS_BYPASS and the SHCFG field should be set to STRTAB_STE_1_SHCFG_INCOMING.
Add s1dss in struct arm_smmu_s1_cfg, to allow a configuration in the finalise() to support this use case.
Also, according to "13.5 Summary of attribute/permission configuration fields" in the reference manual, the SHCFG field value is irrelevant. So, set the SHCFG field of the STE always to STRTAB_STE_1_SHCFG_INCOMING for simplification.
Reviewed-by: Pritesh Raithatha [email protected]
Tested-by: Matt Ochs [email protected]