Pull in upstream patches for NVIDIA erratum T241-FABRIC-4 #8

Open
wants to merge 145 commits into
base: main

Conversation

tdavenvidia
Contributor

No description provided.

ianmay81 and others added 30 commits May 30, 2023 10:41
Ignore: yes
Signed-off-by: Ian May <[email protected]>
Ignore: yes
Signed-off-by: Dimitri John Ledkov <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1992162
Properties: no-test-build
Signed-off-by: Dimitri John Ledkov <[email protected]>
Ignore: yes
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1996300
Properties: no-test-build
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Ignore: yes
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1998795
Properties: no-test-build
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Ignore: yes
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1999745
Properties: no-test-build
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002679

This reverts commit e96b739.

In order to use the same compiler as used for kinetic:linux (gcc-12) we have
made some changes to explicitly compile jammy:linux-hwe-5.19 with the same
compiler. However, such changes are breaking the build of a number of dkms
packages. Revert this commit, which would make the kernel compile with the
default GCC in Jammy (gcc-11), while a proper solution is being worked on.

Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Ignore: yes
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002679

Changing the gcc version used for the build from gcc-12 to gcc (11)
removed CONFIG_INIT_STACK_ALL_ZERO and CONFIG_SHADOW_CALL_STACK from
being an option on the kernel config. Update the annotations file
accordingly, keeping the annotation instead of simply removing it so
when the configs are re-enabled in the future it stays as a reminder
to update the annotation.

Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2001755
Properties: no-test-build
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
In the Jammy environment, some config options cannot be enabled as
enforced in the parent kernel. Add code to adjust those automatically.
This was first attempted as manual edits to the annotations, but those
get overwritten on each rebase. Instead, add the adjustment to the
local-mangle script and run it only when the compiler to be used is gcc-11.

Ignore: yes
Signed-off-by: Stefan Bader <[email protected]>
ianmay81 and others added 29 commits May 31, 2023 11:56
Ignore: yes
Signed-off-by: Ian May <[email protected]>
Ignore: yes
Signed-off-by: Ian May <[email protected]>
Copied from master

Ignore: yes
Signed-off-by: Ian May <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1982519
nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/3728100

This change enables the NFS driver to support GPUDirect Storage (GDS).
It touches frwr_map and frwr_unmap in the NFS driver: each IO request
is first intercepted and checked for GDS pages. If the request targets
a GDS page, it is served by the GDS driver component, nvidia-fs;
otherwise it is served by the standard NFS driver code.

Signed-off-by: Sourab Gupta <[email protected]>
Acked-by: Kiran Kumar Modukuri <[email protected]>
Acked-by: Rebanta Mitra <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
Ignore: yes
Signed-off-by: Brad Figg <[email protected]>
Ignore: yes
Signed-off-by: Roxana Nicolescu <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
CVE-2023-1829
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Acked-by: Cengiz Can <[email protected]>
Acked-by: Stefan Bader <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
CONFIG_SERIAL_8250_MID=y
CONFIG_SND_HDA_INTEL_DMI_SILENT_STREAM=y
CONFIG_HSU_DMA=y

Signed-off-by: Brad Figg <[email protected]>
…idia-fs version changed to 2.15.3~jammy

Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2021535

ARM architecture only has 'memory', so all devices are accessed by
MMIO if possible.

Signed-off-by: Jammy Huang <[email protected]>
Reviewed-by: Thomas Zimmermann <[email protected]>
Signed-off-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Acked-by: Brad Figg <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
(cherry picked from commit 4327a6137ed43a091d900b1ac833345d60f32228)
Signed-off-by: Ian May <[email protected]>
Ignore: yes
Signed-off-by: Ian May <[email protected]>
Copied from master

Ignore: yes
Signed-off-by: Ian May <[email protected]>
An IRQ's effective affinity can only be different from its configured
affinity if there are multiple CPUs. Make it clear that this option is
only meaningful when SMP is enabled. Most of the relevant code in
irqdesc.c is already hidden behind CONFIG_SMP anyway.

Signed-off-by: Samuel Holland <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 0e6c027)
Signed-off-by: Tushar Dave <[email protected]>
The T241 platform suffers from the T241-FABRIC-4 erratum which causes
unexpected behavior in the GIC when multiple transactions are received
simultaneously from different sources. This hardware issue impacts
NVIDIA server platforms that use more than two T241 chips
interconnected. Each chip has support for 320 {E}SPIs.

This issue occurs when multiple packets from different GICs are
incorrectly interleaved at the target chip. The erratum text below
specifies exactly which operations produce multi-transfer packets that
are susceptible to interleaving and GIC state corruption. GIC state
corruption can lead to a range of problems, including kernel panics
and unexpected behavior.

From the erratum text:
  "In some cases, inter-socket AXI4 Stream packets with multiple
  transfers, may be interleaved by the fabric when presented to ARM
  Generic Interrupt Controller. GIC expects all transfers of a packet
  to be delivered without any interleaving.

  The following GICv3 commands may result in multiple transfer packets
  over inter-socket AXI4 Stream interface:
   - Register reads from GICD_I* and GICD_N*
   - Register writes to 64-bit GICD registers other than GICD_IROUTERn*
   - ITS command MOVALL

  Multiple commands in GICv4+ utilize multiple transfer packets,
  including VMOVP, VMOVI, VMAPP, and 64-bit register accesses.

  This issue impacts system configurations with more than 2 sockets,
  that require multi-transfer packets to be sent over inter-socket
  AXI4 Stream interface between GIC instances on different sockets.
  GICv4 cannot be supported. GICv3 SW model can only be supported
  with the workaround. Single and Dual socket configurations are not
  impacted by this issue and support GICv3 and GICv4."

Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf
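
The support matrix in the last paragraph of the erratum text can be restated as a small decision function. This is only an illustrative sketch of the quoted wording, not kernel or firmware code: the names `GicSupport` and `gic_support` are hypothetical, and the socket count is assumed to be known to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GicSupport:
    gicv3: bool             # GICv3 software model usable
    gicv4: bool             # GICv4 features usable
    needs_workaround: bool  # GICv3 usable only with the workaround


def gic_support(sockets: int) -> GicSupport:
    """Restate the erratum's support matrix for a given socket count."""
    if sockets < 1:
        raise ValueError("a system has at least one socket")
    if sockets > 2:
        # More than 2 sockets: GICv4 cannot be supported, and GICv3
        # can only be supported with the workaround.
        return GicSupport(gicv3=True, gicv4=False, needs_workaround=True)
    # Single and dual socket configurations are not impacted.
    return GicSupport(gicv3=True, gicv4=True, needs_workaround=False)
```

For example, `gic_support(4)` reports GICv3 with the workaround required and GICv4 disabled, which matches the "disable GICv4.x features" part of the fix.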

Writing to the chip alias region of the GICD_In{E} registers, except
GICD_ICENABLERn, has the same effect as writing to the global
distributor. The SPI interrupt deactivate path is not impacted by
the erratum.

To fix this problem, implement a workaround that ensures read accesses
to the GICD_In{E} registers are directed to the chip that owns the
SPI, and disable GICv4.x features. To simplify code changes, the
gic_configure_irq() function uses the same alias region for both read
and write operations to GICD_ICFGR.
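
The "chip that owns the SPI" rule can be modelled with simple arithmetic. The sketch below is an assumption-laden illustration, not the code from this patch: it assumes regular SPIs occupy INTIDs 32 to 1019 (as in the GIC architecture) and that each chip owns a consecutive block of 320 of them, following the "320 {E}SPIs" per chip stated above. The names `owning_chip` and `read_target` are hypothetical, and the extended SPI range is deliberately not modelled.

```python
SPI_FIRST = 32        # first regular SPI INTID in the GIC architecture
SPI_LAST = 1019       # last regular SPI INTID in the GIC architecture
SPIS_PER_CHIP = 320   # from the commit message: 320 {E}SPIs per chip


def owning_chip(intid: int) -> int:
    """Return the index of the chip assumed to own a regular SPI."""
    if not SPI_FIRST <= intid <= SPI_LAST:
        # SGIs/PPIs are per-CPU; extended SPIs are outside this sketch.
        raise ValueError(f"INTID {intid} is not a regular SPI")
    return (intid - SPI_FIRST) // SPIS_PER_CHIP


def read_target(intid: int) -> str:
    """Name the (hypothetical) alias region a GICD_In{E} read is sent to."""
    return f"chip{owning_chip(intid)}-alias"
```

Under these assumptions, INTIDs 32 to 351 map to chip 0, 352 to 671 to chip 1, and 672 to 991 to chip 2, so a read of the state of SPI 400 would be directed to the alias region of chip 1 rather than to the global distributor.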

Co-developed-by: Vikram Sethi <[email protected]>
Signed-off-by: Vikram Sethi <[email protected]>
Signed-off-by: Shanker Donthineni <[email protected]>
Acked-by: Sudeep Holla <[email protected]> (for SMCCC/SOC ID bits)
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 35727af2b15d98a2dd2811d631d3a3886111312e)
(tdave: minor conflict in drivers/irqchip/irq-gic-v3.c)
Signed-off-by: Tushar Dave <[email protected]>