Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caprevoke merge though 2023-06-16 #1738

Merged
merged 1,654 commits into from
Jul 7, 2023
Merged

Conversation

brooksdavis
Copy link
Member

PR for CI.

Merge up to the merge of FreeBSD as of April 14th.

np-2020 and others added 30 commits June 15, 2023 13:53
This is catch up with efe5885.

MFC after:	1 week
Sponsored by:	Chelsio Communications
Each physical port has an associated loopback tx channel and anything
transmitted over that channel by the driver is looped back internally by
the hardware as if received on that physical port.  This change allows
tracing filters to be installed in this loopback path.

MFC after:	1 week
Sponsored by:	Chelsio Communications
Use lo<n> to tap the loopback for port <n>.

MFC after:	1 week
Sponsored by:	Chelsio Communications
Heimdal's lib/hdb/db3.c is only built if DB3 is enabled, i.e. #if HAVE_DB3.
FreeBSD's bdb is DB1. Therefore the entire db3.c file is #ifdef'd out.
Let's avoid building a file that results in a useless object file.

MFC after:	1 week
Since 8116724 the size of struct pfs_node is 280 bytes, so the kernel
memory allocator takes memory from 384 bytes sized bucket. However, the
length of the node name is mostly short, e.g., for Linux emulation layer
it is up to 16 bytes. The size of struct pfs_node w/o pfs_name is 152
bytes, i.e., we have 104 bytes left to fit the node name into the 256
bytes-sized bucket.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39381
MFC after:		1 month
This will be used later in the linsysfs module to filter out VNETs.

Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39382
MFC after:		1 month
Reviewed by:		des
Differential revision:	https://reviews.freebsd.org/D39383
MFC after:		1 month
CID:		1506956
MFC after:	2 weeks
Use already-existing RTM_F_PREFIX rtm_flag to indicate that the
 request assumes exact-prefix lookup instead of the
 longest-prefix-match.

MFC after:	2 weeks
Copy-paste mistake.

Reported by:	Alastair Hogge <[email protected]>
Fixes:	f1d7ae3 ("linuxkpi: Add hdmi helpers")
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilies such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it still possible have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilies by buidling the world with `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
From static code analysis, some device drivers (cxgbe, mlx4, mthca, and qlnx)
do not enter net epoch before lagg_input_infiniband(). If IPoIB interface is a
member of lagg(4) interface, and after returning from lagg_input_infiniband()
the receiving interface of mbuf is set to lagg(4) interface, then when
concurrently destroying the lagg(4) interface, there is a small window that the
interface gets destroyed and becomes invalid before infiniband_input() re-enter
net epoch, thus leading use-after-free.

Widen NET_EPOCH coverage to prevent use-after-free.

Thanks hselasky@ for testing with mlx5 devices.

Reviewed by:	hselasky
Tested by:	hselasky
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39275
Different lagg protocols have different means and policies to process incoming
traffic. For example, for failover protocol, by default received traffic is only
accepted when they are received through the active port. For lacp protocol, LACP
control messages are tapped off, also traffic will be dropped if they are
received through the port which is not in collecting state or is not joined to
the active aggregator. It confuses if user dump and see inbound traffic on
lagg(4) interfaces but they are actually silently dropped and not passed into
the net stack.

Tap traffic after protocol processing so that user will have consistent view of
the inbound traffic, meanwhile mbuf is set with correct receiving interface and
bpf(4) will diagnose the right direction of inbound packets.

PR:		270417
Reviewed by:	melifaro (previous version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39225
VLAN identifier 0xFFF is reserved. It must not be configured or
transmitted.

Also validate during parsing to prevent potential integer overflow.

Reviewed by:	#network, melifaro
Fixes:		c7cffd6 Add support for stacked VLANs (IEEE 802.1ad, AKA Q-in-Q)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D39282
Remove two extra spaces.  No functional change.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Handle if arrival/departure events and VNETs.

Differential Revision:	https://reviews.freebsd.org/D38901
MFC after:		1 month
XMFC with:		ifAPI, pseudofs
Add VM_MEMATTR_DEVICE_NP to the arm64 vm.pmap.kernel_maps sysctl.

Reviewed by:	markj
Sponsored by:	Arm Ltd
 Differential Revision:	https://reviews.freebsd.org/D39371
Submitted by:	Tyuryukanov S.Y.
Notable upstream pull request merges:
  #12194 Fix short-lived txg caused by autotrim
  #13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
  #13392 Implementation of block cloning for ZFS
  #13741 SHA2 reworking and API for iterating over multiple implementations
  #14282 Sync thread should avoid holding the spa config write lock
         when possible
  #14283 txg_sync should handle write errors in ZIL
  #14359 More adaptive ARC eviction
  #14469 Fix NULL pointer dereference in zio_ready()
  #14479 zfs redact fails when dnodesize=auto
  #14496 improve error message of zfs redact
  #14500 Skip memory allocation when compressing holes
  #14501 FreeBSD: don't verify recycled vnode for zfs control directory
  #14502 partially revert PR 14304 (eee9362)
  #14509 Fix per-jail zfs.mount_snapshot setting
  #14514 Fix data race between zil_commit() and zil_suspend()
  #14516 System-wide speculative prefetch limit
  #14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
  #14519 Do not hold spa_config in ZIL while blocked on IO
  #14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
  #14524 Ignore too large stack in case of dsl_deadlist_merge
  #14526 Use .section .rodata instead of .rodata on FreeBSD
  #14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
  #14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
  #14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
  #14544 icp: Prevent compilers from optimizing away memset()
         in gcm_clear_ctx()
  #14546 Revert zfeature_active() to static
  #14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
         patch
  #14563 Optimize the is_l2cacheable functions
  #14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
  #14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
  #14567 spl: Add cmn_err_once() to log a message only on the first call
  #14568 Fix incremental receive silently failing for recursive sends
  #14569 Restore ASMABI and other Unify work
  #14576 Fix detection of IBM Power8 machines (ISA 2.07)
  #14577 Better handling for future crypto parameters
  #14600 zcommon: Refactor FPU state handling in fletcher4
  #14603 Fix prefetching of indirect blocks while destroying
  #14633 Fixes in persistent error log
  #14639 FreeBSD: Remove extra arc_reduce_target_size() call
  #14641 Additional limits on hole reporting
  #14649 Drop lying to the compiler in the fletcher4 code
  #14652 panic loop when removing slog device
  #14653 Update vdev state for spare vdev
  #14655 Fix cloning into already dirty dbufs
  #14678 Revert "Do not hold spa_config in ZIL while blocked on IO"

Obtained from:	OpenZFS
OpenZFS commit:	431083f
The latest import of openzfs broke the hacks that we used to omit the
special registers being used on arm64. Add snarky note documenting this
situation since it's a mess now since the hack was only partially
undone, leaving behind a mess.

Sponsored by:		Netflix

(cherry picked from commit 238271f)
Add one ifdef to upstrem code and get rid of compiling the horrible
checked-in aarch64 assembler for the boot loader that the loader will
never use. I'll attempt to upstream this and adjust as needed.

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D39897

(cherry picked from commit 6c8358c)
subrepo:
  subdir:   "sys/contrib/subrepo-openzfs"
  merged:   "a9089cc01265"
upstream:
  origin:   "https://github.com/CTSRD-CHERI/zfs.git"
  branch:   "cheri-hybrid"
  commit:   "a9089cc01265"
git-subrepo:
  version:  "0.4.3"
  origin:   "???"
  commit:   "???"
See the comment inside.

Reviewed by:	dim
Differential Revision:	https://reviews.freebsd.org/D39389
Reviewed by:	ae
Sponsored by:	Nvidia networking
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39393
Sponsored by:	Nvidia networking
MFC after:	1 week
ehem and others added 28 commits June 16, 2023 14:36
Keeping released xenisrcs in a known state simplifies allocation, but
forces the allocation function to maintain that state.  This turns into
a problem when trying to allow for interchangeable allocation functions.
Fix this issue by ensuring xenisrcs are always *fully* initialized
during binding.

Reviewed by: royger
xen_intr_handle_upcall() has two interfaces.  It needs to be called by
the x86 assembly code invoked by the APIC.  Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.

Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface.  Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().

When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled.  This removes the need for
critical_enter()/critical_exit() when called this way.

The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.

Additionally driver_filter_t functions have no need to handle interrupt
counters.  The intrcnt_add() calling function was reworked to match the
current situation.  intrcnt_add() is now only called via one path.

The increment/decrement of curthread->td_intr_nesting_level had
previously been left out.  Appears this was mostly harmless, but this
was noticed during implementation and has been added.

CONFIG_X86 is a leftover from use with Linux.  While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.

Copyright note.  xen_intr_intrcnt_add() was introduced at 76acc41
by Justin T. Gibbs.  xen_intrcnt_init() was introduced at fd036de
by John Baldwin.

sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface.  Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later.  As such the filename and header list
belong to Julien Grall, but what those were created for is later.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header.  A
similar situation exists for the NR_EVENT_CHANNELS constant.

Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.

This was originally implemented by Julien Grall, but has been heavily
modified.  The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around.  This allows the
architecture to add function definitions which use struct xenisrc.

The original version only moved xi_intsrc into xen_arch_isrc_t.  Moving
xi_vector was done by the submitter.

The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t.  Those disappeared with the removal of PVHv1 support.

Copyright note.  The current xenisrc structure was introduced at
76acc41 by Justin T. Gibbs.  Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.

Reviewed by: royger
Submitted by: Elliott Mitchell <[email protected]>, 2021-03-17 19:09:01
Original implementation: Julien Grall <[email protected]>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
 - Adjust some line lengths
 - Fix comment about NR_EVENT_CHANNELS after movement.
 - Use #include instead of symlinks.
The evtchn_type enum is only touched by the Xen interrupt code.  Other
event channel uses no longer need the value, so that has been moved to
restrict its use.

Copyright note.  The current evtchn_type was introduced at 76acc41
by Justin T. Gibbs.  This in turn appears to have been heavily inspired
by 30d1eef done by Kip Macy.

Reviewed by: royger
Scanning the list of interrupts to find an unused entry is rather
inefficient.  Instead overlay a free list structure and use a list
instead.

This also has the useful effect of removing the last use of evtchn_type
values outside of xen_intr.c.

Reviewed by: royger
[royger]
 - Make avail_list static.
This value doesn't need to be set in xen_intr_alloc_isrc().  What is
needed is simply to ensure the allocated xenisrc won't appear as free,
even if xi_type is written non-atomically.  Since the type is no longer
used to indicate free or not, the calling function should take care of
all non-architecture initialization.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31188
The x86 PIC interface is very much x86-specific and not used by other
architectures.  Since most of xen_intr.c can be shared with other
architectures, the PIC interface needs to be broken off.

Introduce wrappers for calls into the architecture-dependent interrupt
layer.  All architectures need roughly the same functionality, but the
interface is slightly different between architectures.  Due to the
wrappers being so thin, all of them are implemented as inline in
arch-intr.h.

The original implementation was done by Julien Grall in 2015, but this
has required major updating.

Removal of PVHv1 meant substantial portions disappeared.  The original
implementation took care of moving interrupt allocation to
xen_arch_intr.c, but this has required massive rework and was broken
off.

In the original implementation the wrappers were normal functions.  Some
had empty stubs in xen_intr.c and were removed.

Reviewed by: royger
Submitted by: Elliott Mitchell <[email protected]>
Original implementation: Julien Grall <[email protected]>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30909
Simply moving the interrupt allocation and release functions into files
which belong to the architecture.  Since x86 interrupt handling is quite
distinct from other architectures, this is a crucial necessary step.

Identifying the border between x86 and architecture-independent is
actually quite tricky.  Similarly, getting the prototypes for the
border right is also quite tricky.

Inspired by the work of Julien Grall <[email protected]>,
2015-10-20 09:14:56, but heavily adjusted.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30936
The event channel source code or equivalent is needed on all
architectures.  Since much of this is viable to share, get this moved out
of x86-land.  Each interrupt interface then needs a distinct back-end
implementation.

Reviewed by: royger
Submitted by: Elliott Mitchell <[email protected]>
Original implementation: Julien Grall <[email protected]>, 2014-01-13 17:41:04
Differential Revision: https://reviews.freebsd.org/D30236
The xen_domain_type and HYPERVISOR_shared_info variables are shared by
all Xen architectures, so they should be in common rather than
reimplemented by each architecture.

hvm_start_flags is used by xen_initial_domain() and so needs to be in
common.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28982
Sponsored by:	Zenith Electronics LLC
Sponsored by:	Klara, Inc.
Current code checks whether or not txstreams are equal to rxstreams and if it
isn't - sets needed bits in "Transmit MCS Set". But if they are equal it sets
whole set to zero, which contradicts the standard, if tx and rx streams are
equal 'Tx MCS Set Defined' (table 9-186, IEEE Std 802.11-2020) must be set to
one.

Reviewed by:		bz
Approved by:		bz
Sponsored by:		Serenity Cybersecurity, LLC
Differential Revision:	https://reviews.freebsd.org/D39476
RX MCS set defines which MCSs are supported for RX, bits 0-31 are for equal
modulation of the streams, bits 33-76 are for unequal case. Current code checks
txstreams variable instead of rxstreams to set bits from 53 to 76 for 4 spatial
streams case.

The modulations are defined in tables 19-38 and 19-41 of the IEEE Std
802.11-2020.

Spotted by bz in https://reviews.freebsd.org/D39476

Reviewed by:		bz
Approved by:		bz
Sponsored by:		Serenity Cybersecurity, LLC
Differential Revision:	https://reviews.freebsd.org/D39568
This changes intends to reduce the bar to the kernel unit-testing by
 introducing a new kernel-testing framework ("ktest") based on Netlink,
 loadable test modules and python test suite integration.

This framework provides the following features:
* Integration to the FreeBSD test suite
* Automatic test discovery
* Automatic test module loading
* Minimal boiler-plate code in both kernel and userland
* Passing any metadata to the test
* Convenient environment pre-setup using python testing framework
* Streaming messages from the kernel to the userland
* Running tests in the dedicated taskqueues
* Skipping or parametrizing tests

Differential Revision: https://reviews.freebsd.org/D39385
MFC after:	2 weeks
Current Netlink message writer code relies on executing callbacks
 with arbitrary data (pointer or integer) to flush the completed
 messages.
This arbitrary data is stored as a union of { void *, uint64_t }.
At some stage, the message flushing code copied this data, using
 direct uint64_t assignment instead of copying the union. It lead
 to failure on CHERI, as sizeof(pointer) == 16 there.

Fix the code by making union non-anonymous and copying it entirely.

Reviewed by:	br, jhb, jrtc27
Differential Revision: https://reviews.freebsd.org/D39557
MFC after:	2 weeks
Reported by:	phk
Reviewed by:	imp, markj
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39573
Its possible to induce a crash in either rack or bbr. This would be done
if the rack stack were say the default and bbr was being used by a connection.
If the bbr stack is then unloaded and it was active, we will trigger a MPASS assert
in tcp_hpts since the new stack (default rack) would start a timer, and the old stack
(bbr) would have the inp already in hpts.

Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39576
RLIMIT_CPU applies to CPU time, not real (wall-clock) time.
This test failed in AWS, where the real time was 5-7 seconds.

Sum the user and system CPU time used, and validate that.

While I'm here, don't bother specifying -s exit:0 or -e empty,
since those are checked by default.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact.  Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.

For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().

Reported by:	andrew
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39556
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
The current content of local.trust.mk is mostly for example
purposes.
@brooksdavis brooksdavis merged commit 17f197f into caprevoke Jul 7, 2023
14 checks passed
@brooksdavis brooksdavis deleted the caprevoke-merge-though-20230616 branch July 7, 2023 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.