zfs-2.3.0-rc4 patchset #16760

behlendorf · 2024-11-15T00:59:47Z

Motivation and Context

Initial proposed patchset for zfs-2.3.0-rc4.

Description

Bug fixes, build fixes, ZTS updates.

How Has This Been Tested?

Clean backports from master. Will be retested by the CI.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

This commit fixes JSON output for zpool list when user properties are requested with -o flag. This case needed to be handled specifically since zpool_prop_to_name does not return property name for user properties, instead it is stored in pl->pl_user_prop. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#16734

In zpool_get_user_prop, when called from zpool_expand_proplist and collect_pool, we often have zpool_props present in zpool_handle_t equal to NULL. This mostly happens when only one user property is requested using zpool list -o <user_property>. Checking for this case and correctly initializing the zpool_props field in zpool_handle_t fixes this issue. Interestingly, this issue does not occur if we query any other property like name or guid along with a user property with -o flag because while accessing properties like guid, zpool_prop_get_int is called which checks for this case specifically and calls zpool_get_all_props. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#16734

mappedread_sf() may allocate pages; if it fails to populate a page can't free it, it needs to ensure that it's placed into a page queue, otherwise it can't be reclaimed until the vnode is destroyed. I think this is quite unlikely to happen in practice, it was noticed by code inspection. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#16643

As a deadlock avoidance measure, zfs_getpages() would only try to acquire a rangelock, falling back to a single-page read if this was not possible. However, this is incompatible with direct I/O. Instead, release the busy lock before trying to acquire the rangelock in blocking mode. This means that it's possible for the page to be replaced, so we have to re-lookup. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#16643

..., before we make the header or the log block visible to others. It should fix assertion on allocated space going negative if the header is freed once the lock is dropped, while the write is still going. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16040 Closes openzfs#16743

dsl_free() calls zio_free() to free the block. For most blocks, this simply calls metaslab_free() without doing any IO or putting anything on the IO pipeline. Some blocks however require additional IO to free. This at least includes gang, dedup and cloned blocks. For those, zio_free() will issue a ZIO_TYPE_FREE IO and return. If a huge number of blocks are being freed all at once, it's possible for dsl_dataset_block_kill() to be called millions of time on a single transaction (eg a 2T object of 128K blocks is 16M blocks). If those are all IO-inducing frees, that then becomes 16M FREE IOs placed on the pipeline. At time of writing, a zio_t is 1280 bytes, so for just one 2T object that requires a 20G allocation of resident memory from the zio_cache. If that can't be satisfied by the kernel, an out-of-memory condition is raised. This would be better handled by improving the cases that the dmu_tx_assign() throttle will handle, or by reducing the overheads required by the IO pipeline, or with a better central facility for freeing blocks. For now, we simply check for the cases that would cause zio_free() to create a FREE IO, and instead put the block on the pool's freelist. This is the same place that blocks from destroyed datasets go, and the async destroy machinery will automatically see them and trickle them out as normal. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#6783 Closes openzfs#16708 Closes openzfs#16722 Closes openzfs#16697

- If we don't want dmu_read_pages() to perform extra readahead/behind, pass a pointer to 0 instead of a null pointer, as dum_read_pages() expects rahead and rbehind to be non-null. - Avoid unneeded iterations in a loop. Sponsored-by: Klara, Inc. Reported-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Mark Johnston <[email protected]> Closes openzfs#16758

Since zvol read and write can process up to (DMU_MAX_ACCESS / 2) bytes in a single operation, the current optimal I/O size is too low. SCST directly reports this value as the optimal transfer length for the target SCSI device. Increasing it from the previous volblocksize results in performance improvement for large block parallel I/O workloads. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes openzfs#16750

This patch fixes compilation with uClibc by applying the same fallback as commit e12d761 to the `getversion.c` file, which was previously overlooked. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: José Luis Salvador Rufo <[email protected]> Closes openzfs#16735 Closes openzfs#16741

Welcome to the party 🎉 Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16762

If we write less than 113 bytes with enabled compression we get embeded block, which then fails check for number of cloned blocks in bclone_test. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16740

We are doing exactly the same checks around all brt_pending_add(). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16740

Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16740

While block cloning operation from the beginning was made per-vdev, before this change most of its data were protected by two pool- wide locks. It created lots of lock contention in many workload. This change makes most of block cloning data structures per-vdev, which allows to lock them separately. The only pool-wide lock now it spa_brt_lock, protecting array of per-vdev pointers and in most cases taken as reader. Also this splits per-vdev locks into three different ones: bv_pending_lock protects the AVL-tree of pending operations in open context, bv_mos_entries_lock protects BRT ZAP object from while being prefetched, and bv_lock protects the rest of per-vdev context during TXG commit process. There should be no functional difference aside of some optimizations. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Pawel Jakub Dawidek <[email protected]> Reviewed-by: Brian Atkinson <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16740

When an OFFLINE device is physically removed, a spare is automatically activated. However, this behavior differs in FreeBSD, where we do not transition from OFFLINE state to REMOVED. Our support team has encountered cases where customers experienced unexpected behavior during drive replacements, with multiple spares activating for the same VDEV due to a single disk replacement. This patch ensures that a drive in an OFFLINE state remains in that state, preventing it from transitioning to REMOVED and being automatically replaced by a spare. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Ameer Hamza <[email protected]> Closes openzfs#16751

Signed-off-by: Brian Behlendorf <[email protected]>

snajpa · 2024-11-20T19:31:12Z

We've had to update the largest production node we have with Linux 6.11, so I took this PR and applied it onto rc3 along with my latest changes (the three PRs altogether) - so far so good :)

behlendorf · 2024-11-21T15:53:13Z

Replaced by #16794.

usaleem-ix and others added 9 commits November 14, 2024 16:51

behlendorf changed the title ~~zfs-2.3.0-rc3 patchset~~ zfs-2.3.0-rc4 patchset Nov 15, 2024

robn and others added 7 commits November 15, 2024 15:08

AUTHORS: refresh with recent new contributors

18474ef

Welcome to the party 🎉 Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#16762

Tag 2.3.0-rc4

7a67c61

Signed-off-by: Brian Behlendorf <[email protected]>

behlendorf force-pushed the zfs-2.2.0-rc4-staging branch from 3ab1d1a to 7a67c61 Compare November 15, 2024 23:09

satmandu mentioned this pull request Nov 19, 2024

Linux: Fix zfs_prune panics #16770

Merged

13 tasks

behlendorf closed this Nov 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zfs-2.3.0-rc4 patchset #16760

zfs-2.3.0-rc4 patchset #16760

behlendorf commented Nov 15, 2024

snajpa commented Nov 20, 2024

behlendorf commented Nov 21, 2024

zfs-2.3.0-rc4 patchset #16760

zfs-2.3.0-rc4 patchset #16760

Conversation

behlendorf commented Nov 15, 2024

Motivation and Context

Description

How Has This Been Tested?

Types of changes

snajpa commented Nov 20, 2024

behlendorf commented Nov 21, 2024