`btrfs scrub start -r` tries to write data unless mounted read-only #934

m0gg · 2024-12-21T11:53:29Z

This happened to me while checking a recovered md RAID in read-only mode.
System information:

# btrfs --version
btrfs-progs v6.12
-EXPERIMENTAL -INJECT -STATIC +LZO +ZSTD +UDEV +FSVERITY +ZONED CRYPTO=builtin

# uname -a
Linux <redacted> 6.12.5-gentoo-dist #1 SMP PREEMPT_DYNAMIC Sun Dec 15 03:17:02 -00 2024 x86_64 Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz GenuineIntel GNU/Linux

This lsblk snippet shows the block device layers:

NAME                        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0                         7:0    0   4,5T  0 loop  
└─md127                       9:127  0  13,6T  1 raid5 
  ├─vg--archive-data--crypt 253:0    0     4T  0 lvm   
  │ └─data                  253:3    0     4T  0 crypt /run/media/system/dm-3

Note that md127 was started in read-only mode.

When running `btrfs scrub start -r` on the filesystem on data (mounted rw), the kernel reports attempted writes to the read-only device md127 after about 10G of scrubbed data:

[174366.203678] BTRFS info (device dm-3): first mount of filesystem e18f0c40-88de-413f-9d7e-dcc8136ad6dd
[174366.203691] BTRFS info (device dm-3): using crc32c (crc32c-intel) checksum algorithm
[174366.203696] BTRFS info (device dm-3): using free-space-tree
[174441.289198] BTRFS info (device dm-3): scrub: started on devid 1
[174475.439500] Trying to write to read-only block-device md127
[174475.439546] btrfs_dev_stat_inc_and_print: 362 callbacks suppressed
[174475.439554] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[174475.439588] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[174475.439610] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[174475.439657] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[174475.439693] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
[174475.439722] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
[174475.439758] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
[174475.439787] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[174475.439815] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
[174475.439852] BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
[174475.445886] BTRFS: error (device dm-3) in btrfs_commit_transaction:2523: errno=-5 IO failure (Error while writing out transaction)
[174475.445915] BTRFS info (device dm-3 state E): forced readonly
[174475.445927] BTRFS warning (device dm-3 state E): Skipping commit of aborted transaction.
[174475.445938] BTRFS error (device dm-3 state EA): Transaction aborted (error -5)
[174475.445948] BTRFS: error (device dm-3 state EA) in cleanup_transaction:2017: errno=-5 IO failure
[174475.446157] BTRFS warning (device dm-3 state EA): failed setting block group ro: -5
[174475.446192] BTRFS info (device dm-3 state EA): scrub: not finished on devid 1 with status: -5

Everything's fine when mounted ro.
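
For anyone scripting this, a minimal sketch of checking the mount state before starting a scrub might look like the following. It assumes the usual `/proc/self/mounts` layout (device, mount point, fs type, comma-separated options). The helper name `mount_is_read_only` and the sample mount line are made up for illustration, and the check only looks at mount options, not at the state of the underlying block devices.

```python
def mount_is_read_only(mountpoint, mounts_text):
    """Return True/False for the last mount at `mountpoint`, or None if absent.

    `mounts_text` is expected to look like /proc/self/mounts:
    device, mount point, fs type, options, dump, pass (whitespace separated).
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        if fields[1] == mountpoint:
            # Later entries shadow earlier ones, so keep overwriting.
            result = "ro" in fields[3].split(",")
    return result


# Hypothetical sample line; the mount options are assumed, not taken from the report.
SAMPLE_MOUNTS = "/dev/mapper/data /run/media/system/dm-3 btrfs rw,relatime 0 0\n"

if __name__ == "__main__":
    state = mount_is_read_only("/run/media/system/dm-3", SAMPLE_MOUNTS)
    print("read-only mount" if state else "not a read-only mount")
```

On a live system the second argument would be the contents of `/proc/self/mounts` instead of the sample string.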

Forza-tng · 2024-12-21T14:34:44Z

It is expected that Btrfs tries to write to the block devices, even when mounting ro (log replay, etc). I do not think btrfs can run on a ro block device.

m0gg · 2024-12-21T14:49:42Z

> It is expected that Btrfs tries to write to the block devices, even when mounting ro (log replay, etc). I do not think btrfs can run on a ro block device.

The man page, btrfs-scrub(8), says this about the -r flag:

> run in read-only mode, do not attempt to correct
> anything, can be run on a read-only filesystem

As I wrote, everything's fine when mounted ro. No complaints about writes to an ro device.

Zygo · 2024-12-21T15:22:51Z

There are multiple agents here. The documentation could be clearer.

The scrub is read-only, i.e. errors found in blocks that are read and verified by the scrub ioctl are not corrected.

The filesystem is read-write. Errors have been found while running the scrub, so the device stats are incremented. These updates to the device stats items will be committed in the next transaction, which is what failed in the logs above.

Also, scrub reads the filesystem metadata trees in order to get device maps, extent maps, and data csums for verification. If any of these reads fail, the filesystem will attempt to correct these pages on disk by writing the correct data over the incorrect data.

If any other process reads the filesystem while the scrub is running, the other process is not affected by the -r flag on scrub. If those reads encounter correctable errors, the filesystem will attempt to correct the data and overwrite bad blocks.

Try it with the preferred metadata patches and set up data-only and metadata-only drives. You should see that scrub -r will never write to a data-only drive.

m0gg · 2024-12-21T15:36:32Z

That's what I guessed too after finding out I forgot to mount ro the first time. A process running with an ro option causing writes was still scary enough for me to report it.

> The documentation could be clearer.

I agree. While this might be a corner case, I still think it should be noted that the fs itself could still try to fix things on its own.

adam900710 · 2024-12-21T21:38:28Z

Firstly, if scrub finds no errors, it should not trigger any writes to the fs. So even if the target block device is RO, as long as no data/metadata/superblock errors are found, scrub itself will not trigger a write.

According to your output, scrub has found no errors so far, so the write was not triggered by scrub itself.

The direct cause is that a transaction needed to be committed, and the commit failed.

The root cause is that scrub works on commit roots, so to avoid writing to and scrubbing the same block group at the same time, we mark the current scrub target block group as read-only.

But marking a block group read-only needs to start a transaction and may even force a chunk allocation, which joins or starts a new transaction and causes new metadata to be created and written back.
That writeback triggered the error.

That's why scrub provides a read-only mode, which will not try to allocate a chunk (i.e. update the metadata) during scrub.

As for why even a RW scrub is fine when your fs is mounted RO:

That's because btrfs_inc_block_group_ro(), the function used by scrub, automatically avoids chunk allocation if the fs is already mounted RO. So even for a RW scrub, as long as no errors are found, everything is fine.

So there is nothing special here and nothing related to any patchset; it's just a corner case of the scrub implementation.
The overall rules are:

- RW scrub on RW fs
  High chance to write to the fs, no matter if errors are found.
- RW scrub on RO fs
  If no errors found, it's the same as RO scrub
- RO scrub on RO fs
  Purely RO.
- RO scrub on RW fs
  Scrub itself will not cause any write by itself.

And your report matches the first RW scrub on RW fs case, thus write is expected.
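
As a rough illustration, the four rules listed in this comment can be written down as a small lookup table. This only encodes the rules as stated here (the replies that follow dispute the RO-scrub-on-RW-fs entry); the names `EXPECTED_WRITES` and `expected_writes` are hypothetical and not part of btrfs-progs or the kernel.

```python
# Keys are (scrub_mode, fs_mount_mode); values paraphrase the rules in the comment.
EXPECTED_WRITES = {
    ("rw", "rw"): "high chance of writes, whether or not errors are found",
    ("rw", "ro"): "same as RO scrub if no errors are found",
    ("ro", "ro"): "purely read-only",
    ("ro", "rw"): "scrub itself should not cause any write",
}


def expected_writes(scrub_mode, fs_mode):
    """Look up the stated expectation for a scrub mode / mount mode pair."""
    key = (scrub_mode.lower(), fs_mode.lower())
    if key not in EXPECTED_WRITES:
        raise ValueError(f"unknown combination: {key}")
    return EXPECTED_WRITES[key]


if __name__ == "__main__":
    for (scrub, fs), rule in EXPECTED_WRITES.items():
        print(f"{scrub.upper()} scrub on {fs.upper()} fs: {rule}")
```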

m0gg · 2024-12-21T22:10:21Z

> And your report matches the first RW scrub on RW fs case, thus write is expected.

That statement is not true. I clearly stated that I started an RO scrub on an RW fs which resides on an RO device.

Worth mentioning:
I successfully copied all of the FS contents in that setup without triggering the error. Only the scrub (or any intentional write operation) would trigger it.

Since you already closed this issue, I guess you do not consider "RO scrub may cause writes to the underlying device unless mounted RO" worth noting?

adam900710 · 2024-12-21T22:42:34Z

OK, the problem is in btrfs_inc_block_group_ro(), which doesn't really honor the scrub RO, but only the fs RO flag.

Thus a RO scrub will trigger a transaction on a RW-mounted fs.

I can add an extra check to avoid this. Although on such RW mounted fs, you may hit -ENOSPC if there is not much space left.

m0gg · 2024-12-21T22:58:38Z

> which doesn't really honor the scrub RO, but only the fs RO flag

This sounds unintentional and IMHO deserves to be fixed. Thank you very much!

> Although on such RW mounted fs, you may hit -ENOSPC if there is not much space left.

This seems like a very minor inconvenience.

adam900710 · 2024-12-21T23:37:15Z

Unfortunately, the code does not make it easy to handle a RO scrub on a RW mount:

- We have to start a transaction
  This ensures there are no conflicts between marking the block group RO and writing back the target block group.
  Thus we hold a transaction handle to prevent the current transaction from being committed until we lock the ro_block_group_mutex.
- We will still update the super blocks even if the current transaction is empty

So even if we skip the chunk allocation part, we will still have an empty transaction to commit and have to update the super block.

But if we skip holding a transaction and continue, there is a chance of conflicting with writeback and corrupting the target block group.
The best solution is to make btrfs detect an empty transaction and skip it entirely (i.e. no writes at all), but that will require quite a few changes.

I'd go with a doc update for now, to warn about the modification to the fs.

[BUG]
There is a bug report that read-only scrub on a read-write fs still causes writes into the fs, and that will be caught if there is a read-only block device among the storage stack.

This will cause a kernel warning on failed transaction commit:

  BTRFS info (device dm-3): first mount of filesystem e18f0c40-88de-413f-9d7e-dcc8136ad6dd
  BTRFS info (device dm-3): using crc32c (crc32c-intel) checksum algorithm
  BTRFS info (device dm-3): using free-space-tree
  BTRFS info (device dm-3): scrub: started on devid 1
  Trying to write to read-only block-device md127
  btrfs_dev_stat_inc_and_print: 362 callbacks suppressed
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
  BTRFS: error (device dm-3) in btrfs_commit_transaction:2523: errno=-5 IO failure (Error while writing out transaction)
  BTRFS info (device dm-3 state E): forced readonly
  BTRFS warning (device dm-3 state E): Skipping commit of aborted transaction.
  BTRFS error (device dm-3 state EA): Transaction aborted (error -5)
  BTRFS: error (device dm-3 state EA) in cleanup_transaction:2017: errno=-5 IO failure
  BTRFS warning (device dm-3 state EA): failed setting block group ro: -5
  BTRFS info (device dm-3 state EA): scrub: not finished on devid 1 with status: -5

[CAUSE]
The root cause is inside btrfs_inc_block_group_ro(), where we need to hold a transaction handle, to prevent the transaction to be committed, until we hold ro_block_group_mutex.

This will cause an empty transaction by itself, thus even if we can mark the block group read-only without any extra workload, we still need to commit the new and empty transaction.

Unfortunately this means RO scrub on RW filesystem will always cause the fs to be updated.

[FIX]
The best fix is to make btrfs to avoid empty commit transaction, but even with that done, read-only scrub on rw mount can still cause real metadata updates (e.g. allocate new chunks and update device error statistics).

It will be very complex to make read-only scrub to be fully read-only on a read-write btrfs.

Thankfully read-only scrub on read-write mount with read-only device in the storage stack is pretty rare, thus a documentation update should be enough.

Issue: kdave#934
Signed-off-by: Qu Wenruo <[email protected]>
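
To make the cause easier to follow, here is a toy model of the behaviour described above. It is not kernel code: `ToyDevice`, `ToyFilesystem` and their methods are invented for illustration, and the model simply assumes that a read-write mount always commits a transaction (writing at least the super block) when a block group is marked read-only, while a read-only mount skips the transaction entirely.

```python
class ReadOnlyDeviceError(Exception):
    """Raised when the toy device receives a write while it is read-only."""


class ToyDevice:
    def __init__(self, read_only):
        self.read_only = read_only
        self.writes = []  # record of everything written, for inspection

    def write(self, what):
        if self.read_only:
            raise ReadOnlyDeviceError(f"write of {what!r} to read-only device")
        self.writes.append(what)


class ToyFilesystem:
    def __init__(self, device, mounted_ro):
        self.device = device
        self.mounted_ro = mounted_ro
        self.dirty_metadata = []

    def commit_transaction(self):
        # Write back whatever metadata was dirtied in this transaction.
        for item in self.dirty_metadata:
            self.device.write(item)
        self.dirty_metadata.clear()
        # The super block is updated even when the transaction was empty.
        self.device.write("superblock")

    def inc_block_group_ro(self, force_chunk_alloc=False):
        if self.mounted_ro:
            return  # assumed: RO mount marks the block group without a transaction
        if force_chunk_alloc:
            self.dirty_metadata.append("chunk item")
        # RW mount: a transaction handle was held, so a commit follows.
        self.commit_transaction()


if __name__ == "__main__":
    fs = ToyFilesystem(ToyDevice(read_only=True), mounted_ro=False)
    try:
        fs.inc_block_group_ro()
    except ReadOnlyDeviceError as exc:
        print("RW mount on RO device:", exc)
```

In this model, the same call on a filesystem created with `mounted_ro=True` returns without touching the device, which is consistent with the report that everything is fine when mounted ro.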

adam900710 closed this as completed Dec 21, 2024

adam900710 reopened this Dec 21, 2024

adam900710 mentioned this issue Dec 22, 2024

btrfs-progs: docs: extra notes about read-only scrub on read-write fs #935

Open
