blktests zbd/009 failure #150

Open
yizhanglinux opened this issue Nov 19, 2024 · 6 comments

Comments

@yizhanglinux
Contributor

Recently I found that zbd/009 always fails; after reverting commit [1], the test passes.
The test fails with "No space left on device" (see [2]), but df -h shows that there is still plenty of free space on the disk. Could you take a look when you have a chance? Thanks.

[1]
951ad82

[2]

zbd/009 (test gap zone support with BTRFS)                   [failed]
    runtime  9.681s  ...  9.705s
    --- tests/zbd/009.out	2024-11-19 02:15:02.202488258 +0000
    +++ /root/blktests/results/nodev/zbd/009.out.bad	2024-11-19 05:59:26.815119782 +0000
    @@ -1,2 +1,2 @@
     Running zbd/009
    -Test complete
    +Test failed

# cat results/nodev/zbd/009.full
btrfs-progs v6.11
See https://btrfs.readthedocs.io for more information.

Resetting device zones /dev/sda (256 zones) ...
NOTE: several default settings have changed in version 5.15, please make sure
      this does not affect your deployments:
      - DUP for metadata (-m dup)
      - enabled no-holes (-O no-holes)
      - enabled free-space-tree (-R free-space-tree)

Label:              (null)
UUID:               88a4aefd-8be9-4e0c-9ee2-ad1a3f20bee6
Node size:          16384
Sector size:        4096	(CPU page size: 4096)
Filesystem size:    1.00GiB
Block group profiles:
  Data:             single            4.00MiB
  Metadata:         DUP               4.00MiB
  System:           DUP               4.00MiB
SSD detected:       yes
Zoned device:       yes
  Zone size:        4.00MiB
Features:           extref, skinny-metadata, no-holes, free-space-tree, zoned
Checksum:           crc32c
Number of devices:  1
Devices:
   ID        SIZE  ZONES  PATH
    1     1.00GiB    256  /dev/sda

fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=901120, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=860160, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=425984, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=131072, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=61440, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=331776, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=77824, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=163840, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=69632, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=999424, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=548864, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=278528, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=770048, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=311296, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=327680, buflen=4096
fio: io_u error on file /root/blktests/results/tmpdir.zbd.009.vr4/mnt/verify.0.0: No space left on device: write offset=1003520, buflen=4096
fio exited with status 1
fio: verification read phase will never start because write phase uses all of runtime
4;fio-3.37;verify;0;28;715776;353819;88454;2023;3;531;9.074568;5.894974;1;1885;164.221348;60.053806;1.000000%=28;5.000000%=87;10.000000%=99;20.000000%=116;30.000000%=136;40.000000%=158;50.000000%=183;60.000000%=191;70.000000%=191;80.000000%=193;90.000000%=199;95.000000%=207;99.000000%=305;99.500000%=452;99.900000%=815;99.950000%=897;99.990000%=1335;0%=0;0%=0;0%=0;12;1889;173.295916;61.492700;0;0;0.000000%;0.000000;0.000000;716160;120140;30037;5961;5;11722;26.838560;62.466622;0;12975;486.421405;275.659557;1.000000%=87;5.000000%=175;10.000000%=218;20.000000%=309;30.000000%=374;40.000000%=403;50.000000%=456;60.000000%=514;70.000000%=552;80.000000%=610;90.000000%=741;95.000000%=905;99.000000%=1302;99.500000%=1531;99.900000%=2244;99.950000%=2539;99.990000%=8716;0%=0;0%=0;0%=0;18;12982;513.208844;280.993445;71152;93453;74.503340%;89508.312500;5379.433821;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;8.401152%;33.078753%;248378;0;21;0.4%;0.8%;1.6%;3.1%;94.1%;0.0%;0.0%;0.18%;0.04%;0.01%;0.18%;0.69%;4.46%;49.69%;23.09%;16.71%;3.18%;1.69%;0.06%;0.01%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%

# df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/sda                                              1.0G  7.1M  982M   1% /root/blktests/results/tmpdir.zbd.009.pAG/mnt
@kawasaki
Collaborator

@yizhanglinux Thanks for the report. This failure is interesting: I have never seen it before, and the symptom looks odd.

A question: when you revert 951ad82, do you still see the fio io_u errors in the full file? My guess is that the errors are reported in the full file regardless of the revert. If this guess is correct, the revert just hides the failure.

For further debugging, a more detailed fio report is needed. Could you apply the change below to the common/fio file and then run zbd/009? With this change, the full file will record a detailed fio debug log.

diff --git a/common/fio b/common/fio
index b9ea087..f7a0c41 100644
--- a/common/fio
+++ b/common/fio
@@ -174,7 +174,7 @@ _fio_perf() {
 # passed --runtime will override the configured $TIMEOUT, which is useful for
 # tests that should run for a specific amount of time.
 _run_fio() {
-       local args=("--output=$TMPDIR/fio_perf" "--output-format=terse" "--terse-version=4" "--group_reporting=1")
+       local args=("--group_reporting=1" "--debug=io,verify")
 
        if [[ "${TIMEOUT:-}" ]]; then
                args+=("--runtime=$TIMEOUT")

@yizhanglinux
Contributor Author

yizhanglinux commented Nov 20, 2024

Yes, the "fio: io_u error" messages can still be seen in the full file after the revert.

Since 009.full is so large, I attached a file that contains its last 1000 lines. Please take a look.

# du -sh results/nodev/zbd/009.full
254M	results/nodev/zbd/009.full

009.txt
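
Because the full file is too large to read by eye, a small helper can condense the ENOSPC errors into one line. The following is only a sketch: the function name `summarize_enospc` is hypothetical, and it assumes the error lines keep the `fio: io_u error ... No space left on device: write offset=N` shape shown in the original report.

```shell
# Hypothetical helper (not part of blktests or fio): count the fio io_u
# errors that report ENOSPC and show the lowest and highest write offset.
summarize_enospc() {
	local log="$1"

	# Keep only io_u error lines that mention ENOSPC, then extract the
	# number that follows "offset=" and track count, minimum and maximum.
	grep 'io_u error' "$log" | grep 'No space left on device' |
	awk '
		match($0, /offset=[0-9]+/) {
			# "offset=" is 7 characters long; skip it to get the number.
			off = substr($0, RSTART + 7, RLENGTH - 7) + 0
			if (n == 0 || off < min) min = off
			if (n == 0 || off > max) max = off
			n++
		}
		END { printf "errors=%d min_offset=%d max_offset=%d\n", n, min, max }
	'
}
```

It could be run as `summarize_enospc results/nodev/zbd/009.full`. A log without matching lines yields `errors=0 min_offset=0 max_offset=0`.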

@yizhanglinux
Contributor Author

Added the CKI tracking issue:
https://datawarehouse.cki-project.org/issue/3257

@kawasaki
Collaborator

@yizhanglinux Thanks for sharing the fio log. TL;DR: I think this failure is a known issue, and fixes are planned.

This ENOSPC looks like the known issue that @naota is chasing: zoned btrfs is known to return ENOSPC when the write speed is faster than the reclaim speed. Based on this understanding, I tried to recreate the failure by 1) disabling kernel debug options to speed up writes and 2) extending the fio runtime to increase the reclaim size, and I succeeded in recreating it.

Recently @naota posted a fix patch, and I tried it. It avoided the failure under some conditions, but ENOSPC failures are still observed with a longer fio runtime. He will post more ENOSPC fix patches, and I expect that they will avoid the failure of this test case.
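
The write-versus-reclaim race described above can be illustrated with a toy model. This is a sketch under loud assumptions, not how the btrfs allocator works: the function name `simulate_zone_pressure`, the notion of whole "garbage" zones, and all rates are hypothetical. Each step, the writer consumes `write` free zones and turns them into garbage, then reclaim returns at most `reclaim` garbage zones to the free pool.

```shell
# Toy model only (hypothetical, not btrfs code): zones are either free or
# garbage. Writes need free zones; reclaim turns garbage back into free.
# Usage: simulate_zone_pressure FREE_ZONES WRITE_RATE RECLAIM_RATE MAX_STEPS
simulate_zone_pressure() {
	local free="$1" write="$2" reclaim="$3" max_steps="$4"
	local garbage=0 step=1 freed

	while [ "$step" -le "$max_steps" ]; do
		# The writer fails as soon as it cannot get enough free zones,
		# even though garbage zones still hold reusable capacity.
		if [ "$free" -lt "$write" ]; then
			echo "ENOSPC at step $step (free=$free garbage=$garbage)"
			return 0
		fi
		free=$((free - write))
		garbage=$((garbage + write))

		# Reclaim frees at most $reclaim zones, limited by available garbage.
		if [ "$garbage" -lt "$reclaim" ]; then
			freed=$garbage
		else
			freed=$reclaim
		fi
		free=$((free + freed))
		garbage=$((garbage - freed))
		step=$((step + 1))
	done
	echo "ok after $max_steps steps (free=$free garbage=$garbage)"
}
```

With `simulate_zone_pressure 10 3 1 100` the model stops with ENOSPC while most zones are garbage awaiting reclaim. With `simulate_zone_pressure 10 2 2 50`, reclaim keeps up and the run completes. In the model, a longer run gives a slow reclaimer more time to fall behind, which is in line with the observation that a longer fio runtime makes the failure easier to recreate.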

@yizhanglinux
Contributor Author

Good to know, thanks for the update.

@zhijianli88
Contributor

/cc
