UnixBench on Ubuntu 23.10: Double-Precision Whetstone slower than QEMU #123

Open
myftptoyman opened this issue Feb 4, 2024 · 3 comments
Labels
bug Something isn't working inefficiency Better implementation is desired


myftptoyman commented Feb 4, 2024

I ran UnixBench on RVVM and QEMU, both on Ubuntu 23.10.

The test environment

EM780 mini pc, linux Ubuntu 23.10

RVVM: master branch: @21433cca22d63a748b3c6d0b1cfbcb10c307badf
Qemu: ubuntu apt install qemu

RVVM:

release.linux.x86_64/rvvm_x86_64 -mem 8G -smp 8 -nogui fw_jump.bin -k u-boot.bin -i ubuntu-23.10-preinstalled-server-riscv64.img -jitcache 1G

```
------------------------------------------------------------------------
Benchmark Run: Sun Feb 04 2024 02:45:51 - 03:13:50
0 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       10744333.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                      464.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                                276.0 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        101814.3 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           26649.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        308874.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                              143627.2 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                   5810.2 lps   (10.0 s, 7 samples)
Process Creation                                308.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                    751.1 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                    318.5 lpm   (60.1 s, 2 samples)
System Call Overhead                         325415.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values             BASELINE       RESULT    INDEX
Dhrystone 2 using register variables       116700.0   10744333.0    920.7
Double-Precision Whetstone                     55.0        464.9     84.5
Execl Throughput                               43.0        276.0     64.2
File Copy 1024 bufsize 2000 maxblocks        3960.0     101814.3    257.1
File Copy 256 bufsize 500 maxblocks          1655.0      26649.3    161.0
File Copy 4096 bufsize 8000 maxblocks        5800.0     308874.0    532.5
Pipe Throughput                             12440.0     143627.2    115.5
Pipe-based Context Switching                 4000.0       5810.2     14.5
Process Creation                              126.0        308.9     24.5
Shell Scripts (1 concurrent)                   42.4        751.1    177.1
Shell Scripts (8 concurrent)                    6.0        318.5    530.9
System Call Overhead                        15000.0     325415.9    216.9
                                                                 ========
System Benchmarks Index Score                                       145.8
```
Qemu

qemu-system-riscv64 -machine virt -nographic -m 8192 -smp 8 -bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.bin -kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -device virtio-net-device,netdev=eth0 -netdev user,id=eth0 -device virtio-rng-pci -drive file=ubuntu-23.10-preinstalled-server-riscv64.img,format=raw,if=virtio

```
------------------------------------------------------------------------
Benchmark Run: Sun Feb 04 2024 03:19:31 - 03:47:57
0 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        5036880.1 lps   (10.1 s, 7 samples)
Double-Precision Whetstone                      924.7 MWIPS (10.3 s, 7 samples)
Execl Throughput                                 56.5 lps   (29.7 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks         21528.0 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks            5470.1 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks         77479.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                               31337.4 lps   (10.1 s, 7 samples)
Pipe-based Context Switching                   6855.0 lps   (10.1 s, 7 samples)
Process Creation                                371.8 lps   (30.1 s, 2 samples)
Shell Scripts (1 concurrent)                    258.7 lpm   (60.2 s, 2 samples)
Shell Scripts (8 concurrent)                    109.6 lpm   (60.2 s, 2 samples)
System Call Overhead                         208745.2 lps   (10.1 s, 7 samples)

System Benchmarks Index Values             BASELINE       RESULT    INDEX
Dhrystone 2 using register variables       116700.0    5036880.1    431.6
Double-Precision Whetstone                     55.0        924.7    168.1
Execl Throughput                               43.0         56.5     13.1
File Copy 1024 bufsize 2000 maxblocks        3960.0      21528.0     54.4
File Copy 256 bufsize 500 maxblocks          1655.0       5470.1     33.1
File Copy 4096 bufsize 8000 maxblocks        5800.0      77479.0    133.6
Pipe Throughput                             12440.0      31337.4     25.2
Pipe-based Context Switching                 4000.0       6855.0     17.1
Process Creation                              126.0        371.8     29.5
Shell Scripts (1 concurrent)                   42.4        258.7     61.0
Shell Scripts (8 concurrent)                    6.0        109.6    182.6
System Call Overhead                        15000.0     208745.2    139.2
                                                                 ========
System Benchmarks Index Score                                        63.8
```

Double-Precision Whetstone on RVVM is about half as fast as on QEMU (464.9 vs 924.7 MWIPS).
Most of the other results are faster, so RVVM looks like a good replacement for QEMU.
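
For anyone who wants to compare the two runs test by test, here is a small Python sketch. It is not part of UnixBench or RVVM. The numbers are copied from the two logs above, and the function names (`index`, `geometric_mean`, `summarize`) are made up for this example. It assumes the usual UnixBench scoring: each index is `result / baseline * 10`, and the final score is the geometric mean of the indices, which matches the reported figures here.

```python
import math

# (baseline, RVVM result, QEMU result), copied from the two runs in this issue.
RESULTS = {
    "Dhrystone 2 using register variables": (116700.0, 10744333.0, 5036880.1),
    "Double-Precision Whetstone": (55.0, 464.9, 924.7),
    "Execl Throughput": (43.0, 276.0, 56.5),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 101814.3, 21528.0),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 26649.3, 5470.1),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 308874.0, 77479.0),
    "Pipe Throughput": (12440.0, 143627.2, 31337.4),
    "Pipe-based Context Switching": (4000.0, 5810.2, 6855.0),
    "Process Creation": (126.0, 308.9, 371.8),
    "Shell Scripts (1 concurrent)": (42.4, 751.1, 258.7),
    "Shell Scripts (8 concurrent)": (6.0, 318.5, 109.6),
    "System Call Overhead": (15000.0, 325415.9, 208745.2),
}


def index(result, baseline):
    """Per-test index: result relative to the baseline machine, times 10."""
    return result / baseline * 10.0


def geometric_mean(values):
    """Geometric mean, computed via logs to avoid a huge intermediate product."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(results):
    """Return (name, RVVM index, QEMU index, RVVM/QEMU ratio) for every test."""
    return [
        (name, index(rvvm, base), index(qemu, base), rvvm / qemu)
        for name, (base, rvvm, qemu) in results.items()
    ]


if __name__ == "__main__":
    rows = summarize(RESULTS)
    for name, rvvm_idx, qemu_idx, ratio in rows:
        print(f"{name:<40} {rvvm_idx:8.1f} {qemu_idx:8.1f} {ratio:6.2f}x")
    print(f"RVVM score: {geometric_mean(r[1] for r in rows):.1f}")
    print(f"QEMU score: {geometric_mean(r[2] for r in rows):.1f}")
```

A ratio below 1.0 means RVVM was slower in that test. With these numbers that is the case for Double-Precision Whetstone, Pipe-based Context Switching and Process Creation.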

@myftptoyman myftptoyman added the bug Something isn't working label Feb 4, 2024
Owner

LekKit commented Feb 4, 2024

Most FPU operations are currently not JITed and fall back to the interpreter (which does, however, use native FPU ops), whereas QEMU simply emits calls to its soft-FPU implementations from translated code. So this is a known missing optimization that shows up to different degrees in different software.
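
To illustrate the general difference between the two approaches, here is a toy Python model. It is not RVVM or QEMU code, and the instruction format, `PROGRAM`, and all function names are invented for this sketch. One path decodes and dispatches each instruction every time it runs (interpreter-style). The other decodes once and keeps a list of pre-bound helper calls (roughly the "emit calls to helpers from translated code" idea). It shows only the structure and makes no claim about real-world speed.

```python
import operator

# Hypothetical mini guest program: (opcode, dest, src1, src2) over float registers.
PROGRAM = [
    ("fmul", 0, 1, 2),
    ("fadd", 0, 0, 3),
    ("fsub", 3, 0, 1),
]

# Stand-ins for the FPU helpers; here they are just Python float operations.
OPS = {"fadd": operator.add, "fsub": operator.sub, "fmul": operator.mul}


def run_interpreted(regs, program, iterations):
    """Decode and dispatch every instruction each time it executes."""
    for _ in range(iterations):
        for opcode, rd, rs1, rs2 in program:
            regs[rd] = OPS[opcode](regs[rs1], regs[rs2])  # lookup on every execution
    return regs


def translate(program):
    """Decode once, producing a block of pre-bound helper calls."""
    def bind(helper, rd, rs1, rs2):
        def step(regs):
            regs[rd] = helper(regs[rs1], regs[rs2])
        return step

    return [bind(OPS[opcode], rd, rs1, rs2) for opcode, rd, rs1, rs2 in program]


def run_translated(regs, block, iterations):
    """Execute the already-decoded block; no opcode lookup in the hot loop."""
    for _ in range(iterations):
        for step in block:
            step(regs)
    return regs


if __name__ == "__main__":
    start = [0.0, 1.5, 2.0, 0.25]
    print(run_interpreted(list(start), PROGRAM, 10))
    print(run_translated(list(start), translate(PROGRAM), 10))
```

Both paths compute the same register state. They differ only in where the decode and dispatch work happens.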

@LekKit LekKit added the inefficiency Better implementation is desired label Feb 4, 2024
Owner

LekKit commented Mar 27, 2024

The staging (v0.7-git) branch should have a somewhat faster FPU. It reports 1000 MIPS on my machine.

Owner

LekKit commented Mar 27, 2024

FPU JIT will be in the works either in this version or 0.8, and should outperform QEMU in all cases.
