Source code for FAST '23 paper: MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems by Shawn Zhong*, Chenhao Ye*, Guanzhou Hu, Suyan Qu, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Michael Swift. (*equal contribution.) FAST '23. Paper. Video. Slides. Code.
Abstract
Persistent memory (PM) can be accessed directly from userspace without kernel involvement, but most PM filesystems still perform metadata operations in the kernel for security and rely on the kernel for cross-process synchronization.
We present per-file virtualization, where a virtualization layer implements a complete set of file functionalities, including metadata management, crash consistency, and concurrency control, in userspace. We observe that not all file metadata need to be maintained by the kernel and propose embedding insensitive metadata into the file for userspace management. For crash consistency, copy-on-write (CoW) benefits from the embedding of the block mapping since the mapping can be efficiently updated without kernel involvement. For cross-process synchronization, we introduce lock-free optimistic concurrency control (OCC) at user level, which tolerates process crashes and provides better scalability.
Based on per-file virtualization, we implement MadFS, a library PM filesystem that maintains the embedded metadata as a compact log. Experimental results show that on concurrent workloads, MadFS achieves up to 3.6x the throughput of ext4-DAX. For real-world applications, MadFS provides up to 48% speedup for YCSB on LevelDB and 85% for TPC-C on SQLite compared to NOVA.
BibTex
@inproceedings {285756,
author = {Shawn Zhong and Chenhao Ye and Guanzhou Hu and Suyan Qu and Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau and Michael Swift},
title = {{MadFS}: {Per-File} Virtualization for Userspace Persistent Memory Filesystems},
booktitle = {21st USENIX Conference on File and Storage Technologies (FAST 23)},
year = {2023},
isbn = {978-1-939133-32-8},
address = {Santa Clara, CA},
pages = {265--280},
url = {https://www.usenix.org/conference/fast23/presentation/zhong},
publisher = {USENIX Association},
month = feb,
}
-
MadFS is developed on Ubuntu 20.04.3 LTS and Ubuntu 22.04.1 LTS. It should work on other Linux distributions as well.
-
MadFS requires a C++ compiler with C++ 20 support. The compilers known to work includes GCC 11.3.0, GCC 10.3.0, Clang 14.0.0, and Clang 10.0.0.
-
Install dependencies and configure the system
-
Install build dependencies
sudo apt update sudo apt install -y cmake build-essential gcc-10 g++-10
-
Install development dependencies (optional)
# to run sanitizers and formatter sudo apt install -y clang-10 libstdc++-10-dev clang-format-10 # for perf sudo apt install -y linux-tools-common linux-tools-generic linux-tools-`uname -r` # for managing persistent memory and NUMA sudo apt install -y ndctl numactl # for benchmarking sudo apt install -y sqlite3
-
Configure the system
./scripts/init.py
-
-
Configure persistent memory
-
To emulate a persistent memory device using DRAM, please follow the guide here.
-
Initialize namespaces (optional)
# remove existing namespaces on region0 sudo ndctl destroy-namespace all --region=region0 --force # create new namespace `/dev/pmem0` on region0 sudo ndctl create-namespace --region=region0 --size=20G # create new namespace `/dev/pmem0.1` on region0 for NOVA (optional) sudo ndctl create-namespace --region=region0 --size=20G # list all namespaces ndctl list --region=0 --namespaces --human --idle
-
Use
/dev/pmem0
to mount ext4-DAX at/mnt/pmem0-ext4-dax
# create filesystem sudo mkfs.ext4 /dev/pmem0 # create mount point sudo mkdir -p /mnt/pmem0-ext4-dax # mount filesystem sudo mount -o dax /dev/pmem0 /mnt/pmem0-ext4-dax # make the mount point writable sudo chmod a+w /mnt/pmem0-ext4-dax # check mount status mount -v | grep /mnt/pmem0-ext4-dax
-
Use
/dev/pmem0.1
to mount NOVA at/mnt/pmem0-nova
(optional)# load NOVA module sudo modprobe nova # create mount point sudo mkdir -p /mnt/pmem0-nova # mount filesystem sudo mount -t NOVA -o init -o data_cow /dev/pmem0.1 /mnt/pmem0-nova # make the mount point writable sudo chmod a+w /mnt/pmem0-nova # check mount status mount -v | grep /mnt/pmem0-nova
-
To unmount the filesystems, run
sudo umount /mnt/pmem0-ext4-dax sudo umount /mnt/pmem0-nova
-
-
Build the MadFS shared library
# Usage: make [release|debug|relwithdebinfo|profile|pmemcheck|asan|ubsan|msan|tsan] # [CMAKE_ARGS="-DKEY1=VAL1 -DKEY2=VAL2 ..."] make BUILD_TARGETS="madfs"
-
Run your program with MadFS
LD_PRELOAD=./build-release/libmadfs.so ./your_program
Sample output
BuildOptions: build type: name: release debug: 0 use_pmemcheck: 0 hardware support: clwb: 1 clflushopt: 1 avx512f: 1 features: map_sync: 1 map_populate: 1 tx_flush_only_fsync: 1 enable_timer: 0 concurrency control: cc_occ: 1 cc_mutex: 0 cc_spinlock: 0 cc_rwlock: 0 RuntimeOptions: show_config: 1 strict_offset_serial: 0 log_file: None log_level: 1 # Your program output here MadFS unloaded
-
Run tests
./scripts/run.py [test_basic|test_rc|test_sync|test_gc] # See `./scripts/run.py --help` for more options
-
Run and plot single-threaded benchmarks
./scripts/bench_st.py --filter="seq_pread" ./scripts/bench_st.py --filter="rnd_pread" ./scripts/bench_st.py --filter="seq_pwrite" ./scripts/bench_st.py --filter="rnd_pwrite" ./scripts/bench_st.py --filter="cow" ./scripts/bench_st.py --filter="append_pwrite" # Limit to set of file systems ./scripts/bench_st.py -f MadFS SplitFS # Profile a data point ./scripts/bench_st.py --filter="seq_pread/512" -f MadFS -b profile # See `./scripts/bench_st.py` --help for more options
-
Run and plot multi-threaded benchmarks
./scripts/bench_mt.py --filter="unif_0R" ./scripts/bench_mt.py --filter="unif_50R" ./scripts/bench_mt.py --filter="unif_95R" ./scripts/bench_mt.py --filter="unif_100R" ./scripts/bench_mt.py --filter="zipf_2k" ./scripts/bench_mt.py --filter="zipf_4k"
-
Run and plot metadata benchmarks
./scripts/bench_open.py ./scripts/bench_gc.py
-
Run and plot macrobenchmarks (SQLite and LevelDB)
./scripts/bench_tpcc.py ./scripts/bench_ycsb.py
-
src/
: Source code for the MadFS shared library-
src/lib/
: Main entry point of the library -
src/file/
: Implementation of file operations -
src/tx/
: Implementation of I/O transactions -
src/alloc/
: Allocation of blocks and log entries -
src/block/
: Implementation of different block types -
src/cursor/
: An iterator-like interface for reading log -
src/block/
: Implementation of different block types -
src/utils/
: Utility functions
-
-
scripts/
: Scripts for building, running, and plotting benchmarks -
bench/
: Source code for benchmarks -
test/
: Source code for tests -
tools/
: Source code for tools (e.g., gc, conversion, info) -
cmake/
: CMake modules -
data/
: Data files for benchmarks
If you have any questions, feel free to open an issue or contact Shawn Zhong ([email protected]) and Chenhao Ye ([email protected]). We are also happy to accept pull requests.