jeremychan/thesis_erasure_coding_storage

Many modern storage systems adopt erasure coding to provide data availability guarantees with low redundancy. Log-based storage, which appends new data rather than overwriting existing data, is often used to achieve high update efficiency, but it introduces significant I/O overhead during recovery because updates must be reassembled from data and parity chunks. We propose parity logging with reserved space, which comprises two key design features: (1) it combines in-place data updates with log-based parity updates to balance the costs of updates and recovery, and (2) it keeps parity updates in a reserved space next to the parity chunk to mitigate disk seeks. We further propose a workload-aware scheme that dynamically predicts and adjusts the reserved space size. We prototype an erasure-coded clustered storage system called CodFS and conduct experiments on different update schemes under synthetic and real-world workloads. We show that our proposed update scheme achieves both high update performance and high recovery performance, a combination that neither pure in-place nor pure log-based update schemes can achieve.
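To make the hybrid update idea concrete, the following is a minimal sketch, not the CodFS implementation. It assumes a single XOR parity chunk per stripe (the thesis does not state the code used here), and the names `ParityLoggedStripe`, `reserved_slots`, `merge`, and `recover` are hypothetical. Data chunks are updated in place, while each parity delta is appended to a small reserved area that is conceptually adjacent to the parity chunk and is folded into the parity only when the area fills up or a recovery is requested.

```python
def xor_into(target, offset, src):
    """XOR the bytes of src into target, starting at offset."""
    for i, b in enumerate(src):
        target[offset + i] ^= b


class ParityLoggedStripe:
    """Toy stripe: in-place data updates plus logged parity deltas.

    Assumption: one XOR parity chunk per stripe (illustrative only).
    """

    def __init__(self, data_chunks, reserved_slots):
        size = len(data_chunks[0])
        assert all(len(c) == size for c in data_chunks)
        self.data = [bytearray(c) for c in data_chunks]
        self.parity = bytearray(size)
        for chunk in self.data:
            xor_into(self.parity, 0, chunk)
        # Hypothetical reserved space: a bounded list of (offset, delta).
        self.reserved = []
        self.reserved_slots = reserved_slots

    def update(self, chunk_idx, offset, new_bytes):
        chunk = self.data[chunk_idx]
        assert offset + len(new_bytes) <= len(chunk)
        old = chunk[offset:offset + len(new_bytes)]
        delta = bytes(a ^ b for a, b in zip(old, new_bytes))
        # In-place data update.
        chunk[offset:offset + len(new_bytes)] = new_bytes
        # Log-based parity update: append the delta to the reserved space,
        # merging first if the reserved space is already full.
        if len(self.reserved) >= self.reserved_slots:
            self.merge()
        self.reserved.append((offset, delta))

    def merge(self):
        """Fold all logged parity deltas into the parity chunk."""
        for offset, delta in self.reserved:
            xor_into(self.parity, offset, delta)
        self.reserved.clear()

    def recover(self, lost_idx):
        """Rebuild one lost data chunk from parity and surviving chunks."""
        self.merge()
        out = bytearray(self.parity)
        for i, chunk in enumerate(self.data):
            if i != lost_idx:
                xor_into(out, 0, chunk)
        return bytes(out)
```

In this sketch, `reserved_slots` stands in for the reserved space size: a larger value defers more parity merges at the cost of more space, which is the trade-off a workload-aware sizing scheme would tune.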

In addition to clustered storage, we also study the issues that small random writes introduce in SSD RAID. In the second part of this thesis, we propose TWEEN, a middleware application that aims to improve the write efficiency and endurance of SSD RAID. It comprises two key design features: (1) it combines the log-structured file system (LFS) and byte-addressable non-volatile RAM (NVRAM) to eliminate partial writes at both the SSD and RAID levels, and (2) it mitigates the garbage collection overhead in LFS by grouping writes with similar update frequencies and absorbing LFS garbage collection writes in NVRAM.
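The two ideas of buffering small writes until a full stripe is available and grouping writes by update frequency can be illustrated with a short sketch. This is not TWEEN's design: the class `StripeBuffer`, the two-way hot/cold split, and the `hot_threshold` parameter are all assumptions made for illustration, and the in-memory dictionaries merely stand in for NVRAM.

```python
from collections import defaultdict


class StripeBuffer:
    """Toy write buffer: flush only full stripes, grouped by hotness.

    Assumptions: a block is 'hot' once it has been written at least
    hot_threshold times; the dicts below stand in for NVRAM.
    """

    def __init__(self, stripe_blocks, hot_threshold=2):
        self.stripe_blocks = stripe_blocks
        self.hot_threshold = hot_threshold
        self.update_counts = defaultdict(int)
        self.pending = {"hot": {}, "cold": {}}  # group -> {block_id: data}
        self.flushed = []  # list of (group, [(block_id, data), ...])

    def write(self, block_id, data):
        self.update_counts[block_id] += 1
        hot = self.update_counts[block_id] >= self.hot_threshold
        group = "hot" if hot else "cold"
        # An overwrite is absorbed in the buffer: drop any stale copy.
        for blocks in self.pending.values():
            blocks.pop(block_id, None)
        self.pending[group][block_id] = data
        # Flush only when a whole stripe of same-group blocks is ready,
        # so the device never sees a partial-stripe write.
        if len(self.pending[group]) == self.stripe_blocks:
            self.flushed.append((group, list(self.pending[group].items())))
            self.pending[group] = {}
```

Keeping blocks with similar update frequencies in the same stripe means that a stripe of hot blocks tends to become invalid all at once, which leaves less live data for garbage collection to copy.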

About

MPhil Thesis
