- Test driven development. Relevant etcd raft tests have been ported to dragonboat to ensure all corner cases identified by the etcd project have been handled.
- High test coverage. Extensively tested by unit testing and monkey testing.
- Linearizability checkers. Jepsen's Knossos and porcupine are used to check whether I/O operations are linearizable.
- Fuzz testing using go-fuzz.
- I/O error injection tests. charybdefs from scylladb is employed to inject I/O errors into the underlying file-system to ensure that Dragonboat handles them correctly.
- Power loss tests. We test the system to see what actually happens after power loss.
- 5 NodeHosts and 3 Drummer servers per process
- hundreds of Raft clusters per process
- randomly kill and restart NodeHosts and Drummer servers, each NodeHost usually stays online for a few minutes
- randomly delete all data owned by a certain NodeHost to emulate permanent disk failure
- randomly drop and re-order messages exchanged between NodeHosts
- randomly partition NodeHosts from the rest of the network
- for selected instances, snapshotting and log compaction happen all the time in the background
- committed entries are applied with random delays
- snapshots are captured and applied with random delays
- a set of background workers keeps writing to and reading from random Raft clusters, with stale read checks
- client activity history files are verified by linearizability checkers such as Jepsen's Knossos
- run hundreds of the processes described above concurrently on each test server, 30 minutes per iteration, many iterations every night
- run concurrently on many servers every night
- no linearizability violation
- no cluster is permanently stuck
- state machines must be in sync
- cluster membership must be consistent
- raft log saved in LogDB must be consistent
- no zombie cluster node
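
The "state machines must be in sync" check can be illustrated with a minimal Go sketch. It is not Dragonboat's actual test code: the `replicaState` type, its fields, and the `checkInSync` helper are hypothetical names, and the state machine is assumed to be a plain key-value map. The idea is that any two replicas that have applied the same log index must produce the same digest of their state.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// replicaState is a hypothetical summary reported by one replica of a
// Raft cluster: the last applied log index and the state machine content.
type replicaState struct {
	NodeID       uint64
	AppliedIndex uint64
	KV           map[string]string
}

// digest returns a deterministic hash of a key-value map. Keys are sorted
// first because Go map iteration order is random.
func digest(kv map[string]string) string {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		// Length prefixes keep distinct key/value pairs from colliding.
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(kv[k]), kv[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// checkInSync returns an error if two replicas that applied the same
// index hold different state. Replicas at different indexes are not
// compared, since a lagging replica is allowed to be behind.
func checkInSync(replicas []replicaState) error {
	type record struct {
		nodeID uint64
		digest string
	}
	first := make(map[uint64]record)
	for _, r := range replicas {
		d := digest(r.KV)
		prev, ok := first[r.AppliedIndex]
		if !ok {
			first[r.AppliedIndex] = record{r.NodeID, d}
			continue
		}
		if prev.digest != d {
			return fmt.Errorf("nodes %d and %d diverge at index %d",
				prev.nodeID, r.NodeID, r.AppliedIndex)
		}
	}
	return nil
}

func main() {
	replicas := []replicaState{
		{NodeID: 1, AppliedIndex: 10, KV: map[string]string{"a": "1"}},
		{NodeID: 2, AppliedIndex: 10, KV: map[string]string{"a": "1"}},
		{NodeID: 3, AppliedIndex: 10, KV: map[string]string{"a": "2"}},
	}
	fmt.Println(checkInSync(replicas[:2]))
	fmt.Println(checkInSync(replicas))
}
```

Comparing only replicas at the same applied index keeps the check from flagging a node that is merely behind, which is expected while nodes are being killed and restarted.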
Some history files in Jepsen's Knossos edn format have been made publicly available.
- Three servers, each with a single 22-core Intel Xeon E5-2696v4 processor, all cores can boost to 2.8GHz
- 40GE Mellanox NIC
- Intel 900P for storing RocksDB's WAL and Intel P3700 1.6T for storing all other data
- Ubuntu 16.04 with Spectre and Meltdown patches, ext4 file-system
- 48 Raft clusters on three NodeHost instances across three servers
- Each Raft node is backed by an in-memory Key-Value data store as RSM
- Mostly update operations in the Key-Value store
- All I/O requests are launched from local processes
- Each request is handled in its own goroutine, which keeps the threading model simple and makes application debugging easy
- fsync is strictly honored
- MutualTLS is disabled
Compared with enterprise NVMe SSDs such as the Intel P3700, Optane-based SSDs do not increase throughput when the payload is 16 or 128 bytes. They do slightly increase throughput when the payload size is 1024 bytes, and they also improve write latency at that payload size.