updated readme
tomfran committed Oct 27, 2023
1 parent 375c039 commit e5120a1
Showing 6 changed files with 89 additions and 59 deletions.
97 changes: 62 additions & 35 deletions README.md
@@ -12,7 +12,8 @@ An implementation of the Log-Structured Merge Tree (LSM tree) data structure in
1. [SSTable](#sstable-1)
2. [Skip-List](#skip-list-1)
3. [Tree](#tree-1)
5. [Implementation status](#Implementation-status)
5. [Possible future improvements](#possible-improvements)
6. [References](#references)

## Console

@@ -24,6 +25,9 @@ To interact with a toy tree you can use `./gradlew run -q` to spawn a console.

# Architecture

This section gives an architecture overview: SSTables, the disk-resident portion of the database; Skip Lists, used
as in-memory buffers; and finally the combination of the two, which provides the insertion, lookup and deletion primitives.

## SSTable

Sorted String Table (SSTable) is a collection of files modelling key-value pairs in sorted order by key.
@@ -112,23 +116,46 @@ the operation on the node. All of them have an average time complexity of `O(log

## Tree

...

### Components

...
Having defined SSTables and Skip Lists we can obtain the final structure as a combination of the two.
The main idea is to use the latter as an in-memory buffer, while the former efficiently stores flushed
buffers.

### Insertion

...
Each insert goes directly to a Memtable, which is a Skip List under the hood, so the response time is quite fast.
Once the Memtable grows past a size threshold, it is made immutable by appending it to the _immutable
memtables LIFO list_ and is replaced with a new, empty mutable Memtable.

The immutable memtable list is asynchronously consumed by a background thread, which takes the next available
list and creates a disk-resident SSTable from its content.
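
As a rough illustration of the rotation described above, here is a minimal sketch. It is not the project's `LSMTree` class: `MemtableRotationSketch` and its methods are hypothetical names, `ConcurrentSkipListMap` stands in for the project's Skip List, the threshold is counted in entries, "disk" tables are plain in-memory sorted maps, and the flusher is assumed to take the oldest pending memtable first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of memtable rotation; names and threshold semantics are assumptions. */
final class MemtableRotationSketch {
    private final int threshold;
    // Mutable in-memory buffer that receives every insert.
    private ConcurrentSkipListMap<String, String> mutable = new ConcurrentSkipListMap<>();
    // Immutable memtables waiting to be flushed; the newest one sits at the head.
    private final Deque<ConcurrentSkipListMap<String, String>> immutables = new ArrayDeque<>();
    // Stand-in for disk-resident SSTables: one sorted map per flushed memtable.
    private final List<TreeMap<String, String>> flushedTables = new ArrayList<>();

    MemtableRotationSketch(int threshold) {
        this.threshold = threshold;
    }

    synchronized void add(String key, String value) {
        mutable.put(key, value);
        if (mutable.size() >= threshold) {
            // Freeze the full buffer and start a fresh one.
            immutables.addFirst(mutable);
            mutable = new ConcurrentSkipListMap<>();
        }
    }

    /** One iteration of what a background flusher would do; returns false if nothing is pending. */
    synchronized boolean flushOne() {
        ConcurrentSkipListMap<String, String> oldest = immutables.pollLast();
        if (oldest == null) {
            return false;
        }
        // A real implementation would write an SSTable file here.
        flushedTables.add(new TreeMap<>(oldest));
        return true;
    }

    synchronized int pendingMemtables() {
        return immutables.size();
    }

    synchronized int flushedTableCount() {
        return flushedTables.size();
    }
}
```

In a real tree `flushOne()` would be called repeatedly from a background thread; it is kept synchronous here so the behaviour is easy to follow.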

### Lookup

...
While looking for a key, we proceed as follows:

1. Look into the in-memory buffer: if the key was recently written, it is likely here. If not present, continue;
2. Look into the immutable memtables list, iterating from the most recent to the oldest. If not present, continue;
3. Look into the disk tables, iterating from the most recent to the oldest. If not present, return null.
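
The three steps above can be sketched as a single method. This is only an illustration under simplifying assumptions: every layer is modelled as a plain `Map`, and `LookupOrderSketch` with its parameter names is hypothetical rather than part of the project.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the lookup order: newest data is consulted first. */
final class LookupOrderSketch {

    static String get(String key,
                      Map<String, String> memtable,
                      List<Map<String, String>> immutablesNewestFirst,
                      List<Map<String, String>> diskTablesNewestFirst) {
        // 1. Mutable in-memory buffer.
        if (memtable.containsKey(key)) {
            return memtable.get(key);
        }
        // 2. Immutable memtables, from the most recent to the oldest.
        for (Map<String, String> immutable : immutablesNewestFirst) {
            if (immutable.containsKey(key)) {
                return immutable.get(key);
            }
        }
        // 3. Disk tables, from the most recent to the oldest.
        for (Map<String, String> table : diskTablesNewestFirst) {
            if (table.containsKey(key)) {
                return table.get(key);
            }
        }
        // Not found anywhere.
        return null;
    }
}
```

Because the search stops at the first layer that contains the key, a newer write always shadows older versions of the same key.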

### Write-ahead logging
### Deletions

...
To delete a key, we do not need to remove all its replicas from the on-disk tables; we just need a special
value called a _tombstone_. Hence a deletion is the same as an insertion, but with the value set to null. While looking for
a key, if we encounter a null value we simply return null as the result.
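
A minimal sketch of the tombstone idea, assuming a two-layer store where a null value marks a deleted key. `TombstoneSketch` and its methods are hypothetical names, and `TreeMap` stands in for both the memtable and an older on-disk table.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical two-layer store showing how a tombstone shadows older values. */
final class TombstoneSketch {
    // Newer layer: receives inserts and tombstones.
    private final TreeMap<String, String> memtable = new TreeMap<>();
    // Older layer: never modified by a delete.
    private final TreeMap<String, String> olderTable;

    TombstoneSketch(Map<String, String> olderTableContent) {
        this.olderTable = new TreeMap<>(olderTableContent);
    }

    void put(String key, String value) {
        memtable.put(key, value);
    }

    void delete(String key) {
        // A deletion is just an insertion of the tombstone (null here).
        memtable.put(key, null);
    }

    String get(String key) {
        // A tombstone in the newer layer stops the search and yields null.
        if (memtable.containsKey(key)) {
            return memtable.get(key);
        }
        return olderTable.get(key);
    }

    boolean olderTableStillHolds(String key) {
        return olderTable.containsKey(key);
    }
}
```

The older copy of the key stays where it is; it simply becomes unreachable until a later merge discards it.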

### SSTable Compaction

The most expensive operation while looking for a key is certainly the disk search, which is why bloom filters are
crucial for negative lookups on SSTables. But no bloom filter can save us if there are too many tables to search,
hence we need _compaction_.

When flushing a Memtable, we create an SSTable of level one. When the first level reaches a certain threshold,
all its tables are merged into a level-two table, and so on. This permits us to save storage and query fewer
tables in lookups.

Note that this style of compaction is not standard; more sophisticated techniques exist, but this simple
level-like compaction is sufficient for the purposes of this project.
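
The level-by-level merging can be sketched roughly as follows. This is an assumption-laden illustration, not the project's compactor: `LevelCompactionSketch` is a hypothetical class, tables are in-memory `TreeMap`s, the threshold is a table count per level, levels are indexed from 0 in the code, and tombstone handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: when a level holds `levelThreshold` tables, merge them into the next level. */
final class LevelCompactionSketch {
    private final int levelThreshold;
    // levels.get(i) holds the tables of level i, newest first.
    private final List<List<TreeMap<String, String>>> levels = new ArrayList<>();

    LevelCompactionSketch(int levelThreshold) {
        if (levelThreshold < 2) {
            throw new IllegalArgumentException("levelThreshold must be at least 2");
        }
        this.levelThreshold = levelThreshold;
    }

    /** Called when a memtable is flushed: the new table enters the first level. */
    void addFlushedTable(TreeMap<String, String> table) {
        addToLevel(0, table);
    }

    private void addToLevel(int level, TreeMap<String, String> table) {
        while (levels.size() <= level) {
            levels.add(new ArrayList<>());
        }
        List<TreeMap<String, String>> tables = levels.get(level);
        tables.add(0, table);
        if (tables.size() >= levelThreshold) {
            TreeMap<String, String> merged = merge(tables);
            tables.clear();
            // The merged table may in turn trigger a merge one level down.
            addToLevel(level + 1, merged);
        }
    }

    /** Merges tables given newest first; for duplicate keys the newest value wins. */
    static TreeMap<String, String> merge(List<TreeMap<String, String>> newestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        // Apply the oldest table first so newer tables overwrite its entries.
        for (int i = newestFirst.size() - 1; i >= 0; i--) {
            merged.putAll(newestFirst.get(i));
        }
        return merged;
    }

    int tableCount(int level) {
        return level < levels.size() ? levels.get(level).size() : 0;
    }

    TreeMap<String, String> table(int level, int index) {
        return levels.get(level).get(index);
    }
}
```

Merging keeps one entry per key, which is where both the storage saving and the reduced number of tables per lookup come from.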

---

@@ -137,8 +164,6 @@ the operation on the node. All of them have an average time complexity of `O(log
I am using [JMH](https://openjdk.java.net/projects/code-tools/jmh/) to run benchmarks;
the results were obtained on an AMD Ryzen™ 5 4600H with 16GB of RAM and a 512GB SSD.

> Take those with a grain of salt, development is still in progress.
To run them use `./gradlew jmh`.

## SSTable
@@ -193,26 +218,28 @@ c.t.l.tree.LSMTreeGetBenchmark.get thrpt 5 9426.951 ± 241

---

## Implementation status

- [x] SSTable
- [x] Init
- [x] Read
- [x] Compaction
- [x] Ints compression
- [x] Bloom filter
- [x] Indexes persistence
- [x] File initialization
- [x] Skip-List
- [x] Operations
- [x] Iterator
- [x] Tree
- [x] Operations
- [x] Background flush
- [x] Background compaction
- [ ] Write ahead log
- [x] Benchmarks
- [x] SSTable
- [x] Bloom filter
- [x] Skip-List
- [x] Tree
## Possible improvements

There is certainly room for improvement in this project:

1. Blocked bloom filters: a variant of the classic array-like bloom filter that is more cache efficient;
2. Search fingers in the Skip list: the idea is to keep a pointer to the last search, and start from there with
subsequent queries;
3. Proper level compaction in the LSM tree;
4. Write-ahead log for insertions: without it, a crash makes all the in-memory writes disappear;
5. Proper recovery: handle crashes and reboots, using existing SSTables and the write-ahead log.

I don't have the time to do all of this, but perhaps the first two points will be handled in the future.

---

## References

- [Database Internals](https://www.databass.dev/) by Alex Petrov, specifically chapters about Log-Structured Storage and
File Formats;
- [A Skip List Cookbook](https://api.drum.lib.umd.edu/server/api/core/bitstreams/17176ef8-8330-4a6c-8b75-4cd18c570bec/content)
by William Pugh.

---

_If you found this useful or interesting do not hesitate to ask clarifying questions or get in touch!_
2 changes: 1 addition & 1 deletion src/jmh/java/com/tomfran/lsm/tree/LSMTreeAddBenchmark.java
@@ -28,7 +28,7 @@ public void setup() throws IOException {
}

@TearDown
public void teardown() throws IOException, InterruptedException {
public void teardown() throws InterruptedException {
tree.stop();
Thread.sleep(5000);
BenchmarkUtils.deleteDir(DIR);
3 changes: 2 additions & 1 deletion src/jmh/java/com/tomfran/lsm/tree/LSMTreeGetBenchmark.java
@@ -35,8 +35,9 @@ public void setup() throws IOException {
}

@TearDown
public void teardown() throws IOException {
public void teardown() throws InterruptedException {
tree.stop();
Thread.sleep(5000);
BenchmarkUtils.deleteDir(DIR);
}

3 changes: 1 addition & 2 deletions src/jmh/java/com/tomfran/lsm/utils/BenchmarkUtils.java
@@ -4,7 +4,6 @@
import com.tomfran.lsm.types.ByteArrayPair;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;
@@ -13,7 +12,7 @@

public class BenchmarkUtils {

public static LSMTree initTree(Path dir, int memSize, int levelSize) throws IOException {
public static LSMTree initTree(Path dir, int memSize, int levelSize) {
// setup directory
if (Files.exists(dir))
deleteDir(dir);
41 changes: 22 additions & 19 deletions src/main/java/com/tomfran/lsm/Main.java
@@ -13,7 +13,7 @@ public class Main {

static final String DIRECTORY = "LSM-data";

public static void main(String[] args) {
public static void main(String[] args) throws InterruptedException {

if (new File(DIRECTORY).exists())
deleteDir();
@@ -48,25 +48,28 @@ public static void main(String[] args) {
System.out.print("> ");
String command = scanner.nextLine();

String[] parts = command.split(" ");

String msg;
switch (parts[0]) {
case "s", "set" -> {
tree.add(new ByteArrayPair(parts[1].getBytes(), parts[2].getBytes()));
System.out.println("ok");
}
case "d", "del" -> {
tree.delete(parts[1].getBytes());
System.out.println("ok");
}
case "g", "get" -> {
byte[] value = tree.get(parts[1].getBytes());
System.out.println((value == null || value.length == 0) ? "not found" : new String(value));
try {
String[] parts = command.split(" ");

switch (parts[0]) {
case "s", "set" -> {
tree.add(new ByteArrayPair(parts[1].getBytes(), parts[2].getBytes()));
System.out.println("ok");
}
case "d", "del" -> {
tree.delete(parts[1].getBytes());
System.out.println("ok");
}
case "g", "get" -> {
byte[] value = tree.get(parts[1].getBytes());
System.out.println((value == null || value.length == 0) ? "not found" : new String(value));
}
case "h", "help" -> System.out.println(help);
case "e", "exit" -> exit = true;
default -> System.out.println("Unknown command");
}
case "h", "help" -> System.out.println(help);
case "e", "exit" -> exit = true;
default -> System.out.println("Unknown command");
} catch (Exception e) {
System.out.printf("### error while executing command: \"%s\"\n", command);
}
}
tree.stop();
2 changes: 1 addition & 1 deletion src/main/java/com/tomfran/lsm/tree/LSMTree.java
@@ -132,7 +132,7 @@ public byte[] get(byte[] key) {
/**
* Stop the background threads.
*/
public void stop() {
public void stop() throws InterruptedException {
memtableFlusher.shutdownNow();
tableCompactor.shutdownNow();
}