Backend implementation guide

Introduction

Konserve provides a minimalistic user interface that can be understood as a lens into key-value stores. Its main purpose is portability to as many backends as possible. This document describes how to implement them. Feel free to ignore parts of this document to get something running quickly, but be aware that we can only support backends that adhere to it.

This document assumes the following context:

Necessary preconditions

  • understanding of the backend store you want to integrate
  • basic understanding of core.async

Provided postconditions

  • understanding of the protocols
  • understanding of how to do error handling
  • understanding of how to handle migrations

Target audience

Clojure developers who need to durably store data behind a flexible interface, e.g. users of Datahike or replikativ, or developers who want to keep their storage backend portable in general.

Related work

Datomic provides a set of storage backends for its persistent indices. There are also JavaScript libraries that provide a similar interface, but without the lens-like interface, and to our knowledge none of them are cross-platform between the JVM and JavaScript.

Memory model

Konserve tracks contextual information in a binary header that is sufficient to determine how to access a value: it carries the version of the header format, the serialization context and the size of the metadata.
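
As an illustration, such a header can be packed into a small fixed-size byte array. The field order and widths below are a hypothetical sketch, not konserve's exact binary layout.

#+begin_src clojure
;; Hypothetical sketch of packing such a header; field order and widths are
;; illustrative only, not konserve's exact binary layout.
(import '[java.nio ByteBuffer])

(defn encode-header
  ^bytes [{:keys [version serializer-id meta-size]}]
  (.array
   (doto (ByteBuffer/allocate 20)
     (.put (byte version))           ; layout version (see the Migration section)
     (.put (byte serializer-id))     ; serialization context
     (.putLong (long meta-size)))))  ; size of the serialized metadata block

;; (vec (encode-header {:version 1 :serializer-id 1 :meta-size 123}))
#+end_src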

ACID

Konserve provides ACID guarantees for each key-value pair stored in it.

Atomicity

Your write operation must either completely succeed, or fail and cleanly reset on any error condition. In other words, the store must never be corrupted by partial write operations! Typically this is provided by the underlying store, but you might need to first write a new value and then do an atomic swap, as is done in the filestore with an atomic file rename. The metadata and the value need to be updated in the same atomic transaction.
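
A sketch of this write-then-atomic-rename pattern for a file-based backend (the helper name and temporary path suffix are illustrative):

#+begin_src clojure
;; Write the new blob to a temporary file, then atomically move it over the
;; target, so readers see either the old blob or the complete new one.
(import '[java.nio.file Files Paths StandardCopyOption CopyOption OpenOption])

(defn atomic-write!
  [^String target ^bytes data]
  (let [tmp-path    (Paths/get (str target ".new") (make-array String 0))
        target-path (Paths/get target (make-array String 0))]
    (Files/write tmp-path data (make-array OpenOption 0))
    (Files/move tmp-path target-path
                (into-array CopyOption [StandardCopyOption/ATOMIC_MOVE]))))
#+end_src

Whether the rename actually survives a crash as one atomic step still depends on the underlying filesystem, so document that for your users.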

Consistency

The underlying store should provide consistent views on the data. This is typically not a property you have to worry about, but it is a good idea to point your users to the consistency guarantees provided by the underlying store. A reasonable backend must at least provide read-committed semantics.

Isolation

The go-locked macro guarantees that there are no concurrent state mutations on individual keys. This locking mechanism only holds within the memory context of the current process. If you expect e.g. multiple JVM processes/machines to operate on one backend, you have to make sure that write operations are properly locked between them. Often this can be provided by the underlying distributed storage.
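
A rough in-process analogue of such per-key locking looks like the following sketch (illustrative only; the real go-locked macro lives in konserve's core namespace):

#+begin_src clojure
;; In-memory per-key mutex built from a one-slot channel; go blocks touching the
;; same key park until the previous operation releases the lock.
(require '[clojure.core.async :as async :refer [chan go <! put!]])

(defonce key-locks (atom {}))

(defn lock-for [key]
  (or (@key-locks key)
      (let [c (doto (chan 1) (put! :unlocked))]
        (get (swap! key-locks (fn [m] (if (m key) m (assoc m key c)))) key))))

(defmacro with-key-lock [key & body]
  `(go
     (let [lock# (lock-for ~key)]
       (<! lock#)                       ; acquire
       (try
         ~@body
         (finally
           (put! lock# :unlocked))))))  ; release
#+end_src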

Durability

You must have written all data when the go channel or synchronous function yields a value, i.e. everything needs to be transmitted to the backend. This guarantee depends on the guarantees of the backend, and you are supposed to clearly document them for your users. It is a good idea to provide configuration options for the backend if you see fit. The default store, for instance, provides a flag to turn off blob syncing, in which case a few write operations can be lost on a crash.
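
A backend might expose such trade-offs through a configuration map along these lines; the key names below mirror the default filestore's options but are ultimately backend-specific and shown here only as an assumption.

#+begin_src clojure
;; Illustrative configuration, assuming filestore-like option names.
(def config
  {:sync-blob? false  ; skip syncing on each write: faster, but a crash may lose recent writes
   :in-place?  false  ; write to a temporary blob and atomically move it into place
   :lock-blob? true}) ; take a low-level lock so multiple processes can write safely
#+end_src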

Implementations

User facing protocols

The boundary between the user-facing APIs of konserve and the backends is defined in protocols.cljc. PEDNKeyValueStore, PBinaryKeyValueStore and PKeyIterable are the protocols that need to be implemented by each backend. Each protocol method is provided with a map of options. One of these options is whether the code should be executed synchronously or asynchronously. In the asynchronous case the implementation is supposed to return a go channel (or throw an unsupported operation exception). Only the operations -update-in, -assoc-in, -dissoc and -bassoc can mutate the state of the durable medium. They need to ensure that the tracked metadata, which can be accessed through get-meta, is written consistently with the value. Fortunately you usually do not need to implement these high-level protocols; in most cases it is sufficient to implement our backing store protocols (see the subsection below).
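
The sync/async convention can be pictured roughly like this; the function names below are hypothetical, and the :sync? flag in the opts map is an assumption about how the mode is passed through:

#+begin_src clojure
;; Hypothetical protocol method body: return the value directly in synchronous
;; mode, or a go channel that will eventually carry it in asynchronous mode.
(require '[clojure.core.async :refer [go]])

(defn- fetch-value
  "Stand-in for the backend's actual read path (assumes the store is an atom
   holding a plain map, as in the memory store)."
  [store key-vec]
  (get-in @store key-vec))

(defn -get-in-impl
  [store key-vec {:keys [sync?] :as opts}]
  (if sync?
    (fetch-value store key-vec)          ; plain value
    (go (fetch-value store key-vec))))   ; go channel carrying the value
#+end_src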

Semantics

The semantics of each implementation are supposed to follow the in-memory semantics of the corresponding Clojure functions get-in, update-in etc. The mapping can be conveniently studied in the memory store.
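
For example, assuming store is a konserve store opened in synchronous mode, the user-facing calls track Clojure's own map functions; the exact konserve.core argument order shown here is from memory and may vary slightly between versions:

#+begin_src clojure
;; Mirrors Clojure's assoc-in / update-in / get-in on a plain map.
(require '[konserve.core :as k])

(k/assoc-in  store [:users :alice] {:age 30} {:sync? true})
(k/update-in store [:users :alice :age] inc  {:sync? true})
(k/get-in    store [:users :alice] nil       {:sync? true})  ; => {:age 31}
#+end_src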

Backing store protocols

./figures/state_machine.png

The figure illustrates the different paths taken by read and update operations. io-operation, read-blob and update-blob are functions in the default store namespace, and each phase dispatches internally on a context describing the top-level IO operation, e.g. whether reading or writing is necessary. This explicit contextual environment is not strictly necessary, but it reduces code duplication. The default store uses core.async internally for its asynchronous implementation.

While the user-facing protocols capture the intent and give an implementation the greatest freedom, almost all implementations follow the same pattern and share the logic for header handling, serialization and locking (see the figure above). For this reason we have abstracted the low-level interfaces PBackingStore, PBackingBlob and PBackingLock from the filestore into the storage-layout namespace.

To implement a new backend you only need to provide these protocols and can focus on the interface for loading and storing byte array values atomically and safely. In particular, a combined write operation of -write-header, -write-meta and -write-value / -write-binary must become visible either as a whole or not at all. Read operations must only fetch what is needed. If you cannot cheaply (locally) seek on the open underlying blob handle (such as an AsynchronousFileChannel in the file store), you should fetch the needed combination of header, metadata and/or value/binary ahead of time (AOT) in one read by dispatching on the contextual :operation in the provided env argument. Look at the read-blob dispatch on the operation in the default implementation to see which context maps to which reads.
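
Conceptually, this ahead-of-time read planning dispatches on the operation recorded in the env map, something like the sketch below; the operation keywords are illustrative, and read-blob in the default implementation remains the authoritative mapping:

#+begin_src clojure
;; Decide which parts of the blob one read must cover, based on the top-level
;; operation recorded in the contextual env map (keywords are illustrative).
(defn read-plan [env]
  (case (:operation env)
    :read-meta   [:header :metadata]          ; e.g. get-meta, GC scans
    :read-edn    [:header :metadata :value]   ; regular get / update paths
    :read-binary [:header :metadata :binary]  ; binary reads
    [:header :metadata :value]))              ; conservative default
#+end_src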

You can acquire a lock (PBackingLock) with -get-lock to span a transactional context over the low-level store. In case you cannot guarantee transactional safety over the read and write operations on a blob you have to declare that your store cannot be used from multiple writing processes/instances at the same time and set lock-blob? to false. Locking is then automatically provided by the user-facing API in the core namespace with the go-locked macro.

The sync and sync-store operations are optional, for cases where durability can be enforced on demand by the underlying store (e.g. fsync for the filestore). Key objects for your store are created through -path with a string representation of the key (store-key). This is what will be passed to your read and write functions. In the simplest case you can just use the string itself if your implementation does not require the construction of a custom path.

To provide unified code for both the synchronous and the asynchronous variant we provide the async+sync macro in the utils namespace. It expands both variants of the code and, for the synchronous one, replaces the core.async operations go and <! (think async & await) with the synchronous no-op do. Feel free to use it as well, but be aware that it is not yet a hygienic macro transformation: you can get a conflict if you use the translated go and <! names as your own variable names (which is unlikely, but possible). It is also totally fine for implementations to duplicate code between synchronous and asynchronous operations if this is easier to manage than using this macro. We are in general interested in extending the capabilities of this cross-platform, cross-execution-model meta-programming in the future.
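
Hand-written, the two variants such a transformation produces look roughly like this sketch:

#+begin_src clojure
(require '[clojure.core.async :refer [go <!]])

;; asynchronous variant: parks on the channel
(defn bump-async [number-ch]
  (go (inc (<! number-ch))))

;; synchronous variant of the same body: go becomes do, <! falls away
(defn bump-sync [number]
  (do (inc number)))
#+end_src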

In summary, a simple (starting) implementation can rely on konserve's high-level locking (-get-lock can just return nil). It reads values selectively according to the protocol invoked and makes sure that the combined write operation is atomic. In this case :in-place? mode is always active and atomic-move can therefore be a no-op. Copying (for backups) with copy is also not needed if the underlying store provides atomic writes. sync is redundant in this case as well and can be a no-op.

Finally, if the protocols do not work for your implementation, please reach out to us and we will try to extend them accordingly if possible. Having more backing store implementations is very much in our interest.

Error handling

All internal errors must be caught and returned as a throwable object on the return channel following this simple pattern. We provide similar helper macros in utils.cljc. In the future we plan to add the ability to also pass in an error callback to directly call on errors.
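
The convention boils down to never letting an exception escape the go block and handing the throwable to the caller as the channel's value, roughly as in this sketch (the helper macros in utils.cljc wrap this pattern for you):

#+begin_src clojure
(require '[clojure.core.async :refer [go]])

(defn safe-op
  "Runs f inside a go block; on failure the exception itself is returned on the
   channel instead of being thrown, so callers can check for Throwable results."
  [f]
  (go
    (try
      (f)
      (catch Exception e
        e))))
#+end_src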

Blocking IO

Be aware that you must not use blocking IO operations in go-routines. The easiest solution is to spawn threads with clojure.core.async/thread, but ideally you should provide synchronous and asynchronous IO operations to provide maximum scalability. In particular the usage of <!! and >!! is not allowed since it will deadlock core.async.
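
For example, wrapping a blocking read in clojure.core.async/thread keeps it off the go thread pool while still returning a channel:

#+begin_src clojure
(require '[clojure.core.async :refer [thread]]
         '[clojure.java.io :as io])

(defn read-file-off-pool
  "Returns a channel that delivers the file's contents; the blocking slurp runs
   on a dedicated thread instead of the fixed go-block thread pool."
  [path]
  (thread (slurp (io/file path))))
#+end_src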

Serializers

Konserve provides the protocol PStoreSerializer with a -serialize and a -deserialize method. You need to use these methods to serialize the edn values, as they are supposed to be configurable by the user. This protocol is also used by the encryptors and compressors. The serializer-compressor-encryptor combination is configurable and stored in the header of each value, such that store reconfiguration can happen without downtime and migration. This also allows bulk copying (or caching) of values from another store locally without changing their representation (you only pay the IO cost upfront). The current store configuration will be used whenever values are updated or overwritten, though.
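
As a simplified stand-in for what such a serializer does, here is an edn round-trip behind a local protocol of the same shape; the real PStoreSerializer methods operate on streams and handler maps, so consult protocols.cljc for the actual signatures:

#+begin_src clojure
;; Simplified stand-in protocol; konserve's PStoreSerializer has richer
;; signatures, this only demonstrates the round-trip idea.
(require '[clojure.edn :as edn])

(defprotocol SerializerSketch
  (-serialize*   [this value])
  (-deserialize* [this raw]))

(def edn-serializer
  (reify SerializerSketch
    (-serialize*   [_ value] (.getBytes (pr-str value) "UTF-8"))
    (-deserialize* [_ raw]   (edn/read-string (String. ^bytes raw "UTF-8")))))

;; (-deserialize* edn-serializer (-serialize* edn-serializer {:a 1}))  ; => {:a 1}
#+end_src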

Metadata

Konserve internally uses metadata that is handled separately from the values themselves. This metadata is used, for example, to track the time when values are written, a functionality needed by the concurrent garbage collector. The metadata itself is an edn map. The protocol does not fix any size limit for metadata; if you cannot implement your backend without a size limit, please document it. Metadata can also be used by third parties to track important contextual information such as user access control rights or even edit histories, so supporting at least a megabyte in size would be future proof. If for some very unfortunate reason you have to allocate a fixed size for each metadata element, then for now at least 16 KiB must be supported. The get-meta protocol reads only the metadata part of the respective key and therefore should not load or fetch the actual value. This functionality is used by the garbage collector and potentially other monitoring processes that would otherwise have to read all data regularly.

If you store the metadata and the value together in one blob, the recommended layout is:
  1. 20 bytes for the metadata size
  2. the serialized metadata
  3. the serialized or binary value

Storing the metadata size is necessary to allow reading only the metadata or only the value (and to skip over the metadata) in stores that have to use one big blob and seek within it (such as the filestore). You can also store the metadata separately if your store allows atomic transactions over both objects, e.g. using two columns in a SQL database.
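
To make the seek idea concrete, here is a sketch that slices only the metadata out of a single blob. It assumes, purely for the sketch, that the size is encoded as a long at the start of the prefix; the real encoding of the 20-byte prefix may differ.

#+begin_src clojure
;; Read only the serialized metadata from a blob laid out as
;; [size prefix | metadata | value], without touching the value bytes.
(import '[java.nio ByteBuffer])

(def ^:const prefix-len 20)

(defn meta-bytes ^bytes [^bytes blob]
  (let [meta-size (.getLong (ByteBuffer/wrap blob) 0)] ; assumed: size as a long at offset 0
    (java.util.Arrays/copyOfRange blob prefix-len (+ prefix-len meta-size))))
#+end_src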

Migration

Sometimes the chosen internal representation of a store turns out to be insufficient, as was the case for the addition of the metadata support described in this document. In this unfortunate situation a migration of the existing durable data becomes necessary. Migrations have the following requirements:
  1. They must not lose data, including on concurrent garbage collection.
  2. They should work without user intervention.
  3. They should work incrementally, upgrading each key on access, allowing version upgrades in production.
  4. They can break old library versions running on the same store.

We cannot determine the version of an old key by reading it, because we would already need to know its version to interpret it. The version is therefore stored in the first byte of the header, which allows you to read old values in case the binary layout of konserve or of your implementation has to change.
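
A sketch of what version-based dispatch at read time can look like; the version numbers and per-version readers below are hypothetical stubs:

#+begin_src clojure
;; Dispatch on the first header byte to pick the right reader for the blob's
;; layout version. Both readers are hypothetical placeholders.
(defn read-legacy-blob  [blob] {:layout :legacy  :blob blob})
(defn read-current-blob [blob] {:layout :current :blob blob})

(defn read-versioned [^bytes blob]
  (case (long (aget blob 0))
    0 (read-legacy-blob blob)   ; pre-metadata layout
    1 (read-current-blob blob)  ; current layout with header + metadata
    (throw (ex-info "Unknown blob layout version"
                    {:version (aget blob 0)}))))
#+end_src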

Tests

TODO We provide a compliance test suite that your store has to satisfy to be compliant, and we are working on a low-level compliance test suite for the default store.

Work in progress

  • monitoring, e.g. of cache sizes, migration processes, performance …
  • benchmarks
  • document test suite