Data Placement (allocation) strategies
The goal of this post is to discuss how data belonging to individual datasets gets placed in the system, how this can be modeled, and what the trade-offs around this are.
At the abstract level, each dataset uploaded to the system is stored on a well-defined set of servers. The contribution of individual servers differs from system to system: in the simplest form it is a copy of the whole file, but it can also be a contiguous piece of the file, some segments of the file, some parity blocks, or, in the most generic definition, information that can be used to retrieve some parts of the file through a coding scheme. Storage can also change over time: some systems use lazy encoding, starting with replication and gradually replacing it with erasure-coding-based redundancy.
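As a rough sketch of the lazy-encoding lifecycle (the entry layout, function names, and parameters below are all illustrative assumptions, not from any particular system):

```python
# Sketch of lazy encoding (hypothetical names and parameters): a dataset
# starts out replicated and is migrated to erasure-coded redundancy
# by a background process.

def store(data, replicas=3):
    return {"scheme": "replication", "copies": replicas, "data": data}

def background_encode(entry, k=20, n=40):
    """Swap replication for erasure coding once encoding completes."""
    if entry["scheme"] == "replication":
        entry["scheme"] = "erasure"
        entry["params"] = (k, n)
        entry.pop("copies")  # surplus replicas can now be garbage-collected
    return entry

entry = store(b"dataset bytes")
print(background_encode(entry)["scheme"])  # erasure
```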
Mapping of datasets to nodes
Whatever the system design, we can define a mapping between datasets and nodes (at any given time) based on whether a node contributes to the storage of that particular dataset in any way. This mapping and its properties, such as how many nodes store a file and how clustered the mapping is, are important system properties.
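As a minimal illustration (the dictionary representation and node names are assumptions, not taken from any particular system), such a mapping and two of its properties could look like:

```python
# Hypothetical snapshot of the dataset-to-node mapping: which nodes
# contribute, in any way, to the storage of each dataset right now.
placement = {
    "dataset-a": {"n1", "n2", "n3", "n4"},
    "dataset-b": {"n3", "n4", "n5", "n6"},
}

def spread(dataset_id):
    """How many nodes store (some part of) the dataset."""
    return len(placement[dataset_id])

def overlap(d1, d2):
    """How many nodes the placements of two datasets share."""
    return len(placement[d1] & placement[d2])

print(spread("dataset-a"))                # 4
print(overlap("dataset-a", "dataset-b"))  # 2 (n3 and n4)
```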
The contribution of individual nodes to the storage of the whole dataset can also take several forms, determined mainly by the underlying segmentation and logical structuring of the dataset.
Mapping in detail
datasets to segments
Datasets are often handled as independent segments: independent in the sense that there is no relation between the redundancy protection applied to individual segments, i.e. information stored for one segment has no relevance for the protection of another.
The mapping from datasets to segments is often seen as outside the core functionality of the system. Such is the case in [Luby2019], which differentiates between user objects (datasets) and storage objects (segments).
More generally, when such segmentation is used, it is also important to differentiate whether the placements of segments of the same file are related or independent. In a more "traditional" model, segments are always spread across the same set of nodes. It is, however, also possible to place segments independently, or, e.g., to make sure placements are non-overlapping.
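A minimal sketch of these three options, assuming hypothetical node names and a fixed placement width per segment:

```python
import random

NODES = [f"n{i}" for i in range(100)]
K = 10  # assumed number of nodes holding each segment

def clustered(segments, rng):
    """'Traditional' model: every segment of the file on the same node set."""
    nodes = rng.sample(NODES, K)
    return {s: nodes for s in segments}

def independent(segments, rng):
    """Each segment placed on an independently drawn node set."""
    return {s: rng.sample(NODES, K) for s in segments}

def non_overlapping(segments, rng):
    """Node sets of segments of the same file are disjoint."""
    pool = NODES[:]
    rng.shuffle(pool)
    return {s: [pool.pop() for _ in range(K)] for s in segments}

rng = random.Random(42)
print(non_overlapping(["seg0", "seg1"], rng))
```

Non-overlapping placement increases the number of distinct nodes a file touches, which changes both its failure profile and its retrieval parallelism.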
segments to pieces
The protection of a segment itself is based on spreading it (using replication or erasure coding) over nodes, each node storing a piece. In most systems these pieces are generated using erasure coding, each piece composed of code symbols from the codewords, effectively spreading each codeword over the set of nodes. However, segments are much larger than codewords: segments might be 64 MB, while a codeword of a typical Reed-Solomon code using 8-bit symbols and k=20, n=40 is 40 bytes. How symbols of codewords are "concatenated" into pieces must therefore also be defined. The typical way is a 2D interleaved structure akin to RAID stripe-based encoding, but several variants exist.
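To make the interleaving concrete, here is a toy sketch: a single XOR parity symbol stands in for a real Reed-Solomon encoder (so k=4, n=5 rather than k=20, n=40), but the 2D layout, with piece i collecting symbol i of every codeword, is the typical one described above:

```python
# Interleaved layout sketch: the segment is split into stripes of k
# source symbols; each stripe is encoded into a codeword of n symbols;
# piece i is the concatenation of symbol i of every codeword.

K, N = 4, 5  # toy parameters; a deployment might use k=20, n=40

def encode_stripe(stripe):
    """Toy erasure code: k data symbols plus one XOR parity symbol."""
    parity = 0
    for sym in stripe:
        parity ^= sym
    return stripe + [parity]  # n = k + 1 symbols

def segment_to_pieces(segment: bytes):
    """Map a segment to n pieces, one per node, via 2D interleaving."""
    assert len(segment) % K == 0  # real systems pad the last stripe
    stripes = [list(segment[i:i + K]) for i in range(0, len(segment), K)]
    codewords = [encode_stripe(s) for s in stripes]
    # Piece i = symbol i of codeword 0, symbol i of codeword 1, ...
    return [bytes(cw[i] for cw in codewords) for i in range(N)]

pieces = segment_to_pieces(b"abcdefgh")
print([p.hex() for p in pieces])  # 5 pieces, each len(segment)/k bytes
```

With this layout, each node's piece contains exactly one symbol of every codeword, so losing a node costs each codeword exactly one symbol, which is precisely the kind of loss the erasure code is dimensioned to tolerate.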
pieces to nodes
Finally, pieces of segments should be mapped to nodes.
One way of mapping pieces to nodes is through the concept of a placement group: an ordered group of nodes with a predetermined storage pattern for the pieces of a segment. In a system based on placement groups, each segment is assigned to a placement group, which determines which nodes store it and how. This clearly simplifies the mapping of segments to nodes.
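A minimal sketch of the placement-group idea, with hypothetical groups and a hash-based group assignment:

```python
import hashlib

# Hypothetical placement groups: each is an ordered node list with a
# fixed storage pattern (piece i of a segment goes to node i of the group).
placement_groups = [
    ["n1", "n2", "n3", "n4", "n5"],
    ["n6", "n7", "n8", "n9", "n10"],
]

def group_for(segment_id: str):
    """Deterministically assign a segment to a placement group."""
    h = int(hashlib.sha256(segment_id.encode()).hexdigest(), 16)
    return placement_groups[h % len(placement_groups)]

def node_for_piece(segment_id: str, piece_index: int):
    """The group's ordering fixes which node stores which piece."""
    return group_for(segment_id)[piece_index]

print(node_for_piece("seg-42", 0))
```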
More generally, each segment could have an individual pieces-to-nodes mapping. Whether this mapping has to be stored as metadata or can be determined algorithmically is an important factor in deciding which mapping method to choose.
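As one example of an algorithmically determined mapping that requires no stored metadata, rendezvous (highest-random-weight) hashing lets any party that knows the node list recompute a segment's placement; this is just one illustration of the algorithmic end of the spectrum, not a prescription:

```python
import hashlib

def hrw_placement(segment_id: str, nodes: list[str], n: int) -> list[str]:
    """Rendezvous (HRW) hashing: rank every node by a hash of
    (segment, node) and take the top n. Anyone who knows the node list
    can recompute the mapping, so no placement metadata is stored."""
    def score(node):
        digest = hashlib.sha256(f"{segment_id}/{node}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(nodes, key=score, reverse=True)[:n]

nodes = [f"n{i}" for i in range(100)]
print(hrw_placement("seg-42", nodes, 5))
```

The trade-off is the usual one: stored metadata permits arbitrary, repair-friendly placements, while algorithmic mappings keep the metadata layer small at the cost of flexibility.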
Mapping structure and locality
Having defined all the above mappings between datasets, segments, pieces, and nodes, it is worth studying the effects of various mapping strategies. In particular, we define different notions of locality and study their effects on system performance.
In real life, nodes are not equal. They differ in several ways, some of which are node properties, while others are better represented as relations between nodes:

- storage and compute resources: these are node properties and less interesting for us now
- network connectivity: both latency and network capacity are better modeled in the node-to-node context (edges of a graph)
- independence of failures (and of control): while most theoretical studies assume independent node failures, this is far from reality; both because of common underlying infrastructure and because of being under the same control, some nodes tend to fail together
Placement locality
Placement locality is related to the structure of the dataset-to-node mapping; it essentially determines what fails together and what independence assumptions can be made.
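A sketch of how placement locality interacts with failure assumptions, assuming hypothetical failure-domain labels on nodes:

```python
from collections import Counter

# Hypothetical failure domains: nodes sharing infrastructure or
# control tend to fail together.
failure_domain = {
    "n1": "dc-east", "n2": "dc-east", "n3": "dc-west",
    "n4": "dc-west", "n5": "dc-south",
}

def worst_case_loss(placed_nodes):
    """Max number of pieces lost if one whole failure domain goes down."""
    per_domain = Counter(failure_domain[n] for n in placed_nodes)
    return max(per_domain.values())

# With an (n=5, k=3) erasure code, a placement survives a single domain
# failure only if no domain holds more than n - k = 2 of its pieces.
print(worst_case_loss(["n1", "n2", "n3", "n4", "n5"]))  # 2 -> survives
```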
Code locality
TBD
Network locality
TBD