File Upload And Merkle Root Calculation
Consider you have an allocation with an EC (erasure coding) ratio of `n:p`, where `n` is the number of data-blobbers and `p` is the number of parity-blobbers.
The following constants/definitions are used:
- BlockSize (currently aka chunkSize) --> 64 KB
- Number of Merkle Leaves --> 1024
- Merkle leaf's block size (aka merkle-chunk-size) --> 64 Bytes
- Fragment --> `n * 64KB` of data
A file of size `S` is divided into multiple fragments. If the last fragment falls short of `n * 64KB`, it is padded with null bytes to make it up to `n * 64KB`.
Each fragment is then split into 64KB blocks, which (given the padding above) yields exactly `n` such blocks.
From these `n` data-shards, `p` parity-shards are calculated using Reed-Solomon erasure encoding.
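As a sketch, the per-fragment encoding might look like the following in Go, assuming the `github.com/klauspost/reedsolomon` package for the erasure coding (the actual SDK code may use a different library or layout):

```go
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon" // assumed EC library, not necessarily the one the SDK uses
)

const blockSize = 64 * 1024 // 64 KB

// encodeFragment pads one fragment up to n*64KB with null bytes, splits it
// into n data-shards of 64KB each, and computes p parity-shards.
func encodeFragment(fragment []byte, n, p int) ([][]byte, error) {
	enc, err := reedsolomon.New(n, p)
	if err != nil {
		return nil, err
	}

	// Pad a short (last) fragment with null bytes up to n*64KB.
	padded := make([]byte, n*blockSize)
	copy(padded, fragment)

	// Split allocates n data-shards plus p (empty) parity-shards.
	shards, err := enc.Split(padded)
	if err != nil {
		return nil, err
	}

	// Fill in the parity: shards[0:n] are data, shards[n:n+p] are parity.
	if err := enc.Encode(shards); err != nil {
		return nil, err
	}
	return shards, nil // n+p shards, one per blobber
}

func main() {
	shards, err := encodeFragment([]byte("last short fragment"), 4, 2) // EC ratio 4:2
	if err != nil {
		panic(err)
	}
	fmt.Println(len(shards), "shards of", len(shards[0]), "bytes each") // 6 shards of 65536 bytes each
}
```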
Once the parity-shards are calculated, each shard is sent to its respective blobber, so there are `n + p` upload requests per fragment. The blobber creates a new file in its temporary directory. For the next fragment the same process is followed, with the blobber appending to the existing file, and so on.
If there were `m` fragments, each blobber thus receives `m * 64KB` of data.
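On the blobber side, each incoming shard is appended to the same file in the temporary directory, one fragment after another. A minimal sketch (the directory layout and names here are hypothetical):

```go
package main

import (
	"os"
	"path/filepath"
)

// appendShard writes one 64KB shard into the file being assembled in the
// blobber's temporary directory. The first fragment creates the file; every
// later fragment appends to it.
func appendShard(tempDir, connectionID, fileName string, shard []byte) error {
	path := filepath.Join(tempDir, connectionID, fileName)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(shard)
	return err
}

func main() {
	dir, _ := os.MkdirTemp("", "blobber")
	shard := make([]byte, 64*1024)
	_ = appendShard(dir, "conn-1", "file.bin", shard) // fragment 1 creates the file
	_ = appendShard(dir, "conn-1", "file.bin", shard) // fragment 2 is appended
}
```

After `m` fragments the temporary file is exactly `m * 64KB` long.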
For a blobber, the merkle root is calculated as follows (see the sketch after this list):
- Divide each 64KB block into blocks of 64 Bytes, resulting in 1024 blocks. The data indexes thus run from 0...1023.
- For some index `i`, there are `m` such 64-Byte blocks, one from each 64KB block, which are treated as continuous data. Each index thus has `m * 64 Bytes` of data.
- The hash of this continuous `m * 64 Bytes` is the leaf for index `i`.
- The merkle tree, and hence the merkle root, is calculated from these 1024 leaves.
Obviously, the merkle root of a file will differ from blobber to blobber, since each blobber holds a different shard.
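Putting the leaf construction above into code, a minimal sketch of the blobber-side merkle root could look as follows (SHA-256 and the parent-hash pairing rule are assumptions; the actual code may use a different hash or rule):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const (
	blockSize       = 64 * 1024                   // 64KB
	merkleChunkSize = 64                          // 64 Bytes
	numLeaves       = blockSize / merkleChunkSize // 1024
)

// hash is assumed to be SHA-256 here; the actual code may use a different hash.
func hash(data []byte) []byte {
	h := sha256.Sum256(data)
	return h[:]
}

// merkleRoot computes the blobber-side merkle root of a file of m*64KB
// (already null-padded as described above).
func merkleRoot(data []byte) string {
	m := len(data) / blockSize

	// Leaf i: the m 64-Byte chunks at index i, one from each 64KB block,
	// concatenated as continuous data and hashed.
	leaves := make([][]byte, numLeaves)
	for i := 0; i < numLeaves; i++ {
		buf := make([]byte, 0, m*merkleChunkSize)
		for b := 0; b < m; b++ {
			off := b*blockSize + i*merkleChunkSize
			buf = append(buf, data[off:off+merkleChunkSize]...)
		}
		leaves[i] = hash(buf)
	}

	// Reduce pairwise until a single root remains (1024 is a power of two).
	for len(leaves) > 1 {
		next := make([][]byte, len(leaves)/2)
		for i := range next {
			next[i] = hash(append(append([]byte{}, leaves[2*i]...), leaves[2*i+1]...))
		}
		leaves = next
	}
	return hex.EncodeToString(leaves[0])
}

func main() {
	data := make([]byte, 3*blockSize) // m = 3 fragments' worth of shards
	fmt.Println(merkleRoot(data))
}
```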
When a file is uploaded, it has a `num_of_blocks` field, which is the number of 64KB blocks the blobber received. A directory also has a `num_of_blocks`, which is simply the sum of the `num_of_blocks` of its children. The root directory `/` therefore records the `total_num_of_blocks` of the allocation.
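As a sketch of this aggregation (the `Ref` struct here is hypothetical and much smaller than the real schema):

```go
package main

import "fmt"

// Ref is a hypothetical, trimmed-down sketch of a file/directory reference;
// the real blobber schema has many more fields.
type Ref struct {
	Type        string // "f" for file, "d" for directory
	NumOfBlocks int64  // for a file: number of 64KB blocks received
	Children    []*Ref
}

// totalNumOfBlocks recomputes a directory's num_of_blocks as the sum over
// its children; called on the root "/" it yields total_num_of_blocks.
func totalNumOfBlocks(r *Ref) int64 {
	if r.Type == "f" {
		return r.NumOfBlocks
	}
	var sum int64
	for _, c := range r.Children {
		sum += totalNumOfBlocks(c)
	}
	return sum
}

func main() {
	root := &Ref{Type: "d", Children: []*Ref{
		{Type: "f", NumOfBlocks: 3},
		{Type: "d", Children: []*Ref{{Type: "f", NumOfBlocks: 5}}},
	}}
	fmt.Println(totalNumOfBlocks(root)) // 8
}
```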
The network poses the blobber a `random_seed`, which is used to calculate a random `block_num` within the allocation: `block_num = rand.New(rand.NewSource(random_seed)).Int63n(total_num_of_blocks)`.
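A minimal runnable form of that formula, using Go's `math/rand`:

```go
package main

import (
	"fmt"
	"math/rand"
)

// blockNum deterministically derives the challenged block from the seed the
// network posted; every party running this with the same inputs gets the
// same block.
func blockNum(randomSeed, totalNumOfBlocks int64) int64 {
	return rand.New(rand.NewSource(randomSeed)).Int63n(totalNumOfBlocks)
}

func main() {
	fmt.Println(blockNum(42, 1<<20)) // same seed => same block on every node
}
```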
Using `GetObjectPath`, which has the function signature `func GetObjectPath(ctx context.Context, allocationID string, blockNum int64) (*ObjectPath, error)`, we get an object path that provides: the file `ref` related to the above `block_num`; the list of all `refs`, with hierarchy maintained, that were retrieved from the database while fetching that ref; the `root_hash` of the allocation; and a few other details.
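A hypothetical use of `GetObjectPath` inside a challenge handler, combining it with the `block_num` calculation above (only the signature quoted above is assumed; `ObjectPath` and `GetObjectPath` come from the blobber codebase):

```go
import (
	"context"
	"math/rand"
)

// loadChallengedObject is a hypothetical wrapper: it derives the challenged
// block_num from the posted seed and resolves it to the file's object path.
func loadChallengedObject(ctx context.Context, allocationID string,
	randomSeed, totalNumOfBlocks int64) (*ObjectPath, error) {
	blockNum := rand.New(rand.NewSource(randomSeed)).Int63n(totalNumOfBlocks)
	return GetObjectPath(ctx, allocationID, blockNum)
}
```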
A challenge is issued to a blobber in the following manner:
- The mining network posts a transaction that contains a list of eligible validators, a random seed to determine which data block the blobber must submit, and the latest writemarker.
- The blobber checks the network for any issued challenges.
- It then uses the random seed to calculate the block number.
- The blobber finds the file which stores this block of data (in blobber code, using `GetObjectPath`).
- The file is split into blocks of 64KB.
- Each 64KB block is split into blocks of 64 Bytes, resulting in 1024 blocks.
- The 64-Byte blocks at the same index are treated as continuous data, which forms the leaf for that index.
- The merkle tree is calculated from the 1024 leaves.
- A random block of the file, from the indexes 0...1023, is also selected using the same random seed as above.
- The merkle tree, the data at some index `i`, the `object_path`, and a list of writemarkers are sent to the validator. Note: this is the list of writemarkers that fall in the range between the `allocation_root` of the writemarker issued by the blockchain for the challenge and the latest writemarker's `allocation_root` of the allocation.
- The validator uses the same mechanism to calculate the random block and to verify the file metadata from the `object_path`. Validation of the `object_path` is crucial here, as it also verifies that the `merkle_root` of the file is a valid hash. With this `merkle_root`, the `merkle_tree` along with the responded data is validated (see the sketch after this list).
- The validator issues a ticket regarding the validity of the challenge. Upon reaching consensus among the validators, a transaction is written to the blockchain which transfers some tokens from the challenge pool to the blobber's account.
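On the validator side, checking the responded data against the file's `merkle_root` amounts to a standard merkle-path verification. A sketch, assuming the submitted merkle tree is passed as the list of sibling hashes from the challenged leaf up to the root (SHA-256 and the pairing rule are assumptions):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hash is assumed to be SHA-256; the real code may use a different hash.
func hash(data []byte) []byte {
	h := sha256.Sum256(data)
	return h[:]
}

// verifyLeaf checks the responded data for leaf index i against the file's
// merkle_root, using the sibling hashes taken from the submitted merkle
// tree: siblings[0] pairs with the leaf, the last entry pairs with a child
// of the root.
func verifyLeaf(leafData []byte, index int, siblings [][]byte, root []byte) bool {
	cur := hash(leafData)
	for _, sib := range siblings {
		if index%2 == 0 { // current node is a left child
			cur = hash(append(append([]byte{}, cur...), sib...))
		} else { // current node is a right child
			cur = hash(append(append([]byte{}, sib...), cur...))
		}
		index /= 2
	}
	return bytes.Equal(cur, root)
}

func main() {
	leafA, leafB := []byte("a"), []byte("b")
	root := hash(append(append([]byte{}, hash(leafA)...), hash(leafB)...))
	fmt.Println(verifyLeaf(leafA, 0, [][]byte{hash(leafB)}, root)) // true
}
```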
Note: @cnlangzi has a different implementation of the leaf hash in gosdk. His implementation calculates the hash at each index as the merkle root of the `m` 64-Byte chunks, taken as leaves of a sub-tree. This is not an issue, as blobbers and validators use the same construction to re-calculate the merkle tree for the challenge hash.
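A sketch of that variant, for contrast with the concatenate-then-hash leaf shown earlier (the pairing rule for an odd node count is an assumption):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

func hash(data []byte) []byte {
	h := sha256.Sum256(data) // assumed hash, as in the earlier sketches
	return h[:]
}

// leafAsMerkleRoot computes one index's leaf in the gosdk variant: instead
// of hashing the concatenation of the m 64-Byte chunks, it builds a small
// merkle tree over them and uses that tree's root as the leaf.
func leafAsMerkleRoot(chunks [][]byte) []byte {
	nodes := make([][]byte, len(chunks))
	for i, c := range chunks {
		nodes[i] = hash(c)
	}
	for len(nodes) > 1 {
		var next [][]byte
		for i := 0; i < len(nodes); i += 2 {
			if i+1 == len(nodes) { // odd node out: promoted as-is (assumed pairing rule)
				next = append(next, nodes[i])
				break
			}
			next = append(next, hash(append(append([]byte{}, nodes[i]...), nodes[i+1]...)))
		}
		nodes = next
	}
	return nodes[0]
}

func main() {
	chunks := [][]byte{make([]byte, 64), make([]byte, 64), make([]byte, 64)} // m = 3
	fmt.Printf("%x\n", leafAsMerkleRoot(chunks))
}
```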
The BlockSize, the number of merkle leaves, and the leaf block size (merkle-chunk-size) are always fixed at 64KB, 1024, and 64 Bytes respectively. These sizes may not be optimal in all cases. For example, we could have the following configurations (see the sketch after this note):
- For files smaller than 1KB, we could either use a BlockSize of 1KB along with a merkle-chunk-size of 1 Byte instead of 64 Bytes, thus preserving 1024 merkle leaves, or decrease the number of merkle leaves and increase the merkle-chunk-size.
- For files greater than 1GB, we could use a BlockSize of 50 MB along with a merkle-chunk-size of 50 KB instead of 64 Bytes, thus preserving 1024 merkle leaves.
We could also make these parameters dynamic based on the number of data-shards an allocation has.
Note: The above is simply an observation. To implement it, we would need to test its effect on the token economy, computational efficiency, and the blobber's convenience in storing small files.
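Purely as an illustration of the idea (none of this is implemented anywhere), the size-dependent configuration could look like:

```go
package main

import "fmt"

// merkleParams is an illustrative, unimplemented sketch of the proposal:
// pick BlockSize and merkle-chunk-size from the file size while keeping
// 1024 merkle leaves per block. Thresholds mirror the examples above.
func merkleParams(fileSize int64) (blockSize, chunkSize int64) {
	switch {
	case fileSize < 1<<10: // < 1KB: 1KB blocks, 1-Byte merkle chunks
		return 1 << 10, 1
	case fileSize > 1<<30: // > 1GB: 50MB blocks, 50KB merkle chunks
		return 50 << 20, 50 << 10
	default: // the current fixed configuration
		return 64 << 10, 64
	}
}

func main() {
	b, c := merkleParams(2 << 30)
	fmt.Println(b/c, "merkle leaves") // 1024 in every case
}
```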