Skip to content

File Upload And Merkle Root Calculation

Laxmi Prasad Oli edited this page Apr 7, 2022 · 1 revision

File upload mechanism

Consider you have an allocation with EC ratio of n:p where n is number of data-blobbers and p is number of parity-blobbers.

Following are some of the constants/definitions:

  1. BlockSize(currently aka chunkSize) --> 64 KB
  2. Number of Merkle Leaves --> 1024
  3. Merkle leaf's block size(aka merkle-chunk-size) --> 64 Bytes
  4. Fragment: Result of n * 64KB

For some file of size S, it is divided into multiple fragments. Last fragment that is short of n * 64KB will be padded with null bytes to make upto n * 64KB. Each fragment is taken and splitted into 64KB of data resulting into n such blocks with above consideration. Now with n data-shards, using reed-solomon erasure encoding, p parity-shards are calculated. After parity-shards are calculated, respective shards are sent to the respective blobbers. So there will be n+p upload requests. Blobber will create new file in its temporary directory.

For next fragment, same process is followed. Blobber will append to existing file. And so on.

Merkle root calculation

Consider there were m fragments. So blobbers will get m * 64KB of data each. For a blobber, merkle root is calculated as follow:

  1. Divide each 64KB into block of 64 Bytes resulting in 1024 blocks. So the data indexes are from 0...1023
  2. For some index i, there are m number of 64 Bytes, which are treated as continuous data (Each index thus has data of m* 64 Bytes as its leaf)
  3. Hash of this continuour m* 64 Bytes is thus a leaf of an index.
  4. Merkle tree and hence merkle root is calculated from these 1024 leaves.

Obviously merkle root of a file for each blobber will be different.

Challenge Issuance from Smart Contract

When a file is uploaded, it has num_of_blocks field which is number of 64KB that a blobber received. A directory will have num_of_blocks too which is simply sum of num_of_blocks of its children. So a root directory / will have record of total_num_of_blocks in an allocation.

Network will pose blobber with random_seed, which is used to calculate random block_num of an allocation. So block_num is basically calculated as; block_num = rand.NewSource(random_seed).Int63n(total_num_of_blocks).

Using GetObjectPath which has function signature func GetObjectPath(ctx context.Context, allocationID string, blockNum int64) (*ObjectPath, error), we get an object path which provides file/ref that is related to above block_num, list of all refs with hierarchy maintained that were retrieved from database while getting respective ref, root_hash of an allocation and other few details.

Challenge to a blobber is issued in following manner:

  1. Mining network posts a transaction, that has a list of eligible validators, random seed to determine which data block a blobber must submit and the latest writemarker.

  2. Blobber checks network if any challenges has been issued.

  3. It then uses random seed to calculate block number.

  4. Blobber finds a file which stores this block of data. (In blobber code, using GetObjectPath)

  5. The file is splitted into blocks of 64KB.

  6. Each 64KB are splitted into block of 64 Bytes resulting into 1024 blocks.

  7. 64 Bytes in same indices are treated as continuous data, which is a leaf for that respective index.

  8. Merkle tree is calculated from 1024 leaves.

  9. Random block of a file from indexes of 0...1023 is also selected using same random seed above

  10. Merkle tree, data of some index i, object_path and list of writemarkers are sent to the validator. Note: These writemarkers are list of writemarkers that fall in range of writemarker's allocation_root that is issued by blockchain for the challenge and the latest writemarker's allocation_root of an allocation

  11. Validator uses same mechanism to calculate random block and verify file metadata from object_path. Validation of an object_path is crucial here as it will also verify that, merkle_root of file is a valid hash. With this merkle_root, merkle_tree along with responded data is validated.

  12. Validator issues ticket regarding the validity of a challenge. Upon reaching consensus among validators, transaction is written to Blockchain which transfers some tokens from challenge pool to blobber's account.

Note: @cnlangzi has different implementation regarding leaf's hash in gosdk. His implementation calculates hash in each index as merkle root of m number of 64 Bytes as leaves. This is not an issue as it is used by blobber and validators as well to re-calculate merkle tree for challenge hash.

Observations

BlockSize and Number of merkle leaves and its blocksize(merkle-chunk-size) is always fixed to 64KB and 1024 and 64 Bytes respectively. This size may not be optimal in multiple cases. As an example, we can maybe have the following configurations;

For files lesser than 1KB, We can either consider using BlockSize to 1KB along with setting merkle-chunk-size to 1 Byte instead of 64 Bytes thus conserving 1024 number of merkle leaves or, consider decreasing number of merkle leaves and increasing merkle-chunk-size

For files greater than 1GB, We can consider using BlockSize of 50 MB along with setting merkle-chunk-size to 50 KB instead of 64 Bytes thus conserving 1024 number of merkle leaves.

We can also make it dynamic based on number of data-shards that an allocation has.

Note: Above example is simply an observation. To implement it we need to test its effect in the token economy, computational efficiency and blobber's convenience to store small files

Clone this wiki locally