fix(content): update filecoin piece content #1016

yiannisbot · 2020-07-25T07:17:26Z

This PR is updating the description of the Filecoin Piece and provides a description of Data Representations in the Filecoin Network. Although the section will likely go through some restructuring, with this PR I would like to make sure that the content in this subsection is correct and up-to-date.

content/systems/filecoin_files/piece/_index.md

rvagg · 2020-08-05T04:50:11Z

Hey, while we're doing this, if we end up with the longer description of how to get to CommP, how about we take the opportunity to insert a definition of "Fr32" into the specs because it's going to keep getting asked. @ribasushi was the latest one to ask just 2 days ago. Combining and rephrasing parts of @porcuquine's responses we could come up with something like:

The term Fr32 is derived from the name of a struct Filecoin uses to represent the elements of the arithmetic field of a pairing-friendly curve, specifically Bls12-381—which justifies use of 32 bytes. F stands for "Field", while r is simply a mathematic letter-as-variable substitution used to denote the modulus of this particular field.

yiannisbot · 2020-09-17T11:37:09Z

@porcuquine @rvagg @ribasushi can you have another look at this after a revision I did to the description? I think I've addressed all the issues discussed above.

rvagg · 2020-09-18T01:41:35Z

content/systems/filecoin_files/piece/_index.md

+
+2. In order to make a _Filecoin Piece_, the IPLD DAG is serialised into a ["Content-Addressable aRchive" (.car)](https://github.com/ipld/specs/blob/master/block-layer/content-addressable-archives.md#summary) file, which is in raw bytes format. A CAR file is an opaque blob of data that packs together and transfers IPLD nodes. The _Payload CID_ is common between the CAR'ed and un-CAR'ed constructions. This helps later during data retrieval, when data is transferred between the storage  client and the storage provider as we discuss later.
+
+3. The resulting .car file is _padded_ with extra zero bits in order for the file to make a binary Merkle tree. To achieve a clean binary Merkle Tree the the .car file size has to be in some power of two (^2) size. Therefore, the padding process takes the input file and finds the size above the input size that makes for a power of two size. The gap between the input size and the next power of two size is padded with zeros. Finally, an extra padding process is applied, called `fr32 padding` which adds two (2) zero bits to every 254 bits (to make 256 bits in total).


It looks like this para has gone through some iterations and lost the initial ordering in an attempt to make it easier to explain - which I understand, but it's now not quite accurate. The final size needs to be a ^2 but the fr32 padding gets in there before it goes to ^2. So to calculate how much zero padding to add, you have to account for the 254/256 inflation that's going to occur.

I think in an earlier iteration it was suggested to switch the order of the steps and say that fr32 happens first and then inflation to ^2 after that. I think this is the ideal framing because it makes it easier to understand (even though in code it happens the other way around, it doesn't actually matter so it's worth doing what's best for descriptive purposes). So that would mean pulling the last sentence to be first, "A process called fr32 padding which adds ... is applied. Extra padding is applied to the end of the data, adding zero bytes until the resulting blob is a power of two sie.". Something like that.

Good catch @rvagg! I've tried to explain it both ways, but indeed it becomes tricky, you're right. I've updated in commit 49a025b and included only the easier to grasp process.

Forgot to mention that I've removed the definition of fr32 from here and included it in the Glossary (outstanding PR atm).

I'm just noticing this now. I'm commenting here without an entirely clear point, but so it doesn't get lost in the context shift:

Fr32 is well-defined in the rust-fil-proofs implementation. As far as I can tell from the glossary, it's not given a substantive definition there. It's use here as an adjective was never particularly well-defined (from what I saw), although some true statements were made justifying the adjective.

Without making a specific suggestion for how to untangle this, let me offer the definition underlying the type called Fr32 in rust-fil-proofs. Fr32 is a 32-bit representation of a field element (which, in our case, is now, is the arithmetic field of BLS12-381). To be well-formed, a value of type Fr32 must actually fit within that field, but this is not enforced by the type system. It is an invariant which must be perserved by correct usage.

In the case of so-called Fr32 padding, two zero bits are inserted 'after' a number requiring at most 254 bits to represent. This guarantees that the result will be Fr32, regardless of the value of the initial 254 bits. This is a 'conservative' technique, since for some initial values, only one bit of zero-padding would actually be required.

It's possible that none of this needs to be spelled out explicitly here. It's not clear that an 'official' definition of the term makes much sense without at least some of this context, though. I've put this comment here rather than there because it's in the context of this conversation that the added detail is obviously relevant.

I should note that Fr32 is not obviously a concept which belongs in the spec, even though it seems to have made its way into the preferred name for what I historically called 'bit padding'. I don't mind if it does stay in the spec as a glossary term. I mention this just to clarify that its primary definition is as an implementation type with relatively elaborate characteristics.

In particular, it's important to understand that Fr32-padded data represents a smaller set of values than inhabit the Fr32 type.

@porcuquine this is useful input. I have added your definition above as an extra "important note" in this section (see commit 972763d) and will also add this as a definition in the glossary too, if you don't mind.

I should note that Fr32 is not obviously a concept which belongs in the spec

I generally tend to agree. However, if we don't mention anything specific to Fr32, then we should just mention what is in one of the notes: "any implementation has to make sure that the initial IPLD DAG is serialised and padded so that it gives a clean binary tree, and therefore, calculating the Merkle root out of the resulting blob of data gives the same Piece CID", and leave it to that, without any mention of padding, .car etc.

I suggest leaving the text as is for now, given we mention that all this is Lotus-specific and the alternative would be slightly confusing to describe. Let me know if you're ok with this.

content/systems/filecoin_files/piece/_index.md

rvagg

sgtm, at least the bits I've commented on, some of the rest are out of my scope

porcuquine · 2020-09-21T18:11:18Z

content/systems/filecoin_files/piece/_index.md

- Steps 2 and 3 above are specific to the Lotus implementation. The same outcome can be achieved in different ways. However, any implementation has to make sure that the initial IPLD DAG is serialised and padded so that calculating the Merkle root out of the resulting blob of data gives the same **Piece CID**. As long as this is the case, implementations can deviate from the first three steps above.
+- `Fr32` is a 32-bit representation of a field element (which, in our case, is the arithmetic field of BLS12-381). To be well-formed, a value of type `Fr32` must _actually_ fit within that field, but this is not enforced by the type system. It is an invariant which must be perserved by correct usage.
+In the case of so-called `Fr32 padding`, two zero bits are inserted 'after' a number requiring at most 254 bits to represent. This guarantees that the result will be `Fr32`, regardless of the value of the initial 254 bits. This is a 'conservative' technique, since for some initial values, only one bit of zero-padding would actually be required.
+- Steps 2 and 3 above are specific to the Lotus implementation. The same outcome can be achieved in different ways, e.g., without using `Fr32` bit-padding. However, any implementation has to make sure that the initial IPLD DAG is serialised and padded so that it gives a clean binary tree, and therefore, calculating the Merkle root out of the resulting blob of data gives the same **Piece CID**. As long as this is the case, implementations can deviate from the first three steps above.


Two things:

I do think the padding method is prescribed. Otherwise there will not be a deterministic mapping from raw data to CommP. It's just that this padding method is not the only possible way to convert from raw data into a sequence of Fr32 elements.

I don't think anything about Fr32 is Lotus-specific: it's a rust-fil-proofs implementation detail.

I don't make these suggestions to complicate things. Probably incorporating them will lead to a simpler explanation. The current text is still not quite accurate.

@porcuquine can you please provide the right phrasing for this, so that we finalise?

yiannisbot requested a review from porcuquine July 25, 2020 08:02

rvagg reviewed Jul 27, 2020

View reviewed changes

content/systems/filecoin_files/piece/_index.md Outdated Show resolved Hide resolved

rvagg reviewed Jul 27, 2020

View reviewed changes

content/systems/filecoin_files/piece/_index.md Outdated Show resolved Hide resolved

porcuquine reviewed Jul 27, 2020

View reviewed changes

content/systems/filecoin_files/piece/_index.md Outdated Show resolved Hide resolved

porcuquine reviewed Jul 28, 2020

View reviewed changes

content/systems/filecoin_files/piece/_index.md Outdated Show resolved Hide resolved

yiannisbot changed the base branch from beta to master September 17, 2020 06:53

yiannisbot added 5 commits September 17, 2020 14:20

Piece description update

9d32675

padding explained

079b8c4

CommR description update

ba67553

CommC description update

c6cb3e9

new description of padding and fr32

13e33f3

yiannisbot force-pushed the pieces-fix branch from 94aa3a1 to 13e33f3 Compare September 17, 2020 11:23

piecestore update

51171eb

rvagg reviewed Sep 18, 2020

View reviewed changes

content/systems/filecoin_files/piece/_index.md Outdated Show resolved Hide resolved

yiannisbot added 2 commits September 18, 2020 15:47

padding process correction

49a025b

clarifies .car guidelines and Lotus-specific implementation points

7db9051

rvagg previously approved these changes Sep 21, 2020

View reviewed changes

yiannisbot added the scope: content Scope: Editing content of the spec label Sep 21, 2020

clarifying fr32 definition and use

972763d

yiannisbot dismissed rvagg’s stale review via 972763d September 21, 2020 10:57

yiannisbot added the hint: ready to merge Hint: PR is ready to be merged label Sep 21, 2020

Merge branch 'master' into pieces-fix

7877dd3

porcuquine reviewed Sep 21, 2020

View reviewed changes

hugomrdias approved these changes Sep 22, 2020

View reviewed changes

hugomrdias changed the title ~~Piece description update~~ fix(content): update filecoin piece content Sep 22, 2020

hugomrdias merged commit d4148dd into master Sep 22, 2020

hugomrdias deleted the pieces-fix branch September 22, 2020 09:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(content): update filecoin piece content #1016

fix(content): update filecoin piece content #1016

yiannisbot commented Jul 25, 2020

rvagg commented Aug 5, 2020

yiannisbot commented Sep 17, 2020

rvagg Sep 18, 2020

yiannisbot Sep 18, 2020

yiannisbot Sep 19, 2020

porcuquine Sep 21, 2020

porcuquine Sep 21, 2020

yiannisbot Sep 21, 2020

rvagg left a comment

porcuquine Sep 21, 2020

yiannisbot Sep 22, 2020


		2. In order to make a _Filecoin Piece_, the IPLD DAG is serialised into a ["Content-Addressable aRchive" (.car)](https://github.com/ipld/specs/blob/master/block-layer/content-addressable-archives.md#summary) file, which is in raw bytes format. A CAR file is an opaque blob of data that packs together and transfers IPLD nodes. The _Payload CID_ is common between the CAR'ed and un-CAR'ed constructions. This helps later during data retrieval, when data is transferred between the storage client and the storage provider as we discuss later.

		3. The resulting .car file is _padded_ with extra zero bits in order for the file to make a binary Merkle tree. To achieve a clean binary Merkle Tree the the .car file size has to be in some power of two (^2) size. Therefore, the padding process takes the input file and finds the size above the input size that makes for a power of two size. The gap between the input size and the next power of two size is padded with zeros. Finally, an extra padding process is applied, called `fr32 padding` which adds two (2) zero bits to every 254 bits (to make 256 bits in total).

fix(content): update filecoin piece content #1016

fix(content): update filecoin piece content #1016

Conversation

yiannisbot commented Jul 25, 2020

rvagg commented Aug 5, 2020

yiannisbot commented Sep 17, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rvagg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment