Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] Implementing Parity Volume Set Specification 2.0 #47

Open
Ironlenny opened this issue May 6, 2019 · 11 comments
Open

[Question] Implementing Parity Volume Set Specification 2.0 #47

Ironlenny opened this issue May 6, 2019 · 11 comments
Labels

Comments

@Ironlenny
Copy link

I'm writing an implementation of Parity Volume Set Specification 2.0. I'm a Rust novice, but it seems to me that your library as is won't work for me. If that's true, I'd like to extent it, but I'm not sure how to go about doing that. It seems based on this that I'd need to add inputs for a constant and an exponent. Am I right? Do you have any other suggestions?

@darrenldl
Copy link
Contributor

it seems to me that your library as is won't work for me.

PAR2 requires a GF16 (Galois field 2^8) engine while the published versions of this library (3.X.X) currently only provides GF8, so the published versions would not fit your need indeed.

There is some work put into implementing GF16 (Galois field 2^16) by @rphmeier, @drskalman and @burdges, which has been merged into the master branch. You can see issue #33 for more context and our discussion. The relevant PRs are #40 and #43.

I have been quite busy for past few months so I haven't managed to keep track of it and forgot where we were mostly, so I'll need to review the code to see if it's a complete implementation yet. Or perhaps they could answer that for you if they happen to be free.

It seems based on this that I'd need to add inputs for a constant and an exponent. Am I right?

Sorry I can't really answer that right now. I'm not familiar with the specifics of PAR2 and I don't have time to investigate as other project's deadline is closing in.

Do you have any other suggestions?

PAR2 is a very complex standard from what I know, and is a massive undertaking to implement. AFAIK it's not very commonly used as well nowadays, so rewriting such a massive, established and presumably well tested project in Rust requires a really good reason.

Mind if I ask what's the context and reason of implementing PAR2 in Rust?

@Ironlenny
Copy link
Author

Ironlenny commented May 7, 2019

PAR2 is actually a rather simple spec. It defines a series of packets. What the type and size of the contents of each packet are. And what the algorithms are to be used in the creation of said packets. The only packet that's giving me trouble is the recovery packet as it requires the Reed-Solomon algorithm. And I'm still learning that.

As for why I'm doing this. I want to learn more about Rust, and parity as I'm writing a file system in the future.

I have no trouble pulling from master if that's the only code that will work. I've already forked another library, and am using it in my project.

@darrenldl
Copy link
Contributor

PAR2 is actually a rather simple spec.

Whoops, you're right, I was way wrong. I guess I was thinking of another specification.

As for why I'm doing this. I want to learn more about Rust, and parity as I'm writing a file system in the future.

I see. There are some projects using this library should you want any very concrete examples, hbbft is one, but it's used for distribution type of deal rather than local data error recovery iirc. For local data error recovery, I have made an archiver which does that, but otherwise I'm not sure if there are more Rust examples.

I have no trouble pulling from master if that's the only code that will work. I've already forked another library, and am using it in my project.

Yep forking is a good idea - ideally master doesn't break anything, but as we resume development later I can't guarantee that.


I took a brief look at the specification, and can see roughly where it is going. Seems to have poor spatial locality during encoding and decoding, but is likely the only design that can accommodate this type of error recovery.

@rphmeier
Copy link
Contributor

rphmeier commented May 7, 2019

The GF(2^16) implementation @drskalman and I did is working, although unoptimized. We are using it for a fault-tolerant data replication service as part of a larger project.

@burdges
Copy link
Collaborator

burdges commented May 7, 2019

We're quite interested in optimizations to the GF(2^16) arithmetic. I'm not convinced the sum of logs optimization is being used quite right, although I've not hard the time to dig into it myself. I'm afraid that's kinda subtle topic though.

@Ironlenny
Copy link
Author

Ironlenny commented May 8, 2019

At the moment I'm just looking how to fit this to purpose. This is the formula outlined by the spec as I understand it: recovery = (inputA * constantA ^ exponent) + (inputB * constantB ^ exponent). Where recovery is the recovery shard, and inputA && B are input shards, constantA && B are 2^n where 'n' has an order of 65535, and exponent is the power of 2 assigned to the recovery shard. All operations are done with Galois Fields.

I don't know how to fit this in with your library. I don't really understand what this algo is doing.

@darrenldl
Copy link
Contributor

I don't really understand what this algo is doing.

Sorry did you mean the algorithm of this library or the algorithm of PAR2?

@Ironlenny
Copy link
Author

The algorithm for PAR2.

@darrenldl
Copy link
Contributor

darrenldl commented May 8, 2019

Right. I think I know where it is going very roughly after reviewing the specs as well as Plank's paper, but I don't have time to fully investigate. Understanding the algorithm being PAR2 is part of your project anyway.

I did have a look at par2cmdline's source code, and judging by how Reed Solomon coding is used there, this library may be usable with some adjustment, but not too certain. Specifically I don't know if the picked static parameters used at several places match up.

This library is compatible with BackBlaze's and also Klaus Post's implementation, but I have no idea if that means it is incompatible with PAR2 as a result. Actually hold on...they are using only Galois 8...right I still don't quite understand how the list of constants come into play in PAR2 specs, so I still don't know this library can be adjusted to suit the use for PAR2.

@darrenldl
Copy link
Contributor

There is some interesting talk on GF16 performance and PAR stuff over at Parchive/par2cmdline#130.

Most of the performance talk flew over my head cause I'm pretty clueless on very low-level performance stuff and also not well versed in the accelerated Galois paper, but reckon you guys probably have better luck extracting info from it.

@k3d3
Copy link

k3d3 commented Jun 23, 2022

Now that galois_16 is a thing, does that make it any easier to use this to implement PAR2?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants