-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize dettman algorithm #1601
Conversation
89d85cc
to
2296a3d
Compare
FWIW, I am unsure if the current files in fiat-amd64/fiat_secp256k1_dettman_mul (and square respectively) would still be proven correct. |
Ah! I forgot about updating those. Indeed, they failed. I'll replace them with the CryptOpt files that you linked. |
9e0e022
to
c4e7c20
Compare
I'm trying to add this enhancement, via a new and/or generalized rewriting rule, before merging this PR. |
c4e7c20
to
3c99c7f
Compare
…tion should consist of just writing "register_width = 64" in Arith/DettmanMultiplcation. something broke.
…s it originally was.
function did rather unreasonable things when the limbwidth was fractional. In this commit, I tried to fix this issue, but I ended up making the reduce_carry_borrow function pretty obnoxious. Also, fixing this issue requires somewhat strange (and strong) hypotheses on the weight function. We'd want, for instance, (weight (2 * limbs - 2) / weight (limbs - 2)) mod s = 0. I plan to revert my changes (from this commit) to the reduce_carry_borrow function, so that it will continue to do unreasonable things given fractional limbwidths. This was not worth it.
…tation. wrote some (messy) arithmetic proofs. still more proofs to go in Arithmetic/DettmanMultiplication.v. Then need to change PushButtonSynthesis appropriately.
…s to Arithmetic/DettmanMultiplication.v
Replaced a bunch of references to (Z.log2_up s) with (Z.log2 s), since (in the dettman_multiplication_mod_ops module) we now stipulate that s is a power of 2.
…iminated the unnecessary 64-bit mask, but it also caused weird stuff to happen and killed off a bunch of 52-bit masks. Where'd they go? idk.
3c99c7f
to
a70781b
Compare
It now comes later in the rewriting process. Apparently it's important that it comes after the fancy stuff. For reasons that I don't understand, proofs in Barrett256 and Montgomery256 break if I put it before.
The dettman code generation actually works now (with the redundant Code generation for @andres-erbsen, @JasonGross - do you have any idea what would cause this? Might it be the same issue as in #1604? |
Oops, never mind my previous comment. I've inferred (correctly, I think?) from #1604 that my problem has something to do with dead code elimination, so I've removed the dead code elimination thing that I added in |
In this commit, I just removed the bit about dead code elimination that I added in BoundsPipeline.v. The C code synthesizes without error now, at least. I haven't built everything yet, so we'll see if things actually work nicely.
by removing the dead-code-elimination thing]
Looks like the redundant ands, which are removed by the new rewrite pass that I added, actually appeared in a lot of places other than the dettman code. Adding the new rewrite pass caused the total length of generated code to decrease by about 27000 lines. I was not expecting this, but I think it's a good thing. |
This cleanup is indeed more impressive than anticipated :D. Great job! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than needing to update one more 3 into 4 for accurate error messages, this looks good to me!
This implements the optimization described in #1582.
This should resolve almost all of #1582; the only thing missing is the aesthetic improvement mentioned here about removing an unnecessary & operation, which I don't know how to fix.
Incidentally, I've realized that the dettman template, in its state before this PR, did unreasonable things when given a fractional limbwidth. I've rewritten it so that it will hopefully---I haven't tested this or anything---now behave reasonably given a fractional limbwidth. This happened to require the same sort of change that I was making to implement the optimization. This change should make the algorithm a bit easier to read, too.