Argon2: Reconsider MKPC, OMP_SCALE, benchmark hashes #5567
Comments
I was actually going to complain about just that (not realizing I was the one who put the figures there). My 8-core laptop and a 4-core workstation I happen to have nearby show just as good speeds with them at 1-1-1, and that's obviously better for heavier costs. As for test vectors, FWIW KeePassXC defaults to Argon2d, 64 MiB, p=2 and then aims for a t that takes about a second (I think), which can be about 10 to 35 on vanilla hardware. During my work with keepass formats I created a test vector matching our current ones for Argon2 in order to see where I'm at (on GPU I only get 10% of Alain's speed). I think I will do the reverse as well: add a keepass-matching test vector to the argon2 formats that can be benchmarked with
Well, I've just tested in a VM on my laptop under other load, and 1-1-1 is usually significantly slower at max threads. But when I limit threads to core count minus 1, it's fine. Indeed, having more work to do per
Edit: I ran these tests on code without the pending memory allocation regression fix, so the (de)allocation was still in the loop. I don't know whether fixing that would affect this.
I wonder/worry how much it would slow down self-test. Even if not included in benchmarks by default, it would be in tests. Also, we could need the equivalent of my 1d7397a that we currently only have in the Armory format. At large thread counts, having e.g. 256x 64 MiB still allocated during cracking when actual loaded hashes are e.g. 16 MiB can be quite wasteful. (It was even worse in the Armory format because it uses SIMD to handle multiple "hashes" per thread. So it was easily tens of gigabytes during self-test on a huge AWS instance.)
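To put numbers on the 256x 64 MiB example: that is 16 GiB held during cracking, against 4 GiB if the regions were sized for 16 MiB hashes. The sketch below only does that arithmetic. The function names are hypothetical, and it assumes one memory region per thread (no SIMD multiplier as in the Armory case).

```c
/* Hypothetical helper: peak memory in MiB if every thread keeps one region
 * of region_mib allocated. Assumes one region per thread. */
unsigned long long peak_alloc_mib(unsigned int threads, unsigned int region_mib)
{
	return (unsigned long long)threads * region_mib;
}

/* Hypothetical helper: MiB left allocated but unused when regions were sized
 * for a test vector that is larger than what the loaded hashes need. */
unsigned long long wasted_mib(unsigned int threads,
    unsigned int test_vector_mib, unsigned int loaded_mib)
{
	if (test_vector_mib <= loaded_mib)
		return 0;
	return peak_alloc_mib(threads, test_vector_mib) -
	    peak_alloc_mib(threads, loaded_mib);
}
```

With 256 threads, a 64 MiB test vector and 16 MiB loaded hashes, `wasted_mib` reports 12288 MiB, i.e. 12 GiB that a shrink-after-self-test step could release.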
We should add a flag field to format_tests. In this case flagging "test vectors that are to be ignored for normal self-test".
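A minimal sketch of what such a flag could look like. Everything here is an assumption for illustration: `struct test_vector` is a simplified stand-in rather than the project's real test vector struct, and `TEST_FLAG_SKIP_SELF_TEST` is a made-up name.

```c
#include <stddef.h>

/* Hypothetical flag: vector is kept for benchmarking but ignored by the
 * normal self-test. */
#define TEST_FLAG_SKIP_SELF_TEST 0x1u

/* Simplified stand-in for a test vector entry (not the real struct). */
struct test_vector {
	const char *ciphertext;
	const char *plaintext;
	unsigned int flags;
};

/* Count the vectors a normal self-test would run.
 * The list is terminated by an entry with a NULL ciphertext. */
size_t count_self_test_vectors(const struct test_vector *tests)
{
	size_t n = 0;

	for (; tests->ciphertext; tests++)
		if (!(tests->flags & TEST_FLAG_SKIP_SELF_TEST))
			n++;
	return n;
}
```

A heavy KeePass-style vector (64 MiB) could then carry the flag so that it is available to benchmarks without slowing down or bloating the self-test.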
I was wondering why on a system under other load, where I get a speedup at
	// Find most expensive salt, for auto-tune
	{
		struct db_salt *s = db->salts;
		tune_cost = MIN(db->max_cost[0], options.loader.max_cost[0]);
		while (s->next && s->cost[0] < tune_cost)
			s = s->next;
		salt = s->salt;
	}
Why is that? Perhaps so that we don't exceed the maximum
We could want to reconsider and instead auto-tune using one of the first two test vectors that we use for benchmarking.
Suggested auto-tune patch:
+++ b/src/omp_autotune.c
@@ -126,7 +126,7 @@ void omp_autotune_run(struct db_main *db)
 {
 	struct db_salt *s = db->salts;
-	tune_cost = MIN(db->max_cost[0], options.loader.max_cost[0]);
+	tune_cost = MIN(db->real ? db->max_cost[0] : s->cost[0], options.loader.max_cost[0]);
 	while (s->next && s->cost[0] < tune_cost)
 		s = s->next;
which I think makes us auto-tune to the first test vector when benchmarking, but to the hash with the highest first cost when cracking.
No, this doesn't always work right: it seemed to work for Argon2, but picked a 32 MiB test vector for Armory, whereas its first one is 8 MiB. I think the problem is that the salts are not in the same order as the test vectors. Looks like we'll need an extra loop to find the salt corresponding to the first test vector.
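The extra loop could look roughly like the sketch below. It is not the actual patch: `struct salt_node` is a simplified stand-in for the salt list, and it assumes the binary salt of the first test vector has already been computed so that it can be compared byte for byte.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a salt list entry (not the real struct db_salt). */
struct salt_node {
	const void *salt;        /* binary salt, salt_size bytes */
	unsigned int cost0;      /* first tunable cost, e.g. memory in MiB */
	struct salt_node *next;
};

/* Walk the list and return the entry whose salt equals the salt derived
 * from the first test vector. Falls back to the list head if no entry
 * matches, which mirrors the behavior of just taking the first salt. */
const struct salt_node *find_first_vector_salt(const struct salt_node *salts,
    const void *wanted, size_t salt_size)
{
	const struct salt_node *s;

	for (s = salts; s; s = s->next)
		if (!memcmp(s->salt, wanted, salt_size))
			return s;
	return salts;
}
```

Matching on salt contents rather than on list position sidesteps the ordering problem described above, at the cost of one linear scan before auto-tune.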
We've reset MKPC and OMP_SCALE to 1. As to benchmark hashes, there doesn't appear to be one common set of Argon2 parameters in use for password hashing. 4 MiB is certainly too low, but at least it's what we've been using so far. It would be relevant to see what NetBSD uses. But per these blog posts, they may have gone with something ridiculous for that use case (p > 1): https://blog.netbsd.org/tnf/entry/gsoc_2019_report_incorporating_the
Testing on https://3v4l.org I see that recent PHP uses:
<?php
echo PASSWORD_ARGON2_DEFAULT_MEMORY_COST . "\n";
echo PASSWORD_ARGON2_DEFAULT_TIME_COST . "\n";
echo PASSWORD_ARGON2_DEFAULT_THREADS . "\n";
echo 'Argon2i hash: ' . password_hash('password', PASSWORD_ARGON2I) . "\n";
echo 'Argon2i hash: ' . password_hash('php', PASSWORD_ARGON2I) . "\n";
echo 'Argon2id hash: ' . password_hash('password', PASSWORD_ARGON2ID) . "\n";
echo 'Argon2id hash: ' . password_hash('php', PASSWORD_ARGON2ID) . "\n";
(but many of their builds of PHP don't seem to support Argon2 at all; perhaps it's a compile-time option)
PHP doesn't appear to treat either Argon2i or Argon2id as the default of the two. Since Argon2id is newer, I guess that if they ever want to make Argon2 the default, Argon2id is the more likely choice. So we may want to standardize on it for benchmarks.
BTW isn't the CPU Argon2 code capable of threading (for p > 1) by itself? Would it ever be beneficial to use that instead of (or together with, depending on
I was thinking of this too, and we could want to have a separate issue for that. Some notes and thoughts:
I notice that in 6b9f850 we got:
These values puzzle me. For something as slow as Argon2 normally is, I'd expect us to be able to set all of these to 1 without performance loss. Like we have them in the scrypt format.
We should re-test this with different values on different systems.
Also, our default benchmark hashes for the Argon2 formats are rather unusual - they use 4 MiB. In scrypt, we use 16 MiB. We could want to upgrade these to 16 MiB as well, which would contribute to lower MKPC and OMP_SCALE being optimal.
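As a rough illustration of why these values matter for a slow, memory-hard hash, the sketch below assumes the common pattern in OpenMP-enabled formats where the number of candidates buffered per crypt_all() call scales as MKPC x threads x OMP_SCALE. That formula is a simplification (it ignores auto-tune), and the function names are hypothetical.

```c
/* Assumed simplification: candidates buffered per crypt_all() call. */
unsigned int keys_per_crypt(unsigned int mkpc, unsigned int threads,
    unsigned int omp_scale)
{
	return mkpc * threads * omp_scale;
}

/* Under the same assumption: hashes each thread computes per call. */
unsigned int hashes_per_thread(unsigned int mkpc, unsigned int omp_scale)
{
	return mkpc * omp_scale;
}
```

With all values at 1 on 8 threads, each call computes 8 hashes, one per thread. When a single hash already takes milliseconds or more, per-call overhead is negligible and larger batches mostly add latency, which is the reasoning behind expecting 1 to be fine here.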