Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Drum" word duping in generated SSKR shares #32

Closed
DmKoshelek opened this issue May 4, 2024 · 36 comments · Fixed by LedgerHQ/app-seed-tool#7
Closed

"Drum" word duping in generated SSKR shares #32

DmKoshelek opened this issue May 4, 2024 · 36 comments · Fixed by LedgerHQ/app-seed-tool#7
Assignees
Labels
bug Something isn't working

Comments

@DmKoshelek
Copy link

I tried to generate SSKR phrases with the app and they are looking weird. All of them except first one just contains repeated "Drum" word in positions from 11 to 42 (I believe other words just contains meta information and checksums). And for sure, I can't recover my bip39 phrase from this SSKR shares. My Seed phrace contains 24 words but to create this issue I generated a new one. I tried to use this shares to recover my seed phrase using seedtool-cli but for sure got error that shares are invalid

Steps to reproduce the behavior:

  1. Reset Ledger Nano S+ by next 24 words seed phrase (it's new generated one don't worry)
head bulb beauty miracle vessel kick toddler bamboo defense open hill fiscal feature illness fence reunion hub song glory occur cannon enjoy wasp common
  1. Install the Seed App
  2. Open Seed App
  3. Select "Check BIP39 phase"
  4. Select "24 words"
  5. Input the seed phrase and complete check
  6. Select number of shares - 6
  7. Select number of threshold - 3
  8. Get the list of shares
1. tuna acid epic hard data axis whiz able also able yurt poem iron note idle vast quiz when each zone unit curl twin quiz navy aunt dice whiz onyx skew yurt meow visa tuna calm fuel  whiz purr cola cola legs hang able able numb frog
2. tuna acid epic hard data axis whiz able also acid drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able oval flux
3. tuna acid epic hard data axis whiz able also also drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able vows calm
4. tuna acid epic hard data axis whiz able also apex drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able jazz jowl
5. tuna acid epic hard data axis whiz able also aqua drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able fuel draw
6. tuna acid epic hard data axis whiz able also arch drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able scar gyro
  1. Check seed phrace back using seedtool-cli
docker run --rm -it seedtool-cli -i sskr
tuna acid epic hard data axis whiz able also apex drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able jazz jowl
tuna acid epic hard data axis whiz able also aqua drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able fuel draw
tuna acid epic hard data axis whiz able also arch drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum drum able able scar gyro
^D
seedtool: Invalid Bytewords.

Device:

  • Device: Ledger Nano S+
  • Firmware version: 1.1.1
  • App Version v1.7.1
  • Free Space on device: 539 Kb
@DmKoshelek DmKoshelek changed the title "Drum" word duping in generated SSKR phrases "Drum" word duping in generated SSKR shares May 4, 2024
@aido
Copy link
Owner

aido commented May 4, 2024

Hi @DmKoshelek ,

I think there may be a few problems here. One of these problems may be related to this issue:
#23 (comment)

Note that the first 10 words of each SSKR share is a CBOR header plus metadata and this looks correct in all the shares you provided. The next 32 words is the share data and the last 4 words are a CRC checksum.

From previous experience with similar issues the physical device seems to be operating differently to the emulator used for testing (Speculos). Previously CRC32 functionality was shown to be faulty on the physical devices but that was fixed by Ledger.

I shall investigate further.

@aido
Copy link
Owner

aido commented May 4, 2024

I have tested v1.7.1 of the app installed from provider 4 on a physical Ledger Nano S. The meta data and share data for all shares seems to be generated correctly but the checksum is not.

The checksums in your test on a Nano S+ and also the checksums in my test on a Nano S exhibit the same behaviour as this comment:
#23 (comment).

The first 2 words of every checksum is always able able (i.e. 0x0000). This would suggest that issue #23 is not fixed.

One possibility is that the app on provider 4 was built with a version of the SDK where CRC32 functionality is still broken.

I will wait to get Ledger's or @lpascal-ledger opinion on this. A unit test for the CRC32 function in the SDK may help.

@aido aido added the bug Something isn't working label May 4, 2024
@DmKoshelek
Copy link
Author

I used the build from My Ledger provider 4 as it was suggested here #15 (reply in thread). I am not sure which SDK was used there to build the app

@aido aido moved this to In Progress in app-seed-tool Project May 5, 2024
@lpascal-ledger
Copy link

Hi folks,

Indeed the deployed version was buggy. It uses the correct SDK but the code uses the cx_crc32_hw function which redirects to the OS buggy implementation, rather than the fixed cx_crc32 SDK implementation.

I'll redeploy the app with the commits of @xchapron-ledger. @aido could you confirm this changelog / version suits your constraints?

@lpascal-ledger
Copy link

Fine by me!
Your code already contains the correct calls, we only deployed before they were merged.

@xchapron-ledger
Copy link

While at it, you should now be able to drop this line https://github.com/aido/app-seed-tool/blob/develop/src/ux_common/onboarding_seed_sskr.c#L6

@aido
Copy link
Owner

aido commented May 6, 2024

While at it, you should now be able to drop this line https://github.com/aido/app-seed-tool/blob/develop/src/ux_common/onboarding_seed_sskr.c#L6

Yes, there is a PR currently a work in progress that adds the Stax UI but also removes that line::
LedgerHQ#5

So, line will be gone in next version after v1.7.2.

@lpascal-ledger
Copy link

@aido I take from your approval that we don't wait for another PR, we can merge & deploy, is that correct?

@aido
Copy link
Owner

aido commented May 6, 2024

Yes.
No need to wait for another PR. Your v1.7.2 bump PR is good enough to get a working and decent version of the app deployed.

Any other minor updates I need to make can be rolled into the Stax PR that is currently a work in progress.

Thanks.

@lpascal-ledger
Copy link

lpascal-ledger commented May 6, 2024

v1.7.2 has been deployed on provider 4.

@aido aido moved this from In Progress to Release 1.72 in app-seed-tool Project May 6, 2024
@aido
Copy link
Owner

aido commented May 7, 2024

I have installed v1.7.2 from provider 4 and all looks good. Shares and CRC32 seem correct .

Device Nano S
OS 2.1.0
App Version 1.7.2
Status $${\color{green}working}$$

If @DmKoshelek confirms same for Nano S+ then I will close this ticket.

@DmKoshelek
Copy link
Author

Hi guys,

Today I tried to generate shares again but with my main seed phrase on Nano S+ and suddenly it doesn't work anyway. CRC32 part looks different compare to previous version, last 43-46 words are different for each phrase
2. ... when figs jolt sets
3. ... peck door user tent
4. ... news trip glow cusp
5. ... code arch ruby zero
6. ... door yell drop away

but data words (11-42) for 2-5 shares are still "drum" only

@aido
Copy link
Owner

aido commented May 7, 2024

Hi @DmKoshelek ,

Thanks for checking.

but data words (11-42) for 2-5 shares are still "drum" only

This is strange behaviour. Strange on two counts, Strange that the word drum is consistently repeating and also strange that the first share seems to be good and after that it just goes berserk.

The code for generating phrases on a Nano S is exactly the same as code for generating phrases on a Nano S+ except for one thing.
The Nano S does not have the Galois Field multiplication syscall cx_bn_gf2_n_mul() so it uses its own function for this. All other devices use the cx_bn_gf2_n_mul() syscall.

So

  • Functional tests (Speculos) use OpenSSL functions for Galois Field multiplication and all tests pass.
  • Nano S does not use the cx_bn_gf2_n_mul() sysall but instead uses a function within the app for Galois Field multiplication and all tests pass:
    #define GF2_8_MPI_BYTES 1
    #if defined(TARGET_NANOS) && !defined API_LEVEL
    /**
    * @brief Performs a multiplication over GF(2^n).
    *
    * @param[out] bn_r BN index for the result.
    *
    * @param[in] bn_a BN index of the first operand.
    *
    * @param[in] bn_b BN index of the second operand.
    *
    * @param[in] bn_n BN index of the modulus.
    * The modulus must be an irreducible polynomial over GF(2)
    * of degree n.
    *
    * @param[in] bn_h BN index of the second montgomery constant.
    *
    * @return Error code:
    * - CX_OK on success
    * - CX_NOT_LOCKED
    * - CX_INVALID_PARAMETER
    * - CX_MEMORY_FULL
    */
    cx_err_t cx_bn_gf2_n_mul(cx_bn_t bn_r,
    const cx_bn_t bn_a,
    const cx_bn_t bn_b,
    const cx_bn_t bn_n,
    const cx_bn_t bn_h __attribute__((unused))) {
    cx_err_t error = CX_OK;
    cx_bn_t bn_x, bn_y, bn_temp;
    int cmp_x, cmp_y;
    uint32_t degree = 0;
    size_t nbytes;
    bool bit_set = 0;
    // Preliminaries
    CX_CHECK(cx_bn_nbytes(bn_n, &nbytes));
    CX_CHECK(cx_bn_alloc(&bn_x, nbytes));
    CX_CHECK(cx_bn_alloc(&bn_y, nbytes));
    CX_CHECK(cx_bn_alloc(&bn_temp, nbytes));
    CX_CHECK(cx_bn_copy(bn_x, bn_a));
    CX_CHECK(cx_bn_copy(bn_y, bn_b));
    // Calculate the degree of the modulus polynomial
    CX_CHECK(cx_bn_copy(bn_temp, bn_n));
    do {
    CX_CHECK(cx_bn_cmp_u32(bn_temp, (uint32_t) 0, &cmp_x));
    CX_CHECK(cx_bn_shr(bn_temp, 1));
    } while (cmp_x != 0 && ++degree);
    // After loop degree is offset by 1
    degree--;
    if (degree < 1) {
    error = CX_INVALID_PARAMETER;
    goto end;
    }
    // Ensure both operands are in field
    CX_CHECK(cx_bn_shr(bn_x, degree));
    CX_CHECK(cx_bn_shr(bn_y, degree));
    // Maybe change cx_bn_cmp_u32 to cx_bn_cnt_bits
    CX_CHECK(cx_bn_cmp_u32(bn_x, (uint32_t) 0, &cmp_x));
    CX_CHECK(cx_bn_cmp_u32(bn_y, (uint32_t) 0, &cmp_y));
    if (cmp_x != 0 || cmp_y != 0) {
    error = CX_INVALID_PARAMETER;
    goto end;
    }
    // Check if both operands are non-zero
    CX_CHECK(cx_bn_copy(bn_x, bn_a));
    CX_CHECK(cx_bn_copy(bn_y, bn_b));
    // Maybe cx_bn_cmp_u32 change to cx_bn_cnt_bits
    CX_CHECK(cx_bn_cmp_u32(bn_x, (uint32_t) 0, &cmp_x));
    CX_CHECK(cx_bn_cmp_u32(bn_y, (uint32_t) 0, &cmp_y));
    CX_CHECK(cx_bn_set_u32(bn_r, (uint32_t) 0));
    // Main loop for multiplication
    while (cmp_x != 0 && cmp_y != 0) {
    CX_CHECK(cx_bn_tst_bit(bn_y, 0, &bit_set));
    if (bit_set) {
    CX_CHECK(cx_bn_copy(bn_temp, bn_r));
    CX_CHECK(cx_bn_xor(bn_r, bn_x, bn_temp));
    }
    CX_CHECK(cx_bn_shl(bn_x, 1));
    CX_CHECK(cx_bn_tst_bit(bn_x, degree, &bit_set));
    if (bit_set) {
    CX_CHECK(cx_bn_copy(bn_temp, bn_x));
    CX_CHECK(cx_bn_xor(bn_x, bn_n, bn_temp));
    }
    CX_CHECK(cx_bn_shr(bn_y, 1));
    // Maybe change cx_bn_cmp_u32 to cx_bn_cnt_bits
    CX_CHECK(cx_bn_cmp_u32(bn_x, (uint32_t) 0, &cmp_x));
    CX_CHECK(cx_bn_cmp_u32(bn_y, (uint32_t) 0, &cmp_y));
    }
    // Clean up
    CX_CHECK(cx_bn_destroy(&bn_x));
    CX_CHECK(cx_bn_destroy(&bn_y));
    CX_CHECK(cx_bn_destroy(&bn_temp));
    end:
    return error;
    }
    #endif
  • Unit tests use this same function as Nano S and all tests pass
  • An older version of the app used a set of functions from Blockchain Commons for Galois Field multiplication within the app and it worked for Nano S+ (see Check BIP39 generates invalid SSKR shares #23). Apart from dodgy CRC calculations the share data seems correct.
  • Current version uses cx_bn_gf2_n_mul() syscall on Nano S+ and does not seem to be working.

All this evidence would seem to suggest that the cx_bn_gf2_n_mul() syscall on Nano S+ may be flawed.
Another possibility is that the app is not using the syscall correctly.

Unfortunately I do not have a Nano S+ or Nano X device to perform my own hardware testing.

Oh dear!

@aido
Copy link
Owner

aido commented May 7, 2024

Further investigation reveals that the cx_bn_gf2_n_mul() syscall requires a parameter for the Second Montgomery constant which none of the other functions mentioned above use.

The app is using a value of 0xA1 for this constant. However sandra-Ledger has informed me on Ledger's Discord channel that 0x02 would be a more appropriate value. I think the Second Montgomery only makes the multiplication more efficient rather than changing the actual result. But we shall see.

I will try this recommended value and create a new PR for v1.7.3

@aido
Copy link
Owner

aido commented May 30, 2024

Hi @DmKoshelek ,

v1.7.3 has now been deployed and hopefully fixes this issue.

v1.7.3 changes the value of the Second Montgomery constant as suggested by @srasoamiaramanana-ledger. This constamt is used by the cx_bn_gf2_n_mul() syscall on Nano S+ and Nano X devices.

@aido
Copy link
Owner

aido commented May 30, 2024

I have installed v1.7.3 from provider 4 and all looks good on a physical Nano S.

Device Nano S
OS 2.1.0
App Version 1.7.3
Status $${\color{green}working}$$

Note

The Nano S does not use the cx_bn_gf2_n_mul() syscall so v1.7.3 will also need to be tested on a physical Nano S+ or Nano X to confirm that the change to the Montgomery constant fixed this issue.

@DmKoshelek
Copy link
Author

Hi @aido,

Thanks for the update! Today I checked the new version (1.7.3) and, unfortunately, it still doesn't work. First share is fine as previously but 2-6 shares now have repeated "toil" word instead of "drum" on data part of the shares (11-42 words)

Also just for fun I tried two generate 2 shares with 2 shares threshold and both shares was invalid with this configuration and had "able" word on data word positions

OS version: 1.1.1

@aido
Copy link
Owner

aido commented May 30, 2024

Damn! 😢

@DmKoshelek Thanks a lor for testing.

So, here's the situation:

  • All devices work when tested using Speculos. Speculos wraps OpenSSL functions for Galois Field multiplication: WORKS
  • Previous versions of the app used Blockchain Commons implementation of Galois Field multiplication: WORKS
  • Unit tests use an internal cx_bn_gf2_n_mul() function and all tests pass: WORKS
  • Nano S hardware uses the same internal cx_bn_gf2_n_mul() function, not the syscall: WORKS
  • Nano S+ hardware uses the cx_bn_gf2_n_mul() syscall and a Ledger recommended value for the Second Montgomery constant: FAILS

@lpascal-ledger or @srasoamiaramanana-ledger I am loath to say that the above evidence would suggest that the cx_bn_gf2_n_mul() syscall is broken.

I still find it strange that the first share on a Nano S+ seems fine but after that the rest are bad. I will investigate this and try figure out why some hardware devices are doing something different to Speculos. From a previous issue, when Speculos behaved differently to the hardware the syscall turned out to be the issue e.g. cx_crc32(). Just sayin'! 😃

I may create a new PR for a v1.7.4 that replaces the cx_bn_gf2_n_mul() syscall with an in app function on all devices. This should eliminate or confirm cx_bn_gf2_n_mul() as the cause of this issue. Only problem is that the internal function will probably not be as efficient as the syscall.

@srasoamiaramanana-ledger

Hi,
We have some tests for the cx_bn_gf2_n_mul() syscall and I verified that the result for these tests are the same on a physical device (LNS+) and on speculos. We test multiplication over GF(2^32), GF(2^64), GF(2^128). For your specific case, I added test vectors over GF(2^8) and I also tested that the results on a real device and on speculos are identical.
You can find here GF2NMUL.txt the test vectors with the parameters described as follows:
extension degree : modulus : second montgomery constant : e : f : r :
I've generated the parameters for the GF(2^8) multiplication with sagemath as follows:

R.<x> = GF(2)[]
F = GF(2).extension(modulus = x^8 + x^4 + x^3 + x + 1, name = 'a')
for e in F:
    f = F.random_element()
    r = e * f
    le = e.polynomial().coefficients(sparse=False)
    ne = ZZ(le, 2)
    lf = f.polynomial().coefficients(sparse=False)
    nf = ZZ(lf, 2)
    lr = r.polynomial().coefficients(sparse=False)
    nr = ZZ(lr, 2)
    ln = F.modulus().coefficients(sparse=False)
    nn = ZZ(ln, 2)
    h = F(x**256)
    lh = h.polynomial().coefficients(sparse=False)
    nh = ZZ(lh, 2)
    print(f"8:{nn}:{nh}:{ne}:{nf}:{nr}:")

Could you try to call cx_bn_gf2_n_mul() without overlapping the variable for the result and the variables for the operand ?

@aido
Copy link
Owner

aido commented May 31, 2024

@srasoamiaramanana-ledger,

Thanks for investigating.

I shall try what you suggested. I just don't have a physical Nano S+ to test on making this type of troubleshooting difficult.

I had thought that overlapping the result variables with the inputs may possibly cause a problem but the documentation did not discourage it so I didn't think it would be an issue. But we'll see.

I may create a v1.7.4.rc1

@DmKoshelek
Copy link
Author

I shall try what you suggested. I just don't have a physical Nano S+ to test on making this type of troubleshooting difficult.

@aido Maybe I can run some test builds on a physical device if it helps to speed up the bug fixing. Just tell me what to run and how =)

@aido
Copy link
Owner

aido commented May 31, 2024

Thanks @DmKoshelek,
I very much appreciate your assistance on this.

I have created a pull request on the Ledger fork which removes the cx_bn_gf2_n_mul() syscall from the equation altogether. Then by the process of elimination we will know for sure if this is the problem area. Hopefully that pull request can be released on provider 4 for easy installation and testing.

If by this process of elimination we find cx_bn_gf2_n_mul() to be the culprit we can hopefully home in on a solution by implementing the suggestion above:

Could you try to call cx_bn_gf2_n_mul() without overlapping the variable for the result and the variables for the operand ?

We'll get there ... eventually! 😃

Have a great weekend.

@aido aido self-assigned this May 31, 2024
@aido
Copy link
Owner

aido commented Jun 3, 2024

Maybe I can run some test builds on a physical device if it helps to speed up the bug fixing. Just tell me what to run and how

Hi @DmKoshelek,

v1.7.4-rc.1 is now available on provider 4.
I am praying that this version works on Nano S+ and Nano X.

@DmKoshelek
Copy link
Author

Hi @aido

Great news for you, looks like this update is working! I was able to generate 6 shares and recover back my BIP39 phrase using 3 of them. Tomorrow I will try to use 4-6 shares to recover BIP39 seed but I believe it will also work

My congratulation!

@aido
Copy link
Owner

aido commented Jun 3, 2024

That is great news indeed. 🥳

This proves without doubt that the problem area is related to how the app is using the cx_bn_gf2_n_mul() syscall.

I shall now create a v1.7.4-rc.2 that ensures that in all calls to cx_bn_gf2_n_mul() the result parameter does not overlap with the operands. This rule will need to be documented in the SDK also but lets get a working v1.7.4-rc.2 first.

This hopefully will be the final pre-release of a v1.7.4 that should work on all devices without issue! 🤞

Thanks again @DmKoshelek for finding this bug and then helping troubleshoot it.

@aido
Copy link
Owner

aido commented Jun 11, 2024

Hi @DmKoshelek,

v1.7.4-rc.2 is now available to install from provider 4.
I have removed all overlapping results and operands in the cx_bn_gf2_n_mul() syscall and I am confident that this version now works on all devices.
If this version passes the test it will be the last pre-release before going to v1.7.4 final.

@DmKoshelek
Copy link
Author

Hi @aido, I see only v1.7.4-rc.1 version on provider 4 right now =( So I couldn't test it

@aido
Copy link
Owner

aido commented Jun 11, 2024

hmm, strange.
v1.7.4-rc.2 was deployed on provider 4 earlier today and I was able to install it on a Nano S.
Let me double check Ledger Live.

Update
Yep, v1.7.4-rc.2 from provider 4 was installed on my Nano S several hours ago.

6 hours ago @lpascal-ledger stated:

The deployment is in progress, but it could take a few hours depending on CI parameters (tooling update, runners availability, ...) and/or because we generally wait for a minimal batch of apps to sign & release them fully.

@DmKoshelek, I would imagine that if it was available for installation on Nano S it would also be available on the other devices at same time or very shortly thereafter.

@lpascal-ledger
Copy link

lpascal-ledger commented Jun 12, 2024

@DmKoshelek @aido It's probably because I deployed on the newest NanoS+ 1.2.0 OS version which is not fully distributed yet.
I'll deploy it on 1.1.1 today.

@lpascal-ledger
Copy link

Should be good now.

aido added a commit that referenced this issue Jun 14, 2024
@aido
Copy link
Owner

aido commented Jun 16, 2024

Hi @aido, I see only v1.7.4-rc.1 version on provider 4 right now =( So I couldn't test it

Hi @DmKoshelek,

Any luck installing v1.7.4-rc.2 on a Nano S+ with OS version 1.1.1?

@DmKoshelek
Copy link
Author

Hi @aido

Sorry for the delay, I totally forgot to test the new version.

I have just download it and it works fine as well, I was able to create shares and recover my seed phrase on my device!

@DmKoshelek
Copy link
Author

@aido

also I found one new issue during testing, I created a separate topic for it

@aido
Copy link
Owner

aido commented Jun 16, 2024

Thanks @DmKoshelek

While I investigate the new issue can you can confirm if this issue is fixed for threshold values other than 1?

@DmKoshelek
Copy link
Author

@aido, yes, the issue is fixed, thanks again for fixing!

@aido
Copy link
Owner

aido commented Jun 16, 2024

That s excellent. I am quite happy that I can close this issue.

I have also made a small bit of progress with the other issue. It does in fact seem to be happening when testing with Speculos. That will make testing and troubleshooting much much easier and quicker. The problem is most likely to be some edge case that needs to be caught. But we'll see.

@github-project-automation github-project-automation bot moved this from Release 1.7.3 to Done in app-seed-tool Project Jun 16, 2024
@aido aido moved this from Done to Release 1.7.4 in app-seed-tool Project Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Release 1.7.4
Development

Successfully merging a pull request may close this issue.

5 participants