TOTP hotfix: reduce memory usage #385

matheusmoreira · 2024-03-17T23:47:45Z

The TOTP face is working in the simulator but fails on the real hardware when loaded with lots of codes, just like the LFS version. This is likely caused by the recent refactoring of the TOTP face which introduced a declarative credential interface for ease of use. That's accomplished by decoding the secrets at runtime which increases the RAM requirements. Users are likely hitting memory limits.

In order to mitigate this, the algorithm is changed from decoding all of the secrets only once during initialization to on the fly decoding of the secret for the current TOTP credential only. This converts this face's dynamic memory usage from O(N) to O(1) at the cost of memory management when switching faces and credentials which could impact power consumption.

Due to variable key sizes, the memory cannot be statically allocated. Perhaps there's a maximum key size that can serve as worst case?

Also took this opportunity to restructure the code a bit. Also added code to check for memory allocation failure.

The TOTP face is working in the simulator but fails on the real hardware when loaded with lots of codes, just like the LFS version. This is likely caused by the recent refactoring of the TOTP face which introduced a declarative credential interface for ease of use. That's accomplished by decoding the secrets at runtime which increases the RAM requirements. Users are likely hitting memory limits. In order to mitigate this, the algorithm is changed from decoding all of the secrets only once during initialization to on the fly decoding of the secret for the current TOTP credential only. This converts this face's dynamic memory usage from O(N) to O(1) at the cost of memory management when switching faces and credentials which could impact power consumption. Issue is confirmed fixed by author of issue who has tested it on real hardware. Fixes joeycastillo#384. Due to variable key sizes, the memory cannot be statically allocated. Perhaps there's a maximum key size that can serve as worst case? Also took this opportunity to restructure the code a bit. Also added code to check for memory allocation failure. Reported-by: madhogs <[email protected]> Fixed-by: Matheus Afonso Martins Moreira <[email protected]> Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Issue: joeycastillo#384

maxz · 2024-03-18T01:43:37Z

Due to variable key sizes, the memory cannot be statically allocated. Perhaps there's a maximum key size that can serve as worst case?

There is not. I looked at that before. There is no limit in the standard. The biggest in-the-wild key I could find was 128 bytes.

I'll repeat my comment from #384 here:
Unless you find another solution to the possible repeated runtime allocation (on a quick glance it at least does not seem to call malloc for the same entry twice but to use memset) and decoding I guess I would prefer for the solution in the main branch to be the one from my PR #356 which has to deal with the explicit key array but has very little runtime overhead. Frankly I'd prefer a more complicated setup over increased battery usage.
I will at least maintain my own patch for my watch if this behaviour remains.

@joeycastillo also spoke out against runtime heap allocation within faces in the past (But I can't find the PR right now, it was some PR by @theAlexes so they might remember which one it was.)
In that particular case I think he accepted the allocation in the face's initialisier.

theAlexes · 2024-03-18T02:18:56Z

yeah, given how resource-constrained the RAM on this chip is, i think we should revert to the implementation with a static key array.

madhogs · 2024-03-18T08:00:40Z

I've tested this now on the sensor watch lite and it fixes the issue completely. I have checked the values too with my real codes and they are correct. To check the extreme I also tested a large number (100+) of 128 character secrets and that also works without any issues on the real watch.

To give my 2 cents on the above conversation:
I think this new method of configuring the secrets is a great improvement on having to manually enter the byte array, it is also simpler and less error prone for new users to configure their totp codes for the watch. As the physical watch can handle this code ram wise even for a large number of very large secrets I don't personally see why that would be a reason to go back to the old implementation.
In terms of power consumption, i don't know how much more this would use but I also would be surprised if it had any effect in the real world as users are unlikely to spend any significant amount of time cycling through the codes day to day (after setting up the watch of course).

All in all, unless there is something I fail to grasp I personally would prefer to keep this new implementation of managing credentials. If it is decided to go back to entering the raw bytes, I hope there is a way to keep an option for the users of adding the secrets directly, maybe through making similar changes to the lfs face? or having this as an optional configuration or a new totp face.

matheusmoreira · 2024-03-18T11:39:33Z

There is not. I looked at that before. There is no limit in the standard. The biggest in-the-wild key I could find was 128 bytes.

That's unfortunate. We could set up a maximum size that's overridable by preprocessor constant. That way we can have 128 bytes as a worst case and the user can override it if it turns out to not be enough, or if they know they won't need all those bytes.

This would completely eliminate the dynamic memory allocation at the cost of wasting something like ~90 bytes in the common case and maybe forcing the user to decide the maximum size of the variable, something that's likely to generate support requests in the discord.

Honestly this is something the compiler should be figuring out on its own. The C prepocessor is too limited...

on a quick glance it at least does not seem to call malloc for the same entry twice but to use memset

It uses malloc and memset once during watch initialization to allocate and zero the TOTP state structure. This memory is not freed and has essentially unlimited lifetime.

Then it uses malloc and free in the following patterns:

Face enters the foreground
- malloc(current_decoded_key)
Face leaves the foreground
- free(current_decoded_key)
User switches TOTP code
- free(current_decoded_key)
- current++
- malloc(current_decoded_key)

decoding I guess I would prefer for the solution in the main branch to be the one from my PR #356 which has to deal with the explicit key array but has very little runtime overhead.

Me too. The very next thing I'll do is work on getting the script you made merged and integrated into the build system. It's an excellent solution which has the added advantage of not requiring the user to edit the source code.

Frankly I'd prefer a more complicated setup over increased battery usage.

Me too... But we are programmers. Users might not be comfortable with munging bytes and editing them into source code.

The watch face's documentation also suggests using a web service to decode the bytes which could leak the secret. I really wanted to avoid the need for that.

I will at least maintain my own patch for my watch if this behaviour remains.

We should be able to resolve it soon. The static allocation approach is viable, and your Python script is the best solution.

@joeycastillo also spoke out against runtime heap allocation within faces
In that particular case I think he accepted the allocation in the face's initialisier.

Yes, in the discord:

fwiw I think it’s a best practice to avoid dynamic memory allocation outside of the watch face setup function. As it stands now (at least in the case where the watch is on-wrist and not plugged into USB) memory is only allocated once, at boot, and that doesn’t change until the battery dies. this feels good for the stability of the thing when it has to run for, y’know, two years

I agree with him and I will work to eliminate these dynamic allocations.

matheusmoreira · 2024-03-18T11:42:32Z

@theAlexes

yeah, given how resource-constrained the RAM on this chip is, i think we should revert to the implementation with a static key array.

I'm confident I can remove the need for those allocations by allocating a static 128 byte array and making it user overridable. I'll push a new commit soon.

matheusmoreira · 2024-03-18T14:02:25Z

Dynamic memory allocation has been removed. Now there's a single 128 byte buffer that's allocated on face setup. It's also compile time overridable by the user. I refrained from using a static buffer because movement supports multiple instances of the same face and the static approach fails to be reeentrant.

The key lengths are checked on initialization and if any key is found to be too large to fit the buffer it is turned off and ERROR is displayed on the watch instead of the code. The base32 encoded secrets are decoded dynamically to the buffer at the following times:

Face enters the foreground
User switches TOTP code

Therefore, there is still some extra runtime overhead that can still be eliminated via the use of generator scripts. In the future, I will make it so there is no overhead at all.

@madhogs I would really appreciate it if you could test this new version on your watch too! I'll make sure you are credited in the commit message just like before!

Allocate an unlimited extent 128 byte buffer once during setup instead of allocating and deallocating repeatedly. A static buffer was not used because it fails to be reentrant and prevents multiple instances of the watch face to be compiled by the user. The advantage is the complete prevention of memory management errors, improving the reliability of the watch. It also eliminates the overhead of the memory allocator itself since malloc is not free. The disadvantage is a worst case default size of 128 bytes was required, meaning about 90 bytes will be wasted in the common case since most keys are not that big. This can be overridden by the user via preprocessor. The key lengths are checked on TOTP watch face initialization and if any key is found to be too large to fit the buffer it is turned off and the label and ERROR is displayed instead. The base32 encoded secrets are decoded dynamically to the buffer at the following times: - Face enters the foreground - User switches TOTP code Therefore, there is still some extra runtime overhead that can still be eliminated by code generation. This will be addressed in future commits. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: joeycastillo#385

madhogs · 2024-03-18T15:58:20Z

@matheusmoreira I've checked the latest code on my watch, no issues, all is working. Checked values of real codes again and the 100+ large secrets, both are fine. 👍

The watch face's documentation also suggests using a web service to decode the bytes which could leak the secret. I really wanted to avoid the need for that.

I was really surprised by this too and think its a very bad thing to be suggesting users to do!

matheusmoreira · 2024-03-18T16:14:16Z

Great. Everything seems in order for the hotfix to be merged then.

Forgot to call watch_display_string on the error message. Of course the message isn't going to be displayed. Also, increase the buffer size to 10 characters and output a space to the last position. This ensures the segments are cleared. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: joeycastillo#385

Fixes a division by zero bug caused by calling getCodeFromTimestamp without having initialized the TOTP library with a secret first. This was happening because the face calls totp_display on activation, meaning the validity of the secret was not checked since this is done in the generate function. Now the validity of the key is determined solely by the size of the current decoded key. A general display function checks it and decides whether to display the code or just the error message. The size of the current decoded key is initialized to zero on watch face activation, ensuring fail safe operation. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: joeycastillo#385

matheusmoreira · 2024-03-20T14:47:40Z

@madhogs Are you available for some additional hardware testing?

I just pushed some additional bug fixes. There were some bugs in the error handling. Now the face should fail gracefully with an error message even if all keys are > 128 bytes. Can you please test that failure case?

madhogs · 2024-03-20T15:23:48Z

Just did the same checks as before on the physical watch and all working still 👍
Also added several very long keys (>> 128 characters) and it displays the word error as i assume is intended.

matheusmoreira · 2024-03-20T15:51:20Z

Thank you!!

@theAlexes The pull request is ready for your review!

Forgot to call watch_display_string on the error message. Of course the message isn't going to be displayed. Also, increase the buffer size to 10 characters and output a space to the last position. This ensures the segments are cleared. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: joeycastillo#385

Fixes a division by zero bug caused by calling getCodeFromTimestamp without having initialized the TOTP library with a secret first. This was happening because the face calls totp_display on activation, meaning the validity of the secret was not checked since this is done in the generate function. Now the validity of the key is determined solely by the size of the current decoded key. A general display function checks it and decides whether to display the code or just the error message. The size of the current decoded key is initialized to zero on watch face activation, ensuring fail safe operation. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: joeycastillo#385

madhogs · 2024-03-24T10:12:29Z

Just to update i've been using this on my physical watch and using the codes for almost a week now and no issues 😄

Allocate an unlimited extent 128 byte buffer once during setup instead of allocating and deallocating repeatedly. A static buffer was not used because it fails to be reentrant and prevents multiple instances of the watch face to be compiled by the user. The advantage is the complete prevention of memory management errors, improving the reliability of the watch. It also eliminates the overhead of the memory allocator itself since malloc is not free. The disadvantage is a worst case default size of 128 bytes was required, meaning about 90 bytes will be wasted in the common case since most keys are not that big. This can be overridden by the user via preprocessor. The key lengths are checked on TOTP watch face initialization and if any key is found to be too large to fit the buffer it is turned off and the label and ERROR is displayed instead. The base32 encoded secrets are decoded dynamically to the buffer at the following times: - Face enters the foreground - User switches TOTP code Therefore, there is still some extra runtime overhead that can still be eliminated by code generation. This will be addressed in future commits. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: joeycastillo#385

Forgot to call watch_display_string on the error message. Of course the message isn't going to be displayed. Also, increase the buffer size to 10 characters and output a space to the last position. This ensures the segments are cleared. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: joeycastillo#385

Fixes a division by zero bug caused by calling getCodeFromTimestamp without having initialized the TOTP library with a secret first. This was happening because the face calls totp_display on activation, meaning the validity of the secret was not checked since this is done in the generate function. Now the validity of the key is determined solely by the size of the current decoded key. A general display function checks it and decides whether to display the code or just the error message. The size of the current decoded key is initialized to zero on watch face activation, ensuring fail safe operation. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: joeycastillo#385

theAlexes · 2024-03-31T00:38:51Z

Alright, going to merge this in :)

Allocate an unlimited extent 128 byte buffer once during setup instead of allocating and deallocating repeatedly. A static buffer was not used because it fails to be reentrant and prevents multiple instances of the watch face to be compiled by the user. The advantage is the complete prevention of memory management errors, improving the reliability of the watch. It also eliminates the overhead of the memory allocator itself since malloc is not free. The disadvantage is a worst case default size of 128 bytes was required, meaning about 90 bytes will be wasted in the common case since most keys are not that big. This can be overridden by the user via preprocessor. The key lengths are checked on TOTP watch face initialization and if any key is found to be too large to fit the buffer it is turned off and the label and ERROR is displayed instead. The base32 encoded secrets are decoded dynamically to the buffer at the following times: - Face enters the foreground - User switches TOTP code Therefore, there is still some extra runtime overhead that can still be eliminated by code generation. This will be addressed in future commits. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: #385

Forgot to call watch_display_string on the error message. Of course the message isn't going to be displayed. Also, increase the buffer size to 10 characters and output a space to the last position. This ensures the segments are cleared. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: #385

Fixes a division by zero bug caused by calling getCodeFromTimestamp without having initialized the TOTP library with a secret first. This was happening because the face calls totp_display on activation, meaning the validity of the secret was not checked since this is done in the generate function. Now the validity of the key is determined solely by the size of the current decoded key. A general display function checks it and decides whether to display the code or just the error message. The size of the current decoded key is initialized to zero on watch face activation, ensuring fail safe operation. Tested-by: Matheus Afonso Martins Moreira <[email protected]> Tested-on-hardware-by: madhogs <[email protected]> Signed-off-by: Matheus Afonso Martins Moreira <[email protected]> GitHub-Pull-Request: #385

matheusmoreira force-pushed the totp-hot-patch branch from b52f105 to df38c26 Compare March 17, 2024 23:50

matheusmoreira force-pushed the totp-hot-patch branch from 4749f92 to 3f850d7 Compare March 18, 2024 14:45

matheusmoreira added 2 commits March 20, 2024 12:54

matheusmoreira force-pushed the totp-hot-patch branch from 85278b9 to 10701f3 Compare March 20, 2024 15:54

theAlexes merged commit 1e29fe8 into joeycastillo:main Mar 31, 2024
2 checks passed

matheusmoreira deleted the totp-hot-patch branch March 31, 2024 22:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TOTP hotfix: reduce memory usage #385

TOTP hotfix: reduce memory usage #385

matheusmoreira commented Mar 17, 2024

maxz commented Mar 18, 2024

theAlexes commented Mar 18, 2024

madhogs commented Mar 18, 2024

matheusmoreira commented Mar 18, 2024

matheusmoreira commented Mar 18, 2024

matheusmoreira commented Mar 18, 2024

madhogs commented Mar 18, 2024 •

edited

Loading

matheusmoreira commented Mar 18, 2024

matheusmoreira commented Mar 20, 2024

madhogs commented Mar 20, 2024

matheusmoreira commented Mar 20, 2024

madhogs commented Mar 24, 2024

theAlexes commented Mar 31, 2024

TOTP hotfix: reduce memory usage #385

TOTP hotfix: reduce memory usage #385

Conversation

matheusmoreira commented Mar 17, 2024

maxz commented Mar 18, 2024

theAlexes commented Mar 18, 2024

madhogs commented Mar 18, 2024

matheusmoreira commented Mar 18, 2024

matheusmoreira commented Mar 18, 2024

matheusmoreira commented Mar 18, 2024

madhogs commented Mar 18, 2024 • edited Loading

matheusmoreira commented Mar 18, 2024

matheusmoreira commented Mar 20, 2024

madhogs commented Mar 20, 2024

matheusmoreira commented Mar 20, 2024

madhogs commented Mar 24, 2024

theAlexes commented Mar 31, 2024

madhogs commented Mar 18, 2024 •

edited

Loading