Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TOTP hotfix: reduce memory usage #385

Merged
merged 4 commits into from
Mar 31, 2024

Conversation

matheusmoreira
Copy link
Collaborator

The TOTP face is working in the simulator but fails on the real hardware when loaded with lots of codes, just like the LFS version. This is likely caused by the recent refactoring of the TOTP face which introduced a declarative credential interface for ease of use. That's accomplished by decoding the secrets at runtime which increases the RAM requirements. Users are likely hitting memory limits.

In order to mitigate this, the algorithm is changed from decoding all of the secrets only once during initialization to on the fly decoding of the secret for the current TOTP credential only. This converts this face's dynamic memory usage from O(N) to O(1) at the cost of memory management when switching faces and credentials which could impact power consumption.

Due to variable key sizes, the memory cannot be statically allocated. Perhaps there's a maximum key size that can serve as worst case?

Also took this opportunity to restructure the code a bit. Also added code to check for memory allocation failure.

The TOTP face is working in the simulator but fails on the real hardware
when loaded with lots of codes, just like the LFS version.
This is likely caused by the recent refactoring of the TOTP face
which introduced a declarative credential interface for ease of use.
That's accomplished by decoding the secrets at runtime which increases
the RAM requirements. Users are likely hitting memory limits.

In order to mitigate this, the algorithm is changed from decoding
all of the secrets only once during initialization to on the fly
decoding of the secret for the current TOTP credential only.
This converts this face's dynamic memory usage from O(N) to O(1)
at the cost of memory management when switching faces and credentials
which could impact power consumption. Issue is confirmed fixed by
author of issue who has tested it on real hardware. Fixes joeycastillo#384.

Due to variable key sizes, the memory cannot be statically allocated.
Perhaps there's a maximum key size that can serve as worst case?

Also took this opportunity to restructure the code a bit.
Also added code to check for memory allocation failure.

Reported-by: madhogs <[email protected]>
Fixed-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Issue: joeycastillo#384
@maxz
Copy link
Contributor

maxz commented Mar 18, 2024

Due to variable key sizes, the memory cannot be statically allocated. Perhaps there's a maximum key size that can serve as worst case?

There is not. I looked at that before. There is no limit in the standard. The biggest in-the-wild key I could find was 128 bytes.

I'll repeat my comment from #384 here:
Unless you find another solution to the possible repeated runtime allocation (on a quick glance it at least does not seem to call malloc for the same entry twice but to use memset) and decoding I guess I would prefer for the solution in the main branch to be the one from my PR #356 which has to deal with the explicit key array but has very little runtime overhead. Frankly I'd prefer a more complicated setup over increased battery usage.
I will at least maintain my own patch for my watch if this behaviour remains.

@joeycastillo also spoke out against runtime heap allocation within faces in the past (But I can't find the PR right now, it was some PR by @theAlexes so they might remember which one it was.)
In that particular case I think he accepted the allocation in the face's initialisier.

@theAlexes
Copy link
Collaborator

yeah, given how resource-constrained the RAM on this chip is, i think we should revert to the implementation with a static key array.

@madhogs
Copy link
Contributor

madhogs commented Mar 18, 2024

I've tested this now on the sensor watch lite and it fixes the issue completely. I have checked the values too with my real codes and they are correct. To check the extreme I also tested a large number (100+) of 128 character secrets and that also works without any issues on the real watch.

To give my 2 cents on the above conversation:
I think this new method of configuring the secrets is a great improvement on having to manually enter the byte array, it is also simpler and less error prone for new users to configure their totp codes for the watch. As the physical watch can handle this code ram wise even for a large number of very large secrets I don't personally see why that would be a reason to go back to the old implementation.
In terms of power consumption, i don't know how much more this would use but I also would be surprised if it had any effect in the real world as users are unlikely to spend any significant amount of time cycling through the codes day to day (after setting up the watch of course).

All in all, unless there is something I fail to grasp I personally would prefer to keep this new implementation of managing credentials. If it is decided to go back to entering the raw bytes, I hope there is a way to keep an option for the users of adding the secrets directly, maybe through making similar changes to the lfs face? or having this as an optional configuration or a new totp face.

@matheusmoreira
Copy link
Collaborator Author

There is not. I looked at that before. There is no limit in the standard. The biggest in-the-wild key I could find was 128 bytes.

That's unfortunate. We could set up a maximum size that's overridable by preprocessor constant. That way we can have 128 bytes as a worst case and the user can override it if it turns out to not be enough, or if they know they won't need all those bytes.

This would completely eliminate the dynamic memory allocation at the cost of wasting something like ~90 bytes in the common case and maybe forcing the user to decide the maximum size of the variable, something that's likely to generate support requests in the discord.

Honestly this is something the compiler should be figuring out on its own. The C prepocessor is too limited...

on a quick glance it at least does not seem to call malloc for the same entry twice but to use memset

It uses malloc and memset once during watch initialization to allocate and zero the TOTP state structure. This memory is not freed and has essentially unlimited lifetime.

Then it uses malloc and free in the following patterns:

  • Face enters the foreground
    • malloc(current_decoded_key)
  • Face leaves the foreground
    • free(current_decoded_key)
  • User switches TOTP code
    • free(current_decoded_key)
    • current++
    • malloc(current_decoded_key)

decoding I guess I would prefer for the solution in the main branch to be the one from my PR #356 which has to deal with the explicit key array but has very little runtime overhead.

Me too. The very next thing I'll do is work on getting the script you made merged and integrated into the build system. It's an excellent solution which has the added advantage of not requiring the user to edit the source code.

Frankly I'd prefer a more complicated setup over increased battery usage.

Me too... But we are programmers. Users might not be comfortable with munging bytes and editing them into source code.

The watch face's documentation also suggests using a web service to decode the bytes which could leak the secret. I really wanted to avoid the need for that.

I will at least maintain my own patch for my watch if this behaviour remains.

We should be able to resolve it soon. The static allocation approach is viable, and your Python script is the best solution.

@joeycastillo also spoke out against runtime heap allocation within faces
In that particular case I think he accepted the allocation in the face's initialisier.

Yes, in the discord:

fwiw I think it’s a best practice to avoid dynamic memory allocation outside of the watch face setup function. As it stands now (at least in the case where the watch is on-wrist and not plugged into USB) memory is only allocated once, at boot, and that doesn’t change until the battery dies. this feels good for the stability of the thing when it has to run for, y’know, two years

I agree with him and I will work to eliminate these dynamic allocations.

@matheusmoreira
Copy link
Collaborator Author

@theAlexes

yeah, given how resource-constrained the RAM on this chip is, i think we should revert to the implementation with a static key array.

I'm confident I can remove the need for those allocations by allocating a static 128 byte array and making it user overridable. I'll push a new commit soon.

@matheusmoreira
Copy link
Collaborator Author

Dynamic memory allocation has been removed. Now there's a single 128 byte buffer that's allocated on face setup. It's also compile time overridable by the user. I refrained from using a static buffer because movement supports multiple instances of the same face and the static approach fails to be reeentrant.

The key lengths are checked on initialization and if any key is found to be too large to fit the buffer it is turned off and ERROR is displayed on the watch instead of the code. The base32 encoded secrets are decoded dynamically to the buffer at the following times:

  • Face enters the foreground
  • User switches TOTP code

Therefore, there is still some extra runtime overhead that can still be eliminated via the use of generator scripts. In the future, I will make it so there is no overhead at all.

@madhogs I would really appreciate it if you could test this new version on your watch too! I'll make sure you are credited in the commit message just like before!

Allocate an unlimited extent 128 byte buffer once during setup
instead of allocating and deallocating repeatedly. A static buffer
was not used because it fails to be reentrant and prevents multiple
instances of the watch face to be compiled by the user.

The advantage is the complete prevention of memory management errors,
improving the reliability of the watch. It also eliminates the overhead
of the memory allocator itself since malloc is not free.
The disadvantage is a worst case default size of 128 bytes was required,
meaning about 90 bytes will be wasted in the common case since most keys
are not that big. This can be overridden by the user via preprocessor.

The key lengths are checked on TOTP watch face initialization
and if any key is found to be too large to fit the buffer
it is turned off and the label and ERROR is displayed instead.

The base32 encoded secrets are decoded dynamically to the buffer
at the following times:

 - Face enters the foreground
 - User switches TOTP code

Therefore, there is still some extra runtime overhead
that can still be eliminated by code generation.
This will be addressed in future commits.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
@madhogs
Copy link
Contributor

madhogs commented Mar 18, 2024

@matheusmoreira I've checked the latest code on my watch, no issues, all is working. Checked values of real codes again and the 100+ large secrets, both are fine. 👍

The watch face's documentation also suggests using a web service to decode the bytes which could leak the secret. I really wanted to avoid the need for that.

I was really surprised by this too and think its a very bad thing to be suggesting users to do!

@matheusmoreira
Copy link
Collaborator Author

Great. Everything seems in order for the hotfix to be merged then.

matheusmoreira added a commit to matheusmoreira/sensor-watch that referenced this pull request Mar 20, 2024
Forgot to call watch_display_string on the error message.
Of course the message isn't going to be displayed.

Also, increase the buffer size to 10 characters
and output a space to the last position.
This ensures the segments are cleared.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
matheusmoreira added a commit to matheusmoreira/sensor-watch that referenced this pull request Mar 20, 2024
Fixes a division by zero bug caused by calling getCodeFromTimestamp
without having initialized the TOTP library with a secret first.
This was happening because the face calls totp_display on activation,
meaning the validity of the secret was not checked since this is
done in the generate function.

Now the validity of the key is determined solely by the size
of the current decoded key. A general display function checks it
and decides whether to display the code or just the error message.

The size of the current decoded key is initialized to zero
on watch face activation, ensuring fail safe operation.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
@matheusmoreira
Copy link
Collaborator Author

@madhogs Are you available for some additional hardware testing?

I just pushed some additional bug fixes. There were some bugs in the error handling. Now the face should fail gracefully with an error message even if all keys are > 128 bytes. Can you please test that failure case?

@madhogs
Copy link
Contributor

madhogs commented Mar 20, 2024

Just did the same checks as before on the physical watch and all working still 👍
Also added several very long keys (>> 128 characters) and it displays the word error as i assume is intended.

@matheusmoreira
Copy link
Collaborator Author

Thank you!!

@theAlexes The pull request is ready for your review!

Forgot to call watch_display_string on the error message.
Of course the message isn't going to be displayed.

Also, increase the buffer size to 10 characters
and output a space to the last position.
This ensures the segments are cleared.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
Fixes a division by zero bug caused by calling getCodeFromTimestamp
without having initialized the TOTP library with a secret first.
This was happening because the face calls totp_display on activation,
meaning the validity of the secret was not checked since this is
done in the generate function.

Now the validity of the key is determined solely by the size
of the current decoded key. A general display function checks it
and decides whether to display the code or just the error message.

The size of the current decoded key is initialized to zero
on watch face activation, ensuring fail safe operation.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
@madhogs
Copy link
Contributor

madhogs commented Mar 24, 2024

Just to update i've been using this on my physical watch and using the codes for almost a week now and no issues 😄

madhogs pushed a commit to madhogs/Sensor-Watch that referenced this pull request Mar 24, 2024
Allocate an unlimited extent 128 byte buffer once during setup
instead of allocating and deallocating repeatedly. A static buffer
was not used because it fails to be reentrant and prevents multiple
instances of the watch face to be compiled by the user.

The advantage is the complete prevention of memory management errors,
improving the reliability of the watch. It also eliminates the overhead
of the memory allocator itself since malloc is not free.
The disadvantage is a worst case default size of 128 bytes was required,
meaning about 90 bytes will be wasted in the common case since most keys
are not that big. This can be overridden by the user via preprocessor.

The key lengths are checked on TOTP watch face initialization
and if any key is found to be too large to fit the buffer
it is turned off and the label and ERROR is displayed instead.

The base32 encoded secrets are decoded dynamically to the buffer
at the following times:

 - Face enters the foreground
 - User switches TOTP code

Therefore, there is still some extra runtime overhead
that can still be eliminated by code generation.
This will be addressed in future commits.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
madhogs pushed a commit to madhogs/Sensor-Watch that referenced this pull request Mar 24, 2024
Forgot to call watch_display_string on the error message.
Of course the message isn't going to be displayed.

Also, increase the buffer size to 10 characters
and output a space to the last position.
This ensures the segments are cleared.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
madhogs pushed a commit to madhogs/Sensor-Watch that referenced this pull request Mar 24, 2024
Fixes a division by zero bug caused by calling getCodeFromTimestamp
without having initialized the TOTP library with a secret first.
This was happening because the face calls totp_display on activation,
meaning the validity of the secret was not checked since this is
done in the generate function.

Now the validity of the key is determined solely by the size
of the current decoded key. A general display function checks it
and decides whether to display the code or just the error message.

The size of the current decoded key is initialized to zero
on watch face activation, ensuring fail safe operation.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
@theAlexes
Copy link
Collaborator

Alright, going to merge this in :)

@theAlexes theAlexes merged commit 1e29fe8 into joeycastillo:main Mar 31, 2024
2 checks passed
@matheusmoreira matheusmoreira deleted the totp-hot-patch branch March 31, 2024 22:42
matheusmoreira added a commit that referenced this pull request Aug 26, 2024
Allocate an unlimited extent 128 byte buffer once during setup
instead of allocating and deallocating repeatedly. A static buffer
was not used because it fails to be reentrant and prevents multiple
instances of the watch face to be compiled by the user.

The advantage is the complete prevention of memory management errors,
improving the reliability of the watch. It also eliminates the overhead
of the memory allocator itself since malloc is not free.
The disadvantage is a worst case default size of 128 bytes was required,
meaning about 90 bytes will be wasted in the common case since most keys
are not that big. This can be overridden by the user via preprocessor.

The key lengths are checked on TOTP watch face initialization
and if any key is found to be too large to fit the buffer
it is turned off and the label and ERROR is displayed instead.

The base32 encoded secrets are decoded dynamically to the buffer
at the following times:

 - Face enters the foreground
 - User switches TOTP code

Therefore, there is still some extra runtime overhead
that can still be eliminated by code generation.
This will be addressed in future commits.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: #385
matheusmoreira added a commit that referenced this pull request Aug 26, 2024
Forgot to call watch_display_string on the error message.
Of course the message isn't going to be displayed.

Also, increase the buffer size to 10 characters
and output a space to the last position.
This ensures the segments are cleared.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: #385
matheusmoreira added a commit that referenced this pull request Aug 26, 2024
Fixes a division by zero bug caused by calling getCodeFromTimestamp
without having initialized the TOTP library with a secret first.
This was happening because the face calls totp_display on activation,
meaning the validity of the secret was not checked since this is
done in the generate function.

Now the validity of the key is determined solely by the size
of the current decoded key. A general display function checks it
and decides whether to display the code or just the error message.

The size of the current decoded key is initialized to zero
on watch face activation, ensuring fail safe operation.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: #385
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants