TOTP hotfix: reduce memory usage #385
The TOTP face works in the simulator but fails on real hardware when loaded with lots of codes, just like the LFS version. This is likely caused by the recent refactoring of the TOTP face, which introduced a declarative credential interface for ease of use. That interface works by decoding the secrets at runtime, which increases the RAM requirements. Users are likely hitting memory limits.

To mitigate this, the algorithm is changed from decoding all of the secrets once during initialization to decoding on the fly only the secret of the current TOTP credential. This reduces the face's dynamic memory usage from O(N) to O(1), at the cost of allocating and freeing memory whenever the user switches faces or credentials, which could impact power consumption.

The issue is confirmed fixed by its reporter, who has tested it on real hardware. Fixes joeycastillo#384.

Due to variable key sizes, the memory cannot be statically allocated. Perhaps there's a maximum key size that can serve as a worst case?

Also took this opportunity to restructure the code a bit, and added code to check for memory allocation failure.

Reported-by: madhogs <[email protected]>
Fixed-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Issue: joeycastillo#384
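A minimal sketch of the on-the-fly approach described above, assuming secrets are stored as RFC 4648 base32 strings (upper case, optional '=' padding). The names totp_credential_t, base32_decode and decode_current are hypothetical and are not taken from the face's source. Only the selected credential is decoded, so peak heap use is one key no matter how many credentials are configured; the price is one malloc/free pair per switch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical credential record: the secret stays base32-encoded
 * (RFC 4648 alphabet, upper case, optional '=' padding) until needed. */
typedef struct {
    const char *label;
    const char *encoded_key;
} totp_credential_t;

/* Maps one base32 character to its 5-bit value, or -1 if invalid. */
static int base32_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= '2' && c <= '7') return c - '2' + 26;
    return -1;
}

/* Decodes up to the first '=' into out (capacity cap).
 * Returns the number of bytes written, or 0 on invalid input or overflow. */
static size_t base32_decode(const char *encoded, uint8_t *out, size_t cap) {
    uint32_t bits = 0; /* accumulator; only the low `count` bits matter */
    int count = 0;
    size_t n = 0;
    for (const char *p = encoded; *p != '\0' && *p != '='; p++) {
        int v = base32_value(*p);
        if (v < 0) return 0;
        bits = (bits << 5) | (uint32_t)v;
        count += 5;
        if (count >= 8) {
            if (n == cap) return 0;
            count -= 8;
            out[n++] = (uint8_t)(bits >> count);
        }
    }
    return n;
}

/* Decodes only the selected credential into a fresh heap buffer, so peak
 * heap use is one key regardless of how many credentials exist.
 * Returns NULL (with *len_out == 0) on bad input or allocation failure. */
static uint8_t *decode_current(const totp_credential_t *creds, size_t index,
                               size_t *len_out) {
    const char *encoded = creds[index].encoded_key;
    size_t cap = strcspn(encoded, "=") * 5 / 8; /* exact decoded size */
    *len_out = 0;
    if (cap == 0) return NULL;
    uint8_t *key = malloc(cap);
    if (key == NULL) return NULL; /* allocation failure */
    *len_out = base32_decode(encoded, key, cap);
    if (*len_out == 0) {
        free(key);
        return NULL;
    }
    return key;
}
```

The caller would free the returned key when the user switches credentials or the face leaves the foreground; the O(N) version instead keeps every decoded key alive for the face's whole lifetime.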
Force-pushed from b52f105 to df38c26.
There is not. I looked into that before; there is no limit in the standard. The biggest in-the-wild key I could find was 128 bytes. I'll repeat my comment from #384 here: @joeycastillo also spoke out against runtime heap allocation within faces in the past (but I can't find the PR right now; it was some PR by @theAlexes, so they might remember which one it was).
Yeah, given how resource-constrained the RAM on this chip is, I think we should revert to the implementation with a static key array.
I've tested this now on the Sensor Watch Lite and it fixes the issue completely. I have checked the values with my real codes too, and they are correct. To check the extreme, I also tested a large number (100+) of 128-character secrets, and that also works without any issues on the real watch. To give my 2 cents on the above conversation: all in all, unless there is something I fail to grasp, I personally would prefer to keep this new implementation of managing credentials. If it is decided to go back to entering the raw bytes, I hope there is a way to keep an option for users to add the secrets directly, maybe by making similar changes to the LFS face, or by having this as an optional configuration or a new TOTP face.
That's unfortunate. We could set up a maximum size that's overridable by a preprocessor constant. That way we can have 128 bytes as a worst case, and users can override it if it turns out not to be enough, or if they know they won't need all those bytes. This would completely eliminate the dynamic memory allocation, at the cost of wasting ~90 bytes in the common case and possibly forcing users to decide the maximum size of the variable, something that's likely to generate support requests in the Discord. Honestly, this is something the compiler should be figuring out on its own. The C preprocessor is too limited...
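A sketch of the overridable-maximum idea, assuming a macro named TOTP_MAX_KEY_SIZE (a hypothetical name) that users could redefine with a compiler flag such as -DTOTP_MAX_KEY_SIZE=64. Because every base32 character carries 5 bits, a key's decoded size follows from its encoded length alone, so whether it fits can be checked without decoding it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical worst-case key size. Users could override it at build time,
 * e.g. with -DTOTP_MAX_KEY_SIZE=64, or by defining it before this point. */
#ifndef TOTP_MAX_KEY_SIZE
#define TOTP_MAX_KEY_SIZE 128
#endif

/* Each base32 character carries 5 bits; leftover bits are padding. */
static size_t base32_decoded_size(size_t encoded_chars) {
    return encoded_chars * 5 / 8;
}

/* True if the key, once decoded, fits in a TOTP_MAX_KEY_SIZE-byte buffer.
 * Only the encoded length is examined; nothing is decoded. */
static bool totp_key_fits(const char *encoded) {
    size_t chars = strcspn(encoded, "="); /* ignore '=' padding */
    return base32_decoded_size(chars) <= TOTP_MAX_KEY_SIZE;
}
```

For example, a 32-character secret decodes to 20 bytes, while the 128-byte default accepts encoded keys of up to 205 significant characters.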
It uses Then it uses
Me too. The very next thing I'll do is work on getting the script you made merged and integrated into the build system. It's an excellent solution which has the added advantage of not requiring the user to edit the source code.
Me too... But we are programmers. Users might not be comfortable with munging bytes and editing them into source code. The watch face's documentation also suggests using a web service to decode the bytes which could leak the secret. I really wanted to avoid the need for that.
We should be able to resolve it soon. The static allocation approach is viable, and your Python script is the best solution.
Yes, in the Discord:
I agree with him and I will work to eliminate these dynamic allocations.
I'm confident I can remove the need for those allocations by allocating a static 128-byte array and making it user-overridable. I'll push a new commit soon.
Dynamic memory allocation has been removed. Now there's a single 128-byte buffer that's allocated on face setup. Its size is also overridable by the user at compile time. I refrained from using a static buffer because Movement supports multiple instances of the same face, and the static approach is not reentrant. The key lengths are checked on initialization, and if any key is found to be too large to fit the buffer, it is turned off and its label and ERROR are displayed instead. The base32-encoded secrets are decoded into the buffer when the face enters the foreground and when the user switches TOTP code.

Therefore, there is still some extra runtime overhead, which can be eliminated via generator scripts. In the future, I will make it so there is no overhead at all. @madhogs I would really appreciate it if you could test this new version on your watch too! I'll make sure you are credited in the commit message just like before!
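The reentrancy argument can be illustrated with a sketch, assuming the usual Movement convention in which each face instance receives its own void **context_ptr in its setup function. The struct and function names are hypothetical. A file-scope static buffer would be shared by every instance of the face, whereas embedding the buffer in per-instance state, allocated only while the context is still NULL, gives each instance its own buffer that is allocated once and then reused.

```c
#include <stdint.h>
#include <stdlib.h>

#ifndef TOTP_MAX_KEY_SIZE
#define TOTP_MAX_KEY_SIZE 128 /* hypothetical, user-overridable */
#endif

/* Hypothetical per-instance state with the decode buffer embedded in it.
 * A file-scope `static uint8_t key[...]` would instead be shared by every
 * instance of the face, so one instance could clobber another's key. */
typedef struct {
    uint8_t key[TOTP_MAX_KEY_SIZE];
    size_t key_length;
    size_t current_index;
} totp_state_t;

/* Setup may be called more than once, so the state is allocated only the
 * first time and kept afterwards (no repeated malloc/free). */
static void totp_setup(void **context_ptr) {
    if (*context_ptr == NULL) {
        *context_ptr = calloc(1, sizeof(totp_state_t));
    }
}
```

The trade-off is the one discussed in this thread: every instance pays for the worst-case buffer up front, but no allocation happens after setup.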
Allocate an unlimited extent 128 byte buffer once during setup instead of allocating and deallocating repeatedly. A static buffer was not used because it fails to be reentrant and prevents the user from compiling in multiple instances of the watch face.

The advantage is the complete prevention of memory management errors, improving the reliability of the watch. It also eliminates the overhead of the memory allocator itself, since malloc is not free.

The disadvantage is that a worst-case default size of 128 bytes was required, meaning about 90 bytes will be wasted in the common case, since most keys are not that big. This can be overridden by the user via the preprocessor.

The key lengths are checked on TOTP watch face initialization, and if any key is found to be too large to fit the buffer, it is turned off and its label and ERROR are displayed instead.

The base32 encoded secrets are decoded dynamically to the buffer at the following times:

- Face enters the foreground
- User switches TOTP code

Therefore, there is still some extra runtime overhead that can be eliminated by code generation. This will be addressed in future commits.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
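The initialization-time check described in this commit message can be sketched as follows. It assumes a hypothetical credential table with an enabled flag and a hypothetical TOTP_MAX_KEY_SIZE macro; only encoded lengths are inspected, so no key is decoded during validation. Credentials left disabled would then be shown as their label plus ERROR rather than a code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef TOTP_MAX_KEY_SIZE
#define TOTP_MAX_KEY_SIZE 128 /* hypothetical, user-overridable */
#endif

/* Hypothetical credential record; `enabled` is cleared for any key whose
 * decoded form would not fit in the fixed-size buffer. */
typedef struct {
    const char *label;
    const char *encoded_key;
    bool enabled;
} totp_credential_t;

/* Runs once at face initialization. Each base32 character carries 5 bits,
 * so the decoded size is known without decoding.
 * Returns the number of usable credentials. */
static size_t totp_validate_credentials(totp_credential_t *creds, size_t count) {
    size_t usable = 0;
    for (size_t i = 0; i < count; i++) {
        size_t decoded = strcspn(creds[i].encoded_key, "=") * 5 / 8;
        creds[i].enabled = decoded <= TOTP_MAX_KEY_SIZE;
        if (creds[i].enabled) usable++;
    }
    return usable;
}
```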
Force-pushed from 4749f92 to 3f850d7.
@matheusmoreira I've checked the latest code on my watch, no issues, all is working. Checked values of real codes again and the 100+ large secrets, both are fine. 👍
I was really surprised by this too, and I think it's a very bad thing to be suggesting users do!
Great. Everything seems in order for the hotfix to be merged then.
Forgot to call watch_display_string on the error message. Of course the message isn't going to be displayed.

Also, increase the buffer size to 10 characters and output a space to the last position. This ensures the segments are cleared.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
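A sketch of the padded error string, assuming a 10-position display and a hypothetical layout of a 2-character label, two blanks, ERROR, and a trailing space. Writing all 10 positions, including the final space, overwrites whatever the previous screen left in those segments. On the watch the resulting string would be handed to watch_display_string; that call is omitted here so the sketch compiles without the Sensor Watch SDK.

```c
#include <stdio.h>

/* Assumed: the face writes all 10 character positions of the LCD. */
#define DISPLAY_WIDTH 10

/* Hypothetical error-screen layout: 2-character label, two blanks, "ERROR",
 * and a trailing space. Filling every position, the last one included,
 * clears segments left over from the previous screen.
 * out must hold DISPLAY_WIDTH + 1 bytes (10 characters plus the NUL). */
static void format_error_screen(char *out, const char *label) {
    snprintf(out, DISPLAY_WIDTH + 1, "%-2.2s  ERROR ", label);
}
```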
Fixes a division by zero bug caused by calling getCodeFromTimestamp without first initializing the TOTP library with a secret. This was happening because the face calls totp_display on activation, so the validity of the secret was never checked, since that check is done in the generate function.

Now the validity of the key is determined solely by the size of the current decoded key. A general display function checks it and decides whether to display the code or just the error message. The size of the current decoded key is initialized to zero on watch face activation, ensuring fail-safe operation.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
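A sketch of the fail-safe pattern this commit describes: the decoded key length is the only validity flag, it is zeroed on activation, and a single display function checks it before generating anything. Everything here is hypothetical, including placeholder_code, which is not RFC 6238 TOTP; it merely divides by a period that stays zero until a key is loaded, mimicking the uninitialized-library state that caused the division by zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOTP_MAX_KEY_SIZE 128 /* hypothetical */

typedef struct {
    uint8_t key[TOTP_MAX_KEY_SIZE];
    size_t key_length; /* 0 means "no valid key loaded": the only validity flag */
    uint32_t period;   /* models library state; stays 0 until a key is loaded */
} totp_state_t;

/* On activation nothing is trusted yet, so start in the error state. */
static void totp_activate(totp_state_t *s) {
    s->key_length = 0;
    s->period = 0;
}

/* Hypothetical loader: stores an already-decoded key and "initializes the library". */
static void totp_load_key(totp_state_t *s, const uint8_t *key, size_t len) {
    if (len == 0 || len > TOTP_MAX_KEY_SIZE) {
        s->key_length = 0; /* invalid key: fall back to the error screen */
        return;
    }
    memcpy(s->key, key, len);
    s->key_length = len;
    s->period = 30;
}

/* Placeholder generator (NOT RFC 6238). It divides by the period, so calling
 * it on an uninitialized state would divide by zero. */
static uint32_t placeholder_code(const totp_state_t *s, uint32_t timestamp) {
    uint32_t sum = 0;
    for (size_t i = 0; i < s->key_length; i++) sum += s->key[i];
    return (timestamp / s->period + sum) % 1000000u;
}

/* Single display path: validity is decided solely by key_length. */
static void totp_display(const totp_state_t *s, uint32_t timestamp,
                         char *out, size_t out_size) {
    if (s->key_length == 0) {
        snprintf(out, out_size, "ERROR");
        return;
    }
    snprintf(out, out_size, "%06lu", (unsigned long)placeholder_code(s, timestamp));
}
```

Because every drawing path goes through totp_display, activating the face before any key is decoded shows ERROR instead of reaching the division.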
@madhogs Are you available for some additional hardware testing? I just pushed some more bug fixes; there were some bugs in the error handling. Now the face should fail gracefully with an error message even if all keys are > 128 bytes. Can you please test that failure case?
Just did the same checks as before on the physical watch and all working still 👍
Thank you!! @theAlexes The pull request is ready for your review!
Forgot to call watch_display_string on the error message. Of course the message isn't going to be displayed.

Also, increase the buffer size to 10 characters and output a space to the last position. This ensures the segments are cleared.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
Fixes a division by zero bug caused by calling getCodeFromTimestamp without first initializing the TOTP library with a secret. This was happening because the face calls totp_display on activation, so the validity of the secret was never checked, since that check is done in the generate function.

Now the validity of the key is determined solely by the size of the current decoded key. A general display function checks it and decides whether to display the code or just the error message. The size of the current decoded key is initialized to zero on watch face activation, ensuring fail-safe operation.

Tested-by: Matheus Afonso Martins Moreira <[email protected]>
Tested-on-hardware-by: madhogs <[email protected]>
Signed-off-by: Matheus Afonso Martins Moreira <[email protected]>
GitHub-Pull-Request: joeycastillo#385
Force-pushed from 85278b9 to 10701f3.
Just to update: I've been using this on my physical watch, and using the codes, for almost a week now with no issues 😄
Alright, going to merge this in :)