Workaround WebUSB DAPLink firmware issue #5530

Closed
microbit-carlos opened this issue Mar 7, 2024 · 14 comments

Comments

@microbit-carlos
Collaborator

microbit-carlos commented Mar 7, 2024

There is at least one known WebUSB failure for which we can apply a workaround in MakeCode.
If the micro:bit has a battery pack connected and the USB cable is unplugged in the middle of a WebUSB transaction, DAPLink can be left in a bad state, and WebUSB might stop working when the micro:bit is reconnected:

Fixing this properly will need a DAPLink firmware update, but a workaround can also be applied in MakeCode, as described in the DAPLink issue. This is desirable, as many users will never update their firmware.

@microbit-carlos
Collaborator Author

@abchatra The general idea of the workaround (from microbit-foundation/DAPLink#17 (comment)):

  • When the response for the wrong DAP command is received, start tracking a history of received and expected commands
  • Send a couple of different, innocuous DAP commands. The response value is not that important, as we only need to track the command ID in the response message; for example, it could read some DAP info
  • If DAPLink instead sends a response for the previous command, we can detect that we are in this "one off" state
    • Might be worth sending a couple of different commands to be 100% sure
  • From this point forward in this WebUSB session, we can send a dummy command in between real commands to counteract this "one off" issue
    • So it looks like this:
      • Send command 01 -> get bad response
      • Send dummy command -> get response for command 01
      • Send command 02 -> get dummy response
      • Send dummy command -> get response for command 02
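The steps above can be sketched in TypeScript roughly as follows. This is a minimal sketch, not MakeCode's implementation: `DapTransport`, `exchange`, `detectOneOff` and `OneOffWorkaround` are hypothetical names, the transport stands in for WebUSB `transferOut()`/`transferIn()`, and it assumes that in the "one off" state DAPLink executes each command immediately but only returns its reply on the following transfer. The probes use the CMSIS-DAP `DAP_Info` (0x00) and `DAP_Delay` (0x09) commands simply because their command IDs differ.

```typescript
// Hypothetical transport; in the browser this would wrap
// USBDevice.transferOut() / USBDevice.transferIn() on the DAPLink interface.
interface DapTransport {
  send(packet: Uint8Array): Promise<void>;
  receive(): Promise<Uint8Array>;
}

// Byte 0 of every CMSIS-DAP response echoes the command ID it answers.
const DAP_INFO = 0x00;  // DAP_Info
const DAP_DELAY = 0x09; // DAP_Delay

// Innocuous probes with *different* command IDs, so their replies can be
// told apart from each other and from the command that failed.
const PROBES: Uint8Array[] = [
  Uint8Array.of(DAP_INFO, 0x01),        // DAP_Info: vendor name
  Uint8Array.of(DAP_DELAY, 0x00, 0x00), // DAP_Delay: 0 µs
];
const DUMMY = PROBES[0];

async function exchange(t: DapTransport, packet: Uint8Array): Promise<Uint8Array> {
  await t.send(packet);
  return t.receive();
}

// After a mismatched reply to command `lastCmd`, send the probes and check
// that each reply echoes the command sent *before* it ("one off" state).
// Returns the late reply to `lastCmd` if confirmed, otherwise undefined.
async function detectOneOff(t: DapTransport, lastCmd: number): Promise<Uint8Array | undefined> {
  let expected = lastCmd;
  let lateReply: Uint8Array | undefined;
  for (const probe of PROBES) {
    const resp = await exchange(t, probe);
    if (resp[0] !== expected) return undefined;
    if (lateReply === undefined) lateReply = resp;
    expected = probe[0];
  }
  return lateReply;
}

class OneOffWorkaround {
  isOneOff = false;
  constructor(private readonly t: DapTransport) {}

  async command(packet: Uint8Array): Promise<Uint8Array> {
    if (!this.isOneOff) {
      const resp = await exchange(this.t, packet);
      if (resp[0] === packet[0]) return resp;
      // Wrong command ID: probe for the "one off" state before failing.
      const late = await detectOneOff(this.t, packet[0]);
      if (late === undefined) {
        throw new Error(`bad DAP response ${packet[0]} -> ${resp[0]}`);
      }
      this.isOneOff = true;
      return late; // the reply to `packet` arrived with the first probe
    }
    // One-off mode: follow every real command with a dummy command.
    await exchange(this.t, packet); // reply to the previous command; discard
    const resp = await exchange(this.t, DUMMY); // reply to `packet`
    if (resp[0] !== packet[0]) {
      throw new Error(`bad DAP response in one-off mode ${packet[0]} -> ${resp[0]}`);
    }
    return resp;
  }
}
```

Returning the reply captured by the first probe, instead of re-sending the original command, avoids executing a possibly non-idempotent DAP command (such as a memory write) twice.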

@abchatra
Contributor

Thanks @microbit-carlos

@microbit-carlos
Collaborator Author

We discussed it with @abchatra, and we think we should first do the following:

  • Detect the issue and add it to the metrics
  • Create a UX workflow to instruct the user to unplug and replug all micro:bit power sources
web.usb.reset.short.small.mp4

@abchatra
Contributor

@jwunderl let's discuss this before starting the work.

@microbit-carlos
Collaborator Author

Hi @jwunderl, have you had a chance to start looking into this? Let me know if there is anything I can clarify or help with from the DAPLink side. We are keen to have the UX workflow improvements ready for this release.

@microbit-carlos
Collaborator Author

Another way to trigger this is by refreshing the MakeCode browser tab, as described in:

@jwunderl
Member

Pointing at #5734 and closing this to aggregate related issues. As noted over there, a few fixes have made their way in to mitigate this by sending dummy commands and ignoring the failures, which is basically this workaround without the extra special-cased logic to track previous commands. It seems to cover the scenario (e.g. to my knowledge, refreshing the MakeCode browser tab no longer reproduces it).

@microbit-carlos
Collaborator Author

I can still replicate this issue in the latest beta:

  • makecode.microbit.org version: 7.0.29
  • Microsoft MakeCode version: 10.2.23
  • microbit runtime version: v2.2.0-rc6
  • codal-microbit-v2 runtime version: v0.2.67

Here is the issue with the console log open. After a while, flashing times out and MakeCode offers to download the hex file to the user's computer:

webusb.issue.mp4

Tagging @abchatra, as requested in the call earlier today. Let me know whether you'd prefer to move this conversation to #5734 (which already covers a few different WebUSB issues) or to reopen this one.

@microbit-carlos
Collaborator Author

microbit-carlos commented Sep 3, 2024

@abchatra / @jwunderl Even with the latest change, I can still replicate in the latest beta v7.0.35. While I don't see the "bad response" error logs anymore, it does seem to be getting stuck at the same point:

webusb-issue.mp4

Btw, @jwunderl, I noticed your comment in microsoft/pxt#10139 (comment) saying you still haven't been able to replicate this issue. Does that mean you were never able to replicate it, or just not in the latest v7?
If it's the former, it would be interesting to figure out why, as in our experience it's really easy to replicate with a battery pack.

@microbit-carlos
Collaborator Author

microbit-carlos commented Sep 3, 2024

I can still get the "bad dapCmd response" log in v7.0.35:
(screenshot)

I'm not sure why I didn't get this message the first couple of times, when I wrote my previous comment, but I've now tried again 4 or 5 times and every attempt showed the log (with no timeout error shown).

@jwunderl
Member

jwunderl commented Sep 3, 2024


Could you share the exact browser version you're reproducing this in? We noticed some consistent issues on multiple Chromebooks last Friday, but when testing today they had completely disappeared. We suspected that a browser update had fixed it, and the 'relaunch to update' prompt visible in your recording would match that guess.

To confirm, I have yet to see any issues besides #5893 over the past week or so, despite testing fairly thoroughly throughout that time, and others who tested had the same experience. I will try to find exactly where that hang occurs, though; it's not immediately clear what circumstances cause it to fail in that way before any flashing is attempted.

@microbit-carlos
Collaborator Author

Unfortunately, as Chrome had updated in the background and was waiting to "relaunch to update", the About menu wasn't showing the old version.
I've updated Chrome to the latest Version 128.0.6613.120 (Official Build) (x86_64) and can still reproduce the issue.

This is a capture from a fresh session with a micro:bit V2 and a battery pack connected, in which I connect via WebUSB and then unplug and replug the USB cable several times. By the 5th attempt, the error is triggered:

webusb-issue-latest-chrome.mp4

Once the micro:bit gets in this state, the WebUSB connection error will persist until the board is completely power cycled, by removing the battery and USB cable.

To confirm, I have still yet to see any issues besides #5893 over the past week or so, while testing pretty thoroughly throughout that time / same experience when others tested.

Does that mean nobody from the team has been able to replicate this issue at all, or just not with the latest MakeCode beta? If nobody has, maybe we should get on a call to check why that might be. I just had a quick check with our team here, and we were able to replicate it as shown in the video on all computers with the latest Chrome for Windows and Mac (Arm and x86).

@jwunderl
Member

jwunderl commented Sep 4, 2024

@microbit-carlos Okay, I was able to reproduce this by unplugging and replugging very quickly, about a dozen times, after flashing a few times. I'm not sure why it seems so much harder to reproduce than what you're seeing, but thanks for the instructions.

However, this one appears to be quite different. The missing result message isn't just a logging error; the transferIn call appears to be fully stalling. That is, breaking here: https://github.com/microsoft/pxt/blob/master/pxtlib/webusb.ts#L441 and stepping shows it entering transferIn and never returning or throwing:

Screen.Recording.2024-09-04.115837.mp4

This appears to be the micro:bit fully locking up and not giving any responses. While sitting on that breakpoint, I tried all the methods that were immediately apparent: dev.clearHalt(..) on the specific endpoints (in case it had sent a stall and was addressable), dev.reset(), dev.close(), dev.releaseInterface(..), etc. None of them appeared to have any impact. I believe I know the two spots where this is most likely to occur when reconnecting or downloading, so for now I should be able to add an extra timeout there to detect the stall, pop up a dialog with specific "unplug battery & reconnect" instructions, and record another tick event. If there is anything else to try invoking in the locked state, I can do so, though.
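The stall detection proposed above could look roughly like this sketch. `withTimeout`, `readOrReportStall` and `DeviceStalledError` are hypothetical helpers, not pxt APIs, and the timeout value is a placeholder. A timeout only stops *waiting*: the pending WebUSB `transferIn()` promise is not cancelled and the device is not unblocked; it just lets the page give up and show the "unplug battery & reconnect" dialog.

```typescript
// Hypothetical error type used to signal that the device stopped responding.
class DeviceStalledError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DeviceStalledError";
    // Keep `instanceof` working when compiling to older JS targets.
    Object.setPrototypeOf(this, DeviceStalledError.prototype);
  }
}

// Reject with DeviceStalledError if `p` has not settled within `ms`.
// The underlying operation is NOT cancelled; we only stop waiting for it.
function withTimeout<T>(p: Promise<T>, ms: number, what: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new DeviceStalledError(`${what} did not complete within ${ms} ms`)),
      ms,
    );
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Run one read (e.g. a wrapped USBDevice.transferIn call). On a stall, call
// `onStall` (e.g. show the reconnect dialog and record a tick event) and
// return undefined. Any other error propagates unchanged.
async function readOrReportStall<T>(
  read: () => Promise<T>,
  ms: number,
  onStall: (err: DeviceStalledError) => void,
): Promise<T | undefined> {
  try {
    return await withTimeout(read(), ms, "transferIn");
  } catch (err) {
    if (err instanceof DeviceStalledError) {
      onStall(err);
      return undefined;
    }
    throw err;
  }
}
```

Against a real device, `read` would be something like `() => dev.transferIn(endpointNumber, 64)`, where `dev` is the connected `USBDevice`; the endpoint number and length are placeholders here.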

Also, as a side note, the 'bad dapCmd response 0 -> 131; retrying' type of log comes up when we use the 'ignore the bad response and see if the next response is appropriate' approach. If it isn't followed by either a retry success log or a retry failed error, that also indicates a stall.
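A minimal sketch of the 'ignore the bad response and see if the next response is appropriate' approach, assuming the retry simply reads one more response without re-sending the command. `DapTransport` and `dapCmdWithRetry` are hypothetical names, and the log strings are illustrative rather than pxt's actual messages.

```typescript
// Hypothetical transport; in the browser this would wrap
// USBDevice.transferOut() / USBDevice.transferIn().
interface DapTransport {
  send(packet: Uint8Array): Promise<void>;
  receive(): Promise<Uint8Array>;
}

// Send one DAP command. Byte 0 of a CMSIS-DAP response echoes the
// command ID, so a mismatch means we read a reply to some other command.
async function dapCmdWithRetry(
  t: DapTransport,
  packet: Uint8Array,
  log: (msg: string) => void = console.log,
): Promise<Uint8Array> {
  await t.send(packet);
  let resp = await t.receive();
  if (resp[0] === packet[0]) return resp;

  log(`unexpected DAP response: sent ${packet[0]}, got ${resp[0]}; retrying`);
  // Ignore the stale reply and read once more. If the device has locked up,
  // this await never settles, so neither outcome below is ever reached.
  resp = await t.receive();
  if (resp[0] === packet[0]) {
    log("retry succeeded");
    return resp;
  }
  throw new Error(`retry failed: sent ${packet[0]}, got ${resp[0]}`);
}
```

If the second `receive()` never settles, neither "retry succeeded" nor the "retry failed" error appears, which is the stall signature described above; wrapping that read in a timeout would turn the silent hang into a reportable error.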

@microbit-carlos
Collaborator Author

Added a comment to the error message PR:
