Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a threaded script debugger #4952

Closed
derammo opened this issue Jul 23, 2022 · 25 comments
Closed

Implement a threaded script debugger #4952

derammo opened this issue Jul 23, 2022 · 25 comments
Milestone

Comments

@derammo
Copy link

derammo commented Jul 23, 2022

Describe the project you are working on

This is requested core development.

Describe the problem or limitation you are having in your project

The current debugger does not support threads. This is an issue existing since at least 2015. godotengine/godot#2446

Describe the feature / enhancement and how it helps to overcome the problem or limitation

I am implementing a debugger that supports threads. This document is to update the team and solicit input as the first implementation solidifies.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Debugger Threading Background

Godot Requirements

Godot has a number of unique requirements related to threading in the area of debugging. These are enumerated here. Since the LocalDebugger is trivial, all these topics will be explained in terms of the RemoteDebugger.

1. Captures

So-called Captures are plugins that register for specfic messages on the debugging connection (implemented by RemoteDebuggerPeerTCP.) These messages are dispatched to those Captures to implement modular areas of debugging. For example, when inspecting a scene graph Node remotely, the SceneDebugger is used to implement the messages. Since the Godot scene graph may only safely be accessed by Main (the main thread,) the SceneDebugger must always be serviced by Main.

2. Thread Sharing

Godot must operate correctly even on platforms without threads. In other words, it must be possible to run everything on Main. For the debugger, this means that while debugging, Main must never be held indefinitely. It has to keep servicing other functions, including running the debugger itself.

Debugger Requirements

Debuggers in threaded systems have to be able to handle various scenarios and requirements. These are some important requirements not specific to Godot but that are important for the Godot threaded debugger design.

3. Threads wait on other threads

A thread being debugged may block on a resource (for example, a lock) held by another thread, or even on the completion of another thread (reaping.) When this happens, the other thread must be allowed to run so that the debugged thread won't block forever. This would otherwise be deadlock due to debugger halting threads.

4. Asking another thread for its debug stack must be thread-safe

When debugging multiple threads, displaying the stack frames and variables of other threads means synchronizing access with those other threads. Only the thread that runs the debugging code itself can safely query itself without any synchronization.

It is desirable to have threads run lock-free when they are not held in the debugger. Threads should be able to access their stack information and update their debugging state without hitting a mutex after every step. They should be able to execute as "naturally" as possible to not change the timing of everything when debugging a thread-related problem.

Existing Single-Thread Debugger (Godot 3.x, 4.0 master)

In the currently shipping debugger, these requirements are met in clever but trivial ways. Since Main is always available to run (it is the thread that called into the debugger,) most of these problems just go away.

Requirement 1 (see above) is met trivially. It simply does not allow debugging any thread except Main. This means that Main will always be the thread calling into the debugger, where it is guaranteed to be available to service all the various Godot functions, including debugging. It is always available because debugging only happens when it calls into the debugger.

Requirement 2 is met by implementing polling on Main. This way, Main is only ever either running normally or looping in the debugger, servicing all Captures and other functions that must continue to operate.

Requirement 3 is met trivially. Threads other than main are never held or step executed. This means they won't block Main from executing indefinitely, as they will eventually release any locks Main might be waiting on. Additionally, if Main were to reap one of those threads, it would just normally wait and return just like it does when not debugging.

Requirement 4 is avoided entirely. It is not possible to query any other threads. The debugger is always Main and it only ever accesses Main's stack. In fact, there is only one stack being saved and it is in the ScriptLanguage object for the current language. It isn't even separated as an object.

Threaded Debugger Design

Requirement 1 is difficult to meet entirely. Since the various Captures are not thread-safe and must only run on Main, that thread must always be available to run. However, it is very easy to construct a user script that blocks Main simply by using mutexes or even Thread::join. If a thread other than Main is being debugged and step executed, Main may be blocked entirely until the debugged thread makes some progress. For example, the debugged thread may need to finish for Thread::join or it may need to finish a critical section and release a mutex needed by Main. In such scenarios, SceneDebugger will not be able to query the scene graph. This is simply a reality of Godot, and the current design does not try to hide that fact. However, since debugger messages are asynchronous and the Debugger UI is signal-based, it will simply not update. The user can click around on the scene but won't receive any updates. The Thread List debugger view will show Main as blocked because it does not check in after step execution was requested.

When debugging a thread other than Main, the thread being debugged (called focused thread) is used to run the debugger, because it is the only thread that is guaranteed to not be blocked. However, this thread is not allowed to service Captures that must continue to be read only by Main. To make this possible, the debugger connection was extended with Channels in the same way as SCTP Streams or AMQP Channels. Main continues to service messages intended only for it whenever it is not blocked. The focused thread (which may or may not be Main itself) services the core debugger messages to make sure it can process the message to resume things. Note that in the current implementation, all channels run over the same TCP connection, so there is no back pressure flow control on channels. This can be corrected by simply using a channelized transport like multiple TCP connections, for example. Such a requirement would likely only be relevant if this connection protocol was reused for something game-related. Since Godot RPC already seems to have its own channels, this will likely not become relevant.

Requirement 2 is met in the same way as the existing debugger. Main will devolve to polling the debugger connection when running without threads. It can debug itself and run all the Captures.

Requirements 3 and 4 are met by step execution with barriers. When a thread is debugged and step executed (step in, step over, step out of the current frame), all other threads are released to run while the focused thread is allowed to execute its prescribed number of steps. After the debugged thread breaks again, a thread barrier is raised (atomic flag only, no locks if pausing is not requested) and all other threads will block themselves on semaphores after "checking in" their stack contexts. This means that as each thread temporarily halts, ownership of its debuggable information (stack, variables) passes to the debugger, so it can be accessed without lock. There are locks associated with checking in or querying the thread information, but threads do not need to lock anything while running, which is the only performance-sensitive state.

Only threads that are thus "at rest" are able to be interrogated in the debugger UI. In practice, all threads check in immediately and the user never sees it happen. However, if a thread were to block for a long time, this would be visible in the UI and its stack would be unknown for that duration.

When the next step is taken, all threads reclaim their debuggable stack information and resume until the next barrier. Because threads are allowed to run freely until the debugged thread makes progress, they won't block that thread indefinitely.

But why not just...?

On other platforms, threaded debuggers can run the UI and the interrogation of contexts separately from the threads being debugged. For this, they use dedicated threads just for the debugger.

I did not go this route and instead extended the existing model of running the debugger on the focused thread, because:

  • Extra threads would require more mutexes that aren't necessary in the single-threaded case or on platforms without threads. The existing debugger works without them, and I wanted to keep that benefit as much as possible.

  • Extra threads are a very sparse resource on embedded or web platforms.

Threaded Debugger Implementation

All debuggable stack information was removed from ScriptLanguage and moved into ScriptLanguageThreadContext, which exists once for each registered language (see ScriptServer) and for each thread, by being stored in thread-local storage.

Each of these classes is derived for GDScript, VisualScript, C# (no longer does much since the dotnet6 merge), and ScriptLanguageExtension. This means that threads from all languages meet in the ScriptDebugger and check in their various thread context implementations. The debugger UI can then show all stacks from all "at rest" threads including what language they are running in. A current limitation of Godot is that these stacks will not be interleaved. In other words, the user cannot see if a thread originated in some GDScript code, passed through some native code, and then hit a breakpoint in python. These would show as two separate 'threads.' The system will provide the UI with a 'thread tag' which is just an opaque representation of the thread ID so that these threads can be shown together. However, it is not currently possible to show how stack frames from different languages stack or interleave.

For ScriptLanguageExtension, the extension language is the factory for the debuggable thread context. This means that a threaded script language can offer up debug information for all the stack frames of all its threads.


The rest of this document is intended for people familiar with the current debugger. Functionality is explained briefly, showing the messages exchanged over the debug connection.

The intention is to show the design of an implementation I am currently doing and solicit input. Once the debugger protocol is finalized and tested, the next phase would be requirements for Debugger UI. [update: The UI is implemented.]

Flows

Various remote debugger protocol messages have been extended, and some new ones have been added. The following are the supported flows in the new implementation with an explanation of what happens. In this write-up, "VM" refers to whatever script engine is being used.

1. Breakpoint Hit

A thread hits a breakpoint. The main thread enters the remote debugger and handles the debugging in a blocking manner, only executing VM OPCODEs when allowed by the debugger, but always servicing the debugger connection and captures. All other threads will hit a barrier within one OPCODE, once they get scheduled.

VM ---> debug_enter_thread TID ---> Debugger

This mesage was renamed because the old message debug_enter did not have any additional arguments and it was handled as a hard error if it had any. For compatibility, this new message indicates to the debugger that threads are supported and the thread that hit the breakpoint is TID. Servers that implement the old protocol would ignore or error this message and not engage with the debugging session, which is important because some of the follow up messages have additional arguments now.

VM <--- get_stack_dump [] <--- Debugger

VM ---> stack_dump data data data... TID ---> Debugger

VM sends stack dump for debugged thread as before, because optional TID was not specified on request.

VM ---> thread_break TID ---> Debugger

This message indicates that the thread specified is held at a barrier and its debug data is available. This thread is one of the other running threads, not the one being debugged at the moment. The UI can now fetch debug information for that thread also, via:

VM <--- get_stack_dump TID <--- Debugger

VM ---> stack_dump data data data... TID ---> Debugger

VM sends stack dump for specified thread, if the thread is still suspended by the time the request arrives. Otherwise an empty dump is sent as acknowledgement. The UI should show this as the stack frames or as an indication the thread is running.

The same sequences are supported for get_stack_frame_vars:

VM <--- get_stack_frame_vars [TID] <--- Debugger

VM ---> stack_frame_vars size TID ---> Debugger

VM ---> stack_frame_var ... ---> Debugger

VM ---> stack_frame_var ... ---> Debugger

VM ---> stack_frame_var ... ---> Debugger

...

The debugger requests continue/step/next as before. When execution resumes, additional threads are also resumed. When step execution is requested, additional threads are freed to run until the currently debugged thread makes its steps, to avoid deadlocks. Then all threads are again halted and new thread_break will be received for all threads as they stop, prompting the debugger UI to request fresh information.

A new command will be added (not yet implemented) to switch thread context:

VM <--- thread TID <--- Debugger

This switches the focus of debugging to another thread. The selected thread is marked for debugging, which may be asynchronous if it is currently running. This is very different from current behavior because the debugger connection is being serviced by the previously debugged thread, so breaking isn't synchronous. The originally debugged thread and other threads are freed to run until the selected thread suspends and the sequence starts over with

VM ---> debug_enter_thread TID ---> Debugger

as above.

2. Error Hit

This is handled exactly like Breakpoint Hit.

3. Breakpoint Requested, Skip Breakpoints

Handled exactly as today, since breakpoints apply to all threads.

4. Debug Exit

As today, sent only once for the debugged thread. Other threads will resume.

The message debug_exit has been replaced with debug_exit_thread TID because we can now have out of order execution of messages. We need to make sure this doesn't terminate a debug session on another thread that starts immediately afterwards.

5. Configuration

In a next phase, configurable "other threads" behavior can be added. These modes might include

  • all other threads step execute also (presumably main continues to run debugger but does not progress VM)
  • all other threads get to keep running
  • all other threads on barrier (as described above, default implementation, presumably main continues to run debugger but does not progress VM)
  • all other threads held until debugging is done (probably bad idea because of deadlock)

All these can be implemented with the same thread parking and semaphore mechanism that is implemented now. Step executing threads other than the main thread will need to be implemented in such a way as to let the debugger(s) continue to run on main thread, while not progressing in the VM. Essentially, the "execute VM" part of main is conceptually stopped on the barrier, but really the thread needs to not be blocked.

If this enhancement will not be used often, can it be worked around with a few lines of script?

N/A

Is there a reason why this should be core and not an add-on in the asset library?

N/A

@Calinou Calinou changed the title Threaded Script Debugger Implement a threaded script debugger Jul 23, 2022
@derammo
Copy link
Author

derammo commented Jul 23, 2022

Update: I have completed the implementation of thread local storage debugging contexts and all the VM and script language support. There is still some work to be done, but it could benefit from review of the sequences above. These are the use cases that I will need to test with the new code.

There are some implementation pieces missing on the Visual Script and ScriptLanguageExtension areas, but nothing major and I will finish that once the GDScript one is working well and we have agreement on the functionality.

@derammo
Copy link
Author

derammo commented Jul 23, 2022

Already found something that doesn't work well:

thread_break is going to be renamed something like thread_paused to indicate it is just status.

There will need to be a new message (either thread_break TID or debug_request TID?) to indicate another thread has hit a breakpoint or error (like what you might imagine might be shown in a Threads view in the UI) and would like to be the focus of debugging (thread TID). Internally, this means there is a notification sent on the connection that is serviced by the current debug target thread and then it can do thread TID to hand the connection over to the new thread of interest, if the UI wants to.

I will update above sequences later to show that.

@derammo
Copy link
Author

derammo commented Jul 24, 2022

Investigations / Unclear Requirements

These are things I still need to investigate or figure out. Any input/context/pointers very welcome to make that go faster.
(Tracking post, will be edited)

Issue 1: Main Thread Use

The existing debugger runs the remote debugger connection [imprecise: I meant the message send/receive part. Faless points out the actual TCP code is threaded, but the point stands] directly on the main thread. That trick means that anything interrogated by the debug code is conveniently paused and thread safe. Do other debugger things hang off of this? For example, does the Scene Debugger hang off of this? Is that what "captures" are?

Since obviously the Scene Debugger can't run on any secondary thread, this would mean the RemoteDebugger servicing can't be handed over to another thread when that other thread is being step executed. Focusing and stepping a secondary thread would then need to be done via the main thread i.e., slightly asynchronously with some semaphore handshaking. That's pretty likely the case but it loses the very nice structure that was happening before, where the thread being studied is directly held, because it is interrogating itself. [Edit] This was wrong, since Main can service the secondary debuggers without having to own the entire connection. See below.

Answer 1a

Yes, Scene Debugger and Server Debugger(s) are hanging off of this and need to be main. Therefore, the RemoteDebugger always needs to run on main, even if the main thread is not the one being debugged and even if main is step executed or allowed to execute freely, due to the current debug settings. Instead of the thread that wants to initiate debugging grabbing the debugger connection, it needs to signal to main to do so on its behalf. [Edit] This was wrong because Main can be blocked due to debugging other threads, so may not be available. See below for a new approach using Streams.

Answer 1b

Presumably, debugging main can continue to work exactly as now.

Issue 2: Third Party Debugger Support

For example, how does VSCode debugging support work? Is it very indirectly, like remoting to the Editor, which then uses RemoteDebugger? Or do the VSCode extensions directly implement the RemoteDebugger protocol? If the latter, then the changes being made also strongly affect alternate implementations that I'd need to survey.

Answer 2

Faless confirms third party support is indirect via the Editor. Therefore, debugger protocol changes don't require third party rewrites and can be addressed by moderate adaptation inside the Editor.

@Faless
Copy link

Faless commented Jul 25, 2022

The existing debugger runs the remote debugger connection directly on the main thread. That trick means that anything interrogated by the debug code is conveniently paused and thread safe. Do other debugger things hang off of this? For example, does the Scene Debugger hang off of this? Is that what "captures" are?

I think there are a few misunderstanding here:

  • EngineDebugger is, well the engine debuggger, and can have different implementations. Notably, LocalDebugger and RemoteDebuggger.
  • The debugger can be extended by adding profilers and captures.
    • A profiler is composed of 3 callbacks (toggle/add/tick) and optionally user data. tick is called during each main iteration when the profile is active, toggle when the profiler is enabled/disabled, add is called to add data to profile.
    • A capture is just 1 callback (capture) and optionally user data. capture is used to parse messages and is called when receiving a debugger command (message) which start with the registered prefix:.
  • SceneDebugger is not another debugger implementation, it just adds a capture and a profiler to it to extend its functionalities.
  • The default connection used by RemoteDebugger (RemoteDebuggerPeerTCP) is handled in a separate thread.
    Outgoing messages are queued ( https://github.com/godotengine/godot/blob/master/core/debugger/remote_debugger_peer.cpp#L53-L61 ) and dequeued in the _poll function which is called by a thread ( https://github.com/godotengine/godot/blob/master/core/debugger/remote_debugger_peer.cpp#L193-L208 ). Similarly, incoming messages are queued in the thread, and dequeued during EngineDebugger::poll_events which is called every iteration, but can also be called in other circumstances (e.g. script languages might call it during script execution to make it easier to detect things like infinite loops).

As I mentioned in chat I don't think godotengine/godot#63267 is the correct approach. What I would expect is a wrapper that forwards calls to EngineDebugger after properly taking a global debugger lock.

The relevant functions to wrap are not many. At least:

  • poll_events.
  • send_message.
  • send_error.
  • debug (here the extra work on script languages is needed)

And likely:

  • line_poll
  • capture_parse
  • profiler_add_frame_data

This is probably a bare minimum to get a working version. Other functions might require a bit more work (e.g. the is_profiling check) but this should already gives you a workable version.

@Faless
Copy link

Faless commented Jul 25, 2022

About issue 2 which I forgot to address. Third party debuggers are supported using a DAP middleware which only runs in the editor and interfaces with the editor debugger (not the game debugger).
You can read more about the proposal here and the implementation here: PR1, PR2

@derammo
Copy link
Author

derammo commented Jul 25, 2022

@Faless thank you for your replies. I don't believe there is any misunderstandings. When I am talking about the RemoteDebugger, I just said that because it is the more non-trivial case. I am changing both the LocalDebugger and the RemoteDebugger to implement the notion of multiple threads being under debug. This isn't about making the existing version thread safe but extending it so you can actually view all stack frames in the Editor and jump around between threads, step executing them like a debugger needs to.

To do this, I implemented thread-local debugging contexts (moving all debugging stuff from all versions of ScriptLanguage into TLS) in the threads and made the GDScript VM aware of the fact it is executing in threads. That's because I am only focusing on ScriptDebugger cases initially. I am aware this means I have to do equivalent work for VisualScript and LanguageExtension also, but I don't care if those are a bit more limited initially. I would be happy to share the working copy of my code to clarify. It is a lot larger of an undertaking than you describe.

I am also aware what the SceneDebugger does. The reason I mention it is because interrogating the scene graph MUST be done on Main, because it will never be thread safe. So that sub-debugger (capture) really needs to stay on main, even if another thread is breakpointing. That in turn means RemoteDebugger/LocalDebugger main dispatching function that processes all the messages stays on main, which in turn means it is no longer the thread being debugged in all cases.

The previous implementation only ever debugged the main thread, so the main cycle of exchanging messages with the debugger (remote or local) was done on the same thread. This basically meant we got "step execution" and handshaking with the thread under test for free (because it is the same thread.) This changes if you want to step execute a worker thread. That's what the discussion about main thread use is about.

Thanks for the good news about Issue 2. That means I don't have to care. :)

@derammo
Copy link
Author

derammo commented Jul 25, 2022

As I mentioned in chat I don't think godotengine/godot#63267 is the correct approach. What I would expect is a wrapper that forwards calls to EngineDebugger after properly taking a global debugger lock.

That PR isn't really related to this work. It was a tiny modification just to get those messages across because right now they were being dumped. But if there is a quicker way to do that, that's great. It doesn't have much to do with this proposal though.

@derammo
Copy link
Author

derammo commented Jul 25, 2022

@Faless let me pop up to a higher level to try to clarify:

Are you describing an approach where each thread conceptually has its own debugger? Because they obviously each have their own stacks and run state so that either means you fan out into thread-local storage (which I did) or replicate the debugger in each thread.

I had considered doing that, but I felt that extending the existing debugger to multiple thread contexts was better, because breakpoints and globals are shared and some things can't even be accessed by other threads (Scene Debugger most prominently.)

@derammo
Copy link
Author

derammo commented Jul 25, 2022

This is probably a bare minimum to get a working version. Other functions might require a bit more work (e.g. the is_profiling check) but this should already gives you a workable version.

I don't doubt that it would work. I just don't think it gives you the ability to debug all the threads at the same time like gdb or Visual Studio. My current implementation runs and is able to debug threads, but it is enormous in terms of impact. So I kind of want to make it work completely to hopefully convince people this is good, even though it is huge already.

image

[edit] It is so huge because I decided to make ScriptLanguageThreadContext a new object that callers can directly work with, instead of redirecting every debugging call from the existing ScriptLanguage objects. So really it is a lot of internal refactoring to make the structure match what is actually going on. It's not like I changed how hundreds of things work. It would probably look a lot smaller if I had just taken this extra cost of the redirect and have uglier code, but I was trying to take the hit now and have better code long run.

@Faless
Copy link

Faless commented Jul 25, 2022

Are you describing an approach where each thread conceptually has its own debugger?

No, I am suggesting we keep the EngineDebugger a singleton, and wrap calls to be thread safe. The debugger will always run on the main thread and will exchange messages with the threads via a message queue.

I understand we need some changes in the break implementation to stop all threads and handle them, but that's mostly it when it comes to EngineDebugger/RemoteDebugger (bigger changes are needed to ScriptLanguage/etc).

But I'm probably missing something.

@derammo
Copy link
Author

derammo commented Jul 25, 2022

Now we are synchronized.

I WAS going to call RemoteDebugger from whatever thread breaks to keep it like it was now but since I learned about all the captures I realize now that is not possible. So a thread will notify break and the main thread has to enter the debugger even if main wasn’t the thread that errored or hit breakpoint.

I spent all my time on the ScriptLanguage stuff so far. The LanguageExtension case is neat because it will let whatever python it is :) allocate the per-thread context so it can export its stacks and such. Should be very cool in the medium term.

Sorry for the communication challenges. I might be using too much shorthand or poor phrasing. Also no drawings. If you want to invest some of your time, we could have a meeting and I give you a tour.

@derammo
Copy link
Author

derammo commented Jul 25, 2022

Oh maybe another important point: in this first version I am holding threads in ScriptDebugger so that I leave native engine threads alone because I didn’t want to mess with those yet at my current comfort level. So it is entirely Script (three flavors) centric

@derammo
Copy link
Author

derammo commented Jul 27, 2022

Update on RemoteDebugger / UI:

I have defined a new set of signals on the ScriptEditorDebugger that I think is ready for input. If you feel you would not know how to implement the handlers of these in a UI or if there is something wrong, please comment. Please note that "debug thread ID" is an opaque buffer, because it needs to be unique across all threads AND all script languages, so it is bigger than just Thread::ID. For extension languages, it would be whatever the external platform uses for IDs, even variable length strings packed in there. Either way, it is opaque.

The handlers for these messages will use some utility object to manage an LRU mapping from "true" IDs to smaller numbers to show in the UI (i.e. thread 1, 2, 3, ...) All API, however, needs to refer to the thread context by its full opaque name. Right now there is no support for interleaving stack frames from different languages, as before. When fetching a stack frame, only the language in which the thread is currently suspended is returned from these API calls. However, the Debugger UI can choose to show all the stack frames from all the languages (each only updated when the thread step executes in that language.) The resulting view will be consistent as of the last update from each language/thread but won't tell the user how the stacks interleave. Changing this would require different VMs to share current stack frame information, which is not the case.

	// Signals that potentially introduce new threads.
	ADD_SIGNAL(MethodInfo("thread_breaked",
		PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id"),
		PropertyInfo(Variant::BOOL, "is_main_thread"),
		PropertyInfo(Variant::STRING, "reason"),
		PropertyInfo(Variant::INT, "severity_code"),
		PropertyInfo(Variant::BOOL, "can_debug")));
	ADD_SIGNAL(MethodInfo("thread_paused", 
		PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id"), 
		PropertyInfo(Variant::BOOL, "is_main_thread")));
	ADD_SIGNAL(MethodInfo("thread_alert",
		PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id"),
		PropertyInfo(Variant::BOOL, "is_main_thread"),
		PropertyInfo(Variant::STRING, "reason"),
		PropertyInfo(Variant::INT, "severity_code"),
		PropertyInfo(Variant::BOOL, "can_debug"),
		PropertyInfo(Variant::BOOL, "has_stack_dump")));

	// Signals referring to threads already known.
	ADD_SIGNAL(MethodInfo("thread_continued", PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id")));
	ADD_SIGNAL(MethodInfo("thread_exited", PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id")));
	ADD_SIGNAL(MethodInfo("thread_stack_dump", PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id"), PropertyInfo(Variant::ARRAY, "stack_dump")));
	ADD_SIGNAL(MethodInfo("thread_stack_frame_vars", PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id"), PropertyInfo(Variant::INT, "num_vars")));
	ADD_SIGNAL(MethodInfo("thread_stack_frame_var", PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id"), PropertyInfo(Variant::ARRAY, "data")));
	ADD_SIGNAL(MethodInfo("thread_info",
		PropertyInfo(Variant::PACKED_BYTE_ARRAY, "debug_thread_id"),
		PropertyInfo(Variant::STRING, "language"),
		// Debug thread contexts that have the same value here are from the same thread, but for different language.
		// This is not guaranteed to be a real OS thread ID (even though it will be for core languages.)
		PropertyInfo(Variant::PACKED_BYTE_ARRAY, "thread_tag"),
		PropertyInfo(Variant::STRING, "name")));

Note: The reason for the is_main_thread in these messages is because it will be easier to handle these messages in the UI that way. The main thread is special in various ways, and Debugger UI features may display it differently and even use different settings for how to debug it. I decided the handlers should not have to wait for thread_info to figure out how to add the newly arrived thread to the UI (update trees, etc.)

Note: thread_alert means that a thread would have ended up thread_breaked but you are already debugging some other thread that is currently "focused." The idea is you may not want to even fetch the stack yet and just put a little red light on it or something like that. In any case, it is up to the Debugger UI to decide what happens when multiple threads want attention (breakpoint, error, etc.)

[Update: Implemented. See PR 64364 for status.]

Update on LocalDebugger:

I am pivoting now to work on LocalDebugger first, because that is the fastest path to being able to actually demo something and interact with the new debugger, without having to code the UI first.

[Update: Implemented. See PR 64364 for status.]

Update on DAP:

I don't have a running DAP client that works with 4.0. Any help appreciated.

The good news is that DAP supports threads, so it might be the fastest way to get a good threaded debugger UI.

[Update: VSCode plugin is nonfunctional with 4.0, so cannot proceed efficiently.]

Update on CSharpLanguage

[Update: Implemented. See PR 64364 for status.]

@derammo
Copy link
Author

derammo commented Jul 31, 2022

If you are thinking about reviewing this now, don't. I know it is wrong, and I will post a new design this week 10 hours later.

@derammo
Copy link
Author

derammo commented Aug 1, 2022

image

In this picture, you can see the Debugger suspended on a breakpoint in a thread function. If you look at the _ready() function, you will note the Main thread is blocked on wait_to_finish in this case. This proves the original design was flawed. It is not possible to debug only on Main and proxy for the other threads, because the Main thread in Godot is allowed to do other blocking things, including blocking on a debugged thread or a mutex held by such a thread.

Therefore, the only workable solution is to let threads run the core debug messages themselves, because the currently debugged thread is the only one that is guaranteed not to be otherwise blocked. This means that in such situations where the debugged thread is not Main, debug captures that can only process on Main will not respond if Main is currently blocked. That's just a reality of how the system works.

I designed around this by introducing Streams into the debugger protocol. Like in SCTP, messages are tagged with different stream identifiers and do not block other streams. In reality, there is currently just one TCP connection, so the stream IDs are just carried in the messages (piggy backed on the size field.) But we could consider running multiple connections when it makes sense for other protocols.

There will initially only be 2 streams: MAIN and OTHER.

Messages that are sent on MAIN are always serviced by the main thread. This includes all captures and plugin messages. If Main is blocked, then there won't be any reply for a while. This is ok, because the entire Debugger is designed to be async and it does not wait for responses.

OTHER carries messages to and from the thread(s) currently debugged. These are only the core debugging messages like next, continue, and access to the stack frame and variables. These can be sent from any thread, but are always serviced by the thread suspended on the debug() call so that we always have a runnable thread available. That way, the debugger does not freeze when Main is blocked. These must always work, so we can resume threads and get the system unstuck.

@derammo
Copy link
Author

derammo commented Aug 1, 2022

In terms of layers, the Stream concept is at the EngineDebuger / RemoteDebugger level. To clients, these streams appear like separate connections. At this layer, the meaning of messages is not known. If messages were known at this layer, there would not need to be explicit stream identifiers in the messages, but I decided it is better to be explicit. Also, multiple streams per connection is a pretty normal networking abstraction that can handle many things like QoS, head of line blocking, etc.

At the ScriptDebugger level, messages are dispatched to the correct streams based on whether they are core debug messages or not.

@derammo
Copy link
Author

derammo commented Aug 1, 2022

Also: There is currently no UI yet for showing the other threads, but all the plumbing is there to show all the stack frames from all the threads simultaneously and update them as they are temporarily resumed during step execution.

@derammo
Copy link
Author

derammo commented Aug 10, 2022

Added a third stream STREAM_DISCARDABLE for messages that will flood but can be discarded if the buffers fill up. This is currently just for servers:draw because this message is sent continuously and if the main thread is blocked, it will fill up the buffers and block all debugging messages.

@derammo
Copy link
Author

derammo commented Aug 10, 2022

Just found a description of Godot RPC with Channels. Looks like we should just be using that for the Remote Debugger protocol instead of all this ad hoc networking stuff. Why isn't this already the case?

@derammo
Copy link
Author

derammo commented Aug 10, 2022

I won't go and rewrite everything to share the RPC code, but I will rename my Streams into Channels to be less confusing. I come from a telecom background, where SCTP was the primary "channelized" protocol, so maybe everyone else uses Channel, especially if they are web people :)

@RedMser
Copy link

RedMser commented Aug 15, 2022

@derammo

VSCode plugin is nonfunctional with 4.0, so cannot proceed efficiently

You can try my WIP PR if you want to see some results (for example, variables are listed but don't update their value)... Needs more work to reach 3.x level of features again, but might help with testing?

@derammo
Copy link
Author

derammo commented Aug 16, 2022

@RedMser thanks for the pointer. I am currently working on ScriptLanguage...Extension. After that is done, only C# and DAP remain. I will definitely check in with you before doing anything on DAP.

@derammo
Copy link
Author

derammo commented Aug 17, 2022

Update: I wrote up the tricky threading issues that gave rise to the current design. I had a really hard time trying to explain them in chat or quick responses, so I wrote a short article and included it in the first comment here. (see top)

@derammo
Copy link
Author

derammo commented Aug 17, 2022

image

@AThousandShips
Copy link
Member

Threaded debugging was implemented in:

Closing as completed, if specific aspects of the current debugger are insufficient please open new dedicated issues or proposals

@AThousandShips AThousandShips added this to the 4.2 milestone Jan 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants