Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

capture error in worker thread #4342

Merged
merged 9 commits into from
Jul 2, 2024

Conversation

soulomoon
Copy link
Collaborator

@soulomoon soulomoon commented Jun 28, 2024

Fix #4340
let the caller handle the error happened during awaitRunInThread. In such case the HLS won't be stuck if the session setup fails and the LSP should be able to show the error to the client as other error cases happening in the rule system.
cc @fendor

@soulomoon soulomoon requested a review from wz1000 as a code owner June 28, 2024 09:46
@soulomoon soulomoon requested a review from fendor June 28, 2024 09:46
@soulomoon soulomoon added the performance Issues about memory consumption, responsiveness, etc. label Jun 28, 2024
@michaelpj
Copy link
Collaborator

Okay, so what are the cases here?

  1. The action is maybe expected to throw exceptions. Then probably the action supplied by the user should include exception-handling as appropriate.
  2. The action is not expected to throw exceptions, or anyway the user has not prepared for that. Then something has gone wrong, and probably it's okay to just let everything die?

So the case we need to handle is 2... but I think we have exactly the tool for that in the form of Async.link. We just need to link the "main thread" to the "worker thread", so that if the worker thread dies the exception is rethrown in the main thread. So I think all we need is just a call to link on line 37.

@soulomoon
Copy link
Collaborator Author

soulomoon commented Jun 28, 2024

Session loading belong to a part of the evaluation of Rule GhcSession on per file bases.
If #4340 happen for one file means it would happen for all files, @michaelpj 's suggestion to let everything dies seems to be a good way to go.
I worried handling in 2 is a bit too aggressive for just one file failed case ?

Maybe @fendor knows which cases are likely to happen ?

@michaelpj
Copy link
Collaborator

Actually I think I'm wrong and you were right: an unhandled exception can either go to the thread that launched the worker queue, or the thread that put the action into the queue. Probably the second option actually makes most sense, since it means that "run this action directly" and "run this action via the queue" behave the same in terms of where the exception goes.

I wonder if there's a nice way to implement that behaviour using some of the existing concurrency pieces we have. This sort of thing can be very tricky, so I'd feel more comfortable if we were leveraging async or something somehow.

@michaelpj
Copy link
Collaborator

To be honest, I continue to think about my comment here: #4256 (review)

A lot of this would be simpler if we were using locks!

  • Consumers who want to block until they get the result can just wait until they get the lock and then do the thing
  • Consumers who don't want to wait can fork a thread that waits for the lock

In both cases the exception handling is pretty straightforward and under the control of the consumer.

Haskell MVars are fair, so if we use a MVar-based lock we also should get things being scheduled reasonably. GHC does them in FIFO fashion, so we would even get the same queueing behaviour, but arguably this is a bit less good because it doesn't give us quite the same sequencing guarantees (does that matter?).

@michaelpj
Copy link
Collaborator

So basically everywhere we have a ThreadQueue we would instead just have a Lock.

@soulomoon
Copy link
Collaborator Author

soulomoon commented Jun 28, 2024

I wonder if there's a nice way to implement that behaviour using some of the existing concurrency pieces we have.

Currently I do not find an existing api for such mechanism. Since the caller are different rule evaluation threads. The scoping is nasty here, so the link function might not be in our favor.

@soulomoon
Copy link
Collaborator Author

soulomoon commented Jun 28, 2024

So basically everywhere we have a ThreadQueue we would instead just have a Lock.

Lock solve the race problem. However, it can not solve the cancel problem.
We have two main cancel points for them.
The session restart and the shutdown.
We want the shutdown to be able to cancel them as fast as it can.
But we do not want to session restart to cancel them, for session loading, cancel in the middle would result in invalid state.

@michaelpj
Copy link
Collaborator

Don't the restarts cancel the Actions which trigger the job enqueuing? If those can cancel their spawned actions then there's maybe not a problem there?

@soulomoon
Copy link
Collaborator Author

soulomoon commented Jun 28, 2024

no, unless we can rewrite session loader like other rules and update the STM variables all togather in the hls-graph compute post hooker. Which is a rather hard work to do. Maybe we can add this into todo list.

Copy link
Collaborator

@fendor fendor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good now, thanks!

@soulomoon soulomoon merged commit f0ba40b into master Jul 2, 2024
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues about memory consumption, responsiveness, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HLS gets stuck if session setup fails for any reason
3 participants