Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server out of sync with file when editing imports on Windows #3787

Closed
fpvandoorn opened this issue Nov 24, 2023 · 4 comments · Fixed by #4821
Closed

server out of sync with file when editing imports on Windows #3787

fpvandoorn opened this issue Nov 24, 2023 · 4 comments · Fixed by #4821
Labels
bug Something isn't working

Comments

@fpvandoorn
Copy link
Contributor

fpvandoorn commented Nov 24, 2023

Description

On Windows, the following steps reliably cause the server to go out of sync with the file. This issue is not reproducible on Linux.

Context

Zulip thread

Steps to Reproduce

  1. Download mathlib and its cache (I can reproduce on v4.2.0, but presumably also every other version)
  2. Create a new file with the following contents, and wait until Lean has compiled the file
import Mathlib
#eval 1+1
  1. Delete the first line all at once (e.g. with ctrl+X). On 2 separate Windows computers this does not cause a recompilation of the file
  2. Type the letter a on the first line and wait until Lean shows an error
  3. Remove the letter a. The error remains. The error remains even if you start typing new commands (e.g. def foo := 1), and the server is interpreting as a letter a floating somewhere in your file. It disappears once you write import again.

I experimented with importing less than Mathlib, and below a certain number of imports, this behavior seems to go away. (e.g. if we replace import Mathlib with import Std, then step 3 does cause a recompilation of the file.

Impact

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.

@fpvandoorn fpvandoorn added the bug Something isn't working label Nov 24, 2023
@mhuisi mhuisi transferred this issue from leanprover/vscode-lean4 Mar 27, 2024
@Kha
Copy link
Member

Kha commented Apr 17, 2024

The relevant code has been almost completely rewritten on master, it would be good to check if 4.8.0-rc1 still exhibits the behavior when it lands

@mhuisi
Copy link
Contributor

mhuisi commented May 29, 2024

@fpvandoorn Do you still encounter this issue on 4.8.0?

@fpvandoorn
Copy link
Contributor Author

fpvandoorn commented May 29, 2024

Yes, both this issue and #3786 still happen for me. In fact, with the steps described in this issue, both these behaviors can happen. I'm testing this now on Mathlib 826a3e67f2 (from yesterday), Lean version 4.8.0-rc2.

Executing steps 1-3 can lead now to one of the following three situations:

  • The server doesn't recompile the file at all after step 3, and the behavior of steps 4-5 happens
  • The server starts recompiling, but never stops. The yellow bars stay for many seconds (issue server hangs when changing imports #3786)
  • Sometimes we get the intended behavior: the server recompiles the file for <0.1 second, and the file behaves completely normally afterwards.

These situations can all happen after performing these steps, even when trying to do exactly the same thing. If I take the file and repeatedly remove and add the import-line with uses of ctrl+X and ctrl+V, I can get all three outcomes on different iterations.

@mhuisi
Copy link
Contributor

mhuisi commented May 29, 2024

Ok, I can reproduce the outcome of step 5 on Windows as well. I haven't seen the server eternally recompiling yet, though.

github-merge-queue bot pushed a commit that referenced this issue Aug 7, 2024
This PR resolves two language server bugs that especially affect Windows
users:
1. Editing the header could result in the watchdog not correctly
restarting the file worker (#3786, #3787), which would lead to the file
seemingly being processed forever.
- The cause of this issue was a race condition in the watchdog that was
accidentally introduced as far back as #1884: In specific circumstances,
the watchdog will attempt forwarding a message to the file worker after
the process has exited due to a changed header, but before the file
worker exiting has been noticed by the watchdog (which will then restart
the file worker). In this case, the watchdog would mark the file worker
as having crashed and not look at its exit code to restart the file
worker, but instead treat it like a crashed file worker that will only
be restarted when editing the file again. Not inspecting the exit code
of the file worker when it crashed from forwarding a message from the
file worker is necessary since we do not restart the file worker until
another notification from the client arrives, and so we would read the
same crash exit code over and over again in the main loop of the
watchdog if we did not remove it from our list of file workers that we
listen to.
- This PR resolves this issue by distinguishing between "crashes when
forwarding messages to the file worker" and "crashes when forwarding
messages from the file worker". In the former case, we still inspect the
exit code of the file worker and potentially restart it if the imports
changed, whereas in the latter case, we stop inspecting the exit code of
the file worker. This is correct because the latter case is exactly the
one where we need to stop inspecting the exit code but where a crash
cannot occur as a result of a changed header, whereas the former case is
exactly the one where we still need to inspect the exit code after a
crash to ensure that we restart the file worker in case it exited
because the header changed.
- At some point in the future, it would be nice to revamp the
concurrency model of the watchdog entirely now that we have all those
fancy concurrency primitives that were not available four years ago when
the watchdog was first written.

2. On an especially slow Windows machine, we found that starting the
language server would sometimes not succeed at all because reading from
the stdin pipe in the watchdog produced an EINVAL error, which was in
turn caused by an NT "pipe empty" error.
- After lots of debugging, @Kha found that Lake accidentally passes its
stdin to Git because it does not explicitly set the `stdin` field to
`null` when spawning the process.
- Changing this fixes the issue, which suggests that Git may mutate the
pipe we pass to it to be non-blocking, which then causes a "pipe empty"
error in the watchdog when we also attempt to read from that same pipe.
- I'm still very uncertain why we only saw this issue on one
particularly slow machine and not across the whole eco system.

This PR also resolves an issue where we would not correctly emit
messages that we received while the file worker is being restarted to
the corresponding file worker after the restart.

Closes #3786, closes #3787.

---------

Co-authored-by: Sebastian Ullrich <[email protected]>
@Kha Kha closed this as completed in #4821 Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants