Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

various race conditions in applying old edits to updated documents #19

Open
wrongu opened this issue Jun 8, 2023 · 8 comments
Open
Labels
bug Something isn't working git good first issue Good for newcomers

Comments

@wrongu
Copy link
Contributor

wrongu commented Jun 8, 2023

current trigger/edit lifecycle is

  1. git pull
  2. create AI triggers
  3. git pull (to "debounce" / check for recent human edits and potentially cancel the triggers)
  4. run AI
  5. push changes

but the extra pull in step 3 can invalidate the triggers in step 2

should be able to just remove the extra pull

@wrongu wrongu added bug Something isn't working good first issue Good for newcomers git labels Jun 8, 2023
@j6k4m8
Copy link
Member

j6k4m8 commented Jun 8, 2023

need to pull RIGHT BEFORE we push to minimize merge errors (esp with overleaf since it rewrites git history all the time, but also generally true of all git)...

@wrongu
Copy link
Contributor Author

wrongu commented Jun 8, 2023

I understand wanting to minimize merge errors, but that's also a race condition!

Imagine:

  1. pull
  2. trigger an edit
  3. check for debouncing --> ok! no edits in the last few moments
  4. simultaneously (LLM starts working) and (user deletes a line earlier in the paper)
  5. pull again and edit lines

now the lines being edited in step 5 != the same lines that were triggered in step 2

basically we're rediscovering the old problem that "simultaneous editing is hard" and this is why OT was invented. And one of the core ideas of OT is that every edit needs to be relative to some timestamped/hashed version of the document. If you try to change the hash/timestamp of an edit before/after the edit is done, you're gonna have a bad time.

Right now we're using git merge as a hacky version of OT. git already does timestamping/versioning for us using commits. We just need to lean into that for now, and follow the rule that 1 edit = 1 commit

@j6k4m8
Copy link
Member

j6k4m8 commented Jun 8, 2023 via email

@wrongu
Copy link
Contributor Author

wrongu commented Jun 8, 2023

In that case we really need our own mini-OT system. This would be a function with a signature like

def apply_ot_using_git(commit1, edit1, commit2) -> edit2:
    ....

And the simplest version of this function would be just to update the line numbers in the edit to account for any changes that happened between commit1 and commit2 OR raise an error if commit1->commit2 involved changes to the same lines

@wrongu
Copy link
Contributor Author

wrongu commented Jun 8, 2023

...follow up thought: as long as we're happy to lean heavily on git, and we're happy to just error-out when it fails, the ot function here can be a git rebase. We would apply edit1 to commit1, commit it as commit3, and then rebase commit3 onto commit2

@wrongu
Copy link
Contributor Author

wrongu commented Jun 9, 2023

...follow up Part II: refactoring the trigger lifecycle further, we want to allow 1 Trigger to give rise to any number of Edits. So I think we may want PaperRemote to support perform_batch_of_edits() not just perform_edit(), which is another reason where we want our own OT, but this one might be ugly if done using git 😢

@j6k4m8
Copy link
Member

j6k4m8 commented Jun 9, 2023

sigh —

EditResponse should probably have some signature like,

class EditResponse:

    def as_diff(self, repo):
        ...
        
    def as_ot(self, original_text):
        ...
        
    def as_totally_new_string(self, original_text):
        ...

so that we can apply an edit to any doc type....

or maybe EditResponse is super simple in structure but each paperremote needs to know how to apply it themselves.... have NO idea what that API would look like.

@wrongu
Copy link
Contributor Author

wrongu commented Jun 9, 2023

See #21 . I made EditResult a data class to parallel EditTrigger. I figure we want the Trigger/Result semantics to be independent of a particular remote, but we can hold the remotes to a high-ish standard where they must provide some basic OT

@wrongu wrongu changed the title race condition in debouncing various race conditions in applying old edits to updated documents Jun 12, 2023
wrongu added a commit that referenced this issue Jun 22, 2023
* add test fixture for OverleafGitPaperRemote
    * testing that git can resolve non-conflicting edits using merge (partially addressing #19)
    * testing that we recover gracefully from merge conflicts and other kinds of bad edits
* better git diff handling to test for recent human edits
* handling edits' revision_id in OverleafGitPaperRemote. Snazzy `with paper.rewind(commit_id)` syntax
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working git good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants