-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
linediff number-prefixed diff format #2174
base: main
Are you sure you want to change the base?
Conversation
Hi @dceluis,
Has this not been the case in your experimentation? |
Yer & no. by far the biggest problem i found was convincing the LLM to construct diffs that don't assume that patch hunks will be applied sequentially. this is straightforward to implement in software but LLMs get confused pretty quickly. my hypothesis was that a coding LLM would generate more correct patches if instructed to reference the original line numbers no matter what. it was surprisingly hard to convince 4o-mini to do this, though. since there must be conflicting references in the models' training database, https://github.com/google/diff-match-patch/wiki/Unidiff#3-rolling-context . it took a huge prompt but it worked and the logic for parsing the patches is arguably much simpler. stronger models have less issues, so you might try those and get better results. so I think the docs are mostly true but could be updated, this is why I made this PR too, to contribute my findings. & still use linediff on a day to day basis, although not through aider :) https://x.com/dceluis/status/1854601543963525576?t=OuuEDHsjo9Bsr10LBft3og&s=19 |
Do you mean that the LLM would always assume that changes would be applied sequentially, so that later changes would have line numbers that don't match the original file?
Yes I would've thought so too.. but if it seems to want to account for prior changes, what if you just let it do that? Does it do it accurately? |
I wonder if a tool-based approach using line numbers would work well... I read how it's generally less effective due to having to do JSON escaping of the code content, but I think it only reduced the performance by a small amount. Could be worth trying |
Also, regarding your diff format:
Isn't it kind of redundant to require both line numbers AND content to match? Wouldn't it be best to have just one or the other? Could just use a number range and not repeat the original content. Or does LLM struggle with this? |
Yes, it tends to do that. And this also hurst its ability to produce code, since a wrong line could be interpreted as a source or destination line.
Not very accurately, hence the Aider doc's notice.
I wouldn't discard it but my prior is that since there isn't much source code in the training database represented as JSON the models would have more difficulties producing quality code. |
for what I could see in my tests having both the line numbers AND the source line helps the models reason. more context is good. it's the same reason why i set the destination lines to NOT have line numbers so that the models do not get confused. I predict that with better models you would need less of these prompting tricks |
But if it is putting in the exact source line content then the numbers aren't even required at all right? (Assuming the content is unique in the file). Then it basically just becomes like the existing diff method used by aider. Have you tried an approach of having the LLM only provide the line numbers to replace? I'm tempted to try it out |
I have, although I wish i had documented more of what did and didn't work. IIRC it performs a bit worse because printing out the number + code turns the source-referencing into a recitation exercise: the simplest rule I could enforce to reduce confusion. But don't refrain from running the benchmark yourself, with any tweaks you think will improve results. I'm most interested in learning what you find! |
Hi there,
Would a new diff format be of interest to the project?
I wanted to write a simple format that would allow me to simplify the line-matching logic. It took me a while to get it to work reliably on gpt-4o-mini (it's what I could extensively test with). More advanced models should be able to comply without such a huge system prompt.
Pros:
Cons:
You can also have a look at https://github.com/dceluis/ln-diff for a deeper look into the decisions behind the format.
Benchmark for context:
I understand the implementation is nowhere near mergeable but I figured I might show it anyways for easier consideration.
Cheers!