What if I want it to work on an existing codebase and not from scratch? #5

Open
hemangjoshi37a opened this issue Jul 12, 2024 · 12 comments

Comments

@hemangjoshi37a

I want it to work on my existing project, which has multiple code files and nested folders, with multimodality and with local models through tools like Ollama and LiteLLM.

@theyashwanthsai
Owner

Yup, that is the next step. Basically "build your own Intern". It will take some time to build the tools and logic, but it is definitely possible.

Right now Devyan is at an early stage, where we generate applications from text prompts.

@hemangjoshi37a
Author

Please let me know how I can help in this direction. For example, you can suggest an edit to a file, and I will make the change and open a PR.

@theyashwanthsai
Owner

theyashwanthsai commented Jul 12, 2024

I see.

I will let you know. I don't yet have even a rough idea of how to implement this, but I will tell you what can be done.

@hemangjoshi37a
Author

OK. Can we use graphrag (https://github.com/microsoft/graphrag) to generate context using vector search? Or how else can we feed initial (base) context into this system? If we can feed in base knowledge about the codebase at the initial stage, then we can achieve this.
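
To make the "feed base context" idea concrete, here is a minimal sketch of the retrieval step. It is a toy stand-in, not GraphRAG and not anything Devyan implements: it ranks files by bag-of-words cosine similarity instead of real embeddings, and the names `top_k_files` and `build_base_context` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Count lowercased identifier-like tokens in the text."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_files(files: dict[str, str], query: str, k: int = 2) -> list[str]:
    """Return the paths of the k files most similar to the query."""
    q = tokenize(query)
    ranked = sorted(files, key=lambda p: cosine(tokenize(files[p]), q), reverse=True)
    return ranked[:k]


def build_base_context(files: dict[str, str], query: str, k: int = 2) -> str:
    """Concatenate the best-matching files into one prompt-ready string."""
    return "\n\n".join(f"# file: {p}\n{files[p]}" for p in top_k_files(files, query, k))
```

Here `files` maps a path to that file's text, so nested folders are just longer keys. A real setup would likely swap `cosine` over token counts for an embedding model and a vector store.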

@theyashwanthsai
Owner

We should also build tools that can edit a file in place instead of rewriting it. Editing within a file will be tricky for the agent.

@hemangjoshi37a
Author

It is simple. We can use a prompt something like this:

You always respond in git diff format, with `-` and `+` at the beginning of each changed line and a few lines above and below for reference. Also give the line numbers where the edits are made.

Using this, we can parse the edits from the response, apply them to the existing code, and run git commit with a generated commit message.
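
A rough sketch of the parsing and applying step, assuming the model returns standard unified-diff hunks (`@@ -start,count +start,count @@` headers) for a single file. The function name `apply_diff` is hypothetical, and the sketch fails loudly instead of guessing when the diff does not match the file.

```python
import re

# Matches a hunk header and captures the 1-based start line in the old file.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")


def apply_diff(original: str, diff: str) -> str:
    """Apply unified-diff style hunks to `original` and return the new text.

    Raises ValueError when a context or removed line does not match the file,
    which is a signal to re-prompt the model rather than commit.
    """
    src = original.splitlines()
    out: list[str] = []
    pos = 0  # index of the next unconsumed line in src
    in_hunk = False
    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            start = int(header.group(1)) - 1
            if start < pos:
                raise ValueError("hunks overlap or are out of order")
            out.extend(src[pos:start])  # copy untouched lines before the hunk
            pos = start
            in_hunk = True
        elif not in_hunk:
            continue  # skip prose or file headers before the first hunk
        elif line.startswith("+"):
            out.append(line[1:])
        elif line.startswith(("-", " ")):
            if pos >= len(src) or src[pos] != line[1:]:
                raise ValueError(f"diff does not match file at line {pos + 1}")
            if line.startswith(" "):
                out.append(src[pos])  # context lines are kept
            pos += 1  # removed and context lines both consume a source line
    out.extend(src[pos:])  # copy everything after the last hunk
    return "\n".join(out) + "\n"
```

Once `apply_diff` succeeds and the result is written back to disk, the commit step could be a plain `git commit` call through `subprocess`. Models often get line numbers wrong, so checking the context lines before writing anything is the important part.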

@theyashwanthsai
Owner

theyashwanthsai commented Jul 12, 2024

I see. The problem with that approach is that we might hit token limits if a single task needs too much context. But we can definitely give it a try.

@theyashwanthsai
Owner

We can do some tricks with that approach. It works perfectly.

@hemangjoshi37a
Author

If we can figure out what cursor.sh is doing, then we can do this. In case you don't know Cursor, it is an AI coding IDE based on VS Code. It is very good to use, but its one big problem is that it is closed source, and I don't know how it accesses information across different files. Maybe it uses a sophisticated RAG system designed specifically for code.

@theyashwanthsai
Owner

Yup. There's also another approach, where we programmatically create code blocks (grouping snippets) and then perform the operations on those blocks. I'm not sure how well this idea would work in practice with respect to cost.
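
One way the block idea could look for Python files, as a sketch only: treat each top-level function or class as a block, then let the agent replace one block by name instead of rewriting the file. The names `split_into_blocks` and `replace_block` are hypothetical, and other languages would need their own parser.

```python
import ast


def split_into_blocks(source: str) -> dict[str, tuple[int, int]]:
    """Map each top-level function or class name to its (start, end) lines.

    Line numbers are 1-based and inclusive.
    """
    blocks = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Decorators start above the `def` line, so include them.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            blocks[node.name] = (start, node.end_lineno)
    return blocks


def replace_block(source: str, name: str, new_code: str) -> str:
    """Replace the named block with new_code, leaving the rest untouched."""
    start, end = split_into_blocks(source)[name]
    lines = source.splitlines()
    lines[start - 1:end] = new_code.rstrip("\n").splitlines()
    return "\n".join(lines) + "\n"
```

With this shape, the prompt only needs to carry the one block being changed, which keeps the cost per edit roughly proportional to the block size rather than the file size.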

@hemangjoshi37a
Author

Please review #6 for this.

@jojogh

jojogh commented Jul 13, 2024

> Yup. There's also another approach, where we programmatically create code blocks (grouping snippets) and then perform the operations on those blocks. I'm not sure how well this idea would work in practice with respect to cost.

Use an AST to find the dependencies within the codebase and present them to the LLM as context in the prompt, then focus on the code snippets you want to update or write. That may solve the long-context token problem.
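
A small sketch of that AST idea for a Python project, under some assumptions: `modules` maps dotted module names to source text, only absolute imports are followed, and dependencies are shown to the model as signatures only. The helper names are hypothetical.

```python
import ast


def local_imports(source: str, known_modules: set[str]) -> set[str]:
    """Return the project modules that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # relative imports are skipped in this sketch
        else:
            continue
        found.update(n for n in names if n in known_modules)
    return found


def dependency_graph(modules: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the project modules it imports."""
    known = set(modules)
    return {name: local_imports(src, known) for name, src in modules.items()}


def context_for(target: str, modules: dict[str, str]) -> str:
    """Full source of the target plus signatures of its direct dependencies."""
    parts = []
    for dep in sorted(dependency_graph(modules)[target]):
        tree = ast.parse(modules[dep])
        sigs = [
            f"def {n.name}({ast.unparse(n.args)}): ..."
            for n in tree.body
            if isinstance(n, ast.FunctionDef)
        ]
        parts.append(f"# module {dep} (signatures only)\n" + "\n".join(sigs))
    parts.append(f"# module {target} (full source)\n{modules[target]}")
    return "\n\n".join(parts)
```

Sending signatures instead of full bodies for dependencies is what keeps the prompt short. `ast.unparse` requires Python 3.9 or later.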

3 participants