Llama code review looping #825 (Open)

aidando73 wants to merge 1 commit into meta-llama:main from aidando73:aidand-llama-code-review-demo
`.gitignore` (new file):

```
sandbox/
```
`README.md` (new file, 75 lines):
## Llamas in Code Review

<video name="llama-code-review-loop" src="https://github.com/user-attachments/assets/f717889a-e517-4380-a07b-9657319dd189" controls></video>
In this example, we have two agents:

- **Code Author:** Writes the code.
- **Code Reviewer:** Reviews the code and provides constructive feedback.
Together, they engage in multiple review iterations, improving the code over time.

This demo demonstrates tool calls, structured outputs, and looping with Llama.
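At a high level, each cycle follows this shape (a simplified sketch; the helper names are illustrative stand-ins rather than functions from this PR, and the real implementation is in `app.py` below):

```python
# Simplified shape of the loop implemented in app.py (illustrative, not runnable as-is).
review_feedback = None
for _ in range(CODE_REVIEW_CYCLES):
    plan = coder_creates_plan(review_feedback)       # structured output: JSON plan of steps
    for step in plan["steps"]:
        coder_executes_step(step)                    # tool calls: create/update/delete files
    review_feedback = reviewer_reviews_codebase()    # free-form feedback fed into the next cycle
```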
## Setup

### Prerequisites

- Python 3.10+
- Docker

### Running the demo

We'll be using the Fireworks llama-stack distribution to run this example, but you can use most other llama-stack distributions (instructions [here](https://llama-stack.readthedocs.io/en/latest/distributions/index.html)). Note that not all distributions support structured outputs yet (e.g., Ollama).
```bash
# You can get this from https://fireworks.ai/account/api-keys - they give out initial free credits
export FIREWORKS_API_KEY=<your-api-key>

# This runs the llama-stack server
export LLAMA_STACK_PORT=5000
docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-fireworks \
  --port $LLAMA_STACK_PORT \
  --env FIREWORKS_API_KEY=$FIREWORKS_API_KEY
```
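Optionally, you can sanity-check that the server is reachable before running the demo. A minimal sketch, assuming the `llama-stack-client` package from the steps below is already installed:

```python
# Quick connectivity check - assumes the llama-stack container above is running.
import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=f"http://localhost:{os.environ.get('LLAMA_STACK_PORT', '5000')}")
print([m.identifier for m in client.models.list()])  # models served by this distribution
```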
Then, to run the app:

```bash
# cd to this directory
cd recipes/use_cases/coding/llamas-in-code-review

# Create a virtual environment (use your preferred method)
python -m venv .venv
source .venv/bin/activate

# Install llama-stack-client
pip install llama-stack-client

# Run the demo
export LLAMA_STACK_PORT=5000
python app.py
```
The agents will then start writing code in the `./sandbox` directory.
### Configuration

You can customize the application's behavior by adjusting parameters in `app.py`:

```python
# The aim of the program
PROGRAM_OBJECTIVE = "a web server that has an API endpoint that translates text from English to French."

# Number of code review cycles
CODE_REVIEW_CYCLES = 5

# The model to use.
# 3.1 405B works best; 3.3 70B works really well too; smaller models are hit and miss.
MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"
```
`app.py` (new file, 193 lines):

```python
import os
from llama_stack_client import LlamaStackClient
from tools import SANDBOX_DIR, TOOLS, run_tool
import json

PROGRAM_OBJECTIVE = "a web server that has an API endpoint that translates text from English to French."

# Number of code review cycles
CODE_REVIEW_CYCLES = 5

# Works:
MODEL_ID = "meta-llama/Llama-3.3-70B-Instruct"
```

> Inline review comment: Note that 3.3-70B for fireworks is not yet released - but will be in the next release: meta-llama/llama-stack#654

```python
# MODEL_ID = "meta-llama/Llama-3.1-405B-Instruct-FP8"
# MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # Works okay

# Note: Smaller models don't work very well in this example. But feel free to try them out.
# MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"
# MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"

CODER_AGENT_SYSTEM_PROMPT = f"""
You are a software engineer who is writing code to build a python codebase: {PROGRAM_OBJECTIVE}.
"""

REVIEWER_AGENT_SYSTEM_PROMPT = f"""
You are a senior software engineer who is reviewing the codebase that was created by another software engineer.
The program is {PROGRAM_OBJECTIVE}.
If you think the codebase is good enough to ship, please say LGTM.
"""

# Effectively no limit on output tokens
MAX_TOKENS = 200_000
def get_codebase_contents():
    contents = ""
    for root, dirs, files in os.walk(SANDBOX_DIR):
        for file in files:
            # concatenate the file name
            contents += f"file: {file}:\n"
            with open(os.path.join(root, file), "r") as f:
                contents += f.read()
            contents += "\n\n"
    return contents


BLUE = "\033[94m"
MAGENTA = "\033[95m"
GREEN = "\033[92m"
RESET = "\033[0m"

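# Note: Llama 3.2/3.3 models use the python_list tool prompt format; Llama 3.1 models use json.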
if "3.2" in MODEL_ID or "3.3" in MODEL_ID:
    tool_prompt_format = "python_list"
else:
    tool_prompt_format = "json"

client = LlamaStackClient(base_url=f"http://localhost:{os.environ['LLAMA_STACK_PORT']}")

review_feedback = None
for i in range(1, CODE_REVIEW_CYCLES + 1):
    print(f"{BLUE}Coder Agent - Creating Plan - Iteration {i}{RESET}")
    if review_feedback:
        prompt_feedback = f"""
        One of your peers has provided the following feedback:
        {review_feedback}
        Please adjust the plan to address the feedback.
        """
    else:
        prompt_feedback = ""

    prompt = f"""
    Create a step by step plan to complete the task of creating a codebase that will {PROGRAM_OBJECTIVE}.
    You have 3 different operations you can perform. You can create a file, update a file, or delete a file.
    Limit your step by step plan to only these operations per step.
    Don't create more than 10 steps.

    Here is the codebase currently:
    {get_codebase_contents()}

    {prompt_feedback}
    Please ensure there's a README.md file in the root of the codebase that describes the codebase and how to run it.
    Please ensure there's a requirements.txt file in the root of the codebase that describes the dependencies of the codebase.
    """
    response = client.inference.chat_completion(
        model_id=MODEL_ID,
        messages=[
            {"role": "system", "content": CODER_AGENT_SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        sampling_params={
            "max_tokens": MAX_TOKENS,
        },
        response_format={
            "type": "json_schema",
            "json_schema": {
                "$schema": "http://json-schema.org/draft-07/schema#",
                "title": "Plan",
                "description": f"A plan to complete the task of creating a codebase that will {PROGRAM_OBJECTIVE}.",
                "type": "object",
                "properties": {
                    "steps": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        }
                    }
                },
                "required": ["steps"],
                "additionalProperties": False,
            }
        },
        stream=True,
    )

    content = ""
    for chunk in response:
        if chunk.event.delta:
            print(chunk.event.delta, end="", flush=True)
            content += chunk.event.delta
    try:
        plan = json.loads(content)
    except Exception as e:
        print(f"Error parsing plan into JSON: {e}")
        plan = {"steps": []}
    print("\n")
    # Coding agent executes the plan
    print(f"{BLUE}Coder Agent - Executing Plan - Iteration {i}{RESET}")
    if review_feedback:
        prompt_feedback = f"""
        Keep in mind that a senior engineer has provided the following feedback:
        {review_feedback}
        """
    else:
        prompt_feedback = ""

    for step in plan["steps"]:
        prompt = f"""
        You have 3 different operations you can perform. create_file(path, content), update_file(path, content), delete_file(path).
        Here is the codebase:
        {get_codebase_contents()}
        Please perform the following operation: {step}

        {prompt_feedback}
        Please don't create incomplete files.
        """
        try:
            response = client.inference.chat_completion(
                model_id=MODEL_ID,
                messages=[
                    {"role": "system", "content": CODER_AGENT_SYSTEM_PROMPT},
                    {"role": "user", "content": prompt},
                ],
                sampling_params={
                    "max_tokens": MAX_TOKENS,
                },
                tools=TOOLS,
                tool_prompt_format=tool_prompt_format,
            )
        except Exception as e:
            print(f"Error running tool - skipping: {str(e)[:50]}...")
            continue
        message = response.completion_message
        if message.content:
            print("Didn't get tool call - got message: ", message.content[:50] + "...")
        else:
            tool_call = message.tool_calls[0]
            run_tool(tool_call)
    print("\n")
    print(f"{MAGENTA}Reviewer Agent - Reviewing Codebase - Iteration {i}{RESET}")
    response = client.inference.chat_completion(
        model_id=MODEL_ID,
        messages=[
            {"role": "system", "content": REVIEWER_AGENT_SYSTEM_PROMPT},
            {"role": "user", "content": f"""
            Here is the full codebase:
            {get_codebase_contents()}
            Please review the codebase and make sure it is correct.
            Please provide a list of changes you would like to make to the codebase.
            """},
        ],
        sampling_params={
            "max_tokens": MAX_TOKENS,
        },
        stream=True,
    )
    review_feedback = ""
    for chunk in response:
        if chunk.event.delta:
            print(chunk.event.delta, end="", flush=True)
            review_feedback += chunk.event.delta
    print("\n")
```
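`app.py` imports `SANDBOX_DIR`, `TOOLS`, and `run_tool` from a `tools.py` that is not shown in this excerpt of the diff. For orientation only, here is a hypothetical sketch of what it might contain, inferred from the operations named in the prompts; the tool-spec shape and every implementation detail below are assumptions, not the PR's actual code:

```python
# Hypothetical sketch of tools.py - inferred from app.py's usage, not taken from the PR.
import os

SANDBOX_DIR = "sandbox"  # assumed location; matches the sandbox/ entry in .gitignore


def _spec(name, description, params):
    # Assumed tool-spec shape for client.inference.chat_completion(tools=...).
    return {"tool_name": name, "description": description, "parameters": params}


_PATH = {"param_type": "string", "description": "Path relative to the sandbox", "required": True}
_CONTENT = {"param_type": "string", "description": "Full file contents", "required": True}

TOOLS = [
    _spec("create_file", "Create a file with the given content.", {"path": _PATH, "content": _CONTENT}),
    _spec("update_file", "Replace a file's content.", {"path": _PATH, "content": _CONTENT}),
    _spec("delete_file", "Delete a file.", {"path": _PATH}),
]


def run_tool(tool_call):
    """Apply a model-issued tool call as a filesystem operation inside the sandbox."""
    args = tool_call.arguments
    path = os.path.join(SANDBOX_DIR, args["path"])
    if tool_call.tool_name in ("create_file", "update_file"):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(args.get("content", ""))
    elif tool_call.tool_name == "delete_file" and os.path.exists(path):
        os.remove(path)
    print(f"{tool_call.tool_name}({args.get('path')})")
```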
> Review comment: PR incoming for Ollama though: meta-llama/llama-stack#680