Disable code_interpreter in tool-calling agent #407

Open

subramen opened this issue Nov 8, 2024 · 2 comments

subramen (Contributor) commented Nov 8, 2024

System Info

..

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

I am setting up a search agent exactly as shown here: https://github.com/meta-llama/llama-stack-apps/blob/7c92eb274924b38b110ca1759dd487817980e5af/examples/agents/client.py#L38

Despite no instructions to write or execute code, the agent automatically invokes code_interpreter and errors out with AssertionError: Tool code_interpreter not found. This appears to happen when the assistant response contains any code.

How do I explicitly disable code_interpreter?
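For context, the relevant part of my setup looks roughly like the snippet below (a sketch based on the linked example, not a verbatim copy; the dict-style tool definition, field names, model name, and base URL follow the llama-stack-client API of that era and may differ slightly). The point is that only brave_search is registered and code_interpreter is never listed:

import os

from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types.agent_create_params import AgentConfig

# Base URL and model name are illustrative placeholders
client = LlamaStackClient(base_url="http://localhost:5000")

# Only the search tool is registered; code_interpreter is never listed
agent_config = AgentConfig(
    model="Llama3.1-8B-Instruct",
    instructions="You are a helpful assistant",
    tools=[
        {
            "type": "brave_search",
            "engine": "brave",
            "api_key": os.getenv("BRAVE_SEARCH_API_KEY"),
        }
    ],
    tool_choice="auto",
    enable_session_persistence=False,
)
agent = Agent(client, agent_config)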

Error logs

role='user' content='Query: What methods are best for finetuning llama?\n\nSpecialist answers:Based on the provided context, it appears that finetuning LLaMA is not directly mentioned in the code snippets. However, I can infer that finetuning LLaMA is likely to be performed using the `llama_recipes.finetuning` module.\n\nIn the `finetuning.py` file, the `main` function is imported from `llama_recipes.finetuning`, which suggests that this file contains the code for finetuning LLaMA.\n\nTo finetun...<more>...Guard.\n\nAs for finetuning Llama in general, it seems that the provided context only provides information on finetuning Llama Guard, which is a specific application of the Llama model. For general finetuning of Llama, you may need to refer to the official documentation or other external resources.\n\nHowever, based on the provided context, it seems that the `finetune_vision_model.md` file in the `quickstart` folder may provide some information on finetuning Llama for vision tasks.' context=None


role='assistant' content='' stop_reason=<StopReason.end_of_turn: 'end_of_turn'> tool_calls=[ToolCall(call_id='83e9d075-144b-40bc-9319-ca073ad562f9', tool_name=<BuiltinTool.code_interpreter: 'code_interpreter'>, arguments={'code': 'Based on the provided context, it appears that finetuning LLaMA is not directly mentioned in the code snippets. However, I can infer that finetuning LLaMA is likely to be performed using the `llama_recipes.finetuning` module.\n\nTo finetune LLaMA, you would likely need to use the `train_config` from `llama_recipes.configs.training` module, which is imported in the `configs` module.\n\nHere are the possible methods for finetuning LLaMA:\n\n1. Using the `train_config` from `llama_recipes.configs.training` module:\n    ```python\nfrom llama_recipes.configs.training import train_config\n\n# Load the train config from a YAML file\nconfig = load_config(config_path="./config.yaml")\n\n# Update the config with the finetuning configuration\nconfig.update(train_config)\n\n# Use the updated config to finetune LLaMA\nmain(config)\n```\n\n2. Using the `lora_config` from `llama_recipes.configs.peft` module:\n    ```python\nfrom llama_recipes.configs.peft import lora_config\n\n# Load the lora config from a YAML file\nconfig = load_config(config_path="./config.yaml")\n\n# Update the config with the lora configuration\nconfig.update(lora_config)\n\n# Use the updated config to finetune LLaMA\nmain(config)\n```\n\n3. Using the `fsdp_config` from `llama_recipes.configs.fsdp` module:\n    ```python\nfrom llama_recipes.configs.fsdp import fsdp_config\n\n# Load the fsdp config from a YAML file\nconfig = load_config(config_path="./config.yaml")\n\n# Update the config with the fsdp configuration\nconfig.update(fsdp_config)\n\n# Use the updated config to finetune LLaMA\nmain(config)\n```\n\n4. Using the `wandb_config` from `llama_recipes.configs.wandb` module:\n    ```python\nfrom llama_recipes.configs.wandb import wandb_config\n\n# Load the wandb config from a YAML file\nconfig = load_config(config_path="./config.yaml")\n\n# Update the config with the wandb configuration\nconfig.update(wandb_config)\n\n# Use the updated config to finetune LLaMA\nmain(config)\n```\n\n5. Using the `quantization_config` from `llama_recipes.configs.quantization` module:\n    ```python\nfrom llama_recipes.configs.quantization import quantization_config\n\n# Load the quantization config from a YAML file\nconfig = load_config(config_path="./config.yaml")\n\n# Update the config with the quantization configuration\nconfig.update(quantization_config)\n\n# Use the updated config to finetune LLaMA\nmain(config)\n```\n\nNote that these are just possible methods and may require additional configuration and setup. The actual finetuning process may involve more steps and parameters, and may require additional libraries and dependencies.\n\nHowever, based on the provided context, it seems that the `finetune_vision_model.md` file in the `quickstart` folder may provide some information on finetuning LLaMA for vision tasks.'})]



Traceback (most recent call last):
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 206, in sse_generator
    async for item in await event_gen:
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run
    async for res in self._run(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 560, in _run
    result_messages = await execute_tool_call_maybe(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 824, in execute_tool_call_maybe
    assert name in tools_dict, f"Tool {name} not found"
AssertionError: Tool code_interpreter not found

Expected behavior

Don't call code_interpreter; just use the search tool.

aidando73 (Contributor) commented:

I've been able to reproduce: https://github.com/aidando73/llama-stack-apps/pull/3/files#diff-1ebfaf6cb3592166b73835fa82333cb7109e7c624865c0039a7b22ff34aa27fa

Traceback (most recent call last):
  File "/Users/aidand/dev/llama-stack/llama_stack/distribution/server/server.py", line 158, in sse_generator
    async for item in event_gen:
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 153, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 250, in run
    async for res in self._run(
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 568, in _run
    result_messages = await execute_tool_call_maybe(
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 833, in execute_tool_call_maybe
    assert name in tools_dict, f"Tool {name} not found"
AssertionError: Tool code_interpreter not found
[INFO] role='assistant' content='' stop_reason=<StopReason.end_of_turn: 'end_of_turn'> tool_calls=[ToolCall(call_id='effb0bb7-ebb2-4baf-8a3a-941c99dc0cca', tool_name=<BuiltinTool.code_interpreter: 'code_interpreter'>, arguments={'code': 'import os\nfrom llama_index import LLaMAIndex\nfrom llama_index.finer_tuning import fine_tune_orma\n\n\n# Initialize the LLaMA index\nllama_index = LLaMAIndex()\n\n\ndef main():\n    # Load the pre-trained Llama model\n    model_name = "large"\n    model_path = os.path.join(llama_index.model_dir, f"{model_name}.pth")\n\n    # Fine-tune the loaded model on a custom dataset\n    fine_tune_orma(\n        model_path=model_path,\n        train_data_path="path/to/custom/train/data",\n        eval_data_path="path/to/custom/eval/data",\n        batch_size=32,\n        num_epochs=5,\n    )\n\n\nif __name__ == "__main__":\n    main()'})]

Let me look into it

aidando73 (Contributor) commented Dec 15, 2024

tl;dr: The model is hallucinating and we don't check whether the client passed in the `code_interpreter` tool.

The inference request the model gets is correct:

tools=[ToolDefinition(tool_name=<BuiltinTool.brave_search: 'brave_search'>, description=None, parameters=None)]

It doesn't include code_interpreter.

But the raw message I get back is:

<|python_tag|>import os
from llama_index import llama_recipes
from llama_index.finetuning import main


def fine_tune_lamaguard():
    # Define the path to the finetuning configuration file
...

The <|python_tag|> is interpreted as a tool_call and defaults to code_interpreter even when it's not specified in the request.

# Every parsed tool_call is streamed back as a successful tool-call delta;
# there is no check that the tool is among the agent's configured tools.
for tool_call in message.tool_calls:
    yield ChatCompletionResponseStreamChunk(
        event=ChatCompletionResponseEvent(
            event_type=ChatCompletionResponseEventType.progress,
            delta=ToolCallDelta(
                content=tool_call,
                parse_status=ToolCallParseStatus.success,
            ),
            stop_reason=stop_reason,
        )
    )

I've submitted a PR so the agent doesn't call the tool if we haven't enabled it: #637
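(For illustration only, this is not the code in that PR: the idea is roughly a guard like the following, so a hallucinated tool call is skipped instead of tripping the assertion. The function name and its placement are hypothetical.)

# Hypothetical sketch of the guard: look up the parsed tool call against the
# tools the client actually enabled, and skip execution if the model
# hallucinated one (e.g. a bare <|python_tag|> mapped to code_interpreter
# that was never registered).
def resolve_tool_call(tool_call, tools_dict):
    name = tool_call.tool_name
    if name not in tools_dict:
        # Tool was never enabled for this agent; don't execute, don't assert.
        return None
    return tools_dict[name]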

As for your example @subramen - out of curiosity, your user prompt seems to include extra content (some assistant responses as well, it looks like?). What's stopping you from doing something like this?

System: You are a helpful assistant. If you don't know the answer, use the brave_search tool to search the web.

User: Query: What methods are best for finetuning llama?

This prompt doesn't trigger the code_interpreter tool call:

inference> brave_search.call(query="llama finetuning methods")
tool_execution> Tool:brave_search Response:{"query": "llama finetuning methods", "top_k": [{"title": "Fine-Tuning LLaMA 2: A Step-by-Step Guide to Customizing the Large Language Model | DataCamp", "url": "https://www.datacamp.com/tutorial/fine-tuning-llama-2", "description": "Learn how to fine-tune <strong>Llama</strong>-2 using new techniques to overcome memory and computing limitations to make open-source large language models more accessible", "type": "search_result"}, {"title": "Fine-Tuning Llama in practices | by Luc Nguyen", "url": "https://medium.com/@lucnguyen_61589/fine-tuning-llama-in-practices-bc7f3feb1ac4", "description": "If you\u2019re new to this concept ... models. In a previous blog post, I
...
inference> The final answer to the user's question "What methods are best for finetuning llama?" is:

There are several methods for fine-tuning LLaMA, including full fine-tuning and parameter-efficient fine-tuning (PEFT)
...
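In code, that maps onto the agent's instructions plus a single user turn, roughly like this (a sketch only; the session name is made up, the system prompt would go in AgentConfig's instructions field, and create_session/create_turn are the llama-stack-client Agent helpers as I understand them):

# Hypothetical usage sketch: "agent" is an Agent configured with the
# system prompt above as its instructions and only the search tool enabled.
session_id = agent.create_session("search-only-session")

response_stream = agent.create_turn(
    messages=[
        {
            "role": "user",
            "content": "Query: What methods are best for finetuning llama?",
        }
    ],
    session_id=session_id,
)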

wdyt?
