Ollama tool calls not working via openai proxy only when using langgraph #1153

Open

StreetLamb opened this issue Jul 27, 2024 · 3 comments

@StreetLamb
Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangGraph/LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangGraph/LangChain rather than my code.
  • I am sure this is better as an issue rather than a GitHub discussion, since this is a LangGraph bug and not a design question.

Example Code

from typing import Literal

from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.checkpoint import MemorySaver
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode
import asyncio


@tool
def search(query: str):
    """Call to surf the web."""
    if "sf" in query.lower() or "san francisco" in query.lower():
        return "It's 60 degrees and foggy."
    return "It's 90 degrees and sunny."


tools = [search]

tool_node = ToolNode(tools)

model = ChatOpenAI(
    model="llama3.1", base_url="http://localhost:11434/v1", temperature=0
).bind_tools(tools)


def should_continue(state: MessagesState) -> Literal["tools", END]:
    messages = state["messages"]
    last_message = messages[-1]
    if last_message.tool_calls:
        return "tools"
    return END


async def call_model(state: MessagesState, config):
    messages = state["messages"]
    response = await model.ainvoke(messages, config)
    return {"messages": [response]}


workflow = StateGraph(MessagesState)

workflow.add_node("agent", call_model)
workflow.add_node("tools", tool_node)

workflow.set_entry_point("agent")

workflow.add_conditional_edges(
    "agent",
    should_continue,
)

workflow.add_edge("tools", "agent")

checkpointer = MemorySaver()

app = workflow.compile(checkpointer=checkpointer)

async def test():
    async for event in app.astream_events(
        {"messages": [HumanMessage(content="what is the weather in sf")]},
        version="v1",
        config={"configurable": {"thread_id": 42}},
    ):
        print(event)


asyncio.run(test())

Error Message and Stack Trace (if applicable)

No response

Description

I want to invoke a tool-calling-compatible Ollama model through the ChatOpenAI proxy. However, with the code above, the model does not emit a proper tool call:

{'event': 'on_chain_end', 'data': {'output': {'messages': [HumanMessage(content='what is the weather in sf', id='f7017ae4-b2d0-49e3-b939-69738686368b'), AIMessage(content='{"name": "search", "parameters": {"query": "sf weather"}}', response_metadata={'finish_reason': 'stop', 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama'}, id='run-6a214185-27ba-4505-9cc2-574f20d04909')]}}, 'run_id': 'b2aeba64-38b0-447a-b7de-eefff49e3555', 'name': 'LangGraph', 'tags': [], 'metadata': {'thread_id': 43}, 'parent_ids': []}
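
To make the failure mode concrete: the tool-call JSON comes back as plain text in the message content, and tool_calls stays empty, so should_continue routes straight to END. A minimal check of the final state (a sketch, assuming the compiled graph's aget_state works against the MemorySaver-backed thread used above) looks like this:

# Sketch: inspect the last message stored for this thread after the run above.
# Via the OpenAI proxy + astream_events, `tool_calls` is empty and the
# tool-call JSON sits in `content` as plain text.
async def inspect():
    snapshot = await app.aget_state({"configurable": {"thread_id": 42}})
    last = snapshot.values["messages"][-1]
    print("tool_calls:", last.tool_calls)  # [] instead of a parsed tool call
    print("content:", last.content)        # '{"name": "search", "parameters": {"query": "sf weather"}}'

asyncio.run(inspect())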

However, the behaviour is different when using just langchain:

import asyncio

import openai
from langchain_core.messages import HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def search(query: str):
    """Call to surf the web."""
    if "sf" in query.lower() or "san francisco" in query.lower():
        return "It's 60 degrees and foggy."
    return "It's 90 degrees and sunny."


model = ChatOpenAI(model="llama3.1", base_url="http://localhost:11434/v1", temperature=0)

model_with_tools = model.bind_tools([search])

async def test():
    prompt = ChatPromptTemplate.from_messages(
        [
            MessagesPlaceholder(variable_name="messages"),
        ]
    )
    chain = prompt | model_with_tools
    response = await chain.ainvoke(
        {"messages": [HumanMessage(content="What is the weather like in sf")]}
    )
    return response


response = asyncio.run(test())
print(response)

This way, the model correctly issues a tool call:

content='' additional_kwargs={'tool_calls': [{'id': 'call_xncx3ycn', 'function': {'arguments': '{"query":"sf weather"}', 'name': 'search'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 148, 'total_tokens': 165}, 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama', 'finish_reason': 'stop', 'logprobs': None} id='run-63a9efd2-6619-448b-9a89-476f45cfb5c8-0' tool_calls=[{'name': 'search', 'args': {'query': 'sf weather'}, 'id': 'call_xncx3ycn', 'type': 'tool_call'}] usage_metadata={'input_tokens': 148, 'output_tokens': 17, 'total_tokens': 165}
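
As a quick aside (a sketch, not part of the original report), the structured tool_calls on that response can be executed directly with the bound tool, which is the same round trip ToolNode performs inside the graph:

# Sketch: run the tool call the model emitted and wrap the result as a
# ToolMessage, mirroring what ToolNode does for the graph's "tools" node.
from langchain_core.messages import ToolMessage

tool_messages = []
for tc in response.tool_calls:
    result = search.invoke(tc["args"])  # -> "It's 60 degrees and foggy."
    tool_messages.append(ToolMessage(content=result, tool_call_id=tc["id"]))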

System Info

langchain==0.2.7
langchain-anthropic==0.1.20
langchain-cohere==0.1.5
langchain-community==0.2.7
langchain-core==0.2.21
langchain-google-genai==1.0.5
langchain-ollama==0.1.0
langchain-openai==0.1.17
langchain-qdrant==0.1.1
langchain-text-splitters==0.2.0
langchain-weaviate==0.0.1.post1

platform: mac silicon
python version: Python 3.12.2

@hinthornw
Contributor

hinthornw commented Jul 27, 2024

Oh interesting. How many times did you run both versions? Is it reliably different in both contexts? And then you've confirmed the ollama versions are the same in both scenarios?

@StreetLamb
Author

Hi @hinthornw, I ran them multiple times (and once more just now, just to be sure) and the behaviour is consistent. Both were tested on Ollama v0.3.0.

@StreetLamb
Author

It seems to be caused by the astream_events method. Using langgraph with astream works:

...
async def test():
    async for event in app.astream(
        {"messages": [HumanMessage(content="what is the weather in sf")]},
        config={"configurable": {"thread_id": 42}},
    ):
        print(event)


asyncio.run(test())
{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_rs3ykbgl', 'function': {'arguments': '{"query":"sf weather"}', 'name': 'search'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 147, 'total_tokens': 164}, 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama', 'finish_reason': 'stop', 'logprobs': None}, id='run-3175464b-50ce-4fa7-afc7-73fe14cf92ee-0', tool_calls=[{'name': 'search', 'args': {'query': 'sf weather'}, 'id': 'call_rs3ykbgl', 'type': 'tool_call'}], usage_metadata={'input_tokens': 147, 'output_tokens': 17, 'total_tokens': 164})]}}
{'tools': {'messages': [ToolMessage(content="It's 60 degrees and foggy.", name='search', tool_call_id='call_rs3ykbgl')]}}
{'agent': {'messages': [AIMessage(content='Based on the tool call response, I can format an answer to your original question:\n\nThe current weather in San Francisco (SF) is 60 degrees with fog.', response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 86, 'total_tokens': 120}, 'model_name': 'llama3.1', 'system_fingerprint': 'fp_ollama', 'finish_reason': 'stop', 'logprobs': None}, id='run-0c048f94-63ae-45d1-9808-4083dd65ec0d-0', usage_metadata={'input_tokens': 86, 'output_tokens': 34, 'total_tokens': 120})]}}
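
If the difference comes down to streamed versus non-streamed requests against the proxy (astream_events streams the chat model's output, while the plain graph run above does not), calling the bound model directly both ways should confirm it. A sketch, assuming the same llama3.1 model behind Ollama's OpenAI-compatible endpoint, not a fix:

# Sketch: call the bound model once non-streamed and once streamed to check
# whether tool calls survive the streaming path of the OpenAI-compatible proxy.
import asyncio

from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def search(query: str):
    """Call to surf the web."""
    return "It's 60 degrees and foggy."


model = ChatOpenAI(
    model="llama3.1", base_url="http://localhost:11434/v1", temperature=0
).bind_tools([search])


async def compare():
    msgs = [HumanMessage(content="what is the weather in sf")]

    non_streamed = await model.ainvoke(msgs)
    print("ainvoke tool_calls:", non_streamed.tool_calls)

    streamed = None
    async for chunk in model.astream(msgs):
        streamed = chunk if streamed is None else streamed + chunk
    print("astream tool_calls:", streamed.tool_calls)


asyncio.run(compare())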
