Help needed with chat engine(condense or context) HELP!!! #12996

pathucr7 · 2024-04-21T12:11:03Z

pathucr7
Apr 21, 2024

Question
I am trying to create a RAG chatbot that only answers from the content that i provide through pdf and never uses its own knowledge to answer or add information to the answer that is outside of the context. At this moment i am facing three issues:-

the answer length is quite small
the follow up question that it creates is totally of the mark
model overload issues
As i am creating a RAG chatbot for e-book summarization the answers are meant to be lengthy. I have used HuggingFace for embeddings as well as for the LLM. There was very less docs to refer to when using CondenseQuestionChatEngine so add to learn for myself. I thought i had solved it then i realize it was running on default llm which is openai, and after solving that issue all the prompts that i created for the Condensequestion started to fail, now if i run the question in query engine and print it i get the right answers. but the chat engine when trying to make sense using the chat history it completely goes of the grid and ask the wrong question all together and way it is setup the bot will revert with i don't know.
now i tried to switch to context chat engine just to compare the results its giving me an error:-
NotImplementedError("Messages passed in must be of odd length.")
here is my code for reference:-
`
import streamlit as st
from llama_index.core import Settings
import logging
import sys
import os.path

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.core.service_context import set_global_service_context

from llama_index.llms.llama_cpp import LlamaCPP
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings
from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
from llama_cpp import Llama

from langchain.llms import HuggingFaceHub
from llama_index.core.prompts.chat_prompts import ChatPromptTemplate, ChatMessage,MessageRole
from llama_index.core.chat_engine import CondenseQuestionChatEngine, ContextChatEngine

from llama_index.legacy.prompts import ChatPromptTemplate
from llama_index.core.base.llms.types import ChatMessage, MessageRole
try:
from llama_index import VectorStoreIndex, ServiceContext, Document, SimpleDirectoryReader, StorageContext, load_index_from_storage
except ImportError:
from llama_index.core import VectorStoreIndex, ServiceContext, Document, SimpleDirectoryReader, StorageContext, load_index_from_storage

from llama_index.llms.huggingface import (
HuggingFaceInferenceAPI,
HuggingFaceLLM,
)
from llama_index.llms.llama_cpp.llama_utils import (
messages_to_prompt,
completion_to_prompt,
)
from huggingface_hub import login

login("MY_HUGGINGFACE_API_KEY")

from transformers import AutoTokenizer

st.set_page_config(page_title="Chat with the Streamlit docs, powered by LlamaIndex", page_icon="🦙", layout="centered", initial_sidebar_state="auto", menu_items=None)

st.title("Chat with the Streamlit docs, powered by LlamaIndex 💬🦙")
st.info("Check out the full tutorial to build this app in our blog post", icon="📃")

from llama_index.core import PromptTemplate

prompt_template = """### System: Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Only return the helpful answer below and nothing else.
Helpful answer:
"""

if "messages" not in st.session_state.keys(): # Initialize the chat messages history
st.session_state.messages = [
{"role": "assistant", "content": "Ask me a question about the E-books!"}
]

PERSIST_DIR = "./storage"

@st.cache_resource(show_spinner=False)
def load_data():
with st.spinner(text="Loading and indexing the E-books – hang tight! This should take 1-2 minutes."):

reader = SimpleDirectoryReader(input_dir="./data", recursive=True)

docs = reader.load_data()

llm = HuggingFaceInferenceAPI(
generate_kwargs={"temperature": 0.0},
model_name="meta-llama/Llama-2-70b-chat-hf",
)
model_name = "BAAI/bge-large-en"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embed_model = HuggingFaceBgeEmbeddings(
model_name=model_name,
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs
)
service_context=ServiceContext.from_defaults(
chunk_size=1000,
chunk_overlap=100,
embed_model=embed_model,
llm=llm
)
set_global_service_context(service_context)
if not os.path.exists(PERSIST_DIR):
reader = SimpleDirectoryReader(input_dir="./data", recursive=True)
Settings.llm = llm
Settings.embed_model = embed_model
docs = reader.load_data()
index = VectorStoreIndex.from_documents(documents=docs, service_context=service_context)
index.storage_context.persist(persist_dir=PERSIST_DIR)
st.write("LoadEmbedding>>>", index)
return index
else:

load the existing index

Settings.llm = llm
Settings.embed_model = embed_model
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)
st.write("StoredEmbedding>>>", index)
return index

index = load_data()

def generate_text(prompt):
question = ("tell me a story with a lesson?")

qa_prompt_str = (
"Context information is below.\n"
"---------------------\n"
"{context_str}\n"
"---------------------\n"
"Given only the context information and not prior knowledge, "
"answer the question: {query_str}\n"
)

refine_prompt_str = (
"We have the opportunity to refine the original answer "
"(only if needed) with some more context below.\n"
"------------\n"
"{context_msg}\n"
"------------\n"
"Given the new context, refine the original answer to better "
"answer the question: {query_str}. "
"If the context isn't useful, output the original answer again.\n"
"Original Answer: {existing_answer}"
)

chat_text_qa_msgs = [
ChatMessage(
role=MessageRole.SYSTEM,
content=(
prompt_template
),
),
ChatMessage(
role=MessageRole.USER,
content=(
qa_prompt_str
),
),
]
text_qa_template = ChatPromptTemplate(chat_text_qa_msgs)

Refine Prompt
chat_refine_msgs = [
ChatMessage(
role=MessageRole.SYSTEM,
content=(
"If the context isn't helpful, just say I don't know. Don't any add informtion into the answer that is not available in the context"
),
),
ChatMessage(
role=MessageRole.USER,
content=(
"New Context: {context_msg}\n"
"Query: {query_str}\n"
"Original Answer: {existing_answer}\n"
"New Answer: "
),
),
]
refine_template = ChatPromptTemplate(chat_refine_msgs)

custom_prompt = PromptTemplate(
"""
Given a conversation (between Human and Assistant) and a follow up message from Human,
rewrite the message to be a standalone question that captures all relevant context
from the conversation.

{chat_history} {question} """ )
list of ChatMessage objects
custom_chat_history = [
ChatMessage(
role=MessageRole.USER,
content="Hello assistant, we are having an insightful discussion about the given content and you are helping me understand the content by answering or summerizing and explaining me the content without changing its true meaning.",
),
ChatMessage(role=MessageRole.ASSISTANT, content="Okay, sounds good."),
]

if "chat_engine" not in st.session_state.keys(): # Initialize the chat engine

query_engine=(
index.as_query_engine(text_qa_template=text_qa_template, refine_template=refine_template, llm=Settings.llm)

)
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
    chat_history=custom_chat_history,
    verbose=True,
)
st.session_state.chat_engine = chat_engine

if prompt := st.chat_input("Your question"): # Prompt for user input and save to chat history
st.session_state.messages.append({"role": "user", "content": prompt})

for message in st.session_state.messages: # Display the prior chat messages
with st.chat_message(message["role"]):
st.write(message["content"])

If last message is not from assistant, generate a new response
if st.session_state.messages[-1]["role"] != "assistant":
with st.chat_message("assistant"):
with st.spinner("Thinking..."):
response = st.session_state.chat_engine.chat(prompt)

response = st.session_state.chat_engine.stream_chat(prompt,chat_engine.chat_history)

st.write(response.response)
message = {"role": "assistant", "content": response.response}
st.session_state.messages.append(message) # Add response to message history`

I am now sharing some of the examples:-

here i asked about Mumbai and a story knowledge about both of them is given by me in the form of pdf.
image

as you can see it answered both the 1 and 2 question correctly. But the for the 3rd question it was querring a wrong question all together. It was querring with this:-
image

and i don't know why it is spamming the queries as if they are in some kind of a loop.

I know there are lot of flaws in this code but please understand i am new to AI/ML so pls kindly help!!!
I think the project is near completion just few adjustment is needed just confused of what should i do your help would be much appreciated.

@dosu · 2024-04-21T12:11:27Z

dosubot[bot]
bot Apr 21, 2024

To address the issues you're encountering with your RAG chatbot, let's tackle them one by one:

Answer Length: To increase the length of the answers, adjust the max_new_tokens parameter in the HuggingFaceInferenceAPI. This controls how much text the model generates.
Off-Mark Follow-up Questions: Ensure the chat history is accurately maintained for context relevance. The CondenseQuestionChatEngine relies on a well-formatted chat history to generate relevant follow-up questions.
Model Overload: Use asynchronous methods provided by CondenseQuestionChatEngine to manage load efficiently. Implementing callbacks for monitoring can also help in avoiding overload.
ContextChatEngine Error: The "NotImplementedError: Messages passed in must be of odd length." error indicates an issue with the number of messages passed. The engine expects an odd number of messages, starting with a user message, to alternate properly between user and assistant messages.
Looping Queries: This might be due to a repetitive loop in the response generation logic. Reviewing and adjusting the logic in the CondenseQuestionChatEngine should help prevent such looping behavior.

By addressing these specific areas, you should be able to improve the performance and accuracy of your chatbot.

Details

Sources

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

2 replies

pathucr7 Apr 21, 2024
Author

@dosu can you give a working example of context chat engine and also the storing of messages

dosubot[bot] bot Apr 21, 2024

To effectively use the ContextChatEngine and manage the odd length message requirement, ensure that before passing messages to the engine, you check if the total number of messages is odd. If it's even, consider adding a system message to meet this requirement. This step is crucial to avoid the "NotImplementedError" you encountered.

For managing load and maintaining context relevance, especially to prevent errors related to message length, implement a strategy to manage the chat history efficiently. This involves limiting the number of messages stored and periodically pruning older messages. If a single message exceeds the token limit, consider splitting it into smaller parts to ensure each is within the limit.

Here's a simplified approach to manage the chat history for context relevance:

def get(self) -> List[ChatMessage]:
    """Get chat history, ensuring each message part is within the token limit."""
    message_count = len(self.chat_history)
    message_str = " ".join([str(m.content) for m in self.chat_history[-message_count:]])
    token_count = len(self.tokenizer_fn(message_str))

    while token_count > self.token_limit and message_count > 1:
        message_count -= 1
        message_str = " ".join([str(m.content) for m in self.chat_history[-message_count:]])
        token_count = len(self.tokenizer_fn(message_str))

    if token_count > self.token_limit:
        # Split the message into smaller parts if it exceeds the token limit
        message_parts = self.chat_history[-message_count].content.split(' ')
        split_messages = []
        for i in range(0, len(message_parts), self.token_limit):
            split_message = ChatMessage(content=' '.join(message_parts[i:i+self.token_limit]))
            split_messages.append(split_message)
        return split_messages

    return self.chat_history[-message_count:]

This code snippet helps in managing the chat history effectively by ensuring that each message part is within the token limit, thus preventing errors related to message length.

By following these guidelines, you should be able to address the issues with the ContextChatEngine and improve the performance and accuracy of your chatbot.

Details

Sources

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

pathucr7 · 2024-04-21T13:48:15Z

pathucr7
Apr 21, 2024
Author

@dosu can you write the whole code from the beginning like from storing on index and not using openai. and can you also code it with streamlit

1 reply

dosubot[bot] bot Apr 21, 2024

To create a RAG chatbot for e-book summarization with Streamlit, you'll need to index your e-books, set up a custom language model to avoid using OpenAI, and manage chat history effectively. Here's a streamlined approach:

Indexing E-books: Use a custom EbookReader to load and index your e-books. This reader should parse the e-books and prepare them for indexing.
Custom Language Model: Implement MyCustomLLM, a custom language model class, to generate responses. This model should be designed to handle the specifics of your e-book content and ensure responses are generated from the provided PDFs without external knowledge.
Streamlit Integration: Use Streamlit to create an interactive chat interface. Initialize your chat history and manage it within the Streamlit session state to ensure continuity in the conversation.
Handling Chat History: Implement logic to manage chat history, ensuring messages are within the token limit of your model. This might involve splitting messages or pruning older messages to maintain performance.
Generating Responses: Use your custom language model within a chat engine to generate responses to user queries. Ensure the chat engine is capable of handling context from the chat history to improve the relevance of follow-up questions.

Here's a basic code structure to get you started:

import streamlit as st
from my_custom_llm import MyCustomLLM  # Your custom LLM implementation
from ebook_reader import EbookReader  # Your custom e-book reader

# Load and index e-books
ebook_path = "path/to/ebook"
ebook_reader = EbookReader(ebook_path)
documents = ebook_reader.load_documents()

# Initialize your custom LLM
custom_llm = MyCustomLLM(model="custom-model-name", temperature=0.5)

# Streamlit UI setup
st.title("E-book Summarization Chatbot")

# Chat history management
if "messages" not in st.session_state:
    st.session_state.messages = []

# Input from user
user_input = st.text_input("Ask a question about the e-book:")

# Generate and display response
if user_input:
    response = custom_llm.generate_response(user_input, documents)
    st.session_state.messages.append({"user": user_input, "assistant": response})
    for message in st.session_state.messages:
        st.write(f"{message['user']}: {message['assistant']}")

This example assumes the existence of MyCustomLLM and EbookReader, which you'll need to implement based on your requirements. The focus is on creating a chatbot that leverages your e-book content for generating responses, integrated with a user-friendly Streamlit interface.

Details

Sources

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

pathucr7 · 2024-04-26T06:08:42Z

pathucr7
Apr 26, 2024
Author

@dosu can you give me a simple example of context chat engine using huggingface interface API without it throwing a error of message should be of odd length

6 replies

pathucr7 Apr 26, 2024
Author

@dosu cant you write the whole code in detail while using context chat engine from defaults

dosubot[bot] bot Apr 26, 2024

To create a detailed example using the ContextChatEngine with the HuggingFace interface API in a Streamlit application for e-book summarization, you'll need to focus on a few key components: initializing the chat engine, managing chat history, and integrating with Streamlit for user interaction. Here's a step-by-step guide:

Initialize the HuggingFace LLM: This involves setting up your language model with the necessary parameters, such as model name and API token.
Implement a Custom Retriever for E-book Content: Create a class that extends BaseRetriever to fetch relevant content from your e-books based on user queries.
Set Up the CondensePlusContextChatEngine: This engine will manage the conversation flow, ensuring that messages maintain an odd length to avoid errors. It integrates your custom retriever and the HuggingFace LLM.
Streamlit Application: Use Streamlit to build a user-friendly interface for your chatbot. This includes handling user input and displaying the chatbot's responses.

Here's a simplified code structure to illustrate these steps:

import streamlit as st
from llama_index.core.chat_engine.condense_plus_context import CondensePlusContextChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.llms.huggingface import HuggingFaceLLM
from llama_index.core.indices.base_retriever import BaseRetriever

# Initialize your HuggingFace LLM
llm = HuggingFaceLLM(model_name="your-model-name", api_token="your-huggingface-api-token")

# Custom retriever for e-book content
class EbookRetriever(BaseRetriever):
    def retrieve(self, message: str):
        # Retrieval logic for e-book content
        pass

retriever = EbookRetriever()

# Initialize the chat engine
chat_engine = CondensePlusContextChatEngine.from_defaults(retriever=retriever, llm=llm)

# Streamlit UI setup
st.title("E-book Summarizer")
user_input = st.text_input("Ask a question about the e-book:")
if user_input:
    response = chat_engine.chat(user_input)
    st.write(response.response)

This example provides a foundation for building a chatbot that leverages e-book content to generate responses. Remember to adapt the LLM initialization for HuggingFace integration and implement the EbookRetriever for your specific e-book content. The CondensePlusContextChatEngine will handle the odd length message requirement automatically, ensuring smooth conversation flow.

Details

Sources

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

pathucr7 Apr 26, 2024
Author

@dosu the above code is showing this error NotImplementedError: Messages passed in must be of odd length

dosubot[bot] bot Apr 26, 2024

To address the NotImplementedError: Messages passed in must be of odd length error you're encountering with the ContextChatEngine, you can ensure an odd number of messages in your chat history before passing it to the engine. This is crucial because the engine expects an odd number of messages to properly alternate between user and assistant messages, starting with a user message.

Here's a quick fix you can apply:

def ensure_odd_length_messages(chat_history):
    if len(chat_history) % 2 == 0:
        # Append a system message to make the length odd
        chat_history.append(ChatMessage(content="System placeholder message", role="system"))
    return chat_history

# Use this function before invoking the chat method
chat_history = ensure_odd_length_messages(chat_history)
response = context_chat_engine.chat("Your message here", chat_history)

This function checks the length of chat_history and appends a placeholder system message if the number of messages is even, ensuring the length is odd before you call the chat engine. Adjust the placeholder message as needed for your application.

Implementing this solution should help you overcome the error and ensure smoother operation of your chatbot.

Details

Sources

llama-index-legacy/llama_index/legacy/chat_engine/context.py

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

anurav088 May 21, 2024

This doesn't work either, NotImplementedError: Didn't handle when messages aren't ordered in alternating pairs of (<MessageRole.USER: 'user'>, <MessageRole.ASSISTANT: 'assistant'>).

pathucr7 · 2024-04-30T12:56:14Z

pathucr7
Apr 30, 2024
Author

@logan-markewich @Thomas-AH-Heller pls help just getting the same error of message should be in odd length when the same code is working for condense question chat engine but i want to use the context chat engine as i want the chatbot to only answer from the context. Please help I read all the documents even that is not helping i kindly help even a update document that shows an example for local LLM or HuggingfaceInterface API would work wonders for me PLS PLS PLS help

0 replies

1ly1mk · 2024-05-26T20:27:24Z

1ly1mk
May 26, 2024

This works fine with
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
But when I change chat_mode to "react" or "condense_plus_context" , I have issues same as above.

llama_index\llms\huggingface\base.py", line 450, in chat_messages_to_conversational_kwargs
raise NotImplementedError(
NotImplementedError: Didn't handle when messages aren't ordered in alternating pairs of (<MessageRole.USER: 'user'>, <MessageRole.ASSISTANT: 'assistant'>).

Seems to be issue with validation put in huggingface\base.py. I am using huggingfaceInferenceAPI as llm.
Any fix would be great, happy to provide more details. Thank you.

0 replies

corps3d · 2024-06-10T09:25:37Z

corps3d
Jun 10, 2024

I have had a similar issue and I think the issue might be the use of HuggingFaceInferenceAPI. My theory is that the API expects a single query to get the response from the LLM but when using chat engine it sends multiple queries at once which is causing the error.
When using condense_question it sums up all the history into a single question, which is then sent through the API and gets the response.

Again its my own speculations, I dont know for sure.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Help needed with chat engine(condense or context) HELP!!! #12996

{{title}}

Replies: 6 comments 9 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Details

{{title}}

{{title}}

Details

{{title}}

{{title}}

Details

{{title}}

{{title}}

{{title}}

Details

{{title}}

{{title}}

Details

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Help needed with chat engine(condense or context) HELP!!! #12996

pathucr7 Apr 21, 2024

reader = SimpleDirectoryReader(input_dir="./data", recursive=True)

docs = reader.load_data()

load the existing index

response = st.session_state.chat_engine.stream_chat(prompt,chat_engine.chat_history)

Replies: 6 comments · 9 replies

dosubot[bot] bot Apr 21, 2024

Details

pathucr7 Apr 21, 2024 Author

dosubot[bot] bot Apr 21, 2024

Details

pathucr7 Apr 21, 2024 Author

dosubot[bot] bot Apr 21, 2024

Details

pathucr7 Apr 26, 2024 Author

pathucr7 Apr 26, 2024 Author

dosubot[bot] bot Apr 26, 2024

Details

pathucr7 Apr 26, 2024 Author

dosubot[bot] bot Apr 26, 2024

Details

anurav088 May 21, 2024

pathucr7 Apr 30, 2024 Author

1ly1mk May 26, 2024

corps3d Jun 10, 2024

pathucr7
Apr 21, 2024

Replies: 6 comments 9 replies

dosubot[bot]
bot Apr 21, 2024

pathucr7 Apr 21, 2024
Author

pathucr7
Apr 21, 2024
Author

pathucr7
Apr 26, 2024
Author

pathucr7 Apr 26, 2024
Author

pathucr7 Apr 26, 2024
Author

pathucr7
Apr 30, 2024
Author

1ly1mk
May 26, 2024

corps3d
Jun 10, 2024