I am trying to evaluate my RAG system by using the "evaluate" method
Standalone code OR list down the steps to reproduce the issue
1. I first created the testset using this script:
```python
import giskard
from dotenv import load_dotenv
from giskard.rag import generate_testset, KnowledgeBase
import pandas as pd

# Loads relevant API keys
load_dotenv()

print('setting llm and embedding models')
giskard.llm.set_llm_model("mistral/mistral-large-latest")
giskard.llm.set_embedding_model("mistral/mistral-embed")

print('Reading data')
df = pd.read_csv("test_data/DataTest.ChunksTest.csv")
# Drop chunks longer than 1024 characters
df = df.drop(df[df['page_content'].str.len() > 1024].index)
df['page_content'] = df['page_content'].astype(str)
knowledge_base = KnowledgeBase.from_pandas(df, columns=['page_content','metadata.source'])

print('Creating test set')
testset = generate_testset(
    knowledge_base,
    num_questions=10,
    language='fr',
)

print('Saving test set')
testset.save("my_testset_10.jsonl")
```
2. Then I try to run the evaluation like this:
```python
from giskard.rag import evaluate, AgentAnswer, QATestset, KnowledgeBase
import pandas as pd

# Imports specific to my use case
from Betty import BettyBot
from ChatModel import ChatModel
from dotenv import load_dotenv

# Loads relevant API keys
load_dotenv()

chat_model = ChatModel()

df = pd.read_csv("test_data/DataTest.ChunksTest.csv")
df = df.drop(df[df['page_content'].str.len() > 1024].index)
knowledge_base = KnowledgeBase.from_pandas(df, columns=['page_content','metadata.source'])

loaded_testset = QATestset.load("my_testset_10.jsonl")

# Wrap your RAG model
def get_answer_fn(question: str, history=None):
    """A function representing your RAG agent."""
    messages = history if history else []
    messages.append({"role": "user", "content": question})
    # function which retrieved document from index in the form of a list of dicts with 'page_content' key
    documents_dicts = search_documents(question)
    documents = concat_documents_for_prompting(documents_dicts)
    # string answer of the model
    answer = chat_model.question_model(context=documents, query=question)
    # List of string document contents
    docs_content = [doc['page_content'] for doc in documents_dicts]
    return AgentAnswer(
        message=answer,
        documents=docs_content
    )

# Run the evaluation and get a report
# I am purposely avoiding using ragas metrics because they have a bug that is in their latest version
report = evaluate(get_answer_fn, testset=loaded_testset, knowledge_base=knowledge_base)
report.to_html("rag_eval_report.html")
```
I get an error at the time of calculating the mean of the metric (done internally by evaluate)
### Relevant log output
```shell
Asking questions to the agent: 100%|██████████████████████████████████████████████████████████| 10/10 [01:31<00:00, 9.16s/it]
CorrectnessMetric evaluation: 100%|███████████████████████████████████████████████████████████| 10/10 [00:10<00:00, 1.05s/it]
Traceback (most recent call last):
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1942, in _agg_py_fallback
res_values = self._grouper.agg_series(ser, alt, preserve_dtype=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 864, in agg_series
result = self._aggregate_series_pure_python(obj, func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/groupby/ops.py", line 885, in _aggregate_series_pure_python
res = func(group)
^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 2454, in <lambda>
alt=lambda x: Series(x, copy=False).mean(numeric_only=numeric_only),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/series.py", line 6549, in mean
return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/generic.py", line 12420, in mean
return self._stat_function(
^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/generic.py", line 12377, in _stat_function
return self._reduce(
^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/series.py", line 6457, in _reduce
return op(delegate, skipna=skipna, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/nanops.py", line 147, in f
result = alt(values, axis=axis, skipna=skipna, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/nanops.py", line 404, in new_func
result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/nanops.py", line 719, in nanmean
the_sum = values.sum(axis, dtype=dtype_sum)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/numpy/core/_methods.py", line 49, in _sum
return umr_sum(a, axis, dtype, out, keepdims, initial, where)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unsupported operand type(s) for +: 'bool' and 'str'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ali-belaich/DataspellProjects/ModelService/tests/giskar/evaluate_rag.py", line 89, in <module>
report = evaluate(get_answer_fn, testset=loaded_testset, knowledge_base=knowledge_base)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/giskard/rag/evaluate.py", line 110, in evaluate
report.correctness_by_question_type().to_dict()[metrics[0].name],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/giskard/rag/report.py", line 276, in correctness_by_question_type
correctness = self._correctness_by_metadata("question_type")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/giskard/rag/report.py", line 323, in _correctness_by_metadata
.mean()
^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 2452, in mean
result = self._cython_agg_general(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1998, in _cython_agg_general
new_mgr = data.grouped_reduce(array_func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/internals/base.py", line 367, in grouped_reduce
res = func(arr)
^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1995, in array_func
result = self._agg_py_fallback(how, values, ndim=data.ndim, alt=alt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ali-belaich/anaconda3/envs/BETTY-MODEL-GISKARD/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1946, in _agg_py_fallback
raise type(err)(msg) from err
TypeError: agg function failed [how->mean,dtype->object]
```
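For reference, the same pair of `TypeError`s can be triggered with plain pandas. This is only a sketch under an assumption I have not confirmed in Giskard's code: that the correctness column ends up with `object` dtype because at least one value is a string rather than a boolean. The DataFrame, its column names and its values are made up for illustration and are not Giskard's actual schema. It assumes pandas 2.x, as in the traceback.

```python
import pandas as pd

# Hypothetical stand-in for the per-question results; names and values are
# assumptions for illustration, not Giskard's internal data.
results = pd.DataFrame(
    {
        "question_type": ["simple", "simple", "complex", "complex"],
        "correctness": [True, "True", False, True],  # one value is a string
    }
)

# A groupby mean over an object column mixing bool and str fails in pandas 2.x.
try:
    results.groupby("question_type")["correctness"].mean()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Locate the rows whose value is not a real Python boolean.
is_bool = results["correctness"].map(lambda v: isinstance(v, bool)).astype(bool)
non_bool = results[~is_bool]

# One possible cleanup, assuming the strings spell out "True"/"False".
cleaned = results.assign(
    correctness=results["correctness"]
    .map(lambda v: v if isinstance(v, bool) else str(v).strip().lower() == "true")
    .astype(bool)
)
means = cleaned.groupby("question_type")["correctness"].mean()

print(error_message)
print(non_bool)
print(means)
```

If the real column does contain such stray strings, inspecting the non-boolean rows as above should show which questions produced them.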
### Issue Type
Bug

### Source
source

### Giskard Library Version
2.16.0

### OS Platform and Distribution
Ubuntu 24.04.1

### Python version
3.11