Context Recall returning NaN when using GPT-4 models #798

Closed
FranciscoAlves00 opened this issue Mar 23, 2024 · 4 comments
@FranciscoAlves00

[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
When using any GPT-4 model as the evaluator, the context recall metric returns a NaN result, and the following warning is logged for almost every question:
WARNING:ragas.metrics._context_recall:Invalid JSON response. Expected dictionary with key 'Attributed'

I have tried this with my own dataset, as well as by following the instructions in https://docs.ragas.io/en/stable/getstarted/evaluation.html and only changing the evaluator to one of the GPT-4 models (gpt-4-0125-preview, gpt-4-1106-preview and gpt-4). Of the 10 questions in the test set, on average 9 returned NaN for that metric. The other metrics work correctly.

Ragas version: 0.1.5
Python version: 3.10

Code to Reproduce
Follow the code in https://docs.ragas.io/en/stable/getstarted/evaluation.html, changing only the evaluator to one of the GPT-4 models (gpt-4-0125-preview, gpt-4-1106-preview and gpt-4).

Error trace
WARNING:ragas.metrics._context_recall:Invalid JSON response. Expected dictionary with key 'Attributed'
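For anyone else hitting this, here is a rough sketch of why a malformed response ends up as NaN. This is only an illustration based on the warning text, not the actual ragas parsing code, and the helper name is made up: context recall asks the evaluator to label each ground-truth statement with an "Attributed" verdict and averages them, so a reply that does not parse into that JSON shape leaves nothing to score.

    import json
    import math

    def attributed_ratio(llm_output: str) -> float:
        # Hypothetical parser: expects a JSON list of verdicts such as
        # [{"statement": "...", "Attributed": 1}, ...] and returns the fraction
        # of ground-truth statements attributed to the retrieved context.
        try:
            verdicts = json.loads(llm_output)
            attributed = [int(v["Attributed"]) for v in verdicts]
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            # Prose, markdown fences, or a differently shaped object cannot be
            # scored, which surfaces as the warning above and a NaN cell.
            return math.nan
        return sum(attributed) / len(attributed) if attributed else math.nan

    print(attributed_ratio('[{"Attributed": 1}, {"Attributed": 0}]'))  # 0.5
    print(attributed_ratio('```json\n[{"Attributed": 1}]\n```'))       # nan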

FranciscoAlves00 added the bug label Mar 23, 2024
shahules786 self-assigned this Mar 23, 2024
@shahules786
Member

Hey, can you please share some data points that I can use to reproduce the issue?
I'll raise a fix - this is mostly an issue related to the JSON formatting, which we are working on.

@FranciscoAlves00
Author

FranciscoAlves00 commented Mar 23, 2024

from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    answer_correctness,
    context_relevancy,
)
from ragas import evaluate
from langchain.chat_models import ChatOpenAI
from datasets import load_dataset

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
amnesty_qa

gpt4 = ChatOpenAI(model_name="gpt-4-0125-preview")
# gpt4 = ChatOpenAI(model_name="gpt-4")

result = evaluate(
    # experiment_dataset,
    amnesty_qa["eval"],
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_relevancy,
        answer_correctness,
    ],
    llm=gpt4,
)

result
df = result.to_pandas()

df.head(10)

Running this code from your website, I am getting NaN for context recall on 9 out of 10 questions:
recall_error.json
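If it helps with debugging, this is how I would count the failing rows from the dataframe above (assuming the result columns are named question and context_recall, as in the 0.1.x to_pandas output):

    # Count and inspect the questions whose context recall came back as NaN.
    nan_rows = df[df["context_recall"].isna()]
    print(f"{len(nan_rows)}/{len(df)} questions returned NaN for context_recall")
    print(nan_rows[["question", "context_recall"]])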

@FranciscoAlves00
Copy link
Author

I would like to add that it works better with the plain gpt-4 model and almost perfectly with the gpt-3.5 models, but I need to run the evaluation with the GPT-4 models.
Moreover, I have tried installing previous Ragas versions and it still shows the same problem, which is very odd, since yesterday I was able to run the evaluations correctly.
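One workaround worth trying while this is open (not verified against the ragas internals): pin the evaluator to temperature 0 and switch on OpenAI's JSON mode via model_kwargs. JSON mode is only supported on the -1106/-0125 preview models, not plain gpt-4, and it requires that the prompt already mentions JSON, so treat this purely as a sketch.

    from langchain.chat_models import ChatOpenAI

    gpt4_json = ChatOpenAI(
        model_name="gpt-4-0125-preview",
        temperature=0,  # deterministic verdicts
        # OpenAI JSON mode: makes prose- or fence-wrapped replies less likely.
        model_kwargs={"response_format": {"type": "json_object"}},
    )

Passing gpt4_json as the llm= argument to the evaluate() call above would at least rule out free-form replies as the cause of the parsing failures.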

dosubot added the stale label May 19, 2024
dosubot closed this as not planned Jun 1, 2024
dosubot removed the stale label Jun 1, 2024
@abhinavkashyapcrayon

abhinavkashyapcrayon commented Oct 4, 2024

Hi, I think this is still relevant. Context precision and context recall return NaN for GPT-4o models.
