Context Recall returning NaN when using GPT-4 models #798

Closed
FranciscoAlves00 opened this issue Mar 23, 2024 · 4 comments
@FranciscoAlves00

[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
When using any GPT-4 model as the evaluator, the context recall metric returns a NaN result, and the following warning is logged for almost every question:
WARNING:ragas.metrics._context_recall:Invalid JSON response. Expected dictionary with key 'Attributed'

I have tried this with my own dataset, as well as by following the instructions in https://docs.ragas.io/en/stable/getstarted/evaluation.html and only changing the evaluator to one of the GPT-4 models (gpt-4-0125-preview, gpt-4-1106-preview and gpt-4). Of the 10 questions in the test set, on average 9 returned NaN for that metric. The other metrics work correctly.

Ragas version: 0.1.5
Python version: 3.10

Code to Reproduce
Follow the code in https://docs.ragas.io/en/stable/getstarted/evaluation.html, changing only the evaluator to one of the GPT-4 models (gpt-4-0125-preview, gpt-4-1106-preview and gpt-4).

Error trace
WARNING:ragas.metrics._context_recall:Invalid JSON response. Expected dictionary with key 'Attributed'
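For anyone else hitting this, here is a rough sketch of why a malformed response ends up as NaN. This is only an illustration based on the warning text, not the actual ragas parsing code, and the helper name is made up: context recall asks the evaluator to label each ground-truth statement with an "Attributed" verdict and averages them, so a reply that does not parse into that JSON shape leaves nothing to score.

    import json
    import math

    def attributed_ratio(llm_output: str) -> float:
        # Hypothetical parser: expects a JSON list of verdicts such as
        # [{"statement": "...", "Attributed": 1}, ...] and returns the fraction
        # of ground-truth statements attributed to the retrieved context.
        try:
            verdicts = json.loads(llm_output)
            attributed = [int(v["Attributed"]) for v in verdicts]
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            # Prose, markdown fences, or a differently shaped object cannot be
            # scored, which surfaces as the warning above and a NaN cell.
            return math.nan
        return sum(attributed) / len(attributed) if attributed else math.nan

    print(attributed_ratio('[{"Attributed": 1}, {"Attributed": 0}]'))  # 0.5
    print(attributed_ratio('```json\n[{"Attributed": 1}]\n```'))       # nan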

FranciscoAlves00 added the bug label Mar 23, 2024
shahules786 self-assigned this Mar 23, 2024
@shahules786
Member

Hey, can you please share some data points that I can use to reproduce the issue?
I'll raise a fix - this is mostly an issue related to the JSON formatting, which we are working on.

@FranciscoAlves00
Author

FranciscoAlves00 commented Mar 23, 2024

from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    answer_correctness,
    context_relevancy,
)
from ragas import evaluate
from langchain.chat_models import ChatOpenAI
from datasets import load_dataset

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
amnesty_qa

gpt4 = ChatOpenAI(model_name="gpt-4-0125-preview")
# gpt4 = ChatOpenAI(model_name="gpt-4")

result = evaluate(
    # experiment_dataset,
    amnesty_qa["eval"],
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_relevancy,
        answer_correctness,
    ],
    llm=gpt4,
)

result
df = result.to_pandas()

df.head(10)

Running this code from your website, I am getting NaN for context recall on 9 out of 10 questions:
recall_error.json
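If it helps with debugging, this is how I would count the failing rows from the dataframe above (assuming the result columns are named question and context_recall, as in the 0.1.x to_pandas output):

    # Count and inspect the questions whose context recall came back as NaN.
    nan_rows = df[df["context_recall"].isna()]
    print(f"{len(nan_rows)}/{len(df)} questions returned NaN for context_recall")
    print(nan_rows[["question", "context_recall"]])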

@FranciscoAlves00
Copy link
Author

I would like to add that it works better with the plain gpt-4 model and almost perfectly with the gpt-3.5 models, but I need to run the evaluation with the GPT-4 models.
Moreover, I have tried installing previous Ragas versions and it still shows the same problem, which is very odd, since yesterday I was able to run the evaluations correctly.
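One workaround worth trying while this is open (not verified against the ragas internals): pin the evaluator to temperature 0 and switch on OpenAI's JSON mode via model_kwargs. JSON mode is only supported on the -1106/-0125 preview models, not plain gpt-4, and it requires that the prompt already mentions JSON, so treat this purely as a sketch.

    from langchain.chat_models import ChatOpenAI

    gpt4_json = ChatOpenAI(
        model_name="gpt-4-0125-preview",
        temperature=0,  # deterministic verdicts
        # OpenAI JSON mode: makes prose- or fence-wrapped replies less likely.
        model_kwargs={"response_format": {"type": "json_object"}},
    )

Passing gpt4_json as the llm= argument to the evaluate() call above would at least rule out free-form replies as the cause of the parsing failures.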

dosubot added the stale label May 19, 2024
dosubot closed this as not planned Jun 1, 2024
dosubot removed the stale label Jun 1, 2024
@abhinavkashyapcrayon

abhinavkashyapcrayon commented Oct 4, 2024

Hi, I think this is still relevant. Context precision and context recall return NaN for GPT-4o models.
