Analysis of the hallucination benchmark result in Appendix of your paper #1

Open
laserwave opened this issue Jun 18, 2024 · 1 comment

laserwave commented Jun 18, 2024

Hi, nice work.

In Table 7, you report the POPE results, which decreased in some sets of experiments (comparing with and without). Since your method assigns low weights to contradictory text tokens, I would have expected the hallucination benchmark metrics to increase instead.

Do you have any comments on this? Thank you.

@Menoly-xin
Collaborator

Hi, I apologize for the delayed reply; I am currently occupied with graduation preparations and related travel.

Thanks for your comment. In my view, the POPE benchmark may not be optimal for evaluating hallucination, because its scores are excessively high and vary only minimally. Alternative benchmarks may indeed be more suitable for these assessments (for more information, please refer to https://arxiv.org/pdf/2312.00849). After my vacation, I will add evaluation results on these related benchmarks if possible.
