
Decide on metric(s) with which to evaluate our models #5

Open
Tracked by #4
PGijsbers opened this issue Jul 5, 2024 · 2 comments

PGijsbers commented Jul 5, 2024

We need to decide which metrics we can and should use to evaluate our models, and what data is needed to compute them.
We will probably want to distinguish between evaluations during prototyping (which will need to be largely automated) and metrics that may be used later and evaluated with users, e.g., through A/B testing. For now, let's only consider the former: metrics we can use now with some data.

PGijsbers changed the title from "Metric(s) with which to evaluate our models" to "Decide on metric(s) with which to evaluate our models" on Jul 5, 2024
PGijsbers added the evaluation label on Jul 5, 2024
PGijsbers added this to the Search milestone on Jul 5, 2024
PGijsbers (Member, Author) commented:

@zhao1402072392 will look into which metrics we should use for evaluating retrieval results. These will most likely be standard information-retrieval metrics rather than newer LLM-based evaluation schemes, since we like evaluating against "ground truths". While our labels might still be slightly subjective, they will at least be consistent and (hopefully) more accurate.
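For reference, here is a minimal sketch of some standard IR metrics we could start from (precision@k, MRR, nDCG). The function names and toy data are just illustrative, not a settled choice:

```python
import math


def precision_at_k(relevant: set, ranked: list, k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def reciprocal_rank(relevant: set, ranked: list) -> float:
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def dcg_at_k(gains: list, k: int) -> float:
    """Discounted cumulative gain over the top-k positions."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains[:k], start=1))


def ndcg_at_k(relevance: dict, ranked: list, k: int) -> float:
    """DCG of the ranking, normalised by the DCG of the ideal ranking."""
    gains = [relevance.get(item, 0) for item in ranked]
    ideal_dcg = dcg_at_k(sorted(relevance.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example: one query, documents ranked by the model, binary ground truth.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(relevant, ranked, k=3))       # ~0.33
print(reciprocal_rank(relevant, ranked))           # 0.5
print(ndcg_at_k({"d1": 1, "d2": 1}, ranked, k=4))  # also works with graded labels
```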

PGijsbers (Member, Author) commented:

We also need to see which metrics (if any) are appropriate for labels with mixed confidence, since different people labelled relevance differently.
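One possibility (just an assumption at this point, not a decision) is to average the annotators' judgements into graded relevance scores, which graded metrics such as nDCG can consume directly; strictly binary metrics like precision@k or MRR would instead need a threshold. A toy sketch with made-up annotations:

```python
from statistics import mean

# Hypothetical annotations: for each document, binary relevance judgements
# from several annotators who may disagree with each other.
annotations = {
    "d1": [1, 1, 0],
    "d2": [1, 1, 1],
    "d3": [0, 0, 1],
}

# Collapse the disagreement into a graded relevance score in [0, 1].
graded_relevance = {doc: mean(labels) for doc, labels in annotations.items()}
print(graded_relevance)  # d1 ≈ 0.67, d2 = 1.0, d3 ≈ 0.33
```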
