[feat] Integrate NanoBeIR datasets; use model.similarity by default in evaluators (#2966)

* Added the possibility of masking the prompts if the tokenizer is left-padded.
* Simplify code
* Remove unrelated changes
* Add separate query and corpus prompts for IREvaluator
* Add query and corpus prompt_name
* Added NanoBEIREvaluator
* Rename, example and better logging
* Fix for all datasets
* Remove unrelated changes
* Remove unrelated changes
* Remove unrelated changes
* Remove unrelated changes
* Remove wrong function call to InformationRetrievalEvaluator
* Fix issue introduced in merge
* Flatten output dict, remove 'name' as we already know the dataset names
* Use the model similarity function by default for evaluators
  - Fix 'tokens' typo -> 'dimension' in model card
  - Group multiple evaluators with the same output keys together.
  - Fix edge case where datasets without languages are excluded in model card
  - Truncate really really long texts in model card
  - Make default similarity_fn_name "cosine" rather than None
* Update tests due to similarity_fn_name defaulting to "cosine" now
* Specify all similarity_fn_names to be backwards compat. with old expected performance
* Fix loading the similarity fn from a config
  And update 'str' type to Literals

---------

Co-authored-by: Tom Aarsen <[email protected]>
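
The "use the model similarity function by default" item can be illustrated with a small, self-contained sketch. This is not the code from this commit: `ToyModel`, `ToyEvaluator`, and `SIMILARITY_FUNCTIONS` are hypothetical names invented for illustration, and only two similarity functions are modelled. It assumes the behaviour described in the message, namely that an evaluator without an explicit similarity function defers to the model, whose `similarity_fn_name` defaults to "cosine".

```python
import math
from typing import Callable, Dict, Literal, Optional, Sequence

# Hypothetical stand-in for the 'str' -> Literal change mentioned in the message.
SimilarityFnName = Literal["cosine", "dot"]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


SIMILARITY_FUNCTIONS: Dict[str, Callable[..., float]] = {"cosine": cosine, "dot": dot}


class ToyModel:
    """Hypothetical model whose similarity function defaults to cosine."""

    def __init__(self, similarity_fn_name: SimilarityFnName = "cosine") -> None:
        self.similarity_fn_name = similarity_fn_name

    def similarity(self, a: Sequence[float], b: Sequence[float]) -> float:
        return SIMILARITY_FUNCTIONS[self.similarity_fn_name](a, b)


class ToyEvaluator:
    """Hypothetical evaluator: defers to the model unless a function is named."""

    def __init__(self, similarity_fn_name: Optional[SimilarityFnName] = None) -> None:
        self.similarity_fn_name = similarity_fn_name

    def score(self, model: ToyModel, query: Sequence[float], doc: Sequence[float]) -> float:
        if self.similarity_fn_name is None:
            # No explicit choice: use whatever the model is configured with.
            return model.similarity(query, doc)
        # Explicit choice: overrides the model's setting.
        return SIMILARITY_FUNCTIONS[self.similarity_fn_name](query, doc)
```

Under these assumptions, passing an explicit name to the evaluator overrides the model's setting, which mirrors why the tests in this commit specify `similarity_fn_names` to stay backwards compatible.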
1 parent 96a4bd7, commit 210ea8b

Showing 12 changed files with 758 additions and 292 deletions.