Building a Hindi Leaderboard. #1806
-
I discussed this idea with Niklas, and he was quite excited about it. He suggested I tag you, so I wanted to clarify a few points before my team and I begin working on the project: Is it possible to utilize existing multilingual datasets, extract the Hindi data, and use it for a different task?
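To make the question concrete, here is a minimal sketch of the extract-and-reuse idea in plain Python, independent of MTEB's own API. The records, the field names (`text`, `label`, `lang`), and the helper functions are all hypothetical; the only assumption carried over from MTEB is the ISO 639-3 code `hin` for Hindi. It keeps the Hindi rows of a multilingual classification set and regroups them by label, a layout a clustering-style task could start from.

```python
from collections import defaultdict

# Hypothetical multilingual classification records; field names are illustrative only.
records = [
    {"text": "यह फिल्म बहुत अच्छी थी", "label": "positive", "lang": "hin"},
    {"text": "The movie was great", "label": "positive", "lang": "eng"},
    {"text": "खाना बिल्कुल अच्छा नहीं था", "label": "negative", "lang": "hin"},
    {"text": "सेवा शानदार थी", "label": "positive", "lang": "hin"},
]


def extract_language(rows, lang_code):
    """Keep only the rows tagged with the given language code."""
    return [row for row in rows if row["lang"] == lang_code]


def to_clusters(rows):
    """Regroup labeled rows into a label -> list-of-texts mapping."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["label"]].append(row["text"])
    return dict(clusters)


# Step 1: extract the Hindi subset. Step 2: reshape it for a different task type.
hindi_rows = extract_language(records, "hin")
hindi_clusters = to_clusters(hindi_rows)
```

Whether a label-grouped subset like this is a good basis for a new task depends on how many Hindi examples survive the filter and how balanced the labels are, so those counts are worth checking before building on it.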
-
@SaileshP97 Yup, this is totally possible. You can obtain a list of all Hindi tasks by calling
-
Proposal: Building a Hindi Leaderboard for MTEB by Creating Hindi-Specific Datasets and Tasks
Summary
This proposal outlines the idea of creating a Hindi-specific leaderboard within the Massive Text Embedding Benchmark (MTEB) framework. The objective is to design and integrate Hindi language datasets and tasks derived from existing MTEB data. By focusing on Hindi language benchmarks, this project aims to improve the evaluation ecosystem for models generating embeddings specifically for Hindi NLP tasks.
Motivation
Hindi is one of the top five most spoken languages globally, with over 600 million speakers. Despite this widespread use, Hindi-specific tasks are often underrepresented in multilingual benchmarks, so the performance of text embedding models on Hindi is hard to assess. Building a dedicated Hindi leaderboard within MTEB will:
Request for Feedback
I would appreciate input from the maintainers and community on:
Looking forward to your feedback and insights!