Missing ModelMeta for Chinese models #1803
Comments
I will add BGE and GTE to start with
Added BGE and GTE in #1805
I have implemented some of them and checked all the models. Here's a curated and updated list:
model_metas_to_implement = [
"DMetaSoul/Dmeta-embedding-zh-small",
"DMetaSoul/sbert-chinese-general-v1",
# Stella Models
"dunzhang/stella-large-zh-v3-1792d",
"dunzhang/stella-mrl-large-zh-v3.5-1792d",
"iampanda/zpoint_large_embedding_zh",
"infgrad/stella-base-zh",
"infgrad/stella-base-zh-v2",
"infgrad/stella-base-zh-v3-1792d",
"infgrad/stella-large-zh",
"infgrad/stella-large-zh-v2",
# Jina models
"jinaai/jina-embeddings-v2-base-zh",
# Moka models
"moka-ai/m3e-base",
"moka-ai/m3e-large",
# piccolo models
"sensenova/piccolo-base-zh",
"sensenova/piccolo-large-zh",
"sensenova/piccolo-large-zh-v2",
# Text2Vec models
"shibing624/text2vec-base-chinese", # Needs custom implementation
"shibing624/text2vec-large-chinese", # Needs custom implementation
"barisaydin/text2vec-base-multilingual", # Needs custom implementation
# Multilingual SBERT
"sentence-transformers/distiluse-base-multilingual-cased-v2",
"sentence-transformers/use-cmlm-multilingual",
# Miscellaneous
"lier007/xiaobu-embedding",
"Classical/Yinka",
"TencentBAC/Conan-embedding-v1",
"lier007/xiaobu-embedding-v2",
]
I'm adding the missing SBERT models right now
Then Moka
Updating the list again after #1814:
model_metas_to_implement = [
# Stella Models
"dunzhang/stella-large-zh-v3-1792d",
"dunzhang/stella-mrl-large-zh-v3.5-1792d",
"iampanda/zpoint_large_embedding_zh",
"infgrad/stella-base-zh", # Outdated not planned
"infgrad/stella-base-zh-v2", # Outdated not planned
"infgrad/stella-base-zh-v3-1792d",
"infgrad/stella-large-zh", # outdated not planned
"infgrad/stella-large-zh-v2", # outdated not planned
# Text2Vec models
"shibing624/text2vec-base-chinese", # Needs custom implementation
"shibing624/text2vec-large-chinese", # Needs custom implementation
"barisaydin/text2vec-base-multilingual", # Needs custom implementation
# Miscellaneous
"lier007/xiaobu-embedding",
"Classical/Yinka",
"TencentBAC/Conan-embedding-v1",
"lier007/xiaobu-embedding-v2",
]
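For anyone tracking progress, here is a small sketch that compares the two lists posted in this thread and prints the names that dropped off between them, presumably because a ModelMeta now exists for them. The variable names `previous`, `remaining` and `no_longer_listed` are just for illustration and are not part of mteb.

```python
# Names copied from the first list posted in this thread.
previous = [
    "DMetaSoul/Dmeta-embedding-zh-small",
    "DMetaSoul/sbert-chinese-general-v1",
    "dunzhang/stella-large-zh-v3-1792d",
    "dunzhang/stella-mrl-large-zh-v3.5-1792d",
    "iampanda/zpoint_large_embedding_zh",
    "infgrad/stella-base-zh",
    "infgrad/stella-base-zh-v2",
    "infgrad/stella-base-zh-v3-1792d",
    "infgrad/stella-large-zh",
    "infgrad/stella-large-zh-v2",
    "jinaai/jina-embeddings-v2-base-zh",
    "moka-ai/m3e-base",
    "moka-ai/m3e-large",
    "sensenova/piccolo-base-zh",
    "sensenova/piccolo-large-zh",
    "sensenova/piccolo-large-zh-v2",
    "shibing624/text2vec-base-chinese",
    "shibing624/text2vec-large-chinese",
    "barisaydin/text2vec-base-multilingual",
    "sentence-transformers/distiluse-base-multilingual-cased-v2",
    "sentence-transformers/use-cmlm-multilingual",
    "lier007/xiaobu-embedding",
    "Classical/Yinka",
    "TencentBAC/Conan-embedding-v1",
    "lier007/xiaobu-embedding-v2",
]

# Names copied from the updated list posted after #1814.
remaining = [
    "dunzhang/stella-large-zh-v3-1792d",
    "dunzhang/stella-mrl-large-zh-v3.5-1792d",
    "iampanda/zpoint_large_embedding_zh",
    "infgrad/stella-base-zh",
    "infgrad/stella-base-zh-v2",
    "infgrad/stella-base-zh-v3-1792d",
    "infgrad/stella-large-zh",
    "infgrad/stella-large-zh-v2",
    "shibing624/text2vec-base-chinese",
    "shibing624/text2vec-large-chinese",
    "barisaydin/text2vec-base-multilingual",
    "lier007/xiaobu-embedding",
    "Classical/Yinka",
    "TencentBAC/Conan-embedding-v1",
    "lier007/xiaobu-embedding-v2",
]

# Keep the original order while filtering out names that are still pending.
remaining_set = set(remaining)
no_longer_listed = [name for name in previous if name not in remaining_set]

for name in no_longer_listed:
    print(name)
```

The list comprehension (rather than a plain set difference) keeps the names in the order they were originally posted, which makes the output easier to compare against the thread.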
re.:
"shibing624/text2vec-base-chinese", # Needs custom implementation
"shibing624/text2vec-large-chinese", # Needs custom implementation
"barisaydin/text2vec-base-multilingual", # Needs custom implementation
If you want these to appear in the leaderboard but don't want to implement them, you can simply set the loader to None (or add a "NotImplemented" class).
Sure, but shouldn't take too long to implement. @KennethEnevoldsen do you have time for the Stella models? Then I'll take the rest
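For reference, the "set the loader to None" idea from earlier in the thread could look roughly like the sketch below. This is not the real mteb API: `ModelMetaSketch` and `load_model` are hypothetical stand-ins that only assume a metadata object with a name and an optional loader callable. It shows how an entry can be listed (for example on a leaderboard) while any attempt to load it fails explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ModelMetaSketch:
    """Hypothetical stand-in for a metadata record; NOT mteb's ModelMeta."""

    name: str
    # None means "metadata only": the model can be listed but not loaded.
    loader: Optional[Callable[[], object]] = None


def load_model(meta: ModelMetaSketch) -> object:
    """Call the loader if there is one, otherwise fail with a clear error."""
    if meta.loader is None:
        raise NotImplementedError(f"No loader registered for {meta.name}")
    return meta.loader()


# Metadata-only entries for the three text2vec models named in this thread.
text2vec_metas = [
    ModelMetaSketch(name=name)
    for name in (
        "shibing624/text2vec-base-chinese",
        "shibing624/text2vec-large-chinese",
        "barisaydin/text2vec-base-multilingual",
    )
]

# Listing works without a loader; loading would raise NotImplementedError.
listed_names = [meta.name for meta in text2vec_metas]
print(listed_names)
```

Raising `NotImplementedError` at load time, rather than at registration time, is what lets the metadata exist independently of a working implementation.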
Oh I seem to have been wrong, it looks like you can use text2vec with SentenceTransformers.
Added Stella in #1824. This issue will be ready to close once it's merged.
Many of the models that have been run on the original C-MTEB, and for which we have results, are currently missing ModelMeta objects in the library.
Here's a list of Chinese-specific models that have yet to be added to MTEB:
As well as a list of multilingual models that are currently missing metadata:
Most of these should be pretty trivial to add.