Error occurred in bm25_ef.fit(corpus) #22
Comments
/assign @wxywb |
This code works in my environment. The failure may be related to multiprocessing, which I still need to look into. In the meantime, you can try the following code:
from milvus_model.sparse.bm25.tokenizers import build_default_analyzer
from milvus_model.sparse import BM25EmbeddingFunction
analyzer = build_default_analyzer(language="zh")
corpus = [ "人工智能于1956年作为一门学科成立。", "艾伦·图灵是第一个对人工智能进行实质性研究的人。", "图灵出生在伦敦的梅达维尔,在英格兰南部长大。", ]
# num_workers=1 disables multiprocessing in fit()
bm25_ef = BM25EmbeddingFunction(analyzer, num_workers=1)
bm25_ef.fit(corpus)
docs = [ "人工智能领域于1956年作为一门学术学科成立。", "艾伦·图灵是在人工智能领域进行重大研究的先驱。", "图灵出生在伦敦的梅达维尔,在英格兰南部地区长大。", "1956年,人工智能作为一个学术领域出现。", "图灵来自伦敦梅达维尔,在英格兰南部长大。" ]
docs_embeddings = bm25_ef.encode_documents(docs)
print("Embeddings:", docs_embeddings)
print("Sparse dim:", bm25_ef.dim, list(docs_embeddings)[0].shape) |
@rdyuan Could you share the full traceback? What you posted seems to be only part of it. |
Adding num_workers=1 did make it run successfully. |
Has this issue still not been fixed? As soon as execution reaches fit, it goes into an infinite loop. With num_workers=1 it works. |
What operating system are you using? Please also show me the code snippet and the error output. |
It is the same problem as in this issue, and the OS is macOS on an Intel chip. |
What about your Python version? |
3.12 |
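The details requested above (operating system, chip, Python version) can be collected in one step from the interpreter that actually runs the script. The following is a minimal sketch using only the standard library; `describe_environment` is a hypothetical helper name, not part of milvus_model. It also reports the multiprocessing start method, which is relevant here because macOS has defaulted to `spawn` since Python 3.8.

```python
import multiprocessing
import platform
import sys


def describe_environment() -> dict:
    """Collect the environment details usually requested in a bug report.

    Hypothetical helper for illustration; not part of milvus_model.
    """
    return {
        "os": platform.system(),            # e.g. "Darwin" on macOS
        "machine": platform.machine(),      # e.g. "x86_64" on Intel chips
        "python": platform.python_version(),
        "executable": sys.executable,       # which interpreter is really in use
        # Start method that multiprocessing will use for new worker processes.
        "start_method": multiprocessing.get_start_method(),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Pasting this output next to the traceback makes it clear which interpreter and start method were in effect when `fit` was called.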
This is my full code:
from milvus_model.sparse.bm25.tokenizers import build_default_analyzer
from milvus_model.sparse import BM25EmbeddingFunction
analyzer = build_default_analyzer(language="zh")
corpus = [
    "人工智能于1956年作为一门学科成立。",
    "艾伦·图灵是第一个对人工智能进行实质性研究的人。",
    "图灵出生在伦敦的梅达维尔,在英格兰南部长大。",
]
bm25_ef = BM25EmbeddingFunction(analyzer)
bm25_ef.fit(corpus)
docs = [
    "人工智能领域于1956年作为一门学术学科成立。",
    "艾伦·图灵是在人工智能领域进行重大研究的先驱。",
    "图灵出生在伦敦的梅达维尔,在英格兰南部地区长大。",
    "1956年,人工智能作为一个学术领域出现。",
    "图灵来自伦敦梅达维尔,在英格兰南部长大。"
]
docs_embeddings = bm25_ef.encode_documents(docs)
print("Embeddings:", docs_embeddings)
print("Sparse dim:", bm25_ef.dim, list(docs_embeddings)[0].shape)
When execution reaches bm25_ef.fit(corpus), the following error is raised:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 129, in _main
    main_content = runpy.run_path(main_path, main_content = runpy.run_path(main_path, ^^ ^prepare(preparation_data)^ ^^^^^^ ^ ^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 240, in prepare
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
Relevant versions:
Python==3.11.3
milvus_model==0.2.2
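The traceback passes through `multiprocessing/spawn.py` and `runpy.run_path`, which is the path a spawned worker takes when it re-imports the parent's script under the module name `__mp_main__`. If the script calls `fit` at top level, each worker would run that call again. This is a plausible cause of the loop, though the thread does not confirm it. The sketch below simulates the re-import in a single process with the standard library only; the script text, `fit_calls`, and `main` are hypothetical stand-ins, and no milvus_model code is executed.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# Stand-in for the user's script. `fit_calls` records how often the
# expensive top-level work (bm25_ef.fit in the issue) would be triggered.
SCRIPT = textwrap.dedent(
    """
    fit_calls = []

    def main():
        fit_calls.append("fit")  # placeholder for bm25_ef.fit(corpus)

    if __name__ == "__main__":
        main()
    """
)


def run_as(run_name: str) -> list:
    """Execute the stand-in script under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_script.py"
        path.write_text(SCRIPT, encoding="utf-8")
        namespace = runpy.run_path(str(path), run_name=run_name)
    return namespace["fit_calls"]


as_parent = run_as("__main__")    # launched directly by the user
as_child = run_as("__mp_main__")  # re-imported by a spawned worker
print("parent:", as_parent, "child:", as_child)
```

With the guard in place, the simulated worker imports the script without triggering the `fit` placeholder. Applying the same pattern to the reproduction script (moving everything after the imports into `main()` behind `if __name__ == "__main__":`) may let the default `num_workers` work on macOS, but that has not been verified against milvus_model 0.2.2.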