Error occurred in bm25_ef.fit(corpus) #22

Open
rdyuan opened this issue May 18, 2024 · 10 comments

Comments


rdyuan commented May 18, 2024

Here is my full code:
from milvus_model.sparse.bm25.tokenizers import build_default_analyzer
from milvus_model.sparse import BM25EmbeddingFunction

analyzer = build_default_analyzer(language="zh")
corpus = [
    "人工智能于1956年作为一门学科成立。",
    "艾伦·图灵是第一个对人工智能进行实质性研究的人。",
    "图灵出生在伦敦的梅达维尔,在英格兰南部长大。",
]
bm25_ef = BM25EmbeddingFunction(analyzer)
bm25_ef.fit(corpus)
docs = [
    "人工智能领域于1956年作为一门学术学科成立。",
    "艾伦·图灵是在人工智能领域进行重大研究的先驱。",
    "图灵出生在伦敦的梅达维尔,在英格兰南部地区长大。",
    "1956年,人工智能作为一个学术领域出现。",
    "图灵来自伦敦梅达维尔,在英格兰南部长大。",
]
docs_embeddings = bm25_ef.encode_documents(docs)
print("Embeddings:", docs_embeddings)
print("Sparse dim:", bm25_ef.dim, list(docs_embeddings)[0].shape)

The following error is raised when execution reaches bm25_ef.fit(corpus):
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 129, in _main
    main_content = runpy.run_path(main_path, main_content = runpy.run_path(main_path, ^^ ^prepare(preparation_data)^ ^^^^^^ ^ ^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 240, in prepare
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code

Relevant versions:
Python==3.11.3
milvus_model==0.2.2

@xiaofan-luan

/assign @wxywb
Could you help investigate this?

Collaborator

wxywb commented May 18, 2024

This code works in my environment. The failure may be related to a multiprocessing problem that I need to look into. In the meantime, you can try the following code.

from milvus_model.sparse.bm25.tokenizers import build_default_analyzer
from milvus_model.sparse import BM25EmbeddingFunction
analyzer = build_default_analyzer(language="zh")
corpus = [ "人工智能于1956年作为一门学科成立。", "艾伦·图灵是第一个对人工智能进行实质性研究的人。", "图灵出生在伦敦的梅达维尔,在英格兰南部长大。", ]
# num_workers=1 disables multiprocessing
bm25_ef = BM25EmbeddingFunction(analyzer, num_workers=1)
bm25_ef.fit(corpus)
docs = [ "人工智能领域于1956年作为一门学术学科成立。", "艾伦·图灵是在人工智能领域进行重大研究的先驱。", "图灵出生在伦敦的梅达维尔,在英格兰南部地区长大。", "1956年,人工智能作为一个学术领域出现。", "图灵来自伦敦梅达维尔,在英格兰南部长大。" ]
docs_embeddings = bm25_ef.encode_documents(docs)
print("Embeddings:", docs_embeddings)
print("Sparse dim:", bm25_ef.dim, list(docs_embeddings)[0].shape)
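
A note on one possible cause, which is not confirmed anywhere in this thread: the traceback passes through multiprocessing/spawn.py and runpy.run_path, which is the route Python takes when a worker started with the "spawn" method imports the main script again. "spawn" has been the default start method on macOS since Python 3.8, so top-level code that is not behind an `if __name__ == "__main__":` guard runs again in every worker. The sketch below shows the guard using the standard library only. The names tokenize, build_vocab, and parse_num_workers are hypothetical stand-ins and are not part of milvus_model.

```python
import multiprocessing as mp
import sys


def tokenize(doc):
    """Hypothetical stand-in for an analyzer: split on whitespace."""
    return doc.split()


def build_vocab(corpus, num_workers=1):
    """Return the sorted set of tokens found in corpus."""
    if num_workers == 1:
        # Serial path: no worker processes, so nothing is re-imported.
        tokenized = [tokenize(doc) for doc in corpus]
    else:
        # "spawn" starts fresh interpreters that import this file again.
        ctx = mp.get_context("spawn")
        with ctx.Pool(num_workers) as pool:
            tokenized = pool.map(tokenize, corpus)
    return sorted({token for doc in tokenized for token in doc})


def parse_num_workers(argv):
    """Read an optional worker count from argv, defaulting to 1."""
    if len(argv) > 1 and argv[1].isdigit():
        return max(1, int(argv[1]))
    return 1


def main(argv):
    corpus = ["alan turing", "turing was born in london"]
    print(build_vocab(corpus, num_workers=parse_num_workers(argv)))


if __name__ == "__main__":
    # Workers import this file under a different module name, so the
    # guard stops them from running main() and trying to start pools
    # of their own.
    main(sys.argv)
```

Saved under a placeholder name such as sketch.py, this runs serially with `python sketch.py` and through the pool with `python sketch.py 2`. If the same cause applies to the script in this issue, moving its top-level calls, including bm25_ef.fit(corpus), into a function called under such a guard might allow the default worker count to be kept. That has not been verified here.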

Collaborator

wxywb commented May 18, 2024

@rdyuan Could you give me the full traceback? What you posted seems to be only part of it.

Author

rdyuan commented May 18, 2024

log.txt

Author

rdyuan commented May 18, 2024

After adding num_workers=1, it does run successfully.

wxywb changed the title from "bm25_ef.fit(corpus)时报错" ("Error when calling bm25_ef.fit(corpus)") to "Error occurred in bm25_ef.fit(corpus)" on May 18, 2024

abellee commented Jun 19, 2024

Has this issue still not been fixed? As soon as execution reaches fit, it goes into an infinite loop. Setting num_workers=1 works.

Collaborator

wxywb commented Jun 19, 2024

What operating system are you using? Please also show me the code snippet and the error output.


abellee commented Jun 19, 2024

It is the same problem as described in this issue. The OS is macOS on a Mac with an Intel chip.

Collaborator

wxywb commented Jun 20, 2024

What is your Python version?


abellee commented Jun 23, 2024

3.12
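
For anyone else reporting this problem, the details requested in this thread (operating system, chip, Python version) can be collected in one step. The following is only a sketch using the standard library. The helper name environment_report is hypothetical, and the default distribution name "milvus-model" is an assumption about how the package is registered with pip, so adjust it if the lookup returns None.

```python
import multiprocessing as mp
import platform
from importlib import metadata


def environment_report(package="milvus-model"):
    """Collect OS, CPU, Python, start method, and one package version."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed under this name.
        package_version = None
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        # Note: calling get_start_method() fixes the default start
        # method for the rest of the process.
        "start_method": mp.get_start_method(),
        package: package_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The start method is included because the traceback in this issue goes through multiprocessing/spawn.py, so knowing whether a machine uses "spawn" or "fork" may help narrow the problem down.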
