This is a pretty specific set of circumstances that I discovered today, but on the off chance that it is useful to someone, here it is.
If you pass the output of nlp.pipe(..., n_process>1) to zip() and wrap that in tqdm(), the loop hangs indefinitely. See below.
How to reproduce the behaviour
#!/usr/bin/env python
import pandas as pd
import spacy
from tqdm import tqdm

data = [
    {"text": "I just wanna tell you how I'm feeling", "id": 0},
    {"text": "Gotta make you understand", "id": 1},
    {"text": "Never gonna give you up", "id": 2},
    {"text": "Never gonna let you down", "id": 3},
    {"text": "Never gonna run around and desert you", "id": 4},
    {"text": "Never gonna make you cry", "id": 5},
    {"text": "Never gonna say goodbye", "id": 6},
    {"text": "Never gonna tell a lie and hurt you", "id": 7},
]
df = pd.DataFrame(data)
nlp = spacy.load("en_core_web_md")

# Works with a single process
for id, doc in tqdm(
    zip(
        df["id"],
        nlp.pipe(
            df["text"],
            n_process=1,
        ),
    )
):
    print(id, doc.text)

# Works with no zip and multiple processes
for doc in tqdm(
    nlp.pipe(
        df["text"],
        n_process=2,
    ),
):
    print(doc.text)

# Hangs with multiple processes and zip
for id, doc in tqdm(
    zip(
        df["id"],
        nlp.pipe(
            df["text"],
            n_process=2,
        ),
    )
):
    print(id, doc.text)
Output:
$ python script.py
0it [00:00, ?it/s]0 I just wanna tell you how I'm feeling
1 Gotta make you understand
2 Never gonna give you up
3 Never gonna let you down
4 Never gonna run around and desert you
5 Never gonna make you cry
6 Never gonna say goodbye
7 Never gonna tell a lie and hurt you
8it [00:00, 977.21it/s]
0it [00:00, ?it/s]I just wanna tell you how I'm feeling
Gotta make you understand
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
8it [00:00, 345.82it/s]
0it [00:00, ?it/s]0 I just wanna tell you how I'm feeling
1 Gotta make you understand
2 Never gonna give you up
3 Never gonna let you down
4 Never gonna run around and desert you
5 Never gonna make you cry
6 Never gonna say goodbye
7 Never gonna tell a lie and hurt you
8it [00:00, 342.59it/s]
ivyleavedtoadflax changed the title from "nlp.pipe won't return if wrapped by tqdm and zipper" to "nlp.pipe won't return if wrapped by tqdm and zip" on Jul 23, 2021
ivyleavedtoadflax changed the title from "nlp.pipe won't return if wrapped by tqdm and zip" to "nlp.pipe(..., n_process>1) won't return if wrapped by tqdm and zip" on Jul 23, 2021
ivyleavedtoadflax changed the title from "nlp.pipe(..., n_process>1) won't return if wrapped by tqdm and zip" to "nlp.pipe(..., n_process>1) won't return if wrapped by tqdm() and zip()" on Jul 23, 2021
As a note, I've marked this as a bug because it shouldn't hang like this, but since there's an easy workaround it's going to be pretty low priority for us to fix.
Maybe some of the changes related to error handling have caused this? I'm not sure. In any case, it's better to use tqdm on something with a length rather than a generator.
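For anyone looking for a concrete form of that suggestion, here is a minimal sketch. It assumes the hang can be avoided by fully consuming the nlp.pipe generator before zip() or tqdm() touch it, which I have not checked against every spaCy version. The helper name pipe_with_ids is made up for illustration and is not part of spaCy or tqdm. It only relies on nlp.pipe(texts, n_process=...) and on tqdm accepting a list.

```python
from tqdm import tqdm


def pipe_with_ids(nlp, ids, texts, n_process=2):
    """Return a tqdm progress bar over (id, doc) pairs.

    Hypothetical helper: `nlp` is expected to be a loaded spaCy pipeline,
    but only its `pipe` method is used here.
    """
    # Consume the nlp.pipe generator completely, outside of zip() and tqdm().
    docs = list(nlp.pipe(texts, n_process=n_process))
    # Build a real list so tqdm wraps something with a length, not a generator.
    pairs = list(zip(ids, docs))
    return tqdm(pairs)
```

With the script above this would be used as `for id, doc in pipe_with_ids(nlp, df["id"], df["text"]): print(id, doc.text)`. The trade-off is that all docs are held in memory and the bar only tracks the loop that follows processing, not the processing itself.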
The spaCy test suite would hang on all OSes with Python 3.12 prior to 467c824. (This commit is just a workaround for the test suite and common use cases. It doesn't fix the underlying issue with deadlocks and tqdm.)