Please explain if/how one can avoid putting too much data into a pipeline #13
Comments
You're right: there is no built-in limit. Each task occupies socket buffer space until it is taken off by the worker process. When you use up all the buffer space, the next put() will block until space frees up. Unfortunately there's no safety built into MPipe, so it's up to the user to monitor and manage how much they're feeding the pipeline.
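For what it's worth, here is a rough sketch of that kind of manual back-pressure, assuming the usual mpipe Pipeline/OrderedStage API with put() and get(); the work function and the MAX_IN_FLIGHT cap are illustrative placeholders, not part of MPipe itself:

```python
from mpipe import OrderedStage, Pipeline

def work(task):
    # Illustrative per-task processing; replace with the real thing.
    return task * 2

pipe = Pipeline(OrderedStage(work, 4))   # 4 worker processes

MAX_IN_FLIGHT = 100                      # illustrative cap on queued tasks
in_flight = 0

for task in range(10000):
    if in_flight >= MAX_IN_FLIGHT:
        pipe.get()                       # block until one result comes back
        in_flight -= 1
    pipe.put(task)
    in_flight += 1

pipe.put(None)                           # signal end of input

while in_flight:                         # drain the remaining results
    pipe.get()
    in_flight -= 1
```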
Thank you for this answer! As far as I understand, multiprocessing.Queue has a maxsize parameter, so would it not be possible to use this to easily limit the maximum amount of data that can be put into the queue before it blocks? But unfortunately the API does not seem to allow passing this on to where the Queue instance is created... If the above is not possible for some reason, how exactly would it be possible to actually monitor and manage how much gets fed into the pipeline? For this it would be necessary to keep track of how much has been put in already versus how much has been processed. Are these numbers easily available somewhere?
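(For reference, this is the standard-library behaviour being referred to: a multiprocessing.Queue created with maxsize makes put() block once the queue is full.)

```python
import multiprocessing

q = multiprocessing.Queue(maxsize=2)   # bounded queue: at most 2 items waiting

q.put("a")
q.put("b")
print(q.full())   # True: a third blocking q.put() would now wait for a q.get()
q.get()
print(q.full())   # False: there is room again
```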
The underlying Queue's maxsize parameter is indeed not exposed through the MPipe API. Also, there is no facility to retrieve the total count/size of tasks submitted and processed. I'm noting this as a requested feature for a future release. For now the user would have to do some kind of bookkeeping. Counting submitted tasks is straightforward since it's under user control. As for counting how many tasks have been processed, something like this might work:
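(A rough sketch along those lines, assuming the usual mpipe Pipeline, OrderedStage, put() and results() calls; the counter, lock and thread names are illustrative placeholders:)

```python
import threading
from mpipe import OrderedStage, Pipeline

def work(task):
    return task * 2                    # illustrative per-task processing

pipe = Pipeline(OrderedStage(work, 4))

processed = 0                          # tasks whose results have come back
processed_lock = threading.Lock()

def count_results():
    """Continuously pull results off the pipeline and bump the count."""
    global processed
    for _ in pipe.results():           # ends after pipe.put(None) propagates
        with processed_lock:
            processed += 1

counter = threading.Thread(target=count_results, daemon=True)
counter.start()

submitted = 0
for task in range(1000):
    pipe.put(task)
    submitted += 1

pipe.put(None)                         # signal end of input
counter.join()

print(submitted, processed)            # bookkeeping of both sides
```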
I haven't fully tested this, but the basic concept is to have a background thread continuously retrieving results and bumping the count. But make sure your stages actually produce results (i.e. are not running with results disabled), otherwise there is nothing to retrieve and count.
Thank you so much, also for accepting this as a feature request!
My understanding is that whenever pipe.put(something) is executed, the task is put into a queue or something similar. If we only want the workers to process the somethings, without caring at all about any return value, we can use disable_result, but it seems there is no limit to how much data can get put into the pipeline. If a large number of large data items is put into the pipeline, will this cause problems? Is it possible to have only a certain maximum number of items waiting for processing before put(something) blocks?
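A minimal sketch of the pattern described in the question, assuming the usual mpipe classes; whether disable_result is accepted as a stage constructor argument here is an assumption to check against the MPipe version in use:

```python
from mpipe import UnorderedStage, Pipeline

def process(item):
    print("processing", item)    # side effects only; the result is not used

# Assumption: the stage accepts a disable_result flag so that results
# are never queued back to the main process.
stage = UnorderedStage(process, 4, disable_result=True)
pipe = Pipeline(stage)

for item in range(1000):
    pipe.put(item)               # nothing here limits how many items pile up

pipe.put(None)                   # signal end of input
```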