BentoML and the GIL #946
-
Hi, I was wondering how BentoML works with the GIL when you have multiple workers/threads. Is there one shared GIL, or do independent workers run in true subprocesses, so there is no scalability problem with the GIL? Would be great to hear some insights.
Replies: 1 comment, 2 replies
-
Hi @jondoering, there is no scalability issue with the GIL in BentoML. BentoML uses Gunicorn to manage multiple processes of model backend workers. In BentoML, the frontend micro-batching layer (the MarshalServer in the code) is built with asyncio and aiohttp. It sends batched requests to the model backend, which runs a Flask app in a Gunicorn server.
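To illustrate why the process-based worker model avoids GIL contention, here is a minimal sketch (not BentoML code, and `fake_inference` is just a hypothetical CPU-bound stand-in for model scoring): each worker process has its own interpreter and its own GIL, so CPU-bound work runs in parallel across processes, whereas threads in a single process take turns holding one GIL.

```python
# Sketch: process-based workers (one GIL each, like Gunicorn workers)
# vs. threads in one process (shared GIL) for CPU-bound work.
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_inference(n: int) -> int:
    # CPU-bound stand-in for model scoring; holds the GIL while running.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(executor_cls, jobs: int, n: int) -> float:
    # Run `jobs` copies of the workload concurrently and time them.
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as ex:
        list(ex.map(fake_inference, [n] * jobs))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Threads share one GIL, so the loops are effectively serialized;
    # processes each have their own GIL and can use separate cores.
    print(f"threads:   {timed(ThreadPoolExecutor, 4, 1_000_000):.2f}s")
    print(f"processes: {timed(ProcessPoolExecutor, 4, 1_000_000):.2f}s")
```

This is the same reason multiple Gunicorn worker processes scale for CPU-bound inference even though each individual Python process is constrained by its GIL.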