To help with cold start, I ran some experiments: enable provisioned concurrency, lazy-load the transformers module inside sentiment, and prime the sentiment function when provisioned concurrency is enabled. This reduced the cold start time to 1~2 seconds, and the first predict call completes in about 1 second.
```python
import json
import os

def sentiment(payload):
    # Lazy-load transformers so on-demand invokes don't pay the import cost up front
    from transformers import pipeline
    clf = pipeline("sentiment-analysis", model="model/")
    prediction = clf(payload, return_all_scores=True)
    # convert the list of predictions to a dict of label -> score
    result = {}
    for pred in prediction[0]:
        result[pred["label"]] = pred["score"]
    return result

# Prime the sentiment function for provisioned concurrency
init_type = os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand")
if init_type == "provisioned-concurrency":
    payload = json.dumps({"fn_index": 0, "data": [
        "Running Gradio on AWS Lambda is amazing"], "session_hash": "fpx8ngrma3d"})
    sentiment(payload)
```
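For context on the priming guard: Lambda sets the AWS_LAMBDA_INITIALIZATION_TYPE environment variable to provisioned-concurrency in pre-provisioned execution environments, so the warm-up call above runs during the init phase of provisioned instances and is skipped entirely for on-demand ones.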
Gradio is a great framework for building ML applications quickly. Is there any way to deploy the static assets to a CDN? It would reduce the requests sent to the Gradio application and lower the concurrency.
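One option is to put a CloudFront distribution in front of the function and cache only the static paths at the edge. Below is a minimal sketch using AWS CDK v2 (Python); the Function URL domain and the /assets/* path pattern are assumptions for illustration, not taken from this example.

```python
from aws_cdk import Stack, aws_cloudfront as cloudfront, aws_cloudfront_origins as origins
from constructs import Construct

class GradioCdnStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Origin is the (hypothetical) Lambda Function URL serving the Gradio app
        gradio_origin = origins.HttpOrigin("xxxxxxxx.lambda-url.us-east-1.on.aws")

        cloudfront.Distribution(
            self, "GradioCdn",
            # Prediction traffic must reach Lambda every time, so no caching here
            default_behavior=cloudfront.BehaviorOptions(
                origin=gradio_origin,
                allowed_methods=cloudfront.AllowedMethods.ALLOW_ALL,
                cache_policy=cloudfront.CachePolicy.CACHING_DISABLED,
            ),
            additional_behaviors={
                # Cache the static frontend bundle at the edge (path assumed)
                "/assets/*": cloudfront.BehaviorOptions(
                    origin=gradio_origin,
                    cache_policy=cloudfront.CachePolicy.CACHING_OPTIMIZED,
                ),
            },
        )
```

With a behavior split like this, browsers pull the frontend bundle from CloudFront's edge caches, and only the prediction calls reach the Lambda function.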
Hey @bnusunny! Nice job! But normally, if you use provisioned concurrency there is no cold start at all, since the function stays warm; it also means you pay for it 24/7.
Yes and no. The first-invoke latency with provisioned concurrency depends on how much we initialize during the function init phase. That's why I added the priming step.
We can combine provisioned concurrency with Application Auto Scaling to automatically adjust the amount of provisioned concurrency. That lowers cold start latency while keeping the cost down.
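Here is a minimal sketch of that setup using boto3's Application Auto Scaling API. The function name, the live alias, the capacity bounds, and the 70% utilization target are illustrative assumptions.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "function:gradio-sentiment:live"  # alias-qualified Lambda (assumed name)

# Make the function's provisioned concurrency a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=1,    # keep at least one warm environment
    MaxCapacity=10,   # cap spend during traffic spikes
)

# Track utilization of the provisioned environments
autoscaling.put_scaling_policy(
    PolicyName="gradio-pc-target-tracking",
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Scale out when utilization exceeds 70%, scale in when it drops
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)
```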
Of course, if we had SnapStart support for container images, that would be ideal. Until then, provisioned concurrency is the best tool for lowering cold start time.
Thanks for this great example!