
Use provisioned concurrency to help with cold start #3

Open
bnusunny opened this issue Mar 5, 2023 · 3 comments


bnusunny commented Mar 5, 2023

Thanks for this great example!

To help with cold start, I did some experiments with provisioned concurrency: lazy-loading the transformers module inside sentiment and priming the sentiment method when provisioned concurrency is enabled. This reduced the cold start time to 1-2 seconds, and the first predict call completes in about 1 second.

import json
import os


def sentiment(payload):
    # Lazy import: transformers is only loaded the first time sentiment() is called
    from transformers import pipeline
    clf = pipeline("sentiment-analysis", model="model/")
    prediction = clf(payload, return_all_scores=True)

    # convert the list of {label, score} predictions into a single dict
    result = {}
    for pred in prediction[0]:
        result[pred["label"]] = pred["score"]
    return result


# Prime the sentiment function during init when provisioned concurrency is enabled
init_type = os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand")
if init_type == "provisioned-concurrency":
    # run one dummy prediction to warm the import, model files, and caches
    payload = json.dumps({"fn_index": 0, "data": [
        "Running Gradio on AWS Lambda is amazing"], "session_hash": "fpx8ngrma3d"})
    sentiment(payload)

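For reference, a minimal sketch of how provisioned concurrency could be enabled on a published alias with boto3, so that AWS_LAMBDA_INITIALIZATION_TYPE is set to "provisioned-concurrency" during init. The function name gradio-sentiment, the alias live, and the count of 1 are placeholders, not values from this repo:

import boto3

lambda_client = boto3.client("lambda")

# Provisioned concurrency is configured on a published version or alias, not on $LATEST.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="gradio-sentiment",
    Qualifier="live",
    ProvisionedConcurrentExecutions=1,
)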


bnusunny commented Mar 5, 2023

Gradio is a great framework for building ML applications quickly. Is there any way to deploy the static assets to a CDN? That would reduce the number of requests sent to the Gradio application and lower the required concurrency.

philschmid (Owner) commented

Hey @bnusunny! Nice job! But normally, if you use provisioned concurrency there is no cold start at all, since the function stays warm, which also means you have to pay for it 24/7.


bnusunny commented Mar 5, 2023

Yes and no. The first-invoke latency with provisioned concurrency depends on how much we initialize during the function init phase. That's why I added the priming step.

We can combine provisioned concurrency with Application Auto Scaling to adjust the amount of provisioned concurrency automatically. That keeps cold start latency low while reducing the cost.
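As an example, here is a rough sketch with boto3 and Application Auto Scaling, reusing the same hypothetical function/alias as above; the capacity bounds and the 70% utilization target are just placeholder values:

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the alias's provisioned concurrency as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId="function:gradio-sentiment:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=1,
    MaxCapacity=5,
)

# Track utilization of the provisioned concurrency and scale it up/down automatically
autoscaling.put_scaling_policy(
    PolicyName="gradio-sentiment-pc-tracking",
    ServiceNamespace="lambda",
    ResourceId="function:gradio-sentiment:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)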

Of course, if we had SnapStart support for container images, that would be best. Until then, provisioned concurrency is the best tool for lowering cold start time.
