
Use provisioned concurrency to help with cold start #3

Open
bnusunny opened this issue Mar 5, 2023 · 3 comments


bnusunny commented Mar 5, 2023

Thanks for this great example!

To help with cold start, I did some experiments with provisioned concurrency: lazy-loading the transformers module inside sentiment and priming the sentiment method when provisioned concurrency is enabled. This reduced the cold start time to 1-2 seconds, and the first predict call completes in about 1 second.

import json
import os


def sentiment(payload):
    # Lazy import: transformers is only loaded the first time sentiment() is called
    from transformers import pipeline
    clf = pipeline("sentiment-analysis", model="model/")
    prediction = clf(payload, return_all_scores=True)

    # convert the list of {label, score} predictions into a single dict
    result = {}
    for pred in prediction[0]:
        result[pred["label"]] = pred["score"]
    return result


# Prime the sentiment function during init when provisioned concurrency is enabled
init_type = os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand")
if init_type == "provisioned-concurrency":
    # run one dummy prediction to warm the import, model files, and caches
    payload = json.dumps({"fn_index": 0, "data": [
        "Running Gradio on AWS Lambda is amazing"], "session_hash": "fpx8ngrma3d"})
    sentiment(payload)

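For reference, a minimal sketch of how provisioned concurrency could be enabled on a published alias with boto3, so that AWS_LAMBDA_INITIALIZATION_TYPE is set to "provisioned-concurrency" during init. The function name gradio-sentiment, the alias live, and the count of 1 are placeholders, not values from this repo:

import boto3

lambda_client = boto3.client("lambda")

# Provisioned concurrency is configured on a published version or alias, not on $LATEST.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="gradio-sentiment",
    Qualifier="live",
    ProvisionedConcurrentExecutions=1,
)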


bnusunny commented Mar 5, 2023

Gradio is a great framework for building ML applications quickly. Is there any way to deploy the static assets to a CDN? That would reduce the number of requests sent to the Gradio application and lower the required concurrency.

philschmid (Owner) commented

Hey @bnusunny! Nice job! But normally, if you use provisioned concurrency there is no cold start at all, since the function stays warm, which also means you have to pay for it 24/7.


bnusunny commented Mar 5, 2023

Yes and no. The first-invoke latency with provisioned concurrency depends on how much we initialize during the function init phase. That's why I added the priming step.

We can combine provisioned concurrency with Application Auto Scaling to adjust the amount of provisioned concurrency automatically. That keeps cold start latency low while reducing the cost.
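As an example, here is a rough sketch with boto3 and Application Auto Scaling, reusing the same hypothetical function/alias as above; the capacity bounds and the 70% utilization target are just placeholder values:

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the alias's provisioned concurrency as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId="function:gradio-sentiment:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=1,
    MaxCapacity=5,
)

# Track utilization of the provisioned concurrency and scale it up/down automatically
autoscaling.put_scaling_policy(
    PolicyName="gradio-sentiment-pc-tracking",
    ServiceNamespace="lambda",
    ResourceId="function:gradio-sentiment:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)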

Of course, if we had SnapStart support for container images, that would be best. Until then, provisioned concurrency is the best tool for lowering cold start time.
