SuperLaser provides a suite of tools and scripts for deploying LLMs onto RunPod's serverless infrastructure. At runtime, the deployment uses a containerized vLLM engine for memory-efficient, high-performance inference.
- Scalable Deployment: Easily scale your LLM inference tasks with vLLM and RunPod serverless capabilities.
- Cost-Effective: Optimize hardware usage with features such as tensor parallelism for efficient use of GPU resources.
- OpenAI-Compatible API: Use the OpenAI client for chat, completion, and streaming requests.
pip install superlaser
Before you begin, ensure you have:
- A RunPod account.
The first step is to obtain an API key from RunPod. In your account's console, go to the Settings section and click on API Keys.
After obtaining a key, set it as an environment variable:
export RUNPOD_API_KEY=<YOUR-API-KEY>
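To confirm the key is visible to your Python session before running the examples below, an optional sanity check (just an illustration, not part of SuperLaser):

import os

# Optional: fail fast if the key isn't set in this environment.
assert os.environ.get("RUNPOD_API_KEY"), "RUNPOD_API_KEY is not set"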
Before spinning up a serverless endpoint, let's first configure a template that we'll pass into the endpoint during staging. The template lets you set vLLM's Docker image, the model, and the container and volume disk space:
import os

from superlaser import RunpodHandler as runpod

api_key = os.environ.get("RUNPOD_API_KEY")

template_data = runpod.set_template(
    serverless="true",
    template_name="superlaser-inf",  # Give a name to your template
    container_image="runpod/worker-vllm:0.3.1-cuda12.1.0",  # Docker image stub
    model_name="mistralai/Mistral-7B-v0.1",  # Hugging Face model stub
    max_model_length=340,  # Maximum number of tokens for the engine to handle per request
    container_disk=15,  # Container disk size (GB)
    volume_disk=15,  # Volume disk size (GB)
)

template = runpod(api_key, data=template_data)
print(template().text)
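Instead of printing the raw text as above, you can parse and pretty-print the response to make the template ID easier to spot (a minimal sketch, assuming the response body is JSON, as the returned data dictionary suggests):

import json

# Alternative to `print(template().text)`: capture one response and
# pretty-print the parsed JSON to locate the template ID.
response = template()
print(json.dumps(json.loads(response.text), indent=2))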
After your template is created, the call returns a data dictionary that includes your template ID. We will pass this template ID when configuring the serverless endpoint in the section below:
endpoint_data = runpod.set_endpoint(
    gpu_ids="AMPERE_24",  # Options: "AMPERE_16,AMPERE_24,AMPERE_48,AMPERE_80,ADA_24"
    idle_timeout=5,  # Seconds a worker stays alive after its last request
    name="vllm_endpoint",
    scaler_type="QUEUE_DELAY",
    scaler_value=1,
    template_id="template-id",  # Template ID returned in the previous step
    workers_max=1,
    workers_min=0,
)

endpoint = runpod(api_key, data=endpoint_data)
print(endpoint().text)
After your endpoint is staged, it will return a dictionary with your endpoint ID. Pass this endpoint ID to the OpenAI
client and start making API requests!
from openai import OpenAI

endpoint_id = "your-endpoint-id"  # Endpoint ID returned in the previous step

client = OpenAI(
    api_key=api_key,
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
)

stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "To be or not to be"}],
    temperature=0,
    max_tokens=100,
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
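Streaming is optional; the same endpoint also serves standard chat completions, in which case the full reply arrives in a single response object (a minimal sketch reusing the client above):

# Non-streaming variant of the chat request above.
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "To be or not to be"}],
    temperature=0,
    max_tokens=100,
)
print(response.choices[0].message.content)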
stream = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",  # Should match the model served by your endpoint
    prompt="To be or not to be",
    temperature=0,
    max_tokens=100,
    stream=True,
)

for response in stream:
    print(response.choices[0].text or "", end="", flush=True)