This is just a stub that sets up the project structure and a basic API server and test (inspired by vllm). Unlike some of the other implementations, this will be fairly thin, since most of the work will be done in a companion project focused on compilation.
1 parent 9d929b0, commit e7f0f94
Showing 19 changed files with 291 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
iree-compiler==20240129.785 | ||
iree-runtime==20240129.785 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
numpy>=1.26.3 | ||
onnx>=1.15.0 | ||
pytest>=8.0.0 | ||
pytest-xdist>=3.5.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Typing packages needed for full mypy execution at the project level. | ||
mypy==1.8.0 | ||
types-requests |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Turbine Serving Infrastructure | ||
|
||
This sub-project contains components and infrastructure for serving various | ||
forms of Turbine compiled models. Instead of coming with models, it defines | ||
ABIs that compiled models should adhere to in order to be served. It then | ||
allows them to be delivered as web endpoints via popular APIs. | ||
|
||
As emulation can be the sincerest form of flattery, this project derives | ||
substantial inspiration from vllm and the OpenAI APIs, emulating and | ||
interopping with them where possible. It is intended to be the lightest | ||
weight possible reference implementation for serving models with an | ||
opinionated compiled form, built elsewhere in the project. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
[mypy] | ||
|
||
explicit_package_bases = True | ||
mypy_path = $MYPY_CONFIG_FILE_DIR | ||
packages = turbine_serving.llm |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[build-system] | ||
requires = ["setuptools", "wheel"] | ||
build-backend = "setuptools.build_meta" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
fastapi>=0.109.2 | ||
uvicorn>=0.27.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
[tool:pytest] | ||
testpaths = | ||
./tests | ||
filterwarnings = | ||
# TODO: Remove once flatbuffer 'imp' usage resolved. | ||
ignore::DeprecationWarning |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
# Copyright 2024 Advanced Micro Devices, Inc | ||
# | ||
# Licensed under the Apache License v2.0 with LLVM Exceptions. | ||
# See https://llvm.org/LICENSE.txt for license information. | ||
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
|
||
import json | ||
import os | ||
import distutils.command.build | ||
from pathlib import Path | ||
|
||
from setuptools import find_namespace_packages, setup # type: ignore | ||
|
||
THIS_DIR = Path(__file__).resolve().parent | ||
REPO_DIR = THIS_DIR.parent | ||
VERSION_INFO_FILE = REPO_DIR / "version_info.json" | ||
|
||
|
||
with open( | ||
os.path.join( | ||
REPO_DIR, | ||
"README.md", | ||
), | ||
"rt", | ||
) as f: | ||
README = f.read() | ||
|
||
|
||
def load_version_info(): | ||
with open(VERSION_INFO_FILE, "rt") as f: | ||
return json.load(f) | ||
|
||
|
||
version_info = load_version_info() | ||
PACKAGE_VERSION = version_info["package-version"] | ||
|
||
packages = find_namespace_packages( | ||
include=[ | ||
"turbine_serving", | ||
"turbine_serving.*", | ||
], | ||
) | ||
|
||
print("Found packages:", packages) | ||
|
||
# Lookup version pins from requirements files. | ||
requirement_pins = {} | ||
|
||
|
||
def load_requirement_pins(requirements_file: Path): | ||
with open(requirements_file, "rt") as f: | ||
lines = f.readlines() | ||
pin_pairs = [line.strip().split("==") for line in lines if "==" in line] | ||
requirement_pins.update(dict(pin_pairs)) | ||
|
||
|
||
load_requirement_pins(THIS_DIR / "requirements.txt") | ||
load_requirement_pins(REPO_DIR / "core" / "iree-requirements.txt") | ||
load_requirement_pins(REPO_DIR / "core" / "misc-requirements.txt") | ||
|
||
|
||
def get_version_spec(dep: str): | ||
if dep in requirement_pins: | ||
return f">={requirement_pins[dep]}" | ||
else: | ||
return "" | ||
|
||
|
||
# Override build command so that we can build into _python_build | ||
# instead of the default "build". This avoids collisions with | ||
# typical CMake incantations, which can produce all kinds of | ||
# hilarity (like including the contents of the build/lib directory). | ||
class BuildCommand(distutils.command.build.build): | ||
def initialize_options(self): | ||
distutils.command.build.build.initialize_options(self) | ||
self.build_base = "_python_build" | ||
|
||
|
||
setup( | ||
name="turbine-serving", | ||
version=f"{PACKAGE_VERSION}", | ||
author="SHARK Authors", | ||
author_email="[email protected]", | ||
description="SHARK Turbine Machine Learning Deployment Tools", | ||
long_description=README, | ||
long_description_content_type="text/markdown", | ||
url="https://github.com/nod-ai/SHARK-Turbine", | ||
license="Apache-2.0", | ||
classifiers=[ | ||
"Development Status :: 3 - Alpha", | ||
"License :: OSI Approved :: Apache Software License", | ||
"Programming Language :: Python :: 3", | ||
], | ||
packages=packages, | ||
package_data={"turbine_serving": ["py.typed"]}, | ||
install_requires=[ | ||
f"fastapi{get_version_spec('fastapi')}", | ||
f"iree-compiler{get_version_spec('iree-compiler')}", | ||
f"iree-runtime{get_version_spec('iree-runtime')}", | ||
f"uvicorn{get_version_spec('uvicorn')}", | ||
], | ||
extras_require={ | ||
"testing": [ | ||
f"pytest{get_version_spec('pytest')}", | ||
f"pytest-xdist{get_version_spec('pytest-xdist')}", | ||
], | ||
}, | ||
cmdclass={"build": BuildCommand}, | ||
) |
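
The pin lookup in the setup script above only records lines containing `==`, and `get_version_spec` then relaxes each exact pin into a `>=` lower bound. A minimal sketch of that behavior follows, assuming in-memory sample lines instead of files on disk; `parse_requirement_pins` is a hypothetical name used here for illustration, and `get_version_spec` takes the pin table as an explicit argument rather than reading a module-level dict.

```python
from typing import Dict, Iterable


def parse_requirement_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Collect exact pins the way the setup script does: only '==' lines count."""
    pins: Dict[str, str] = {}
    for line in lines:
        if "==" in line:
            name, version = line.strip().split("==")
            pins[name] = version
    return pins


def get_version_spec(pins: Dict[str, str], dep: str) -> str:
    """Turn an exact pin into a '>=' lower bound; unpinned deps get no spec."""
    return f">={pins[dep]}" if dep in pins else ""


# Sample lines copied from the requirements files shown in this commit.
sample_lines = [
    "iree-compiler==20240129.785\n",
    "iree-runtime==20240129.785\n",
    "fastapi>=0.109.2\n",
    "uvicorn>=0.27.0\n",
]
pins = parse_requirement_pins(sample_lines)

for dep in ("iree-compiler", "fastapi"):
    # Prints "iree-compiler>=20240129.785", then "fastapi" with no specifier.
    print(f"{dep}{get_version_spec(pins, dep)}")
```

One consequence worth noting: requirements written with `>=` (such as `fastapi>=0.109.2`) are skipped by the `==` filter, so those packages end up in `install_requires` without any version constraint.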
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# Copyright 2024 Advanced Micro Devices, Inc | ||
# | ||
# Licensed under the Apache License v2.0 with LLVM Exceptions. | ||
# See https://llvm.org/LICENSE.txt for license information. | ||
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
|
||
import os | ||
import pytest | ||
import requests | ||
import subprocess | ||
import sys | ||
import time | ||
|
||
|
||
class ServerRunner: | ||
def __init__(self, args): | ||
self.url = "http://localhost:8000" | ||
env = os.environ.copy() | ||
env["PYTHONUNBUFFERED"] = "1" | ||
self.process = subprocess.Popen( | ||
[ | ||
sys.executable, | ||
"-m", | ||
"turbine_serving.llm.entrypoints.api_server", | ||
] | ||
+ args, | ||
env=env, | ||
stdout=sys.stdout, | ||
stderr=sys.stderr, | ||
) | ||
self._wait_for_ready() | ||
|
||
def _wait_for_ready(self): | ||
start = time.time() | ||
while True: | ||
try: | ||
if requests.get(f"{self.url}/health").status_code == 200: | ||
return | ||
except Exception as e: | ||
if self.process.poll() is not None: | ||
raise RuntimeError("API server process terminated") from e | ||
time.sleep(0.25) | ||
if time.time() - start > 30: | ||
raise RuntimeError("Timeout waiting for server start") | ||
|
||
def __del__(self): | ||
try: | ||
process = self.process | ||
except AttributeError: | ||
pass | ||
else: | ||
process.terminate() | ||
process.wait() | ||
|
||
|
||
@pytest.fixture(scope="session") | ||
def server(): | ||
runner = ServerRunner([]) | ||
yield runner | ||
|
||
|
||
def test_basic(server: ServerRunner): | ||
... |
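
The readiness loop in `ServerRunner._wait_for_ready` is a poll-with-timeout pattern. The sketch below isolates that pattern using only the standard library, so it can be run without a server. `wait_until_ready` and `fake_probe` are hypothetical names introduced here, and the probe stands in for the `requests.get(f"{self.url}/health")` check; this is not the test's actual implementation.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.25,
) -> float:
    """Poll `probe` until it returns True; return the elapsed seconds.

    Raises TimeoutError if `timeout` seconds pass first.
    """
    start = time.monotonic()
    while True:
        try:
            if probe():
                return time.monotonic() - start
        except OSError:
            # Connection errors are expected while the server is starting.
            pass
        if time.monotonic() - start > timeout:
            raise TimeoutError("Timeout waiting for server start")
        time.sleep(interval)


# Hypothetical probe that fails twice and then succeeds, standing in
# for an HTTP health check against a server that is still booting.
attempts = {"count": 0}


def fake_probe() -> bool:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionRefusedError("server not up yet")
    return True


elapsed = wait_until_ready(fake_probe, timeout=5.0, interval=0.01)
print(f"ready after {attempts['count']} attempts")  # ready after 3 attempts
```

The sketch uses `time.monotonic()` rather than `time.time()` so that wall-clock adjustments cannot shorten or extend the timeout, and it raises the timeout error outside the `except` block so that it does not depend on a caught exception being in scope.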
Empty file.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
# Copyright 2024 Advanced Micro Devices, Inc | ||
# | ||
# Licensed under the Apache License v2.0 with LLVM Exceptions. | ||
# See https://llvm.org/LICENSE.txt for license information. | ||
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
|
||
from typing import Sequence | ||
|
||
import argparse | ||
|
||
from fastapi import FastAPI, Request | ||
from fastapi.responses import JSONResponse, Response | ||
import sys | ||
import uvicorn | ||
|
||
app = FastAPI() | ||
|
||
|
||
@app.get("/health") | ||
async def health() -> Response: | ||
return Response(status_code=200) | ||
|
||
|
||
def main(clargs: Sequence[str]): | ||
parser = argparse.ArgumentParser() | ||
parser.add_argument("--host", type=str, default=None) | ||
parser.add_argument("--port", type=int, default=8000) | ||
parser.add_argument( | ||
"--root-path", | ||
type=str, | ||
default=None, | ||
help="Root path to use when installing behind a path-based proxy.", | ||
) | ||
parser.add_argument( | ||
"--timeout-keep-alive", type=int, default=5, help="Keep alive timeout" | ||
) | ||
args = parser.parse_args(clargs) | ||
|
||
app.root_path = args.root_path | ||
uvicorn.run( | ||
app, | ||
host=args.host, | ||
port=args.port, | ||
log_level="debug", | ||
timeout_keep_alive=args.timeout_keep_alive, | ||
) | ||
|
||
|
||
if __name__ == "__main__": | ||
main(sys.argv[1:]) |
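
To show how the command-line flags above resolve, the following sketch rebuilds the same argument parser in isolation and parses two argument lists, without starting uvicorn. `build_parser` and `parse` are hypothetical helper names for illustration; the flags and defaults are the ones defined in `main`.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags and defaults defined in main() of the API server.
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default=None)
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--root-path", type=str, default=None)
    parser.add_argument("--timeout-keep-alive", type=int, default=5)
    return parser


def parse(clargs: Sequence[str]) -> argparse.Namespace:
    return build_parser().parse_args(clargs)


defaults = parse([])
custom = parse(["--host", "0.0.0.0", "--port", "9000", "--root-path", "/llm"])

print(defaults.port, defaults.timeout_keep_alive)  # 8000 5
print(custom.host, custom.port, custom.root_path)  # 0.0.0.0 9000 /llm
```

Note that argparse converts dashes in flag names to underscores, which is why `--root-path` and `--timeout-keep-alive` are read back as `args.root_path` and `args.timeout_keep_alive`. The test in this commit starts the server as `python -m turbine_serving.llm.entrypoints.api_server` with no extra arguments, so it relies on the default port of 8000.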
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Marker file for PEP 561 inline type checking. |