[Doc] Group examples into categories (vllm-project#11782)
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Fred Reiss <[email protected]>
hmellor authored and frreiss committed Jan 10, 2025
1 parent ae27a6e commit 0c92a77
Showing 13 changed files with 240 additions and 62 deletions.
5 changes: 1 addition & 4 deletions .gitignore
@@ -79,10 +79,7 @@ instance/

# Sphinx documentation
docs/_build/
docs/source/getting_started/examples/*.rst
!**/*.template.rst
docs/source/getting_started/examples/*.md
!**/*.template.md
docs/source/getting_started/examples/

# PyBuilder
.pybuilder/
4 changes: 4 additions & 0 deletions docs/Makefile
@@ -18,3 +18,7 @@ help:
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

clean:
@$(SPHINXBUILD) -M clean "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
rm -rf "$(SOURCEDIR)/getting_started/examples"
1 change: 1 addition & 0 deletions docs/requirements-docs.txt
@@ -3,6 +3,7 @@ sphinx-book-theme==1.0.1
sphinx-copybutton==0.5.2
myst-parser==3.0.1
sphinx-argparse==0.4.0
sphinx-togglebutton==0.3.2
msgspec
cloudpickle

4 changes: 4 additions & 0 deletions docs/source/conf.py
@@ -43,6 +43,10 @@
"sphinx.ext.autosummary",
"myst_parser",
"sphinxarg.ext",
"sphinx_togglebutton",
]
myst_enable_extensions = [
"colon_fence",
]

# Add any paths that contain templates here, relative to this directory.
264 changes: 222 additions & 42 deletions docs/source/generate_examples.py
@@ -1,54 +1,234 @@
import itertools
import re
from dataclasses import dataclass, field
from pathlib import Path

ROOT_DIR = Path(__file__).parent.parent.parent.resolve()
ROOT_DIR_RELATIVE = '../../../..'
EXAMPLE_DIR = ROOT_DIR / "examples"
EXAMPLE_DOC_DIR = ROOT_DIR / "docs/source/getting_started/examples"


def fix_case(text: str) -> str:
subs = [
("api", "API"),
("llm", "LLM"),
("vllm", "vLLM"),
("openai", "OpenAI"),
("multilora", "MultiLoRA"),
]
for sub in subs:
text = re.sub(*sub, text, flags=re.IGNORECASE)
subs = {
"api": "API",
"cpu": "CPU",
"llm": "LLM",
"tpu": "TPU",
"aqlm": "AQLM",
"gguf": "GGUF",
"lora": "LoRA",
"vllm": "vLLM",
"openai": "OpenAI",
"multilora": "MultiLoRA",
"mlpspeculator": "MLPSpeculator",
r"fp\d+": lambda x: x.group(0).upper(), # e.g. fp16, fp32
r"int\d+": lambda x: x.group(0).upper(), # e.g. int8, int16
}
for pattern, repl in subs.items():
text = re.sub(rf'\b{pattern}\b', repl, text, flags=re.IGNORECASE)
return text


def generate_title(filename: str) -> str:
# Turn filename into a title
title = filename.replace("_", " ").title()
# Handle acronyms and names
title = fix_case(title)
return f"# {title}"
@dataclass
class Index:
"""
Index class to generate a structured document index.
Attributes:
path (Path): The path to save the index file to.
title (str): The title of the index.
description (str): A brief description of the index.
caption (str): An optional caption for the table of contents.
maxdepth (int): The maximum depth of the table of contents. Defaults to 1.
documents (list[str]): A list of document paths to include in the index. Defaults to an empty list.
Methods:
generate() -> str:
Generates the index content as a string in the specified format.
""" # noqa: E501
path: Path
title: str
description: str
caption: str
maxdepth: int = 1
documents: list[str] = field(default_factory=list)

def generate(self) -> str:
content = f"# {self.title}\n\n{self.description}\n\n"
content += "```{toctree}\n"
content += f":caption: {self.caption}\n:maxdepth: {self.maxdepth}\n"
content += "\n".join(sorted(self.documents)) + "\n```\n"
return content


@dataclass
class Example:
"""
Example class for generating documentation content from a given path.
Attributes:
path (Path): The path to the main directory or file.
category (str): The category of the document.
main_file (Path): The main file in the directory.
other_files (list[Path]): List of other files in the directory.
title (str): The title of the document.
Methods:
__post_init__(): Initializes the main_file, other_files, and title attributes.
determine_main_file() -> Path: Determines the main file in the given path.
determine_other_files() -> list[Path]: Determines other files in the directory excluding the main file.
determine_title() -> str: Determines the title of the document.
generate() -> str: Generates the documentation content.
""" # noqa: E501
path: Path
category: str = None
main_file: Path = field(init=False)
other_files: list[Path] = field(init=False)
title: str = field(init=False)

def __post_init__(self):
self.main_file = self.determine_main_file()
self.other_files = self.determine_other_files()
self.title = self.determine_title()

def determine_main_file(self) -> Path:
"""
Determines the main file in the given path.
If the path is a file, it returns the path itself. Otherwise, it searches
for Markdown files (*.md) in the directory and returns the first one found.
Returns:
Path: The main file path, either the original path if it's a file or the first
Markdown file found in the directory.
Raises:
IndexError: If no Markdown files are found in the directory.
""" # noqa: E501
return self.path if self.path.is_file() else list(
self.path.glob("*.md")).pop()

def determine_other_files(self) -> list[Path]:
"""
Determine other files in the directory excluding the main file.
This method checks if the given path is a file. If it is, it returns an empty list.
Otherwise, it recursively searches through the directory and returns a list of all
files that are not the main file.
Returns:
list[Path]: A list of Path objects representing the other files in the directory.
""" # noqa: E501
if self.path.is_file():
return []
is_other_file = lambda file: file.is_file() and file != self.main_file
return [file for file in self.path.rglob("*") if is_other_file(file)]

def determine_title(self) -> str:
return fix_case(self.path.stem.replace("_", " ").title())

def generate(self) -> str:
# Convert the path to a relative path from __file__
make_relative = lambda path: ROOT_DIR_RELATIVE / path.relative_to(
ROOT_DIR)

content = f"Source <gh-file:{self.path.relative_to(ROOT_DIR)}>.\n\n"
if self.main_file.suffix == ".py":
content += f"# {self.title}\n\n"
include = "include" if self.main_file.suffix == ".md" else \
"literalinclude"
content += f":::{{{include}}} {make_relative(self.main_file)}\n:::\n\n"

if not self.other_files:
return content

content += "## Example materials\n\n"
for file in self.other_files:
include = "include" if file.suffix == ".md" else "literalinclude"
content += f":::{{admonition}} {file.relative_to(self.path)}\n"
content += ":class: dropdown\n\n"
content += f":::{{{include}}} {make_relative(file)}\n:::\n"
content += ":::\n\n"

return content


def generate_examples():
root_dir = Path(__file__).parent.parent.parent.resolve()

# Source paths
script_dir = root_dir / "examples"
script_paths = sorted(script_dir.glob("*.py"))

# Destination paths
doc_dir = root_dir / "docs/source/getting_started/examples"
doc_paths = [doc_dir / f"{path.stem}.md" for path in script_paths]

# Generate the example docs for each example script
for script_path, doc_path in zip(script_paths, doc_paths):
# Make script_path relative to doc_path and call it include_path
include_path = '../../../..' / script_path.relative_to(root_dir)
content = (f"{generate_title(doc_path.stem)}\n\n"
f"Source: <gh-file:examples/{script_path.name}>.\n\n"
f"```{{literalinclude}} {include_path}\n"
":language: python\n"
":linenos:\n```")
# Create the EXAMPLE_DOC_DIR if it doesn't exist
if not EXAMPLE_DOC_DIR.exists():
EXAMPLE_DOC_DIR.mkdir(parents=True)

# Create empty indices
examples_index = Index(
path=EXAMPLE_DOC_DIR / "examples_index.md",
title="Examples",
description=
"A collection of examples demonstrating usage of vLLM.\nAll documented examples are autogenerated using <gh-file:docs/source/generate_examples.py> from examples found in <gh-file:examples>.", # noqa: E501
caption="Examples",
maxdepth=1) # TODO change to 2 when examples start being categorised
category_indices = {
"offline_inference":
Index(
path=EXAMPLE_DOC_DIR / "examples_offline_inference_index.md",
title="Offline Inference",
description=
"Offline inference examples demonstrate how to use vLLM in an offline setting, where the model is queried for predictions in batches.", # noqa: E501
caption="Examples",
),
"online_serving":
Index(
path=EXAMPLE_DOC_DIR / "examples_online_serving_index.md",
title="Online Serving",
description=
"Online serving examples demonstrate how to use vLLM in an online setting, where the model is queried for predictions in real-time.", # noqa: E501
caption="Examples",
),
"other":
Index(
path=EXAMPLE_DOC_DIR / "examples_other_index.md",
title="Other",
description=
"Other examples that don't strongly fit into the online or offline serving categories.", # noqa: E501
caption="Examples",
),
}

examples = []
# Find categorised examples
for category in category_indices:
category_dir = EXAMPLE_DIR / category
py = category_dir.glob("*.py")
md = category_dir.glob("*.md")
for path in itertools.chain(py, md):
examples.append(Example(path, category))
# Find examples in subdirectories
for path in category_dir.glob("*/*.md"):
examples.append(Example(path.parent, category))
# Find uncategorised examples
py = EXAMPLE_DIR.glob("*.py")
md = EXAMPLE_DIR.glob("*.md")
for path in itertools.chain(py, md):
examples.append(Example(path))
# Find examples in subdirectories
for path in EXAMPLE_DIR.glob("*/*.md"):
# Skip categorised examples
if path.parent.name in category_indices:
continue
examples.append(Example(path.parent))

# Generate the example documentation
for example in examples:
doc_path = EXAMPLE_DOC_DIR / f"{example.path.stem}.md"
with open(doc_path, "w+") as f:
f.write(content)

# Generate the toctree for the example scripts
with open(doc_dir / "examples_index.template.md") as f:
examples_index = f.read()
with open(doc_dir / "examples_index.md", "w+") as f:
example_docs = "\n".join(path.stem + ".md" for path in script_paths)
f.write(examples_index.replace(r"%EXAMPLE_DOCS%", example_docs))
f.write(example.generate())
# Add the example to the appropriate index
index = category_indices.get(example.category, examples_index)
index.documents.append(example.path.stem)

# Generate the index files
for category_index in category_indices.values():
if category_index.documents:
examples_index.documents.insert(0, category_index.path.name)
with open(category_index.path, "w+") as f:
f.write(category_index.generate())

with open(examples_index.path, "w+") as f:
f.write(examples_index.generate())
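
The title normalization performed by `fix_case` above can be tried in isolation. The following is a minimal sketch, not the committed code: it uses a trimmed substitution table, and `SUBS` and `title_from_stem` are names introduced here for illustration only. It shows how the `\b` word boundaries and the callable replacement interact with `str.title()`.

```python
import re

# Trimmed, illustrative subset of the substitution table (assumption:
# the committed table is longer; only the mechanism is shown here).
SUBS = {
    "api": "API",
    "llm": "LLM",
    "lora": "LoRA",
    "vllm": "vLLM",
    "openai": "OpenAI",
    # A callable replacement upper-cases whatever matched, e.g. fp8 -> FP8.
    r"fp\d+": lambda m: m.group(0).upper(),
}


def fix_case(text: str) -> str:
    """Restore the preferred casing of known acronyms and names."""
    for pattern, repl in SUBS.items():
        # \b restricts each pattern to whole words, so "llm" does not
        # fire inside "vllm".
        text = re.sub(rf"\b{pattern}\b", repl, text, flags=re.IGNORECASE)
    return text


def title_from_stem(stem: str) -> str:
    """Hypothetical helper: turn a file stem into a display title."""
    return fix_case(stem.replace("_", " ").title())
```

With hypothetical file stems, `title_from_stem("openai_chat_completion_client")` gives `"OpenAI Chat Completion Client"` and `title_from_stem("fp8_lora")` gives `"FP8 LoRA"`.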

This file was deleted.

6 changes: 3 additions & 3 deletions examples/fp8/README.md
@@ -56,7 +56,7 @@ python3 examples/fp8/extract_scales.py --quantized_model <QUANTIZED_MODEL_DIR> -
```
### 4. Load KV Cache Scaling Factors into VLLM.
This script evaluates the inference throughput of language models using various backends such as vLLM. It measures the time taken to process a given number of prompts and generate sequences for each prompt. The recently generated KV cache scaling factors are now integrated into the benchmarking process and allow for KV cache scaling factors to be utilized for FP8.
```python
```
# prerequisites:
# - LLaMa 2 kv_cache_scales.json file
@@ -90,7 +90,7 @@ optional arguments:
--kv-cache-dtype {auto,fp8} Data type for kv cache storage. If "auto", will use model data type. FP8_E5M2 (without scaling) is only supported on cuda version greater than 11.8. On ROCm (AMD GPU), FP8_E4M3 is instead supported for common inference criteria.
--quantization-param-path QUANT_PARAM_JSON Path to the JSON file containing the KV cache scaling factors. This should generally be supplied, when KV cache dtype is FP8. Otherwise, KV cache scaling factors default to 1.0, which may cause accuracy issues. FP8_E5M2 (without scaling) is only supported on cuda version greater than 11.8. On ROCm (AMD GPU), FP8_E4M3 is instead supported for common inference criteria.
```
```
Example:
```console
python3 benchmarks/benchmark_throughput.py --input-len <INPUT_LEN> --output-len <OUTPUT_LEN> -tp <TENSOR_PARALLEL_SIZE> --kv-cache-dtype fp8 --quantization-param-path <path/to/kv_cache_scales.json> --model <path-to-llama2>
```python
```
File renamed without changes.
File renamed without changes.
@@ -1,12 +1,12 @@
# vLLM + Prometheus/Grafana
# Prometheus and Grafana

This is a simple example that shows how to connect vLLM metric logging to the Prometheus/Grafana stack. For this example, we launch Prometheus and Grafana via Docker. You can check out other methods on the [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/) websites.

Install:
- [`docker`](https://docs.docker.com/engine/install/)
- [`docker compose`](https://docs.docker.com/compose/install/linux/#install-using-the-repository)

### Launch
## Launch

Prometheus metric logging is enabled by default in the OpenAI-compatible server. Launch via the entrypoint:
```bash
@@ -35,19 +35,19 @@ python3 ../../benchmarks/benchmark_serving.py \

Navigating to [`http://localhost:8000/metrics`](http://localhost:8000/metrics) will show the raw Prometheus metrics being exposed by vLLM.
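
The same endpoint can also be read programmatically. The following is a minimal sketch, assuming the server is listening on `localhost:8000` and that sample lines carry no trailing timestamp. `fetch_metrics` and `parse_metrics` are hypothetical helpers, not part of vLLM.

```python
from urllib.request import urlopen


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> str:
    """Download the raw Prometheus text exposition from the server."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_metrics(text: str) -> dict[str, float]:
    """Map each sample (name plus labels) to its numeric value.

    Assumption: every sample line has the form `name{labels} value`
    with no timestamp after the value.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples
```

Calling `parse_metrics(fetch_metrics())` while the server is running would return a dictionary keyed by sample name; the exact metric names depend on the vLLM version in use.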

### Grafana Dashboard
## Grafana Dashboard

Navigate to [`http://localhost:3000`](http://localhost:3000). Log in with the default username (`admin`) and password (`admin`).

#### Add Prometheus Data Source
### Add Prometheus Data Source

Navigate to [`http://localhost:3000/connections/datasources/new`](http://localhost:3000/connections/datasources/new) and select Prometheus.

On the Prometheus configuration page, add the `Prometheus Server URL` under `Connection`. In this setup, Grafana and Prometheus run in separate containers, but Docker creates a DNS name for each container, so you can use `http://prometheus:9090`.

Click `Save & Test`. You should get a green check saying "Successfully queried the Prometheus API.".

#### Import Dashboard
### Import Dashboard

Navigate to [`http://localhost:3000/dashboard/import`](http://localhost:3000/dashboard/import), upload `grafana.json`, and select the `prometheus` datasource. You should see a screen that looks like the following:

File renamed without changes.
File renamed without changes.
