Feat/use vllm server #40

saattrupdan · 2024-05-22T09:32:55Z

This changes the vLLM generator to using the vLLM server instead of the Python interface. This simplifies things a bit, in that we can re-use the OpenaiGenerator code, but is also required to enable streaming with vLLM models.

This defaults to starting a new server in a background process, but also allows running a separate server and setting generator.server=<url-to-server> to use the existing one.

… one

…ator

…'t exist

saattrupdan added 24 commits May 21, 2024 16:35

feat: Use vllm server

2d0f5f7

fix: Use self.model

223cd96

debug

50c907f

debug

d997f28

feat: Try using guided_json

24fd935

fix: Use extra_body

984079f

fix: Use json

309baac

fix: Use model_json_schema

35fab6c

chore: Include response_format

9b63001

chore: Logging

0061921

chore: Remove logging

86745cb

debug

25c8e07

fix: Do not break streaming if chunk_str is None

7e0528c

debug

4fef832

feat: Spawn new vLLM server if not already running

05a96d3

fix: Do not use api_key if running vLLM generator

ff9a16e

fix: vLLM config

d5b8f6b

chore: Remove breakpoint

c1dc715

debug

a73d98d

debug

b9e72f0

fix: Set server after booting it

6622b7a

debug

acde691

debug

ef4c33e

fix: Add sleep after server start

00b38bd

saattrupdan requested a review from AJDERS May 22, 2024 09:32

saattrupdan self-assigned this May 22, 2024

saattrupdan added 4 commits May 22, 2024 11:34

fix: Only require CUDA to start the vLLM inference server, not to use…

d532c77

… one

fix: Only set guided_json if using vLLM

dc0be2c

tests: vLLM tests

3fd39c6

feat: Add more args to vLLM server

ad04ed1

saattrupdan added 28 commits May 22, 2024 12:44

fix: Up vLLM startup sleep time

dedb032

debug

598f286

debug

36e6d0b

debug

192721a

debug

89a56c2

fix: Add port back in

a2936b6

fix: Set up self.server in OpenaiGenerator correctly

e414942

debug

6d01292

fix: Store config in VllmGenerator

905dd97

debug

63cf5c9

feat: Check manually if Uvicorn server has started

837ce28

feat: Block stderr when loading tokenizer

14e5d33

debug

7c1298c

refactor: Use HiddenPrints

bc1641b

fix: Block transformers logging

67de367

feat: Add --host back in

aaae8cb

debug

8b67836

fix: Add del self in __del__

9a93b7b

chore: Ignore ResourceWarning in pytest

3e15ea2

tests: Initialise the VllmGenerator fewer times in tests

38cb047

fix: Do not hardcode different ports

2c3ff56

tests: Use same VllmGenerator

9b2fddc

tests: Remove validity check test, as it is impossible with VllmGener…

21abef2

…ator

tests: Remove random_seed from VllmGenerator config

65e538a

docs: Add comments

6facb53

fix: Raise ValueError in get_component_by_name if module or class don…

aa64ac4

…'t exist

docs: Update coverage badge

213950b

chore: Re-instate pre-commit hook

46c61ef

saattrupdan merged commit 5dfeee2 into main May 22, 2024
2 checks passed

saattrupdan deleted the feat/use-vllm-server branch May 22, 2024 11:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/use vllm server #40

Feat/use vllm server #40

saattrupdan commented May 22, 2024 •

edited

Loading

Feat/use vllm server #40

Feat/use vllm server #40

Conversation

saattrupdan commented May 22, 2024 • edited Loading

saattrupdan commented May 22, 2024 •

edited

Loading