2.2.5 Backend: Aphrodite Engine
av edited this page Sep 14, 2024 · 2 revisions
Handle: aphrodite
URL: http://localhost:33921
Aphrodite Engine is PygmalionAI's large-scale inference engine.
# [Optional] pre-pull the image, ~5GB
harbor pull aphrodite
# Start the service
harbor up aphrodite
# [Optional] When loading closed/gated models
# provision the token
harbor hf token <your-token>
# Open HF Search to find the models
harbor find gptq awq
# Download model repo to the global HF cache
# user/repo format
harbor hf download infly/INF-34B-Chat-AWQ
# Get/set the model to run
# in the aphrodite engine
harbor aphrodite model infly/INF-34B-Chat-AWQ
# See available options
harbor run aphrodite --help
# Get/set the extra arguments for
# the aphrodite engine
harbor aphrodite args
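
Once the service is up, a quick HTTP request can confirm that it is responding at the URL listed above. The sketch below assumes the engine exposes an OpenAI-compatible API under `/v1`. The `/v1/models` path, the `APHRODITE_URL` variable, and the `aphrodite_models` helper are illustrative assumptions and are not taken from this page.

```shell
# Base URL from this page; APHRODITE_URL is a hypothetical override variable.
APHRODITE_URL="${APHRODITE_URL:-http://localhost:33921}"

# Hypothetical helper: list the served models.
# Assumes an OpenAI-compatible /v1/models route is available.
aphrodite_models() {
  # -f: fail on HTTP errors, -sS: quiet but still print errors
  curl -fsS "${APHRODITE_URL}/v1/models"
}
```

Run `aphrodite_models` after `harbor up aphrodite`. A non-zero exit status suggests the service is not reachable yet, or that the assumed route differs in your setup.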