
feat: 2.5x faster model generation (and 2.5x lower model costs) #70

Open · wants to merge 1 commit into base: main
Conversation


@moinnadeem commented on Jul 18, 2023

Hello!

🔥 work on the companion app!

I've been hacking on low-latency inference at Replicate recently. This PR swaps out the default Vicuna model for one with ~2.5x faster generation (depending on the sequence length).

It's a standard Replicate model, so we can simply swap out the model string: https://replicate.com/moinnadeem/fastervicuna_13b
We could also work on getting these improvements upstreamed to the main Replicate model.
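Since the change is just a model-string swap, it can be sketched as follows. This is a hypothetical illustration, not the companion app's actual code: the old model identifier, the `pick_model` helper, and the `use_faster` flag are all assumptions; only `moinnadeem/fastervicuna_13b` comes from this PR.

```python
# Assumed current default model; the real value lives somewhere in the app's
# config and may differ.
OLD_MODEL = "replicate/vicuna-13b"
# The faster model proposed in this PR.
NEW_MODEL = "moinnadeem/fastervicuna_13b"

def pick_model(use_faster: bool = True) -> str:
    """Return the Replicate model identifier to pass to the client."""
    return NEW_MODEL if use_faster else OLD_MODEL

# With the official Replicate Python client, the swap would then be:
#   replicate.run(pick_model(), input={"prompt": "Hello!"})
```

Keeping the identifier behind a single helper (or config value) also makes it easy to A/B the two models or roll back if streaming turns out not to work.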

WIP:

  • Test the frontend
  • A 3x improvement on latency

Things to test:

  • Does streaming generation work? (in progress)

@moinnadeem moinnadeem changed the title feat: 2.5x faster model generation (and 2.5x lower Replicate bill) feat: 2.5x faster model generation (and 2.5x lower model costs) Jul 18, 2023
@jenniferli23 (Collaborator)

Thanks for the PR @moinnadeem !!

Do you mind elaborating on how you achieved the 2.5x speedup and 2.5x cost reduction? And how did you benchmark it?

We're working on testing this out before merging.
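One minimal way such a latency comparison could be run is a wall-clock harness like the one below. This is a sketch only; the PR does not say how the numbers were measured, and `time_generation` and its parameters are hypothetical names.

```python
import time

def time_generation(generate, prompt: str, runs: int = 3) -> float:
    """Return the median wall-clock latency (seconds) of generate(prompt)
    over `runs` calls. Using the median damps warm-up and network jitter."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]

# Usage sketch: call this once per model with the same prompt, e.g.
#   old = time_generation(lambda p: replicate.run(OLD_MODEL, input={"prompt": p}), prompt)
#   new = time_generation(lambda p: replicate.run(NEW_MODEL, input={"prompt": p}), prompt)
#   speedup = old / new
```

A fair comparison would also fix the sequence length (max tokens) across both models, since the PR notes the speedup depends on it.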
