
[Doc] Add projects section in README which is developed based on FasterTransformer #731

Open
wants to merge 1 commit into base: main

Conversation


@lvhan028 lvhan028 commented Jul 25, 2023

It is noted that several issues (#506, #729, #727) are requesting FasterTransformer support for Llama and Llama-2. Our project LMDeploy, which is developed based on FasterTransformer, already supports them and their derived models, such as Vicuna, Alpaca, Baichuan, and so on.

Meanwhile, LMDeploy has developed a continuous-batching-like feature named persistent batch, which also addresses #696. It models the inference of a conversational LLM as a persistently running batch whose lifetime spans the entire serving process. To put it simply (see the sketch after this list):

  • The persistent batch has N pre-configured batch slots.
  • Requests join the batch when there are free slots available. A batch slot is released and can be reused once the generation of the requested tokens is finished.
  • On cache hits, history tokens don't need to be decoded in every round of a conversation; generation of response tokens starts instantly.
  • The batch grows or shrinks automatically to minimize unnecessary computations.
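For illustration, here is a minimal sketch of the slot mechanics described above, in Python. All names (`PersistentBatch`, `join`, `release`, the session-to-slot bookkeeping) are hypothetical and are not LMDeploy's actual API; this only models the slot lifecycle and the cache-hit case.

```python
class PersistentBatch:
    """Illustrative sketch only; not LMDeploy's actual implementation."""

    def __init__(self, n_slots: int):
        # N pre-configured batch slots; the batch lives as long as the server
        self.free = set(range(n_slots))
        self.slot_of = {}  # session id -> slot it used in the previous round
        self.owner = {}    # slot id -> session whose history is cached in it

    def join(self, session_id):
        # Prefer the slot this session used last round, so its KV cache hits.
        slot = self.slot_of.get(session_id)
        if slot is not None and slot in self.free and self.owner.get(slot) == session_id:
            self.free.remove(slot)
            return slot, True       # cache hit: history tokens need no re-decoding
        if not self.free:
            return None, False      # batch is full; the request waits for a free slot
        slot = self.free.pop()
        self.slot_of[session_id] = slot
        self.owner[slot] = session_id
        return slot, False          # cache miss: decode the history first

    def release(self, slot: int):
        # A slot is released once generation of the requested tokens finishes;
        # its cache is kept, so the same session may hit it next round.
        self.free.add(slot)


batch = PersistentBatch(n_slots=4)
slot, hit = batch.join("session-1")  # first round: cache miss
batch.release(slot)
slot, hit = batch.join("session-1")  # next round: cache hit, responds instantly
```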

We really appreciate the FasterTransformer team for developing such an efficient and high-throughput LLM inference engine.

@lvhan028 lvhan028 changed the title [Doc] add projects section in README which is developed based on FasterTransformer [Doc] Add projects section in README which is developed based on FasterTransformer Jul 25, 2023
@AnyangAngus

@lvhan028
Cool!
I see that TurboMind supports llama-2-70b with GQA now.
Are there any plans for LMDeploy to support Llama-2-7b and Llama-2-13b with GQA?
Thank you!

@lvhan028 (Author)

@AnyangAngus
GQA in LMDeploy/TurboMind doesn't distinguish between 7B, 13B, or 70B models.

But as far as I know, llama-2-7b and llama-2-13b don't have GQA blocks; they use standard multi-head attention.
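For context, a minimal PyTorch sketch of grouped-query attention (the `gqa_attention` helper is illustrative, not TurboMind's implementation): several query heads share one KV head, and when `n_kv_heads == n_heads` it degenerates to ordinary multi-head attention, which is why one code path can serve 7B, 13B, and 70B alike.

```python
import torch

def gqa_attention(q, k, v):
    # q: [batch, q_len, n_heads, head_dim]
    # k, v: [batch, kv_len, n_kv_heads, head_dim], n_kv_heads divides n_heads
    n_heads, n_kv_heads = q.shape[2], k.shape[2]
    group = n_heads // n_kv_heads          # query heads sharing one KV head
    k = k.repeat_interleave(group, dim=2)  # broadcast KV heads to all query heads
    v = v.repeat_interleave(group, dim=2)
    scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / q.shape[-1] ** 0.5
    return torch.einsum("bhqk,bkhd->bqhd", scores.softmax(dim=-1), v)

# llama-2-70b: 64 query heads share 8 KV heads (GQA, group = 8)
# llama-2-7b/13b: n_kv_heads == n_heads, so group = 1, i.e. ordinary MHA
q = torch.randn(1, 5, 64, 128)
k = v = torch.randn(1, 5, 8, 128)
out = gqa_attention(q, k, v)  # the same kernel handles both cases
```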
