
Added prompt based testing of text generation models #452

Closed
wants to merge 7 commits

Conversation

MohitIntel (Collaborator) commented on Oct 6, 2023

What does this PR do?

Enables prompt-based tests for text-generation models and compares their performance against baseline numbers.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

regisss (Collaborator) left a comment


@MohitIntel Thanks for opening this PR as this is truly needed! I think we should measure the throughput and the accuracy in the same test (kind of the same as it is done for training). How long does it take to generate 1024 tokens with this prompt?

MohitIntel (Collaborator, Author) commented on Oct 10, 2023


@regisss , I enabled accuracy check in the latest commit. The total time for running prompt based perf and accuracy tests for both bloomz-7b1 and llama-v2-13b-hf is nearly 26 minutes.

Here is the timing breakdown for generating 1024 tokens:
- bloomz-7b1, bf16: 5m 36s
- bloomz-7b1, fp32: 8m 19s
- llama-2-13b-hf, bf16: 4m 39s
- llama-2-13b-hf, fp32: 7m 12s

regisss (Collaborator) commented on Oct 11, 2023

@MohitIntel Could we reduce the size of the prompt and the number of tokens to generate? 128 for both could be good enough, no?
Asking this because I will duplicate the CI at some point so that it runs on main branches and on stable releases.

@MohitIntel MohitIntel closed this Apr 23, 2024