Transformer models generation supports user-provided input embeddings #1276
Conversation
Force-pushed from 5933b4c to 418d2fc
@zongwave As this PR changes the common utils.py, can you run the Gaudi2 CI tests for text-generation?
@libinta
GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k meta-llama/Llama-2-7b-hf
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 10%]
10 passed, 42 deselected in 2522.21s (0:42:02)
@libinta I ran the GAUDI2_CI test with the command:
GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k meta-llama/Llama
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 6%]
15 passed, 37 deselected in 7759.19s (2:09:19)
I'm trying to figure out how to trigger the llama 8x test case.
Force-pushed from 68ad075 to d020294
@regisss @libinta 18 llama cases, covering bf16/fp8 and 1x/8x configurations, were selected and passed for both the input embeds and input tokens cases. I triggered the CI test with the "--input_embeds" option manually in test_text_generation_example.py.
1. Test results with input embeds generation enabled by adding the "--input_embeds" option manually:
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 5%]
2. Test results with the original input tokens generation:
GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k llama
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 5%]
18 passed, 34 deselected in 8202.18s (2:16:42)
@regisss Please take a look.
LGTM! However, there is a merge conflict to resolve before we can merge.
@regisss @vidyasiv Unfortunately, 5 test cases failed. It seems #948 requires generating new tokens from "input_ids" and "inputs_embeds" simultaneously.
FAILED tests/transformers/tests/models/llama/test_modeling_llama.py::LlamaModelTest::test_generate_from_inputs_embeds_decoder_only - AssertionError: Lists differ: [[70,[66 chars]7, 78], [90, 71, 10, 82, 86, 98, 64, 64, 64, 6[37 chars] 64]] != [[70,[66 chars]7, 78, 39, 95, 41], [90, 71, 10, 82, 86, 98, 6[61 chars] 90]]
My PR only covers "inputs_embeds" or "input_ids", but not both at the same time. I'm curious about the real use case for this. I need more time to make the two PRs work together.
I rebased the PR onto the latest main. With this update, transformer models can process "input_ids" and "inputs_embeds" either together or separately.
Passed the #948 test cases, which verify that generation is consistent whether driven by input tokens or embeddings:
tests/transformers/tests/models/bert/test_modeling_bert.py . [ 12%]
Passed the text-generation CI test cases on a single HPU device to verify that performance reaches the benchmark (I could not reserve an 8-card Gaudi2 this time, but I ran and passed the full set of cases before):
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 5%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-1230-False-128-128-13152.7] PASSED [ 33%]
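For context, a minimal sketch (not code from this PR) of the behaviour that the #948-style test exercises: a decoder-only model generating from token ids, from embeddings, or from both at once. The tiny checkpoint below is only a placeholder to keep the example fast, and the exact semantics of mixing ids and embeddings depend on the transformers version in use.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any decoder-only model can be substituted.
model_name = "hf-internal-testing/tiny-random-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_ids = tokenizer("Hello world", return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

# Prompt passed as token ids only.
out_ids = model.generate(input_ids=input_ids, max_new_tokens=5, do_sample=False)

# Prompt passed as embeddings only; the returned sequence contains only the newly generated tokens.
out_embeds = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=5, do_sample=False)

# Prompt passed as both: embeddings drive the forward pass, while the ids let
# generate() return the prompt followed by the new tokens.
out_both = model.generate(
    input_ids=input_ids, inputs_embeds=inputs_embeds, max_new_tokens=5, do_sample=False
)
```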
What does this PR do?
Some multimodal generation models, such as NExT-GPT, need user-specified embedded token inputs; this commit enables user-provided input embeddings for model generation.
Modified transformers/generation/utils.py to include logic that supports user-provided input embeddings. This allows for more flexibility in how input data can be fed into the models.
Added the --input_embeds option in example/text-generation/run_generation.py to facilitate testing with different models using embedded tokens.
Conducted tests using the modified script on the Mistral 7B model with a batch size of 6. The tests included various input scenarios to assess the robustness and performance of the new features.
Enhanced example/text-generation/run_generation.py to support multiple decoding strategies, including greedy, beam search, and contrastive search. This allows users to specify the decoding strategy at runtime and evaluate the effectiveness of each method (a brief sketch follows this list).
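As a rough illustration (a hedged sketch of the stock transformers generation API, not code from this PR), this shows generation driven by user-provided embeddings with the decoding strategies mentioned above. The model name and decoding parameters are placeholders, and HPU-specific setup (device placement, HPU graphs) is omitted.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works similarly
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Build embeddings outside of generate(), e.g. as a multimodal encoder such as NExT-GPT would.
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(input_ids)

# Greedy decoding from user-provided embeddings.
greedy = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=32)

# Beam search uses the usual num_beams argument.
beam = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=32, num_beams=4)

# Contrastive search is selected via top_k and penalty_alpha.
contrastive = model.generate(
    inputs_embeds=inputs_embeds, max_new_tokens=32, top_k=4, penalty_alpha=0.6
)

print(tokenizer.batch_decode(greedy, skip_special_tokens=True))
```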
Fixes # (issue)
Optimum Habana transformer models currently do not support user-provided input embeddings for token generation.
Usage
To test the new features, use the following command:
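A representative invocation, assuming the usual run_generation.py arguments plus the new --input_embeds flag added by this PR (the model name, batch size, and token count below are placeholders), might look like:

```bash
python run_generation.py \
    --model_name_or_path mistralai/Mistral-7B-v0.1 \
    --use_hpu_graphs \
    --use_kv_cache \
    --bf16 \
    --batch_size 6 \
    --max_new_tokens 100 \
    --input_embeds
```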
Performance Data:
The table below summarizes the performance data for six popular models with both embedding and token inputs, highlighting throughput, number of HPU graphs, graph compilation duration, and memory allocation.
Before submitting