Sarkar/growing bucket for beam #450
Conversation
The documentation is not available anymore as the PR was closed or merged.
LGTM! I just left a few minor comments.
Could you also share a command line that I can quickly run to test it please? @ssarkar2
@regisss: Incorporated your suggestions. Here are commands for testing:

1. Run with a dataset (--dataset_name squad --column_name context) for 10 steps with batch size 1 (--dataset_max_samples 10 --batch 1), without cropping the prompt (--max_input_tokens -1):

   python run_generation.py --model_name_or_path path_to_model --use_hpu_graphs --use_kv_cache --max_new_tokens 256 --batch 1 --num_beams 2 --dataset_name squad --column_name context --max_input_tokens -1 --dataset_max_samples 10

   Adding "--bucket 50" enables the code in this PR; the bucketed run is faster than the base one.

2. Single prompt test:

   python run_generation.py --model_name_or_path path_to_model --use_hpu_graphs --use_kv_cache --max_new_tokens 256 --batch 16 --num_beams 2 --bucket 50

   Here we may see some improvement, but perhaps not a big difference; it can depend on the model, prompt length, max_new_tokens, etc.

This change helps reduce the number of compilations, so it is most useful in test 1 (a dataset with varying input shapes).

Models tested: opt-350m, llama-7b, bloom-7b
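For intuition on why bucketing helps most in test 1, here is a minimal, self-contained sketch (not code from this PR; the bucket size of 50 and the sample prompt lengths are illustrative assumptions). Each distinct padded input length implies a separate graph compilation, so rounding lengths up to the next bucket multiple collapses many shapes into a few.

```python
import math

def bucketed_length(length: int, bucket: int) -> int:
    """Round a sequence length up to the next multiple of `bucket`."""
    return int(math.ceil(length / bucket)) * bucket

# Hypothetical prompt lengths, e.g. from SQuAD contexts of varying size.
prompt_lengths = [37, 41, 58, 63, 77, 90, 102, 118, 131, 140]

bucket = 50  # corresponds to passing --bucket 50 on the command line

shapes_without_bucketing = set(prompt_lengths)
shapes_with_bucketing = {bucketed_length(n, bucket) for n in prompt_lengths}

print(f"distinct shapes without bucketing: {len(shapes_without_bucketing)}")  # 10
print(f"distinct shapes with bucketing:    {len(shapes_with_bucketing)}")     # 3 (50, 100, 150)
```

Fewer distinct shapes means fewer compilations, which is why the benefit is more visible with a dataset of varying input lengths than with a single prompt.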
@regisss, I see some distributed tests fail in "Unit and integration tests / Run tests for optimum.habana.transformers (pull_request_target)". Are these expected failures, or introduced by my PR?
I've seen these errors in the CI of another PR that was not related to distributed training either. That's a bit weird but that's independent of your PR. I'm going to relaunch these tests. |
I just spotted one comment to update. I'm going to test it quickly and then it should be good to merge!
@regisss, can we merge this?
LGTM!
What does this PR do?
Extends the growing-bucket optimization from greedy search to beam search; see the sketch below for the general idea.
The original PR that added the growing-bucket optimization for greedy search is here
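As a rough illustration of what "growing bucket" means during decoding: the allocated (padded) sequence length grows in bucket-sized increments as tokens are generated, so only a handful of shapes need to be compiled instead of one per generated token. This is a hypothetical sketch, not the implementation in this PR; the function names and loop structure are simplified placeholders, and beam-search bookkeeping (beam scores, cache reordering) is omitted.

```python
def grow_if_needed(allocated_len: int, needed_len: int, bucket: int) -> int:
    """Grow the allocated (padded) length in bucket-sized increments."""
    while allocated_len < needed_len:
        allocated_len += bucket
    return allocated_len

def generate_with_growing_bucket(prompt_len: int, max_new_tokens: int, bucket: int):
    # Start from the prompt length rounded up to the first bucket boundary.
    allocated_len = ((prompt_len + bucket - 1) // bucket) * bucket
    compiled_shapes = set()

    cur_len = prompt_len
    for _ in range(max_new_tokens):
        cur_len += 1
        allocated_len = grow_if_needed(allocated_len, cur_len, bucket)
        # In the real code, the forward pass (and its HPU graph) is keyed by
        # the padded shape; each new shape implies one more compilation.
        compiled_shapes.add(allocated_len)

    return compiled_shapes

# Example: generating 256 tokens with bucket=50 touches only a few shapes
# (50, 100, 150, ...) instead of up to 256 distinct ones.
print(sorted(generate_with_growing_bucket(prompt_len=37, max_new_tokens=256, bucket=50)))
```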
Testing in progress
Fixes # (issue)
Before submitting