
Implementation of Batching, Enabling HPU Graphs and FP8 quantization for SD3 Pipeline #1345

Open
wants to merge 40 commits into main

Conversation


@deepak-gowda-narayana commented Sep 19, 2024

Added support for and enabled the following features in the SD3 pipeline:

  • HPU Graph integration
  • FP8 quantization
  • Batching
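Batching here amounts to splitting the list of prompts into fixed-size chunks and running the pipeline once per chunk, so each forward pass sees a constant shape (which is also what lets HPU graphs replay a pre-captured computation). A minimal sketch of that chunking step, with illustrative names that are not the actual pipeline code:

```python
from typing import List


def chunk_prompts(prompts: List[str], batch_size: int) -> List[List[str]]:
    """Split prompts into batches of at most batch_size items.

    The last batch may be smaller; a real pipeline would typically pad it
    so that the HPU graph shape stays fixed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [prompts[i : i + batch_size] for i in range(0, len(prompts), batch_size)]


# Example: five prompts with batch_size=2 produce three batches.
batches = chunk_prompts(["a cat", "a dog", "a bird", "a fish", "a horse"], 2)
# batches -> [["a cat", "a dog"], ["a bird", "a fish"], ["a horse"]]
```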

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mkpatel3-github

@dsocek @skavulya, your feedback and review are needed.

Contributor

@dsocek left a comment

Please see the inline comments and requested changes. Also, make sure at least one CI test for SD3 with batching is added to test_diffusers.

Contributor

@skavulya left a comment

LGTM. Please add a test for batch sizes to tests/test_diffusers.py.

@deepak-gowda-narayana deepak-gowda-narayana changed the title Implementation of Batching and Enabling HPU Graphs Integration for SD3 Pipeline Implementation of Batching, Enabling HPU Graphs and FP8 quantization Integration for SD3 Pipeline Oct 2, 2024
@deepak-gowda-narayana deepak-gowda-narayana changed the title Implementation of Batching, Enabling HPU Graphs and FP8 quantization Integration for SD3 Pipeline Implementation of Batching, Enabling HPU Graphs and FP8 quantization for SD3 Pipeline Oct 2, 2024
@deepak-gowda-narayana
Author

> LGTM. Please add a test for batch sizes to tests/test_diffusers.py

Added tests for batch sizes.

@deepak-gowda-narayana
Author

@skavulya @dsocek Requesting a review - I have made the specified updates.

Contributor

@dsocek left a comment

Looks good. There are some extra spaces in the sd3 pipeline file; you should run make style to fix them.

Contributor

@dsocek left a comment

Should add an example of running FP8 mode in the README.

@deepak-gowda-narayana
Author

> Should add an example of running FP8 mode in the README

Added the example

@deepak-gowda-narayana
Author

> Looks good. There are some extra spaces in the sd3 pipeline file; you should run make style to fix them.

Fixed

@deepak-gowda-narayana
Author

@dsocek Please provide feedback on the changes.

@deepak-gowda-narayana
Author

@libinta Requesting a review of the PR and a push for merging.

@emascarenhas
Contributor

@deepak-gowda-narayana,
Please also run the fast tests (i.e., tests/ci/fast_tests*.sh) and the SLOW test_diffusers.py, and post a summary of the results here.
Also run make style and fix any errors.

@deepak-gowda-narayana
Author

deepak-gowda-narayana commented Oct 22, 2024

@emascarenhas

> @deepak-gowda-narayana, Also run fast tests i.e., tests/ci/fast_tests*.sh and the SLOW test_diffusers.py and post summary of results here. Also make style and fix any errors.

Result Summary of fast_tests.sh

python -m pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py tests/test_trainer_seq2seq.py
81 passed, 8 skipped, 39 warnings in 109.40s (0:01:49)

Result Summary of fast_tests_diffusers.sh

python -m pytest tests/test_diffusers.py
113 passed, 47 skipped, 280 warnings in 1229.45s (0:20:29)

Result Summary of slow_tests_diffusers.sh
short test summary info:
FAILED tests/test_diffusers.py::GaudiDDPMPipelineTester::test_no_throughput_regression_bf16 - AssertionError: 6.937816172838211 not greater than or equal to 7.287651444971561
1 failed, 4 passed, 1 skipped, 154 deselected, 6 warnings in 358.13s (0:05:58)
make: *** [Makefile:104: slow_tests_diffusers] Error 1

The throughput in GaudiDDPMPipeline is calculated with the formula
throughput = (end_time - start_time) / batch_size, which yields a value in seconds per sample (i.e., latency rather than throughput).

The failing test compares this value against a benchmark value, assuming that higher is better. However, since the metric is actually seconds per sample, lower values indicate better performance. I am working on a fix to correct the throughput calculation in the pipeline and will update the test accordingly.
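A sketch of the two metrics described above, using the formula quoted from the comment (function and variable names are illustrative, not the actual pipeline code):

```python
def latency_per_sample(start_time: float, end_time: float, batch_size: int) -> float:
    """The current formula: seconds per sample -- LOWER is better."""
    return (end_time - start_time) / batch_size


def throughput_samples_per_sec(start_time: float, end_time: float, batch_size: int) -> float:
    """The corrected metric: samples per second -- HIGHER is better."""
    return batch_size / (end_time - start_time)


# Example: 8 samples generated in 2 seconds.
lat = latency_per_sample(0.0, 2.0, 8)          # 0.25 s/sample
thr = throughput_samples_per_sec(0.0, 2.0, 8)  # 4.0 samples/s
```

With the corrected metric, a regression test asserting "measured value >= baseline" becomes meaningful, because higher genuinely means faster.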
