Optimizations and WAs to support HPU execution for Detr-Resnet-50 #1334

Open · wants to merge 5 commits into base: main
Conversation

sandeep-maddipatla

Modifications to the DETR transformer, including workarounds (WAs) and optimizations, to run the Detr-Resnet-50 model in eager and lazy modes on the HPU.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
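For reference, here is a minimal sketch of running the model on HPU in the two modes mentioned in the description above. It assumes the stock transformers DETR classes rather than the Gaudi-specific changes in this PR; lazy mode is the default, and setting PT_HPU_LAZY_MODE=0 switches to eager mode.

```python
# Minimal sketch; not code from this PR.
import requests
import torch
from PIL import Image

import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import DetrForObjectDetection, DetrImageProcessor

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").to("hpu").eval()

image = Image.open(
    requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
)
inputs = processor(images=image, return_tensors="pt").to("hpu")

with torch.no_grad():
    outputs = model(**inputs)
htcore.mark_step()  # in lazy mode, flushes the accumulated graph for execution
```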

@sandeep-maddipatla (Author)

This PR builds on #1155, which is meant for Eager mode, and adds the changes necessary for Lazy mode execution. Some of the review feedback on the former PR is also addressed here. More details below.

[x] Please rebase/sync on top of main OH.
[x] Run make style.
[x] Please share the results of this test on g2 machines.
Done. Test results are shared in another comment below.
[ ] We need to add a README file for this in examples.
Skipped. There is already a README.md at https://github.com/huggingface/optimum-habana/tree/main/examples/object-detection, and we haven't changed that particular inference example. Please let us know if it still needs modification.
[x] Please add the appropriate CI tests for this.
Done. Extended the existing CI test to add a detr-resnet-50 test as well.

@sandeep-maddipatla (Author) commented Sep 16, 2024

make style result: (screenshot of make style output)

@sandeep-maddipatla (Author)

Test Result: (screenshot of test results)

@vidyasiv (Contributor) left a comment:

Please rebase to latest main; there are some changes in modeling_utils.py.

"""
Copied from https://github.com/huggingface/transformers/tree/v4.40.2
https://github.com/huggingface/transformers/blob/4fdf58afb72b0754da30037fc800b6044e7d9c99/src/transformers/models/detr/modeling_detr.py#L2287
The modications are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The modications are:
The modifications are:


# Compute the classification cost. Contrary to the loss, we don't use the NLL,
# but approximate it in 1 - proba[target class].
# The 1 is a constant that doesn't change the matching, it can be ommitted.

Contributor suggested change:
- # The 1 is a constant that doesn't change the matching, it can be ommitted.
+ # The 1 is a constant that doesn't change the matching, it can be omitted.
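The comment under review describes the matcher's classification-cost approximation; below is a minimal sketch of that computation (hypothetical helper name, not code from this PR).

```python
import torch

def classification_cost(pred_logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Approximate the NLL matching cost as -proba[target class].

    pred_logits: (num_queries, num_classes) logits for one image
    target_ids:  (num_targets,) ground-truth class indices
    """
    out_prob = pred_logits.softmax(-1)  # (num_queries, num_classes)
    # The constant 1 in "1 - proba[target class]" shifts every cost equally,
    # so it can be omitted without changing the optimal matching.
    return -out_prob[:, target_ids]     # (num_queries, num_targets)
```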

@vidyasiv (Contributor) left a comment:

@splotnikv, please take a look if you're covering for Sandeep.

@splotnikv

> @splotnikv, please take a look if you're covering for Sandeep.

Done. I don't have access rights to update this PR, so I created a new one. See #1404.

splotnikv and others added 5 commits October 23, 2024 03:43
- Add the capability to ignore targets that have an out-of-range ID.
- This allows target objects to be padded to avoid graph recompilation
  without affecting the loss computation in training.
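As a rough illustration of the idea in these commits, here is a sketch (assumed names and sizes, not the PR's actual code) of padding targets to a fixed count with an out-of-range class ID and masking them out of the loss.

```python
import torch

NUM_CLASSES = 91                 # e.g. the COCO label count used by detr-resnet-50
PAD_CLASS_ID = NUM_CLASSES + 1   # any ID outside the valid range marks padding
MAX_TARGETS = 32                 # assumed fixed per-image target count

def pad_targets(class_labels: torch.Tensor, boxes: torch.Tensor):
    """Pad labels/boxes to MAX_TARGETS so input shapes stay static on HPU."""
    num_pad = MAX_TARGETS - class_labels.shape[0]  # assumes at most MAX_TARGETS objects
    padded_labels = torch.cat(
        [class_labels, torch.full((num_pad,), PAD_CLASS_ID, dtype=class_labels.dtype)]
    )
    padded_boxes = torch.cat([boxes, torch.zeros(num_pad, 4, dtype=boxes.dtype)])
    return padded_labels, padded_boxes

def valid_target_mask(padded_labels: torch.Tensor) -> torch.Tensor:
    """Padded entries (out-of-range IDs) are excluded from the loss computation."""
    return padded_labels < NUM_CLASSES
```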
@sandeep-maddipatla (Author)

Rebased onto the latest optimum-habana, addressed feedback from #1404, and merged in changes from that PR.

Now that I'm back working on this, I will use this PR going forward to complete the review process. Sorry for the back-and-forth over the two PRs.

@vidyasiv (Contributor) commented Oct 29, 2024

@sandeep-maddipatla Sorry, I was not able to work the last few days, so I couldn't review sooner.

  • Instead of the Jira link, which shouldn't be pasted in a public repo, can you add a high-level summary of the changes?
  • Are the tests meant to address both lazy and eager modes, or should I be manually setting the env to test that? (A possible guard for the failure below is sketched after this comment.)
GAUDI2_CI=1
RUN_SLOW=true
# lazy mode: all 8 tests pass, but in eager mode 4 fail
PT_HPU_LAZY_MODE=0 pytest tests/test_object_detection.py
FAILED tests/test_object_detection.py::GaudiDETRTester::test_inference_hpu_graphs - AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'
FAILED tests/test_object_detection.py::GaudiDETRTester::test_no_latency_regression_autocast - AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'
FAILED tests/test_object_detection.py::GaudiDetrResnet50_Tester::test_inference_hpu_graphs - AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'
FAILED tests/test_object_detection.py::GaudiDetrResnet50_Tester::test_no_latency_regression_autocast - AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'
  • README check (again, not sure if eager mode is supported):
export PT_HPU_LAZY_MODE=0
python3 run_example.py \
	--model_name_or_path facebook/detr-resnet-101 \
	--image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
	--use_hpu_graphs \
	--bf16 \
	--print_result
AttributeError: module 'habana_frameworks.torch.hpu' has no attribute 'wrap_in_hpu_graph'

Lazy mode passes

Detected cat with confidence 0.996 at location [344.0, 25.25, 640.0, 376.0]
Detected remote with confidence 0.996 at location [328.0, 76.0, 372.0, 188.0]
Detected remote with confidence 0.996 at location [39.5, 69.5, 175.0, 119.0]
Detected cat with confidence 1.0 at location [15.62, 52.5, 316.0, 472.0]
Detected couch with confidence 0.996 at location [-1.25, 0.94, 640.0, 472.0]

Stats:
------------------------------------------------------------
Total latency (ms): 59.30161476135254 (for n_iterations=10) 
Average latency (ms): 5.930161476135254 (per iteration) 

Please clarify how the testing is to be done.
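For the eager-mode failures above, here is a minimal sketch (hypothetical helper, not code from this PR) of guarding the HPU-graph wrapping so it is only applied in lazy mode.

```python
import os

import habana_frameworks.torch.hpu as hthpu

def maybe_wrap_in_hpu_graph(model):
    """Wrap the model in an HPU graph only when lazy mode is active.

    In eager mode (PT_HPU_LAZY_MODE=0), wrap_in_hpu_graph may be unavailable,
    which matches the AttributeError in the test output above.
    """
    lazy_mode = os.environ.get("PT_HPU_LAZY_MODE", "1") != "0"
    if lazy_mode and hasattr(hthpu, "wrap_in_hpu_graph"):
        return hthpu.wrap_in_hpu_graph(model)
    return model  # eager mode: run the model without HPU graph capture
```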

@vidyasiv (Contributor) commented Nov 6, 2024

@sandeep-maddipatla, could you update by EOW?

@vidyasiv (Contributor)

@sandeep-maddipatla, please resolve the merge conflicts.
