[XLA:CPU] Allow convert natively on supported CPUs #17222
Conversation
You're linking a private commit on an Intel branch; which actual commit is it?
Thank you for the PR and sorry for the delay!
xla/service/cpu/ir_emitter.cc
Outdated
#if defined(INTEL_MKL) && defined(ENABLE_ONEDNN_V3)
        !IsNativeConvertSupportedOnThisCPU()
#else
        true
#endif
We would like to avoid having #ifdefs littered in the code as much as possible. I think a better long-term solution is to have a generic IsNativeConvertSupported that can be called regardless of whether INTEL_MKL is defined or not. I'd like to keep this PR short and focused, so I'll do some refactoring (of this method and other utils) in a future PR instead.
I think we should do this in the current CL. Even as-is, IsNativeConvertSupported seems pretty generic. It may need to be extended in the future, but that's OK.

Let's please move IsNativeConvertSupported directly into cpu/ir_emitter.cc and call it unconditionally.
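For illustration, a minimal sketch of the shape being asked for here; the helper body and call site are assumptions based on this thread, not the PR's actual diff:

```cpp
// Sketch only: centralize the preprocessor guard inside one helper so that
// call sites can invoke it unconditionally, with no #ifdef at each use.
bool IsNativeConvertSupported() {
#if defined(INTEL_MKL) && defined(ENABLE_ONEDNN_V3)
  return IsNativeConvertSupportedOnThisCPU();
#else
  return false;  // Assume no native BF16 convert support on other builds.
#endif
}

// Call site (sketch): no guard needed.
// if (!IsNativeConvertSupported()) { /* keep the conservative convert path */ }
```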
@kanvi-nervana Could you please implement the change above (or comment if you think that's not a good idea).

Another issue: IsNativeConvertSupported tests the CPU we're compiling on, not necessarily the target CPU. Could the two be different?
> Another issue: IsNativeConvertSupported tests the CPU we're compiling on, not necessarily the target CPU. Could the two be different?

Yes, they could for AOT mode (although the current thunk runtime doesn't support this mode yet).

@kanvi-nervana Please check the feature string from IrEmitter's target_machine_features() instead. You can get the string by calling getTargetFeatureString() from target_machine_.

Since we are adding a new function that checks the feature string, it's probably good to add a small unit test for it as well (feeding in feature strings with and without "amx-bf16" and "avxneconvert").
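A rough sketch of that lookup, assuming the member and method names mentioned above:

```cpp
// Sketch: query the *target* machine's feature string instead of the host
// CPU. llvm::TargetMachine::getTargetFeatureString() returns an
// llvm::StringRef such as "+avx2,+amx-bf16,...".
std::string features = target_machine_->getTargetFeatureString().str();
bool native_convert_supported =
    absl::StrContains(features, "+avxneconvert") ||
    absl::StrContains(features, "+amx-bf16");
```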
Thanks for the review, I will update the PR with the requested changes.
@penpornk I have addressed the comments. Please review. Thanks!
Thanks for making the changes. It looks good. All new code must have a test, so please add a small unit test; see Penporn's suggestion from a few days ago.
xla/service/cpu/ir_emitter.cc
Outdated
@@ -147,6 +148,16 @@ class IrEmitter::CpuElementalIrEmitter : public ElementalIrEmitter {
    return hlo_module_config_.debug_options().xla_cpu_enable_fast_min_max();
  }

  bool IsNativeConvertSupportedOnThisCPU(IrEmitter* ir_emitter) {
Nit:

- bool IsNativeConvertSupportedOnThisCPU(IrEmitter* ir_emitter) {
+ bool IsNativeConvertSupportedOnTargetCPU(IrEmitter* ir_emitter) {
You can add a unit test for this function by making it take feature_string as a parameter instead. Then the unit test can just call the function with a few feature strings to demonstrate that it works fine.
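For example, something along these lines, assuming the helper takes the feature string directly and is renamed per the earlier nit (test and suite names here are hypothetical):

```cpp
#include <gtest/gtest.h>

TEST(IsNativeConvertSupportedTest, ChecksTargetFeatureString) {
  // Feature strings that advertise native BF16 convert support.
  EXPECT_TRUE(IsNativeConvertSupportedOnTargetCPU("+avx512f,+amx-bf16"));
  EXPECT_TRUE(IsNativeConvertSupportedOnTargetCPU("+avxneconvert"));
  // Feature strings without either feature.
  EXPECT_FALSE(IsNativeConvertSupportedOnTargetCPU("+avx2,+avx512f"));
  EXPECT_FALSE(IsNativeConvertSupportedOnTargetCPU(""));
}
```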
Thank you for the changes!
xla/service/cpu/ir_emitter.cc
Outdated
if (absl::StrContains(feature_string, "+avxneconvert") ||
    absl::StrContains(feature_string, "+amx-bf16")) {
  return true;
}
return false;
One more nit, please: this can be simplified to:

- if (absl::StrContains(feature_string, "+avxneconvert") ||
-     absl::StrContains(feature_string, "+amx-bf16")) {
-   return true;
- }
- return false;
+ return absl::StrContains(feature_string, "+avxneconvert") ||
+        absl::StrContains(feature_string, "+amx-bf16");
Done
Thank you again for the PR!
Imported from GitHub PR openxla/xla#17222

The performance of some workloads dropped, and git bisect points to this commit (openxla/xla@c48011a) on XLA as the cause. The comments there indicate that LLVM optimizations are suppressed when converting from FP32 to BF16 and back, since optimizing this conversion may cause performance degradation on some CPUs. Since other CPUs can handle BF16 efficiently, the suppression is not required there and can be bypassed.

Copybara import of the project:

-- 36d28839d38860dd4a222cba2da95f07083b74c1 by Kanvi Khanna <[email protected]>: allow convert natively

-- a7f6f71a80e4848dce281aaf56cdc07de72cf7ee by Kanvi Khanna <[email protected]>: Address comments

-- 14225831b897ffcc47baefccef98b9d925cdfea1 by Kanvi Khanna <[email protected]>: Add test

-- 9693d9e90d89fef9c7c063aa9d9aa6c648915145 by Kanvi Khanna <[email protected]>: address comment

Merging this change closes #17222

PiperOrigin-RevId: 684858625
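Putting the thread together, the resulting logic is roughly the following; this is a sketch assembled from the review comments above, not the literal merged diff:

```cpp
// Sketch: suppress LLVM's FP32<->BF16 convert optimizations only when the
// target CPU lacks native convert support, i.e. when the target feature
// string contains neither "+avxneconvert" nor "+amx-bf16".
bool IsNativeConvertSupportedOnTargetCPU(absl::string_view feature_string) {
  return absl::StrContains(feature_string, "+avxneconvert") ||
         absl::StrContains(feature_string, "+amx-bf16");
}

// At the emitter's call site (names assumed from the thread):
// if (!IsNativeConvertSupportedOnTargetCPU(
//         target_machine_->getTargetFeatureString().str())) {
//   // Keep the conservative, optimization-suppressing convert path.
// }
```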