Improve performance of quantizelinear for int4 #1706
base: develop
Conversation
} else if (op.getOperatorName() == "unpack_scale") {
  assert(inElemType == b.getI32Type());
  assert(outElemType == b.getF16Type());
Looks good in general. But it would be nice to have a generic unpack operator which would take input_type and elemType and return an array of elemType[] instead. For example, in this case unpack(input) could return vector<outElemType>(), and then scale would be the first element and bias would be the second element.
I think there's a limitation in our codebase regarding the number of outputs of a linalg::GenericOp (for example, look at findPostFusionTransforms in transformMapUtils.cpp): it's assumed the output is always one. We can do a generic unpack(input, element) so that it outputs output[element].
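A minimal sketch of that idea, under the single-output constraint just mentioned. emitUnpackElement and its exact lowering are hypothetical, not part of this PR; it assumes the packed value is an integer with element 0 in the least significant bits:

#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Hypothetical generic unpack: given a packed integer value, an element type,
// and an element index, return just that element. Returning a single value
// keeps the lowering compatible with the one-output-per-linalg.generic
// assumption noted above.
static Value emitUnpackElement(OpBuilder &b, Location loc, Value packed,
                               Type elemType, unsigned element) {
  Type inType = packed.getType();
  unsigned elemBits = elemType.getIntOrFloatBitWidth();
  // Shift the requested element down into the low bits.
  Value shiftAmt =
      b.create<arith::ConstantIntOp>(loc, element * elemBits, inType);
  Value shifted = b.create<arith::ShRUIOp>(loc, packed, shiftAmt);
  // Truncate to an integer of the element width.
  Value truncated =
      b.create<arith::TruncIOp>(loc, b.getIntegerType(elemBits), shifted);
  // Bitcast to the requested element type when it is a float (e.g. f16).
  if (isa<FloatType>(elemType))
    return b.create<arith::BitcastOp>(loc, elemType, truncated);
  return truncated;
}

With such a helper, scale would be emitUnpackElement(b, loc, input, outElemType, 0) and bias would be emitUnpackElement(b, loc, input, outElemType, 1), matching the "scale first, bias second" layout suggested above.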
} else {
  LLVM_DEBUG(
      llvm::dbgs()
      << "Found a linalg.generic that takes as input the gemm A or B\n");
It's checking outputs, not inputs (genericOut).
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1706      +/-   ##
===========================================
- Coverage    78.52%   78.45%   -0.08%
===========================================
  Files          100      100
  Lines        28346    28405     +59
  Branches      4130     4146     +16
===========================================
+ Hits         22260    22285     +25
- Misses        4426     4458     +32
- Partials      1660     1662      +2
Value output = op.getOutput();
Location loc = op->getLoc();

Type origBiasType;
if (bias)
add tests in migraphx-to-tosa.mlir
assert(inElemType == b.getI32Type());
assert(outElemType == b.getF16Type());

Value offset = b.create<arith::ConstantIntOp>(loc, 16, inElemType);
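A hedged guess at how the extraction might continue from this offset (the PR's actual lowering may differ): assuming the packed i32 carries the f16 scale in its low 16 bits and the f16 bias in its high 16 bits, and with input standing in for the packed operand of the generic (identifier assumed, not shown above), the two halves can be recovered with a shift, a truncation, and a bitcast.

// Hypothetical continuation; 'input' is the packed i32 operand (name assumed).
Value lowHalf = b.create<arith::TruncIOp>(loc, b.getI16Type(), input);
Value scale = b.create<arith::BitcastOp>(loc, b.getF16Type(), lowHalf);
Value shifted = b.create<arith::ShRUIOp>(loc, input, offset);
Value highHalf = b.create<arith::TruncIOp>(loc, b.getI16Type(), shifted);
Value bias = b.create<arith::BitcastOp>(loc, b.getF16Type(), highHalf);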
add tests in rocmlir-custom-tosa-to-linalg.mlir
@@ -2549,7 +2549,16 @@ struct GridwiseGemmAccelRewritePattern

     // Obtain data types of inputs.
-    auto elementTypeA = op.getA().getType().getElementType();
+    auto maybeElementTypeALoad = getGemmInputElementType(op.getA());
It would be nice to have a unit test that uses getGemmInputElementType to test out that logic along with gridwisegemmtoblockwise.
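For reference, a purely illustrative sketch of the kind of logic such a test would cover; this is not the PR's getGemmInputElementType, its real signature and behavior may differ, and it assumes tensor-semantics operands for simplicity:

#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Illustrative stand-in: if the gemm operand is produced by a linalg.generic
// (such as the unpacking generic found above), report the element type of
// that generic's input, i.e. the type that is actually loaded; otherwise
// report the operand's own element type.
static Type getGemmInputElementTypeSketch(Value gemmOperand) {
  if (auto generic = gemmOperand.getDefiningOp<linalg::GenericOp>())
    return cast<ShapedType>(generic.getInputs().front().getType())
        .getElementType();
  return cast<ShapedType>(gemmOperand.getType()).getElementType();
}

A unit test could feed it small modules with and without the unpacking generic and check the returned types, alongside a lit test for the gridwise-gemm-to-blockwise pass.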
In this PR we improve the performance of quantizelinear for int4. These are the changes:
Pack scale and bias together in the same tensor (quantizelinear). There's a PR in MIGraphX to also change the layout of the scale+bias tensor: ROCm/AMDMIGraphX#3718
This is the migraphx program of the layout change (int32 packing scale and bias together):
@pfultz2 pointed out we can use slice operations instead of changing quantizelinear to use one param. This simplifies this PR a lot.
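Since the MIGraphX program itself is not reproduced above, here is a small, hedged plain-C++ illustration of the int32 packing described (the f16 scale and f16 bias carried as the two 16-bit halves of one i32, matching the shift-by-16 in the lowering reviewed above); the authoritative layout is whatever ROCm/AMDMIGraphX#3718 defines.

#include <cassert>
#include <cstdint>

// Illustration only: f16 values are handled as their raw 16-bit patterns.
// Assumes scale in the low half and bias in the high half of the i32.
static uint32_t packScaleBias(uint16_t scaleBits, uint16_t biasBits) {
  return static_cast<uint32_t>(scaleBits) |
         (static_cast<uint32_t>(biasBits) << 16);
}
static uint16_t unpackScale(uint32_t packed) { return packed & 0xFFFFu; }
static uint16_t unpackBias(uint32_t packed) { return packed >> 16; }

int main() {
  uint32_t packed = packScaleBias(0x3C00 /* 1.0 in f16 */, 0x4000 /* 2.0 in f16 */);
  assert(unpackScale(packed) == 0x3C00);
  assert(unpackBias(packed) == 0x4000);
  return 0;
}

With the slice-based approach mentioned above, the packed scale+bias tensor is kept as-is and slice operations carve out the scale and bias views, so quantizelinear itself does not need to be changed to take a single parameter.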
TODO:
closes: https://github.com/ROCm/rocMLIR-internal/issues/1665