Implement aesara.tensor.matmul
#744
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main     #744      +/-   ##
==========================================
+ Coverage   79.23%   79.25%   +0.01%
==========================================
  Files         152      152
  Lines       47943    48006      +63
  Branches    10909    10933      +24
==========================================
+ Hits        37990    38048      +58
+ Misses       7453     7449       -4
- Partials     2500     2509       +9
Looks great!
The next two important methods that need to be implemented are `Op.grad` (or `Op.L_op`) and `Op.infer_shape`. They're both optional, but they can really make or break the usefulness of an `Op`. At the very least, it's good to have "stubs" for those that explicitly say they aren't implemented.
You can use `tests.unittest_tools.verify_grad` to create numeric gradient tests (or even `tests.tensor.utils.makeTester` for a basic test suite), and `tests.unittest_tools.InferShapeTester` to automate the `Op.infer_shape` tests.
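For illustration, a rough sketch of what those stubs and a `verify_grad`-based test could look like. The method bodies, test values, and test name below are placeholders of mine rather than the PR's actual code, and the gradient check assumes the `aesara.tensor.matmul` name this PR introduces:

```python
import numpy as np

import aesara.tensor as at
from aesara.graph.op import Op
from tests import unittest_tools as utt


class MatMul(Op):
    """Placeholder Op; ``make_node``/``perform`` omitted for brevity."""

    def grad(self, inputs, output_grads):
        # Explicit stub: makes it obvious the gradient is not implemented yet.
        raise NotImplementedError("MatMul.grad is not implemented")

    def infer_shape(self, fgraph, node, input_shapes):
        # Explicit stub until proper shape inference is added.
        raise NotImplementedError("MatMul.infer_shape is not implemented")


def test_matmul_grad():
    # Once a real gradient exists, it can be checked numerically against
    # finite differences with the helper mentioned above.
    rng = np.random.default_rng(utt.fetch_seed())
    x = rng.standard_normal((3, 4))
    y = rng.standard_normal((4, 2))
    utt.verify_grad(at.matmul, [x, y])
```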
Is it possible that `np.matmul` functionality can be implemented by a helper function that dispatches to the existing `aesara.tensor.math.Dot` and `aesara.tensor.math.tensordot`? If so, that would save considerable time and effort attempting to (re)implement the `Op.grad` and `Op.infer_shape` logic.
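For context, a rough sketch of that dispatch idea (my illustration, not the PR's implementation); it only covers the 1-D/2-D cases, since the stacked N-D case is exactly the part that would still need real work:

```python
import aesara.tensor as at


def matmul_via_dot(a, b):
    # Sketch: reuse `at.dot` (the existing `Dot` Op) for the low-dimensional
    # cases, so gradients and shape inference come for free.
    a = at.as_tensor_variable(a)
    b = at.as_tensor_variable(b)

    if a.ndim == 0 or b.ndim == 0:
        raise ValueError("matmul does not allow scalar arguments")

    if a.ndim <= 2 and b.ndim <= 2:
        # For 1-D/2-D inputs, `np.matmul` and `np.dot` agree.
        return at.dot(a, b)

    # The stacked/broadcast N-D case (e.g. via `tensordot` or a batched dot)
    # is deliberately left out of this sketch.
    raise NotImplementedError("N-D (stacked-matrix) case not sketched here")
```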
aesara.tensor.matmul
Not sure this is best given that …
Could you elaborate on this part? Do I necessarily have to implement a …
Yeah, you need to create a subclass and then call the method(s) it provides. It can be a useful tool, but, if you want to make custom tests, that's also fine.
as in …
Force-pushed from 8ad2797 to fcb2d5c
tests/tensor/test_nlinalg.py (Outdated)
    self.rng = np.random.default_rng(utt.fetch_seed())
    self.op = matmul
    self.op_class = MatMul
This use of a shared RNG state induces test method order dependence; instead, if you create and seed the RNG objects within each independent test unit, they can be run in any order (and in parallel) and produce consistent results.
Just to be clear: by "test unit" you mean each independent `test_*` method?
> Just to be clear: by "test unit" you mean each independent `test_*` method?

Yes
It looks like all the tests are sharing the same class-level RNG object. As I mentioned earlier, this will make the results order-dependent. We need to construct the RNG object within each individual test in order to avoid that.
N.B. A fixture could be used if you don't want to copy-paste the RNG construction code each time.
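For illustration, a minimal sketch of that fixture idea, assuming pytest and the existing `tests.unittest_tools.fetch_seed` helper; the test body is a placeholder rather than the PR's code:

```python
import numpy as np
import pytest

from tests import unittest_tools as utt


@pytest.fixture
def rng():
    # Each test gets its own freshly seeded generator, so results no longer
    # depend on the order in which tests run.
    return np.random.default_rng(utt.fetch_seed())


def test_matmul_shapes(rng):
    # Placeholder test body: draw inputs from the per-test RNG.
    x = rng.standard_normal((3, 4))
    y = rng.standard_normal((4, 2))
    assert np.matmul(x, y).shape == (3, 2)
```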
Force-pushed from 701a93e to a336429
It looks like this can't be rebased by maintainers, and we need to (squash and) rebase to make sure it passes with all the recent changes, especially the shape-inference-related ones. @zoj613, is "Allow edits and access to secrets by maintainers" not enabled/checked?
@brandonwillard is this PR ready to be merged?
Some minor changes are needed; otherwise, it looks good.
tests/tensor/test_nlinalg.py (Outdated)
    self.rng = np.random.default_rng(utt.fetch_seed())
    self.op = matmul
    self.op_class = MatMul
It looks like all the tests are sharing the same class-level RNG object. As I mentioned earlier, this will make the results order-dependent. We need to construct the RNG object within each individual test in order to avoid that.
N.B. A fixture could be used if you don't want to copy-paste the RNG construction code each time.
Thank you so much for this PR, @zoj613! I'm keeping an eye on it, and it appears that some minor changes are required before this can be merged. Are you able to push this PR over the finish line? I depend heavily on this PR.
I've just noticed that … In the meantime, I've updated the tests and docstrings.
Force-pushed from b9e3b2c to 7e0f94c
Hello, @brandonwillard. Is this PR now ready to be merged?
We can merge in order to move #808 along, but we need to keep the associated issue open until at least a gradient implementation is provided.
Preferably, we should have both a gradient and a non-Python implementation for the addition of an `Op`, and, in this case, #757 is where our efforts need to be focused to accomplish that.
I am looking into adding a grad/L_op/R_op method for this. It appears that it won't be as straightforward as re-using one for the …
Yeah, I realized that the logic for doing that is a slightly specialized form of the logic we need in #757, so it's better that we focus our efforts there.
closes #488
This implements an Aesara equivalent of `np.matmul`. The behavior depends on the arguments in the following way:

- If both arguments are 2-D, they are multiplied like conventional matrices.
- If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
- If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
- If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.

`matmul` differs from `dot` in two important ways:

- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices were elements, respecting the signature `(n,k),(k,m)->(n,m)`.
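For reference, a small usage sketch (my example, with illustrative shapes), assuming the N-D broadcasting behavior described above:

```python
import numpy as np

import aesara
import aesara.tensor as at

x = at.tensor3("x")  # treated as a stack of matrices in the last two axes
y = at.matrix("y")   # a single matrix, broadcast across the stack
z = at.matmul(x, y)

f = aesara.function([x, y], z)
out = f(
    np.ones((5, 3, 4), dtype=x.dtype),
    np.ones((4, 2), dtype=y.dtype),
)
print(out.shape)  # expected: (5, 3, 2)
```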
References
https://numpy.org/doc/stable/reference/generated/numpy.matmul.html