-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Advocating for Aggregator objects to perform reductions #33
Comments
Some of the things you're looking for are already in the pipeline and will be available soon. In particular the COUNT will be done far faster in v5.x, but not using an aggregator. The COUNT in Group 1 can already be done in the current GraphBLAS using the PLUS_PAIR semirings and GrB_mxv. It takes O(n + nvals(d)) time and memory if d is the resulting vector, in GraphBLAS v4.0.3, and will take O(nvals(d)) time and memory in v5.x once I add uniform-valued matrices and vectors. The other methods in Group 1 can be handled with different operators, if I added them or if they were done as user-defined operators. COUNT_NONZEROS in each row of A, for example, would be a PLUS_FIRSTISNONZ semiring with d=A*y and y is a dense vector, where
If A m-by-n is held by row and you want to reduce it to a vector d of length m, by 'summing' up each row, then it becomes a dot-product based GrB_mxv, which is O(nvals(d)) time for COUNT if done by GrB_reduce, and will be the same time in GrB_mxv in v5.x once I add uniform-valued matrices and vectors (so the dense vector y takes O(1) time and memory to build). If you want to reduce the columns, it becomes GrB_vxm, which is a saxpy-based method, and I use atomics to do the monoid, so it's still fast, efficient, and parallel, at O(nvals(d)) time. (when I say "time" I mean "work", not "time", to be precise; it would be O(nvals(d)/p) time on p threads, assuming no significant hash collisions). SUM_OF_SQUARES is the PLUS_POW semiring with a dense vector y whose entries are all equal to 2. I already have the POW binary operator so PLUS_POW could easily be added as a built-in method in my Source/Generated folder so it would be very fast. For COUNT_NONZERO, the time would be O(nvals(A)), as opposed to O(nvals(d)) for COUNT; both are asymptotically as fast as theoretically possible, and this will appear in v5.0.0 or v5.1.0. Uniform-valued matrices/vectors is 2nd on my TODO list, after I fix the MATLAB interface for R2021a (the MATLAB interface to v4.x and v5.x dies when using MATLAB R2021a, since R2021a includes its own v3.3.3 to do the built-in C=A*B, which conflicts with v4.x+). None of the aggregators you suggest, except the basic ones that are already monoids, can be done with atomic updates; they are by design sequential, with each aggregator owned by a single thread. They would be quite slow if you have a CSR matrix A and you want to aggregate down the columns of A (I'm not sure how my hash+atomics method for the hypersparse case would extend to the aggregators; it would be quite difficult to do). If you had a vector of n aggregators, it would take O(n+nvals(d)) time and memory and would make for a very slow aggregator but a very fast method based on GrB_vxm (taking O(nvals(d)) time). If n is huge, this is a big difference. The implementation would need hashed table of dynamically-created aggregators ... Then I'd need to give each of these its own spin-lock/critical section, or enforce sequential access somehow. It's a very daunting prospect to code. I see that more expressive reductions could be useful, but I hesitate to consider methods that have impose a fundamentally sequential computation into GraphBLAS, like updating a single aggregator. I think it would be better to start simple, and see how far we can take the existing monoid/semiring idea (such as COUNT and all other methods in Group 1, which I can almost already do asymptotically fast, and will do so in v5.x), and go from there. |
I agree with this sentiment. I think there is value in having Aggregator objects, so I can begin by adding aggregators to I updated SUM_OF_SQUARES and HYPOT to use POW as you suggested (my bad!). |
@eriknw I really like this proposal. Abstracting the complexity away from the user so they can treat aggregations as easy as monoids for reduction is a win for readability. It also allows GraphBLAS to be a more complete solution for general purpose sparse linear algebra where these kinds of aggregators would be expected to exist (especially simple ones like mean, std, count, etc). |
The complexity of the aggregators in a parallel algorithm inside GraphBLAS
is daunting, particularly when you consider reducing in the "wrong"
direction (say aggregating the columns when the matrix is stored by row).
I can already do the basic ones, with GrB* and sometimes GxB* methods.
mean is just sum(A)./degree(A), and degree(A) is just d = A*x or d=A'*x
with x = ones(n,1) and the PLUS_PAIR semirings. So then harmonic mean,
std, variance, kurtosis, skew, and so on can easily be done too, at least
if multiple calls to GrB* are done. I call your count(A) the degree(A), so
that's easy. The only GxB* that is required is the PAIR operator, to
compute the count or degree.
argmin and argmax can be done with the GxB* positional operators, and the
FIRST and LAST can be done that way too. You write:
ARGMIN and ARGMAX don't naturally generalize to Matrix to Scalar reduction.
but in fact they do. It just takes several calls to existing methods, plus
my GxB* positional binary ops. Something like this works for argmax
applied to the columns of the matrix A:
function [s,i] = argmax (A)
% argmax, using GraphBLAS
% does not handle matrices with nan's very well
% if column j has no entries, MATLAB returns s(j)=0 and i(j)=1, but
% this method returns s(j) and i(j) as empty entries.
% find the min in each column
s = max (A) ;
% locate where each max occurs in each column (there might be duplicates)
S = diag (s) ;
L = GrB.mxm (A, 'any.eq.double', S) ;
% drop the zeros in L
L = GrB.prune (L) ;
% find the position of the max entry in each column
[m,n] = size (A) ;
x = ones (1,n) ; % x = GrB.ones (1,n) in the next release
i = GrB.mxm (x, 'any.secondi1.int64', L) ;
with the test code:
rng ('default') ;
A = GrB.random (10, 10, 0.2)
[s,i] = argmax (A)
[s2,i2] = max (double (A))
The only reduction/aggregator in that list that I don't know how to do with
the current GrB* and GxB* methods is "is_monotonically_*". Also the mode
would be difficult to do.
These kind of methods do not lend themselves to implementation with a
monoid.
What is ptp and logaddexp?
…On Wed, Jun 9, 2021 at 2:31 PM Jim Kitchen ***@***.***> wrote:
@eriknw <https://github.com/eriknw> I really like this proposal.
Abstracting the complexity away from the user so they can treat
aggregations as easy as monoids for reduction is a win for readability. It
also allows GraphBLAS to be a more complete solution for general purpose
sparse linear algebra where these kinds of aggregators would be expected to
exist (especially simple ones like mean, std, count, etc).
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#33 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEYIIOKYFA2EJHL5JZONWKDTR66PLANCNFSM43FBPYIA>
.
|
Here is the code with better format:
with the test code:
I'm using the secondi1 operator since MATLAB expects that indices start with 1, not zero. So for GraphBLAS proper, use the secondi not secondi1. |
The vector x is an "iso-valued" dense vector of all ones, and will take O(1) time and memory to create in v5.1, so this will work nicely for hypersparse matrices. It already works in my draft code. I can do:
and get the same result, except that s and i are vectors of dimension 2^60 instead of 10, in this case with just 8 entries each. |
assuming I change the statment x=ones(1,n) to x=GrB.ones(1,n) that is. |
I think there is a strong case to be made to add
Aggregator
objects to GraphBLAS to perform reductions. In software, it's powerful to give names to common operations, or, as is the case for aggregations, to reuse names users are already familiar with. This hides implementation details behind a cleaner API that better captures what the user wants to do instead of how to do it. I expect this would result in cleaner, more portable, and more performant code.Such a change to a GraphBLAS library (or the spec) should, of course, be done with care. Generic user-defined aggregations are complex, and systems that implement them are often too simple and lack functionality, performance, or both. Expressive, performant user-defined aggregations are necessarily complicated to support (imho), and we shouldn't lose sight of this. Having
Aggregator
objects allows a library to incrementally support more and more advanced aggregations (if desired), the details of which are hidden behind a nice API and opaque types. But one shouldn't start advanced. Let's begin simple.I think SuiteSparse:GraphBLAS can actually get really far by reusing its existing machinery. Most aggregations people care about do use monoids during a reduction step, which can be done with your O(1) dense vectors and suitable semirings. Many aggregations should be "simple" recipes for SuiteSparse (I quote "simple" so as to not make light of it--SuiteSparse is anything but simple!).
To help myself think through this, I have assembled a spreadsheet that describes how some common aggregations (and a few uncommon ones) may be performed in GraphBLAS. It's incomplete, but I think it's fairly illustrative:
https://docs.google.com/spreadsheets/d/1dQNrxLBVq8Y3JOdvNnIXRPZsPTDaf7lydztiG7Gbx8E/edit?usp=sharing
I split the aggregations into six groups of increasing complexity or likely difficultness to implement. The 0th group (not included) are the aggregations that are already well-handled by reductions with monoids, such as
SUM
,PROD
,ANY
,ALL
,MIN
,MAX
, etc. These could, of course, be rolled into Aggregators.Group 1
COUNT
COUNT_NONZERO
COUNT_ZERO
SUMOFSQUARES
Group 2
HYPOT
LOGADDEXP
LOGADDEXP2
Group 3
MEAN
VARP
VARS
STDEVP
STDEVS
GEOMETRIC_MEAN
HARMONIC_MEAN
PTP
SKEW
(not included)KURTOSIS
(not included)Group 4
ARGMIN
ARGMAX
Group 5
FIRST
LAST
FIRSTI
LASTI
FIRST_NONZERO
IS_MONOTONIC_DECREASING
Group 6
LASTI
(alternative)IS_MONOTONIC_DECREASING
(alternative)In group 5, we used
init_unaryop
orinit_indexunaryop
withmerge_binaryop
, but didn't define any update options. In this case, the init operation is applied to every element, and the elements are merged together (while maintaining order) withmerge_binaryop
.In group 6, we use an init operation, an update operation, and
merge_binaryop
. This means if we split the elements being aggregated into sublists (for parallel processing), then the init operation is only called on the first element in each sublist, and the update operation is used for all the rest.More behaviors and fields could be added. For example, if an update operation is specified without specifying a merge operation, then this could mean the elements must be processed in order, serially (i.e., no within-group parallelism!). For another example,
terminate_value
could be added to serve the same purpose asterminate_unaryop
.Known shortcomings:
FREQUENCY(val)
MOMENT(n)
NTH(n)
POWERMEAN(p)
STDEV(ddof)
VAR(ddof)
ZSCORE(val)
PERCENT_GT(val)
, etcMEAN_ABSOLUTE_DIFFERENCE(x)
(abs(x1 - x) + abs(x2 - x) + ...)NO_VALUE
.ONLY
could return the first element if and only if there is one element.VARS
) return inf, nan, or NO_VALUE if a single value in group? (I saynan
)MIN_BY
are challenging.BY
could be a UnaryOp or a Vector or Matrix the same shape as the input.ARGMAX
(probably in some way)MEDIAN
MODE
PERCENTILES
MAD
orMEAN_ABSOLUTE_DEVIATION
(around mean, median, mode, etc)NUNIQUE
orCOUNT_UNIQUE
ARGMIN
andARGMAX
don't naturally generalize to Matrix to Scalar reduction.I'll be honest, the thing I like most about this proposal is
COUNT
. I really want a canonical, portable way to calculate row degrees and column degrees 😃 .Please let me know if anything needs more explaining (or corrected). I intend this issue to be as much for the GraphBLAS committee and community as for SuiteSparse.
Cheers! 🍻
The text was updated successfully, but these errors were encountered: