Matrix reduce to Vector with FIRST or SECOND #28

Open
eriknw opened this issue Feb 26, 2021 · 11 comments
Labels: enhancement (New feature or request)

Comments

eriknw (Contributor) commented Feb 26, 2021

Per the spec, one can pass a BinaryOp to GrB_Matrix_reduce_BinaryOp to reduce the rows or columns of a Matrix to a Vector. SuiteSparse:GraphBLAS 4.0 changed this so that only BinaryOps that correspond to known Monoids may be used. FIRST and SECOND do not have Monoids, so they cannot be used.

Nevertheless, I expected FIRST and SECOND to work in SuiteSparse. The intent is clear, and I suspect efficient implementations exist.

(Note: per the spec, the outcome of a non-commutative or non-associative binary operator is undefined. FIRST and SECOND are non-commutative (each commutes to the other), so their behavior would also be undefined under the current spec. SuiteSparse often implies an ordering when applying BinaryOps and chooses to be well-defined.)
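For concreteness, here is a minimal sketch of the call in question (A is a hypothetical GrB_FP64 matrix; the signature follows the C API's GrB_Matrix_reduce_BinaryOp, so treat it as illustrative rather than verified against any particular release):

```c
#include <GraphBLAS.h>

// Sketch: reduce each row of a hypothetical GrB_FP64 matrix A to a vector w.
void reduce_rows_example (GrB_Matrix A)
{
    GrB_Index nrows ;
    GrB_Matrix_nrows (&nrows, A) ;

    GrB_Vector w ;
    GrB_Vector_new (&w, GrB_FP64, nrows) ;

    // PLUS has a known monoid, so this reduce-to-vector is accepted:
    GrB_Matrix_reduce_BinaryOp (w, NULL, NULL, GrB_PLUS_FP64, A, NULL) ;

    // FIRST has no monoid, so SuiteSparse:GraphBLAS v4 rejects this call
    // (the expectation in this issue was that it would take the first entry per row):
    GrB_Info info = GrB_Matrix_reduce_BinaryOp (w, NULL, NULL, GrB_FIRST_FP64, A, NULL) ;
    (void) info ;   // expected to be != GrB_SUCCESS

    GrB_Vector_free (&w) ;
}
```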

DrTimothyAldenDavis (Owner) commented

The algorithms I'm using for reduce-to-vector would break if the operator is not associative and commutative. FIRST and SECOND wouldn't work at all. Slower methods could work, but then I would have to have two sets of reduce-to-vector methods: one where the operator is well-behaved, and a slower one where it isn't. That's a lot of extra code to support.

eriknw (Contributor, Author) commented Feb 26, 2021

Thanks. I understand this is the case for generic binary operators. If the internal data formats are sorted, I would expect there to be efficient specialized implementations for FIRST and SECOND.
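As a rough sketch of what I mean (plain C over hypothetical CSR-style arrays, not your internal data structures):

```c
#include <stdbool.h>
#include <stdint.h>

// Reduce each row with FIRST: in left-to-right order the result is simply the
// first stored entry of the row (lowest column index); SECOND would take the last.
// Ap: row pointers (length nrows+1); Ax: values, sorted by column within each row.
void reduce_rows_first (const int64_t *Ap, const double *Ax,
                        double *w, bool *w_present, int64_t nrows)
{
    for (int64_t i = 0 ; i < nrows ; i++)
    {
        int64_t start = Ap [i], end = Ap [i+1] ;
        w_present [i] = (end > start) ;          // empty rows produce no entry
        if (end > start) w [i] = Ax [start] ;    // FIRST; SECOND would use Ax [end-1]
    }
}
```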

Would you be okay with creating special implementations for these, or do you think this would get too messy? If you're okay with a specialized approach, would this be a good "first issue" to implement for new contributors?

eriknw (Contributor, Author) commented Feb 26, 2021

fwiw, I can't think of a meaningful example, when restricted to the current spec, where a Monoid can't be used instead of a BinaryOp. So I think it's fine that you don't fully support reduce with a BinaryOp, and it's perfectly understandable if you don't want to add special code to support a couple of edge cases (in this case, FIRST and SECOND).

A better, longer-term approach may be to address some of the deficiencies of using BinaryOps (of which there are several). In particular, it would be great to have generic machinery in place that can perform a reduction with FIRST or SECOND in parallel. I believe this is doable, but we'd need to determine an API.

For another example, suppose one created a user-defined type that is a struct. With generic, parallel machinery in place, I would expect one could perform a reduction that, e.g., sums one of the fields of the struct across all elements.
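A hedged sketch of what I mean, with everything (the edge_t type, get_weight, the matrix A) hypothetical: with today's API, summing one field per row means extracting that field into an intermediate FP64 matrix and then reducing it with a monoid, since a one-pass BinaryOp reduction whose output type differs from its input type cannot be expressed. (This assumes the predefined GrB_PLUS_MONOID_FP64 from the v1.3 spec.)

```c
#include <stdint.h>
#include <GraphBLAS.h>

// Hypothetical user-defined type: an edge carrying a weight and a timestamp.
typedef struct { double weight ; int64_t time ; } edge_t ;

// UnaryOp that extracts just the weight field.
void get_weight (void *z, const void *x)
{
    *(double *) z = ((const edge_t *) x)->weight ;
}

// Per-row sum of the weight field of A (a matrix of edge_t entries).
void sum_weights_per_row (GrB_Vector *w, GrB_Type Edge, GrB_Matrix A)
{
    GrB_Index nrows, ncols ;
    GrB_Matrix_nrows (&nrows, A) ;
    GrB_Matrix_ncols (&ncols, A) ;

    GrB_UnaryOp GetWeight ;
    GrB_UnaryOp_new (&GetWeight, get_weight, GrB_FP64, Edge) ;

    GrB_Matrix W ;                                    // FP64 copy of the weight field
    GrB_Matrix_new (&W, GrB_FP64, nrows, ncols) ;
    GrB_Matrix_apply (W, NULL, NULL, GetWeight, A, NULL) ;

    GrB_Vector_new (w, GrB_FP64, nrows) ;
    GrB_Matrix_reduce_Monoid (*w, NULL, NULL, GrB_PLUS_MONOID_FP64, W, NULL) ;

    GrB_Matrix_free (&W) ;
    GrB_UnaryOp_free (&GetWeight) ;
}
```

The generic machinery discussed below would avoid materializing the intermediate matrix W.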

DrTimothyAldenDavis (Owner) commented

I think the API would be different. There's a similar issue that arises in the CC problem, where an assignment C(I) max= A is done, using max as the accumulator operator. In that case, if the index vector has duplicates, they want the accum operator applied to reduce the duplicates down to a final result, which is not something the C API for GraphBLAS defines. Using FIRST or SECOND for the accum operator is not unlike the kind of reduction you want to do, if C is a vector and A a matrix. These two cases are different, but both change what it means to do a reduction.

eriknw (Contributor, Author) commented Feb 26, 2021

That's an interesting case with "assign" with duplicate indices. I see you discuss this (or something very similar) in section 9.9. If I understand correctly, I think what you want here is an explicit "scatter" operation (to a list of indices) that accepts a binaryop/monoid to perform the reduction when values are scattered to the same index. This reduction could happen before the accumulation into the output, so, per normal GraphBLAS operations, accum could be a different binop. Fwiw, I would love to have a functional scatter as well where a function is used to determine which index a value is scattered to.
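To pin down what I mean by "scatter with reduction", here is a dense, serial sketch of the semantics in plain C (all names hypothetical): dup combines values that land on the same index, and accum then folds the combined value into the existing output, mirroring how accum works elsewhere in GraphBLAS.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// Scatter x[k] to position I[k] of c.  Values landing on the same index are first
// combined with dup; the combined value is then folded into c with accum.
void scatter_reduce (double *c, size_t n,
                     const uint64_t *I, const double *x, size_t nvals,
                     double (*dup) (double, double),
                     double (*accum) (double, double))
{
    double *t   = malloc (n * sizeof (double)) ;   // dup-reduction per index
    bool  *seen = calloc (n, sizeof (bool)) ;      // which indices were touched
    for (size_t k = 0 ; k < nvals ; k++)
    {
        uint64_t i = I [k] ;
        t [i] = seen [i] ? dup (t [i], x [k]) : x [k] ;
        seen [i] = true ;
    }
    for (size_t i = 0 ; i < n ; i++)
    {
        if (seen [i]) c [i] = accum (c [i], t [i]) ;
    }
    free (t) ; free (seen) ;
}
```

With dup = max and accum = max this matches the C(I) max= A step from the CC discussion above; with dup = SECOND, the last value scattered to an index wins.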

For interested readers (e.g., my future self): "build" is another place where the reduction is somewhat special. Could one expect better performance if a Monoid were accepted instead of a BinaryOp? SuiteSparse does accept FIRST and SECOND here and gives well-defined results where the spec leaves the behavior undefined when the binop is not associative or commutative.
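For example (the tuples are hypothetical, and the call assumes the standard GrB_Matrix_build_FP64 signature):

```c
#include <GraphBLAS.h>

void build_example (void)
{
    // Hypothetical tuples; note the duplicate at (0,1).
    GrB_Index I [3] = { 0, 0, 1 } ;
    GrB_Index J [3] = { 1, 1, 2 } ;
    double    X [3] = { 3.0, 5.0, 7.0 } ;

    GrB_Matrix C ;
    GrB_Matrix_new (&C, GrB_FP64, 2, 3) ;

    // dup = GrB_PLUS_FP64:   C(0,1) = 3.0 + 5.0 = 8.0 (order of duplicates irrelevant)
    // dup = GrB_SECOND_FP64: C(0,1) = 5.0, the later duplicate in the tuple list
    //                        (well-defined in SuiteSparse, undefined per the spec)
    GrB_Matrix_build_FP64 (C, I, J, X, 3, GrB_SECOND_FP64) ;

    GrB_Matrix_free (&C) ;
}
```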

DrTimothyAldenDavis (Owner) commented Feb 26, 2021 via email

DrTimothyAldenDavis (Owner) commented

I think what you really need is not a reduce-to-vector with FIRST and SECOND; that is ill-defined, as it would be for any other binary op. What you really need is a kind of GxB_select that can depend on the entry count in a given row or column: "give me just the first 4 entries in each row, starting from the lowest column indices". That step appears in CC, but GraphBLAS can't do it, so the LAGraph CC exports the matrix and does it itself.
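A rough sketch of that export-and-prune step in plain C (hypothetical CSR arrays and a hypothetical helper, not the actual LAGraph code): keep at most k entries per row and emit tuples that could be handed back to GrB_Matrix_build.

```c
#include <stdint.h>

// Keep at most k entries per row, taken from the lowest column indices.
// Ap/Aj/Ax: CSR arrays of the exported matrix (columns sorted within each row).
// Outputs tuple arrays (Ti, Tj, Tx); returns the number of tuples written.
int64_t first_k_per_row (const int64_t *Ap, const int64_t *Aj, const double *Ax,
                         int64_t nrows, int64_t k,
                         int64_t *Ti, int64_t *Tj, double *Tx)
{
    int64_t nt = 0 ;
    for (int64_t i = 0 ; i < nrows ; i++)
    {
        int64_t end = Ap [i] + k ;
        if (end > Ap [i+1]) end = Ap [i+1] ;
        for (int64_t p = Ap [i] ; p < end ; p++)
        {
            Ti [nt] = i ; Tj [nt] = Aj [p] ; Tx [nt] = Ax [p] ; nt++ ;
        }
    }
    return (nt) ;
}
```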

eriknw (Contributor, Author) commented Mar 1, 2021

It's good to bring select into the conversation, but I really would like a better reduction with binaryop. For example, performing a reduction on one field of a user-defined type that is a struct.

To do such a generic reduction well (and in parallel, so I will be talking about doing reductions in groups), I think at least four things are needed:

  • Y = unaryop(X): initialize a reduction with the first element in a group. The output type Y may be different from the input type X.
  • Y = binaryop(Y, X): perform a step in the reduction within a group. The first argument is the previous result of the reduction within the group, and the second argument is the current value in the matrix.
  • Y = combine(Y, Y): a binaryop or monoid to combine results from groups.
  • is_commutative: whether the reduction commutes or not. If it doesn't commute, then the combine steps must occur in order. Otherwise, groups may be combined in any order. This could probably be inferred/implied by whether the combine operator is a binaryop or monoid.

For example:

group1 = binop(binop(binop(unary(x1), x2), x3), x4)
group2 = binop(binop(binop(unary(x5), x6), x7), x8)
final = combine(group1, group2)

A further enhancement (that I'm not pushing for, but think is important to be aware of) would be to allow the type Y to be malloced, which would first happen in the unaryop. If it were malloced, then one would also need a finalize unaryop to perform the free and convert to a normal (fixed-size) GraphBLAS type. For completeness' sake, if malloc were used, then one could also provide functions to serialize and deserialize the object to send over the wire if necessary. Although complicated, this functionality would also be extremely powerful.

If order is important (i.e., for reductions that don't commute), then there is a natural order in which to do things: along a column, along a row, or along a vector. For matrix reduce to scalar, we'd need to choose a convention to linearize the elements (for example, as if the matrix were C-contiguous, i.e., row-wise), and setting the transpose descriptor on the input would choose the other option for linearizing the matrix.

FIRST and SECOND make perfect sense in view of this style of reduction. For these, the unaryop would simply be the identity, and the combine operator would be the same as the reducing binaryop (FIRST or SECOND).
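Spelled out for FIRST under this hypothetical three-operator scheme (the callback names and shapes are illustrative only, nothing like this exists in the spec):

```c
// Hypothetical callbacks for a generic reduction implementing FIRST.
// init:    the first element of a sub-list starts the running value
// update:  keep the running value, ignore the new element (that is FIRST)
// combine: keep the result of the earlier sub-list (again FIRST), so sub-lists
//          must be merged in order for the result to be the list's first element.
void first_init    (double *y, const double *x)                      { *y = *x ; }
void first_update  (double *y, const double *yprev, const double *x) { *y = *yprev ; (void) x ; }
void first_combine (double *y, const double *y1, const double *y2)   { *y = *y1 ; (void) y2 ; }
```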

If any of this functionality were available, I would surely use it. I don't need it though, and I think having this conversation is more important atm.

eriknw (Contributor, Author) commented Apr 9, 2021

I like that your select variant has a clean API that is easy to understand. Instead of operating on the absolute index I or J of an element, it operates on, say, ith and jth, which are the counts of elements previously seen along this row, column, or vector. It's perhaps a little awkward if both ith and jth are needed. Otherwise, the API can match your select or the new select. I don't have a feel for how common or useful such an operation would be.

Regarding reduction on user-defined binaryops again, there are many more operations that could be done with my proposal above when combined with user-defined types:

  • Get up to the first k elements.
    • It may be difficult to stop the iteration once k elements are seen.
  • Get k random elements.
  • Determine if rows are monotonically increasing or decreasing.
  • Calculate approximate percentiles.
  • Calculate approximate count-distinct.
  • Compute a checksum.
  • Compute the sum and count at the same time.
    • If there were a finalize step at the end, this could then compute the mean.
  • Similarly, if there were a finalize, one could first compute the count, sum(x), and sum(x**2), and then calculate the standard deviation from these, all in a single pass.
    • We can take this further and use it to compute a linear regression!
  • Lots more: power mean, geometric mean, central mean (two passes), log-sum-exp, skewness (two passes), etc.

It is common in modern data analytics systems to have user-defined aggregation functions in a form similar to initialize-update-merge-evaluate. Variations exist, and I should point to literature or documentation...
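One concrete instance of that initialize-update-merge-evaluate shape, written as plain C callbacks (all names hypothetical): the running state holds a sum and a count, and the finalize step turns it into the mean; adding sum(x**2) as a third field gives the standard deviation the same way.

```c
#include <stdint.h>

typedef struct { double sum ; int64_t count ; } mean_state ;

// initialize: start a sub-list's state from its first element
void mean_init (mean_state *s, const double *x)
{ s->sum = *x ; s->count = 1 ; }

// update: fold the next element of the sub-list into the state
void mean_update (mean_state *s, const mean_state *prev, const double *x)
{ s->sum = prev->sum + *x ; s->count = prev->count + 1 ; }

// merge: combine two sub-list states (commutative, so sub-lists can merge in any order)
void mean_merge (mean_state *s, const mean_state *a, const mean_state *b)
{ s->sum = a->sum + b->sum ; s->count = a->count + b->count ; }

// evaluate/finalize: convert the state into the final (fixed-size) result
void mean_finalize (double *z, const mean_state *s)
{ *z = s->sum / (double) s->count ; }
```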

DrTimothyAldenDavis (Owner) commented Apr 12, 2021 via email

eriknw (Contributor, Author) commented Apr 14, 2021

Thanks for taking the time to reply. I think I can do a better job explaining what I consider most important in this issue, so please indulge me as I summarize below. This is a long comment, so please take as much time as you like.

Summary

  1. Reductions with BinaryOps (esp. user-defined ones) as currently in the spec have serious deficiencies.
  2. SuiteSparse:GraphBLAS does a reasonable thing and only supports reductions with Monoids or builtin BinaryOps that have associated Monoids.
  3. Even though I know (2), I was surprised that reducing a Matrix to a Vector with FIRST or SECOND didn't work (we can call this a user error--my bad).
  4. Stronger than (3), I believe reductions with user-defined BinaryOps can be extremely powerful and are likely to be useful for some graph algorithms (and certainly useful for a general sparse linear algebra library).
  5. Hence, I think it's worth the time to try to come up with a better approach to perform user-defined reductions not limited to Monoids.
  6. I proposed an approach that I think fits within GraphBLAS and that I think addresses many of the existing shortcomings.
  7. I would like feedback. Is this approach feasible? Implementable? Mathematically sound? Complete? Too much? Can it be improved? Should I spend the time to link to supporting references?
  8. Is there any interest in me creating new function signatures or PoC implementations of my proposal?
  9. I will be watching for examples where this may come in handy!

The Proposal

The goal:

  • support reductions with BinaryOps (including user-defined ones)
  • that can express many operations (more than can be expressed with matrix multiply with different semirings)
  • and can be parallelized efficiently.

Reduction methods include Matrix to Vector, Matrix to Scalar, Vector to Scalar, and building Matrices and Vectors with dup_op (this last one may require separate attention). Because order may matter during the reduction, I will call the group of elements being reduced over a "list" (if order didn't matter, I would call it a multiset). Since we require the reduction to be parallelizable, we allow the list of elements to be partitioned into sub-lists that can be operated on independently.

Instead of a single BinaryOp, we require three operators:

  • Initialize with UnaryOp: applied to the first item in a sub-list.
    • output domain may be anything (i.e., it is not limited to be coercible to or from the input domain)
  • Update with BinaryOp: operates on the previous result and the next item in the sub-list.
    • output domain should match the output domain of UnaryOp.
  • Merge with BinaryOp or Monoid: combines results from sub-lists.
    • If BinaryOp: sub-list results must be combined in order (i.e., doesn't commute), and updates with BinaryOp must operate in order.
      • Input and output domains should match.
    • If Monoid: sub-list results may be combined in any order, and updates with BinaryOp may operate in any order.
    • Associativity is assumed for BinaryOp and Monoid; e.g., tree reductions are allowed.
    • Instead of having variants for both options (BinaryOp and Monoid), another option would be to support only BinaryOp plus a boolean flag indicating whether the operation commutes (i.e., whether it needs to operate in order or not).

If there are k elements in a list being reduced, the UnaryOp may be called between 1 and k times (i.e., once per sub-list). The expectation is that the list will be partitioned into enough sub-lists to take advantage of parallelism. The UnaryOp will probably be the identity fairly often, but it may also be a relatively expensive operation, so we don't want to always call it on every element.

It's worth noting that there is no notion of an identity, either for missing elements or for the BinaryOp. Such an identity is strictly for Monoids and doesn't apply here. If the list being reduced is empty, then the result is empty. If it has only one element, then the result is the result of the UnaryOp.
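To show how the pieces fit together for a single list of k elements, here is a serial mock-up of the intended parallel schedule (everything hypothetical, and types collapsed to double for brevity): the list is split into sub-lists, each sub-list is started with the UnaryOp and extended with the update BinaryOp, and the per-sub-list results are merged, so the UnaryOp runs once per sub-list, i.e. between 1 and k times.

```c
#include <stddef.h>

// One list of k doubles, split into nparts contiguous sub-lists.
// init/update/merge are the three operators from the proposal; if merge does not
// commute, the final loop must run in order (as it does here).
// Assumes k >= 1; an empty list would produce "no entry" per the proposal.
double reduce_list (const double *x, size_t k, size_t nparts,
                    double (*init) (double),
                    double (*update) (double, double),
                    double (*merge) (double, double))
{
    double partial [nparts] ;              // one result per sub-list (each could be a task)
    size_t chunk = (k + nparts - 1) / nparts ;
    size_t used = 0 ;
    for (size_t p = 0 ; p < nparts ; p++)
    {
        size_t start = p * chunk ;
        if (start >= k) break ;
        size_t end = start + chunk ; if (end > k) end = k ;
        double y = init (x [start]) ;                 // UnaryOp on the sub-list's first element
        for (size_t i = start + 1 ; i < end ; i++)
            y = update (y, x [i]) ;                   // update BinaryOp on the rest
        partial [used++] = y ;
    }
    double result = partial [0] ;
    for (size_t p = 1 ; p < used ; p++)
        result = merge (result, partial [p]) ;        // combine sub-list results in order
    return (result) ;
}
```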

I realize this would be a lot of extra code to support. I'm not asking you to implement it (a JIT would be much more amazing!), but I do think it's worth having the conversation and keeping an eye open for use cases.

Misc.

Yeah, I understand the behavior and pitfalls of the variant of select you brought up. I'll let you know if I find any more use cases for it.

I wasn't asking for mode or median. I was saying that if I were able to perform reductions with user-defined BinaryOps as per my proposal, then I could use it to efficiently compute approximate percentiles. And lots of other things in a single reduction operation.

Going back a few comments, you described a situation that comes up in CC. If I understand correctly, I recognize this operation as a "scatter" with aggregation. I would love it if an astute reader of this issue would give this more thought!

I'm sorry for repeating myself. There's been a lot of good/interesting information in this thread, and I wanted to refocus :)

DrTimothyAldenDavis added the "enhancement" (New feature or request) label on Mar 8, 2022