Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excess allocation in dense apply #251

Open
rayegun opened this issue Nov 17, 2023 · 2 comments
Open

Excess allocation in dense apply #251

rayegun opened this issue Nov 17, 2023 · 2 comments
Labels
performance enhancement GraphBLAS is working but could be faster.

Comments

@rayegun
Copy link
Contributor

rayegun commented Nov 17, 2023

Dense apply seems to allocate when it shouldn't. GrB_apply(C, NULL, NULL, GrB_SIN_FP64, A, NULL) shouldn't allocate if C and A are both non-iso full-dense. I believe T is being written to and copied into C when it should be skipped and written directly into C. (I get this excess allocation in the Julia wrapper, so this might require verification in C.)

@DrTimothyAldenDavis
Copy link
Owner

Currently, I only operate on a matrix in place in GrB_apply if it C is aliased to A. The "(in-place-op)" would appear in the burble:

GraphBLAS/Source/GB_apply.c

Lines 275 to 281 in 337b580

else if (M == NULL && accum == NULL && (C == A) && C->type == T_type
&& GB_nnz (C) > 0)
{
GBURBLE ("(in-place-op) ") ;
// C = op (C), operating on the values in-place, with no typecasting
// of the output of the operator with the matrix C.
// No work to do if the op is identity.

I don't have a kernel that exploits the case for C = f(A) when both C and A are dense but C is not A. What you would see instead in the burble would be "(shallow-op)", which means I compute T = f(A) where some parts of T are shallow (the sparsity pattern). But a dense matrix has none of that so it's no help. So I allocate and compute T->x = f (A->x). I then transplant T->x into the output matrix C.

It should be too hard to fix this though. My core method, GB_apply_op, fills an array Cx with Cx = f(A->x) where A is the input matrix. If C is already dense then I can call GB_apply_op with (C->x, ... A, ...), from GB_apply.c. So this should be easy to fix.

@DrTimothyAldenDavis
Copy link
Owner

I need to wait until the 2.1 C API is done, however. It's too hard managing changes to both v8.2.1 (in the dev2 branch of SuiteSparse, where it's undergoing changes to the cmake build system), and v8.2.1 in the GraphBLAS reop, and v9.0.0 (which I have two branches for already: one in my GrB_get_set_proposal branch, and one in dev2 where I'm working on CUDA kernels).

That's 4 active branches already. The delay in the v2.1 C API is seriously degrading my GraphBLAS efforts. I released my versions based on the plan that the v2.1 C API would become official in September.

@DrTimothyAldenDavis DrTimothyAldenDavis added the performance enhancement GraphBLAS is working but could be faster. label Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance enhancement GraphBLAS is working but could be faster.
Projects
None yet
Development

No branches or pull requests

2 participants