Excess allocation in dense apply #251

rayegun · 2023-11-17T23:32:48Z

Dense apply seems to allocate when it shouldn't. GrB_apply(C, NULL, NULL, GrB_SIN_FP64, A, NULL) shouldn't allocate if C and A are both non-iso full-dense. I believe T is being written to and copied into C when it should be skipped and written directly into C. (I get this excess allocation in the Julia wrapper, so this might require verification in C.)

The text was updated successfully, but these errors were encountered:

DrTimothyAldenDavis · 2023-11-26T03:10:00Z

Currently, I only operate on a matrix in place in GrB_apply if it C is aliased to A. The "(in-place-op)" would appear in the burble:

GraphBLAS/Source/GB_apply.c

Lines 275 to 281 in 337b580

    
           else if (M == NULL && accum == NULL && (C == A) && C->type == T_type 
        
               && GB_nnz (C) > 0) 
        
           { 
        
               GBURBLE ("(in-place-op) ") ; 
        
               // C = op (C), operating on the values in-place, with no typecasting 
        
               // of the output of the operator with the matrix C. 
        
               // No work to do if the op is identity.

I don't have a kernel that exploits the case for C = f(A) when both C and A are dense but C is not A. What you would see instead in the burble would be "(shallow-op)", which means I compute T = f(A) where some parts of T are shallow (the sparsity pattern). But a dense matrix has none of that so it's no help. So I allocate and compute T->x = f (A->x). I then transplant T->x into the output matrix C.

It should be too hard to fix this though. My core method, GB_apply_op, fills an array Cx with Cx = f(A->x) where A is the input matrix. If C is already dense then I can call GB_apply_op with (C->x, ... A, ...), from GB_apply.c. So this should be easy to fix.

DrTimothyAldenDavis · 2023-11-26T03:27:53Z

I need to wait until the 2.1 C API is done, however. It's too hard managing changes to both v8.2.1 (in the dev2 branch of SuiteSparse, where it's undergoing changes to the cmake build system), and v8.2.1 in the GraphBLAS reop, and v9.0.0 (which I have two branches for already: one in my GrB_get_set_proposal branch, and one in dev2 where I'm working on CUDA kernels).

That's 4 active branches already. The delay in the v2.1 C API is seriously degrading my GraphBLAS efforts. I released my versions based on the plan that the v2.1 C API would become official in September.

DrTimothyAldenDavis added the performance enhancement GraphBLAS is working but could be faster. label Jan 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Excess allocation in dense apply #251

Excess allocation in dense apply #251

rayegun commented Nov 17, 2023

DrTimothyAldenDavis commented Nov 26, 2023

DrTimothyAldenDavis commented Nov 26, 2023

Excess allocation in dense apply #251

Excess allocation in dense apply #251

Comments

rayegun commented Nov 17, 2023

DrTimothyAldenDavis commented Nov 26, 2023

DrTimothyAldenDavis commented Nov 26, 2023