© Alfredo A. Correa, 2018-2024
(documentation in progress)
The BLAS Adaptor provides an interface for the BLAS and BLAS-like linear algebra libraries (cuBLAS and hipBLAS). Although BLAS is not strictly a multidimensional array library, as it only works on 1D (vectors) and 2D arrays (matrices), it is an extremely popular numeric library.
The adaptor library has a two-fold purpose:
First, it allows the abstracting of the stride information and the conjugation/transposition in the BLAS calls, simplifying the interface enormously and making it consistent with their GPU counterparts, such as cuBLAS. "View manipulators" automatically handle cases related to conjugation, transposition, and real and imaginary parts.
Second, it provides a functional interface to the BLAS calls, which is easier to use than the C-style interface and plays well with STL algorithms that assume a functional programming style.
This functions in this adaptor library strictly uses BLAS operations, the data is not processed outside the BLAS calls.
[[TOC]]
In the functional interface, most functions return special "views" rather than direct results.
The results are computed when converted to value types or assigned to other views.
Value types can be 2D (multi::array<T, 2>
), 1D (multi::array<T, 1>
) or 0D (multi::array<T, 0>
or scalars T
).
Views can be assigned to subarrays (e.g. multi::subarray<T, 2>
or multi::subarray<T, 1>
).
In this interface, functions like gemv
generates views that can be assigned to values or from which constructors can be called in the library without unnecessary copies or allocations.
Expressions such as multi::blas::gemv(alpha, A, x)
produce a range object that can be used in larger expressions, such as construction and assignemnt.
- Construction:
multi::array<double, 1> const y = multi::blas::gemv(alpha, A, x); // same effect as multi::array<double, 1> y({A.size()}); multi::blas::gemv(alpha, A, x, 0.0, y);
Among other advantages, the functional style gives the possibility of creating constant variables for results.
Like other parts of the library, when using auto
and the unary operator+
help generating concrete values.
auto const y = +multi::blas::gemv(alpha, A, x); // y variable is deduced as multi::array<double, 1> and latter constructed from the gemv operation
- Assignment:
multi::array<double, 1> y; // empty vector
y = multi::blas::gemv(alpha, A, x); // same as multi::blas::gemv(alpha, A, x, 0.0, y), y is resized if necessary
- Assignment (to subarray):
multi::array<double, 2> Y; // empty vector
Y[0] = multi::blas::gemv(alpha, A, x); // same as multi::blas::gemv(alpha, A, x, 0.0, Y[0]), Y[0] can't be resized because it is a subarray must have the correct size,
- Compound-assign:
multi::array<double, 1> y(A.size());
y += multi::blas::gemv(alpha, A, x); // same as multi::blas::gemv(alpha, A, x, 1.0, y)
This interface plays well with the style of the STL algorithms. For example, suppose we have a container of vectors, all of which need to be multiplied by a given array.
std::list<multi::array<double, 1> > vs = ...; // using std::list to avoid confusion
std::list<multi::array<double, 1> > ws = ...;
multi::array<double, 2> const A = ...;
std::transform(vs.begin(), vs.end(), ws.begin(), [&A](auto const& v) {return multi::blas::gemv(1.0, A, v);})
Although it shares some of the goals, this interface is independent of the C++26 Linear Algebra Proposal. The main difference with other BLAS adaptors is that this library aims to offer a functional interface.
Just as with BLAS, the library supports element of real (double
and float
) and complex (std::complex<double>
and std::complex<float>
) types.
Other types that are semantically equivalent and binary-compatible (such as thrust::complex
) also work directly.
These functions produce views (not copies) related to conjugation, and transposition. These typically replace the 'T', 'C' and 'N' characted arguments of the BLAS calls in the C or Fortran interfaces.
The conjugation operation is a unary operation that conjugates each element of the array, producing a view of the array that preserves the shape of the original array.
The transposition operation is a unary operation that transposes an array, producing a view of the array that transposed the elements (and the shape) of the original array.
This view returns the same array, implies no operations on the array; it is provided for completeness.
using complex = std::complex<double>;
complex const I{0.0, 1.0};
multi::array<complex, 2> B = {
{1.0 - 3.0*I, 6.0 + 2.0*I},
{8.0 + 2.0*I, 2.0 + 4.0*I},
{2.0 - 1.0*I, 1.0 + 1.0*I}
};
namespace blas = multi::blas;
multi::array<complex, 2> conjB = blas::C(B);
assert( blas::C(B)[1][2] == std::conj(B[1][2]) );
assert( blas::T(B)[1][2] == B[2][1] );
assert( blas::N(B)[1][2] == N[1][2] );
assert( blas::H(B)[1][2] == std::conj(B[2][1]) );
Note that views do not play well with self-assignment.
multi::array<double, 2> A({10, 10});
A = multi::blas::T(A); // undefined behavior, this is not the right way to transpose a matrix in-place
The main purpose of these functions is to manipulate arguments to BLAS interface functions.
(https://godbolt.org/z/Kjfa48d4P)
The functions in this level operate on one-dimensional arrays (vectors).
Here, we use multi::array<T, 1>
as representative of a vector, but a one-dimensional subarray, such as a row or a column of a 2D array, can also be used as a vector.
Copies the values of a vector to another.
This is similar to assigment, except that it used the underlying BLAS function (including parallelization if offered by the BLAS implementation) and has marginal utility.
However, this case serves as illustration of the functional interface, used in the rest of the library:
multi::blas::copy(v)
doesn't copy or allocates anything, it creates a "view" that can serve different purposes, illustrated in 3 different cases:
- The view can be used to construct a new vector (needing allocation),
Once again
operator+
helps with automatic type deduction.
multi::array<double, 1> const v = {1.0, 2.0, 3.0};
multi::array<double, 1> const v2 = multi::blas::copy(v); // case 1: allocates space for 3 elements and copies (using BLAS)
// auto const v2 = +v_copy; // same effect as line above
(Note that auto const v2 = v_copy;
would not create a value or perform a copy, it will simply hold a variable with the "copy range".
This is not recommended as it can be confused and create a dangling range.)
- to assign to an existing vector (and resize it if is needed and possible)
multi::array<double, 1> v3; // mutable vector
v3 = multi::blas::copy(v); // case 2: resizes v3 (allocates space for 3 elements) and copies
multi::array<double, 1> v4({3}, 0.0); // allocates space for 3 elements
v4 = multi::blas::copy(v); // case 2: assigns copies (no allocation necessary)
- to assign to a 1D subarray vector that is not resizable. The importance of this case is that it guarantees that no allocations are performed.
multi::array<double, 1> cA({3}, 0.0); // allocates space for 3 elements
v4({0, 2}) = multi::blas::copy(v); // case 3: LHS is not resizable, assigns copies to subrange (resizing is not possible or necessary, no allocations)
Swaps the values of two vectors. Vector extensions must match.
Note that the utility of multi::blas::copy
and multi::blas::swap
is redundant with native features of the library (such as plain assignment, copy construction and swap), the only difference is that these operations will be performed using the BLAS operations elementwise one dimensional arrays (vectors) on real or complex data only.
These function do not work on 2D arrays (matrices) as the BLAS functions do not support this.
Copying or swapping 2D arrays with arbitrary layouts using BLAS could be done row-by-row: std::transform(A2.begin(), A2.end(), B2.begin(), [](auto const& arow) {return multi::blas::copy(arow);})
(or, for efficiency column-by-column depending on the layout).
Unary operation that computes the norm-2 (Euclidean norm) of a vector. The result is convertible to a real scalar
multi::array<double, 1> const v = {1.0, 2.0, 3.0};
double const n = multi::blas::nrm2(v);
// auto const n = +multi::blas::nrm2(v);
multi::array<std::complex<double>, 2> const v = { {1.0, 2.0}, {2.0, 3.0}, {3.0, 4.0} };
double const n = multi::blas::nrm2(v[0]); // acting on a row view
// auto const n = +multi::blas::nrm2(v[0]);
Returns the sum of the absolute values of the elements of a vector (norm-1).
Index of the element with the largest absolute value (zero-based)
auto multi::blas::dot(
complex/real vector, complex/real vector) ->
convertible to complex/real scalar
Returns the dot product of two vectors with complex or real elements (T
).
multi::array<double, 1> const v = {1.0, 2.0, 3.0};
multi::array<double, 1> const w = {4.0, 5.0, 6.0};
double const d = multi::blas::dot(v, w);
// auto const d = +multi::blas::dot(v, w);
Conjugation can be applied to either vector argument,
using multi::blas::dot;
using multi::blas::C;
auto const d1 = +dot( v , w );
auto const d2 = +dot(C(v), w );
auto const d3 = +dot( v , C(w));
auto const d4 = +dot(C(v), C(w));
It is important to note that the left hand side of the assignment can be a scalar that is part of a heap allocation. In this case, the result is going to directly put at this location.
multi::array<double, 1> z = {0.0, 0.0, 0.0};
z[1] = multi::blas::dot(v, w);
This feature regarding scalar results is essential when operating on GPU memory since the whole operation can be performed on the device.
In CPUs BLAS
dot
functions has known bugs in different implementations. For example BLAS Apple Accelerate has bug insdot
while BLAS 32bit has a bug incdot
. In addition, some implementations of BLAS functions return the complex result in the stack and other write into a pointer. For this reason and for consistency, the library uses BLAS'sgemv
functions in place ofdot
in these cases. This should not affect the results.
Scales a vector.
Vector addition.
multi::array<double, 1> const x = ...;
multi::array<double, 1> y = ...;
y += blas::axpy(2.0, x); // same as blas:::axpy(+2.0, x, y)
y -= blas::axpy(2.0, x); // same as blas:::axpy(-2.0, x, y)
These functions operate on vectors and arrays.
Again, we use multi::array<T, 1>
as representative of a vector, but a one-dimensional subarray, such as a row or a column of a 2D array, can also be used as a vector.
multi::array<T, 2>
as representative of a matrices, but a two-dimensional subarray or larger of higher dimensional arrays can be used as long as one of the two interternal strides in 1.
This is limitation of BLAS, that only acts on certain layouts of 2D arrays.
auto multi::blas::gemv(
complex/real scalar ,
complex/real matrix) ->
convertible to complex/real vector
multi::array<double, 2> const A({4, 3});
multi::array<double, 1> const x = {1.0, 2.0, 3.0};
multi::array<double, 1> const x = {1.0, 2.0, 3.0, 4.0};
y = blas::gemv(5.0, A, x); // y <- 5.0 A * x
The gemv expression can be used for addition and subtraction,
y += blas::gemv(1.0, A, x); // y <- + A * x + y
y -= blas::gemv(1.0, A, x); // y <- - A * x + y
#include<multi/array.hpp>
#include<multi/adaptors/blas.hpp>
namespace multi = boost::multi;
int main() {
multi::array<double, 2> const A({2, 2});
multi::array<double, 2> const B({2, 2});
multi::array<double, 2> const C1 = multi::blas::gemm(1.0, A, B);
auto const C2 = + multi::blas::gemm(1.0, A, B);
}
https://godbolt.org/z/d1E7donWM
(need linking to BLAS to work, e.g. -lblas
or -lopenblas
or -lmkl
)
All these operations are now supported for CPU and GPU memory, real and complex.
scalars: aa
(bb
(
vectors: x
, y
matrices: A
, B
, C
vector operations: C
(*
) conjugation (element-wise)
matrix operations: J
(*
) conjugation (element-wise) (use C
for vectors), T
transpose, H
transpose conjugate (also C
, discouraged), U
/L
upper or lower triangular part (logical zeroing other side)
BLAS | mutable form | effect | operator form [³] | functional form | thrust/STL [¹] |
---|---|---|---|---|---|
SWAP | blas::swap(x, y) |
(x^y) |
swap_ranges(begin(x), end(x), begin(y)) |
||
COPY | blas::copy(x, y) |
y << x |
y = blas::copy(x) |
copy(begin(x), end(x), begin(y)) |
|
ASUM | blas::asum(x, res) |
x==0 /x!=0 isinf(x) is an(x) [²] |
res = blas::asum(x) |
transform_reduce(begin(x), end(x), 0.0, plus<>{}, [](auto const& e){return abs(e.real()) + abs(e.imag());}) |
|
NRM2 | blas::nrm2(x, res) |
abs(x) |
res = blas::nrm2(x); |
sqrt(trasnform_reduce(begin(x), end(x), 0.0, plus<>{}, [](auto const& e){return norm(e);})); |
|
SCAL | blas::scal(aa, x); |
x*=aa; |
for_each(begin(x), end(x), [aa](auto& e){return e*=aa;}) |
||
AXPY | blas::axpy(aa, x, y) |
y+=x y-=x y+=aa*x y-=aa*x
|
transform(x.begin(), x.end(), y.begin(), y.begin(), [aa](auto ex, auto ey) {return aa*ex + ey;} |
||
DOT | blas::dot(x, y, res) |
res = (x, y); |
res = blas::dot(x, y) |
inner_product(begin(x), end(x), begin(y), T{}); |
|
blas::dot(blas::C(x), y, res) |
res = (*x, y); |
res = blas::dot(blas::C(x), y) |
inner_product(begin(x), end(x), begin(y), T{}, plus<>{}, [](T const& t1, T const& t2) {return conj(t1)*t2;}); |
||
blas::dot(x, blas::C(y), res) |
res = (x, *y); |
res = blas::dot(x, blas::C(y)); |
inner_product(x.begin(), x.end(), y.begin(), T{}, plus<>{}, [](T const& t1, T const& t2) {return t1*conj(t2);}); |
||
blas::dot(blas::C(x), blas::C(y), res) |
|
auto res = conj(inner_product(x.begin(), x.end(), y.begin(), T{}); |
|||
GEMV | blas::gemv(aa, A, x, bb, y) |
y=A%x y=aa*A%x y+=A%x y+=aa*A%x [¤] |
y=blas::gemv(aa, A, x) y+=blas::gemv(aa, A, x)
|
transform(begin(A), end(A), begin(y), [&x, aa] (auto const& Ac) {return aa*blas::dot(Ac, x);}) |
|
blas::gemv(aa, blas::T(A), x, bb, y) |
y= ~A % x y=aa*(~A)%x y+=(~A)%x y+=aa*(~A)%x
|
y=blas::gemv(aa, blas::T(A), x) y+=blas::gemv(aa, blas::T(A), x)
|
transform(begin(transposed(A)), end(transposed(A)), begin(y), [&x, aa] (auto const& Ac) {return aa*blas::dot(Ac, x);}) |
||
blas::gemv(aa, blas::J(A), x, bb, y) |
y= *A % x y=aa*(*A)%x y+=(*A)%x y+=aa*(*A)%x
|
y=blas::gemv(aa, blas::J(A), x) y+=blas::gemv(aa, blas::J(A), x)
|
transform(begin(A), end(A), begin(y), [&x, aa] (auto const& Ac) {return aa*blas::dot(*Ac, x);}) |
||
blas::gemv(aa, blas::H(A), x, bb, y) |
|
transform(begin(transposed(A)), end(transposed(A)), begin(y), [&x, aa] (auto const& Ac) {return aa*blas::dot(*Ac, x);}) |
|||
GEMM | blas::gemm(aa, A, B, bb, C) |
C = aa*(A*B) |
C = blas::gemm(aa, A, B) C += blas::gemm(aa, A, B)
|
transform(begin(A), end(A), begin(C), begin(C), [&B, aa, bb] (auto const& Ar, auto&& Cr) {return blas::gemv(aa, blas::T(B), Ar, bb, move(Cr));}) |
|
blas::gemm(aa, A, blas::T(B), bb, C) |
C = aa*(A* ~B) |
C = blas::gemm(aa, A, blas::T(B)) C += blas::gemm(aa, A, blas::T(B))
|
transform(begin(A), end(A), begin(C), begin(C), [&B, aa, bb] (auto const& Ar, auto&& Cr) {return blas::gemv(aa, B, Ar, bb, move(Cr));}) |
||
blas::gemm(aa, blas::T(A), B, bb, C) |
C =~A * B C = aa*(~A * B) C+=~A * B C+=aa*(~A * B)
|
C = blas::gemm(aa, blas::T(A), B, bb, C) (or += ) |
transform(begin(transposed(A)), end(transposed(A)), begin(C), begin(C), [&B, aa, bb] (auto const& Ar, auto&& Cr) {return blas::gemv(aa, blas::T(B), Ar, bb, std::move(Cr));}) |
||
blas::gemm(aa, blas::T(A), blas::T(B), bb, C) |
C =~A * ~B C = aa*(~A * ~B) C+=~A * ~B C+=aa*(~A * ~B)
|
C = blas::gemm(aa, blas::T(A), blas::T(B), bb, C) (or += ) |
transform(begin(transposed(A)), end(transposed(A)), begin(C), begin(C), [&B, aa, bb] (auto const& Ar, auto&& Cr) {return blas::gemv(aa, B, Ar, bb, std::move(Cr));}) |
||
blas::gemm(aa, A, blas::J(B), bb, C) blas::gemm(..., blas::T(B), blas::H(A), ..., HC) and conjtranspose result) |
|
transform(begin(A), end(A), begin(C), begin(C), [BT=transposed(B)](auto const& Ar, auto&& Cr) {transform(begin(BT), end(BT), begin(Cr), begin(Cr), [&Ar](auto const& Bc, auto&& c) {return aa*blas::dot(Ar, blas::C(Bc)) + bb*c;}); return std::move(Cr);}); |
|||
blas::gemm(aa, blas::J(A), B, bb, C) |
|
transform(begin(A), end(A), begin(C), begin(C), [BT=transposed(B)](auto const& Ar, auto&& Cr) {transform(begin(BT), end(BT), begin(Cr), begin(Cr), [&Ar](auto const& Bc, auto&& c) {return aa*blas::dot(blas::C(Ar), Bc) + bb*c;}); return std::move(Cr);}); |
|||
blas::gemm(aa, blas::J(A), blas::J(B), bb, C) |
|
transform(begin(A), end(A), begin(C), begin(C), [BT=transposed(B)](auto const& Ar, auto&& Cr) {transform(begin(BT), end(BT), begin(Cr), begin(Cr), [&Ar](auto const& Bc, auto&& c) {return aa*blas::dot(blas::C(Ar), blas::C(Bc)) + bb*c;}); return std::move(Cr);}); |
|||
blas::gemm(aa, A, blas::H(B), bb, C) |
C = aa*(A* ~*B) (or += ) |
C = blas::gemm(aa, A, blas::H(B)) C += blas::gemm(aa, A, blas::H(B))
|
transform(begin(A), end(A), begin(CC), begin(CC), [&](auto const& Ar, auto&& Cr){return blas::gemv(aa, blas::J(B), Ar, bb, move(Cr));}) |
||
blas::gemm(aa, blas::H(A), B, bb, C) |
CC=~*A *B |
C=blas::gemm(aa, blas::H(A), B) |
transform(begin(transposed(A)), end(transposed(A)), begin(CC), begin(CC), [BT=transposed(B)](auto const& Ac, auto&& Cr) {transform(begin(BT), end(BT), begin(Cr), begin(Cr), [&Ac](auto const& Bc, auto&& c){return aa*blas::dot(blas::C(Ac), Bc) + bb*c;}); return move(Cr);}) |
||
blas::gemm(aa, blas::H(A), blas::H(B), bb, C) |
CC=~*A * ~*B |
C=blas::gemm(aa, blas::H(A), blas::H(B)) |
transform(begin(transposed(A)), end(transposed(A)), begin(CC), begin(CC), [&B](auto const& Ac, auto&& Cr) {transform(begin(B), end(B), begin(Cr), begin(Cr), [&Ac](auto const& Bc, auto&& c) {return conj(std::transform_reduce(begin(Ac), end(Ac), begin(Bc), 0.0*c, std::plus<>{}, [](auto const& a, auto const& b) {return a*b;}));}); return move(Cr);}) |
||
blas::gemm(aa, blas::T(A), blas::H(B), bb, C) |
CC=~A * ~*B |
C=blas::gemm(aa, blas::T(A), blas::H(B)) |
transform(begin(transposed(A)), end(transposed(A)), begin(CC), begin(CC), [&B](auto const& Ac, auto&& Cr) {transform(begin(B), end(B), begin(Cr), begin(Cr), [&Ac](auto const& Bc, auto&& c) {return std::transform_reduce(begin(Ac), end(Ac), begin(Bc), 0.0*c, std::plus<>{}, [](auto const& a, auto const& b) {return a*conj(b);});}); return move(Cr);}) |
||
blas::gemm(aa, blas::H(A), blas::T(B), bb, C) |
|
transform(begin(transposed(A)), end(transposed(A)), begin(CC), begin(CC), [&B](auto const& Ac, auto&& Cr) {transform(begin(B), end(B), begin(Cr), begin(Cr), [&Ac](auto const& Bc, auto&& c) {return std::transform_reduce(begin(Ac), end(Ac), begin(Bc), 0.0*c, std::plus<>{}, [](auto const& a, auto const& b) {return conj(a)*b;});}); return move(Cr);}) |
|||
blas::gemm(aa, blas::J(A), blas::H(B), bb, C) |
|
||||
blas::gemm(aa, blas::H(A), blas::J(B), bb, C) |
|
||||
TRSM | blas::trsm(blas::side::right, aa, blas::U(A), B) |
B /= U(A) |
TODO | ||
blas::trsm(blas::side::right, aa, blas::L(A), B) |
B /= L(A) |
TODO | |||
blas::trsm(blas::side::left, aa, blas::U(A), B) |
B |= U(A) |
TODO | |||
blas::trsm(blas::side::left, aa, blas::L(A), B) |
B |= L(A) |
TODO | |||
blas::trsm(blas::side::right, aa, blas::U(A), blas::J(B)) |
|
TODO | |||
blas::trsm(blas::side::right, aa, blas::L(A), blas::J(B)) |
|
TODO | |||
blas::trsm(blas::side::right, aa, blas::U(A), blas::H(B)) |
|
TODO | |||
blas::trsm(blas::side::right, aa, blas::L(A), blas::H(B)) |
|
TODO |
[¹]: for reference, not optimal.
[²]: asum
is interpreted as a mechanism to detect null vectors or vectors containing NaN or infinities.
[³]: needs explicit invocation using namespace multi::operators
namespace or of specific symbols using multi::operator*
/operator/=
/etc.
[¤]: y *=bb +=aa*A%x
(gemv(aa, A, x, bb, y)
) would also be possible.