
Add GSVD with QR factorizations, 2-by-1 CS decomposition #406

Open
wants to merge 101 commits into base: master

Conversation

christoph-conrads
Contributor

This pull request adds to LAPACK a generalized singular value decomposition computed by means of a QR decomposition with column pivoting and the 2-by-1 CS decomposition. The PR requires #405.

Given a pair of matrices A, B with appropriate dimensions, the GSVD computes a decomposition (a small numerical sketch follows the list below)

  • A = U1 D1 R Q^*,
  • B = U2 D2 R Q^*.
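
To make the structure concrete, here is a minimal NumPy sketch that assembles a pair (A, B) from prescribed factors; square matrices and an invertible B are assumed for brevity, and all names are illustrative:

import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random orthogonal factors U1, U2, Q via QR of Gaussian matrices.
u1, _ = np.linalg.qr(rng.standard_normal((n, n)))
u2, _ = np.linalg.qr(rng.standard_normal((n, n)))
q, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Paired diagonal factors D1 = diag(sin theta), D2 = diag(cos theta),
# so that D1^T D1 + D2^T D2 = I.
theta = rng.uniform(0.1, np.pi / 2 - 0.1, n)
d1, d2 = np.diag(np.sin(theta)), np.diag(np.cos(theta))

# Shared upper-triangular factor R with a safely nonzero diagonal.
r = np.triu(rng.standard_normal((n, n)))
np.fill_diagonal(r, rng.uniform(1.0, 2.0, n))

a = u1 @ d1 @ r @ q.T
b = u2 @ d2 @ r @ q.T

# For invertible B, the generalized singular values of (A, B) are the
# ordinary singular values of A B^-1, which equal tan(theta) here.
gsv = np.linalg.svd(a @ np.linalg.inv(b), compute_uv=False)
assert np.allclose(np.sort(gsv), np.sort(np.tan(theta)))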

I would like to have feedback on the computation of the matrix R. Currently it is always computed but this is expensive because of an additional matrix-matrix multiplication followed by an RQ decomposition. Should I make this optional?

The PR is based on #63 but provides all implementations (single-precision real, double-precision real, single-precision complex, and double-precision complex) with tests removed because of the C++ dependency. The tests can be found in fb5dfb3.

@christoph-conrads
Contributor Author

christoph-conrads commented Apr 22, 2020

Here is the motivation for the new GSVD solver from #63:

The generalized singular value decomposition can be computed directly with xGGSVD, xGGSVD3 (see LAWN 46) or by computing QR factorizations and a 2-by-1 CS decomposition. In the following, let us call the former approach direct GSVD and the latter QR+CSD. In my master's thesis I used both approaches and it turned out that QR+CSD was always significantly faster than the direct GSVD for pairs of square matrices with up to 3600 columns (see Section 3.5 of the thesis).

I am convinced that the speed difference is persistent across platforms because the iterative phase of the solver is much faster for QR+CSD. The direct GSVD reduces both matrices A, B to upper triangular form before the iterative solver xTGSJA is called, whereas the iterative phase of QR+CSD operates on a 2-by-1 isometric matrix Q = [Q1^T, Q2^T]^T, where Q1, Q2 are bidiagonal (A isometric <=> A^T A = I). Since the matrix Q is always well-conditioned, the CSD solver xBBCSD needs only a few iterations to compute a solution (a small numerical sketch of this property follows the list below). For comparison, the maximum number of iterations per singular value/eigenvalue is

  • 6 for the CSD (see variable MAXITR in xbbcsd.f, line 356),
  • 30 for the symmetric eigenvalue problem with Francis QR (MAXIT, dsteqr.f:154), and
  • 40 for the GSVD (MAXIT, dtgsja.f:402).
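
The well-conditioning property underlying this argument is easy to check numerically, e.g. with NumPy on random data:

import numpy as np

rng = np.random.default_rng(1)
m1, m2, n = 5, 7, 3

a = rng.standard_normal((m1, n))
b = rng.standard_normal((m2, n))

# Reduced QR of the stacked pencil; Q = [Q1; Q2] is isometric.
q, r = np.linalg.qr(np.vstack([a, b]))
q1, q2 = q[:m1], q[m1:]
assert np.allclose(q.T @ q, np.eye(n))

# Every singular value of an isometric matrix equals one, i.e. Q is
# perfectly conditioned; the iterative CSD phase works on (a bidiagonal
# reduction of) Q1, Q2 rather than on A, B themselves.
assert np.allclose(np.linalg.svd(q, compute_uv=False), 1.0)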

@codecov

codecov bot commented Apr 22, 2020

Codecov Report

Merging #406 (2a8c4dd) into master (4b3c7c2) will decrease coverage by 0.43%.
The diff coverage is 0.29%.

❗ Current head 2a8c4dd differs from pull request most recent head 8a0bef1. Consider uploading reports for the commit 8a0bef1 to get more accurate results

@@            Coverage Diff             @@
##           master     #406      +/-   ##
==========================================
- Coverage   82.37%   81.93%   -0.44%     
==========================================
  Files        1894     1904      +10     
  Lines      190681   191698    +1017     
==========================================
+ Hits       157067   157068       +1     
- Misses      33614    34630    +1016     
Impacted Files Coverage Δ
SRC/cggqrcs.f 0.00% <0.00%> (ø)
SRC/clasrtr.f 0.00% <0.00%> (ø)
SRC/dggqrcs.f 0.00% <0.00%> (ø)
SRC/dlasrti.f 0.00% <0.00%> (ø)
SRC/dlasrtr.f 0.00% <0.00%> (ø)
SRC/sggqrcs.f 0.00% <0.00%> (ø)
SRC/slasrti.f 0.00% <0.00%> (ø)
SRC/slasrtr.f 0.00% <0.00%> (ø)
SRC/zggqrcs.f 0.00% <0.00%> (ø)
SRC/zlasrtr.f 0.00% <0.00%> (ø)
... and 11 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@christoph-conrads
Contributor Author

christoph-conrads commented Apr 23, 2020

I would like to have feedback on the computation of the matrix R. Currently it is always computed but this is expensive because of an additional matrix-matrix multiplication followed by an RQ decomposition. Should I make this optional?

Yes, because then it is possible to compute only the generalized singular values for the cost of one QR decomposition with column pivoting, unitary matrix assembly, and 2-by-1 CS value computation. AFAIK xGGSVD3() must compute the upper-triangular factor R by design and would be considerably slower in this scenario.

TODO:

  • Re-read LAWN 46 and understand the xGGSVD3() algorithm
  • Benchmark this idea before modifying the code

@christoph-conrads
Contributor Author

TODO:

  • Re-read LAWN 46 and understand the xGGSVD3() algorithm

The whole theory behind xGGSVD3() is based on simultaneous transformations of a pair of upper triangular matrices.

@christoph-conrads
Contributor Author

christoph-conrads commented May 2, 2020

The latest xGGQRCS code contains row scaling ({s,d}LASRTR) in an attempt to handle matrix pencils where the input matrices A, B differ significantly in norm. This does not even yield satisfactory singular values. The input matrices must be scaled for a backward stable algorithm (backward stable with respect to A, B individually, not with respect to [A, B]).

The remainder of this post will discuss how to compute the backward stable GSVD of (A, B) using matrix scaling and xGGQRCS (QR factorization followed by 2-by-1 CS decomposition).

Set w to the power of 2 nearest to norm(A) / norm(B) and compute the GSVD of (A, wB) with xGGQRCS. This yields orthogonal/unitary matrices U1, U2 and matrices with sine, cosine values S, C such that

  • A = U1 S X,
  • wB = U2 C X,
  • S^* S + C^* C = I,
  • X has full rank.

We now have to adjust X and the singular values to acquire the GSVD of (A, B).

We can immediately fix the singular values: if σ is a generalized singular value of the matrix pencil (A, wB), then wσ is a generalized singular value of (A, B). xGGQRCS does not return singular values directly. Instead it returns radians values θ such that σ = tan(θ). Thus, the corrected radians value is θ' = arctan(w * tan(θ)); the updated matrices Σ, Γ can be computed with the aid of the new radians values (Σ(i,i) = sin(θ'), Γ(i,i) = cos(θ')). This operation should be numerically stable because

  • The arc tangent has maximum derivative 1 over its entire range and is perfectly conditioned.
  • w * tan(θ) will be calculated accurately because it is a single floating-point multiplication.
  • tan(θ) is
    • well conditioned for small angles (less than π/4) [i.e., there are vectors x such that ||wBx|| >= ||Ax||],
    • infinite (or already negative) for angles near π/2; then the multiplication by w does not change anything [i.e., there are vectors x such that ||Ax|| > 0, ||wBx|| = 0],
    • badly conditioned for θ much larger than π/4 (this should not happen because it would indicate the existence of vectors x with ||wBx|| << ||Ax|| even though wB has norm comparable to A).

The question now is whether we need to change X, and how. Let

|A|   |V1   | |Σ|
| | = |     | | | Y
|B|   |   V2| |Γ|

be the GSVD of (A, B) such that Σ^* Σ + Γ^* Γ = I. Observe

A^* A = X^* S^* S X = Y^* Σ^* Σ Y.
w^2 B^* B = X^* C^* C X = w^2 Y^* Γ^* Γ Y.

Moreover, S^* S and Σ^* Σ are diagonal, which is extremely helpful. For demonstration purposes assume that

S^* S = |0  |
        |  s|

C^* C = |1  |
        |  c|

Σ^* Σ = |0  |
        |  σ|

Γ^* Γ = |1  |
        |  γ|

Furthermore, we assume that the (1,1) entries arise structurally from the matrix dimensions (meaning they are free of round-off error). With simple row scalings we can compute Y from X and the old and new singular values:


                      |0 0| |1   0| |1   0|             |0 0|
X^* S^* S X = X^* S^* |   | |     | |     | X = X^* S^* |   | Y.
                      |0 s| |0 σ/s| |0 s/σ|             |0 σ|

                      |1 0| |1   0| |1   0|             |1 0|
X^* C^* C X = X^* C^* |   | |     | |     | X = X^* C^* |   | wY.
                      |0 c| |0 γ/c| |0 c/γ|             |0 γ|

Thus

Y[1,:] = X[1,:] / w                       # first row of Y
Y[2,:] = X[2,:] * s/σ = X[2,:] * c/γ / w  # second row of Y

Observe that S(1,1) is zero but C(1,1) is not, so there is a unique row scaling for the first row of X (divide by w). Similarly, C(i,i) might be zero but then S(i,i) must be one. Moreover, whenever S(i,i) =/= 0 and C(i,i) =/= 0, there are two equations for each row; both should yield the same result.

Here is a NumPy demo with a random test matrix causing a large backward error with the current xGGQRCS version:

#!/usr/bin/python3

# This program demonstrates how to adjust singular values and right-hand side
# matrix of the generalized singular value decomposition computed with SGGQRCS
# when one of the input matrices had to be scaled.
#
# Author: Christoph Conrads (https://christoph-conrads.name) 

import numpy as np
import numpy.linalg as L


def make_matrix(x):
    return np.matrix(x, dtype=np.float32)


def main():
    eps = np.finfo(np.float32).eps

    # unmodified input matrices with norm(B) >> norm(A)
    a = make_matrix([[-8.519847412e+02, +6.469862671e+02]])
    b = make_matrix([
        [+5.485938125e+05, -4.166526250e+05],
        [+1.846850781e+05, -1.402660781e+05],
        [+5.322575625e+05, -4.042448438e+05],
        [-1.630551465e+04, +1.238360352e+04],
        [-1.286453438e+05, +9.770555469e+04],
        [-1.323287812e+05, +1.005026797e+05],
        [+5.681228750e+05, -4.314841250e+05],
        [-3.107875312e+05, +2.360408594e+05],
        [+1.456551719e+05, -1.106233281e+05],
        [+1.365355156e+05, -1.036972344e+05]
    ])
    # GSVD computed by SGGQRCS with input A, B/(2**10)
    u1 = make_matrix([[1]])
    # SGGQRCS returns square matrices but we can ignore the other columns
    # because the matrix pencil has only rank 2
    u2 = make_matrix([
        [-5.237864256e-01, +3.081814051e-01],
        [-1.694306731e-01, -5.048786998e-01],
        [-5.036462545e-01, -1.015115157e-01],
        [+1.279635727e-02, +2.352355272e-01],
        [+1.268841326e-01, -4.299084842e-01],
        [+1.265842170e-01, -9.544299543e-02],
        [-5.364804864e-01, -2.056131810e-01],
        [+2.987869084e-01, -3.556257784e-01],
        [-1.335151047e-01, -4.078340828e-01],
        [-1.268201172e-01, -2.355325669e-01]
    ])
    x = make_matrix([
        [-1.029685547e+03, +7.820372925e+02],
        [-8.520648804e+02, +6.470472412e+02]
    ])
    # SGGQRCS returns radians values instead of singular values
    theta = np.float32(1.55708992481e+00)
    # scaling factor
    w = np.power(np.float32(2), np.float32(-10))

    # check for typos
    assert L.norm(u2.H*u2 - np.eye(2)) <= np.sqrt(2) * eps
    assert theta >= 0
    assert theta <= np.pi / 2

    # add helper functions
    copy = lambda x: np.matrix(np.copy(x))
    compute_relative_error = \
        lambda a, u, d, x: L.norm(a - u*d*x) / (L.norm(a) * eps)


    # assemble diagonal matrices
    d1 = make_matrix([[0, np.sin(theta)]])
    d2 = make_matrix([[1, 0], [0, np.cos(theta)]])


    # print information
    print('norm(B) / norm(A) = {:8.2e}'.format(L.norm(b)/L.norm(a)))
    print('Relative error A: {:8.2e}'.format(compute_relative_error(a,u1,d1,x)))
    print('Relative error B: {:8.2e}'.format(compute_relative_error(b,u2,d2,x)))


    # fix values
    theta_fixed = np.arctan(w * np.tan(theta))
    d1_fixed = make_matrix([[0, np.sin(theta_fixed)]])
    d2_fixed = make_matrix([[1, 0], [0, np.cos(theta_fixed)]])

    x_fixed = copy(x) / w
    # alternative (note the extra factor w, because x_fixed was already
    # divided by w): x_fixed[1,:] *= w * d1[0,1]/d1_fixed[0,1]
    x_fixed[1,:] *= d2[1,1]/d2_fixed[1,1]


    # recompute backward error
    fmt = 'Relative error {:s} after fixes: {:8.2e}'
    print(fmt.format("A", compute_relative_error(a,u1,d1_fixed,x_fixed)))
    print(fmt.format("B", compute_relative_error(b,u2,d2_fixed,x_fixed)))


if __name__ == '__main__':
    main()

Script output on my computer:

norm(B) / norm(A) = 1.24e+03
Relative error A: 1.73e+00
Relative error B: 8.38e+06
Relative error A after fixes: 2.71e+00
Relative error B after fixes: 1.49e+00

@christoph-conrads
Contributor Author

@julielangou It's done.

Given a pair of matrices (A, B), xGGQRCS computes the GSVD such that A = U1 D1 X, B = U2 D2 X.

The tests can be found in the branch qr+cs-gsvd in TESTING/cpp/ggqrcs.hpp due to the C++ and Boost.Test dependency.

Bonus functionality (not used by xGGQRCS because row sorting has no effect; a NumPy analogue follows the list):

  • {s,d}LASRTR: row sorting
  • {s,d}LASRTI: sort an array of indices based on the values they reference (used by xLASRTR)
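
Roughly, the pair behaves like the following NumPy analogue (an illustration of the semantics, not the Fortran interfaces):

import numpy as np

values = np.array([0.5, 2.0, -1.0, 0.0], dtype=np.float32)

# xLASRTI-style index sort: order the indices by the values they
# reference; the value array itself is left untouched.
order = np.argsort(values)
print(order)                    # [2 3 0 1]

# xLASRTR-style row sort: apply the permutation to the rows of a matrix.
matrix = np.arange(12, dtype=np.float32).reshape(4, 3)
print(matrix[order])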

@christoph-conrads
Contributor Author

The branch qr+cs-gsvd with the C++ tests and approximately 90% line coverage for xGGQRCS can be found here: https://travis-ci.org/github/christoph-conrads/lapack/branches

@christoph-conrads
Contributor Author

christoph-conrads commented May 17, 2020

Benchmarks

This post compares the performance of the GSVD solvers xGGSVD3 and xGGQRCS.

Introduction

Of course, the comparison will be very limited because of the huge number of parameters relevant to the GSVD, e.g.,

  • the matrix dimensions,
  • generalized singular value distribution,
  • sparsity (all-zero columns, rows),
  • matrix condition numbers,
  • floating-point format,
  • field (real or complex values),
  • computing all or none of the GSVD matrices,
  • BLAS implementation (Netlib, OpenBLAS, MKL...),
  • number of threads,
  • compilers,
  • CPU
    • instruction set architecture
    • CPU architecture
  • ...

In this post we benchmark with matrices A = U1 D1 X, B = U2 D2 X, where

  • A, B have the same number of rows,
  • the number of rows and columns is between 8 and 128 inclusive,
  • the generalized singular values are uniformly distributed either in
    • [0, π/4),
    • [0, π/2000), or
    • [1999π/8000, π/4),
  • X has condition number 2**(d/4), d being the number of significand digits.
  • U1, U2 are random,
  • all matrices must be computed or none,
  • Netlib BLAS,
  • CPU time is measured,
  • at least 100 random matrices must be decomposed and the CPU time must sum up to more than one second.

Code: TESTING/cpp/gsvd_benchmarks.cpp
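
For orientation, a rough NumPy sketch of this test-pair construction, simplified to m >= n with a full set of n generalized singular values (the authoritative generator is the C++ file above):

import numpy as np

def make_test_pair(m, n, rng):
    # Random orthogonal U1, U2 via QR of Gaussian matrices.
    u1, _ = np.linalg.qr(rng.standard_normal((m, m)))
    u2, _ = np.linalg.qr(rng.standard_normal((m, m)))

    # Generalized singular values drawn uniformly from [0, pi/4).
    theta = rng.uniform(0.0, np.pi / 4, n)
    d1 = np.zeros((m, n)); d1[:n, :n] = np.diag(np.sin(theta))
    d2 = np.zeros((m, n)); d2[:n, :n] = np.diag(np.cos(theta))

    # X with condition number 2**(d/4), d = number of significand digits.
    d = np.finfo(np.float64).nmant + 1   # 53 for IEEE double precision
    sigma = np.geomspace(1.0, 2.0 ** (-d / 4), n)
    p, _ = np.linalg.qr(rng.standard_normal((n, n)))
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    x = (p * sigma) @ q.T                # p @ diag(sigma) @ q.T

    return u1 @ d1 @ x, u2 @ d2 @ x

rng = np.random.default_rng(42)
a, b = make_test_pair(16, 8, rng)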

Attention: xGGSVD3 computes a decomposition A = U1 D1 R Q^*, B = U2 D2 R Q^*. Depending on the problem at hand, it may not suffice to measure only execution time of the GSVD solver.

Results

  • xGGQRCS is faster than xGGSVD3 if A, B are (almost) square.
  • xGGQRCS is the slowest relative to xGGSVD3 if A, B have only a small number of columns compared to the number of rows. (Remember A, B have the same number of rows in the benchmarks.)
  • Switching from real to complex arithmetic has a smaller impact on CPU time for xGGQRCS than for xGGSVD3.
  • When computing only the generalized singular values, the reduction in CPU time is smaller for xGGQRCS than for xGGSVD3.
  • When computing only generalized singular values and for matrices with more columns than rows, the larger the number of columns, the more xGGQRCS slows down relative to xGGSVD3.
  • The larger the matrices, the larger the maximum differences in CPU time.
  • xGGQRCS needs a larger workspace than xGGSVD3.
  • The larger the number of rows, the larger the workspace of xGGQRCS relative to xGGSVD3.

Plots

The plots below compare the CPU time consumed by xGGQRCS with that of xGGSVD3 in single precision when

  • computing generalized singular values and matrices (upper row),
  • computing only generalized singular values (lower row),

in

  • real arithmetic (SGGQRCS, SGGSVD3) in the left half,
  • complex arithmetic (CGGQRCS, CGGSVD3) in the right half.

The colors indicate the number of matrix rows; the brighter the colors, the larger the number of rows.

The x axis shows the number of columns in the matrices, the y axis shows the CPU time needed by xGGQRCS divided by the CPU time needed by xGGSVD3.

[Plot: xGGQRCS-vs-xGGSVD3-cputime-float32]

Collaborator

@thijssteel thijssteel left a comment


Besides a few minor remarks, looks good to me.

I would worry about one thing: I'm pretty sure the SVD is a little better at identifying the numerical rank than RRQR. I haven't looked into this, but is it possible that the direct GSVD is then also better at this than this method?

@christoph-conrads
Contributor Author

I would worry about one thing: I'm pretty sure the SVD is a little better at identifying the numerical rank than RRQR. I haven't looked into this, but is it possible that the direct GSVD is then also better at this than this method?

The backward error introduced by a QR factorization of a matrix K is on the order of eps ||K|| (see, e.g., Matrix Computations, 4th ed., Section 5.2), meaning all small singular values of the R factor will have a large absolute error. Consequently, rank determination is necessarily a heuristic.

Both GSVD and QR+CS initially apply one or more permuted QR factorizations; see xGGSVP3 or Computing the Generalized Singular Value Decomposition (Equation (1.5)) and xGGQRCS. Given that the QR factorization is a large source of backward error, I do not see one method being generally better than the other with respect to rank determination.
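
To illustrate the point, a sketch with SciPy's pivoted QR standing in for the LAPACK factorizations (illustrative thresholds, not either solver's actual logic):

import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(7)
n = 50

# Matrix with rapidly decaying singular values.
u, _ = np.linalg.qr(rng.standard_normal((n, n)))
v, _ = np.linalg.qr(rng.standard_normal((n, n)))
k = u @ np.diag(np.logspace(0, -18, n)) @ v.T

tol = np.finfo(k.dtype).eps * np.linalg.norm(k, 2)

# Rank estimate from the SVD: singular values above eps * ||K||.
rank_svd = int(np.sum(np.linalg.svd(k, compute_uv=False) > tol))

# Rank estimate from QR with column pivoting: diagonal of R above the
# same threshold (the usual RRQR heuristic; |R(i,i)| tracks the singular
# values only up to a factor that may grow with n).
_, r, _ = qr(k, pivoting=True)
rank_qrcp = int(np.sum(np.abs(np.diag(r)) > tol))

print(rank_svd, rank_qrcp)  # usually close, but neither is exact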

@christoph-conrads
Contributor Author

@thijssteel Please let me know if you are satisfied with the changes.

@christoph-conrads
Contributor Author

christoph-conrads commented Feb 14, 2021

Benchmarks

This post compares the performance of the GSVD solvers xGGSVD3 and xGGQRCS.

[snip]

Plots

The plots below compare the CPU time consumed by xGGQRCS with that of xGGSVD3 in single precision when
* computing generalized singular values and matrices (upper row),
* computing only generalized singular values (lower row),
in
* real arithmetic (SGGQRCS, SGGSVD3) in the left half,
* complex arithmetic (CGGQRCS, CGGSVD3) in the right half.

The colors indicate the number of matrix rows; the brighter the colors, the larger the number of rows.

The x axis shows the number of columns in the matrices, the y axis shows the CPU time needed by xGGQRCS divided by the CPU time needed by xGGSVD3.

Intel Cascade Lake results (commit 6a61522bd8fd74dadf406bef1c628f94b3947216) on a virtual private server. Note that CGGQRCS now takes more than twice the CPU time of CGGSVD3 for matrix pairs with fewer than 20 columns when only singular values are requested. This difference was less pronounced in the previous set of benchmarks.

[Plot: xGGQRCS-vs-xGGSVD3-CascadeLake-20210214]
[Attachment: gsvd-benchmarks-6a61522bd8fd74dadf406bef1c628f94b3947216.log]
[Attachment: plot-gsvd-benchmark.py.gz]

@h-vetinari
Contributor

How consistent is the scaling behaviour observed in the benchmarks? Would it be reliable enough to specify a convenience function (e.g. for SV-only) that switches between xGGSVD3 and xGGQRCS in a way that (mostly) avoids the doubled time for small n, m but benefits in the larger regime? Because I'd be very tempted to do that as a user (if I had benchmarked things myself), but it would be much nicer if the library already provided that than having to come up with it oneself.

Eyeballing the graphs, I'm thinking of something along the lines of n <= 20 || m/n < 3 or perhaps log(m)/n < ?...

@langou
Contributor

langou commented Feb 24, 2021

Hi @h-vetinari,

Yes this is a good idea.

(1) We could write higher-level wrappers that would call the "best" function. The current structure (a clean interface for a standalone xGGSVD3 and a clean interface for a standalone xGGQRCS) is good with me. What we might be missing is higher-level wrappers.

(2) However, my preference is, for now, to leave this kind of tuning to levels higher than LAPACK, such as Matlab, Octave, Numpy, Scipy, etc. There are many places in LAPACK where this kind of tuning could apply. For example, for QR there is TSQR (for tall and skinny matrices) and there is the standard QR algorithm; when n < a_function( m ), TSQR is better than QR. For SYEV, there are four different algorithms (SYEVR, SYEV, SYEVX, SYEVD). Sometimes there are switches from one algorithm to the next when the choice is "obvious", but as soon as the choice gets complicated / machine dependent, these switches are not there.

(3) For now, we could have some comments in the headers of these subroutines to give users brief explanations of the various choices. Timings as done by @christoph-conrads are also useful to explain the trade-offs to users.

(4) For the future, I am not against starting some higher-level wrappers, like QR(m,n,A,lda). This would leave opportunities for software like MKL or OpenBLAS to optimize the algorithm used by these functions. We could provide reference implementations for these high-level wrappers. We should look at what is in LAPACKE.

So, bottom line, my personal opinion is to leave things as they are for now. But it is a possible to-come-soon agenda item. Opinions welcome.

Cheers, Julien.

@weslleyspereira
Collaborator

Nice job @christoph-conrads ! I think we should add tests for the new routines. Maybe we can just replicate the tests for xGGSVD3. What do you think?

@christoph-conrads
Contributor Author

Nice job @christoph-conrads ! I think we should add tests for the new routines. Maybe we can just replicate the tests for xGGSVD3. What do you think?

There are tests in the christoph-conrads/qr+cs-gsvd branch in TESTING/cpp/xGGQRCS_tests.cpp and the tests are passing in Travis CI. The tests are not part of the pull request because they are in C++ and require the Boost C++ libraries.

@christoph-conrads
Contributor Author

christoph-conrads commented Feb 24, 2021

Would it be reliable enough to specify a convenience function (e.g. for SV-only) that switches between xGGSVD3 and xGGQRCS in a way that (mostly) avoids the doubled time for small n,m, but benefits in the larger regime?

No, because there is no such regime. Consider the matrix pencil (A, B) below:

    |1 0|      |0 0|
A = |   |  B = |   |
    |0 0|      |0 1|

Judging only by the dimensions, this is the optimal case for xGGQRCS, yet in practice xGGSVD3 will be much faster here.

The xGGSVD3 preprocessing(!) stage will determine, after executing two QR factorizations, that this matrix pencil has the singular value pairs (1, 0) and (0, 1) (or, equivalently, that it has the singular values zero and infinity). xGGQRCS must actually execute in its entirety to arrive at the same conclusion. The critical properties in this example are the nullspaces of the matrices, not their dimensions.

This is of course a counter-example from mathematics, but I really doubt that such a switch can be implemented reliably in practice because there are too many relevant variables, including the number of rows of A, the number of rows of B, the number of columns, the rank of A, the rank of B, the rank of (A, B), and whether or not the matrices U1, U2, and X are assembled. This list contains only mathematical quantities; on top of that there are factors like the BLAS implementation, the CPU architecture, or the number of cores.

Finally, the major problem with xGGQRCS is its large memory consumption in comparison to xGGSVD3. A method deciding on the fly between xGGQRCS and xGGSVD3 would burden every user with the storage thirst of xGGQRCS.

@weslleyspereira
Collaborator

There are tests in the christoph-conrads/qr+cs-gsvd branch in TESTING/cpp/xGGQRCS_tests.cpp and the tests are passing in Travis CI. The tests are not part of the pull request because they are in C++ and require the Boost C++ libraries.

Yes, I saw these tests. I just thought a quick way to get the new methods covered by the current Lapack/TESTING/EIG would be to adapt CERRGG and CGSVTS3 subroutines. And maybe add your new tests in a different PR.

@christoph-conrads
Contributor Author

There are tests in the christoph-conrads/qr+cs-gsvd branch in TESTING/cpp/xGGQRCS_tests.cpp and the tests are passing in Travis CI. The tests are not part of the pull request because they are in C++ and require the Boost C++ libraries.

Yes, I saw these tests. I just thought a quick way to get the new methods covered by the current Lapack/TESTING/EIG would be to adapt CERRGG and CGSVTS3 subroutines. And maybe add your new tests in a different PR.

(Emphasis mine)

Adapting xGSVTS3 is not trivial because xGGSVD3 computes factorizations of the form A = U C R V^* while xGGQRCS computes A = U C X. I will try to get it done this weekend.

@christoph-conrads
Contributor Author

There are tests in the christoph-conrads/qr+cs-gsvd branch in TESTING/cpp/xGGQRCS_tests.cpp and the tests are passing in Travis CI. The tests are not part of the pull request because they are in C++ and require the Boost C++ libraries.

Yes, I saw these tests. I just thought a quick way to get the new methods covered by the current Lapack/TESTING/EIG would be to adapt CERRGG and CGSVTS3 subroutines. And maybe add your new tests in a different PR.

(Emphasis mine)

Adapting xGSVTS3 is not trivial because xGGSVD3 computes factorizations of the form A = U C R V^* while xGGQRCS computes A = U C X. I will try to get it done this weekend.

*       SUBROUTINE SERRGG( PATH, NUNIT )
*
*       .. Scalar Arguments ..
*       CHARACTER*3        PATH
*       INTEGER            NUNIT
* [snip]
      ELSE IF( LSAMEN( 3, PATH, 'GQR' ) ) THEN
*
*        SGGQRF
*
         SRNAMT = 'SGGQRF'
         INFOT = 1
         CALL SGGQRF( -1, 0, 0, A, 1, R1, B, 1, R2, W, LW, INFO )
* [snip]
*
*        SGGRQF
*
         SRNAMT = 'SGGRQF'

@weslleyspereira Which three-letter identifier should I use? If GQR is reused (xGGQRCS), the code would be tested together with the QR and RQ factorizations.

@weslleyspereira
Collaborator

@weslleyspereira Which three letter identifier should I use? If GQR is reused (xG_GQR_CS), the code would be tested together with the QR and RQ factorization.

I see... Maybe just xQRCS would be clear enough, no?

@christoph-conrads
Contributor Author

@weslleyspereira Which three letter identifier should I use? If GQR is reused (xG_GQR_CS), the code would be tested together with the QR and RQ factorization.

I see... Maybe just xQRCS would be clear enough, no?

Think like it's 1977, not 2021:

*       CHARACTER*3        PATH                                                                                                         

@weslleyspereira
Collaborator

weslleyspereira commented Feb 24, 2021

Ah... now I see your mention of the three-letter PATH. I thought it could be the SRNAMT id, sorry.
I would suggest gQR, but letter case does not matter. That takes me to QRC, which doesn't sound good, or UCX, which represents the final decomposition format well but not the routine's name. I don't know...

@christoph-conrads
Contributor Author

How consistent is the scaling behaviour observed in the benchmarks?

(1) we could write higher levels wrappers that would call the "best" function. The current structure (clean interface for a stand alone xGGSVD3 and clean interface for a stand alone xGGQRCS) is good with me. What we might be missing is higher levels wrappers.

To have xGGQRCS outperform xGGSVD3 more consistently, xGGQRCS could preprocess the matrices with the function xGGSVP3.

xGGSVP3 computes at most four QR factorizations to detect

  • the common nullspace of the matrices,
  • the singular value pairs (1,0) (singular value infinity), and
  • the singular value pairs (0,1) (singular value zero).

xGGSVP3 reduces the matrices to upper triangular form, which xGGQRCS does not need, but the preprocessing step would stop xGGQRCS from underperforming when the matrices are heavily rank deficient.

@weslleyspereira
Collaborator

weslleyspereira commented May 21, 2021

Hi.

The PR christoph-conrads#3 has some bug fixes, and test cases for the new routines. Consider merging that one before accepting this.

* These changes are also in christoph-conrads#2. I just split the branches to ease the merge.

@christoph-conrads
Contributor Author

christoph-conrads commented May 22, 2021

TODO list:

@christoph-conrads
Contributor Author

Hi @weslleyspereira,

I do not want to merge christoph-conrads#2 because the current code already checks for overflows. Moreover, the checks for when a matrix norm is too small are purely heuristic and do not necessarily indicate (user) errors. In such a case, I prefer to stick with simplicity.

@christoph-conrads
Contributor Author

@langou @weslleyspereira The column pre-processing requires a new function that copies and transposes matrices; it was named SGETRP. Should I keep the name or should it be renamed to, e.g., SLATRP because its functionality is similar to that of SLACPY?

@langou
Contributor

langou commented May 24, 2021

@langou @weslleyspereira The column pre-processing requires a new function that copies and transposes matrices; it was named SGETRP. Should I keep the name or should it be renamed to, e.g., SLATRP because its functionality is similar to that of SLACPY?

Hi @christoph-conrads,

Thanks for asking.

Reading https://github.com/christoph-conrads/lapack/blob/qr+cs-gsvd/SRC/sgetrp.f

The prefix LA is for "auxiliary" and it was used consistently when LAPACK had a clearer distinction between "driver", "computational", and "auxiliary" routines. There are lots of xLAwyz subroutines out there. :). There is some merit to a distinction between "driver", "computational", and "auxiliary", but this distinction has slowly become more and more foreign over time. Also, one can argue whether a transpose routine should be considered "driver", "computational", or "auxiliary". This is not obvious and depends on the definitions of these three categories. It seems the definitions depend on whom you ask.

I like SGETRP more than SLATRP. I do not dislike SLATRP though.

So SGETRP is out-of-place transpose. Sounds good. Thanks for the contribution.

Note that Gustavson and Swirszcz have a nice in-place transposition algorithm. (This is not in LAPACK, but, this is in the wishlist.) See Gustavson F.G., Swirszcz T. (2007) In-Place Transposition of Rectangular Matrices. In: Kågström B., Elmroth E., Dongarra J., Waśniewski J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_68

Note as well that MKL has mkl_?imatcopy for scaling and in-place transposition/copying of matrices, mkl_?omatcopy for scaling and out-of-place transposition/copying of matrices, and mkl_?omatcopy2 for a two-strided version of mkl_?omatcopy.
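
For reference, the semantics in question reduce to the following (a NumPy stand-in, not the Fortran or MKL interfaces):

import numpy as np

def omatcopy(alpha, a):
    # Out-of-place scaled transpose a la mkl_?omatcopy (and, without the
    # scaling, SGETRP): write alpha * A^T into a separate buffer.
    return alpha * a.T

a = np.arange(6.0).reshape(2, 3)
b = omatcopy(2.0, a)   # 3 x 2 result; a is left untouched

An in-place transpose of a rectangular matrix, by contrast, must roughly follow the permutation cycles of the index mapping inside a single buffer, which is what the Gustavson/Swirszcz algorithm referenced above does.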

@h-vetinari
Contributor

Since 3.10.0 was released now, this should probably be re-milestoned.

@langou langou modified the milestones: LAPACK 3.10.0, LAPACK 3.11.0 Jun 29, 2021
@langou
Contributor

langou commented Jun 29, 2021

Good point @h-vetinari. Thanks! (Done.)

@christoph-conrads
Contributor Author

christoph-conrads commented Oct 24, 2021

Debugging the failing test regression_preprocessing_20210606

tl;dr: SGGQRCS fails because SORCSD2BY1 computes incorrect factors U1 in the presence of tiny, numerically irrelevant values. edit continued in #634 /edit

Test regression_preprocessing_20210606 in branch christoph-conrads/qr+cs-gsvd.

The test fails with a correctly computed matrix B and an incorrect matrix A = U1 S X. The fact that B = U2 C X is computed correctly is an important hint because it implies that

  • the singular values in the matrix C are correct and by extension the matrix S,
  • the matrix X is correct.

A looks as follows:

-1.21e+00 +3.80e-01 +9.33e-02 -4.93e-02 +1.16e-01 -3.58e-07 +4.64e-02
+0.00e+00 +0.00e+00 +0.00e+00 +0.00e+00 +0.00e+00 +0.00e+00 +0.00e+00

The matrix assembled from U1 S X (which should be equal to A) looks as follows:

-8.59e-01 +2.69e-01 +6.60e-02 -3.49e-02 +8.20e-02 -9.78e-08 +3.28e-02
-8.59e-01 +2.69e-01 +6.60e-02 -3.49e-02 +8.20e-02 -9.78e-08 +3.28e-02

Since the pre-processing of A is disabled, the matrix U1 is computed by SORCSD2BY1 and never modified by SGGQRCS, allowing one to focus on SORCSD2BY1 as the problem source, or on a memory problem (i.e., a buffer overread or a buffer overflow).

edit
Moreover, the Frobenius norms of the matrices U1 computed with and without pre-processing of B, respectively, are identical hinting at a mathematical problem with the matrix U1 instead of a memory bug.
/edit

Given an isometric matrix Q = [X1; X2], SORCSD2BY1 computes the CS decomposition of Q.
The input matrix Q for SORCSD2BY1 looks as follows, where the first two rows belong to the matrix X1 = U1 S V^* and the other two rows to the matrix X2 = U2 C V^*; SORCSD2BY1 computes U1, U2, S, C, V. U1, S, C are identical to the matrices mentioned above in the context of SGGQRCS.

("isometric": Q has no more rows than columns and Q^* Q is the identity)

-2.38418579E-07 1.00000000     -2.66202893E-10
 0.00000000     0.00000000      2.66202893E-10
 1.00000000     2.38418579E-07 -5.55111512E-17
 0.00000000     2.66202893E-10  1.00000000

This matrix can be considered isometric in single precision. Moreover, U1 should be a permutation of the identity matrix with (almost) arbitrary signs, e.g.,

-1  0
 0 -1

(If you have trouble seeing this, then set all the matrix entries small in modulus to zero.)

Instead, the following matrix is computed:

-sqrt(2) +sqrt(2)
-sqrt(2) -sqrt(2)

This is wrong. After a bit of experimenting, one can construct the test xUNCSD2BY1_regression_20211024 triggering the problem with the following simpler matrix (ε is the machine epsilon for single- or double-precision arithmetic, respectively; the test triggers the misbehavior in double precision, too, but the constant c was chosen by experimenting with single-precision code):

c := 2^-45
μ := c ε

    | 0  1 -μ |
Q = | 0  0 +μ |
    | 1  0  0 |
    | 0  μ  1 |
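
That this matrix is numerically isometric can be checked directly; a NumPy sketch with the constants from above:

import numpy as np

eps = np.finfo(np.float32).eps
mu = np.float32(2.0 ** -45) * eps

q = np.array([
    [0,  1, -mu],
    [0,  0, +mu],
    [1,  0,   0],
    [0, mu,   1],
], dtype=np.float32)

# Q^T Q deviates from the identity only by terms of order mu^2, far
# below single-precision resolution, yet the matrix triggers a wrong U1.
print(np.abs(q.T @ q - np.eye(3, dtype=np.float32)).max())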

@briansutton

About the failed test reported by @christoph-conrads just above:

There seems to be a discrepancy between two comparisons with zero around the "twice is enough" procedure of SORBDB6. Within this subroutine, SLASSQ is used to measure the squared norm of a vector. But then the calling routine SORBDB5 uses SNRM2 to measure the norm in a few places. Under certain conditions, SLASSQ returns zero but SNRM2 returns nonzero for the same vector.

I think the least disruptive fix is for SORBDB6 to explicitly set all entries to zero when SLASSQ measures the norm to be zero.
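
One way such a discrepancy can arise (an illustration of the effect, not necessarily the exact failure path in SORBDB6): squared norms underflow much earlier than norms.

import numpy as np

# Tiny but clearly nonzero vector.
x = np.full(3, np.float32(1e-30))

# SNRM2-style result: the Euclidean norm is representable in float32.
nrm2 = np.float32(np.linalg.norm(x.astype(np.float64)))

# A squared norm of the form NORMSQ = SCL**2 * SSQ with SCL = 1e-30 and
# SSQ = 3 underflows to exactly zero: (1e-30)**2 lies below the smallest
# float32 subnormal (~1.4e-45).
normsq = np.float32(1e-30) ** np.float32(2) * np.float32(3)
print(nrm2, normsq)  # ~1.73e-30  0.0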

While tracking this down, I noticed a second bug: in lines 296-301 of SORBDB6, the increments INCX1 and INCX2 are currently not used.

Here is a diff that may fix both problems:

--- lapack-3.10.0/SRC/sorbdb6.f
+++ sorbdb6.f
@@ -167,14 +167,13 @@
 *  =====================================================================
 *
 *     .. Parameters ..
-      REAL               ALPHASQ, REALONE, REALZERO
-      PARAMETER          ( ALPHASQ = 0.01E0, REALONE = 1.0E0,
-     $                     REALZERO = 0.0E0 )
+      REAL               ALPHASQ, REALZERO
+      PARAMETER          ( ALPHASQ = 0.01E0, REALZERO = 0.0E0 )
       REAL               NEGONE, ONE, ZERO
       PARAMETER          ( NEGONE = -1.0E0, ONE = 1.0E0, ZERO = 0.0E0 )
 *     ..
 *     .. Local Scalars ..
-      INTEGER            I
+      INTEGER            I, IX
       REAL               NORMSQ1, NORMSQ2, SCL1, SCL2, SSQ1, SSQ2
 *     ..
 *     .. External Subroutines ..
@@ -215,12 +214,21 @@
 *     space
 *
       SCL1 = REALZERO
-      SSQ1 = REALONE
+      SSQ1 = REALZERO
       CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
       SCL2 = REALZERO
-      SSQ2 = REALONE
+      SSQ2 = REALZERO
       CALL SLASSQ( M2, X2, INCX2, SCL2, SSQ2 )
       NORMSQ1 = SCL1**2*SSQ1 + SCL2**2*SSQ2
+      IF ( NORMSQ1 .EQ. 0 ) THEN
+         DO IX = 1, 1 + (M1-1)*INCX1, INCX1
+           X1( IX ) = ZERO
+         END DO
+         DO IX = 1, 1 + (M2-1)*INCX2, INCX2
+           X2( IX ) = ZERO
+         END DO
+        RETURN
+      END IF
 *
       IF( M1 .EQ. 0 ) THEN
          DO I = 1, N
@@ -239,10 +247,10 @@
      $            INCX2 )
 *
       SCL1 = REALZERO
-      SSQ1 = REALONE
+      SSQ1 = REALZERO
       CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
       SCL2 = REALZERO
-      SSQ2 = REALONE
+      SSQ2 = REALZERO
       CALL SLASSQ( M2, X2, INCX2, SCL2, SSQ2 )
       NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2
 *
@@ -255,6 +263,12 @@
       END IF
 *
       IF( NORMSQ2 .EQ. ZERO ) THEN
+         DO IX = 1, 1 + (M1-1)*INCX1, INCX1
+           X1( IX ) = ZERO
+         END DO
+         DO IX = 1, 1 + (M2-1)*INCX2, INCX2
+           X2( IX ) = ZERO
+         END DO
          RETURN
       END IF
 *
@@ -281,10 +295,10 @@
      $            INCX2 )
 *
       SCL1 = REALZERO
-      SSQ1 = REALONE
+      SSQ1 = REALZERO
       CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
       SCL2 = REALZERO
-      SSQ2 = REALONE
+      SSQ2 = REALZERO
       CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
       NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2
 *
@@ -293,11 +307,11 @@
 *     truncate it to zero.
 *
       IF( NORMSQ2 .LT. ALPHASQ*NORMSQ1 ) THEN
-         DO I = 1, M1
-            X1(I) = ZERO
+         DO IX = 1, 1 + (M1-1)*INCX1, INCX1
+            X1(IX) = ZERO
          END DO
-         DO I = 1, M2
-            X2(I) = ZERO
+         DO IX = 1, 1 + (M2-1)*INCX2, INCX2
+            X2(IX) = ZERO
          END DO
       END IF
 *

christoph-conrads added a commit to christoph-conrads/lapack that referenced this pull request Oct 26, 2021
This patch was authored by Brian D. Sutton and posted to the discussion
of LAPACK pull request Reference-LAPACK#406.

* fix indexing for vector increments different from one
* always set vectors that are numerically zero to zero

Previously SORBDB6 would only set vectors to zero if a second iteration
of Gram-Schmidt was necessary. This would cause problems on the caller's
side if the test for a zero vector differed from the SORBDB6 test for
zero.
@christoph-conrads
Contributor Author

About the failed test reported by @christoph-conrads just above:

There seems to be a discrepancy between two comparisons with zero around the "twice is enough" procedure of SORBDB6. Within this subroutine, SLASSQ is used to measure the squared norm of a vector. But then the calling routine SORBDB5 uses SNRM2 to measure the norm in a few places. Under certain conditions, SLASSQ returns zero but SNRM2 returns nonzero for the same vector.

Yes, I reached the same conclusion in #634.

I think the least disruptive fix is for SORBDB6 to explicitly set all entries to zero when SLASSQ measures the norm to be zero.

This is probably the best fix if the API should be left intact (one could return a boolean).

While tracking this down, I noticed a second bug: in lines 296-301 of SORBDB6, the increments INCX1 and INCX2 are currently not used.

Good catch.

Here is a diff that may fix both problems:
[snip]

This works in single precision. Your code can be found in commit 9ef9480, which belongs to the branch 634-fix-xORBDB5-zero-check. I will port this to xUNBDB5.

@christoph-conrads
Contributor Author

christoph-conrads commented Oct 26, 2021

There is a problem in the diff:

+      SSQ1 = REALZERO
       CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
       SCL2 = REALZERO
-      SSQ2 = REALONE
+      SSQ2 = REALZERO
       CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
       NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2

It says SLASSQ( M1, X1, INCX1, SCL1, SSQ1 ) twice now. I will fix this tomorrow in the code.

@christoph-conrads
Contributor Author

There is a problem in the diff:

+      SSQ1 = REALZERO
       CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
       SCL2 = REALZERO
-      SSQ2 = REALONE
+      SSQ2 = REALZERO
       CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
       NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2

It says SLASSQ( M1, X1, INCX1, SCL1, SSQ1 ) twice now. I will fix this tomorrow in the code.

No, this is a bug in the xORBDB6/xUNBDB6 code:

lapack/SRC/sorbdb6.f, lines 283 to 294 at commit 5d4180c:

SCL1 = REALZERO
SSQ1 = REALONE
CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
SCL2 = REALZERO
SSQ2 = REALONE
CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2
*
* If second projection is sufficiently large in norm, then do
* nothing more. Alternatively, if it shrunk significantly, then
* truncate it to zero.
*

SCL2 and SSQ2 are not updated after the second Gram-Schmidt orthogonalization.

christoph-conrads added a commit to christoph-conrads/lapack that referenced this pull request Nov 11, 2021
christoph-conrads added a commit to christoph-conrads/lapack that referenced this pull request Jan 30, 2022
@h-vetinari
Contributor

@christoph-conrads
What's the current status on this? I ask because @weslleyspereira mentioned in #629 that 3.11.0 is not too far away anymore:

I think a minor release, 3.11.0, is a better place to have new algorithms. I don't think it will take long to have it released.

@christoph-conrads
Contributor Author

Hi @h-vetinari, #647 must be applied and there were failing tests (see the qr+cs-gsvd branch) the last time I checked. I will take a look at it next weekend.

christoph-conrads added a commit to christoph-conrads/lapack that referenced this pull request May 29, 2022
christoph-conrads added a commit to christoph-conrads/lapack that referenced this pull request Jul 10, 2022
christoph-conrads added a commit to christoph-conrads/lapack that referenced this pull request Jul 16, 2022
@christoph-conrads
Contributor Author

I am looking at the code, trying to understand the cause of a bug where all matrices except the matrix U1 (A = U1 S X) are numerically accurate when the pre-processing of A is active.

@christoph-conrads
Contributor Author

christoph-conrads commented Dec 7, 2023

An update regarding the state of this PR:

Last update: Jan 5, 2024
