
xORBDB6 performs numerically unadvisable operations #634

Closed
2 tasks done
christoph-conrads opened this issue Oct 25, 2021 · 10 comments · Fixed by #647
Comments

@christoph-conrads
Contributor

Description

xORBDB5 and xORBDB6 compute vector norms differently, causing disagreement about when a vector is numerically zero.

Given an isometric matrix Q and a vector x, xORBDB5 computes a vector x' that is orthogonal to the span of the columns of Q. (Q isometric means that the following two properties hold: Q has at most as many columns as rows and Q^* Q = I.) Internally, xORBDB5 calls xORBDB6. Given an isometric matrix Q and a vector x, xORBDB6 projects x onto the complement of the column span of Q. The code contains only the ominous comment that

Kahan's "twice is enough" criterion

is applied. This probably means that xORBDB6 uses at most two iterations of classical Gram-Schmidt orthogonalization to compute its result. This approach is known as CGS2 (see BarlowS2011 or GiraudLR2002) and it matches the matrix-vector multiplications inside this function. The problem is the computation of the vector norms.
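For readers unfamiliar with CGS2, the two-pass scheme can be illustrated with a short Python sketch. This is an illustrative analogue, not the LAPACK code: `cgs2_project` projects x onto the orthogonal complement of span(Q) by applying classical Gram-Schmidt twice.

```python
def cgs2_project(Q, x):
    """Project x onto the orthogonal complement of span(Q) using
    classical Gram-Schmidt applied twice ("twice is enough").
    Q is a list of orthonormal columns; x is a vector of floats.
    Illustrative sketch only, not the xORBDB6 implementation."""
    y = list(x)
    for _ in range(2):  # at most two CGS passes
        # Coefficients of y in the basis Q: c_j = q_j^T y
        coeffs = [sum(q[i] * y[i] for i in range(len(y))) for q in Q]
        for c, q in zip(coeffs, Q):
            for i in range(len(y)):
                y[i] -= c * q[i]  # y <- y - Q (Q^T y)
    return y

# Example: e_2 is already orthogonal to the single column e_1.
Q = [[1.0, 0.0, 0.0]]
x = [0.0, 1.0, 0.0]
print(cgs2_project(Q, x))  # [0.0, 1.0, 0.0]
```

The second pass removes the components of span(Q) reintroduced by rounding errors in the first pass; deciding whether the result is numerically zero is exactly where the norm computations below come in.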

In xORBDB5, the norm of the vector computed by xORBDB6 is computed with SNRM2:

lapack/SRC/sorbdb5.f

Lines 223 to 226 in 5d4180c

IF( SNRM2(M1,X1,INCX1) .NE. ZERO
$ .OR. SNRM2(M2,X2,INCX2) .NE. ZERO ) THEN
RETURN
END IF

For xORBDB6, the vector norm is computed as follows:

lapack/SRC/sorbdb6.f

Lines 241 to 247 in 5d4180c

SCL1 = REALZERO
SSQ1 = REALONE
CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
SCL2 = REALZERO
SSQ2 = REALONE
CALL SLASSQ( M2, X2, INCX2, SCL2, SSQ2 )
NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2

Consider the input to xORBDB6 below:

c := 2^-45
μ := c ε

    | 0  1 -μ |
Q = | 0  0 +μ |
    | 1  0  0 |
    | 0  μ  1 |

With x = e_2 (i.e., the second standard basis vector), the xORBDB6 variable NORMSQ2 is zero whereas SNRM2 returns a nonzero value, causing xORBDB5 to return an incorrect vector.
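The underlying effect is easy to reproduce outside LAPACK. The Python sketch below uses double precision and a hypothetical tiny vector: the explicitly squared norm underflows to zero while a scaled norm computation (here via math.hypot) stays nonzero, which is the same kind of disagreement as between NORMSQ2 and SNRM2 above.

```python
import math

# A vector whose norm is representable but whose squared norm underflows.
x = [1e-200, 1e-200]

normsq = sum(v * v for v in x)  # xORBDB6-style: the squares underflow to 0
norm = math.hypot(*x)           # SNRM2-style: scaled, no underflow

print(normsq)  # 0.0 -> vector judged numerically zero
print(norm)    # ~1.414e-200 -> vector judged nonzero
```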

Checklist

  • I've included a minimal example to reproduce the issue
  • I'd be willing to make a PR to solve this issue
@christoph-conrads christoph-conrads changed the title xORBDB5, xORBDB6 disagree on criterion for zero vector xORBDB5, xORBDB6 disagree when vectors are numerically zero Oct 25, 2021
christoph-conrads added a commit to christoph-conrads/lapack that referenced this issue Oct 26, 2021
@christoph-conrads
Contributor Author

A test highlighting the problem can be found in the christoph-conrads/qr+cs-gsvd branch in commit 6c576c4. The test does not trigger yet in double precision because NORMSQ2 in DORBDB6 is not zero, while it is zero in SORBDB6.

@christoph-conrads
Contributor Author

lapack/SRC/sorbdb6.f

Lines 241 to 247 in 5d4180c

SCL1 = REALZERO
SSQ1 = REALONE
CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
SCL2 = REALZERO
SSQ2 = REALONE
CALL SLASSQ( M2, X2, INCX2, SCL2, SSQ2 )
NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2

This should be

      SCL = REALZERO
      SSQ = REALONE
      CALL SLASSQ( M1, X1, INCX1, SCL, SSQ )
      CALL SLASSQ( M2, X2, INCX2, SCL, SSQ )
      NORMSQ2 = SCL**2 * SSQ
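The effect of carrying a single (SCL, SSQ) pair across both segments can be sketched in Python; `lassq_update` below is a hypothetical analogue of SLASSQ, written for illustration, that maintains the invariant sum of squares = scl**2 * ssq without ever forming a square that can underflow:

```python
def lassq_update(x, scl, ssq):
    """Update the scaled sum of squares (scl, ssq) with the entries of x,
    maintaining: accumulated sum of squares == scl**2 * ssq.
    Hypothetical Python analogue of LAPACK's xLASSQ."""
    for v in x:
        a = abs(v)
        if a == 0.0:
            continue
        if scl < a:
            # Rescale the running sum to the new, larger scale.
            ssq = 1.0 + ssq * (scl / a) ** 2
            scl = a
        else:
            ssq += (a / scl) ** 2
    return scl, ssq

# One (scl, ssq) pair across both segments, as in the proposed fix:
scl, ssq = 0.0, 1.0
scl, ssq = lassq_update([3.0], scl, ssq)  # segment X1
scl, ssq = lassq_update([4.0], scl, ssq)  # segment X2
norm = scl * ssq ** 0.5
print(norm)  # 5.0
```

Combining two independently scaled pairs as SCL1**2*SSQ1 + SCL2**2*SSQ2 defeats the scaling, because each SCLi**2 can underflow on its own.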

@christoph-conrads christoph-conrads changed the title xORBDB5, xORBDB6 disagree when vectors are numerically zero xORBDB6 performs numerically unadvisable operations Oct 27, 2021
@christoph-conrads
Contributor Author

christoph-conrads commented Oct 27, 2021

In my opinion, the functions xORBDB6/xUNBDB6 should be modified to enhance numerical stability.

The first issue is that there are no restrictions on the norm of X. Right now this is not a problem because the only caller, xORBDB5, passes unit-length vectors. Either the documentation should require unit-length vectors or the function should rescale vectors as necessary.

NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2

The expression might over- or underflow. Underflow is one of the causes of the problem observed in the pull request #406 test regression_preprocessing_20210606. The fix is as follows:

      SCL = REALZERO
      SSQ = REALONE
      CALL SLASSQ( M1, X1, INCX1, SCL, SSQ )
      CALL SLASSQ( M2, X2, INCX2, SCL, SSQ )
      NORM = SCL * SQRT(SSQ)

lapack/SRC/sorbdb6.f

Lines 257 to 259 in 5d4180c

IF( NORMSQ2 .EQ. ZERO ) THEN
RETURN
END IF

edit For xORBDB6, the right-hand side constant could be REALZERO for consistency.

This branch is only taken if X is exactly zero after the first Gram-Schmidt orthogonalization, which will practically never be the case (unless the norm computation underflows). The condition should be replaced with something like

if norm_after_gs <= (m1 + m2) * eps * norm:
    x(:) = 0
    return
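A Python sketch of such a relative test follows; the factor (m1 + m2) * eps is the heuristic tolerance proposed here, not a proven error bound:

```python
import sys

def is_numerically_zero(norm_after_gs, norm_before, m1, m2):
    """Decide whether the projected vector should be truncated to zero.
    Sketch of the proposed criterion; the tolerance (m1 + m2) * eps is a
    heuristic from this discussion, not a proven bound."""
    eps = sys.float_info.epsilon
    return norm_after_gs <= (m1 + m2) * eps * norm_before

print(is_numerically_zero(1e-18, 1.0, 3, 1))  # True: below tolerance
print(is_numerically_zero(0.5, 1.0, 3, 1))    # False: clearly nonzero
```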

lapack/SRC/sorbdb6.f

Lines 283 to 289 in 5d4180c

SCL1 = REALZERO
SSQ1 = REALONE
CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
SCL2 = REALZERO
SSQ2 = REALONE
CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2

After the second Gram-Schmidt orthogonalization, SCL2 and SSQ2 are not updated properly.

lapack/SRC/sorbdb6.f

Lines 290 to 295 in 5d4180c

*
* If second projection is sufficiently large in norm, then do
* nothing more. Alternatively, if it shrunk significantly, then
* truncate it to zero.
*
IF( NORMSQ2 .LT. ALPHASQ*NORMSQ1 ) THEN

This comparison might fail if the right-hand side underflows; this can happen if the vector X was already small in norm after the first Gram-Schmidt orthogonalization. X should be re-scaled.

For completeness, a pull request should fix the vector increments as Brian Sutton correctly noted and also check or handle negative increments like other LAPACK functions.

@briansutton @langou @weslleyspereira Please comment on my proposal.

edit ALPHASQ must be updated

christoph-conrads added a commit to christoph-conrads/lapack that referenced this issue Oct 27, 2021
@weslleyspereira
Collaborator

Hi @christoph-conrads. Thanks for taking the time to explain all of this!

I talked to @langou last week. He had something else to say about the precision of the final orthogonal vector depending on how far Q is from being a truly orthogonal matrix. I'm not sure if he will be able to comment on this thread this week, so here are my comments:

In my opinion, the functions xORBDB6/xUNBDB6 should be modified to enhance numerical stability.

The first issue is that there are no restrictions on the norm of X. Right now this is not a problem because the only caller, xORBDB5, passes unit-length vectors. Either the documentation should require unit-length vectors or the function should rescale vectors as necessary.

I agree. I would go with updating the documentation.

NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2

The expression might over- or underflow. Underflow is one of the causes of the problem observed in the pull request #406 test regression_preprocessing_20210606. The fix is as follows:

      SCL = REALZERO
      SSQ = REALONE
      CALL SLASSQ( M1, X1, INCX1, SCL, SSQ )
      CALL SLASSQ( M2, X2, INCX2, SCL, SSQ )
      NORM = SCL * SQRT(SSQ)

I agree.

lapack/SRC/sorbdb6.f

Lines 257 to 259 in 5d4180c

IF( NORMSQ2 .EQ. ZERO ) THEN
RETURN
END IF

edit For xORBDB6, the right-hand side constant could be REALZERO for consistency.

I agree.

This branch is only taken if X is exactly zero after the first Gram-Schmidt orthogonalization, which will practically never be the case (unless the norm computation underflows). The condition should be replaced with something like

if norm_after_gs <= (m1 + m2) * eps * norm:
    x(:) = 0
    return

I agree the condition should either be removed or changed to another one involving some tolerance. I'm not sure which tolerance. I think it makes sense to use something like norm_after_gs <= eps * norm. I don't understand why we would need the factor (m1 + m2). Also, I think this:

lapack/SRC/sorbdb5.f

Lines 223 to 226 in 5d4180c

IF( SNRM2(M1,X1,INCX1) .NE. ZERO
$ .OR. SNRM2(M2,X2,INCX2) .NE. ZERO ) THEN
RETURN
END IF

should be modified using the same pattern.

lapack/SRC/sorbdb6.f

Lines 283 to 289 in 5d4180c

SCL1 = REALZERO
SSQ1 = REALONE
CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
SCL2 = REALZERO
SSQ2 = REALONE
CALL SLASSQ( M1, X1, INCX1, SCL1, SSQ1 )
NORMSQ2 = SCL1**2*SSQ1 + SCL2**2*SSQ2

After the second Gram-Schmidt orthogonalization, SCL2 and SSQ2 are not updated properly.

Oh... this is a bug!

lapack/SRC/sorbdb6.f

Lines 290 to 295 in 5d4180c

*
* If second projection is sufficiently large in norm, then do
* nothing more. Alternatively, if it shrunk significantly, then
* truncate it to zero.
*
IF( NORMSQ2 .LT. ALPHASQ*NORMSQ1 ) THEN

This comparison might fail if the right-hand side underflows; this can happen if the vector X was already small in norm after the first Gram-Schmidt orthogonalization. X should be re-scaled.

Suppose we use a more appropriate check for norm_after_gs after the first Gram-Schmidt orthogonalization, as you suggest. Then X won't need to be re-scaled, right?

For completeness, a pull request should fix the vector increments as Brian Sutton correctly noted and also check or handle negative increments like other LAPACK functions.

I agree.

@briansutton @langou @weslleyspereira Please comment on my proposal.

edit ALPHASQ must be updated

I agree. The new value would be ALPHASQ = 0.1E0.

@christoph-conrads
Contributor Author

christoph-conrads commented Nov 11, 2021

This branch is only taken if X is exactly zero after the first Gram-Schmidt orthogonalization, which will practically never be the case (unless the norm computation underflows). The condition should be replaced with something like

if norm_after_gs <= (m1 + m2) * eps * norm:
    x(:) = 0
    return

[snip] I don't understand why we would need the factor (m1 + m2).

The factor is essentially the error bound n ε/2 |x^* y| on the result of a single inner product x^* y, multiplied by a magic constant convenient for programming. Strictly speaking, the right-hand side should be sqrt(n^2) ε/2; I mixed up the number of rows and columns in the expression.

In Higham, Accuracy and Stability of Numerical Algorithms (2002), the norm-wise error bound for matrix-vector multiplication y = Ax is given as ||y - y'||₂ ≤ sqrt(min(m, n)) γ(n) ||A||₂ ||x||₂, where

  • γ(n) = (n ε) / (2 - n ε),
  • m x n are the dimensions of the matrix A, and
  • y' is the result of the matrix computation in floating-point arithmetic (see §3.5 Matrix Multiplication).

With the improved error bounds in Jeannerod, Rump: Improved Error Bounds for Inner Products in Floating-Point Arithmetic (2013), it might be possible to take γ(n) = n ε/2. Strictly speaking, this holds only for the one, maximum, and Frobenius norms (see §5 Concluding Remarks). The expression in §5 Concluding Remarks can also be used for the case p = 2 (the spectral and Euclidean norm, respectively) because the matrix is isometric; hence ||A||_F = sqrt(n) ||A||₂.

Also, I think this:

lapack/SRC/sorbdb5.f

Lines 223 to 226 in 5d4180c

IF( SNRM2(M1,X1,INCX1) .NE. ZERO
$ .OR. SNRM2(M2,X2,INCX2) .NE. ZERO ) THEN
RETURN
END IF

should be modified using the same pattern.

If xORBDB6 considers a vector to be numerically zero, then it is set to zero; see SRC/sorbdb6.f, line 295 ff. Absent a dedicated boolean return value indicating whether the vector is considered to lie in the range of the matrix, I think this is the best solution because it spares the caller from having to know the exact test employed by xORBDB6 to check for numerical zero. This fix was also proposed by Brian D. Sutton in pull request 406.

edit ALPHASQ must be updated

I agree. The new value would be ALPHASQ = 0.1E0.

ALPHASQ should also be renamed to ALPHA.

@christoph-conrads
Contributor Author

christoph-conrads commented Jan 30, 2022

Suppose we use a more appropriate check for norm_after_gs after the first Gram-Schmidt orthogonalization, as you suggest. Then X won't need to be re-scaled, right?

I do not think so. If the input vector x initially has unit norm, then the algorithm will terminate early if the norm of (I - Q Q^*) x is less than n ε, where n is the dimension of x. Hence, if the projection is repeated, then ||(I - Q Q^*)^2 x||_2 is at least n^2 ε^2, which is much larger than any denormalized number for IEEE floats. Consequently, I would not expect underflows and hence see no need for re-scaling.
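The magnitude claim is easy to check numerically; the sketch below compares n^2 ε^2 with the smallest positive subnormal double, assuming IEEE 754 binary64:

```python
import sys

eps = sys.float_info.epsilon  # 2^-52 for binary64
smallest_subnormal = 5e-324   # smallest positive subnormal double

for n in (10, 1000, 10**6):
    bound = (n * eps) ** 2    # claimed lower bound after repeated projection
    # Even for modest n, the bound sits far above the subnormal range.
    print(n, bound, bound > smallest_subnormal)
```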

@langou
Contributor

langou commented Apr 12, 2022

I talked to @langou last week. He had something else to say about the precision of the final orthogonal vector depending on how far Q is from being a truly orthogonal matrix. I'm not sure if he will be able to comment on this thread this week

I have been wondering why we do not use Householder reflections for these routines, as opposed to Gram-Schmidt. This seems doable, and we would not have to worry about numerical stability.

The 2005 proof of the stability of CGS2 requires the assumption that the input matrix is numerically nonsingular. (So "twice is enough" holds if something like "epsilon * kappa < 1".) I do not know whether this condition holds in this piece of code.

There are a few places in LAPACK where we use Gram-Schmidt and I think we should make an effort to use Householder reflections instead.

Cheers,
Julien.

@christoph-conrads
Contributor Author

christoph-conrads commented Apr 24, 2022

The 2005 proof of the stability of CGS2 requires the assumption that the input matrix is numerically nonsingular. (So "twice is enough" holds if something like "epsilon * kappa < 1".) I do not know whether this condition holds in this piece of code.

The caller must ensure that Q has orthonormal columns; this requirement is implied by the function name (xORBDB6) and is clarified in the documentation as well.

@christoph-conrads
Contributor Author

I have been wondering why we do not use Householder reflections for these routines, as opposed to Gram-Schmidt. This seems doable, and we would not have to worry about numerical stability.

The matrix Q must have orthonormal columns, so Gram-Schmidt can be applied directly without negative consequences.

@langou
Contributor

langou commented Apr 24, 2022

Hi Christoph,

Even if Q is a perfectly orthonormal basis, we can imagine running into issues with a two-projection Gram-Schmidt scheme.

If x has many components in Span(Q) (||Q^T x|| ~ ||x||), then the first projection will essentially return a random vector. It is possible (unlikely, but possible) that this random vector is again in Span(Q). In that case the second projection is again a random vector, which is unlikely to be orthogonal to Span(Q). Therefore we can imagine that we do not have orthogonality after two steps.

The trick to proving that twice is enough is to assume that Q^T x is not too small. The assumption is essentially needed to prove that the noise of the first projection is smaller than the magnitude of the first projection, so that whatever pollution remains after the first projection is a problem for orthogonality but is small enough to guarantee that the first projection is not in Span(Q).

With all that being said, I think the code works with high probability and is very well written, so all good with me.

Julien.

christoph-conrads added a commit to christoph-conrads/lapack that referenced this issue Jul 14, 2022
christoph-conrads added a commit to christoph-conrads/lapack that referenced this issue Dec 7, 2023
christoph-conrads added a commit to christoph-conrads/lapack that referenced this issue Dec 19, 2023
christoph-conrads added a commit to christoph-conrads/lapack that referenced this issue Dec 29, 2023
christoph-conrads added a commit to christoph-conrads/lapack that referenced this issue Jan 17, 2024
christoph-conrads added a commit to christoph-conrads/lapack that referenced this issue Jan 19, 2024