RADM Algorithm
import numpy as np
import pandas as pd
from scipy.linalg import sqrtm
import random

import sys
sys.path.append('../')
import RandomMatrix as RM

np.random.seed(140)

E. Brian Davies' paper Approximate Diagonalization proposes an algorithm that improves the numerical stability of diagonalization computations. The Randomized Approximate Diagonalization Method (RADM) can be used to compute an approximate diagonalization for matrices with large eigenvector condition numbers.

Suppose $A$ can be diagonalized as $VDV^{-1}$ where $V$ has a large condition number. As we have seen before, working with matrices with small condition numbers is desirable in a finite-precision arithmetic environment (like computers). RADM suggests introducing a tiny, random perturbation to $A$ before diagonalizing in order to decrease the eigenvector condition number. So instead of diagonalizing $A$, we diagonalize $A + \delta E$, where $E$ is a Gaussian matrix with iid standard normal entries and $\delta$ is some small positive number. In this way, we trade a small perturbation error for a large gain in numerical stability.
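Concretely, the perturbation step looks like the following minimal sketch (plain NumPy standing in for the RandomMatrix helpers used later; the complex Gaussian construction and the function name radm_diagonalize are our own assumptions, not Davies' code):

def radm_diagonalize(A, delta=1e-8, seed=0):
    """Diagonalize A + delta*E, where E is a random perturbation with ||E|| = delta."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    # iid complex Gaussian entries (a Ginibre-style matrix)
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    E = delta * Z / np.linalg.norm(Z, ord=2)   # rescale so that ||E|| = delta
    eigenvalues, V = np.linalg.eig(A + E)
    return eigenvalues, V, E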

We will demonstrate the effectiveness of RADM by recreating some of the empirical results from Davies' paper. Unlike Davies, however, our perturbation will come from a Ginibre matrix. We will begin by running Davies' experiment of finding a good value of $\delta$ for a certain class of matrices.

Measuring Accuracy

Davies proposed the following formula for measuring the accuracy of an approximate diagonalization of $A$ obtained by diagonalizing $A + E = VDV^{-1}$:

$$\sigma(A,V,E,\epsilon) = \kappa(V) \epsilon + ||E||$$

where $\epsilon$ is the accuracy of the underlying finite-precision arithmetic (about $10^{-16}$ for double precision).

The $\kappa(V) \epsilon$ term accounts for the error caused by numerical imprecision and the $||E||$ term accounts for the error caused by introducing the perturbation. We implement the function below.

def sigma(A, V, E, ϵ=1e-16):
    return np.linalg.cond(V) * ϵ + np.linalg.norm(E, ord=2)

Finding the optimal $\delta$

Suppose we define $A$ to be a 20 by 20 matrix in the following way

$$ (A)_{r,s} = \begin{cases} \frac{r}{20} & \text{if } s=r+1 \\ 0 & \text{else} \end{cases} $$

The matrix is implemented below.

def A_matrix(r,s):
    if s == r+1:
        return r/20
    return 0
A = RM.Generate_Custom(A_matrix, 20, 20)
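If the RandomMatrix helper is unavailable, an equivalent construction with plain NumPy (assuming Generate_Custom simply evaluates the supplied function at each 0-based index pair, mirroring the np.fromfunction call used later in this notebook) would be:

# Hypothetical stand-in for RM.Generate_Custom(A_matrix, 20, 20)
A_alt = np.fromfunction(np.vectorize(A_matrix, otypes=[float]), (20, 20))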

The algorithm dictates that we introduce a perturbation to $A$ of the form $E = \delta Z / ||Z||$, where $Z$ is a Ginibre matrix, so that $||E|| = \delta$. Let's try to find the optimal value of $\delta$ for this matrix (the choice of $\delta$ that minimizes $\sigma(A,V,E,\epsilon)$).

Z = RM.Generate_Ginibre(20)
sigma_vals = []
log_condition = []
δs = np.append(10.0**(-np.arange(16)), 0)
for δ in δs:
    E = δ * Z/np.linalg.norm(Z, ord=2)
    eigenvalues, V = np.linalg.eig(A+E)
    sigma_vals.append(sigma(A,V,E))
    log_condition.append(np.log10(np.linalg.cond(V)))
pd.DataFrame({"$\delta$": δs, "$$\sigma(A,V,E,\epsilon)$$": sigma_vals, "$$log_{10}(\kappa(V))$$": log_condition})
$\delta$ $$\sigma(A,V,E,\epsilon)$$ $$log_{10}(\kappa(V))$$
0 1.000000e+00 1.000000e+00 1.260679
1 1.000000e-01 1.000000e-01 1.813794
2 1.000000e-02 1.000000e-02 2.825438
3 1.000000e-03 1.000000e-03 3.739809
4 1.000000e-04 1.000000e-04 4.554702
5 1.000000e-05 1.000003e-05 5.440051
6 1.000000e-06 1.000250e-06 6.397352
7 1.000000e-07 1.023161e-07 7.364753
8 1.000000e-08 3.041050e-08 8.309854
9 1.000000e-09 2.313358e-07 9.362361
10 1.000000e-10 1.521347e-06 10.182200
11 1.000000e-11 1.163115e-05 11.065622
12 1.000000e-12 9.933064e-05 11.997083
13 1.000000e-13 6.349410e-04 12.802733
14 1.000000e-14 4.449525e-03 13.648314
15 1.000000e-15 3.306881e-02 14.519419
16 0.000000e+00 inf inf

The optimal value of $\delta$ is $10^{-8}$. Our results are nearly identical to those in Davies' paper, even though we have chosen to use a Ginibre matrix for the perturbation.
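The minimizer can also be read off programmatically from the arrays computed above (a small convenience snippet, not part of Davies' procedure):

best = int(np.argmin(sigma_vals))
print(f"optimal δ = {δs[best]:.0e} with σ = {sigma_vals[best]:.3e}")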

Square Root of Matrices

The square root of a diagonalizable matrix $A=VDV^{-1}$ can be defined as follows

$$\sqrt{A} = V\sqrt{D}V^{-1}$$

where $\sqrt{D}$ is the diagonal matrix whose entries are the square roots of the corresponding entries of $D$. This behaves like a square root because $(\sqrt{A})^2 = A$, as shown in the next line

$$ (\sqrt{A})^2 = V\sqrt{D}V^{-1}V\sqrt{D}V^{-1}= VDV^{-1} = A$$
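As a quick numerical sanity check of this identity (a small, well-conditioned example of our own, not one of the matrices studied below):

B = np.array([[2.0, 1.0],
              [1.0, 2.0]])                          # symmetric, eigenvalues 3 and 1
eigenvalues, V = np.linalg.eig(B)
B_sqrt = V @ np.diag(np.sqrt(eigenvalues)) @ np.linalg.inv(V)
print(np.allclose(B_sqrt @ B_sqrt, B))              # expect True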

So taking the square root of a matrix is helpful when you have a matrix $A$ and want to find a matrix $B$ such that $B^2=A$. Let us try to find the square root of $A$ (as defined in the earlier section) in Python.

sqrtm(A);
Failed to find a square root.

Perhaps we will have better luck using the RADM algorithm with $\delta=10^{-8}$.

δ = 10**(-8) 
Z = RM.Generate_Ginibre(20)
E = δ * Z/np.linalg.norm(Z, ord=2)
eigenvalues, V = np.linalg.eig(A+E)
D_sqrt = np.sqrt(np.diag(eigenvalues))
A_sqrt = V @ D_sqrt @ np.linalg.inv(V)

It seems like our code ran successfully, but how do we know whether the result is any good? Let's measure $||A - \left( \sqrt{A} \right)^2||$.

np.linalg.norm(A-np.linalg.matrix_power(A_sqrt, 2), ord=2)
4.1103251651171263e-07

Since this difference is extremely small, RADM was evidently very effective at finding an approximate square root of our matrix.

Comparing Results

So is using RADM always the best approach? Not necessarily. Let's compare the performance of SciPy's sqrtm function and the RADM algorithm. We will make the following adjustment to the definition of $A$, calling the result ${}_{c}A$, so that the sqrtm function is able to find an approximate square root.

$$ ({}_{c}A)_{r,s} = \begin{cases} \frac{r}{20} & \text{if } s=r+1 \\ c & \text{if } s=r \\ 0 & \text{else} \end{cases} $$


def c_A_matrix(r,s,n,c):
    if s == r+1:
        return r/n
    if s == r:
        return c
    return 0

We will compute the square root of ${}_{c}A$ directly using SciPy and call that solution ${}_{c}S$. Then we will take the square root of ${}_{c}A$ ourselves using the RADM algorithm with $\delta=10^{-8}$ and call that solution ${}_{c}R$.

c_R2_minus_c_A_norm = []
c_S2_minus_c_A_norm = []
c_vals = np.array([.8, .6, .4, .3,.2,.1, .05, .02, .01])
for c in c_vals:
    c_A = RM.Generate_Custom(lambda i,j: c_A_matrix(i,j,20,c), 20, 20)
    c_S = sqrtm(c_A)
    Z = RM.Generate_Ginibre(20)
    E = δ * Z/np.linalg.norm(Z, ord=2)
    eigenvalues, V = np.linalg.eig(c_A+E)
    D_sqrt = np.sqrt(np.diag(eigenvalues))
    c_R = V @ D_sqrt @ np.linalg.inv(V)
    c_R2_minus_c_A_norm.append(np.linalg.norm(np.linalg.matrix_power(c_R, 2)-c_A, ord=2))
    c_S2_minus_c_A_norm.append(np.linalg.norm(np.linalg.matrix_power(c_S, 2)-c_A, ord=2))
pd.DataFrame({"$c$": c_vals, 
              "$$||_{c}R^2 - \ _{c}A||$$": c_R2_minus_c_A_norm, 
              "$$||_{c}S^2 - \ _{c}A||$$": c_S2_minus_c_A_norm})  
$c$ $$||_{c}R^2 - \ _{c}A||$$ $$||_{c}S^2 - \ _{c}A||$$
0 0.80 0.000067 5.851644e-15
1 0.60 0.000017 5.777242e-15
2 0.40 0.000096 1.484274e-14
3 0.30 0.000061 2.408642e-13
4 0.20 0.000114 3.532053e-11
5 0.10 0.000799 8.002574e-07
6 0.05 0.003461 6.703286e-02
7 0.02 0.237373 3.021511e+06
8 0.01 1.028855 7.097185e+11

As we can see, for values of $c$ close to 1, SciPy's sqrtm function is much more accurate for computing $\sqrt{{}_{c}A}$. But as $c$ gets close to 0, the error explodes for sqrtm, whereas the error only marginally increases for the RADM algorithm. Why might this be the case? Let's see what happens to the condition number of the matrix of eigenvectors Python uses to diagonalize ${}_{c}A$ as $c$ gets small.

condition_numbers = []
for c in c_vals:
    c_A = np.fromfunction(np.vectorize(lambda i,j: c_A_matrix(i,j,20,c), otypes=[float]), (20,20))
    eigenvalues, V = np.linalg.eig(c_A)
    condition_numbers.append(np.linalg.cond(V))
pd.DataFrame({"$c$": c_vals, 
              "$$\log_{10}(\kappa(S))$$": np.log10(condition_numbers),
              "$$||_{c}R^2 - \ _{c}A||$$": c_R2_minus_c_A_norm, 
              "$$||_{c}S^2 - \ _{c}A||$$": c_S2_minus_c_A_norm})  
$c$ $$\log_{10}(\kappa(S))$$ $$||_{c}R^2 - \ _{c}A||$$ $$||_{c}S^2 - \ _{c}A||$$
0 0.80 276.686149 0.000067 5.851644e-15
1 0.60 278.935046 0.000017 5.777242e-15
2 0.40 282.104689 0.000096 1.484274e-14
3 0.30 284.353586 0.000061 2.408642e-13
4 0.20 287.523229 0.000114 3.532053e-11
5 0.10 292.941769 0.000799 8.002574e-07
6 0.05 298.360309 0.003461 6.703286e-02
7 0.02 305.523232 0.237373 3.021511e+06
8 0.01 inf 1.028855 7.097185e+11

As we can see, the RADM algorithm is helpful for matrices with very large eigenvector condition numbers. If the eigenvector condition number of a matrix is already small, the matrix should be well behaved and produce numerically stable results without needing any perturbation.

Regularization Theorem

The following theorem was proven by Jess Banks, Archit Kulkarni, Satyaki Mukherjee, and Nikhil Srivastava in their paper Gaussian Regularization of the Pseudospectrum and Davies’ Conjecture.


Theorem: Suppose $A \in \mathbb{C}^{n \times n}$ and $\delta \in (0,1)$. Then there is a matrix $E \in \mathbb{C}^{n \times n}$ such that $||E|| \leq \delta ||A||$ and

$$\kappa_{V}(A+E) \leq 4n^{3/2} \left(1+\frac{1}{\delta} \right)$$



This theorem implies that if the eigenvector condition number of a matrix is huge, a small perturbation can dramatically improve its numerical stability.
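The bound is easy to probe numerically. Below is a rough check of our own (not from the paper) that applies a Ginibre-style perturbation of norm $\delta ||A||$ to the matrix $A$ defined earlier and compares the resulting eigenvector condition number to $4n^{3/2}(1+1/\delta)$; the theorem only guarantees that some suitable $E$ exists, but a random Gaussian perturbation typically lands well under the bound.

n, delta = 20, 1e-2
rng = np.random.default_rng(0)
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
E = delta * np.linalg.norm(A, ord=2) * Z / np.linalg.norm(Z, ord=2)   # ||E|| = delta*||A||
_, V = np.linalg.eig(A + E)
# cond(V) from np.linalg.eig upper-bounds the eigenvector condition number κ_V(A+E)
print(f"κ(V) = {np.linalg.cond(V):.3e},  bound = {4 * n**1.5 * (1 + 1/delta):.3e}")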