An NLA Look at σmin Universality (& the Stoch Diff Operator)

Alan Edelman and Po-Ru Loh

MIT Applied Mathematics

Random Matrices

October 10, 2010


Outline

  • History of σmin universality

  • Proof idea of Tao-Vu (in NLA language)

  • What the proof doesn't prove

  • Do stochastic differential operators say more?


A bit of history

  • E ('89): Explicit formula (for finite iid Gaussian n x n matrices) for distribution of σn

    • Analytic techniques: Integration of a joint density function, Tricomi functions, Kummer's differential equation, etc.

  • Striking convergence to limiting distribution even in non-Gaussian case

    • Parallel Matlab in the MIT computer room

    • “Please don’t turn off” (no way to save work in the background on those old Sun workstations)




Extending to non-Gaussians: How?

  • Central limit theorem is a mathematical statement and a “way of life”

    • Formally: a (series of) theorems – with assumptions (e.g., iid) – and if the assumptions are not met, the theorems don't apply

    • Way of life: “When a bunch of random variables are mixed up enough, they behave as if Gaussian”

    • Example from our discussions: does the square root of a sum of squares of (almost) iid random variables go to χ_n? Probably an application of the CLT, but not precisely the CLT without some tinkering (what does “go to” mean when n is changing?). See the sketch below.
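
A tiny numerical illustration of that caveat (a sketch; the uniform entries and the χ_n comparison are our choice, not the slides'):

n = 100; t = 1e5;
x = sqrt(12)*(rand(t, n) - 0.5);     % iid uniform entries, mean 0, variance 1
v = sqrt(sum(x.^2, 2));              % root-sum-of-squares, one per trial
w = sqrt(chi2rnd(n, t, 1));          % true chi_n samples for comparison
[mean(v) mean(w); std(v) std(w)]     % means agree; spreads differ (~0.45 vs ~0.71)
% The CLT pins down the location, but the fluctuation size depends on E[x^4],
% so "goes to chi_n" needs exactly the kind of tinkering mentioned above.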


Outline

  • History of σmin universality

  • Proof idea of Tao-Vu (in NLA language)

  • What the proof doesn't prove

  • Do stochastic differential operators say more?


Tao-Vu ('09) “the rigorous proof”!

  • Basic idea (NLA reformulation)... consider a 2x2 block QR decomposition of M (a numerical sketch follows below):

    1. The smallest singular value of R22, scaled by √(s/n), is a good estimate for σn!

    2. R22 (viewed as the product Q2^T M2) is roughly s x s Gaussian

M = (M1 M2) = QR = (Q1 Q2) (R11 R12)
                           ( 0  R22)

where M1 is n x (n-s), M2 is n x s, R11 is (n-s) x (n-s), R12 is (n-s) x s, and R22 is s x s

Note: Q2^T M2 = R22

Basic idea part 1: σs√(s/n) ≈ σn

  • The smallest singular value of M is the reciprocal of the largest singular value of M^-1

  • Singular values of R22 are exactly the inverse singular values of an s-row subsample of M^-1

  • The largest singular value of an approximately low-rank matrix reliably shows up in a random sample (Vempala et al.; Rokhlin, Tygert et al.)

    • Referred to as “property testing” in theoretical CS terminology
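
A quick numerical check of part 1 (a sketch under our own choices of n and s, not from the slides): the inverse relation is exact, and the scaled corner estimates σn.

n = 200; s = 20;
M = randn(n); [Q, R] = qr(M);
R22 = R(n-s+1:n, n-s+1:n);
Minv = inv(M);
sub = Minv(n-s+1:n, :);                    % s-row subsample of M^-1
norm(sort(svd(R22)) - sort(1./svd(sub)))   % ~ 1e-12: exact inverse relation
est = min(svd(R22)) * sqrt(s/n);           % the scaled s x s estimator
[est, min(svd(M))]                         % typically agree to roughly 10%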


Basic idea part 2: R22 ≈ Gaussian

  • Recall R22 = Q2^T M2

  • Note that Q1 is determined by M1 and thus independent of M2

  • Q2 can be any orthogonal completion of Q1

  • Thus, multiplying by Q2^T “randomly stirs up” entries of the (independent) n x s matrix M2

  • Any “rough edges” of M2 should be smoothed away in the s x s result R22
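
A sketch of this smoothing in action (our own illustration): with a random orthogonal completion Q2, the s x s matrix Q2^T M2 looks iid Gaussian even when M has ±1 entries.

n = 200; s = 20;
M = sign(randn(n));                 % iid +/-1 entries, nothing Gaussian here
M1 = M(:, 1:n-s); M2 = M(:, n-s+1:n);
[Q1, ~] = qr(M1, 0);                % n x (n-s), determined by M1 alone
Z = randn(n, s);
Z = Z - Q1*(Q1'*Z);                 % project out range(Q1)
[Q2, ~] = qr(Z, 0);                 % random orthogonal completion of Q1
X = Q2'*M2;                         % "randomly stirred" version of M2
x = X(:);
[mean(x), var(x), kurtosis(x)]      % ~ [0, 1, 3]: close to standard Gaussian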


Basic idea (recap)

  • 1. The smallest singular value of R22, scaled by √(s/n), is a good estimate for σn!

  • 2. R22 (viewed as the product Q2^T M2) ≈ s x s Gaussian

  • We feel comfortable (from our CLT “way of life”) that part 2 works well

  • How well does part 1 work?


Outline

  • History of σmin universality

  • Proof idea of Tao-Vu (in NLA language)

  • What the proof doesn't prove

  • Do stochastic differential operators say more?


How good is the s x s estimator?

[Plot: estimator vs. truth. Fluctuates around ±10% of truth (at least for this matrix).]


A few more tries...

[Plot: more runs of the s x s estimator vs. truth.]


A lot more tries... again, accurate to say 10%

[Plot: more s x s estimator experiments, Gaussian entries vs. ±1 entries.]


A lot more tries, now comparing matrix sizes

[Plot: s x s estimator experiments with s = 10 to 50% of n, comparing n = 100 vs. n = 200; n = 200 does a bit better.]


How good is the s x s estimator?

  • On one hand, surprisingly good, especially when not expecting any such result

    • “Did you know you can get the smallest singular value to within 10% just by looking at a corner of the QR?”

  • On the other hand, still clearly an approximation: n would need to be huge in order to reach human-indistinguishable agreement


Bounds from the proof

  • “C is a sufficiently large constant (10^4 suffices)”

  • Implied constants in O(...) depend on E|ξ|^C

    • For ξ = Gaussian, this is 9999!! (a double factorial)

  • s = n^(500/C)

    • To get s = 10, n ≈ 10^20?

  • Various tail bounds go as n^(-1/C)

    • To get a 1% chance of failure, n ≈ 10^20000??
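
Unpacking the arithmetic (our own back-of-envelope, taking the stated constants at face value): with C = 10^4, s = n^(500/C) = n^(1/20), so s = 10 forces n = s^20 = 10^20; and a failure probability of n^(-1/C) ≤ 1/100 forces log10(n) ≥ 2C, i.e., n ≥ 10^20000.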




What the proof doesn't prove

  • The s x s estimator is pretty nifty...

  • ... but the truth is far stronger than what the approximation can tell us


Outline

  • History of σmin universality

  • Proof idea of Tao-Vu (in NLA language)

  • What the proof doesn't prove

  • Do stochastic differential operators say more?


Can another approach get us closer to the truth?

  • Recall the standard numerical SVD algorithm starting with Householder bidiagonalization

  • In the case of Gaussian random matrices, each Householder step puts a χ distribution on the bidiagonal and leaves the remaining subrectangle Gaussian

  • At each stage, all χ's and Gaussians in the entries are independent of each other (due to isotropy of multivariate Gaussians)
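
In MATLAB, the resulting bidiagonal model looks like this (a sketch; the χ degrees of freedom match those used in the experiment code below):

n = 200;
d = sqrt(chi2rnd(n:-1:1));      % diagonal: chi_n, chi_(n-1), ..., chi_1
e = sqrt(chi2rnd(n-1:-1:1));    % superdiagonal: chi_(n-1), ..., chi_1
B = diag(d) + diag(e, 1);       % bidiagonal matrix from the Householder steps
% Its singular values have the same joint distribution as those of randn(n):
[min(svd(B))*sqrt(n), min(svd(randn(n)))*sqrt(n)]   % two draws, same law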



A stochastic operator connection

  • E ('03) argued that the bidiagonal of χ's can be viewed as a discretization of a stochastic Bessel operator

  • −√x d/dx + “noise”/√2

  • As n grows, the discretization becomes smoother, and the (scaled) singular value distributions of the matrices ought to converge to those of the operator


A stochastic operator connection

How close are we if we use k x k chi's in the bottom, rest Gaussian?

n = 200; t = 1000000;                      % 10^6 trials (slow with dense svd;
                                           % the slides' bidsvd exploited bidiagonal structure)
for k = 1                                  % (the experiments sweep k = 0, 1, 2, ...)
    x = sqrt(n:-1:1);                      % means of the diagonal chi's
    y = sqrt(n-1:-1:1);                    % means of the superdiagonal chi's
    v = zeros(t,1);
    endx = n-k+1:n; endy = n-k+1:n-1;      % indices of the bottom k x k corner
    dofx = k:-1:1; dofy = (k-1):-1:1;      % chi degrees of freedom in the corner
    for i = 1:t
        xx = x + randn(1,n)/sqrt(2);       % Gaussian approximation to the chi's
        yy = y + randn(1,n-1)/sqrt(2);
        xx(endx) = sqrt(chi2rnd(dofx));    % exact chi's in the bottom corner
        yy(endy) = sqrt(chi2rnd(dofy));
        v(i) = min(svd(diag(xx) + diag(yy,1)));  % smallest singular value of the bidiagonal
        if rem(i,500) == 0, disp([i k]), end     % progress report
    end
    v = v*sqrt(n);                         % rescale the smallest singular values
end

[Plots: distributions of the rescaled smallest singular value for k = 0, 1, 2, ..., 7..10, and k = inf, comparing n = 100 vs. n = 200, with an area of detail; 1 million trials in each experiment. (Probably as n→inf, there is still a little upswing for finite k?)]


A stochastic operator connection

  • Ramírez and Rider ('09) produced a proof

  • In further work with Virág, they have applied the SDO machinery to obtain similar convergence results for the largest eigenvalues of general beta ensembles, etc.


Extending to non-Gaussians: How?

  • The bidiagonalization mechanism shouldn't care too much about the difference...

  • Each Householder spin “stirs up” the entries of the remaining subrectangle, making them “more Gaussian” (by Berry-Esseen, q^T x is close to Gaussian as long as the entries of q are evenly spread out; see the sketch below)

  • Almost-Gaussians combine into (almost-independent) almost-χ's

  • Original n^2 entries compress to 2n−1
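
A small illustration of the Berry-Esseen point (a sketch with our own parameters): q^T x is nearly Gaussian when q is a random (hence evenly spread) unit vector and x has iid ±1 entries.

n = 100; t = 1e5;
z = zeros(t,1);
for i = 1:t
    q = randn(n,1); q = q/norm(q);   % random unit vector, entries ~ 1/sqrt(n)
    x = sign(randn(n,1));            % iid +/-1 entries, variance 1
    z(i) = q'*x;                     % one "stirred up" entry
end
[mean(z), var(z), kurtosis(z)]       % ~ [0, 1, 3]: matches a standard Gaussian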


SDO mechanism

  • Old intuition: non-Gaussian n x n matrices act like Gaussian n x n matrices (which we understand)

  • New view: non-Gaussian and Gaussian n x n matrices are both discretizations of the same object

  • Non-random discretizations have graininess: step size, where to take finite differences, etc.

  • SDO discretizations have analogous issues, like almost-independence... but perhaps these can be overcome?


Some grand questions

  • Could an SDO approach circumvent property testing (sampling the bottom-right s x s) and thereby get closer to the truth?

  • Does the mathematics of today have enough technology for this? (If not, can someone invent the new technology we need?)

