Abstract: Many statistical models for real-world data share the following structure. Let A be a low-rank matrix and E a matrix of the same dimensions. The objective is to estimate a parameter f(A) from the noisy observation A' = A + E; here A represents the ground truth, A' the observable data, and E the noise. In statistical settings, E is taken to be random. While this setup is simple, it represents an extremely rich environment in which to study problems in data science.
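The model above can be instantiated in a few lines of NumPy. This is an illustrative sketch, not part of the talk; the dimensions, rank, and signal strengths are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 3  # ambient dimension and rank (illustrative values)

# Low-rank ground truth A = U diag(s) V^T with singular values 50, 40, 30
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
A = U @ np.diag([50.0, 40.0, 30.0]) @ V.T

# Random noise E with i.i.d. standard Gaussian entries; observed data A' = A + E
E = rng.standard_normal((n, n))
A_obs = A + E

print(np.linalg.matrix_rank(A))      # 3: the signal is low rank
print(np.linalg.norm(A_obs - A, 2))  # operator norm of the noise, about 2*sqrt(n) here
```

With i.i.d. Gaussian noise, the operator norm of E concentrates near 2√n, so the low-rank signal dominates the noise whenever its singular values are much larger than √n.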
In this talk, I will discuss how spectral perturbation theory is employed to solve problems in statistics and data science. However, classical perturbation bounds, such as the Davis-Kahan theorem, are often wasteful when E is random. This motivates viewing the problem through the lens of random matrix theory. I will discuss our improved spectral perturbation bounds and their applications.
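A small numerical experiment illustrates the point. The sketch below (again illustrative, not from the talk) compares the actual rotation of the top eigenvector of a rank-one symmetric signal under Gaussian noise with the Davis-Kahan sin-theta bound, here in the Yu-Wang-Samworth form 2‖E‖/δ with δ the eigengap of the unperturbed matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
lam = 200.0  # top eigenvalue of the rank-one signal (illustrative)

# Symmetric rank-one signal A = lam * u u^T; its eigengap is lam
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
A = lam * np.outer(u, u)

# Symmetric (Wigner-type) Gaussian noise
G = rng.standard_normal((n, n))
E = (G + G.T) / np.sqrt(2)

# Top eigenvector of the observed matrix A + E
vals, vecs = np.linalg.eigh(A + E)
u_obs = vecs[:, -1]

# Angle between true and observed top eigenvectors vs. the worst-case bound
sin_theta = np.sqrt(max(0.0, 1.0 - (u @ u_obs) ** 2))
dk_bound = 2 * np.linalg.norm(E, 2) / lam  # Davis-Kahan (Yu-Wang-Samworth variant)

print(sin_theta, dk_bound)
```

For a typical random E the observed sin-theta comes in well below the worst-case bound, which is the gap that random-matrix-theoretic arguments aim to close.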