Kernel-based tests are widely used non-parametric statistics for comparing two distributions, particularly distributions of multivariate data. However, such tests, notably the Maximum Mean Discrepancy (MMD) test, are known to face difficulties with high-dimensional data as well as computational challenges in practice. This talk will begin by analyzing kernel tests applied to low-dimensional manifold data embedded in high-dimensional space, and theoretically show that the curse of dimensionality can be automatically avoided by using local kernels. Our non-asymptotic result proves test power at finite sample size and holds for a class of regular, decaying kernel functions that are not necessarily positive semi-definite. We then discuss the practical challenges of kernel tests, primarily the choice of kernel bandwidth and the computational bottleneck. For the former, we present a recent analysis of the k-nearest-neighbor self-tuned kernel, which provably reduces variance error and improves the stability of kernel methods in regions where data density may be low (joint work with Hau-Tieng Wu, Duke). For the latter, we revisit neural network classification two-sample tests, which show empirical advantages yet lack a full theoretical understanding, especially of trained networks. Toward understanding the training dynamics of neural network two-sample tests, we introduce the neural tangent kernel (NTK) MMD, which provably approximates the kernel MMD of a finite-width NTK and consequently enjoys a theoretical kernel-test power guarantee. In practice, NTK-MMD can be computed from small-batch, one-pass stochastic gradient descent on the training split, and allows calibration of the test threshold via a test-split-only bootstrap (thus avoiding the evaluation of network gradients on the test samples). Joint work with Yao Xie, Georgia Tech.
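As background for the kernel tests discussed above, here is a minimal sketch (not the speakers' implementation) of the standard quadratic-time MMD two-sample test with a Gaussian kernel, using a permutation procedure to calibrate the rejection threshold; the bandwidth and permutation count are illustrative choices:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    # Pairwise squared Euclidean distances, then Gaussian kernel values
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(X, Y, bandwidth):
    # Biased quadratic-time estimate of the squared MMD
    kxx = gaussian_kernel(X, X, bandwidth).mean()
    kyy = gaussian_kernel(Y, Y, bandwidth).mean()
    kxy = gaussian_kernel(X, Y, bandwidth).mean()
    return kxx + kyy - 2 * kxy

def permutation_threshold(X, Y, bandwidth, n_perm=200, level=0.05, seed=0):
    # Approximate the null distribution by re-splitting the pooled sample
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = len(X)
    stats = []
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        stats.append(mmd2(Z[idx[:n]], Z[idx[n:]], bandwidth))
    return np.quantile(stats, 1 - level)

# Toy example: two Gaussians in 2D with shifted means
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y = rng.normal(1.0, 1.0, size=(100, 2))
stat = mmd2(X, Y, bandwidth=1.0)
thr = permutation_threshold(X, Y, bandwidth=1.0)
print(stat > thr)  # reject H0 when the statistic exceeds the threshold
```

The NTK-MMD variant described in the abstract replaces the fixed Gaussian kernel with the neural tangent kernel of a finite-width network and calibrates the threshold by bootstrapping on the test split alone.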