Multiple Dataset Order Selection Algorithms

We present an algorithm, originally introduced in [2], for estimating the number of correlated components between two datasets when the dimension of each dataset is large but the number of samples is limited. This sample-poor regime is typical of multimodal fusion of medical imaging data [1]. The algorithm first applies a principal component analysis (PCA) rank-reduction step as preprocessing and then estimates the number of correlated components using canonical correlation analysis (CCA). The resulting technique, PCA-CCA, determines both the number of dimensions to retain in the PCA step and the number of correlated signals found by CCA, using reduced-rank versions of the classical Bartlett-Lawley hypothesis test [3], [4] and the minimum description length (MDL) information-theoretic criterion. Although motivated by the case where the number of samples is much smaller than the dimension of the datasets, PCA-CCA also performs well in the sample-rich regime.
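As a rough illustration of the two-stage idea described above, the following Python sketch reduces each dataset to its first r principal components and then computes the canonical correlations of the reduced data. The function name, the (dimension x samples) data layout, and the fixed rank r are assumptions made for this example; PCA-CCA itself selects the rank from the data rather than taking it as an input.

```python
import numpy as np


def reduced_rank_canonical_correlations(X, Y, r):
    """Canonical correlations of X and Y after a rank-r PCA step.

    X has shape (n, M) and Y has shape (m, M); the M columns are samples.
    This layout and the function name are assumptions of this sketch.
    """
    # Remove the sample mean of every variable.
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)

    # PCA step: the rows of Vx and Vy are orthonormal and correspond to the
    # whitened principal-component scores, ordered by decreasing variance.
    _, _, Vx = np.linalg.svd(X, full_matrices=False)
    _, _, Vy = np.linalg.svd(Y, full_matrices=False)
    Vx_r = Vx[:r]
    Vy_r = Vy[:r]

    # CCA step: for whitened data, the sample canonical correlations are the
    # singular values of the cross-correlation matrix of the two score sets.
    k = np.linalg.svd(Vx_r @ Vy_r.T, compute_uv=False)

    # Guard against values marginally outside [0, 1] due to rounding.
    return np.clip(k, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = 30                                   # few samples
    shared = rng.standard_normal((2, M))     # two correlated components
    X = rng.standard_normal((100, 2)) @ shared + 0.5 * rng.standard_normal((100, M))
    Y = rng.standard_normal((80, 2)) @ shared + 0.5 * rng.standard_normal((80, M))
    print(reduced_rank_canonical_correlations(X, Y, r=4))
```

If r is chosen close to the number of samples, the sample canonical correlations are driven to one regardless of the true correlation between the datasets, which is why the rank kept in the PCA step has to be chosen with care.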

    Principal Component Analysis and Canonical Correlation Analysis (PCA-CCA) [1], [2]

PCA-CCA

PCA-CCA is based upon a sequence of hypothesis tests, similar to the classical Bartlett-Lawley test [3], [4].
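To make the idea of a sequence of hypothesis tests concrete, here is a minimal Python sketch of a Bartlett-Lawley-style sequential test applied to a set of canonical correlations that have already been computed at a fixed PCA rank. It assumes real-valued data, the standard textbook form of the statistic with a chi-square reference distribution on (r - s)^2 degrees of freedom, and a Wilson-Hilferty approximation for the chi-square threshold. The function names and the false-alarm probability are hypothetical, and the exact reduced-rank statistic and the rule for choosing the rank should be taken from [2].

```python
import math
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty); not exact."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def bartlett_lawley_statistic(k, s, M):
    """Statistic for the hypothesis that only the first s correlations are nonzero.

    k: canonical correlations in decreasing order (length r, both datasets
       reduced to rank r); M: number of samples. Assumed textbook form.
    """
    eps = 1e-12
    k = [min(max(ki, eps), 1.0 - eps) for ki in k]   # avoid log(0) and 1/0
    r = len(k)
    correction = sum(1.0 / (ki * ki) for ki in k[:s])
    scale = M - s - (2 * r + 1) / 2.0 + correction
    log_term = sum(math.log(1.0 - ki * ki) for ki in k[s:])
    return -scale * log_term


def estimate_order(k, M, p_false_alarm=0.001):
    """Return the smallest s whose hypothesis is not rejected."""
    r = len(k)
    for s in range(r):
        threshold = chi2_quantile(1.0 - p_false_alarm, (r - s) ** 2)
        if bartlett_lawley_statistic(k, s, M) < threshold:
            return s
    return r


if __name__ == "__main__":
    # Two large correlations followed by two small ones, 500 samples.
    print(estimate_order([0.95, 0.90, 0.10, 0.05], M=500))
```

The tests run from s = 0 upward and stop at the first hypothesis that is not rejected, so a stricter false-alarm probability raises each threshold and tends to give a smaller estimate.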


References:

[1] Y. Levin-Schwartz, Y. Song, P. J. Schreier, V. D. Calhoun and T. Adali, "Sample-poor estimation of order and correlated signal subspace with application to fusion of medical imaging data," NeuroImage, vol. 134, pp. 486-493, 2016.
[2] Y. Song, P. J. Schreier, D. Ramirez and T. Hasija, "Canonical correlation analysis of high-dimensional data with very small sample support," Signal Processing, vol. 128, pp. 449-458, 2016.
[3] M. S. Bartlett, "The statistical significance of canonical correlations," Biometrika, vol. 32, no. 1, pp. 29-37, 1941.
[4] D. N. Lawley, "Tests of significance in canonical analysis," Biometrika, vol. 46, no. 1/2, pp. 59-66, 1959.