## Abstract

## Keywords

## Introduction

Helmer M, Warrington S, Mohammadi-Nejad A-R, Lisa J, Howell A, Rosand B, *et al.* (2020): On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations. https://doi.org/10.1101/2020.08.25.265546


### Part 1: Technical background of CCA and PLS

#### CCA/PLS optimization and nomenclature

Model | Relationship | Model weights | Latent variable | Correlation between original variables and latent variable |
---|---|---|---|---|
CCA | mode/association | canonical vector/coefficient | canonical variable/variate | structure correlation |
PLS | association | salience | score | loading |

Wegelin JA (2000): A Survey on Partial Least Squares (PLS) Methods, with Emphasis on the Two-Block Case. Retrieved from https://stat.uw.edu/sites/default/files/files/reports/2000/tr371.pdf

### Limitations of standard CCA/PLS


### Standard CCA with PCA dimensionality reduction (PCA-CCA)


### Regularized CCA (RCCA)

L2-norm regularization^{1} forces the weights to be small but does not make them zero. It has been proposed for CCA (Tuzhilina et al., 2020), where the two hyperparameters ($c_x$, $c_y$) control the amount of regularization and provide a smooth transition between standard CCA ($c_x = c_y = 0$, not regularized) and standard PLS ($c_x = c_y = 1$, most regularized).
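The transition between CCA and PLS can be illustrated with a minimal sketch. It assumes the regularized constraint takes the form $(1-c_x)\,\mathbf{w}_x^T\mathbf{X}^T\mathbf{X}\mathbf{w}_x + c_x\lVert\mathbf{w}_x\rVert_2^2 = 1$ (and analogously for $\mathbf{w}_y$), in which case the first pair of weights follows from an SVD of the whitened cross-covariance. The function name `rcca_first_pair` and the toy data are illustrative assumptions and are not taken from the CCA/PLS toolkit.

```python
import numpy as np

def rcca_first_pair(X, Y, cx, cy):
    """First pair of RCCA weights for column-centred X (n x p) and Y (n x q)."""
    p, q = X.shape[1], Y.shape[1]
    # Regularized within-modality matrices (assumed form):
    # c = 0 gives X^T X (standard CCA), c = 1 gives the identity (standard PLS).
    Rx = (1.0 - cx) * (X.T @ X) + cx * np.eye(p)
    Ry = (1.0 - cy) * (Y.T @ Y) + cy * np.eye(q)

    def inv_sqrt(R):
        # Inverse matrix square root via eigendecomposition (R must be positive definite).
        vals, vecs = np.linalg.eigh(R)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Rx_is, Ry_is = inv_sqrt(Rx), inv_sqrt(Ry)
    # Top singular vectors of the whitened cross-covariance solve the constrained problem.
    U, s, Vt = np.linalg.svd(Rx_is @ (X.T @ Y) @ Ry_is, full_matrices=False)
    wx = Rx_is @ U[:, 0]
    wy = Ry_is @ Vt[0]
    return wx, wy

# Toy data: one shared latent signal z plus independent noise (hypothetical example).
rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal(n)
X = np.outer(z, [1.0, 0.5, 0.0]) + rng.standard_normal((n, 3))
Y = np.outer(z, [0.8, 0.0]) + rng.standard_normal((n, 2))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

wx_cca, wy_cca = rcca_first_pair(X, Y, 0.0, 0.0)  # not regularized
wx_pls, wy_pls = rcca_first_pair(X, Y, 1.0, 1.0)  # most regularized
```

With `cx = cy = 1` the whitening step is the identity, so the weights reduce to the first singular vectors of $\mathbf{X}^T\mathbf{Y}$; intermediate values shrink the influence of the within-modality covariances gradually. This sketch requires the regularized matrices to be invertible, which may not hold for `cx = 0` or `cy = 0` when there are more variables than subjects.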

L1-norm regularization^{2} sets some of the weight values to zero, resulting in variable selection and promoting sparsity. Sparse solutions facilitate the interpretability of the model and may improve performance when only a subset of variables is relevant.
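A common way to obtain exact zeros under an L1-norm penalty is the soft-thresholding operator. The sketch below applies it to a toy weight vector; the function name `soft_threshold` and the threshold value are illustrative assumptions, not the settings used in the experiments.

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink each weight towards zero by lam; weights with |w_i| <= lam become exactly zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.9, -0.05, 0.3, 0.02, -0.6])
w_sparse = soft_threshold(w, lam=0.1)

# Number of variables that survive the threshold (i.e., are selected).
n_selected = int(np.count_nonzero(w_sparse))
```

Larger thresholds remove more variables, which is why the amount of L1-norm regularization acts as a hyperparameter controlling sparsity.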

### Analytic frameworks for CCA/PLS models

In the predictive framework (Figure 2B), CCA/PLS is fitted on a training/optimization set and evaluated on a test/holdout set, so the statistical inference is based on out-of-sample correlation. This procedure assesses the generalizability of the model, i.e., how well the association found in the training set generalizes to an independent test set. The hyperparameters are usually optimized in the predictive framework; to this end, the training/optimization set is further divided into a training and a validation set, and the best hyperparameters are selected based on out-of-sample correlation in the validation set. In both descriptive and predictive frameworks, permutation inference (based on in-sample or out-of-sample correlation) is often used to assess the number of significant associative effects.
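The predictive framework can be sketched as follows. This is a minimal example under stated assumptions: a single random train/test split, standard PLS (first singular vectors of the training cross-covariance) as the model, and toy Gaussian data. The helper names `fit_pls_first_pair` and `latent_correlation` are hypothetical, and hyperparameter optimization and permutation testing are omitted for brevity.

```python
import numpy as np

def fit_pls_first_pair(X, Y):
    """Weights of the first PLS associative effect: top singular vectors of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    return U[:, 0], Vt[0]

def latent_correlation(X, Y, wx, wy):
    """Pearson correlation between the latent variables X wx and Y wy."""
    return np.corrcoef(X @ wx, Y @ wy)[0, 1]

# Toy data: a shared latent signal z embedded in noisy brain (X) and behaviour (Y) variables.
rng = np.random.default_rng(1)
n, p, q = 300, 50, 10
z = rng.standard_normal(n)
X = np.outer(z, rng.standard_normal(p)) + 3.0 * rng.standard_normal((n, p))
Y = np.outer(z, rng.standard_normal(q)) + 3.0 * rng.standard_normal((n, q))

# Split subjects into a training/optimization set and a test/holdout set.
perm = rng.permutation(n)
train, test = perm[:240], perm[240:]

# Centre both sets using training means only, so no test information enters the fit.
mx, my = X[train].mean(axis=0), Y[train].mean(axis=0)
Xtr, Ytr = X[train] - mx, Y[train] - my
Xte, Yte = X[test] - mx, Y[test] - my

wx, wy = fit_pls_first_pair(Xtr, Ytr)
r_in = latent_correlation(Xtr, Ytr, wx, wy)    # in-sample correlation
r_out = latent_correlation(Xte, Yte, wx, wy)   # out-of-sample correlation
```

Comparing `r_in` with `r_out` gives a rough sense of how much of the fitted association generalizes; in practice the split would be repeated and the out-of-sample correlation compared against a permutation distribution.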

### Part 2: Demonstrations of CCA and PLS analyses

#### Description of experiments

Model | Analytical framework | Hyperparameter optimization | Model hyperparameter |
---|---|---|---|
High-dimensional data | | | |
PCA-CCA | Descriptive | None (fixed) | Number of PCs |
PCA-CCA | Predictive | None (fixed) | Number of PCs |
PCA-CCA | Predictive | Data-driven | Number of PCs |
RCCA | Predictive | Data-driven | Amount of L2-norm regularization |
Standard PLS | Predictive | None | None |
SPLS | Predictive | Data-driven | Amount of L1-norm regularization |
Low-dimensional data | | | |
Standard CCA | Predictive | None | None |
RCCA | Predictive | Data-driven | Amount of L2-norm regularization |
Standard PLS | Predictive | None | None |
SPLS | Predictive | Data-driven | Amount of L1-norm regularization |

Data | HCP (low-dimensional) | HCP (high-dimensional) | ADNI (low-dimensional) | ADNI (high-dimensional) | Simulation (low-dimensional) | Simulation (high-dimensional) |
---|---|---|---|---|---|---|
Subjects | Healthy (N=1001) | Healthy (N=1001) | Healthy + clinical (N=592) | Healthy + clinical (N=592) | Not applicable (N=1000) | Not applicable (N=1000) |
Brain variables | Connectivity of 25 ICA components (D=300) | Connectivity of 200 ICA components (D=19900) | ROI-wise grey matter volume (D=120) | Voxel-wise grey matter volume (D=168130) | Not applicable (D=100) | Not applicable (D=20000) |
Behavioural variables | Behaviour, psychometrics, demographics (D=145) | Behaviour, psychometrics, demographics (D=145) | Items of MMSE questionnaire (D=31) | Items of MMSE questionnaire (D=31) | Not applicable (D=100) | Not applicable (D=100) |

### In-sample vs. out-of-sample correlation in high-dimensional data

The *second* associative effect is shown for standard PLS (PLS-2) and SPLS (SPLS-2) in the HCP dataset, as it is the most similar to the first associative effects identified by the other models.

Model | Brain: stability of weights^{1} | Brain: explained variance (%)^{2} | Behaviour: stability of weights^{1} | Behaviour: explained variance (%)^{2} | Across-modality: in-sample correlation^{3} | Across-modality: out-of-sample correlation^{4} |
---|---|---|---|---|---|---|
ADNI dataset | | | | | | |
PCA-CCA (fixed PCs) | 0.86 (± 0.00) | 8.47 (± 0.16) | 0.85 (± 0.01) | 14.91 (± 0.23) | 0.70 (± 0.00) | 0.55 (± 0.01) |
PCA-CCA (data-driven PCs) | 0.70 (± 0.01) | 5.26 (± 0.25) | 0.93 (± 0.00) | 15.73 (± 0.13) | 0.83 (± 0.01) | 0.65 (± 0.01) |
RCCA (L2-reg. opt.) | 0.82 (± 0.00) | 5.47 (± 0.06) | 0.94 (± 0.00) | 16.63 (± 0.26) | 0.98 (± 0.00) | 0.66 (± 0.01) |
Standard PLS | 0.96 (± 0.00) | 21.54 (± 0.16) | 0.94 (± 0.00) | 18.64 (± 0.21) | 0.44 (± 0.00) | 0.43 (± 0.01) |
SPLS (L1-reg. opt.) | 0.83 (± 0.02) | 14.05 (± 0.13) | 0.96 (± 0.01) | 15.86 (± 0.42) | 0.60 (± 0.00) | 0.61 (± 0.01) |
HCP dataset | | | | | | |
PCA-CCA (fixed PCs) | 0.72 (± 0.01) | 0.42 (± 0.01) | 0.78 (± 0.01) | 2.67 (± 0.10) | 0.76 (± 0.00) | 0.47 (± 0.02) |
PCA-CCA (data-driven PCs) | 0.56 (± 0.02) | 0.35 (± 0.03) | 0.53 (± 0.04) | 3.73 (± 0.39) | 0.76 (± 0.01) | 0.45 (± 0.03) |
RCCA (L2-reg. opt.) | 0.78 (± 0.01) | 0.29 (± 0.01) | 0.88 (± 0.01) | 4.39 (± 0.18) | 1.00 (± 0.00) | 0.52 (± 0.02) |
Standard PLS-2 | 0.52 (± 0.04) | 0.50 (± 0.05) | 0.62 (± 0.05) | 8.07 (± 0.30) | 0.79 (± 0.02) | 0.21 (± 0.02) |
SPLS-2 (L1-reg. opt.) | 0.25 (± 0.04) | 0.48 (± 0.07) | 0.51 (± 0.05) | 7.23 (± 0.37) | 0.64 (± 0.04) | 0.25 (± 0.03) |
Simulated dataset | | | | | | |
PCA-CCA (fixed PCs) | 0.74 (± 0.01) | 0.76 (± 0.01) | 0.90 (± 0.00) | 1.82 (± 0.01) | 0.80 (± 0.00) | 0.67 (± 0.01) |
PCA-CCA (data-driven PCs) | 0.96 (± 0.00) | 0.85 (± 0.00) | 0.91 (± 0.00) | 1.95 (± 0.02) | 0.73 (± 0.01) | 0.70 (± 0.01) |
RCCA (L2-reg. opt.) | 0.93 (± 0.00) | 0.77 (± 0.00) | 0.97 (± 0.00) | 1.99 (± 0.01) | 0.83 (± 0.01) | 0.71 (± 0.01) |
Standard PLS | 0.94 (± 0.00) | 0.84 (± 0.00) | 0.97 (± 0.00) | 2.07 (± 0.01) | 0.81 (± 0.00) | 0.71 (± 0.01) |
SPLS (L1-reg. opt.) | 0.78 (± 0.03) | 0.84 (± 0.00) | 1.00 (± 0.00) | 1.94 (± 0.01) | 0.79 (± 0.01) | 0.73 (± 0.01) |

^{1} Similarity of model weights measured by Pearson correlation between each pair of training sets of the outer data splits.

^{2} Percent variance explained by the model relative to all within-modality variance in the training sets of the outer data splits.

^{3} Correlation between the latent variables in the training sets of the outer data splits.

^{4} Correlation between the latent variables in the test sets of the outer data splits.

opt., optimized; PC, principal component; L1-reg., L1-norm regularization; L2-reg., L2-norm regularization.
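As an illustration of footnote 1, the stability of weights can be sketched as the mean Pearson correlation over all pairs of weight vectors fitted on different training sets. Taking the absolute value of each correlation (to ignore the arbitrary sign of CCA/PLS weights) is an assumption of this sketch, as are the function name `weight_stability` and the toy weights.

```python
import numpy as np
from itertools import combinations

def weight_stability(W):
    """Mean absolute Pearson correlation between every pair of rows of W (splits x variables)."""
    corrs = [abs(np.corrcoef(W[i], W[j])[0, 1])
             for i, j in combinations(range(W.shape[0]), 2)]
    return float(np.mean(corrs))

rng = np.random.default_rng(2)
w_true = rng.standard_normal(100)
# Toy weights from 10 hypothetical outer splits: a shared pattern plus split-specific noise.
W = w_true + 0.3 * rng.standard_normal((10, 100))
stability = weight_stability(W)
```

Values close to 1 indicate that the model recovers nearly the same weights regardless of which subjects are in the training set.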

^{1} L2-norm: $\lVert\mathbf{w}\rVert_{2}=\sqrt{\sum_{i}w_{i}^{2}}$, where $\mathbf{w}=\left(w_{1},w_{2},\dots,w_{n}\right)$ is a vector of size $n$.

^{2} L1-norm: $\lVert\mathbf{w}\rVert_{1}=\sum_{i}\left|w_{i}\right|$, where $\mathbf{w}=\left(w_{1},w_{2},\dots,w_{n}\right)$ is a vector of size $n$.

In the HCP dataset, the *first* associative effect identified by RCCA is more similar to the *second* effect found by standard PLS and SPLS (Figure 6). This is likely because the within-modality variances in the HCP dataset differ substantially from the identity matrix, so the difference between the objectives of the CCA and PLS models is more pronounced (see Eqs. 1 and 2). The brain and behavioural weights were similar across the three models in both real datasets, especially for the top-ranked variables (i.e., the variables with the highest weights). Like RCCA, standard PLS and SPLS recovered the true weights of the generative model sufficiently well; however, the SPLS model assigned fewer non-zero weights to non-relevant variables (Figure 7C-E). These results demonstrate that, when the signal is sparse, SPLS can lead to high true positive and high true negative rates of weight recovery (Table S2). Table S4 shows the sparsity of the associative effects identified by SPLS.
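True positive and true negative rates of weight recovery can be computed by comparing the support (non-zero entries) of the estimated weights with that of the generative weights. The following is a minimal sketch, assuming a weight counts as selected when it is not exactly zero; the function name `recovery_rates` and the toy vectors are illustrative and unrelated to the values reported in Table S2.

```python
import numpy as np

def recovery_rates(w_true, w_est):
    """Return (true positive rate, true negative rate) of the estimated support."""
    relevant = w_true != 0
    selected = w_est != 0
    # Fraction of truly relevant variables that received a non-zero weight.
    tpr = np.sum(selected & relevant) / np.sum(relevant)
    # Fraction of non-relevant variables that were correctly set to zero.
    tnr = np.sum(~selected & ~relevant) / np.sum(~relevant)
    return float(tpr), float(tnr)

w_true = np.array([1.0, 0.5, -0.7, 0.0, 0.0, 0.0, 0.0, 0.0])
w_est = np.array([0.8, 0.0, -0.6, 0.0, 0.1, 0.0, 0.0, 0.0])
tpr, tnr = recovery_rates(w_true, w_est)
```

For non-sparse models such as RCCA or standard PLS, nearly all weights are non-zero, so a comparison of this kind would need a threshold on the weight magnitude instead of an exact-zero test.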

## Conclusion

Data-driven PCA-CCA has a comparable regularizing effect to RCCA. Sparsity (i.e., L1-norm regularization) can facilitate the interpretability and the generalizability of the models, but it can also introduce instability. Sparsity is most useful when the associative effect itself is sparse (e.g., in the ADNI and simulated datasets). Data-driven PCA-CCA, RCCA and SPLS yielded similar model weights and accounted for similar variances.



## Acknowledgements

## Supplementary Material

## References

- Statistical Challenges in “Big Data” Human Neuroimaging. *Neuron.* 2018; 97: 263-268
- Inference in the age of big data: Future perspectives on neuroscience. *Neuroimage.* 2017; 155: 549-564
- Towards Algorithmic Analytics for Large-scale Datasets. *Nat Mach Intell.* 2019; 1: 296-306
- Relations between two sets of variates. *Biometrika.* 1936; 28: 321
- Wold H (1985): Partial least squares. In: Kotz S, Johnson N, editors. Encyclopedia of Statistical Sciences. New York: Wiley Online Library, pp 581–591.
- Somatosensory-Motor Dysconnectivity Spans Multiple Transdiagnostic Dimensions of Psychopathology. *Biol Psychiatry.* 2019; 86: 779-791
- Resting-state connectivity biomarkers define neurophysiological subtypes of depression. *Nat Med.* 2017; 23: 28-38
- Multivariate associations among behavioral, clinical, and multimodal imaging phenotypes in patients with psychosis. *JAMA Psychiatry.* 2018; 75: 386-395
- Topography and behavioral relevance of the global signal in the human brain. *Sci Rep.* 2019; 9: 1-10
- The relationship between spatial configuration and functional connectivity of brain regions. *Elife.* 2018; 7: 1-27
- Linked dimensions of psychopathology and connectivity in functional brain networks. *Nat Commun.* 2018; 9: 3003
- Multivariate Patterns of Brain-Behavior-Environment Associations in the Adolescent Brain and Cognitive Development Study. *Biol Psychiatry.* 2021; 89: 510-520
- Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population. *Neuroimage.* 2014; 84: 698-711
- Partial least squares correlation of multivariate cognitive abilities and local brain structure in children and adolescents. *Neuroimage.* 2013; 82: 284-294
- Neurobehavioural characterisation and stratification of reinforcement-related behaviour. *Nat Hum Behav.* 2020; 4: 544-558
- Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse Partial Least Squares. *Neuroimage.* 2012; 63: 11-24
- A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. *Biostatistics.* 2009; 10: 515-534
- Functional corticostriatal connection topographies predict goal-directed behaviour in humans. *Nat Hum Behav.* 2017; 1: 1-9
- Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. *Med Image Anal.* 2014; 18: 891-902
- Identification of neurobehavioural symptom groups based on shared brain mechanisms. *Nat Hum Behav.* 2019; 3: 1306-1318
- Patterns of thought: Population variation in the associations between large-scale network organisation and self-reported experiences at rest. *Neuroimage.* 2018; 176: 518-527
- A positive-negative mode of population covariation links brain connectivity, demographics and behavior. *Nat Neurosci.* 2015; 18: 1565-1567
- Traces of Trauma: A Multivariate Pattern Analysis of Childhood Trauma, Brain Structure, and Clinical Phenotypes. *Biol Psychiatry.* 2020; 88: 829-842
- Alnaes D, Kaufmann T, Marquand AF, Smith SM, Westlye LT (2020): Patterns of sociocognitive stratification and perinatal risk in the child brain. Proc Natl Acad Sci 202001517.
- Brain-behaviour modes of covariation in healthy and clinically depressed young people. *Sci Rep.* 2019; 9: 11536
- Helmer M, Warrington S, Mohammadi-Nejad A-R, Lisa J, Howell A, Rosand B, *et al.* (2020): On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations. https://doi.org/10.1101/2020.08.25.265546
- Multiple Holdouts With Stability: Improving the Generalizability of Machine Learning Analyses of Brain–Behavior Relationships. *Biol Psychiatry.* 2020; 87: 368-376
- Evaluating the evidence for biotypes of depression: Methodological replication and extension of. *NeuroImage Clin.* 2019; 22: 101796
- A Tutorial on Canonical Correlation Methods. *ACM Comput Surv.* 2017; 50: 1-33
- A technical review of canonical correlation analysis for neuroscience applications. *Hum Brain Mapp.* 2020; 41: 3807-3833
- Wang H-T, Smallwood J, Mourao-Miranda J, Xia CH, Satterthwaite TD, Bassett DS, Bzdok D (2020): Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists. Neuroimage 116745.
- Partial Least Squares (PLS) methods for neuroimaging: A tutorial and review. *Neuroimage.* 2011; 56: 455-475
- Canonical correlations with fallible data. *Psychometrika.* 1964; 29: 55-65
- Rosipal R, Krämer N (2006): Overview and Recent Advances in Partial Least Squares. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J, editors. Subspace, Latent Structure and Feature Selection. Berlin, Heidelberg: Springer Berlin Heidelberg, pp 34–51.
- Perturbation analysis of the canonical correlations of matrix pairs. *Linear Algebra Appl.* 1994; 210: 3-28
- Wegelin JA (2000): A Survey on Partial Least Squares (PLS) Methods, with Emphasis on the Two-Block Case. Retrieved from https://stat.uw.edu/sites/default/files/files/reports/2000/tr371.pdf
- Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach. *Neuroimage.* 2010; 53: 1147-1159
- Canonical correlation analysis: A general parametric significance-testing system. *Psychol Bull.* 1978; 85: 410-416
- Reduced-rank regression for the multivariate linear model. *J Multivar Anal.* 1975; 5: 248-264
- Regression Shrinkage and Selection via the Lasso. *J R Stat Soc Ser B.* 1996; 58: 267-288
- Ridge Regression: Applications to Nonorthogonal Problems. *Technometrics.* 1970; 12: 69-82
- Regularization and variable selection via the elastic net. *J R Stat Soc Ser B Stat Methodol.* 2005; 67: 301-320
- Canonical ridge and econometrics of joint production. *J Econom.* 1976; 4: 147-166
- Canonical correlation analysis: An overview with application to learning methods. *Neural Comput.* 2004; 16: 2639-2664
- Regularized Generalized Canonical Correlation Analysis. *Psychometrika.* 2011; 76: 257-284
- Tuzhilina E, Tozzi L, Hastie T (2020): Canonical Correlation Analysis in high dimensions with structured regularization. Retrieved from http://arxiv.org/abs/2011.01650
- A Sparse PLS for Variable Selection when Integrating Omics Data. *Stat Appl Genet Mol Biol.* 2008; 7
- Quantifying the Association between Gene Expressions and DNA-Markers by Penalized Canonical Correlation Analysis. *Stat Appl Genet Mol Biol.* 2008; 7
- Sparse canonical correlation analysis with application to genomic data integration. *Stat Appl Genet Mol Biol.* 2009; 8 (Article 1)
- A multiple hold-out framework for Sparse Partial Least Squares. *J Neurosci Methods.* 2016; 271: 182-194
- Sparse canonical correlation analysis. *Mach Learn.* 2017; 83: 331-353
- Sparse CCA via Precision Adjusted Iterative Thresholding. *Proc Int Conf Artif Intell Stat.* 2013; (Retrieved from)
- Sparse CCA: Adaptive estimation and computational barriers. *Ann Stat.* 2017; 45: 2074-2101
- Mai Q, Zhang X (2019): An iterative penalized least squares approach to sparse canonical correlation analysis. *Biometrics* 734–744.
- Optimal Whitening and Decorrelation. *Am Stat.* 2018; 72: 309-314
- To explain or to predict? *Stat Sci.* 2010; 25: 289-310
- Bzdok D, Engemann D, Thirion B (2020): Inference and Prediction Diverge in Biomedicine. Patterns (New York, NY) 1: 100119.
- Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls. *Neuroimage.* 2017; 145: 137-165
- Abdi H (2010): Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdiscip Rev Comput Stat 2: 97–106.
- Permutation inference for canonical correlation analysis. *Neuroimage.* 2020; 220: 117065
- Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. *BMC Bioinformatics.* 2011; 12: 253
- Multivariate morphological brain signatures predict patients with chronic abdominal pain from healthy control subjects. *Pain.* 2015; 156: 1545-1554
- A variant of sparse partial least squares for variable selection and data exploration. *Front Neuroinform.* 2014; 8
- Multi-level block permutation. *Neuroimage.* 2015; 123: 253-268
- Predictive modelling using neuroimaging data in the presence of confounds. *Neuroimage.* 2017; 150: 23-49
- Dinga R, Schmaal L, Penninx BWJH, Veltman DJ, Marquand AF (2020): Controlling for effects of confounding variables on machine learning predictions. https://doi.org/10.1101/2020.08.17.255034
- Canonical Correlation Analysis for Identifying Biotypes of Depression. *Biol Psychiatry Cogn Neurosci Neuroimaging.* 2020; 5: 478-480
- Canonical correlation analysis of high-dimensional data with very small sample support. *Signal Processing.* 2016; 128: 449-458
- Functional and Optogenetic Approaches to Discovering Stable Subtype-Specific Circuit Mechanisms in Depression. *Biol Psychiatry Cogn Neurosci Neuroimaging.* 2019; 4: 554-566
- Reply to: A Closer Look at Depression Biotypes: Correspondence Relating to Grosenick et al. (2019). *Biol Psychiatry Cogn Neurosci Neuroimaging.* 2020; 5: 556
- A Closer Look at Depression Biotypes: Correspondence Relating to Grosenick et al. (2019). *Biol Psychiatry Cogn Neurosci Neuroimaging.* 2020; 5: 554-555
- Implementation of a new parcellation of the orbitofrontal cortex in the automated anatomical labeling atlas. *Neuroimage.* 2015; 122: 1-5
- Mini-mental state. *J Psychiatr Res.* 1975; 12: 189-198

## Article Info

### Publication stage

In Press Journal Pre-Proof

### Footnotes

Code availability

The code used for the different CCA/PLS analyses is implemented in a CCA/PLS toolkit, which is available at http://www.mlnl.cs.ucl.ac.uk/resources/cca_pls_toolkit.html together with a demo showing how to use the toolkit to generate the SPLS results for the low-dimensional simulated dataset.

Disclosure

The authors report no biomedical financial interests or potential conflicts of interest.

### User License

Creative Commons Attribution (CC BY 4.0)