
Canonical Correlation Analysis and Partial Least Squares for identifying brain-behaviour associations: a tutorial and a comparative study

  • Agoston Mihalik (corresponding author: Department of Psychiatry, University of Cambridge, Cambridge CB2 0SZ, UK; +44 7552235333)
    Affiliations: Centre for Medical Image Computing, Department of Computer Science, University College London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, United Kingdom; Department of Psychiatry, University of Cambridge, United Kingdom
  • James Chapman
    Affiliations: Centre for Medical Image Computing, Department of Computer Science, University College London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, United Kingdom
  • Rick A. Adams
    Affiliations: Centre for Medical Image Computing, Department of Computer Science, University College London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, United Kingdom; Wellcome Centre for Human Neuroimaging, University College London, United Kingdom
  • Nils R. Winter
    Affiliations: Institute of Translational Psychiatry, University of Münster, Germany
  • Fabio S. Ferreira
    Affiliations: Centre for Medical Image Computing, Department of Computer Science, University College London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, United Kingdom
  • John Shawe-Taylor
    Affiliations: Department of Computer Science, University College London, United Kingdom
  • Janaina Mourão-Miranda
    Affiliations: Centre for Medical Image Computing, Department of Computer Science, University College London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, United Kingdom
  • for the Alzheimer’s Disease Neuroimaging Initiative
    Author Footnotes
    ∗ Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
Open Access. Published: August 08, 2022. DOI: https://doi.org/10.1016/j.bpsc.2022.07.012

      Abstract

      Canonical Correlation Analysis (CCA) and Partial Least Squares (PLS) are powerful multivariate methods for capturing associations across two modalities of data (e.g., brain and behaviour). However, when the sample size is similar to or smaller than the number of variables in the data, CCA and PLS models may overfit, i.e., find spurious associations that generalise poorly to new data. Dimensionality reduction and regularized extensions of CCA and PLS have been proposed to address this problem, yet most studies using these approaches have limitations.
      This work gives a theoretical and practical introduction to the most common CCA/PLS models and their regularized variants. We examine the limitations of standard CCA and PLS when the sample size is similar to or smaller than the number of variables. We discuss how dimensionality reduction and regularization techniques address this problem and explain their main advantages and disadvantages. We highlight crucial aspects of the CCA/PLS analysis framework, including optimising the hyperparameters of the model and testing the identified associations for statistical significance. We apply the described CCA/PLS models to simulated data and to real data from the Human Connectome Project and the Alzheimer’s Disease Neuroimaging Initiative (both n>500). We use both low- and high-dimensionality versions of each dataset (i.e., ratios between sample size and number of variables in the ranges of ∼1-10 and ∼0.1-0.01) to demonstrate the impact of data dimensionality on the models. Finally, we summarize the key lessons of the tutorial.


      Introduction

      Neuroimaging datasets with sample sizes of n>1000 (e.g., UK Biobank, Human Connectome Project, Alzheimer’s Disease Neuroimaging Initiative) represent a unique opportunity to advance population neuroscience and mental health (Smith and Nichols; Bzdok and Yeo; Bzdok, Nichols and Smith). These datasets comprise multiple data modalities (e.g., structural Magnetic Resonance Imaging (MRI), resting-state functional MRI, mental health, cognition, environmental factors and genetics), several of which can be high-dimensional, meaning there are hundreds or thousands of variables per subject. Understanding the links across these different modalities is fundamental for enabling new discoveries; however, analysing multimodal datasets with more variables than samples poses technical challenges.
      The most established methods to find associations across multiple modalities of multivariate data are Canonical Correlation Analysis (CCA) (Hotelling) and Partial Least Squares (PLS) (Wold, 1985). CCA and PLS have recently become very popular, with numerous applications linking brain imaging to behaviour or genetics (e.g., Kebets et al.; Drysdale et al.; Moser et al.; Li et al.; Bijsterbosch et al.; Xia et al.; Modabbernia et al.; Avants et al.; Ziegler et al.; Jia et al.; Le Floch et al.; Witten et al.; Marquand et al.; Lin et al.; Ing et al.; Wang et al.; Smith et al.; Popovic et al.; Alnaes et al., 2020; Mihalik et al., ‘Brain-behaviour modes’; Helmer et al., 2020). However, when the variables in at least one modality (e.g., brain) outnumber the samples, standard CCA and PLS models may overfit, i.e., they are more likely to find spurious associations that generalize poorly to independent samples (see e.g., Helmer et al., 2020; Mihalik et al., ‘Multiple holdouts’; Dinga et al.). Moreover, standard CCA has no unique solution when the number of variables exceeds the sample size. Two approaches have been proposed to address this problem: i) reducing the dimensionality of the data with Principal Component Analysis (PCA) (Li et al.; Bijsterbosch et al.; Modabbernia et al.; Smith et al.; Alnaes et al., 2020; Helmer et al., 2020); ii) using regularized extensions of CCA and PLS (Xia et al.; Ing et al.; Popovic et al.; Mihalik et al., ‘Multiple holdouts’). Most studies using these approaches have potential limitations, however. For instance: i) they usually do not optimise the hyperparameters (e.g., the number of principal components or the amount of regularization) (Li et al.; Bijsterbosch et al.; Modabbernia et al.; Jia et al.; Smith et al.; Alnaes et al., 2020; Helmer et al., 2020); ii) many studies do not test the significance of the associations using hold-out data (e.g., out-of-sample correlation) (Drysdale et al.; Li et al.; Bijsterbosch et al.; Xia et al.; Smith et al.); iii) they often do not assess the stability of the CCA/PLS model (Drysdale et al.; Li et al.; Marquand et al.; Wang et al.; Smith et al.; Popovic et al.; Alnaes et al., 2020; Mihalik et al., ‘Brain-behaviour modes’). Finally, few studies compare CCA/PLS models and analytic frameworks across datasets with different dimensionalities (see e.g., Mihalik et al., ‘Brain-behaviour modes’; Helmer et al., 2020; Mihalik et al., ‘Multiple holdouts’).
      Several tutorial papers on CCA and PLS have recently been published (Uurtio et al.; Zhuang et al.; Wang et al., 2020; Krishnan et al.). Here, we complement these tutorials by discussing some important conceptual and practical aspects of these methods, comprising: i) the advantages and disadvantages of the various CCA/PLS models, ii) the impact of PCA and regularization on these models (e.g., on overfitting and stability), and iii) the importance of the analytic framework in optimising the models’ hyperparameters and performing statistical inference.
      In Part 1, we present the theoretical background of these models and discuss the most common strategies to mitigate the problems caused when the ratio between sample size and number of variables is small (e.g., around ∼0.1-0.01). We also examine the most prevalent analytical frameworks used with CCA/PLS models. In Part 2, we apply the models introduced in Part 1 to simulated data and to real data from the Human Connectome Project (HCP) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (n>500 in all). We illustrate how the different CCA/PLS models perform with data dimensionalities often used in practice (i.e., ratios between sample size and number of variables in the ranges of ∼1-10 or ∼0.1-0.01). Moreover, we show that regularization can be helpful even when the number of variables in both data modalities is smaller than the sample size. Mathematical details of the CCA/PLS models and their connections are provided in the Supplement.

      Part 1: Technical background of CCA and PLS

      CCA/PLS optimization and nomenclature

      Canonical Correlation Analysis (CCA) (Hotelling) and Partial Least Squares (PLS) (Wold, 1985) are multivariate latent variable models that capture associations across two modalities of data (e.g., brain and behaviour). For example (Figure 1), X contains voxel-level brain variables and Y contains behavioural variables from item-level self-report questionnaires (i.e., X and Y are matrices whose rows and columns represent subjects and variables, respectively). Standard CCA/PLS finds pairs of brain and behavioural weights wx and wy (column vectors) such that the linear combinations (weighted sums) of the brain and behavioural variables maximise the correlation (CCA) or covariance (PLS) between the resulting latent variables, i.e., between ξ = Xwx and ω = Ywy.
      Figure 1Overview of Canonical Correlation Analysis/Partial Least Squares (CCA/PLS) models for investigating brain-behaviour associations. CCA/PLS models maximize the correlation (CCA) or covariance (PLS) between latent variables extracted as weighted linear combinations of the brain and behavioural variables (see formulae in text). Note that the weights are column vectors but are represented as rows to highlight that they have the same dimensionality as their respective data modality.
      In the PLS literature, the weights are often referred to as saliences and the latent variables as scores. In the CCA literature, the weights are often referred to as canonical vectors, the latent variables as canonical variates, and the correlations between the latent variables as canonical correlations. The brain and behavioural weights have the same dimensionality as their respective data modality (e.g., the number of brain/behavioural variables) and quantify each variable’s contribution to the identified association. Sometimes the Pearson correlations between the brain and behavioural variables and their respective latent variable are presented instead of the model’s weights; these are called structure correlations (CCA) (Meredith) or loadings (PLS) (Rosipal and Krämer, 2006) (for details, see the Supplement). The latent variables (one latent variable score per data modality and subject) quantify how the associative effect is expressed across the sample. Table 1 summarizes the different nomenclatures used in the CCA and PLS literature.
      Table 1. Different nomenclatures in the CCA and PLS literature and summary of the corresponding terms.

      Model | Relationship     | Model weights                | Latent variable            | Correlation between original variables and latent variable
      CCA   | mode/association | canonical vector/coefficient | canonical variable/variate | structure correlation
      PLS   | association      | salience                     | score                      | loading
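The structure correlations/loadings in Table 1 are simply Pearson correlations between each original variable and a latent variable, so they can be computed directly; a minimal sketch on synthetic arrays (the data and the weight vector are arbitrary, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))   # one data modality (subjects x variables)
wx = rng.standard_normal((5, 1))    # an illustrative weight vector
xi = X @ wx                         # latent variable: one score per subject

# structure correlations (CCA) / loadings (PLS):
# correlation of each column of X with the latent variable xi
Xc = X - X.mean(axis=0)
xic = (xi - xi.mean()).ravel()
loadings = (Xc.T @ xic) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(xic))
```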
      While standard CCA refers to a single method, standard PLS refers to a family of methods with different modelling aims (e.g., assuming a symmetric or an asymmetric relationship between the two data modalities; for details, see the Supplement). Standard CCA and PLS can both be solved by iterative methods (e.g., alternating least squares (Golub and Zha), non-linear iterative partial least squares (Wegelin, 2000)) and non-iterative methods (e.g., solving an eigenvalue problem (Uurtio et al.; Rosipal and Krämer, 2006)). In the case of iterative methods, once a pair of weights is obtained, the corresponding associative effect is removed from the data (by a process called ‘deflation’) and new associations are sought.
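To make the deflation idea concrete, here is a hedged sketch of an iterative PLS that extracts successive weight pairs by maximising covariance (via an SVD of the cross-covariance matrix at each step) and deflates the captured effect before seeking the next pair. The function and variable names are ours, not the paper's, and this is one of several possible deflation schemes.

```python
import numpy as np

def iterative_pls(X, Y, n_components=2):
    """Extract PLS weight pairs with deflation (illustrative sketch)."""
    Xd, Yd = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Wx, Wy = [], []
    for _ in range(n_components):
        # leading singular vectors of X^T Y maximise cov(X wx, Y wy)
        U, s, Vt = np.linalg.svd(Xd.T @ Yd)
        wx, wy = U[:, 0], Vt[0]
        xi, omega = Xd @ wx, Yd @ wy
        # deflation: project the extracted latent variable out of each modality
        Xd = Xd - np.outer(xi, xi @ Xd) / (xi @ xi)
        Yd = Yd - np.outer(omega, omega @ Yd) / (omega @ omega)
        Wx.append(wx)
        Wy.append(wy)
    return np.column_stack(Wx), np.column_stack(Wy)
```

Calling `iterative_pls(X, Y)` on two centred data matrices returns one weight matrix per modality, with one column per extracted associative effect.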
      Since standard CCA maximises the correlation between the latent variables, it is more sensitive to the direction of the relationships across modalities and is not driven by within-modality variances. In contrast, standard PLS – which maximises covariance – is less sensitive to the direction of the across-modality relationships, as it is also driven by within-modality variances. Formally, we can see this from the optimization of these models. Standard CCA optimizes the correlation across modalities:

      max_{wx,wy} corr(Xwx, Ywy).
      (Eq. 1)

      Standard PLS optimizes the covariance across modalities, i.e., the product of the correlation and the standard deviations (square roots of the variances):

      max_{wx,wy} cov(Xwx, Ywy) = corr(Xwx, Ywy) · √var(Xwx) · √var(Ywy).
      (Eq. 2)

      This also means that standard CCA and PLS are equivalent optimization problems when var(Xwx) = var(Ywy) = 1, which holds when the within-modality covariance matrices are identity matrices, i.e., X^T X = Y^T Y = I.
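The decomposition of covariance into correlation times standard deviations in Eq. 2 is easy to verify numerically; a small sketch on two synthetic vectors (using matching sample statistics, ddof=1 throughout):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
y = 0.3 * x + rng.standard_normal(1000)

cov = np.cov(x, y)[0, 1]
corr = np.corrcoef(x, y)[0, 1]
# Eq. 2 for a single pair of latent variables:
# cov(x, y) = corr(x, y) * sd(x) * sd(y)
product = corr * x.std(ddof=1) * y.std(ddof=1)
```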

      Limitations of standard CCA/PLS

      When the ratio between the sample size and the number of variables is close to or smaller than 1, standard CCA/PLS models present limitations. These limitations can exist irrespective of sample size if the number of variables is large or the variables are highly correlated. In the case of standard CCA, the key limitations are: i) the optimization is ill-posed (i.e., there is no unique solution) when the number of variables in at least one of the modalities exceeds the sample size; ii) the CCA weights wx and wy are unstable when the variables within one or both modalities are highly correlated, known as the multicollinearity problem (Vounou et al.). These limitations might sound familiar, and not surprisingly so, as standard CCA can be viewed as a multivariate extension of the univariate General Linear Model (Knapp; Izenman). The standard PLS optimization is never ill-posed and copes with multicollinearity (i.e., the PLS weights are stable (Wegelin, 2000)); however, standard PLS and CCA cannot perform feature selection (i.e., setting the weights of some variables to 0) and may therefore perform poorly when the underlying effects are sparse.
      These limitations can be addressed by dimensionality reduction (i.e., PCA) or regularization. Regularization adds further constraints to the optimization to solve an ill-posed problem or prevent overfitting. For CCA/PLS models, the most common forms of regularization are L1-norm (lasso) (Tibshirani), L2-norm (ridge) (Hoerl and Kennard) and combinations of L1-norm and L2-norm regularization (elastic-net) (Zou and Hastie).

      Standard CCA with PCA dimensionality reduction (PCA-CCA)

      Principal Component Analysis (PCA) transforms one modality of multivariate data into uncorrelated principal components (PCs) (it is also related to whitening; see ‘Effects of pre-whitening on CCA/PLS models’). PCA is often used as a naïve dimensionality reduction technique: PCs explaining little variance are assumed to be ‘noise’ and discarded, and the remaining PCs are entered into standard CCA. However, applying PCA before CCA can also be seen as a technique similar to regularization: it makes the CCA model well-posed and addresses the multicollinearity problem.
      The number of retained PCs can be selected based on their explained variance, e.g., 99% of total variance. In PCA-CCA applications, the same number of PCs is often chosen for both data modalities, based on the lower-dimensional modality – usually behaviour (e.g., (
      • Li J.
      • Bolt T.
      • Bzdok D.
      • Nomi J.S.
      • Yeo B.T.T.
      • Spreng R.N.
      • Uddin L.Q.
      Topography and behavioral relevance of the global signal in the human brain.
      ,
      • Bijsterbosch J.D.
      • Woolrich M.W.
      • Glasser M.F.
      • Robinson E.C.
      • Beckmann C.F.
      • Van Essen D.C.
      • et al.
      The relationship between spatial configuration and functional connectivity of brain regions.
      ,
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,

      Alnaes D, Kaufmann T, Marquand AF, Smith SM, Westlye LT (2020): Patterns of sociocognitive stratification and perinatal risk in the child brain. Proc Natl Acad Sci 202001517.

      )). Sometimes the same proportion of explained variance – rather than numbers of PCs – is used for both data modalities (e.g., (
      • Modabbernia A.
      • Janiri D.
      • Doucet G.E.
      • Reichenberg A.
      • Frangou S.
      Multivariate Patterns of Brain-Behavior-Environment Associations in the Adolescent Brain and Cognitive Development Study.
      ,

      Helmer M, Warrington S, Mohammadi-Nejad A-R, Lisa J, Howell A, Rosand B, et al. (2020): On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations. https://doi.org/10.1101/2020.08.25.265546

      )). One problem with discarding low-variance PCs is that there is no guarantee that the high-variance PCs of either modality are the ones that best link the two modalities, whilst some discarded PCs might contain useful information. To address this problem, we can use a data-driven approach, selecting the number of PCs that maximises the correlation across modalities (see ‘CCA with dimensionality reduction vs. regularized CCA’ in Part 2).

      Regularized CCA (RCCA)

      L2-norm regularization is a popular form of regularization for ill-posed problems or for mitigating the effects of multicollinearity, originally used in ridge regression (
      • Hoerl A.E.
      • Kennard R.W.
      Ridge Regression: Applications to Nonorthogonal Problems.
      ). In L2-norm regularization, the added constraint corresponds to the sum of squares of all weight values¹, which forces the weights to be small but does not make them zero. L2-norm regularization has been proposed for CCA (
      • Vinod H.D.
      Canonical ridge and econometrics of joint production.
      ), commonly referred to as regularized CCA (RCCA) (

      Rosipal R, Krämer N (2006): Overview and Recent Advances in Partial Least Squares. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J, editors. Subspace, Latent Structure and Feature Selection. Berlin, Heidelberg: Springer Berlin Heidelberg, pp 34–51.

      ,
      • Hardoon D.R.
      • Szedmak S.
      • Shawe-Taylor J.
      Canonical correlation analysis: An overview with application to learning methods.
      ,
      • Tenenhaus A.
      • Tenenhaus M.
      Regularized Generalized Canonical Correlation Analysis.
      ,

      Tuzhilina E, Tozzi L, Hastie T (2020): Canonical Correlation Analysis in high dimensions with structured regularization. Retrieved from http://arxiv.org/abs/2011.01650

      ). Interestingly, in RCCA, the regularization terms added to the CCA problem lead to a mixture of the standard CCA and standard PLS optimizations. We can see this from the RCCA optimization problem:
      $$\max_{\mathbf{w}_x,\mathbf{w}_y}\;\frac{\operatorname{corr}(\mathbf{X}\mathbf{w}_x,\mathbf{Y}\mathbf{w}_y)\,\sqrt{\operatorname{var}(\mathbf{X}\mathbf{w}_x)\operatorname{var}(\mathbf{Y}\mathbf{w}_y)}}{\sqrt{(1-c_x)\operatorname{var}(\mathbf{X}\mathbf{w}_x)+c_x}\;\sqrt{(1-c_y)\operatorname{var}(\mathbf{Y}\mathbf{w}_y)+c_y}}$$
      (Eq. 3)


      where the two hyperparameters (cx, cy) control the amount of regularization and provide a smooth transition between standard CCA (cx=cy=0, not regularized) and standard PLS (cx=cy=1, most regularized) (

      Rosipal R, Krämer N (2006): Overview and Recent Advances in Partial Least Squares. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J, editors. Subspace, Latent Structure and Feature Selection. Berlin, Heidelberg: Springer Berlin Heidelberg, pp 34–51.

      ,
      • Hardoon D.R.
      • Szedmak S.
      • Shawe-Taylor J.
      Canonical correlation analysis: An overview with application to learning methods.
      ). Importantly, as L2-norm regularization mitigates multicollinearity, it increases the stability of the RCCA weights. However, it also means that, similar to standard PLS, RCCA can be driven by within-modality variances. For additional connections between standard CCA, RCCA and standard PLS, and how they are related to PCA-CCA, see the Supplement.
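Eq. 3 can be solved as a symmetric generalized eigenproblem after shrinking each within-modality covariance towards the identity. The sketch below is a naïve dense-matrix illustration (not the iterative solver used in this paper) and assumes low-dimensional, full-rank data so that the cx=cy=0 case is well-posed.

```python
import numpy as np
from scipy.linalg import eigh

def rcca(X, Y, cx, cy):
    """First associative effect of RCCA (Eq. 3) via a generalized
    eigenproblem; cx=cy=0 gives standard CCA, cx=cy=1 standard PLS."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n, px = X.shape
    py = Y.shape[1]
    Cxy = X.T @ Y / n
    # shrink each within-modality covariance towards the identity
    Rx = (1 - cx) * (X.T @ X / n) + cx * np.eye(px)
    Ry = (1 - cy) * (Y.T @ Y / n) + cy * np.eye(py)
    A = np.block([[np.zeros((px, px)), Cxy], [Cxy.T, np.zeros((py, py))]])
    B = np.block([[Rx, np.zeros((px, py))], [np.zeros((py, px)), Ry]])
    vals, vecs = eigh(A, B)          # eigenvalues in ascending order
    w = vecs[:, -1]                  # leading eigenvector
    return w[:px], w[px:]

# toy data with one shared latent variable
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal((n, 1))
X = z @ rng.standard_normal((1, 8)) + rng.standard_normal((n, 8))
Y = z @ rng.standard_normal((1, 5)) + rng.standard_normal((n, 5))

def latent_corr(wx, wy):
    return abs(np.corrcoef((X - X.mean(0)) @ wx, (Y - Y.mean(0)) @ wy)[0, 1])

r_cca = latent_corr(*rcca(X, Y, 0, 0))  # maximal in-sample correlation
r_pls = latent_corr(*rcca(X, Y, 1, 1))  # the PLS end of the transition
```

Because CCA maximizes the in-sample correlation, `r_cca` is an upper bound on `r_pls`; at cx=cy=1 the problem reduces to the leading singular vectors of the cross-covariance, i.e., standard PLS.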
      Sparse PLS (SPLS)
      L1-norm regularization was originally proposed in Lasso regression (
      • Tibshirani R.
      Regression Shrinkage and Selection via the Lasso.
      ). In L1-norm regularization, the added constraint corresponds to the absolute sum of weight values², which sets some of the weight values to zero, resulting in variable selection and promoting sparsity. Sparse solutions facilitate the interpretability of the model and may improve performance when only a subset of variables is relevant (
      • Tibshirani R.
      Regression Shrinkage and Selection via the Lasso.
      ). However, sparsity can also introduce instability to the model if different sets of variables provide similar performance. Elastic net regularization is a mixture of L1-norm and L2-norm regularization which combines the properties of both forms of regularization and can mitigate the instability of L1-norm regularization (
      • Zou H.
      • Hastie T.
      Regularization and variable selection via the elastic net.
      ). In one popular algorithm (
      • Witten D.M.
      • Tibshirani R.
      • Hastie T.
      A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.
      ), which we will refer to as sparse PLS (SPLS), hyperparameters control the amount of L1-norm regularization or sparsity. Since PLS can be seen as CCA with maximal L2-norm regularization (see the previous section), SPLS can also be viewed as an elastic-net regularized CCA (for details, see the Supplement).
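The flavour of such an algorithm can be illustrated with alternating power iterations on the cross-covariance matrix, soft-thresholding the weights at every step. This is a simplified sketch in the spirit of the penalized matrix decomposition, not the exact algorithm of Witten et al.; thresholding at a fixed fraction of the largest weight (`frac_x`, `frac_y`) is a simplification of their L1-norm constraint.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding (the proximal operator of the L1-norm)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def spls(X, Y, frac_x=0.5, frac_y=0.5, n_iter=50):
    """Sparse-PLS sketch: power iterations on the cross-covariance with
    soft-thresholding; frac_* in [0, 1) controls the amount of sparsity."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    C = Xc.T @ Yc
    wy = np.ones(C.shape[1]) / np.sqrt(C.shape[1])
    for _ in range(n_iter):
        a = C @ wy
        wx = soft_threshold(a, frac_x * np.abs(a).max())
        wx /= np.linalg.norm(wx)
        b = C.T @ wx
        wy = soft_threshold(b, frac_y * np.abs(b).max())
        wy /= np.linalg.norm(wy)
    return wx, wy

# toy data with a sparse signal: only the first few variables are relevant
rng = np.random.default_rng(0)
n = 300
z = rng.standard_normal((n, 1))
ax = np.zeros((1, 40)); ax[0, :4] = 1.0
ay = np.zeros((1, 20)); ay[0, :2] = 1.0
X = z @ ax + rng.standard_normal((n, 40))
Y = z @ ay + rng.standard_normal((n, 20))
wx, wy = spls(X, Y)
```

With thresholding switched off (`frac_x=frac_y=0`) the iteration reduces to the power method for the leading singular vectors of the cross-covariance, i.e., standard PLS.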
      Effects of pre-whitening on CCA/PLS models
      In machine learning, data are often whitened as a pre-processing step. Whitening transforms the original variables into new, uncorrelated features, which are normalized to have unit length (i.e., L2-norm of each feature equals 1). Whitening is not a unique transformation and the most commonly used forms are PCA-, Mahalanobis- and Cholesky-whitening (
      • Kessy A.
      • Lewin A.
      • Strimmer K.
      Optimal Whitening and Decorrelation.
      ). The critical difference between PCA and PCA-whitening is that PCA retains the variance of the original data, i.e., the principal components are not normalized to have unit length.
      Whitening as a pre-processing step has a major drawback for CCA/PLS models: the beneficial effects of L1-norm and L2-norm regularization on the original variables can no longer be achieved, as the whitened data become the new inputs of the model. In the case of SPLS, L1-norm regularization will result in sparsity on the whitened variables (instead of the original variables), so the interpretability of the results will not be facilitated. In the case of RCCA, L2-norm regularization has no effect on whitened data, which means that CCA, RCCA and PLS will yield the same results. For additional details on whitening, see the Supplement.
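This can be verified numerically in a couple of lines. The snippet below PCA-whitens a toy data matrix via the SVD, following the unit-length convention above, and shows why the L2-norm shrinkage term used by RCCA becomes inactive (a minimal demonstration, not tied to any particular dataset).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))  # correlated variables
Xc = X - X.mean(0)

# PCA-whitening via the SVD: the columns of U are uncorrelated
# features, each with unit L2-norm
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = U

G = Z.T @ Z                            # Gram matrix of the whitened data
c = 0.7                                # any amount of L2-norm regularization
shrunk = (1 - c) * G + c * np.eye(5)   # shrinking towards the identity changes nothing
```

Because G is already the identity, (1 - c)G + cI = I for every c: on whitened inputs CCA, RCCA and PLS optimize the same objective.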

      Analytic frameworks for CCA/PLS models

      The statistical significance of the CCA/PLS model (i.e., the number of significant associative effects) can be evaluated using either a descriptive or a predictive (also referred to as a machine learning) framework. The two frameworks have distinct goals: the aim of the descriptive framework is to detect above-chance associations in the current dataset, whereas the aim of the predictive framework is to test whether such associations generalise to new data (
      • Shmueli G.
      To explain or to predict?.
      ,

      Bzdok D, Engemann D, Thirion B (2020): Inference and Prediction Diverge in Biomedicine. Patterns (New York, NY) 1: 100119.

      ,
      • Arbabshirani M.R.
      • Plis S.
      • Sui J.
      • Calhoun V.D.
      Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls.
      ,

      Abdi H (2010): Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdiscip Rev Comput Stat 2: 97–106.

      ).
      In the descriptive framework (Figure 2A), CCA/PLS is fitted on the entire sample, thus the statistical inference is based on in-sample correlation. In this framework, there is usually no hyperparameter optimization (i.e., the number of PCs or regularization parameter is fixed a priori). In the predictive framework (Figure 2B), CCA/PLS is fitted on a training/optimization set and evaluated on a test/holdout set, thus the statistical inference is based on out-of-sample correlation. This procedure assesses the generalizability of the model, i.e., how well the association found in the training set generalizes to an independent test set. In the predictive framework, the hyperparameters are usually optimized, therefore the training/optimization set is further divided into a training and a validation set and the best hyperparameters are selected based on out-of-sample correlation in the validation set. In both descriptive and predictive frameworks, permutation inference (based on in-sample or out-of-sample correlation) is often used to assess the number of significant associative effects (

      Abdi H (2010): Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdiscip Rev Comput Stat 2: 97–106.

      ,
      • Winkler A.M.
      • Renaud O.
      • Smith S.M.
      • Nichols T.E.
      Permutation inference for canonical correlation analysis.
      ).
      Figure 2. Descriptive and predictive (or machine learning) frameworks. (A) The descriptive framework fits CCA/PLS with fixed hyperparameters (i.e., the number of principal components or regularization parameter) on the entire sample, thus the statistical inference is based on in-sample correlation. (B) The predictive (or machine learning) framework fits CCA/PLS on a training set and evaluates the model on a test set, thus the statistical inference is based on out-of-sample correlation. The hyperparameters are usually optimized: the training set is further divided into a training and a validation set, and the best hyperparameters are selected based on out-of-sample correlation in the validation set. We note that although not all models maximize correlation (as described in the previous section), typically all CCA/PLS models are evaluated based on the correlation between the latent variables.
      Lastly, an important component of any CCA/PLS framework is testing the stability of the model. Usually a bootstrapping procedure is applied to provide confidence intervals on the model’s weights (

      Abdi H (2010): Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdiscip Rev Comput Stat 2: 97–106.

      ). Recently, stability selection (
      • Lin D.
      • Calhoun V.D.
      • Wang Y.P.
      Correspondence between fMRI and SNP data by group sparse canonical correlation analysis.
      ,
      • Ing A.
      • Sämann P.G.
      • Chu C.
      • Tay N.
      • Biondo F.
      • Robert G.
      • et al.
      Identification of neurobehavioural symptom groups based on shared brain mechanisms.
      ,
      • Lê Cao K.-A.
      • Boitard S.
      • Besse P.
      Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.
      ,
      • Labus J.S.
      • Van Horn J.D.
      • Gupta A.
      • Alaverdyan M.
      • Torgerson C.
      • Ashe-McNalley C.
      • et al.
      Multivariate morphological brain signatures predict patients with chronic abdominal pain from healthy control subjects.
      ,
      • Olson Hunt M.J.
      • Weissfeld L.
      • Boudreau R.M.
      • Aizenstein H.
      • Newman A.B.
      • Simonsick E.M.
      • et al.
      A variant of sparse partial least squares for variable selection and data exploration.
      ) has been proposed with the aim of selecting the most stable CCA/PLS model in the first place, rather than evaluating the stability of the model post-hoc. Alternatively, the stability of the CCA/PLS models can be measured as the average similarity of weights across different splits of training data, which avoids the additional computational costs of the previous two approaches (
      • Mihalik A.
      • Ferreira F.S.
      • Moutoussis M.
      • Ziegler G.
      • Adams R.A.
      • Rosa M.J.
      • et al.
      Multiple Holdouts With Stability: Improving the Generalizability of Machine Learning Analyses of Brain–Behavior Relationships.
      ). For more details on analytic frameworks, see e.g., (
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Mihalik A.
      • Ferreira F.S.
      • Moutoussis M.
      • Ziegler G.
      • Adams R.A.
      • Rosa M.J.
      • et al.
      Multiple Holdouts With Stability: Improving the Generalizability of Machine Learning Analyses of Brain–Behavior Relationships.
      ,
      • Monteiro J.M.
      • Rao A.
      • Shawe-Taylor J.
      • Mourão-Miranda J.
      A multiple hold-out framework for Sparse Partial Least Squares.
      ,

      Abdi H (2010): Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdiscip Rev Comput Stat 2: 97–106.

      ).

      Part 2: Demonstrations of CCA and PLS analyses

      Description of experiments

      In order to demonstrate the properties of different CCA and PLS approaches, we applied the models introduced in Part 1 to real and simulated datasets with different dimensionalities and sample sizes. Table 2 gives an overview of all experiments.
      Table 2. Summary of CCA/PLS models on high- and low-dimensional real and simulated data.

      Model        | Analytical framework | Hyperparameter optimization | Model hyperparameter
      High-dimensional data
      PCA-CCA      | Descriptive          | None (fixed)                | Number of PCs
      PCA-CCA      | Predictive           | None (fixed)                | Number of PCs
      PCA-CCA      | Predictive           | Data-driven                 | Number of PCs
      RCCA         | Predictive           | Data-driven                 | Amount of L2-norm regularization
      Standard PLS | Predictive           | None                        | None
      SPLS         | Predictive           | Data-driven                 | Amount of L1-norm regularization
      Low-dimensional data
      Standard CCA | Predictive           | None                        | None
      RCCA         | Predictive           | Data-driven                 | Amount of L2-norm regularization
      Standard PLS | Predictive           | None                        | None
      SPLS         | Predictive           | Data-driven                 | Amount of L1-norm regularization
      PC, principal component
      We chose the Human Connectome Project (HCP) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets based on two recent landmark studies (
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Monteiro J.M.
      • Rao A.
      • Shawe-Taylor J.
      • Mourão-Miranda J.
      A multiple hold-out framework for Sparse Partial Least Squares.
      ). In the HCP dataset, we used resting-state fMRI connectivity data (19 900 and 300 brain variables in the high- and low-dimensional data, respectively) and 145 non-imaging subject measures (e.g., behavioural, demographic and lifestyle measures) of 1003 healthy subjects. In the ADNI dataset, we used whole-brain grey matter volumes (168 130 and 120 brain variables in the high- and low-dimensional data, respectively) and 31 item-level measures of the Mini-Mental State Examination (MMSE) of 592 elderly subjects. We generated the simulated data with a sparse signal (i.e., 10% of the variables in each modality were relevant to capture the association across modalities) and properties similar to the HCP dataset (in terms of sample size, dimensionality and correlation between latent variables). Table 3 displays the characteristics of the real and simulated datasets. For further details of the datasets and the simulated data generation, see the Supplement.
      Table 3. Characteristics of real and simulated data.

      Dataset (dimensionality)      | Subjects                   | Brain variables                              | Behavioural variables
      HCP (low-dimensional)         | Healthy (N=1001)           | Connectivity of 25 ICA components (D=300)    | Behaviour, psychometrics, demographics (D=145)
      HCP (high-dimensional)        | Healthy (N=1001)           | Connectivity of 200 ICA components (D=19900) | Behaviour, psychometrics, demographics (D=145)
      ADNI (low-dimensional)        | Healthy + clinical (N=592) | ROI-wise grey matter volume (D=120)          | Items of MMSE questionnaire (D=31)
      ADNI (high-dimensional)       | Healthy + clinical (N=592) | Voxel-wise grey matter volume (D=168130)     | Items of MMSE questionnaire (D=31)
      Simulation (low-dimensional)  | Not applicable (N=1000)    | Not applicable (D=100)                       | Not applicable (D=100)
      Simulation (high-dimensional) | Not applicable (N=1000)    | Not applicable (D=20000)                     | Not applicable (D=100)
      The PCA-CCA model was used both with fixed numbers of PCs within a descriptive framework and with optimized number of PCs within a predictive framework. All the other CCA/PLS models were used within a predictive framework. The predictive framework was based on (
      • Monteiro J.M.
      • Rao A.
      • Shawe-Taylor J.
      • Mourão-Miranda J.
      A multiple hold-out framework for Sparse Partial Least Squares.
      ), which uses multiple test/holdout sets to assess the generalizability and robustness of the CCA/PLS models (detailed in the Supplement). In both frameworks, permutation testing was used to assess the number of statistically significant associative effects based on in-sample and out-of-sample correlations between the latent variables, respectively. Importantly, the family structure of the HCP dataset was respected during the different data splits (training, validation, test/holdout sets) and permutations (
      • Winkler A.M.
      • Webster M.A.
      • Vidaurre D.
      • Nichols T.E.
      • Smith S.M.
      Multi-level block permutation.
      ). We used iterative methods to solve CCA/PLS and applied mode-A deflation for standard PLS and SPLS and generalized deflation for standard CCA, PCA-CCA and RCCA (for details, see the Supplement). For simplicity, we present the results for the first associative effect in most CCA/PLS experiments (for a summary of all associative effects, see Table S1). Throughout the paper, we present the weights (canonical vector for CCA models, salience for PLS models) and latent variables obtained by the model.
      We used linear mixed-effects (LME) models to compare the different CCA/PLS models on the following measures across the outer training or test sets: i) in-sample correlation; ii) out-of-sample correlation; iii) similarity of the model weights (measured by Pearson correlation); iv) variance explained by the model. In addition, we compared the number of PCs between PCA-CCA models with fixed vs. data-driven number of PCs. We report significance at p<0.005 in all LME models. For further details of the LME analyses, see the Supplement. We also quantified the rank-similarity of the weights (measured by Spearman correlation) across the different CCA/PLS models in the real datasets.
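The rank-similarity measure used to compare weights across models can be illustrated as follows; the two weight vectors are synthetic stand-ins, and absolute values are taken here because the sign of a weight vector is arbitrary.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# hypothetical weight vectors from two models over the same 50 variables
w1 = rng.standard_normal(50)
w2 = w1 + 0.3 * rng.standard_normal(50)  # a second, similar model

# Spearman correlation of the absolute weights = rank-similarity
rho, p = spearmanr(np.abs(w1), np.abs(w2))
```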

      In-sample vs. out-of-sample correlation in high-dimensional data

      Figure 3 and Table 4 display the in-sample and out-of-sample correlations for all experiments using all three high-dimensional datasets. On average, the out-of-sample correlations are lower than the in-sample correlations (t=4.51, p=0.0005). In real datasets, CCA/PLS models with dimensionality reduction or regularization provide high out-of-sample correlations in most cases, underlining that these models generalize well to unseen data. The only notable exceptions are standard PLS and SPLS, which present significantly lower out-of-sample correlations in the HCP dataset (Figure 3B) (F(2,56)=289.30, p<0.0001). This can be attributed to the different properties of the HCP dataset (e.g., higher noise level and non-sparse associative effect) and the fact that standard PLS and SPLS are especially dominated by within-modality variance in this dataset (Table 4).
      Figure 3. Dot plot of in-sample and out-of-sample correlations for the first associative effects of all experiments in all three high-dimensional datasets. Each dot represents a model trained on the overall data (descriptive framework) or on 10 random subsets of the data (predictive framework). The horizontal jitter is for visualization purposes. (A) High-dimensional ADNI dataset. (B) High-dimensional HCP dataset. Note that we display the second associative effect for SPLS as it is the most similar to the first associative effects identified by the other models. (C) High-dimensional simulated dataset. fixed PCs, fixed number of principal components; data-driven, data-driven number of principal components; desc, descriptive framework; pred, predictive framework.
      Table 4. Main characteristics (mean ± SEM for all values) of the first associative effects in the high-dimensional datasets obtained with the different CCA/PLS models using the predictive framework. Note that we display the second associative effect for standard PLS (PLS-2) and SPLS (SPLS-2) in the HCP dataset as it is the most similar to the first associative effects identified by the other models.

      Model | Brain: stability of weights¹ | Brain: explained variance² | Behaviour: stability of weights¹ | Behaviour: explained variance² | In-sample correlation³ | Out-of-sample correlation⁴
      ADNI dataset
      PCA-CCA (fixed PCs)       | 0.86 (± 0.00) | 8.47 (± 0.16)  | 0.85 (± 0.01) | 14.91 (± 0.23) | 0.70 (± 0.00) | 0.55 (± 0.01)
      PCA-CCA (data-driven PCs) | 0.70 (± 0.01) | 5.26 (± 0.25)  | 0.93 (± 0.00) | 15.73 (± 0.13) | 0.83 (± 0.01) | 0.65 (± 0.01)
      RCCA (L2-reg. opt.)       | 0.82 (± 0.00) | 5.47 (± 0.06)  | 0.94 (± 0.00) | 16.63 (± 0.26) | 0.98 (± 0.00) | 0.66 (± 0.01)
      Standard PLS              | 0.96 (± 0.00) | 21.54 (± 0.16) | 0.94 (± 0.00) | 18.64 (± 0.21) | 0.44 (± 0.00) | 0.43 (± 0.01)
      SPLS (L1-reg. opt.)       | 0.83 (± 0.02) | 14.05 (± 0.13) | 0.96 (± 0.01) | 15.86 (± 0.42) | 0.60 (± 0.00) | 0.61 (± 0.01)
      HCP dataset
      PCA-CCA (fixed PCs)       | 0.72 (± 0.01) | 0.42 (± 0.01)  | 0.78 (± 0.01) | 2.67 (± 0.10)  | 0.76 (± 0.00) | 0.47 (± 0.02)
      PCA-CCA (data-driven PCs) | 0.56 (± 0.02) | 0.35 (± 0.03)  | 0.53 (± 0.04) | 3.73 (± 0.39)  | 0.76 (± 0.01) | 0.45 (± 0.03)
      RCCA (L2-reg. opt.)       | 0.78 (± 0.01) | 0.29 (± 0.01)  | 0.88 (± 0.01) | 4.39 (± 0.18)  | 1.00 (± 0.00) | 0.52 (± 0.02)
      Standard PLS-2            | 0.52 (± 0.04) | 0.50 (± 0.05)  | 0.62 (± 0.05) | 8.07 (± 0.30)  | 0.79 (± 0.02) | 0.21 (± 0.02)
      SPLS-2 (L1-reg. opt.)     | 0.25 (± 0.04) | 0.48 (± 0.07)  | 0.51 (± 0.05) | 7.23 (± 0.37)  | 0.64 (± 0.04) | 0.25 (± 0.03)
      Simulated dataset
      PCA-CCA (fixed PCs)       | 0.74 (± 0.01) | 0.76 (± 0.01)  | 0.90 (± 0.00) | 1.82 (± 0.01)  | 0.80 (± 0.00) | 0.67 (± 0.01)
      PCA-CCA (data-driven PCs) | 0.96 (± 0.00) | 0.85 (± 0.00)  | 0.91 (± 0.00) | 1.95 (± 0.02)  | 0.73 (± 0.01) | 0.70 (± 0.01)
      RCCA (L2-reg. opt.)       | 0.93 (± 0.00) | 0.77 (± 0.00)  | 0.97 (± 0.00) | 1.99 (± 0.01)  | 0.83 (± 0.01) | 0.71 (± 0.01)
      Standard PLS              | 0.94 (± 0.00) | 0.84 (± 0.00)  | 0.97 (± 0.00) | 2.07 (± 0.01)  | 0.81 (± 0.00) | 0.71 (± 0.01)
      SPLS (L1-reg. opt.)       | 0.78 (± 0.03) | 0.84 (± 0.00)  | 1.00 (± 0.00) | 1.94 (± 0.01)  | 0.79 (± 0.01) | 0.73 (± 0.01)
      ¹ Similarity of model weights measured by Pearson correlation between each pair of training sets of the outer data splits; ² percent variance explained by the model relative to all within-modality variance in the training sets of the outer data splits; ³ correlation between the latent variables in the training sets of the outer data splits; ⁴ correlation between the latent variables in the test sets of the outer data splits; opt., optimized; PC, principal component; L1-reg., L1-norm regularization; L2-reg., L2-norm regularization.
      1 L2-norm: $\|\mathbf{w}\|_2=\sqrt{\sum_i w_i^2}$, where $\mathbf{w}=(w_1,w_2,\ldots,w_n)$ is a vector of size n
      2 L1-norm: $\|\mathbf{w}\|_1=\sum_i |w_i|$, where $\mathbf{w}=(w_1,w_2,\ldots,w_n)$ is a vector of size n
      In conclusion, we recommend embedding all models in a predictive framework that splits the data into training and test sets to assess the model’s out-of-sample generalizability.
      Standard CCA with PCA dimensionality reduction vs. regularized CCA in high-dimensional data
      In this section, we present the results of applying PCA-CCA and RCCA to all three high-dimensional datasets. We focus on experiments using the predictive framework and compare PCA-CCA with fixed versus data-driven numbers of PCs, as well as both of these models to RCCA.
      Figure 4A-C and Figure 5A-C display the brain and behavioural weights and corresponding latent variables for the three models (note that for the HCP dataset the brain weights were transformed into brain connection strength increases/decreases). Figure 6 compares the brain and behavioural weights using rank-similarity across the models, which indicates that although the weights are similar across the three models, data-driven PCA-CCA and RCCA are more similar to each other. The model weights and latent variables for the simulated dataset can be found in Figure 7A-C, which suggest that all three models sufficiently recovered the true weights of the generative model. Nevertheless, the non-sparse models attributed non-zero weights to many non-relevant variables (for details, see Table S2).
      Figure 4. Brain weights (left column), behavioural weights (middle column) and latent variables (right column) for the high-dimensional ADNI dataset. For visualization purposes, the model weights are normalized (divided by largest absolute value). Scatter plot between the brain and behavioural latent variables is overlaid by a least-squares regression line separately for the training and test data. (A) PCA-CCA with fixed number of principal components. (B) PCA-CCA with data-driven number of principal components. (C) RCCA. (D) Standard PLS. (E) SPLS. L, left hemisphere; R, right hemisphere; corrtraining, in-sample correlation in the training data; corrtest, out-of-sample correlation in the test data.
      Figure 5. Brain connection strengths (left column), behavioural weights (middle column) and latent variables (right column) for the high-dimensional HCP dataset. For visualization purposes, the brain weights were transformed into brain connection strength (i.e., brain weights multiplied by the sign of the population mean connectivity) increases (red) and decreases (blue), summed across the brain nodes (i.e., ICA components where each brain vertex is assigned to the ICA component it is most likely to belong to) and normalized (divided by largest absolute value). Only the top 15 positive (red) and top 15 negative (blue) behavioural weights are shown (secondary (e.g., age-adjusted) measures that are highly redundant with those shown here are not displayed). The behavioural model weights are normalized (divided by largest absolute value). Scatter plot between the brain and behavioural latent variables is overlaid by a least-squares regression line separately for the training and test data. (A) PCA-CCA with fixed number of principal components. (B) PCA-CCA with data-driven number of principal components. (C) RCCA. (D) Standard PLS. (E) SPLS. L, left hemisphere; R, right hemisphere; corrtraining, in-sample correlation in the training data; corrtest, out-of-sample correlation in the test data.
      Figure 6. Comparison of brain weights (left column) and behavioural weights (right column) across CCA/PLS models for the high-dimensional ADNI and HCP datasets obtained by the predictive framework. The similarity between the model weights is measured by Spearman correlation. The similarity between SPLS and the other models is measured only for the subset of variables identified by SPLS (the similarity between the two SPLS models was measured for the subset of variables that were present in both models). (A) High-dimensional ADNI dataset. (B) High-dimensional HCP dataset. Note that the second associative effect identified by standard PLS (PLS-2) and SPLS (SPLS-2) is similar to the first associative effects identified by the other models. Standard PLS-1/2, first/second associative effect identified by PLS; SPLS-1/2, first/second associative effect identified by SPLS; PC, principal component.
      Figure 7. Model weights (left column: high-dimensional modality, middle column: low-dimensional modality) and latent variables (right column) for the high-dimensional simulated dataset. For comparison, the true weights (red) of the generative model are overlaid on the model weights (blue). For visualization purposes, the model weights are normalized (divided by largest value) and only a subset of 100 random weights (out of the total 20000) is displayed for the high-dimensional modality. Scatter plot between the brain and behavioural latent variables is overlaid by a least-squares regression line separately for the training and test data. (A) PCA-CCA with fixed number of principal components. (B) PCA-CCA with data-driven number of principal components. (C) RCCA. (D) Standard PLS. (E) SPLS. corrtraining, in-sample correlation in the training data; corrtest, out-of-sample correlation in the test data.
      To further investigate the characteristics of the three models, Table 4 shows the stability of the weights and the variance explained by the models. The stability of weights varied significantly across brain and behaviour modalities (F(1,804)=84.51, p<0.0001) and models (F(2,804)=91.63, p<0.0001). Notably, the stability of RCCA weights was consistently high. The explained variance varied significantly only across modalities (F(1,174)=241.55, p<0.0001) but not models (F(2,174)=0.31, p=0.7303).
      Next, we examined the number of PCs in the two PCA-CCA models. We found a significant interaction between the effect of data modality and model on the number of PCs (F(1,114)=22.63, p<0.0001). Data-driven PCA-CCA yielded more brain PCs and fewer behavioural PCs than PCA-CCA with the fixed number of PCs (Table S3). These results confirm that lower ranked brain PCs might also carry information that links brain and behaviour and should not necessarily be discarded. Moreover, fixing the same number of PCs for both modalities might not be a good choice.
      Based on these results and as the optimal numbers of PCs can vary even across different brain-behaviour associations in the same dataset, we recommend data-driven PCA-CCA over PCA-CCA with fixed numbers of PCs. Furthermore, we found that data-driven PCA-CCA and RCCA gave similar results, both having a similar regularizing effect on the CCA model.
      Sparse vs. non-sparse CCA/PLS models in high-dimensional data
      In this section, we show how SPLS can find associations between subsets of features in all three high-dimensional datasets, and we compare the SPLS results with standard PLS and RCCA.
      Figure 4C-E and Figure 5C-E display the models’ weights and latent variables (note that for the HCP dataset the brain weights were transformed into brain connection strength increases/decreases). The first associative effect found by standard PLS and SPLS is similar to the first found by RCCA in both the ADNI and simulated datasets, but in the HCP dataset, the first associative effect identified by RCCA is more similar to the second effect found by standard PLS and SPLS (Figure 6). This is likely because the within-modality covariances in the HCP dataset differ substantially from the identity matrix, and therefore the difference between the objectives of the CCA and PLS models is more pronounced (see Eqs. (Eq. 1), (Eq. 2)). The brain and behavioural weights were similar across the three models in both real datasets, especially for the top-ranked variables (i.e., the variables with the highest weights). Similar to RCCA, standard PLS and SPLS sufficiently recovered the true weights of the generative model; however, the SPLS model assigned fewer non-zero weights to non-relevant variables (Figure 7C-E). These results demonstrate that, when the signal is sparse, SPLS can achieve high true positive and high true negative rates of weight recovery (Table S2). Table S4 shows the sparsity of the associative effects identified by SPLS.
      The stability of the weights differed significantly between the brain and behavioural modalities (F(1,804)=75.26, p<0.0001) and the three models (F(2,804)=61.77, p<0.0001) (Table 4). The stability of the SPLS weights was lowest in the HCP dataset, which is likely due to the model’s sparsity and the fact that different sets of variables can yield similar performance. The instability of SPLS could be mitigated by stability selection (Ing et al., 2019) or by a stability criterion during hyperparameter optimization (Mihalik et al., 2020). The explained variance also varied significantly across modalities (F(1,174)=80.00, p<0.0001) and the three models (F(2,174)=28.60, p<0.0001).
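      The idea behind stability selection can be sketched generically (this is an illustrative wrapper, not the cited implementations; the data, subsampling settings, and the toy sparse fit `toy_fit` are all invented stand-ins): refit a sparse model on random subsamples and retain only the variables selected in a high fraction of refits.

```python
# Generic stability-selection sketch: variables kept only if their
# non-zero weight recurs across subsample refits (settings illustrative).
import numpy as np

def stability_select(X, Y, fit_sparse, n_rep=50, frac=0.5, thresh=0.8, seed=0):
    """Return a boolean mask of variables selected in >= thresh of refits."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_rep):
        idx = rng.choice(len(X), int(frac * len(X)), replace=False)
        u = fit_sparse(X[idx], Y[idx])        # sparse weight vector for X
        counts += (u != 0)
    return counts / n_rep >= thresh

def toy_fit(X, Y):
    """Toy stand-in for a sparse CCA/PLS fit: keep features strongly
    correlated with the first Y column."""
    r = np.array([np.corrcoef(X[:, j], Y[:, 0])[0, 1] for j in range(X.shape[1])])
    return np.where(np.abs(r) > 0.3, r, 0.0)

rng = np.random.default_rng(3)
z = rng.standard_normal((400, 1))
X = np.hstack([z + rng.standard_normal((400, 3)),    # 3 relevant features
               rng.standard_normal((400, 20))])      # 20 noise features
Y = z + 0.5 * rng.standard_normal((400, 1))
stable = stability_select(X, Y, toy_fit)
print(stable.nonzero()[0])                            # stably selected indices
```

Because noise features rarely survive the sparsity threshold in most subsamples, the recurrence criterion filters them out even when any single refit occasionally includes one.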
      In summary, while RCCA is likely to yield similar or higher out-of-sample correlations than standard PLS and SPLS, SPLS can perform variable selection and may improve the interpretability of the results; however, it can also be unstable. In practice, the three models often provide similar weights for the top-ranked variables.
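      One common way to implement such variable selection is alternating soft-thresholding of the leading singular vectors of the cross-covariance matrix, in the spirit of the penalized matrix decomposition of Witten et al. (2009). The sketch below is illustrative (the data, thresholds `lam_u`/`lam_v`, and iteration count are assumptions, not values from this paper):

```python
# Minimal sparse-PLS sketch: power iteration on the cross-covariance
# with L1 soft-thresholding of both weight vectors (first pair only).
import numpy as np

def soft(x, lam):
    """Soft-thresholding operator implementing the L1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def spls_weights(X, Y, lam_u=0.5, lam_v=0.5, n_iter=100):
    """First pair of sparse weight vectors (X and Y assumed centred)."""
    C = X.T @ Y / len(X)               # cross-covariance matrix
    v = np.linalg.svd(C)[2][0]         # init: leading right singular vector
    for _ in range(n_iter):
        u = soft(C @ v, lam_u)
        u /= np.linalg.norm(u) + 1e-12
        v = soft(C.T @ u, lam_v)
        v /= np.linalg.norm(v) + 1e-12
    return u, v

# Simulated sparse effect: only the first 5 "brain" and 3 "behavioural"
# variables carry the shared signal.
rng = np.random.default_rng(1)
z = rng.standard_normal((500, 1))
X = np.hstack([z + 0.5 * rng.standard_normal((500, 5)),
               rng.standard_normal((500, 45))])
Y = np.hstack([z + 0.5 * rng.standard_normal((500, 3)),
               rng.standard_normal((500, 17))])
u, v = spls_weights(X - X.mean(0), Y - Y.mean(0))
print((u != 0).sum(), (v != 0).sum())  # counts of non-zero weights
```

With thresholds well matched to the noise level, the non-zero weights concentrate on the truly relevant variables, which is the variable-selection behaviour discussed above.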
      Standard vs. regularized CCA/PLS models in low-dimensional data
      To investigate the effects of regularization in all three low-dimensional datasets, we compared standard CCA, RCCA, standard PLS, and SPLS. The regularized models (RCCA, SPLS) were more stable (F(3,1075)=80.54, p<0.0001) (Table S5) and showed a trend towards higher out-of-sample correlations (F(1,10)=3.35, p=0.0972) (Figure S1) than their non-regularized variants (standard CCA and PLS). The stability of the standard PLS and RCCA weights was consistently high, whereas the stability of SPLS varied across datasets and standard CCA was rather unstable (Table S5). SPLS provided sparse results, similar to those in the high-dimensional datasets (Table S4). As expected, RCCA and standard PLS explained progressively more within-modality variance than standard CCA. For a detailed description of these results, see the Supplement. Taken together, these results suggest that regularized CCA/PLS models should be preferred even for low-dimensional data.
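      The regularizing effect can be made concrete with a minimal RCCA sketch (an illustration of the generic ridge formulation, not this paper's implementation; the data and regularization values are invented): shrinking each within-modality covariance towards the identity interpolates between standard CCA (c = 0) and PLS (c = 1).

```python
# Ridge-regularized CCA sketch: whiten with the shrunk within-modality
# covariances, then take the leading SVD pair of the cross-covariance.
import numpy as np

def rcca_first_pair(X, Y, c=0.5):
    """First weight pair of RCCA on centred data; c in [0, 1]."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = len(X)
    Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    Mx = (1 - c) * Cxx + c * np.eye(X.shape[1])   # shrunk covariances
    My = (1 - c) * Cyy + c * np.eye(Y.shape[1])
    Wx = np.linalg.inv(np.linalg.cholesky(Mx)).T  # whitening transforms
    Wy = np.linalg.inv(np.linalg.cholesky(My)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, 0], Wy @ Vt[0]

rng = np.random.default_rng(2)
z = rng.standard_normal((300, 1))
X = z @ rng.standard_normal((1, 8)) + rng.standard_normal((300, 8))
Y = z @ rng.standard_normal((1, 6)) + rng.standard_normal((300, 6))
in_sample = {}
for c in (0.0, 0.5, 1.0):
    u, v = rcca_first_pair(X, Y, c)
    in_sample[c] = np.corrcoef((X - X.mean(0)) @ u, (Y - Y.mean(0)) @ v)[0, 1]
print(in_sample)   # CCA (c=0) attains the highest in-sample correlation
```

By construction, c = 0 maximizes the in-sample correlation while c = 1 maximizes covariance; intermediate values trade one objective against the other, which is why the regularized models above are partly driven by within-modality variances.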

      Conclusion

      This tutorial compared standard and regularized CCA and PLS models and highlighted the benefits of regularization. Here, we outline the key lessons.
      First, we showed that regularized CCA/PLS models give similar out-of-sample correlations in large datasets (with the exception of standard PLS and SPLS in the high-dimensional HCP dataset), whether the sample size is similar to or much smaller than the number of variables (i.e., the ratio between examples and variables is ∼1-10 or ∼0.1-0.01). Importantly, RCCA and SPLS outperformed standard CCA and PLS even when the ratio between examples and variables was ∼1-10. Second, we emphasized the importance of using a predictive framework, since high in-sample correlations do not necessarily imply generalizability to unseen data.
      Going beyond model performance, we demonstrated both in theory and in practice that standard CCA is prone to instability (Table S3). L2-norm regularization improves stability, but at the cost of the models (RCCA, standard PLS, SPLS) being driven by within-modality variances. PCA-CCA with data-driven selection of PCs improves on a priori selection, and data-driven PCA-CCA has a regularizing effect comparable to that of RCCA. Sparsity (i.e., L1-norm regularization) can facilitate the interpretability and generalizability of the models, but it can also introduce instability; it is most useful when the associative effect itself is sparse (e.g., in the ADNI and simulated datasets). Data-driven PCA-CCA, RCCA and SPLS yielded similar model weights and accounted for similar variances.
      We hope that this work, together with recent efforts (e.g., Helmer et al., 2020; Mihalik et al., 2020; Zhuang et al., 2020; Wang et al., 2020; Winkler et al., 2020) and critical exchanges (e.g., Dinga et al., 2019; Mihalik et al., 2020; Grosenick et al., 2019; Grosenick and Liston, 2020; Dinga et al., 2020), illuminates these complex methods and facilitates their application to the brain and its disorders.


      Acknowledgements

      Agoston Mihalik was funded by the Wellcome Trust (WT102845/Z/13/Z) and by MQ: Transforming Mental Health (MQF17_24). James Chapman was supported by the EPSRC-funded UCL Centre for Doctoral Training in Intelligent, Integrated Imaging in Healthcare (i4health) (EP/S021930/1) and the Department of Health’s NIHR funded Biomedical Research Centre at University College London Hospitals. Rick A. Adams was supported by an MRC Skills Development Fellowship (MR/S007806/1). Nils R. Winter was supported by grants from the German Research Foundation (DFG grants HA7070/2-2, HA7070/3, HA7070/4). Fabio S. Ferreira was funded by a PhD scholarship awarded by Fundação para a Ciência e a Tecnologia (SFRH/BD/120640/2016). Janaina Mourão-Miranda was funded by the Wellcome Trust (WT102845/Z/13/Z).

      Supplementary Material

      References

        • Smith S.M.
        • Nichols T.E.
        Statistical Challenges in “Big Data” Human Neuroimaging.
        Neuron. 2018; 97: 263-268
        • Bzdok D.
        • Yeo B.T.T.
        Inference in the age of big data: Future perspectives on neuroscience.
        Neuroimage. 2017; 155: 549-564
        • Bzdok D.
        • Nichols T.E.
        • Smith S.M.
        Towards Algorithmic Analytics for Large-scale Datasets.
        Nat Mach Intell. 2019; 1: 296-306
        • Hotelling H.
        Relations between two sets of variates.
        Biometrika. 1936; 28: 321
      Wold H (1985): Partial least squares. In: Kotz S, Johnson N, editors. Encyclopedia of Statistical Sciences. New York: Wiley Online Library, pp 581–591.

        • Kebets V.
        • Holmes A.J.
        • Orban C.
        • Tang S.
        • Li J.
        • Sun N.
        • et al.
        Somatosensory-Motor Dysconnectivity Spans Multiple Transdiagnostic Dimensions of Psychopathology.
        Biol Psychiatry. 2019; 86: 779-791
        • Drysdale A.T.
        • Grosenick L.
        • Downar J.
        • Dunlop K.
        • Mansouri F.
        • Meng Y.
        • et al.
        Resting-state connectivity biomarkers define neurophysiological subtypes of depression.
        Nat Med. 2017; 23: 28-38
        • Moser D.A.
        • Doucet G.E.
        • Lee W.H.
        • Rasgon A.
        • Krinsky H.
        • Leibu E.
        • et al.
        Multivariate associations among behavioral, clinical, and multimodal imaging phenotypes in patients with psychosis.
        JAMA Psychiatry. 2018; 75: 386-395
        • Li J.
        • Bolt T.
        • Bzdok D.
        • Nomi J.S.
        • Yeo B.T.T.
        • Spreng R.N.
        • Uddin L.Q.
        Topography and behavioral relevance of the global signal in the human brain.
        Sci Rep. 2019; 9: 1-10
        • Bijsterbosch J.D.
        • Woolrich M.W.
        • Glasser M.F.
        • Robinson E.C.
        • Beckmann C.F.
        • Van Essen D.C.
        • et al.
        The relationship between spatial configuration and functional connectivity of brain regions.
        Elife. 2018; 7: 1-27
        • Xia C.H.
        • Ma Z.
        • Ciric R.
        • Gu S.
        • Betzel R.F.
        • Kaczkurkin A.N.
        • et al.
        Linked dimensions of psychopathology and connectivity in functional brain networks.
        Nat Commun. 2018; 9: 3003
        • Modabbernia A.
        • Janiri D.
        • Doucet G.E.
        • Reichenberg A.
        • Frangou S.
        Multivariate Patterns of Brain-Behavior-Environment Associations in the Adolescent Brain and Cognitive Development Study.
        Biol Psychiatry. 2021; 89: 510-520
        • Avants B.B.
        • Libon D.J.
        • Rascovsky K.
        • Boller A.
        • McMillan C.T.
        • Massimo L.
        • et al.
        Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population.
        Neuroimage. 2014; 84: 698-711
        • Ziegler G.
        • Dahnke R.
        • Winkler A.D.
        • Gaser C.
        Partial least squares correlation of multivariate cognitive abilities and local brain structure in children and adolescents.
        Neuroimage. 2013; 82: 284-294
        • Jia T.
        • Ing A.
        • Quinlan E.B.
        • Tay N.
        • Luo Q.
        • Francesca B.
        • et al.
        Neurobehavioural characterisation and stratification of reinforcement-related behaviour.
        Nat Hum Behav. 2020; 4: 544-558
        • Le Floch E.
        • Guillemot V.
        • Frouin V.
        • Pinel P.
        • Lalanne C.
        • Trinchera L.
        • et al.
        Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse Partial Least Squares.
        Neuroimage. 2012; 63: 11-24
        • Witten D.M.
        • Tibshirani R.
        • Hastie T.
        A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.
        Biostatistics. 2009; 10: 515-534
        • Marquand A.F.
        • Haak K.V.
        • Beckmann C.F.
        Functional corticostriatal connection topographies predict goal-directed behaviour in humans.
        Nat Hum Behav. 2017; 1: 1-9
        • Lin D.
        • Calhoun V.D.
        • Wang Y.P.
        Correspondence between fMRI and SNP data by group sparse canonical correlation analysis.
        Med Image Anal. 2014; 18: 891-902
        • Ing A.
        • Sämann P.G.
        • Chu C.
        • Tay N.
        • Biondo F.
        • Robert G.
        • et al.
        Identification of neurobehavioural symptom groups based on shared brain mechanisms.
        Nat Hum Behav. 2019; 3: 1306-1318
        • Wang H.T.
        • Bzdok D.
        • Margulies D.
        • Craddock C.
        • Milham M.
        • Jefferies E.
        • Smallwood J.
        Patterns of thought: Population variation in the associations between large-scale network organisation and self-reported experiences at rest.
        Neuroimage. 2018; 176: 518-527
        • Smith S.M.
        • Nichols T.E.
        • Vidaurre D.
        • Winkler A.M.
        • Behrens T.E.J.
        • Glasser M.F.
        • et al.
        A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
        Nat Neurosci. 2015; 18: 1565-1567
        • Popovic D.
        • Ruef A.
        • Dwyer D.B.
        • Antonucci L.A.
        • Eder J.
        • Sanfelici R.
        • et al.
        Traces of Trauma: A Multivariate Pattern Analysis of Childhood Trauma, Brain Structure, and Clinical Phenotypes.
        Biol Psychiatry. 2020; 88: 829-842
      Alnaes D, Kaufmann T, Marquand AF, Smith SM, Westlye LT (2020): Patterns of sociocognitive stratification and perinatal risk in the child brain. Proc Natl Acad Sci 202001517.

        • Mihalik A.
        • Ferreira F.S.
        • Rosa M.J.
        • Moutoussis M.
        • Ziegler G.
        • Monteiro J.M.
        • et al.
        Brain-behaviour modes of covariation in healthy and clinically depressed young people.
        Sci Rep. 2019; 9: 11536
      Helmer M, Warrington S, Mohammadi-Nejad A-R, Lisa J, Howell A, Rosand B, et al. (2020): On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations. https://doi.org/10.1101/2020.08.25.265546

        • Mihalik A.
        • Ferreira F.S.
        • Moutoussis M.
        • Ziegler G.
        • Adams R.A.
        • Rosa M.J.
        • et al.
        Multiple Holdouts With Stability: Improving the Generalizability of Machine Learning Analyses of Brain–Behavior Relationships.
        Biol Psychiatry. 2020; 87: 368-376
        • Dinga R.
        • Schmaal L.
        • Penninx B.W.J.H.
        • van Tol M.J.
        • Veltman D.J.
        • van Velzen L.
        • et al.
        Evaluating the evidence for biotypes of depression: Methodological replication and extension of.
        NeuroImage Clin. 2019; 22: 101796
        • Uurtio V.
        • Monteiro J.M.
        • Kandola J.
        • Shawe-Taylor J.
        • Fernandez-Reyes D.
        • Rousu J.
        A Tutorial on Canonical Correlation Methods.
        ACM Comput Surv. 2017; 50: 1-33
        • Zhuang X.
        • Yang Z.
        • Cordes D.
        A technical review of canonical correlation analysis for neuroscience applications.
        Hum Brain Mapp. 2020; 41: 3807-3833
      Wang H-T, Smallwood J, Mourao-Miranda J, Xia CH, Satterthwaite TD, Bassett DS, Bzdok D (2020): Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists. Neuroimage 116745.

        • Krishnan A.
        • Williams L.J.
        • McIntosh A.R.
        • Abdi H.
        Partial Least Squares (PLS) methods for neuroimaging: A tutorial and review.
        Neuroimage. 2011; 56: 455-475
        • Meredith W.
        Canonical correlations with fallible data.
        Psychometrika. 1964; 29: 55-65
      Rosipal R, Krämer N (2006): Overview and Recent Advances in Partial Least Squares. In: Saunders C, Grobelnik M, Gunn S, Shawe-Taylor J, editors. Subspace, Latent Structure and Feature Selection. Berlin, Heidelberg: Springer Berlin Heidelberg, pp 34–51.

        • Golub G.H.
        • Zha H.
        Perturbation analysis of the canonical correlations of matrix pairs.
        Linear Algebra Appl. 1994; 210: 3-28
      Wegelin JA (2000): A Survey on Partial Least Squares (PLS) Methods, with Emphasis on the Two-Block Case. Retrieved from https://stat.uw.edu/sites/default/files/files/reports/2000/tr371.pdf

        • Vounou M.
        • Nichols T.E.
        • Montana G.
        Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach.
        Neuroimage. 2010; 53: 1147-1159
        • Knapp T.R.
        Canonical correlation analysis: A general parametric significance-testing system.
        Psychol Bull. 1978; 85: 410-416
        • Izenman A.J.
        Reduced-rank regression for the multivariate linear model.
        J Multivar Anal. 1975; 5: 248-264
        • Tibshirani R.
        Regression Shrinkage and Selection via the Lasso.
        J R Stat Soc Ser B. 1996; 58: 267-288
        • Hoerl A.E.
        • Kennard R.W.
        Ridge Regression: Applications to Nonorthogonal Problems.
        Technometrics. 1970; 12: 69-82
        • Zou H.
        • Hastie T.
        Regularization and variable selection via the elastic net.
        J R Stat Soc Ser B Stat Methodol. 2005; 67: 301-320
        • Vinod H.D.
        Canonical ridge and econometrics of joint production.
        J Econom. 1976; 4: 147-166
        • Hardoon D.R.
        • Szedmak S.
        • Shawe-Taylor J.
        Canonical correlation analysis: An overview with application to learning methods.
        Neural Comput. 2004; 16: 2639-2664
        • Tenenhaus A.
        • Tenenhaus M.
        Regularized Generalized Canonical Correlation Analysis.
        Psychometrika. 2011; 76: 257-284
      Tuzhilina E, Tozzi L, Hastie T (2020): Canonical Correlation Analysis in high dimensions with structured regularization. Retrieved from http://arxiv.org/abs/2011.01650

        • Lê Cao K.-A.
        • Rossouw D.
        • Robert-Granié C.
        • Besse P.
        A Sparse PLS for Variable Selection when Integrating Omics Data.
        Stat Appl Genet Mol Biol. 2008; 7
        • Waaijenborg S.
        • Verselewel de Witt Hamer P.C.
        • Zwinderman A.H.
        Quantifying the Association between Gene Expressions and DNA-Markers by Penalized Canonical Correlation Analysis.
        Stat Appl Genet Mol Biol. 2008; 7
        • Parkhomenko E.
        • Tritchler D.
        • Beyene J.
        Sparse canonical correlation analysis with application to genomic data integration.
        Stat Appl Genet Mol Biol. 2009; 8 (Article 1)
        • Monteiro J.M.
        • Rao A.
        • Shawe-Taylor J.
        • Mourão-Miranda J.
        A multiple hold-out framework for Sparse Partial Least Squares.
        J Neurosci Methods. 2016; 271: 182-194
        • Suo X.
        • Minden V.
        • Nelson B.
        • Tibshirani R.
        • Saunders M.
        Sparse canonical correlation analysis.
        Mach Learn. 2017; 83: 331-353
        • Chen M.
        • Gao C.
        • Ren Z.
        • Zhou H.H.
        Sparse CCA via Precision Adjusted Iterative Thresholding.
        Proc Int Conf Artif Intell Stat. 2013
        • Gao C.
        • Ma Z.
        • Zhou H.H.
        Sparse CCA: Adaptive estimation and computational barriers.
        Ann Stat. 2017; 45: 2074-2101
      Mai Q, Zhang X (2019): An iterative penalized least squares approach to sparse canonical correlation analysis. Biometrics 734–744.

        • Kessy A.
        • Lewin A.
        • Strimmer K.
        Optimal Whitening and Decorrelation.
        Am Stat. 2018; 72: 309-314
        • Shmueli G.
        To explain or to predict?.
        Stat Sci. 2010; 25: 289-310
      Bzdok D, Engemann D, Thirion B (2020): Inference and Prediction Diverge in Biomedicine. Patterns (New York, NY) 1: 100119.

        • Arbabshirani M.R.
        • Plis S.
        • Sui J.
        • Calhoun V.D.
        Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls.
        Neuroimage. 2017; 145: 137-165
      Abdi H (2010): Partial least squares regression and projection on latent structure regression (PLS Regression). Wiley Interdiscip Rev Comput Stat 2: 97–106.

        • Winkler A.M.
        • Renaud O.
        • Smith S.M.
        • Nichols T.E.
        Permutation inference for canonical correlation analysis.
        Neuroimage. 2020; 220: 117065
        • Lê Cao K.-A.
        • Boitard S.
        • Besse P.
        Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.
        BMC Bioinformatics. 2011; 12: 253
        • Labus J.S.
        • Van Horn J.D.
        • Gupta A.
        • Alaverdyan M.
        • Torgerson C.
        • Ashe-McNalley C.
        • et al.
        Multivariate morphological brain signatures predict patients with chronic abdominal pain from healthy control subjects.
        Pain. 2015; 156: 1545-1554
        • Olson Hunt M.J.
        • Weissfeld L.
        • Boudreau R.M.
        • Aizenstein H.
        • Newman A.B.
        • Simonsick E.M.
        • et al.
        A variant of sparse partial least squares for variable selection and data exploration.
        Front Neuroinform. 2014; 8
        • Winkler A.M.
        • Webster M.A.
        • Vidaurre D.
        • Nichols T.E.
        • Smith S.M.
        Multi-level block permutation.
        Neuroimage. 2015; 123: 253-268
        • Rao A.
        • Monteiro J.M.
        • Mourao-Miranda J.
        Predictive modelling using neuroimaging data in the presence of confounds.
        Neuroimage. 2017; 150: 23-49
      Dinga R, Schmaal L, Penninx BWJH, Veltman DJ, Marquand AF (2020): Controlling for effects of confounding variables on machine learning predictions. https://doi.org/10.1101/2020.08.17.255034

        • Mihalik A.
        • Adams R.A.
        • Huys Q.
        Canonical Correlation Analysis for Identifying Biotypes of Depression.
        Biol Psychiatry Cogn Neurosci Neuroimaging. 2020; 5: 478-480
        • Song Y.
        • Schreier P.J.
        • Ramírez D.
        • Hasija T.
        Canonical correlation analysis of high-dimensional data with very small sample support.
        Signal Processing. 2016; 128: 449-458
        • Grosenick L.
        • Shi T.C.
        • Gunning F.M.
        • Dubin M.J.
        • Downar J.
        • Liston C.
        Functional and Optogenetic Approaches to Discovering Stable Subtype-Specific Circuit Mechanisms in Depression.
        Biol Psychiatry Cogn Neurosci Neuroimaging. 2019; 4: 554-566
        • Grosenick L.
        • Liston C.
        Reply to: A Closer Look at Depression Biotypes: Correspondence Relating to Grosenick et al. (2019).
        Biol Psychiatry Cogn Neurosci Neuroimaging. 2020; 5: 556
        • Dinga R.
        • Schmaal L.
        • Marquand A.F.
        A Closer Look at Depression Biotypes: Correspondence Relating to Grosenick et al. (2019).
        Biol Psychiatry Cogn Neurosci Neuroimaging. 2020; 5: 554-555
        • Rolls E.T.
        • Joliot M.
        • Tzourio-Mazoyer N.
        Implementation of a new parcellation of the orbitofrontal cortex in the automated anatomical labeling atlas.
        Neuroimage. 2015; 122: 1-5
        • Folstein M.F.
        • Folstein S.E.
        • McHugh P.R.
        Mini-mental state.
        J Psychiatr Res. 1975; 12: 189-198