
Canonical Correlation Analysis and Partial Least Squares for Identifying Brain–Behavior Associations: A Tutorial and a Comparative Study

  • Agoston Mihalik (corresponding author)
    Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
  • James Chapman
    Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  • Rick A. Adams
    Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom; Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
  • Nils R. Winter
    Institute of Translational Psychiatry, University of Münster, Münster, Germany
  • Fabio S. Ferreira
    Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  • John Shawe-Taylor
    Department of Computer Science, University College London, London, United Kingdom
  • Janaina Mourão-Miranda
    Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom
  • for the Alzheimer’s Disease Neuroimaging Initiative
Open Access. Published: August 08, 2022. DOI: https://doi.org/10.1016/j.bpsc.2022.07.012

      Abstract

      Canonical correlation analysis (CCA) and partial least squares (PLS) are powerful multivariate methods for capturing associations across 2 modalities of data (e.g., brain and behavior). However, when the sample size is similar to or smaller than the number of variables in the data, standard CCA and PLS models may overfit, i.e., find spurious associations that generalize poorly to new data. Dimensionality reduction and regularized extensions of CCA and PLS have been proposed to address this problem, yet most studies using these approaches have some limitations. This work gives a theoretical and practical introduction to the most common CCA/PLS models and their regularized variants. We examine the limitations of standard CCA and PLS when the sample size is similar to or smaller than the number of variables. We discuss how dimensionality reduction and regularization techniques address this problem and explain their main advantages and disadvantages. We highlight crucial aspects of the CCA/PLS analysis framework, including optimizing the hyperparameters of the model and testing the identified associations for statistical significance. We apply the described CCA/PLS models to simulated data and real data from the Human Connectome Project and Alzheimer’s Disease Neuroimaging Initiative (both n > 500). We use both low- and high-dimensionality versions of these data (i.e., ratios between sample size and variables in the range of ∼1–10 and ∼0.1–0.01, respectively) to demonstrate the impact of data dimensionality on the models. Finally, we summarize the key lessons of the tutorial.


      Neuroimaging datasets with sample sizes of n > 1000 (e.g., UK Biobank, Human Connectome Project [HCP], Alzheimer’s Disease Neuroimaging Initiative [ADNI]) represent a unique opportunity to advance population neuroscience and mental health (
      • Smith S.M.
      • Nichols T.E.
      Statistical challenges in “big data” human neuroimaging.
      ,
      • Bzdok D.
      • Yeo B.T.T.
      Inference in the age of big data: Future perspectives on neuroscience.
      ,
      • Bzdok D.
      • Nichols T.E.
      • Smith S.M.
      Towards algorithmic analytics for large-scale datasets.
      ). These datasets comprise multiple data modalities (e.g., structural magnetic resonance imaging, resting-state functional magnetic resonance imaging, mental health, cognition, environmental factors and genetics), several of which can be high-dimensional, meaning that there are hundreds or thousands of variables per subject. Understanding the links across these different modalities is fundamental for enabling new discoveries; however, analyzing multimodal datasets with more variables than samples poses technical challenges.
      The most established methods to find associations across multiple modalities of multivariate data are canonical correlation analysis (CCA) (
      • Hotelling H.
      Relations between two sets of variates.
      ) and partial least squares (PLS) (
      • Wold H.
      Partial least squares.
      ). CCA and PLS have recently become very popular, with numerous applications linking brain imaging to behavior or genetics [e.g., (
      • Kebets V.
      • Holmes A.J.
      • Orban C.
      • Tang S.
      • Li J.
      • Sun N.
      • et al.
      Somatosensory-motor dysconnectivity spans multiple transdiagnostic dimensions of psychopathology.
      ,
      • Drysdale A.T.
      • Grosenick L.
      • Downar J.
      • Dunlop K.
      • Mansouri F.
      • Meng Y.
      • et al.
      Resting-state connectivity biomarkers define neurophysiological subtypes of depression.
      ,
      • Moser D.A.
      • Doucet G.E.
      • Lee W.H.
      • Rasgon A.
      • Krinsky H.
      • Leibu E.
      • et al.
      Multivariate associations among behavioral, clinical, and multimodal imaging phenotypes in patients with psychosis.
      ,
      • Li J.
      • Bolt T.
      • Bzdok D.
      • Nomi J.S.
      • Yeo B.T.T.
      • Spreng R.N.
      • Uddin L.Q.
      Topography and behavioral relevance of the global signal in the human brain.
      ,
      • Bijsterbosch J.D.
      • Woolrich M.W.
      • Glasser M.F.
      • Robinson E.C.
      • Beckmann C.F.
      • Van Essen D.C.
      • et al.
      The relationship between spatial configuration and functional connectivity of brain regions.
      ,
      • Xia C.H.
      • Ma Z.
      • Ciric R.
      • Gu S.
      • Betzel R.F.
      • Kaczkurkin A.N.
      • et al.
      Linked dimensions of psychopathology and connectivity in functional brain networks.
      ,
      • Modabbernia A.
      • Janiri D.
      • Doucet G.E.
      • Reichenberg A.
      • Frangou S.
      Multivariate patterns of brain-behavior-environment associations in the Adolescent Brain and Cognitive Development Study.
      ,
      • Avants B.B.
      • Libon D.J.
      • Rascovsky K.
      • Boller A.
      • McMillan C.T.
      • Massimo L.
      • et al.
      Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population.
      ,
      • Ziegler G.
      • Dahnke R.
      • Winkler A.D.
      • Gaser C.
      Partial least squares correlation of multivariate cognitive abilities and local brain structure in children and adolescents.
      ,
      • Jia T.
      • Ing A.
      • Quinlan E.B.
      • Tay N.
      • Luo Q.
      • Francesca B.
      • et al.
      Neurobehavioural characterisation and stratification of reinforcement-related behaviour.
      ,
      • Le Floch E.
      • Guillemot V.
      • Frouin V.
      • Pinel P.
      • Lalanne C.
      • Trinchera L.
      • et al.
      Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse partial least squares.
      ,
      • Witten D.M.
      • Tibshirani R.
      • Hastie T.
      A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.
      ,
      • Marquand A.F.
      • Haak K.V.
      • Beckmann C.F.
      Functional corticostriatal connection topographies predict goal-directed behaviour in humans.
      ,
      • Lin D.
      • Calhoun V.D.
      • Wang Y.P.
      Correspondence between fMRI and SNP data by group sparse canonical correlation analysis.
      ,
      • Ing A.
      • Sämann P.G.
      • Chu C.
      • Tay N.
      • Biondo F.
      • Robert G.
      • et al.
      Identification of neurobehavioural symptom groups based on shared brain mechanisms.
      ,
      • Wang H.T.
      • Bzdok D.
      • Margulies D.
      • Craddock C.
      • Milham M.
      • Jefferies E.
      • Smallwood J.
      Patterns of thought: Population variation in the associations between large-scale network organisation and self-reported experiences at rest.
      ,
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Popovic D.
      • Ruef A.
      • Dwyer D.B.
      • Antonucci L.A.
      • Eder J.
      • Sanfelici R.
      • et al.
      Traces of trauma: A multivariate pattern analysis of childhood trauma, brain structure, and clinical phenotypes.
      ,
      • Alnæs D.
      • Kaufmann T.
      • Marquand A.F.
      • Smith S.M.
      • Westlye L.T.
      Patterns of sociocognitive stratification and perinatal risk in the child brain.
      ,
      • Mihalik A.
      • Ferreira F.S.
      • Rosa M.J.
      • Moutoussis M.
      • Ziegler G.
      • Monteiro J.M.
      • et al.
      Brain-behaviour modes of covariation in healthy and clinically depressed young people.
      ,
      • Helmer M.
      • Warrington S.
      • Mohammadi-Nejad A.-R.
      • Lisa J.
      • Howell A.
      • Rosand B.
      • et al.
      On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.
      )]. However, when the variables in at least one modality (e.g., brain) outnumber the sample size, standard CCA and PLS models may overfit, i.e., they are more likely to find spurious associations that generalize poorly to independent samples [e.g., (
      • Helmer M.
      • Warrington S.
      • Mohammadi-Nejad A.-R.
      • Lisa J.
      • Howell A.
      • Rosand B.
      • et al.
      On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.
      ,
      • Mihalik A.
      • Ferreira F.S.
      • Moutoussis M.
      • Ziegler G.
      • Adams R.A.
      • Rosa M.J.
      • et al.
      Multiple holdouts with stability: Improving the generalizability of machine learning analyses of brain–behavior relationships.
      ,
      • Dinga R.
      • Schmaal L.
      • Penninx B.W.J.H.
      • van Tol M.J.
      • Veltman D.J.
      • van Velzen L.
      • et al.
      Evaluating the evidence for biotypes of depression: Methodological replication and extension of.
      )]. Moreover, there is no unique standard CCA solution when the number of variables exceeds the sample size. Two approaches have been proposed to address this problem: 1) reducing the dimensionality of the data with principal component analysis (PCA) (
      • Li J.
      • Bolt T.
      • Bzdok D.
      • Nomi J.S.
      • Yeo B.T.T.
      • Spreng R.N.
      • Uddin L.Q.
      Topography and behavioral relevance of the global signal in the human brain.
      ,
      • Bijsterbosch J.D.
      • Woolrich M.W.
      • Glasser M.F.
      • Robinson E.C.
      • Beckmann C.F.
      • Van Essen D.C.
      • et al.
      The relationship between spatial configuration and functional connectivity of brain regions.
      ,
      • Modabbernia A.
      • Janiri D.
      • Doucet G.E.
      • Reichenberg A.
      • Frangou S.
      Multivariate patterns of brain-behavior-environment associations in the Adolescent Brain and Cognitive Development Study.
      ,
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Alnæs D.
      • Kaufmann T.
      • Marquand A.F.
      • Smith S.M.
      • Westlye L.T.
      Patterns of sociocognitive stratification and perinatal risk in the child brain.
      ,
      • Helmer M.
      • Warrington S.
      • Mohammadi-Nejad A.-R.
      • Lisa J.
      • Howell A.
      • Rosand B.
      • et al.
      On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.
      ) and 2) using regularized extensions of CCA and PLS (
      • Xia C.H.
      • Ma Z.
      • Ciric R.
      • Gu S.
      • Betzel R.F.
      • Kaczkurkin A.N.
      • et al.
      Linked dimensions of psychopathology and connectivity in functional brain networks.
      ,
      • Ing A.
      • Sämann P.G.
      • Chu C.
      • Tay N.
      • Biondo F.
      • Robert G.
      • et al.
      Identification of neurobehavioural symptom groups based on shared brain mechanisms.
      ,
      • Popovic D.
      • Ruef A.
      • Dwyer D.B.
      • Antonucci L.A.
      • Eder J.
      • Sanfelici R.
      • et al.
      Traces of trauma: A multivariate pattern analysis of childhood trauma, brain structure, and clinical phenotypes.
      ,
      • Mihalik A.
      • Ferreira F.S.
      • Moutoussis M.
      • Ziegler G.
      • Adams R.A.
      • Rosa M.J.
      • et al.
      Multiple holdouts with stability: Improving the generalizability of machine learning analyses of brain–behavior relationships.
      ). However, most studies using these approaches have potential limitations. For instance, 1) they usually do not optimize the hyperparameters (e.g., the number of principal components [PCs] or amount of regularization) (
      • Li J.
      • Bolt T.
      • Bzdok D.
      • Nomi J.S.
      • Yeo B.T.T.
      • Spreng R.N.
      • Uddin L.Q.
      Topography and behavioral relevance of the global signal in the human brain.
      ,
      • Bijsterbosch J.D.
      • Woolrich M.W.
      • Glasser M.F.
      • Robinson E.C.
      • Beckmann C.F.
      • Van Essen D.C.
      • et al.
      The relationship between spatial configuration and functional connectivity of brain regions.
      ,
      • Modabbernia A.
      • Janiri D.
      • Doucet G.E.
      • Reichenberg A.
      • Frangou S.
      Multivariate patterns of brain-behavior-environment associations in the Adolescent Brain and Cognitive Development Study.
      ,
      • Jia T.
      • Ing A.
      • Quinlan E.B.
      • Tay N.
      • Luo Q.
      • Francesca B.
      • et al.
      Neurobehavioural characterisation and stratification of reinforcement-related behaviour.
      ,
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Alnæs D.
      • Kaufmann T.
      • Marquand A.F.
      • Smith S.M.
      • Westlye L.T.
      Patterns of sociocognitive stratification and perinatal risk in the child brain.
      ,
      • Helmer M.
      • Warrington S.
      • Mohammadi-Nejad A.-R.
      • Lisa J.
      • Howell A.
      • Rosand B.
      • et al.
      On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.
      ); 2) many studies do not test the significance of the associations using hold-out data (e.g., out-of-sample correlation) (
      • Drysdale A.T.
      • Grosenick L.
      • Downar J.
      • Dunlop K.
      • Mansouri F.
      • Meng Y.
      • et al.
      Resting-state connectivity biomarkers define neurophysiological subtypes of depression.
      ,
      • Li J.
      • Bolt T.
      • Bzdok D.
      • Nomi J.S.
      • Yeo B.T.T.
      • Spreng R.N.
      • Uddin L.Q.
      Topography and behavioral relevance of the global signal in the human brain.
      ,
      • Bijsterbosch J.D.
      • Woolrich M.W.
      • Glasser M.F.
      • Robinson E.C.
      • Beckmann C.F.
      • Van Essen D.C.
      • et al.
      The relationship between spatial configuration and functional connectivity of brain regions.
      ,
      • Xia C.H.
      • Ma Z.
      • Ciric R.
      • Gu S.
      • Betzel R.F.
      • Kaczkurkin A.N.
      • et al.
      Linked dimensions of psychopathology and connectivity in functional brain networks.
      ,
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ); and 3) they often do not assess the stability of the CCA/PLS model (
      • Drysdale A.T.
      • Grosenick L.
      • Downar J.
      • Dunlop K.
      • Mansouri F.
      • Meng Y.
      • et al.
      Resting-state connectivity biomarkers define neurophysiological subtypes of depression.
      ,
      • Li J.
      • Bolt T.
      • Bzdok D.
      • Nomi J.S.
      • Yeo B.T.T.
      • Spreng R.N.
      • Uddin L.Q.
      Topography and behavioral relevance of the global signal in the human brain.
      ,
      • Marquand A.F.
      • Haak K.V.
      • Beckmann C.F.
      Functional corticostriatal connection topographies predict goal-directed behaviour in humans.
      ,
      • Wang H.T.
      • Bzdok D.
      • Margulies D.
      • Craddock C.
      • Milham M.
      • Jefferies E.
      • Smallwood J.
      Patterns of thought: Population variation in the associations between large-scale network organisation and self-reported experiences at rest.
      ,
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Popovic D.
      • Ruef A.
      • Dwyer D.B.
      • Antonucci L.A.
      • Eder J.
      • Sanfelici R.
      • et al.
      Traces of trauma: A multivariate pattern analysis of childhood trauma, brain structure, and clinical phenotypes.
      ,
      • Alnæs D.
      • Kaufmann T.
      • Marquand A.F.
      • Smith S.M.
      • Westlye L.T.
      Patterns of sociocognitive stratification and perinatal risk in the child brain.
      ,
      • Mihalik A.
      • Ferreira F.S.
      • Rosa M.J.
      • Moutoussis M.
      • Ziegler G.
      • Monteiro J.M.
      • et al.
      Brain-behaviour modes of covariation in healthy and clinically depressed young people.
      ). Finally, few studies compare different CCA/PLS models and analytic frameworks across different datasets with different dimensionalities [e.g., (
      • Mihalik A.
      • Ferreira F.S.
      • Rosa M.J.
      • Moutoussis M.
      • Ziegler G.
      • Monteiro J.M.
      • et al.
      Brain-behaviour modes of covariation in healthy and clinically depressed young people.
      ,
      • Helmer M.
      • Warrington S.
      • Mohammadi-Nejad A.-R.
      • Lisa J.
      • Howell A.
      • Rosand B.
      • et al.
      On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.
      ,
      • Mihalik A.
      • Ferreira F.S.
      • Moutoussis M.
      • Ziegler G.
      • Adams R.A.
      • Rosa M.J.
      • et al.
      Multiple holdouts with stability: Improving the generalizability of machine learning analyses of brain–behavior relationships.
      )].
      Several tutorial papers were recently published on CCA and PLS (
      • Uurtio V.
      • Monteiro J.M.
      • Kandola J.
      • Shawe-Taylor J.
      • Fernandez-Reyes D.
      • Rousu J.
      A tutorial on canonical correlation methods.
      ,
      • Zhuang X.
      • Yang Z.
      • Cordes D.
      A technical review of canonical correlation analysis for neuroscience applications.
      ,
      • Wang H.-T.
      • Smallwood J.
      • Mourao-Miranda J.
      • Xia C.H.
      • Satterthwaite T.D.
      • Bassett D.S.
      • Bzdok D.
      Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists.
      ,
      • Krishnan A.
      • Williams L.J.
      • McIntosh A.R.
      • Abdi H.
      Partial least squares (PLS) methods for neuroimaging: A tutorial and review.
      ). Here, we complement these tutorials by discussing some important conceptual and practical aspects of these methods. These comprise 1) the advantages and disadvantages of the various CCA/PLS models, 2) the impact of PCA and regularization on these models (e.g., on overfitting and stability), and 3) the importance of the analytic framework in optimizing the models’ hyperparameters and performing statistical inference.
      In Part 1, we present the theoretical background of these models and discuss the most common strategies to mitigate the problems caused when the ratio between sample size and number of variables is small (e.g., around ∼0.1–0.01). We also examine the most prevalent analytical frameworks used with CCA/PLS models. In Part 2, we apply the models introduced in Part 1 to simulated data and real data from the HCP and ADNI (n > 500 in all). We illustrate how the different CCA/PLS models perform with data dimensionalities often used in practice (i.e., ratios between sample size and number of variables in the ranges of ∼1–10 or ∼0.1–0.01). Moreover, we show that regularization can be helpful even when the number of variables in both data modalities is smaller than the sample size. Mathematical details of the CCA/PLS models and their connections are provided in the Supplement.

      Part 1: Technical Background of CCA and PLS

      CCA/PLS Optimization and Nomenclature

      CCA (
      • Hotelling H.
      Relations between two sets of variates.
      ) and PLS (
      • Wold H.
      Partial least squares.
      ) are multivariate latent variable models that capture associations across 2 modalities of data (e.g., brain and behavior). For example (Figure 1), X contains voxel-level brain variables and Y contains behavioral variables from item-level self-report questionnaires; both are matrices whose rows and columns represent subjects and variables, respectively. Standard CCA/PLS models find pairs of brain and behavioral weights $w_x$ and $w_y$ (column vectors) such that the linear combinations (weighted sums) of the brain and behavioral variables maximize the correlation (CCA) or covariance (PLS) between the resulting latent variables, i.e., between $\xi = Xw_x$ and $\omega = Yw_y$, respectively.
      Figure 1. Overview of canonical correlation analysis/partial least squares (CCA/PLS) models for investigating brain–behavior associations. CCA/PLS models maximize the correlation (CCA) or covariance (PLS) between latent variables extracted as weighted linear combinations of the brain and behavioral variables (see formulae in text). Note that the weights are column vectors but are represented as rows to highlight that they have the same dimensionality as their respective data modality.
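      As a concrete illustration, the following minimal sketch (not the authors' implementation) fits standard CCA and PLS with scikit-learn and extracts the weights and latent variables described above; the arrays X_brain and Y_behav are hypothetical placeholders for brain and behavioral data.

      import numpy as np
      from sklearn.cross_decomposition import CCA, PLSCanonical

      rng = np.random.default_rng(0)
      X_brain = rng.standard_normal((500, 20))   # hypothetical brain variables (subjects x variables)
      Y_behav = rng.standard_normal((500, 10))   # hypothetical behavioral variables

      for name, model in [("CCA", CCA(n_components=1)), ("PLS", PLSCanonical(n_components=1))]:
          model.fit(X_brain, Y_behav)
          xi, omega = model.transform(X_brain, Y_behav)   # latent variables (canonical variates / scores)
          w_x, w_y = model.x_weights_, model.y_weights_   # weights (canonical vectors / saliences)
          r = np.corrcoef(xi[:, 0], omega[:, 0])[0, 1]    # in-sample correlation between latent variables
          print(name, "in-sample correlation:", round(r, 3))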
      In the PLS literature, the weights are often referred to as saliences and the latent variables as scores. In the CCA literature, the weights are often referred to as canonical vectors, the latent variables as canonical variates, and the correlation between the latent variables as canonical correlations. The brain and behavior weights have the same dimensionality as their respective data modality (e.g., number of brain/behavioral variables) and quantify each brain and behavioral variable’s contribution to the identified association. Sometimes, Pearson's correlations between the brain and behavioral variables and their respective latent variable are presented instead of the model’s weights and are called structure correlations (CCA) (
      • Meredith W.
      Canonical correlations with fallible data.
      ) or loadings (PLS) (
      • Rosipal R.
      • Krämer N.
      Overview and recent advances in partial least squares.
      ) (for details, see the Supplement). The latent variables (one latent variable score per data modality and subject) quantify how the associative effect is expressed across the sample. Table 1 summarizes the different nomenclatures used in the CCA and PLS literature.
      Table 1. Different Nomenclatures in CCA and PLS Literature and Summary of the Corresponding Terms

      Model | Relationship | Model Weights | Latent Variable | Correlation Between Original Variables and Latent Variable
      CCA | Mode/association | Canonical vector/coefficient | Canonical variable/variate | Structure correlation
      PLS | Association | Salience | Score | Loading

      CCA, canonical correlation analysis; PLS, partial least squares.
      While standard CCA refers to a single method, standard PLS refers to a family of methods with different modeling aims (e.g., assuming a symmetric or asymmetric relationship between the 2 data modalities; for details, see the Supplement). Both standard CCA and PLS can be solved by iterative [e.g., alternating least squares (
      • Golub G.H.
      • Zha H.
      Perturbation analysis of the canonical correlations of matrix pairs.
      ), nonlinear iterative PLS (
      • Wegelin J.A.
      A survey on partial least squares (PLS) methods, with emphasis on the two-block case. Technical Report No. 371.
      )] and noniterative [e.g., eigenvalue problem (
      • Uurtio V.
      • Monteiro J.M.
      • Kandola J.
      • Shawe-Taylor J.
      • Fernandez-Reyes D.
      • Rousu J.
      A tutorial on canonical correlation methods.
      ,
      • Rosipal R.
      • Krämer N.
      Overview and recent advances in partial least squares.
      )] methods. In the case of iterative methods, once a pair of weights is obtained, the corresponding associative effect is removed from the data (by a process called deflation) and new associations are sought.
      Because standard CCA maximizes the correlation between the latent variables, it is more sensitive to the direction of the relationships across modalities, and it is not driven by within-modality variances. On the other hand, standard PLS—which maximizes covariance—is less sensitive to the direction of the across-modality relationships, as it is also driven by within-modality variances. Formally, we can see this from the optimization of these models. Standard CCA optimizes correlation across modalities:
      $$\max_{w_x, w_y} \operatorname{corr}(Xw_x, Yw_y) \tag{1}$$


      Standard PLS optimizes covariance across modalities—the product of correlation and standard deviations (i.e., square root of variance):
      $$\max_{w_x, w_y} \operatorname{cov}(Xw_x, Yw_y) = \operatorname{corr}(Xw_x, Yw_y)\,\sqrt{\operatorname{var}(Xw_x)}\,\sqrt{\operatorname{var}(Yw_y)} \tag{2}$$


      This also means that standard CCA and PLS are equivalent optimization problems when $\operatorname{var}(Xw_x) = \operatorname{var}(Yw_y) = 1$, which is true when the within-modality covariance matrices are identity matrices, i.e., $X^TX = Y^TY = I$.
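      A compact way to see the difference between the two objectives is to compute the first pair of weights directly: for PLS they are the leading singular vectors of the cross-covariance matrix $X^TY$, whereas for CCA the cross-covariance is first whitened by the within-modality covariances. The sketch below assumes centered data and full-rank (invertible) within-modality covariance matrices; it is an illustration rather than the solver used in the paper.

      import numpy as np
      from scipy.linalg import sqrtm

      def first_weight_pair(X, Y, method="pls"):
          X = X - X.mean(axis=0)
          Y = Y - Y.mean(axis=0)
          Cxy = X.T @ Y                                    # cross-covariance (up to scaling)
          if method == "cca":
              Kx = np.real(np.linalg.inv(sqrtm(X.T @ X)))  # whitening by within-modality covariance
              Ky = np.real(np.linalg.inv(sqrtm(Y.T @ Y)))
              U, _, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
              w_x, w_y = Kx @ U[:, 0], Ky @ Vt[0]          # weights maximizing correlation
          else:
              U, _, Vt = np.linalg.svd(Cxy)
              w_x, w_y = U[:, 0], Vt[0]                    # weights maximizing covariance
          xi, omega = X @ w_x, Y @ w_y
          return np.corrcoef(xi, omega)[0, 1], np.cov(xi, omega)[0, 1]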

      Limitations of Standard CCA/PLS Models

      When the ratio between the sample size and the number of variables is similar to or smaller than 1, standard CCA/PLS models present limitations. These limitations can exist irrespective of sample size if the number of variables is large or if the variables are highly correlated. In the case of standard CCA, the key limitations are that 1) the optimization is ill-posed (i.e., there is no unique solution) when the number of variables in at least one of the modalities exceeds the sample size and 2) the CCA weights wx and wy are unstable when the variables within one or both modalities are highly correlated, known as the multicollinearity problem (
      • Vounou M.
      • Nichols T.E.
      • Montana G.
      Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach.
      ). Not surprisingly, these limitations might sound familiar, as standard CCA can be viewed as a multivariate extension of the univariate general linear model (
      • Knapp T.R.
      Canonical correlation analysis: A general parametric significance-testing system.
      ,
      • Izenman A.J.
      Reduced-rank regression for the multivariate linear model.
      ). The standard PLS optimization is never ill-posed and copes with multicollinearity [i.e., standard PLS weights are stable (
      • Wegelin J.A.
      A survey on partial least squares (PLS) methods, with emphasis on the two-block case. Technical Report No. 371.
      )]; however, standard PLS and CCA cannot perform feature selection (i.e., setting the weights of some variables to zero) and may therefore have low performance in cases in which the effects are sparse.
      These limitations can be addressed by dimensionality reduction (i.e., PCA) or regularization. Regularization adds further constraints to the optimization to solve an ill-posed problem or prevent overfitting. For CCA/PLS models, the most common forms of regularization are L1-norm (lasso) (
      • Tibshirani R.
      Regression shrinkage and selection via the Lasso.
      ), L2-norm (ridge) (
      • Hoerl A.E.
      • Kennard R.W.
      Ridge regression: Applications to nonorthogonal problems.
      ), and combinations of L1-norm and L2-norm regularization (elastic-net) (
      • Zou H.
      • Hastie T.
      Regularization and variable selection via the elastic net.
      ).

      CCA With PCA Dimensionality Reduction

      PCA transforms one modality of multivariate data into uncorrelated PCs (it is also related to whitening; see Effects of Prewhitening on CCA/PLS Models). PCA is often used as a naïve dimensionality reduction technique: PCs explaining little variance are assumed to be noise and discarded, and the remaining PCs are entered into standard CCA. However, applying PCA before CCA (PCA-CCA) can also be seen as a technique similar to regularization: it makes the CCA model well posed and addresses the multicollinearity problem.
      The number of retained PCs can be selected based on their explained variance, e.g., 99% of total variance. In PCA-CCA applications, the same number of PCs is often chosen for both data modalities, based on the lower-dimensional data, usually behavior [e.g., (
      • Li J.
      • Bolt T.
      • Bzdok D.
      • Nomi J.S.
      • Yeo B.T.T.
      • Spreng R.N.
      • Uddin L.Q.
      Topography and behavioral relevance of the global signal in the human brain.
      ,
      • Bijsterbosch J.D.
      • Woolrich M.W.
      • Glasser M.F.
      • Robinson E.C.
      • Beckmann C.F.
      • Van Essen D.C.
      • et al.
      The relationship between spatial configuration and functional connectivity of brain regions.
      ,
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Alnæs D.
      • Kaufmann T.
      • Marquand A.F.
      • Smith S.M.
      • Westlye L.T.
      Patterns of sociocognitive stratification and perinatal risk in the child brain.
      )]. Sometimes, the same proportion of explained variance—rather than numbers of PCs—is used for both data modalities [e.g., (
      • Modabbernia A.
      • Janiri D.
      • Doucet G.E.
      • Reichenberg A.
      • Frangou S.
      Multivariate patterns of brain-behavior-environment associations in the Adolescent Brain and Cognitive Development Study.
      ,
      • Helmer M.
      • Warrington S.
      • Mohammadi-Nejad A.-R.
      • Lisa J.
      • Howell A.
      • Rosand B.
      • et al.
      On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.
      )]. One problem with discarding PCs with low variance is that there is no guarantee that the PCs with high variance in either modality are the best suited to link the two data modalities, and some discarded PCs might contain useful information. To address this problem, we can use a data-driven approach, selecting the number of PCs that maximizes the correlation across modalities (see CCA With PCA Dimensionality Reduction Versus RCCA in High-Dimensional Data in Part 2).
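      The following sketch illustrates such data-driven selection under simplifying assumptions (the same number of PCs for both modalities, a single training/validation split, out-of-sample correlation as the selection criterion); the function and variable names are illustrative and not the toolbox used in the paper.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.cross_decomposition import CCA
      from sklearn.model_selection import train_test_split

      def pca_cca_validation_corr(X, Y, n_pcs, seed=0):
          X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.2, random_state=seed)
          pca_x = PCA(n_components=n_pcs).fit(X_tr)        # fit PCA on the training data only
          pca_y = PCA(n_components=n_pcs).fit(Y_tr)
          cca = CCA(n_components=1).fit(pca_x.transform(X_tr), pca_y.transform(Y_tr))
          xi, omega = cca.transform(pca_x.transform(X_val), pca_y.transform(Y_val))
          return np.corrcoef(xi[:, 0], omega[:, 0])[0, 1]  # out-of-sample (validation) correlation

      # pick the number of PCs maximizing the validation correlation, e.g.:
      # best_n_pcs = max([5, 10, 20, 50], key=lambda k: pca_cca_validation_corr(X, Y, k))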

      Regularized CCA

      L2-norm regularization is a popular form of regularization for ill-posed problems or for mitigating the effects of multicollinearity, originally used in ridge regression (
      • Hoerl A.E.
      • Kennard R.W.
      Ridge regression: Applications to nonorthogonal problems.
      ). In L2-norm regularization, the added constraint corresponds to the sum of squares of all weight values,
      $$\text{L2-norm:}\quad \|w\|_2 = \sqrt{\textstyle\sum_i w_i^2}, \quad \text{where } w = (w_1, w_2, \ldots, w_n) \text{ is a vector of size } n,$$
      which forces the weights to be small but does not make them zero. L2-norm regularization has been proposed for CCA (
      • Vinod H.D.
      Canonical ridge and econometrics of joint production.
      ), commonly referred to as regularized CCA (RCCA) (
      • Rosipal R.
      • Krämer N.
      Overview and recent advances in partial least squares.
      ,
      • Hardoon D.R.
      • Szedmak S.
      • Shawe-Taylor J.
      Canonical correlation analysis: An overview with application to learning methods.
      ,
      • Tenenhaus A.
      • Tenenhaus M.
      Regularized generalized canonical correlation analysis.
      ,
      • Tuzhilina E.
      • Tozzi L.
      • Hastie T.
      Canonical correlation analysis in high dimensions with structured regularization.
      ). Interestingly, in RCCA, the regularization terms added to the CCA problem lead to a mixture of standard CCA and standard PLS optimization. We can see this from the RCCA optimization problem:
      $$\max_{w_x, w_y} \frac{\operatorname{corr}(Xw_x, Yw_y)\,\sqrt{\operatorname{var}(Xw_x)}\,\sqrt{\operatorname{var}(Yw_y)}}{\sqrt{(1-c_x)\operatorname{var}(Xw_x) + c_x}\,\sqrt{(1-c_y)\operatorname{var}(Yw_y) + c_y}} \tag{3}$$


      where the 2 hyperparameters ($c_x$, $c_y$) control the amount of regularization and provide a smooth transition between standard CCA ($c_x = c_y = 0$, not regularized) and standard PLS ($c_x = c_y = 1$, most regularized) (
      • Rosipal R.
      • Krämer N.
      Overview and recent advances in partial least squares.
      ,
      • Hardoon D.R.
      • Szedmak S.
      • Shawe-Taylor J.
      Canonical correlation analysis: An overview with application to learning methods.
      ). Importantly, as L2-norm regularization mitigates multicollinearity, it increases the stability of the RCCA weights. However, it also means that similar to standard PLS, RCCA can be driven by within-modality variances. For additional connections between standard CCA, RCCA, standard PLS, and how they are related to PCA-CCA, see the Supplement.
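      A minimal sketch of an RCCA solution is given below, under the assumption that the regularized within-modality covariance matrices are well conditioned: the cross-covariance is whitened with the regularized covariances, so that c_x = c_y = 0 recovers the standard CCA weights and c_x = c_y = 1 recovers the standard PLS weights (scaling conventions vary across implementations).

      import numpy as np
      from scipy.linalg import sqrtm

      def rcca_first_pair(X, Y, c_x=0.5, c_y=0.5):
          X = X - X.mean(axis=0)
          Y = Y - Y.mean(axis=0)
          Mx = (1 - c_x) * (X.T @ X) + c_x * np.eye(X.shape[1])   # regularized within-modality covariance
          My = (1 - c_y) * (Y.T @ Y) + c_y * np.eye(Y.shape[1])
          Kx = np.real(np.linalg.inv(sqrtm(Mx)))
          Ky = np.real(np.linalg.inv(sqrtm(My)))
          U, _, Vt = np.linalg.svd(Kx @ (X.T @ Y) @ Ky)
          return Kx @ U[:, 0], Ky @ Vt[0]                          # first pair of RCCA weights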

      Sparse PLS

      L1-norm regularization was originally proposed in lasso regression (
      • Tibshirani R.
      Regression shrinkage and selection via the Lasso.
      ). In L1-norm regularization, the added constraint corresponds to the absolute sum of weight values,
      $$\text{L1-norm:}\quad \|w\|_1 = \textstyle\sum_i |w_i|, \quad \text{where } w = (w_1, w_2, \ldots, w_n) \text{ is a vector of size } n,$$
      which sets some of the weight values to zero, resulting in variable selection and promoting sparsity. Sparse solutions facilitate the interpretability of the model and may improve performance when only a subset of variables is relevant (
      • Tibshirani R.
      Regression shrinkage and selection via the Lasso.
      ). However, sparsity can also introduce instability to the model if different sets of variables provide similar performance. Elastic-net regularization is a mixture of L1-norm and L2-norm regularization that combines the properties of both forms of regularization and can mitigate the instability of L1-norm regularization (
      • Zou H.
      • Hastie T.
      Regularization and variable selection via the elastic net.
      ). In one popular algorithm (
      • Witten D.M.
      • Tibshirani R.
      • Hastie T.
      A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.
      ), which we will refer to as sparse PLS (SPLS), hyperparameters control the amount of L1-norm regularization or sparsity. Because standard PLS can be seen as CCA with maximal L2-norm regularization (see previous section), SPLS can also be viewed as an elastic-net regularized CCA (for details, see the Supplement).
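      The sketch below conveys the core idea of sparsity-inducing regularization through alternating, soft-thresholded weight updates. It is a simplified illustration of an SPLS-style algorithm, not the exact penalized matrix decomposition of Witten et al.; the thresholds lam_x and lam_y are hypothetical hyperparameters controlling sparsity.

      import numpy as np

      def soft_threshold(a, lam):
          return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)     # sets small entries exactly to zero

      def spls_first_pair(X, Y, lam_x=0.1, lam_y=0.1, n_iter=100):
          X = X - X.mean(axis=0)
          Y = Y - Y.mean(axis=0)
          Cxy = X.T @ Y
          w_y = np.ones(Y.shape[1]) / np.sqrt(Y.shape[1])          # simple initialization
          for _ in range(n_iter):
              w_x = soft_threshold(Cxy @ w_y, lam_x)
              w_x /= np.linalg.norm(w_x) + 1e-12
              w_y = soft_threshold(Cxy.T @ w_x, lam_y)
              w_y /= np.linalg.norm(w_y) + 1e-12
          return w_x, w_y                                          # sparse weight vectors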

      Effects of Prewhitening on CCA/PLS Models

      In machine learning, data are often whitened as a preprocessing step. Whitening transforms the original variables into new, uncorrelated features, which are normalized to have unit length (i.e., the L2-norm of each feature equals 1). Whitening is not a unique transformation, and the most commonly used forms are PCA, Mahalanobis whitening, and Cholesky whitening (
      • Kessy A.
      • Lewin A.
      • Strimmer K.
      Optimal whitening and decorrelation.
      ). The critical difference between PCA and PCA whitening is that PCA retains the variance of the original data, i.e., the PCs are not normalized to have unit length.
      Whitening as a preprocessing step has a major drawback in CCA/PLS models: the beneficial effects of L1-norm and L2-norm regularization on the original variables can no longer be achieved, as the whitened data are the new inputs of the model. In the case of SPLS, L1-norm regularization will result in sparsity on the whitened variables (instead of the original variables); thus, the interpretability of the results will not be improved. In the case of RCCA, L2-norm regularization has no effect on whitened data (the within-modality covariance is already the identity), which means that standard CCA, RCCA, and standard PLS will yield the same results. For additional details on whitening, see the Supplement.
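      A small sketch of this point, using the economy singular value decomposition as one way to whiten a single data modality: the whitened features are orthonormal, so the within-modality covariance becomes the identity and L2-norm regularization no longer changes the solution.

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))   # correlated variables
      Xc = X - X.mean(axis=0)
      U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
      X_white = U                                    # whitened data (unit-length, uncorrelated features)
      print(np.allclose(X_white.T @ X_white, np.eye(X_white.shape[1]))) # True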

      Analytic Frameworks for CCA/PLS Models

      The statistical significance of the CCA/PLS model (i.e., the number of significant associative effects) can be evaluated using either a descriptive or a predictive (also referred to as a machine learning) framework. The 2 frameworks have distinct goals: The aim of the descriptive framework is to detect above-chance associations in the current dataset, whereas the aim of the predictive framework is to test whether such associations generalize to new data (
      • Kessy A.
      • Lewin A.
      • Strimmer K.
      Optimal whitening and decorrelation.
      ,
      • Shmueli G.
      To explain or to predict?.
      ,
      • Bzdok D.
      • Engemann D.
      • Thirion B.
      ,
      • Arbabshirani M.R.
      • Plis S.
      • Sui J.
      • Calhoun V.D.
      Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls.
      ,
      • Abdi H.
      ).
      In the descriptive framework (Figure 2A), the CCA/PLS model is fitted on the entire sample; thus, the statistical inference is based on in-sample correlation. In this framework, there is usually no hyperparameter optimization (i.e., the number of PCs or regularization parameter is fixed a priori). In the predictive framework (Figure 2B), the CCA/PLS model is fitted on a training/optimization set and evaluated on a test/holdout set; thus, the statistical inference is based on out-of-sample correlation. This procedure assesses the generalizability of the model, i.e., how well the association found in the training set generalizes to an independent test set. In the predictive framework, the hyperparameters are usually optimized; therefore, the training/optimization set is further divided into a training set and a validation set, and the best hyperparameters are selected based on out-of-sample correlation in the validation set. In both descriptive and predictive frameworks, permutation inference (based on in-sample or out-of-sample correlation) is often used to assess the number of significant associative effects (
      • Abdi H.
      ,
      • Winkler A.M.
      • Renaud O.
      • Smith S.M.
      • Nichols T.E.
      Permutation inference for canonical correlation analysis.
      ).
      Figure 2. Descriptive and predictive (or machine learning) frameworks. (A) The descriptive framework fits the canonical correlation analysis/partial least squares (CCA/PLS) model with fixed hyperparameters (i.e., the number of principal components or regularization parameter) on the entire sample; thus, the statistical inference is based on in-sample correlation. (B) The predictive (or machine learning) framework fits the CCA/PLS model on a training set and evaluates the model on a test set; thus, the statistical inference is based on out-of-sample correlation. The hyperparameters are usually optimized: the training set is further divided into a training set and a validation set, and the best hyperparameters are selected based on out-of-sample correlation in the validation set. We note that although not all models maximize correlation, all CCA/PLS models are typically evaluated based on the correlation between the latent variables.
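      The sketch below illustrates permutation inference on the out-of-sample correlation under strong simplifications (a single train/test split, one associative effect, rows of one training modality permuted to break the brain–behavior link); it ignores the family-structure constraints required for data such as the HCP.

      import numpy as np
      from sklearn.cross_decomposition import PLSCanonical
      from sklearn.model_selection import train_test_split

      def out_of_sample_corr(X_tr, Y_tr, X_te, Y_te):
          model = PLSCanonical(n_components=1).fit(X_tr, Y_tr)
          xi, omega = model.transform(X_te, Y_te)
          return np.corrcoef(xi[:, 0], omega[:, 0])[0, 1]

      def permutation_pvalue(X, Y, n_perm=1000, seed=0):
          rng = np.random.default_rng(seed)
          X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=seed)
          observed = out_of_sample_corr(X_tr, Y_tr, X_te, Y_te)
          null = [out_of_sample_corr(X_tr, Y_tr[rng.permutation(len(Y_tr))], X_te, Y_te)
                  for _ in range(n_perm)]
          return (1 + np.sum(np.array(null) >= observed)) / (n_perm + 1)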
      Last, an important component of any CCA/PLS framework is testing the stability of the model. Usually a bootstrapping procedure is applied to provide confidence intervals on the model’s weights (
      • Abdi H.
      ). Recently, stability selection (
      • Lin D.
      • Calhoun V.D.
      • Wang Y.P.
      Correspondence between fMRI and SNP data by group sparse canonical correlation analysis.
      ,
      • Ing A.
      • Sämann P.G.
      • Chu C.
      • Tay N.
      • Biondo F.
      • Robert G.
      • et al.
      Identification of neurobehavioural symptom groups based on shared brain mechanisms.
      ,
      • Lê Cao K.-A.
      • Boitard S.
      • Besse P.
      Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.
      ,
      • Labus J.S.
      • Van Horn J.D.
      • Gupta A.
      • Alaverdyan M.
      • Torgerson C.
      • Ashe-McNalley C.
      • et al.
      Multivariate morphological brain signatures predict patients with chronic abdominal pain from healthy control subjects.
      ,
      • Olson Hunt M.J.
      • Weissfeld L.
      • Boudreau R.M.
      • Aizenstein H.
      • Newman A.B.
      • Simonsick E.M.
      • et al.
      A variant of sparse partial least squares for variable selection and data exploration.
      ) has been proposed with the aim of selecting the most stable CCA/PLS model in the first place, rather than evaluating the stability of the model post hoc. Alternatively, the stability of the CCA/PLS models can be measured as the average similarity of weights across different splits of training data, which avoids the additional computational costs of the previous 2 approaches (
      • Mihalik A.
      • Ferreira F.S.
      • Moutoussis M.
      • Ziegler G.
      • Adams R.A.
      • Rosa M.J.
      • et al.
      Multiple holdouts with stability: Improving the generalizability of machine learning analyses of brain–behavior relationships.
      ). For more details on analytic frameworks, see for example (
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Mihalik A.
      • Ferreira F.S.
      • Moutoussis M.
      • Ziegler G.
      • Adams R.A.
      • Rosa M.J.
      • et al.
      Multiple holdouts with stability: Improving the generalizability of machine learning analyses of brain–behavior relationships.
      ,
      • Monteiro J.M.
      • Rao A.
      • Shawe-Taylor J.
      • Mourão-Miranda J.
      A multiple hold-out framework for Sparse partial least squares.
      ,
      • Abdi H.
      ).
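      A minimal sketch of the split-based stability measure mentioned above: the average pairwise Pearson correlation between weight vectors estimated on different training splits. Taking the absolute value to handle the arbitrary sign of CCA/PLS weights is an assumption of this sketch, not a prescription from the paper; weights_per_split is a hypothetical list of weight vectors.

      import numpy as np
      from itertools import combinations

      def weight_stability(weights_per_split):
          # weights_per_split: list of 1D weight vectors, one per training split
          sims = [abs(np.corrcoef(weights_per_split[i], weights_per_split[j])[0, 1])
                  for i, j in combinations(range(len(weights_per_split)), 2)]
          return float(np.mean(sims))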

      Part 2: Demonstrations of CCA and PLS Analyses

      Description of Experiments

      In order to demonstrate the properties of different CCA and PLS approaches, we applied the models introduced in Part 1 to real and simulated datasets with different dimensionalities and sample sizes. Table 2 gives an overview of all experiments.
      Table 2. Summary of CCA/PLS Models on High- and Low-Dimensional Real and Simulated Data

      Model | Analytical Framework | Hyperparameter Optimization | Model Hyperparameter

      High-Dimensional Data
      PCA-CCA | Descriptive | None (fixed) | Number of PCs
      PCA-CCA | Predictive | None (fixed) | Number of PCs
      PCA-CCA | Predictive | Data-driven | Number of PCs
      RCCA | Predictive | Data-driven | Amount of L2-norm regularization
      Standard PLS | Predictive | None | None
      SPLS | Predictive | Data-driven | Amount of L1-norm regularization

      Low-Dimensional Data
      Standard CCA | Predictive | None | None
      RCCA | Predictive | Data-driven | Amount of L2-norm regularization
      Standard PLS | Predictive | None | None
      SPLS | Predictive | Data-driven | Amount of L1-norm regularization

      CCA, canonical correlation analysis; PC, principal component; PCA, principal component analysis; PLS, partial least squares; RCCA, regularized canonical correlation analysis; SPLS, sparse partial least squares.
      We chose the HCP and the ADNI datasets based on 2 recent landmark studies (
      • Smith S.M.
      • Nichols T.E.
      • Vidaurre D.
      • Winkler A.M.
      • Behrens T.E.J.
      • Glasser M.F.
      • et al.
      A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
      ,
      • Monteiro J.M.
      • Rao A.
      • Shawe-Taylor J.
      • Mourão-Miranda J.
      A multiple hold-out framework for Sparse partial least squares.
      ). In the HCP dataset, we used resting-state functional magnetic resonance imaging connectivity data (19,900 and 300 brain variables in the high- and low-dimensional data, respectively) and 145 nonimaging subject measures (e.g., behavioral, demographic, lifestyle measures) of 1003 healthy subjects. In the ADNI dataset, we used whole-brain gray matter volumes (168,130 and 120 brain variables in the high- and low-dimensional data, respectively) and 31 item-level measures of the Mini-Mental State Examination of 592 elderly subjects. We generated the simulated data with a sparse signal (i.e., 10% of the variables in each modality were relevant to capture the association across modalities) and properties similar to the HCP dataset (in terms of sample size, dimensionality, and correlation between latent variables). Table 3 displays the characteristics of the real and simulated datasets. For further details of the datasets and the simulated data generation, see the Supplement.
      Table 3. Characteristics of Real and Simulated Datasets

      HCP
      Subjects (low- and high-dimensional data): Healthy (n = 1001)
      Brain variables, low-dimensional: connectivity of 25 ICA components (a) (d = 300)
      Brain variables, high-dimensional: connectivity of 200 ICA components (a) (d = 19900)
      Behavioral variables (low- and high-dimensional data): behavior, psychometrics, demographics (d = 145)

      ADNI
      Subjects (low- and high-dimensional data): Healthy + clinical (n = 592)
      Brain variables, low-dimensional: ROI-wise (b) gray matter volume (d = 120)
      Brain variables, high-dimensional: voxelwise gray matter volume (d = 168130)
      Behavioral variables (low- and high-dimensional data): items of the MMSE (c) questionnaire (d = 31)

      Simulation
      Subjects (low- and high-dimensional data): not applicable (n = 1000)
      Brain variables, low-dimensional: not applicable (d = 100)
      Brain variables, high-dimensional: not applicable (d = 20000)
      Behavioral variables (low- and high-dimensional data): not applicable (d = 100)

      ADNI, Alzheimer’s Disease Neuroimaging Initiative; d, number of variables; HCP, Human Connectome Project; ICA, independent component analysis; ROI, region of interest; MMSE, Mini-Mental State Examination.
      a Data-driven brain parcellation.
      b Brain parcellation using the Automated Anatomical Labeling 2 atlas (Rolls E.T., Joliot M., Tzourio-Mazoyer N., Implementation of a new parcellation of the orbitofrontal cortex in the automated anatomical labeling atlas).
      c Screening questionnaire for dementia (Folstein M.F., Folstein S.E., McHugh P.R., “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician).
      The PCA-CCA model was used both with fixed numbers of PCs within a descriptive framework and with an optimized number of PCs within a predictive framework. All the other CCA/PLS models were used within a predictive framework. The predictive framework was based on Monteiro et al. (
      • Monteiro J.M.
      • Rao A.
      • Shawe-Taylor J.
      • Mourão-Miranda J.
      A multiple hold-out framework for Sparse partial least squares.
      ), who used multiple test/holdout sets to assess the generalizability and robustness of the CCA/PLS models (detailed in the Supplement). In both frameworks, permutation testing was used to assess the number of statistically significant associative effects based on in-sample and out-of-sample correlations between the latent variables, respectively. Importantly, the family structure of the HCP dataset was respected during the different data splits (training, validation, test/holdout sets) and permutations (
      • Winkler A.M.
      • Webster M.A.
      • Vidaurre D.
      • Nichols T.E.
      • Smith S.M.
      Multi-level block permutation.
      ). We used iterative methods to solve the CCA/PLS model and applied mode-A deflation for standard PLS and SPLS and generalized deflation for standard CCA, PCA-CCA, and RCCA (for details, see the Supplement). For simplicity, we present the results for the first associative effect in most CCA/PLS experiments (for a summary of all associative effects, see Table S1). Throughout the article, we present the weights (canonical vector for CCA models, salience for PLS models) and latent variables obtained by the model.
      We used linear mixed-effects models to compare the different CCA/PLS models on the following measures across the outer training or test sets: 1) in-sample correlation, 2) out-of-sample correlation, 3) similarity of the model weights (measured by Pearson’s correlation), and 4) variance explained by the model. In addition, we compared the number of PCs between PCA-CCA models with fixed versus data-driven numbers of PCs. We report significance at p < .005 in all linear mixed-effects models. For further details of the linear mixed-effects analyses, see the Supplement. We also quantified the rank similarity of the weights (measured by Spearman’s correlation) across the different CCA/PLS models in the real datasets.

      In-Sample Versus Out-of-Sample Correlation in High-Dimensional Data

      Figure 3 and Table 4 display the in-sample and out-of-sample correlations for all experiments using all 3 high-dimensional datasets. On average, the out-of-sample correlations are lower than the in-sample correlations (t14 = 4.51, p = .0005). In real datasets, CCA/PLS models with dimensionality reduction or regularization provide high out-of-sample correlations in most cases, underlining that these models generalize well to unseen data. The only notable exceptions are standard PLS and SPLS, which presented significantly lower out-of-sample correlations in the HCP dataset (F2,56 = 289.30, p < .0001) (Figure 3B). This can be attributed to the different properties of the HCP dataset (e.g., higher noise level and nonsparse associative effect) and the fact that standard PLS and SPLS are especially dominated by within-modality variance in this dataset (Table 4).
      Figure 3. Dot plot of in-sample and out-of-sample correlations for the first associative effects of all experiments in all 3 high-dimensional datasets. Each dot represents a model trained on the overall data (descriptive framework) or on 10 random subsets of the data (predictive framework). The horizontal jitter is for visualization purposes. (A) High-dimensional Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. (B) High-dimensional Human Connectome Project (HCP) dataset. Note that we display the second associative effect for standard partial least squares (PLS) and sparse PLS (SPLS), as it is the most similar to the first associative effects identified by the other models. (C) High-dimensional simulated dataset. CCA, canonical correlation analysis; data-driven, data-driven number of principal components; DESC, descriptive framework; PC, principal component; PCA, principal component analysis; PRED, predictive framework; RCCA, regularized canonical correlation analysis.
      Table 4. Main Characteristics of the First Associative Effects in the High-Dimensional Datasets Obtained With the Different CCA/PLS Models Using the Predictive Framework

      Model | Brain: Stability of Weights (a) | Brain: Explained Variance (b) | Behavior: Stability of Weights (a) | Behavior: Explained Variance (b) | In-Sample Correlation (c) | Out-of-Sample Correlation (d)

      ADNI Dataset
      PCA-CCA (Fixed PCs) | 0.86 ± 0.00 | 8.47 ± 0.16 | 0.85 ± 0.01 | 14.91 ± 0.23 | 0.70 ± 0.00 | 0.55 ± 0.01
      PCA-CCA (Data-Driven PCs) | 0.70 ± 0.01 | 5.26 ± 0.25 | 0.93 ± 0.00 | 15.73 ± 0.13 | 0.83 ± 0.01 | 0.65 ± 0.01
      RCCA (L2-Reg. Opt.) | 0.82 ± 0.00 | 5.47 ± 0.06 | 0.94 ± 0.00 | 16.63 ± 0.26 | 0.98 ± 0.00 | 0.66 ± 0.01
      Standard PLS | 0.96 ± 0.00 | 21.54 ± 0.16 | 0.94 ± 0.00 | 18.64 ± 0.21 | 0.44 ± 0.00 | 0.43 ± 0.01
      SPLS (L1-Reg. Opt.) | 0.83 ± 0.02 | 14.05 ± 0.13 | 0.96 ± 0.01 | 15.86 ± 0.42 | 0.60 ± 0.00 | 0.61 ± 0.01

      HCP Dataset
      PCA-CCA (Fixed PCs) | 0.72 ± 0.01 | 0.42 ± 0.01 | 0.78 ± 0.01 | 2.67 ± 0.10 | 0.76 ± 0.00 | 0.47 ± 0.02
      PCA-CCA (Data-Driven PCs) | 0.56 ± 0.02 | 0.35 ± 0.03 | 0.53 ± 0.04 | 3.73 ± 0.39 | 0.76 ± 0.01 | 0.45 ± 0.03
      RCCA (L2-Reg. Opt.) | 0.78 ± 0.01 | 0.29 ± 0.01 | 0.88 ± 0.01 | 4.39 ± 0.18 | 1.00 ± 0.00 | 0.52 ± 0.02
      Standard PLS-2 | 0.52 ± 0.04 | 0.50 ± 0.05 | 0.62 ± 0.05 | 8.07 ± 0.30 | 0.79 ± 0.02 | 0.21 ± 0.02
      SPLS-2 (L1-Reg. Opt.) | 0.25 ± 0.04 | 0.48 ± 0.07 | 0.51 ± 0.05 | 7.23 ± 0.37 | 0.64 ± 0.04 | 0.25 ± 0.03

      Simulated Dataset
      PCA-CCA (Fixed PCs) | 0.74 ± 0.01 | 0.76 ± 0.01 | 0.90 ± 0.00 | 1.82 ± 0.01 | 0.80 ± 0.00 | 0.67 ± 0.01
      PCA-CCA (Data-Driven PCs) | 0.96 ± 0.00 | 0.85 ± 0.00 | 0.91 ± 0.00 | 1.95 ± 0.02 | 0.73 ± 0.01 | 0.70 ± 0.01
      RCCA (L2-Reg. Opt.) | 0.93 ± 0.00 | 0.77 ± 0.00 | 0.97 ± 0.00 | 1.99 ± 0.01 | 0.83 ± 0.01 | 0.71 ± 0.01
      Standard PLS | 0.94 ± 0.00 | 0.84 ± 0.00 | 0.97 ± 0.00 | 2.07 ± 0.01 | 0.81 ± 0.00 | 0.71 ± 0.01
      SPLS (L1-Reg. Opt.) | 0.78 ± 0.03 | 0.84 ± 0.00 | 1.00 ± 0.00 | 1.94 ± 0.01 | 0.79 ± 0.01 | 0.73 ± 0.01

      Values are mean ± SEM. Note that we display the second associative effect for standard PLS (PLS-2) and SPLS (SPLS-2) in the HCP dataset, as it is the most similar to the first associative effects identified by the other models.
      ADNI, Alzheimer’s Disease Neuroimaging Initiative; CCA, canonical correlation analysis; HCP, Human Connectome Project; L1-reg., L1-norm regularization; L2-reg., L2-norm regularization; opt., optimized; PC, principal component; PCA, principal component analysis; PLS, partial least squares; RCCA, regularized canonical correlation analysis; SPLS, sparse partial least squares.
      a Similarity of model weights measured by Pearson’s correlation between each pair of training sets of the outer data splits.
      b Percent variance explained by the model relative to all within-modality variance in the training sets of the outer data splits.
      c Correlation between the latent variables in the training sets of the outer data splits.
      d Correlation between the latent variables in the test sets of the outer data splits.
      In conclusion, we recommend embedding all models in a predictive framework that splits the data into training and test sets to assess the model’s out-of-sample generalizability.
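      To make the predictive framework concrete, the following minimal sketch (in Python with scikit-learn, not the authors’ MATLAB toolkit) fits a two-view model on a training split and compares the in-sample correlation with the out-of-sample correlation on held-out data. The synthetic data and variable names are illustrative assumptions.

```python
# Minimal sketch of the predictive framework: fit a CCA/PLS-type model on a
# training split and report in-sample vs. out-of-sample latent-variable correlations.
# X_brain / Y_behav are synthetic, illustrative data, not the paper's datasets.
import numpy as np
from scipy.stats import pearsonr
from sklearn.cross_decomposition import PLSCanonical
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p, q = 500, 100, 20
shared = rng.standard_normal((n, 1))                   # latent signal shared by both modalities
X_brain = shared @ rng.standard_normal((1, p)) + rng.standard_normal((n, p))
Y_behav = shared @ rng.standard_normal((1, q)) + rng.standard_normal((n, q))

X_tr, X_te, Y_tr, Y_te = train_test_split(X_brain, Y_behav, test_size=0.2, random_state=0)

model = PLSCanonical(n_components=1).fit(X_tr, Y_tr)
t_tr, u_tr = model.transform(X_tr, Y_tr)               # latent variables on the training data
t_te, u_te = model.transform(X_te, Y_te)               # latent variables on the held-out data

print("in-sample correlation:", pearsonr(t_tr[:, 0], u_tr[:, 0])[0])
print("out-of-sample correlation:", pearsonr(t_te[:, 0], u_te[:, 0])[0])
```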

      CCA With PCA Dimensionality Reduction Versus RCCA in High-Dimensional Data

      In this section, we present the results of applying PCA-CCA and RCCA to all 3 high-dimensional datasets. We focus on experiments using the predictive framework, compare PCA-CCA with fixed versus data-driven numbers of PCs, and compare both of these models with RCCA.
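      As a rough illustration of the two regularization routes compared in this section, the sketch below (Python/NumPy/scikit-learn; not the toolkit implementation) computes a first weight pair either by PCA followed by standard CCA on the principal component scores, or by RCCA, in which the regularization parameters cx and cy interpolate between the CCA (0) and PLS (1) objectives. The function names and defaults are illustrative assumptions.

```python
# Minimal sketch of PCA-CCA (PCA on each modality, then standard CCA on the scores)
# and RCCA (ridge-regularized within-modality covariances); not the toolkit code.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def rcca_first_pair(X, Y, cx=0.5, cy=0.5):
    """First RCCA weight pair; cx = cy = 0 recovers CCA, cx = cy = 1 recovers PLS."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx, Cyy, Cxy = X.T @ X / (n - 1), Y.T @ Y / (n - 1), X.T @ Y / (n - 1)
    Rx = inv_sqrt((1 - cx) * Cxx + cx * np.eye(X.shape[1]))   # regularized whitening
    Ry = inv_sqrt((1 - cy) * Cyy + cy * np.eye(Y.shape[1]))
    U, s, Vt = np.linalg.svd(Rx @ Cxy @ Ry)
    return Rx @ U[:, 0], Ry @ Vt[0]                           # weights in the original variable space

def pca_cca_first_pair(X, Y, n_pcs_x=10, n_pcs_y=10):
    """PCA on each modality followed by standard CCA on the PC scores."""
    pca_x, pca_y = PCA(n_pcs_x).fit(X), PCA(n_pcs_y).fit(Y)
    cca = CCA(n_components=1).fit(pca_x.transform(X), pca_y.transform(Y))
    # Map the first CCA weight vector back from PC space to the original variables
    wx = pca_x.components_.T @ cca.x_weights_[:, 0]
    wy = pca_y.components_.T @ cca.y_weights_[:, 0]
    return wx, wy
```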
      Figures 4A–C and 5A–C display the brain and behavioral weights and corresponding latent variables for the 3 models (note that for the HCP dataset, the brain weights were transformed into brain connection strength increases/decreases). Figure 6 compares the brain and behavioral weights using rank similarity across the models, which indicates that although the weights are similar across the 3 models, data-driven PCA-CCA and RCCA are the most similar to each other. The model weights and latent variables for the simulated dataset can be found in Figure 7A–C, which suggests that all 3 models recovered the true weights of the generative model reasonably well. Nevertheless, the nonsparse models assigned nonzero weights to many irrelevant variables (for details, see Table S2).
      Figure 4Brain weights (left), behavioral weights (middle), and latent variables (right) for the high-dimensional Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. For visualization purposes, the model weights are normalized (divided by largest absolute value). The scatter plot between the brain and behavioral latent variables is overlaid by a least-squares regression line separately for the training and test data. (A) Principal component analysis–canonical correlation analysis (PCA-CCA) with fixed number of principal components (PCs). (B) PCA-CCA with data-driven number of PCs. (C) Regularized CCA (RCCA). (D) Standard partial least squares (PLS). (E) Sparse PLS (SPLS). corrtest, out-of-sample correlation in the test data; corrtraining, in-sample correlation in the training data; L, left hemisphere; R, right hemisphere.
      Figure 5Brain connection strengths (left), behavioral weights (text in the middle), and latent variables (right) for the high-dimensional Human Connectome Project (HCP) dataset. For visualization purposes, the brain weights were transformed into brain connection strength (i.e., brain weights multiplied by the sign of the population mean connectivity) increases (red) and decreases (blue), summed across the brain nodes (i.e., independent component analysis [ICA] components, with each brain vertex assigned to the ICA component to which it most likely belongs) and normalized (divided by largest absolute value). Only the top 15 positive (red) and top 15 negative (blue) behavioral weights are shown (secondary [e.g., age adjusted] measures that are highly redundant with those shown here are not displayed). The behavioral model weights are normalized (divided by largest absolute value). The scatter plot between the brain and behavioral latent variables is overlaid by a least-squares regression line separately for the training and test data. (A) Principal component analysis–canonical correlation analysis (PCA-CCA) with fixed number of principal components (PCs). (B) PCA-CCA with data-driven number of PCs. (C) Regularized CCA (RCCA). (D) Standard partial least squares (PLS). (E) Sparse PLS (SPLS). ASR, Achenbach Adult Self Report; AUC, area under curve; corrtest, out-of-sample correlation in the test data; corrtraining, in-sample correlation in the training data; CPT, continuous performance test; L, left hemisphere; R, right hemisphere; THC, Δ9-tetrahydrocannabinol.
      Figure 6Comparison of brain weights (left) and behavioral weights (right) across canonical correlation analysis/partial least squares (CCA/PLS) models for the high-dimensional Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Human Connectome Project (HCP) datasets obtained by the predictive framework. The similarity between the model weights was measured by Spearman’s correlation. The similarity between sparse PLS (SPLS) and the other models was measured only for the subset of variables identified by SPLS (the similarity between the 2 SPLS models was measured for the subset of variables that were present in both models). (A) High-dimensional ADNI dataset. (B) High-dimensional HCP dataset. Note that the second associative effect identified by standard PLS (PLS-2) and SPLS (SPLS-2) was similar to the first associative effects identified by the other models. PC, principal component; PCA, principal component analysis; RCCA, regularized canonical correlation analysis.
      Figure 7Model weights (left: high-dimensional modality; middle: low-dimensional modality) and latent variables (right) for the high-dimensional simulated dataset. For comparison, the true weights (red) of the generative model are overlaid on the model weights (blue). For visualization purposes, the model weights are normalized (divided by largest value), and only a subset of 100 random weights (out of the total 20,000) is displayed for the high-dimensional modality. The scatter plot between the brain and behavioral latent variables is overlaid by a least-squares regression line separately for the training and test data. (A) Principal component analysis–canonical correlation analysis (PCA-CCA) with fixed number of principal components (PCs). (B) PCA-CCA with data-driven number of PCs. (C) Regularized CCA (RCCA). (D) Standard partial least squares (PLS). (E) Sparse PLS (SPLS). corrtest, out-of-sample correlation in the test data; corrtraining, in-sample correlation in the training data.
      To further investigate the characteristics of the 3 models, Table 4 shows the stability of weights and the explained variance by the models. The stability of weights varied significantly across brain and behavior modalities (F1,804 = 84.51, p < .0001) and models (F2,804 = 91.63, p < .0001). Notably, the stability of RCCA weights was consistently high. The explained variance varied significantly only across modalities (F1,174 = 241.55, p < .0001) but not across models (F2,174 = 0.31, p = .7303).
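      For readers who wish to reproduce measures of this kind, the sketch below shows one common way to compute them (the exact toolkit conventions may differ): weight stability as the mean pairwise Pearson correlation of weight vectors across the outer training splits, and explained variance as the percent of within-modality variance captured by the latent variable.

```python
# Sketch of the two summary measures reported in Table 4 (one common formulation,
# assumed here for illustration): weight stability across splits and the percent of
# within-modality variance explained by a latent variable.
import numpy as np
from itertools import combinations

def weight_stability(weight_list):
    """Mean Pearson correlation between all pairs of weight vectors (one per training split)."""
    pairs = combinations(weight_list, 2)
    return np.mean([np.corrcoef(w1, w2)[0, 1] for w1, w2 in pairs])

def explained_variance(X, w):
    """Percent within-modality variance of X captured by the latent variable t = Xw."""
    X = X - X.mean(0)
    t = X @ w
    loadings = X.T @ t / (t @ t)          # least-squares loadings of X on t
    # Rank-1 reconstruction t * loadings' accounts for this fraction of total variance
    return 100 * np.sum(np.outer(t, loadings) ** 2) / np.sum(X ** 2)
```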
      Next, we examined the number of PCs in the 2 PCA-CCA models. We found a significant interaction between the effect of data modality and model on the number of PCs (F1,114 = 22.63, p < .0001). Data-driven PCA-CCA yielded more brain PCs and fewer behavioral PCs than PCA-CCA with fixed number of PCs (Table S3). These results confirm that lower-ranked brain PCs might also carry information that links brain and behavior and should not necessarily be discarded. Moreover, fixing the same number of PCs for both modalities might not be a good choice.
      Based on these results, and as the optimal numbers of PCs can vary even across different brain–behavior associations in the same dataset, we recommend data-driven PCA-CCA over PCA-CCA with fixed numbers of PCs. Furthermore, we found that data-driven PCA-CCA and RCCA gave similar results, both having a similar regularizing effect on the CCA model.

      Sparse Versus Nonsparse CCA/PLS Models in High-Dimensional Data

      In this section, we show how SPLS found associations between subsets of features in all 3 high-dimensional datasets, and we compare the SPLS results with standard PLS and RCCA.
      Figures 4C–E and 5C–E display the models’ weights and latent variables (note that for the HCP dataset, the brain weights were transformed into brain connection strength increases/decreases). The first associative effect found by standard PLS and SPLS was similar to the first effect found by RCCA in both the ADNI and simulated datasets, but in the HCP dataset, the first associative effect identified by RCCA was more similar to the second effect found by standard PLS and SPLS (Figure 6). This is likely because the within-modality covariance matrices of the HCP dataset differ substantially from the identity matrix, so the difference between the objectives of the CCA and PLS models is more pronounced (see equations 1 and 2). The brain and behavioral weights were similar across the 3 models in both real datasets, especially for the top-ranked variables (i.e., the variables with the highest weights). Similar to RCCA, standard PLS and SPLS recovered the true weights of the generative model reasonably well; however, the SPLS model assigned fewer nonzero weights to irrelevant variables (Figure 7C–E). These results demonstrate that when the signal is sparse, SPLS can achieve both high true positive and high true negative rates of weight recovery (Table S2). Table S4 shows the sparsity of the associative effects identified by SPLS.
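      For orientation, the sketch below outlines one widely used SPLS algorithm, alternating soft-thresholded power iterations in the spirit of the penalized matrix decomposition of Witten et al.; the thresholds lam_x and lam_y stand in for the L1-norm hyperparameters optimized in our experiments. The function is illustrative rather than the toolkit code.

```python
# Minimal sketch of a common SPLS algorithm: alternating soft-thresholded power
# iterations on the cross-covariance matrix, yielding sparse weight vectors.
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def spls_first_pair(X, Y, lam_x=0.1, lam_y=0.1, n_iter=100, tol=1e-6):
    """First sparse weight pair approximately maximizing the covariance between Xu and Yv."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    C = X.T @ Y
    v = np.linalg.svd(C, full_matrices=False)[2][0]    # initialize with leading right singular vector
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam_x)
        u /= np.linalg.norm(u) + 1e-12
        v_new = soft_threshold(C.T @ u, lam_y)
        v_new /= np.linalg.norm(v_new) + 1e-12
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return u, v                                        # many entries are exactly zero
```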
      The stability of the weights differed significantly between the brain and behavioral modalities (F1,804 = 75.26, p < .0001) and the 3 models (F2,804 = 61.77, p < .0001) (Table 4). The stability of the SPLS weights was lowest in the HCP dataset, most likely because the model is sparse and different subsets of variables can yield similar performance. The instability of SPLS could be mitigated by stability selection (Ing et al., 2019) or by a stability criterion during hyperparameter optimization (Mihalik et al., 2020). The explained variance varied significantly across modalities (F1,174 = 80.00, p < .0001) and the 3 models (F2,174 = 28.60, p < .0001).
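      As a hypothetical illustration of such a stability criterion, the sketch below selects an L1 hyperparameter by combining out-of-sample performance with weight stability across inner splits. The helpers fit_spls and score_fn are placeholders, and the 0.7 stability threshold is arbitrary; this is only loosely inspired by the cited approach, not a reimplementation of it.

```python
# Hypothetical sketch of hyperparameter selection with a stability criterion:
# among candidate L1 thresholds, keep those whose weights are stable across inner
# splits and then pick the best-generalizing one. `fit_spls` and `score_fn` are
# assumed, user-supplied functions.
import numpy as np
from itertools import combinations
from sklearn.model_selection import KFold

def select_hyperparameter(X, Y, candidates, fit_spls, score_fn, n_splits=5):
    results = []
    for lam in candidates:
        weights, scores = [], []
        for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(X):
            u, v = fit_spls(X[tr], Y[tr], lam)
            weights.append(np.concatenate([u, v]))
            scores.append(score_fn(X[te], Y[te], u, v))   # e.g., out-of-sample correlation
        stability = np.mean([np.corrcoef(a, b)[0, 1]
                             for a, b in combinations(weights, 2)])
        results.append((lam, np.mean(scores), stability))
    # Keep hyperparameters whose stability passes a (arbitrary) threshold, then pick the best score
    stable = [r for r in results if r[2] > 0.7] or results
    return max(stable, key=lambda r: r[1])[0]
```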
      In summary, although RCCA is likely to yield similar or higher out-of-sample correlations than standard PLS and SPLS, SPLS can perform variable selection and may improve the interpretability of the results, albeit at the risk of instability. In practice, the 3 models often provide similar weights for the top-ranked variables.

      Standard Versus Regularized Extension of CCA/PLS Models in Low-Dimensional Data

      To investigate the effects of regularization in all 3 low-dimensional datasets, we compared standard CCA, RCCA, standard PLS, and SPLS. The regularized models (RCCA, SPLS) were more stable (F3,1075 = 80.54, p < .0001) (Table S5) and showed a trend toward higher out-of-sample correlations (F1,10 = 3.35, p = .0972) (Figure S1) than their nonregularized counterparts (standard CCA and PLS). The stability of standard PLS and RCCA weights was consistently high, the stability of SPLS varied across datasets, and standard CCA was notably unstable (Table S5). SPLS provided sparse results, similar to those in the high-dimensional datasets (Table S4). As expected, RCCA and standard PLS explained progressively more within-modality variance than standard CCA. For a detailed description of these results, see the Supplement. Taken together, these results suggest that RCCA/SPLS models should be preferred even for low-dimensional data.
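      As a toy illustration of why regularization matters even at these dimensionalities, the sketch below (Python/scikit-learn; not the paper’s experiment) compares unregularized CCA with standard PLS, the fully L2-regularized extreme, on synthetic data with just over one example per variable. In such a setting, unregularized CCA typically shows a much larger gap between in-sample and out-of-sample correlation; all names and parameters here are illustrative assumptions.

```python
# Toy comparison of unregularized CCA and standard PLS when the ratio of examples to
# variables is close to 1; in this regime CCA typically overfits more than PLS.
import numpy as np
from scipy.stats import pearsonr
from sklearn.cross_decomposition import CCA, PLSCanonical
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p, q = 200, 80, 20
z = rng.standard_normal((n, 1))                        # shared latent signal
X = z @ rng.standard_normal((1, p)) + 3 * rng.standard_normal((n, p))
Y = z @ rng.standard_normal((1, q)) + 3 * rng.standard_normal((n, q))
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.5, random_state=1)

for name, model in [("CCA", CCA(n_components=1)), ("PLS", PLSCanonical(n_components=1))]:
    model.fit(X_tr, Y_tr)
    t_tr, u_tr = model.transform(X_tr, Y_tr)
    t_te, u_te = model.transform(X_te, Y_te)
    print(name,
          "in-sample:", round(pearsonr(t_tr[:, 0], u_tr[:, 0])[0], 2),
          "out-of-sample:", round(pearsonr(t_te[:, 0], u_te[:, 0])[0], 2))
```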

      Conclusions

      This tutorial compared standard and regularized extensions of CCA and PLS models and highlighted the benefits of regularization. Here, we outline the key lessons.
      First, we showed that regularized extensions of CCA/PLS models give similar out-of-sample correlations in large datasets (with the exception of standard PLS and SPLS in the high-dimensional HCP dataset) when the sample size is similar to or much smaller than the number of variables (i.e., when the ratio between examples and variables is ∼1–10 or ∼0.1–0.01). Importantly, RCCA and SPLS outperformed standard CCA and PLS even when the ratio between examples and variables was ∼1–10. Second, we emphasized that it is important to use a predictive framework, as high in-sample correlations do not necessarily imply generalizability to unseen data.
      Going beyond model performance, we demonstrated both in theory and in practice that standard CCA is prone to instability (Table S3). L2-norm regularization improves stability, at the cost of the models (RCCA, standard PLS, SPLS) being partly driven by within-modality variance. PCA-CCA with data-driven selection of PCs improves on a priori selection and has a regularizing effect comparable to that of RCCA. Sparsity (i.e., L1-norm regularization) can facilitate the interpretability and the generalizability of the models, but it can also introduce instability; it is most useful when the associative effect itself is sparse (e.g., in the ADNI and simulated datasets). Data-driven PCA-CCA, RCCA, and SPLS yielded similar model weights and accounted for similar amounts of variance.
      We hope that this work, together with recent efforts [e.g., Helmer et al., 2020; Mihalik et al., 2020; Zhuang et al., 2020; Wang et al., 2020; Winkler et al., 2020] and critical exchanges [e.g., Dinga et al., 2019; Mihalik et al., 2020; Grosenick et al., 2019; Grosenick and Liston, 2020; Dinga et al., 2020], illuminates these complex methods and facilitates their application to the brain and its disorders.

      Acknowledgments and Disclosures

      AM was funded by the Wellcome Trust (Grant No. WT102845/Z/13/Z) and by MQ: Transforming Mental Health (Grant No. MQF17_24). JC was supported by the Engineering and Physical Sciences Research Council–funded University College London Centre for Doctoral Training in Intelligent, Integrated Imaging in Healthcare (Grant No. EP/S021930/1) and the Department of Health’s National Institute for Health and Care Research–funded Biomedical Research Centre at University College London Hospitals. RAA was supported by a Medical Research Council Skills Development Fellowship (Grant No. MR/S007806/1). NRW was supported by grants from the German Research Foundation (Grant Nos. HA7070/2-2, HA7070/3, and HA7070/4). FSF was funded by a PhD scholarship awarded by Fundação para a Ciência e a Tecnologia (Grant No. SFRH/BD/120640/2016). JM-M was funded by the Wellcome Trust (Grant No. WT102845/Z/13/Z).
      A complete listing of Alzheimer’s Disease Neuroimaging Initiative (ADNI) investigators can be found at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
      Some data used in preparation of this article were obtained from the ADNI database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. The code used for the different CCA/PLS analyses is implemented in a CCA/PLS toolkit that is available at http://www.mlnl.cs.ucl.ac.uk/resources/cca_pls_toolkit, together with a demo showing how to use the toolkit to generate the SPLS results for the low-dimensional simulated dataset. The CCA/PLS toolkit is also available on Zenodo (https://doi.org/10.5281/zenodo.7153571) (Mihalik A, Winter NR, Ferreira FS, Chapman J, Mourao-Miranda J: MLNL/cca_pls_toolkit: CCA/PLS Toolkit v1.0.0).
      The authors report no biomedical financial interests or potential conflicts of interest.

      Supplementary Material

      References

        • Smith S.M.
        • Nichols T.E.
        Statistical challenges in “big data” human neuroimaging.
        Neuron. 2018; 97: 263-268
        • Bzdok D.
        • Yeo B.T.T.
        Inference in the age of big data: Future perspectives on neuroscience.
        Neuroimage. 2017; 155: 549-564
        • Bzdok D.
        • Nichols T.E.
        • Smith S.M.
        Towards algorithmic analytics for large-scale datasets.
        Nat Mach Intell. 2019; 1: 296-306
        • Hotelling H.
        Relations between two sets of variates.
        Biometrika. 1936; 28: 321-377
        • Wold H.
        Partial least squares.
        in: Kotz S. Johnson N. Encyclopedia of Statistical Sciences. Wiley, New York; 1985: 581-591
        • Kebets V.
        • Holmes A.J.
        • Orban C.
        • Tang S.
        • Li J.
        • Sun N.
        • et al.
        Somatosensory-motor dysconnectivity spans multiple transdiagnostic dimensions of psychopathology.
        Biol Psychiatry. 2019; 86: 779-791
        • Drysdale A.T.
        • Grosenick L.
        • Downar J.
        • Dunlop K.
        • Mansouri F.
        • Meng Y.
        • et al.
        Resting-state connectivity biomarkers define neurophysiological subtypes of depression.
        Nat Med. 2017; 23: 28-38
        • Moser D.A.
        • Doucet G.E.
        • Lee W.H.
        • Rasgon A.
        • Krinsky H.
        • Leibu E.
        • et al.
        Multivariate associations among behavioral, clinical, and multimodal imaging phenotypes in patients with psychosis.
        JAMA Psychiatry. 2018; 75: 386-395
        • Li J.
        • Bolt T.
        • Bzdok D.
        • Nomi J.S.
        • Yeo B.T.T.
        • Spreng R.N.
        • Uddin L.Q.
        Topography and behavioral relevance of the global signal in the human brain.
        Sci Rep. 2019; 9: 14286
        • Bijsterbosch J.D.
        • Woolrich M.W.
        • Glasser M.F.
        • Robinson E.C.
        • Beckmann C.F.
        • Van Essen D.C.
        • et al.
        The relationship between spatial configuration and functional connectivity of brain regions.
        Elife. 2018; 7: e32992
        • Xia C.H.
        • Ma Z.
        • Ciric R.
        • Gu S.
        • Betzel R.F.
        • Kaczkurkin A.N.
        • et al.
        Linked dimensions of psychopathology and connectivity in functional brain networks.
        Nat Commun. 2018; 9: 3003
        • Modabbernia A.
        • Janiri D.
        • Doucet G.E.
        • Reichenberg A.
        • Frangou S.
        Multivariate patterns of brain-behavior-environment associations in the Adolescent Brain and Cognitive Development Study.
        Biol Psychiatry. 2021; 89: 510-520
        • Avants B.B.
        • Libon D.J.
        • Rascovsky K.
        • Boller A.
        • McMillan C.T.
        • Massimo L.
        • et al.
        Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population.
        Neuroimage. 2014; 84: 698-711
        • Ziegler G.
        • Dahnke R.
        • Winkler A.D.
        • Gaser C.
        Partial least squares correlation of multivariate cognitive abilities and local brain structure in children and adolescents.
        Neuroimage. 2013; 82: 284-294
        • Jia T.
        • Ing A.
        • Quinlan E.B.
        • Tay N.
        • Luo Q.
        • Francesca B.
        • et al.
        Neurobehavioural characterisation and stratification of reinforcement-related behaviour.
        Nat Hum Behav. 2020; 4: 544-558
        • Le Floch E.
        • Guillemot V.
        • Frouin V.
        • Pinel P.
        • Lalanne C.
        • Trinchera L.
        • et al.
        Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse partial least squares.
        Neuroimage. 2012; 63: 11-24
        • Witten D.M.
        • Tibshirani R.
        • Hastie T.
        A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis.
        Biostatistics. 2009; 10: 515-534
        • Marquand A.F.
        • Haak K.V.
        • Beckmann C.F.
        Functional corticostriatal connection topographies predict goal-directed behaviour in humans.
        Nat Hum Behav. 2017; 1: 0146
        • Lin D.
        • Calhoun V.D.
        • Wang Y.P.
        Correspondence between fMRI and SNP data by group sparse canonical correlation analysis.
        Med Image Anal. 2014; 18: 891-902
        • Ing A.
        • Sämann P.G.
        • Chu C.
        • Tay N.
        • Biondo F.
        • Robert G.
        • et al.
        Identification of neurobehavioural symptom groups based on shared brain mechanisms.
        Nat Hum Behav. 2019; 3: 1306-1318
        • Wang H.T.
        • Bzdok D.
        • Margulies D.
        • Craddock C.
        • Milham M.
        • Jefferies E.
        • Smallwood J.
        Patterns of thought: Population variation in the associations between large-scale network organisation and self-reported experiences at rest.
        Neuroimage. 2018; 176: 518-527
        • Smith S.M.
        • Nichols T.E.
        • Vidaurre D.
        • Winkler A.M.
        • Behrens T.E.J.
        • Glasser M.F.
        • et al.
        A positive-negative mode of population covariation links brain connectivity, demographics and behavior.
        Nat Neurosci. 2015; 18: 1565-1567
        • Popovic D.
        • Ruef A.
        • Dwyer D.B.
        • Antonucci L.A.
        • Eder J.
        • Sanfelici R.
        • et al.
        Traces of trauma: A multivariate pattern analysis of childhood trauma, brain structure, and clinical phenotypes.
        Biol Psychiatry. 2020; 88: 829-842
        • Alnæs D.
        • Kaufmann T.
        • Marquand A.F.
        • Smith S.M.
        • Westlye L.T.
        Patterns of sociocognitive stratification and perinatal risk in the child brain.
        Proc Natl Acad Sci U S A. 2020; 117: 12419-12427
        • Mihalik A.
        • Ferreira F.S.
        • Rosa M.J.
        • Moutoussis M.
        • Ziegler G.
        • Monteiro J.M.
        • et al.
        Brain-behaviour modes of covariation in healthy and clinically depressed young people.
        Sci Rep. 2019; 9: 11536
        • Helmer M.
        • Warrington S.
        • Mohammadi-Nejad A.-R.
        • Lisa J.
        • Howell A.
        • Rosand B.
        • et al.
        On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.
        bioRxiv. 2020; https://doi.org/10.1101/2020.08.25.265546
        • Mihalik A.
        • Ferreira F.S.
        • Moutoussis M.
        • Ziegler G.
        • Adams R.A.
        • Rosa M.J.
        • et al.
        Multiple holdouts with stability: Improving the generalizability of machine learning analyses of brain–behavior relationships.
        Biol Psychiatry. 2020; 87: 368-376
        • Dinga R.
        • Schmaal L.
        • Penninx B.W.J.H.
        • van Tol M.J.
        • Veltman D.J.
        • van Velzen L.
        • et al.
        Evaluating the evidence for biotypes of depression: Methodological replication and extension of.
        Neuroimage Clin. 2019; 22: 101796
        • Uurtio V.
        • Monteiro J.M.
        • Kandola J.
        • Shawe-Taylor J.
        • Fernandez-Reyes D.
        • Rousu J.
        A tutorial on canonical correlation methods.
        ACM Comput Surv. 2017; 50: 1-33
        • Zhuang X.
        • Yang Z.
        • Cordes D.
        A technical review of canonical correlation analysis for neuroscience applications.
        Hum Brain Mapp. 2020; 41: 3807-3833
        • Wang H.-T.
        • Smallwood J.
        • Mourao-Miranda J.
        • Xia C.H.
        • Satterthwaite T.D.
        • Bassett D.S.
        • Bzdok D.
        Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists.
        Neuroimage. 2020; 216: 116745
        • Krishnan A.
        • Williams L.J.
        • McIntosh A.R.
        • Abdi H.
        Partial least squares (PLS) methods for neuroimaging: A tutorial and review.
        Neuroimage. 2011; 56: 455-475
        • Meredith W.
        Canonical correlations with fallible data.
        Psychometrika. 1964; 29: 55-65
        • Rosipal R.
        • Krämer N.
        Overview and recent advances in partial least squares.
        in: Saunders C. Grobelnik M. Gunn S. Shawe-Taylor J. Subspace, Latent Structure and Feature Selection. Springer, Berlin; 2006: 34-51
        • Golub G.H.
        • Zha H.
        Perturbation analysis of the canonical correlations of matrix pairs.
        Linear Algebra Appl. 1994; 210: 3-28
        • Wegelin J.A.
        A survey on partial least squares (PLS) methods, with emphasis on the two-block case. Technical Report No. 371.
        • Vounou M.
        • Nichols T.E.
        • Montana G.
        Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach.
        Neuroimage. 2010; 53: 1147-1159
        • Knapp T.R.
        Canonical correlation analysis: A general parametric significance-testing system.
        Psychol Bull. 1978; 85: 410-416
        • Izenman A.J.
        Reduced-rank regression for the multivariate linear model.
        J Multivar Anal. 1975; 5: 248-264
        • Tibshirani R.
        Regression shrinkage and selection via the Lasso.
        J R Stat Soc Ser B. 1996; 58: 267-288
        • Hoerl A.E.
        • Kennard R.W.
        Ridge regression: Applications to nonorthogonal problems.
        Technometrics. 1970; 12: 69-82
        • Zou H.
        • Hastie T.
        Regularization and variable selection via the elastic net.
        J R Stat Soc Ser B Stat Methodol. 2005; 67: 301-320
        • Vinod H.D.
        Canonical ridge and econometrics of joint production.
        J Econom. 1976; 4: 147-166
        • Hardoon D.R.
        • Szedmak S.
        • Shawe-Taylor J.
        Canonical correlation analysis: An overview with application to learning methods.
        Neural Comput. 2004; 16: 2639-2664
        • Tenenhaus A.
        • Tenenhaus M.
        Regularized generalized canonical correlation analysis.
        Psychometrika. 2011; 76: 257-284
        • Tuzhilina E.
        • Tozzi L.
        • Hastie T.
        Canonical correlation analysis in high dimensions with structured regularization.
        arXiv. 2020; https://doi.org/10.48550/arXiv.2011.01650
        • Kessy A.
        • Lewin A.
        • Strimmer K.
        Optimal whitening and decorrelation.
        Am Stat. 2018; 72: 309-314
        • Shmueli G.
        To explain or to predict?.
        Stat Sci. 2010; 25: 289-310
        • Bzdok D.
        • Engemann D.
        • Thirion B.
        Inference and prediction diverge in biomedicine.
        Patterns. 2020; 1: 100119
        • Arbabshirani M.R.
        • Plis S.
        • Sui J.
        • Calhoun V.D.
        Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls.
        Neuroimage. 2017; 145: 137-165
        • Abdi H.
        Partial least squares regression and projection on latent structure regression (PLS regression).
        Wiley Interdiscip Rev Comput Stat. 2010; 2: 97-106
        • Winkler A.M.
        • Renaud O.
        • Smith S.M.
        • Nichols T.E.
        Permutation inference for canonical correlation analysis.
        Neuroimage. 2020; 220: 117065
        • Lê Cao K.-A.
        • Boitard S.
        • Besse P.
        Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.
        BMC Bioinformatics. 2011; 12: 253
        • Labus J.S.
        • Van Horn J.D.
        • Gupta A.
        • Alaverdyan M.
        • Torgerson C.
        • Ashe-McNalley C.
        • et al.
        Multivariate morphological brain signatures predict patients with chronic abdominal pain from healthy control subjects.
        Pain. 2015; 156: 1545-1554
        • Olson Hunt M.J.
        • Weissfeld L.
        • Boudreau R.M.
        • Aizenstein H.
        • Newman A.B.
        • Simonsick E.M.
        • et al.
        A variant of sparse partial least squares for variable selection and data exploration.
        Front Neuroinform. 2014; 8: 18
        • Monteiro J.M.
        • Rao A.
        • Shawe-Taylor J.
        • Mourão-Miranda J.
        A multiple hold-out framework for Sparse partial least squares.
        J Neurosci Methods. 2016; 271: 182-194
        • Winkler A.M.
        • Webster M.A.
        • Vidaurre D.
        • Nichols T.E.
        • Smith S.M.
        Multi-level block permutation.
        Neuroimage. 2015; 123: 253-268
        • Mihalik A.
        • Adams R.A.
        • Huys Q.
        Canonical correlation analysis for identifying biotypes of depression.
        Biol Psychiatry Cogn Neurosci Neuroimaging. 2020; 5: 478-480
        • Grosenick L.
        • Shi T.C.
        • Gunning F.M.
        • Dubin M.J.
        • Downar J.
        • Liston C.
        Functional and optogenetic approaches to discovering stable subtype-specific circuit mechanisms in depression.
        Biol Psychiatry Cogn Neurosci Neuroimaging. 2019; 4: 554-566
        • Grosenick L.
        • Liston C.
        Reply to: A closer look at depression biotypes: Correspondence relating to Grosenick et al. (2019).
        Biol Psychiatry Cogn Neurosci Neuroimaging. 2020; 5: 556
        • Dinga R.
        • Schmaal L.
        • Marquand A.F.
        A closer look at depression biotypes: Correspondence relating to Grosenick et al. (2019).
        Biol Psychiatry Cogn Neurosci Neuroimaging. 2020; 5: 554-555
        • Rolls E.T.
        • Joliot M.
        • Tzourio-Mazoyer N.
        Implementation of a new parcellation of the orbitofrontal cortex in the automated anatomical labeling atlas.
        Neuroimage. 2015; 122: 1-5
        • Folstein M.F.
        • Folstein S.E.
        • McHugh P.R.
        “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician.
        J Psychiatr Res. 1975; 12: 189-198
        • Mihalik A.
        • Winter N.R.
        • Ferreira F.S.
        • Chapman J.
        • Mourao-Miranda J.
        MLNL/cca_pls_toolkit: CCA/PLS Toolkit (v1.0.0). Zenodo.