Article in Press


Figures

Figure 1

Strategies for detecting double dipping. (A) Results of a random data test, generated using a dataset of entirely random numbers representing a varying number of “predictor variables” (first column) and a random binary “outcome,” evenly distributed across 136 “subjects.” Because the data are random noise, model performance should be ≤50% and should not improve dramatically with an increasing number of random predictors, as in the fair model with all variables (second column). However, with a 2-step random forest procedure that includes double dipping to select a subset of variables (third column), the model based on fully random data shows high accuracy, especially with a large number of predictors (final column). (B) Results of a permutation test on a random forest analysis procedure that included double dipping. The red line indicates the expected average accuracy of permuted outcome data if no double dipping were present (outcome base rate). The blue line indicates the average accuracy of permuted data using the double-dipped analysis procedure. The green line indicates the observed accuracy in the double-dipped analysis with real data. The black line indicates the range of accuracy with 2-tailed p < .05.
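The two checks described in the caption can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' analysis: a simple nearest-centroid classifier stands in for the random forest, predictors are ranked by the difference in class means, and every name here (make_random_data, select_features, cv_accuracy, permutation_null) as well as the settings (2000 predictors, 20 selected variables, 5 folds, 30 permutations) is hypothetical. Only the 136 subjects and the evenly split binary outcome come from the caption.

```python
import random

def make_random_data(n_subjects, n_predictors, rng):
    """Pure-noise predictors plus an evenly split random binary outcome."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_predictors)] for _ in range(n_subjects)]
    y = [i % 2 for i in range(n_subjects)]
    rng.shuffle(y)
    return X, y

def class_means(X, y, rows, feats):
    """Mean of each feature in `feats` for class 0 and class 1, using `rows` only."""
    means = []
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means.append([sum(X[i][j] for i in members) / len(members) for j in feats])
    return means

def select_features(X, y, rows, k):
    """Keep the k predictors whose class means differ most within `rows`."""
    all_feats = range(len(X[0]))
    m0, m1 = class_means(X, y, rows, all_feats)
    return sorted(all_feats, key=lambda j: abs(m1[j] - m0[j]), reverse=True)[:k]

def cv_accuracy(X, y, k, double_dip, rng, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier on k selected features."""
    rows = list(range(len(X)))
    rng.shuffle(rows)
    folds = [rows[f::n_folds] for f in range(n_folds)]
    # Double dipping: selection sees every subject, including future test subjects.
    dipped_feats = select_features(X, y, rows, k) if double_dip else None
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in rows if i not in held_out]
        # Fair procedure: selection is repeated inside each fold on training rows only.
        feats = dipped_feats if double_dip else select_features(X, y, train, k)
        c0, c1 = class_means(X, y, train, feats)
        for i in test:
            d0 = sum((X[i][j] - a) ** 2 for j, a in zip(feats, c0))
            d1 = sum((X[i][j] - b) ** 2 for j, b in zip(feats, c1))
            correct += int((d1 < d0) == (y[i] == 1))
    return correct / len(X)

def permutation_null(X, y, k, double_dip, rng, n_perm=30):
    """Accuracies of the whole procedure after shuffling the outcome labels."""
    null = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        null.append(cv_accuracy(X, y_perm, k, double_dip, rng))
    return null

rng = random.Random(0)

# Random data test (panel A): the same noise, analysed fairly and with double dipping.
X, y = make_random_data(n_subjects=136, n_predictors=2000, rng=rng)
fair_acc = cv_accuracy(X, y, k=20, double_dip=False, rng=rng)
dipped_acc = cv_accuracy(X, y, k=20, double_dip=True, rng=rng)

# Permutation test (panel B), run on the first 300 predictors to keep runtime short.
X_small = [row[:300] for row in X]
null = permutation_null(X_small, y, k=20, double_dip=True, rng=rng)
null_mean = sum(null) / len(null)

print(f"fair accuracy on noise:          {fair_acc:.2f}")
print(f"double-dipped accuracy on noise: {dipped_acc:.2f}")
print(f"mean double-dipped permuted accuracy: {null_mean:.2f} (base rate 0.50)")
```

In this sketch the only difference between the two procedures is where variable selection happens: once on all subjects (double dipping) or again inside each fold on the training subjects only (fair). If the double-dipped accuracy on noise, or the mean accuracy over permuted outcomes, sits well above the 50% base rate, the procedure itself is leaking outcome information.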

Machine learning refers to an increasingly popular set of tools that can be used to make predictions from complex data, such as using neuroimaging data to predict psychiatric outcomes. However, the complexity of machine learning techniques can sometimes obscure methodological problems that are clear in other contexts, such as double dipping (1). In this commentary we define double dipping, explain why it is a problem, and give five recommendations for detecting and avoiding double dipping.
