Virginia Tech Carilion Research Institute, Roanoke, Virginia; Department of Psychology, Virginia Tech, Virginia; Virginia Tech–Wake Forest University School of Biomedical Engineering and Science, Blacksburg, Virginia
In substance-dependent individuals, drug deprivation and drug use trigger divergent behavioral responses to environmental cues. These divergent responses are consonant with data showing that short- and long-term adaptations in dopamine signaling are similarly sensitive to state of drug use. The literature suggests a drug state–dependent role of learning in maintaining substance use; evidence linking dopamine to both reinforcement learning and addiction provides a framework to test this possibility.
In a randomized crossover design, 22 participants with current cocaine use disorder completed a probabilistic loss-learning task during functional magnetic resonance imaging while on and off cocaine (44 sessions). Another 54 participants without Axis I psychopathology served as a secondary reference group. Within-drug state and paired-subjects’ learning effects were assessed with computational model–derived individual learning parameters. Model-based neuroimaging analyses evaluated effects of drug use state on neural learning signals. Relationships among model-derived behavioral learning rates (α+, α−), neural prediction error signals (δ+, δ−), cocaine use, and desire to use were assessed.
During cocaine deprivation, cocaine-dependent individuals exhibited heightened positive learning rates (α+), heightened neural positive prediction error (δ+) responses, and heightened association of α+ with neural δ+ responses. The deprivation-enhanced neural learning signals were specific to successful loss avoidance, comparable to participants without psychiatric conditions, and mediated a relationship between chronicity of drug use and desire to use cocaine.
Neurocomputational learning signals are sensitive to drug use status and suggest that heightened reinforcement by successful avoidance of negative outcomes may contribute to drug seeking during deprivation. More generally, attention to drug use state is important for delineating substrates of addiction.
In substance-dependent individuals, responses to negative environmental cues appear to both vary with state of drug use and contribute to continued drug seeking. Specifically, when drug deprived, substance-dependent individuals are adept at avoiding negative states, such as withdrawal and isolation, through drug use (
). The divergence between these behavioral responses to negative consequences suggests a drug state–dependent role for loss learning (i.e., learning about negative outcomes) in maintaining substance use. In particular, these clinical and behavioral data suggest the hypothesis that heightened reinforcement from avoiding negative states may facilitate drug seeking during deprivation relative to during substance use. A largely parallel but related literature links neural dopamine (DA) systems to reinforcement learning (
)]. In addition, related evidence indicates that during healthy reinforcement learning, DA release encodes prediction errors (signaling better or worse than expected) that have detectable correlates in the human striatum (
). Together, this literature suggests that in the case of substance dependence, DA-related learning signals are likely to be enhanced with drug deprivation because DA receptors are relatively freed, allowing the detection of prediction errors. A few previous studies that examined neural substrates of contingency learning in cocaine dependence primarily focused on reward learning and found decreased prediction error signaling in cocaine-dependent individuals compared with control individuals [(
), who reported greater baseline neural win responses in successful future abstainers].
To evaluate cocaine state modulation of learning signals and assess the potential drug state–dependent role of loss-learning mechanisms in maintaining substance dependence, we tested cocaine-dependent individuals in a loss-learning task both during cocaine deprivation and when using cocaine as usual. Using a computational psychiatry approach, we assessed behavioral and neural learning substrates, in the form of model-derived learning rate parameters and striatal encoding of prediction errors, respectively, and tested the relationship of neural learning signals to measures of drug use and dependence.
Methods and Materials
Participants and Experimental Design
A total of 22 right-handed, non-treatment-seeking male individuals who met criteria only for current cocaine use disorder without other substance dependencies or comorbid Axis I psychopathology were enrolled from a larger study on biomarkers of substance use (see Tables 1 and 2 for demographic information and self-reported craving, and see Supplemental Methods for inclusion/exclusion criteria). Following an initial lab visit to assess cocaine use and entrance criteria, eligible individuals participated in two subsequent scanning sessions in a within-subject design. In one session participants were instructed to use cocaine as usual (C+), and in a second session participants were instructed to abstain from cocaine use for at least 72 hours (C−). Cocaine use state was verified at each lab visit with urine testing for cocaine metabolites (National Institute on Drug Abuse 5-panel drug test; Alere Toxicology, Waltham, MA); C+ and C− sessions were counterbalanced for order. All participants provided informed consent, and all procedures were approved by the institutional review boards of Baylor College of Medicine and Virginia Tech.
Table 1Participant Characteristics
Cocaine-Dependent Individuals (N = 22)
Cocaine Use, Years
Values are mean (SD).
WTAR, Wechsler Test of Adult Reading, a standardized score representing verbal IQ.
Participants completed a probabilistic loss-learning task (Figure 1A) during functional magnetic resonance imaging (fMRI) scanning in two separate lab sessions (N = 22, with each participant scanned in both states of cocaine use; see Supplemental Methods for scanning parameters and preprocessing procedures). The task entailed learning from repeated choices between two losing options [two-arm bandit in the loss domain; see details in Supplemental Methods; adapted from (
)], with one option having a higher probability of producing the better outcome (i.e., the smaller loss). On each trial, subjects chose between two abstract stimuli and subsequently observed the outcome (Figure 1A). Participants were instructed that one option was better than the other and that payment was related to their choices, but they were not explicitly informed of the outcome probabilities or the loss framework. Each block continued for a maximum of 36 trials or until sufficient learning had occurred (see Supplemental Methods for learning criteria). Each block used novel stimuli, so participants had to learn the stimulus–outcome contingencies anew within each block.
An additional group of 54 male individuals with no history of Axis I psychopathology (see Supplemental Table S1A for demographics) served as an independent nonpsychiatric control sample, used both to identify learning-related striatum activation as a reference against which to interpret any neural effects observed in the individuals with cocaine use disorder and to compute nonpsychiatric individual parameter estimates for model validation and parameter recovery.
Model-Agnostic Behavioral Analyses
To verify that participants learned during the task, the behavioral choices of each individual in each drug use state were examined over time and quantified as the percentage of trials that the objectively better choice was selected. Within-sample and paired t tests on performance were implemented in MATLAB (The MathWorks, Inc., Natick, MA).
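The accuracy measure and tests above can be sketched as follows (in Python rather than the authors' MATLAB; the per-subject accuracy values below are purely illustrative, not the study's data):

```python
import numpy as np
from scipy import stats

def percent_better_choice(choices, better_option=0):
    """Percentage of trials on which the objectively better option was chosen."""
    choices = np.asarray(choices)
    return 100.0 * np.mean(choices == better_option)

# Illustrative per-subject accuracies (percent correct); NOT the study's data
acc_c_minus = np.array([55.0, 60.0, 70.0, 65.0, 58.0])  # deprived (C-) sessions
acc_c_plus = np.array([68.0, 72.0, 80.0, 71.0, 66.0])   # using-as-usual (C+) sessions

# One-sample test against chance (50%) and paired C+ vs. C- comparison
t_chance, p_chance = stats.ttest_1samp(acc_c_minus, 50.0)
t_pair, p_pair = stats.ttest_rel(acc_c_plus, acc_c_minus)
```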
Computational Model–Based Behavioral Analyses
To assess model-derived behavioral learning effects for each participant in each drug use state, participants’ behavioral choices were fit to a basic reinforcement Q-learning model that included two learning rates (α), providing separate update rules for positive and negative prediction errors (outcomes better or worse than expected, mapping onto successful and unsuccessful loss avoidance) [δ+ and δ−, adapted from previous studies (
)] and assess the possibility that cocaine’s effects on DA systems differentially affect these components. The model was a good fit of participants’ choices during the loss-learning task (Figure 1C), it was a better fit than a single learning rate model, and model and parameter recovery using simulated data further verified the fit of the model to the observed behavior (see Behavioral Results in Results section). A third estimated parameter, inverse temperature (τ), provided a measure of exploration and indicated the sensitivity of choice probabilities to differences in values. See Model Fitting and Selection in Supplemental Methods for additional descriptions of model selection, model validation, and parameter recovery procedures.
For the two-learning-rate model, the initial expected values Q(0) for the possible choices a and b were set to 0 because participants were not instructed a priori about the range of possible outcomes. For trial number t, the outcome for the chosen option a was represented by Ra(t), with the expected value represented by Qa(t). The prediction error δ(t), which measures the difference between outcome Ra(t) and expectation Qa(t) for a trial, was defined as follows:
δ(t) = Ra(t) − Qa(t)
The parameter estimation procedures included separate update rules for positive and negative prediction errors δ(t) in the form of positive (α+) and negative (α−) learning rates, respectively (Figure 1B). The learning rate parameters quantified how much weight the prediction error δ(t) from the current trial was given in updating the following trial’s expected value Qa(t + 1):
Qa(t + 1) = Qa(t) + α+ · δ(t), if δ(t) > 0
Qa(t + 1) = Qa(t) + α− · δ(t), if δ(t) < 0
A standard softmax action selection function was used to calculate the probability of selecting choice a at time t and was implemented as follows:
Pa(t) = exp[τ · Qa(t)] / (exp[τ · Qa(t)] + exp[τ · Qb(t)])
Positive and negative learning rates (α+ and α−) and inverse temperature (τ) were free parameters, estimated iteratively in MATLAB with the function fminsearch to maximize the log likelihood (
). Learning rates were bounded between 0 and 1, and inverse temperature was bounded between 0 and ∞. For the unchosen option b, the expected value of the subsequent trial Qb(t + 1) was set to the current trial’s expected value Qb(t) multiplied by an additional decay parameter (φ, bounded between 0 and ∞), similar to previous studies (
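A compact way to see how these pieces fit together is to write the model as a likelihood function suitable for an optimizer such as fminsearch. The following is a Python sketch with assumed variable names, not the authors' code:

```python
import numpy as np

def neg_log_likelihood(params, choices, outcomes):
    """Negative log likelihood of the two-learning-rate Q model.

    params: alpha_pos, alpha_neg (learning rates, 0-1),
            tau (inverse temperature, > 0), phi (decay applied to
            the unchosen option's value, > 0).
    choices: sequence of 0/1 indicating the chosen option per trial.
    outcomes: sequence of losses received for the chosen option.
    """
    alpha_pos, alpha_neg, tau, phi = params
    Q = np.zeros(2)  # initial expected values set to 0
    nll = 0.0
    for a, r in zip(choices, outcomes):
        # softmax probability of the chosen option
        logits = tau * Q
        p = np.exp(logits - logits.max())
        p /= p.sum()
        nll -= np.log(p[a] + 1e-12)
        # prediction error and asymmetric (sign-dependent) update
        delta = r - Q[a]
        alpha = alpha_pos if delta > 0 else alpha_neg
        Q[a] += alpha * delta
        Q[1 - a] *= phi  # decay the unchosen option's value
    return nll
```

In practice this negative log likelihood would be minimized over (α+, α−, τ, φ) subject to the bounds stated above.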
Individual variances in learning rates (α− and α+) as an effect of drug state (C− or C+) were estimated for second-level fMRI analysis (see Imaging Analysis section below). First, the prior mean and distribution of learning rates for participants in each drug state (C− or C+) were estimated using bootstrapped maximum likelihood estimation across subjects, with sampling used to approximate integration over the group distribution (
). Individual learning rates were subsequently estimated by conditioning individuals’ behavioral data on the respective drug state (C− or C+) group’s prior distribution to account for drug use status differences. For each participant, individual C+ learning rates were then subtracted from the individual C− learning rates to compute a cocaine deprivation-enhanced learning rate for each participant. Group-specific bootstrapped estimates were used for the inverse temperature and decay parameters during individual estimation of learning rates.
To examine neural substrates of loss learning associated with cocaine use state in dependent individuals, model-derived learning variables fit across all participants (as described above) were first correlated with fMRI data collected during the loss-learning task. Next, the model-based neural prediction error signals were related to participants’ self-reported cocaine use measures.
) was used to perform neuroimaging analyses at the individual and group levels. For the first-level analyses, onset times for stimuli, outcome events for δ+ outcomes, and outcome events for δ− outcomes for each trial were modeled as separate punctate events. The outcomes were categorized based on the sign of the prediction error (δ > 0 or δ < 0, indicating δ+ or δ−, respectively), using the fitted estimates of the two-learning-rate model, in which trial-by-trial δs were generated (see procedures in Supplemental Methods). To examine the first-level effects of drug use status on neural representation of learning and valuation, cocaine-positive (C+, urine positive for cocaine metabolites) and cocaine-negative (C−, urine negative for cocaine metabolites) drug use states for each individual were modeled as separate first-level general linear models. Trial-by-trial expected values (Q) were modeled as parametric regressors onto the response events. Trial-by-trial δ+ and δ− and the actual outcomes were modeled as parametric regressors onto separate δ+ and δ− outcome events, respectively. Effects due to run number, time in scanner, and head movement parameters were modeled as nuisance covariates for each time point.
Within and Paired Drug State Analyses
The within-drug state (C− or C+) and paired-subjects (C− > C+) effects of cocaine use were compared using one-sample and paired-subjects’ second-level contrasts in SPM8. The effects of interest were neural responses to δ+ and δ−. In line with previous data demonstrating the role of the striatum and DA in learning (
)]. Results were thresholded with a voxel-level uncorrected p < .001 unless otherwise noted, and significant clusters were defined using familywise error correction.
Correlation Analyses Between Cocaine State–Modulated Learning Rate and Neural Prediction Error Signals
To relate drug state effects on behavioral learning rates (α+ and α−) with the corresponding neural δ signals, separate first-level and second-level general linear models were created to correlate within-subject drug-modulated α+ and α− differences (C− > C+ for α+ and C− > C+ for α−) with the corresponding neural differences for positive and negative δ (C− > C+ for δ+ and C− > C+ for δ−). Results were again thresholded with a voxel-level uncorrected p < .001, and significant clusters were defined using familywise error correction. In addition, leave-one-out cross-validation analyses were performed in regions of interest (see Supplemental Methods) to reduce bias due to nonindependence (
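The leave-one-out logic used to reduce selection bias can be sketched as follows (a simplified Python illustration with assumed data shapes; the actual analyses operated on SPM statistical maps):

```python
import numpy as np

def loo_roi_signals(voxel_betas, threshold=0.0):
    """Leave-one-out ROI extraction (illustrative sketch).

    voxel_betas: (n_subjects, n_voxels) array of per-subject contrast
    estimates. For each subject, the ROI is defined from the group mean
    computed WITHOUT that subject, so the extracted signal is
    independent of the subject it is applied to.
    """
    n = voxel_betas.shape[0]
    signals = np.empty(n)
    for i in range(n):
        others = np.delete(voxel_betas, i, axis=0)
        roi = others.mean(axis=0) > threshold  # voxels active in the others
        signals[i] = voxel_betas[i, roi].mean() if roi.any() else 0.0
    return signals
```

The resulting per-subject signals can then be correlated with the behavioral learning-rate differences without the circularity of defining and reading out the ROI in the same subjects.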
Relationships Between Neural Prediction Error Responses and Behavioral Cocaine Use Measures
To test relationships between the observed neural learning signals and cocaine use measures, questionnaire data characterizing individual drug use history and current cocaine craving were tested against subjects’ C− neural prediction error responses (given the primary results of interest involving enhanced δ+ from drug deprivation). Again, using leave-one-out cross-validation analysis, neural signals from trials with δ+ were correlated with years of drug use and subscales of the Cocaine Craving Questionnaire (
). The analyses identified relationships among years of drug use, δ+ neural signal, and the desire to use cocaine. Based on the results of the correlation analysis, a mediation analysis was performed testing whether neural learning signals mediated the relationship between duration of drug use and individuals’ desire to use cocaine or expected positive outcome from cocaine use (C− measures). A bootstrap approach to mediation (
) was implemented in R to calculate a 95% confidence interval with 10,000 bootstrapped resamples.
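The bootstrap approach might be sketched as follows (a Python illustration of the standard a × b indirect-effect bootstrap; the original analysis was run in R, and the variable names here are assumptions):

```python
import numpy as np

def bootstrap_mediation(x, m, y, n_boot=10_000, seed=0):
    """Bootstrap percentile CI for the indirect (a * b) mediation effect.

    x: predictor (e.g., years of use), m: mediator (e.g., neural signal),
    y: outcome (e.g., desire to use). Path a is the x -> m slope; path b
    is the m -> y slope controlling for x.
    """
    rng = np.random.default_rng(seed)
    x, m, y = map(np.asarray, (x, m, y))
    n = len(x)
    ab = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample subjects with replacement
        xs, ms, ys = x[idx], m[idx], y[idx]
        a = np.polyfit(xs, ms, 1)[0]  # x -> m slope
        # m -> y slope controlling for x (multiple regression)
        X = np.column_stack([np.ones(n), xs, ms])
        b = np.linalg.lstsq(X, ys, rcond=None)[0][2]
        ab[i] = a * b
    lo, hi = np.percentile(ab, [2.5, 97.5])
    return lo, hi
```

A 95% interval excluding zero indicates a reliable indirect effect, mirroring the a × b confidence interval reported in the Results.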
Model-Agnostic Behavioral Results
In both drug use states, participants demonstrated learning and performed significantly above chance (percentage of trials on which the “better” option was chosen; C−: 62.76%, SE = 3.22, t17 = 3.91, d = 0.96, p < .01; C+: 73.61%, SE = 2.69, t17 = 8.70, d = 2.12, p < .01; chance: 50%) (Figure 1D). In addition, participants in the C− state showed diminished accuracy relative to C+ participants (t17 = 2.62, d = 0.30, p = .01).
Computational Model–Derived Behavioral Results
Computational model–based analyses, using bootstrapped group parameters [per (
); 200 estimation iterations within each drug use state with subjects drawn with replacement for each iteration] for positive and negative learning rate (α+ and α−, respectively) and inverse temperature (τ), suggested increased positive learning rates (α+) and decreased τ in C− participants relative to C+ participants; α− did not differ between participants in the C− and C+ states (Figure 1E). To clarify whether cocaine state modulation of α+ or τ was associated with the diminished behavioral accuracy in C− participants, we simulated behavioral choices holding α+ constant (iterating through the ranges of the observed parameter values) while allowing τ to vary and similarly holding τ constant and allowing α+ to vary. As shown in Figure 1F, these simulations revealed increased α+ to be associated with decreased performance and no relationship between τ and performance (see simulation details in Supplemental Methods). Together, these data provide initial evidence of drug state modulation of learning, where cocaine deprivation–related increases in positive learning rates are associated with diminished behavioral performance.
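One way to implement such a simulation is sketched below; the loss magnitudes (−1 vs. −5) and outcome probabilities are illustrative assumptions, not the study's task parameters:

```python
import numpy as np

def simulate_accuracy(alpha_pos, alpha_neg, tau, phi=1.0,
                      n_trials=36, n_sims=500, seed=0):
    """Simulate two-option loss-bandit choices and return mean accuracy.

    Assumed task structure (illustrative): option 0 is objectively
    better, yielding the smaller loss (-1 vs. -5) with probability 0.7;
    option 1 yields it with probability 0.3.
    """
    rng = np.random.default_rng(seed)
    p_small_loss = np.array([0.7, 0.3])
    correct = 0
    for _ in range(n_sims):
        Q = np.zeros(2)
        for _ in range(n_trials):
            p = np.exp(tau * Q)
            p /= p.sum()
            a = rng.choice(2, p=p)
            correct += (a == 0)  # option 0 is the better option
            r = -1.0 if rng.random() < p_small_loss[a] else -5.0
            delta = r - Q[a]
            Q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
            Q[1 - a] *= phi
    return correct / (n_sims * n_trials)
```

Holding one parameter fixed while sweeping the other over its observed range, as in the reported simulations, amounts to calling this function across a grid of (α+, τ) values and comparing the resulting accuracies.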
Effects of Cocaine Use State on Neural Prediction Error Signals
Significant neural correlates of positive prediction error were observed in the striatum for C− participants (δ+ for C−) (Figure 2A and Supplemental Table S2A) but not for C+ participants. In addition, no significant neural correlates of negative prediction error (δ−) were found during either drug use state. Positive prediction error (δ+) responses were verified in the nonpsychiatric participants (Supplemental Figure S1 and Supplemental Table S2B), and post hoc analyses using an independent striatum region of interest indicated that δ+ in C− participants was comparable to this control cohort, whereas C+ participants showed significantly diminished δ+ responses (C− vs. nonpsychiatric control participants, t70 = 0.15, d = 0.004, p = .87; C+ vs. nonpsychiatric control participants, t70 = 3.22, d = 0.09, p = .001) (Supplemental Figure S1; see analytic details in Supplemental Methods).
The specificity of the neural encoding of positive prediction errors in the C− participants (Figure 2A) was striking in its parallel with the increased positive learning rate in these participants. Thus, to test for a neural instantiation of the deprivation-increased positive learning rate, we first computed individual behavioral learning rate estimates for each participant in the C− and C+ states [see Methods and Materials and (
)] and generated for each participant deprivation-enhanced positive and negative learning rate metrics (i.e., C− > C+ for α− and α+, respectively, for each individual). For positive learning rate, the degree of participants’ deprivation enhancement was significantly associated with the degree of deprivation enhancement of positive prediction error responses in the striatum (C− > C+ for α+ and neural C− > C+ for δ+) (Figure 2B and Supplemental Table S2C) (r = .79, p < .01; using leave-one-out cross-validation to avoid potential bias due to nonindependence) (
). No relationship between drug state modulation of negative learning rates and their associated neural prediction error signals was observed (C− > C+ for α− and neural C− > C+ for δ−; r = −.08, p = .72) (Figure 2C). C− > C+ contrasts further showed deprivation enhancement (see Supplemental Figure S2A and Supplemental Table S2D). Supplemental Figure S3 shows similar imaging results when using group estimates from within-status behavioral estimates. In addition, no effects of cocaine deprivation on neural expected value signals were detected (Supplemental Figure S2B), indicating generally intact outcome valuation unaffected by drug use status.
Results Relating Neural Prediction Error Signals and Behavioral Cocaine Use Measures
As described above, the specificity of drug state modulation and deprivation enhancement to positive (i.e., successful loss avoidance) prediction errors (δ+) was consistent with the hypothesis that reinforcement from successfully avoiding negative states contributes to continued drug seeking in addiction. In this case, successful loss avoidance in cocaine-deprived participants should be further related to aspects of real-world cocaine use. To test this possibility, we regressed C− individuals’ neural δ+ responses (β values from outcomes with δ+) against self-reported drug craving [subscales of Cocaine Craving Questionnaire (
)] (Supplemental Figure S4A and Supplemental Table S3) and observed that neural δ+ responses were related specifically to the desire to use cocaine (Figure 3A and Supplemental Table S3) [r = .70, p < .01; correlations again performed using neural signals obtained from leave-one-out cross-validation analyses and Bonferroni corrected for multiple comparisons as described in Methods and Materials (
)]. These relationships were also present using the deprivation-enhanced neural δ+ signal (i.e., C− > C+, extracted from outcomes with δ+) (r = .67, p < .01) and not observed in the C+ state (i.e., signal from outcomes with δ+ while C+) (Supplemental Figure S4B) (r = −.20, p = .42). Greater neural δ+ responses during cocaine deprivation were also associated with greater years of cocaine use (C−) (Figure 3A) (r = .64, p < .01); no relationship between neural δ+ and chronicity of use was observed for participants in the C+ state (Supplemental Figure S4C) (r = −.09, p = .71). No other subscales of the Cocaine Craving Questionnaire were correlated with striatal δ+ signals (Supplemental Table S3). Lastly, desire to use cocaine (Figure 3A) (r = .61, p < .01) was also positively correlated with participants’ years of cocaine use. Following these observed relationships, a mediation analysis [(
); see Methods and Materials] revealed that the deprivation-enhanced neural δ+ signal fully mediated the relationship between years of cocaine use and desire to use cocaine while deprived (Figure 3B) (path c: β = .09, p < .01; path a: β = .06, p = .04; path b: β = .66, p = .01; path c′: β = .05, p = .07; mediation effect a × b: 95% confidence interval = 0.0008–0.0953).
), we show drug state modulation of learning signals in cocaine-dependent participants, such that successful loss avoidance signals are greater during deprivation and the neural responses are associated with both longer history of drug use and greater desire for cocaine. The specificity of the deprivation enhancement to positive neural prediction error signals during loss avoidance appears to parallel clinical descriptions of addiction as a cycle maintained by negative reinforcement where drug-deprived dependent individuals seek drugs and thus successfully avoid negative states (e.g., withdrawal, isolation); such successful loss avoidance has been posited to reinforce continued drug seeking [for relevant discussions, see (
). In particular, the enhanced neural positive prediction error (δ+) encoding in C− participants relative to C+ participants is consistent with studies showing that humans with long-term cocaine dependency have decreased density of striatal DA receptors and lower tonic DA levels (
). Following from these studies, δ+ signals ought to be more evident during drug deprivation (as observed here) than during drug use because DA receptors, although diminished in density, are free in the deprived state to detect δ+ fluctuations. We note that in the current cocaine-dependent participants, neural δ+ responses in the drug-deprived state are comparable to the δ+ observed in nonpsychiatric control participants, whereas δ+ signaling in the drug-using state was diminished relative to the control participants. Together, these data suggest that although learning signal impairments appear to be restored by cocaine deprivation in dependent participants, such intact learning can have increasingly detrimental consequences in the context of unhealthy reinforcers, negative environmental states, and adverse outcomes (e.g., when dependent individuals are faced with withdrawal avoidance, drug-available environments, and drug use).
The current data are also relevant for closely related reports of significant increase in prediction error correlates following DA agonist administration (
). A key difference between these previous reports and the current findings is the incorporation of the consequences of long-term drug dependence (i.e., diminished DA functioning) into an understanding of learning in addiction (
). In addition, the current diminished δ+ signaling in C+ individuals and enhancement in C− individuals is consistent with related work showing DA drug state modulation of learning signals in participants with Parkinson’s disease (who are known to have impaired DA function); these participants similarly show reduced prediction error-related blood oxygen level–dependent responses when on DA-enhancing medication (levodopa) and greater prediction error responses while off medication, specifically to positive prediction errors [δ+; (
Finally, we show that greater neural loss-learning δ+ (signaling successful loss avoidance) during deprivation mediates a relationship between chronicity of drug use and desire for cocaine. This relationship supports the hypothesis that drug state–dependent learning signals play a role in maintaining drug use. The current data thus emphasize that both drug use chronicity and the context in which learning is assessed (e.g., loss, gain) may be critical for identifying neurobehavioral mechanisms that maintain drug use [for related data indicating differences in neural substrates of loss and gain learning, see (
The limitations of the current work provide avenues for further study. First, a relatively small number of male participants were included in this study (N = 22). Although the within-subjects design and the advantages of sample homogeneity partially mitigate concerns about the sample size, replication in a larger, more diverse sample would address questions regarding generalizability. In addition, the current study identified drug state modulation of responses to negative outcomes but did not evaluate the degree to which the physical consequences per se (i.e., small or large monetary loss), the emotions associated with those consequences, or other aspects of the outcomes contribute to the reinforcement provided by successful loss avoidance. Clarifying the role of these components of negative outcomes in maintaining substance use ought to be a focus of future studies. Finally, we focused our neural analyses primarily on regions of the striatum, given previous work linking learning mechanisms and cocaine pharmacodynamics to these regions (
). Supplemental analyses found no effects of cocaine state on encoding of expected value (see Supplemental Figure S2B), indicating generally intact outcome valuation unaffected by drug use status; nonetheless, other neural regions implicated in learning may be of interest in future investigations.
In summary, in cocaine-dependent participants, we show that drug deprivation enhances neural signaling of successful loss avoidance, which in turn predicts increased desire to use cocaine. The deprivation-enhanced neural prediction error is in line with prior reports of DA adaptations associated with chronic substance use (
) and also points to a potential mechanism by which drug seeking is maintained. That is, when dependent individuals are at their most vulnerable (i.e., during drug deprivation), reward signals associated with successful avoidance of negative states are at their greatest and may contribute to a pernicious cycle of drug seeking in the face of quit attempts. Of note, DA dysregulation has been associated with poor response to behavioral treatments in addiction (
). More generally, the current results support targeting learning-based therapies to identify goal-driven behaviors that provide relief from the negative outcomes of drug deprivation and indicate that attention to drug state may be critical for understanding neural mechanisms of addiction and refining learning-based therapies.
Acknowledgments and Disclosures
This work was supported in part by the National Institutes of Health (Grant Nos. R01MH091872 and R21DA042274 [to PHC], Grant No. R01DA036017 [to BK-C], and Grant Nos. RC1DA028387 and R01DA023624 [to RDLG]).
PHC, BK-C, RDLG, and TN designed the experiments. JMW analyzed the data with input from LZ, VMB, PHC, and BK-C. PHC, BK-C, RDLG, and TN supervised this work. JMW and PHC drafted the manuscript with input from all authors. All authors edited and approved the final version.
We acknowledge the technical assistance of George Christopoulos, Dongil Chung, Jacob Lee, James Mahoney, Dharol Tankersley, Katherine McCurry, Nina Lauharatanahirun, and members of the Chiu, De La Garza, King-Casas, and Newton Labs.
The authors report no biomedical financial interests or potential conflicts of interest.
An action (drug use) repeatedly followed by addition of a positive outcome (euphoria or high) or removal of a negative outcome (withdrawal symptoms or negative affect) will increase the likelihood of this action in the future. The former case is an example of positive reinforcement and the latter case is an example of negative reinforcement. Motivation to take this action (drug use) may depend on the evaluation of the user’s current internal bodily state and how much it differs from the expected bodily state, otherwise known as prediction error (for example, “I expected to feel good but I don’t” or “I’m starting to feel bad when I shouldn’t” versus “I feel better than I thought I would”) (1,2).