In Cocaine Dependence, Neural Prediction Errors During Loss Avoidance Are Increased With Cocaine Deprivation and Predict Drug Use

Open Access. Published: August 02, 2018. DOI: https://doi.org/10.1016/j.bpsc.2018.07.009

      Abstract

      Background

      In substance-dependent individuals, drug deprivation and drug use trigger divergent behavioral responses to environmental cues. These divergent responses are consonant with data showing that short- and long-term adaptations in dopamine signaling are similarly sensitive to state of drug use. The literature suggests a drug state–dependent role of learning in maintaining substance use; evidence linking dopamine to both reinforcement learning and addiction provides a framework to test this possibility.

      Methods

      In a randomized crossover design, 22 participants with current cocaine use disorder completed a probabilistic loss-learning task during functional magnetic resonance imaging while on and off cocaine (44 sessions). Another 54 participants without Axis I psychopathology served as a secondary reference group. Within-drug state and paired-subjects’ learning effects were assessed with computational model–derived individual learning parameters. Model-based neuroimaging analyses evaluated effects of drug use state on neural learning signals. Relationships among model-derived behavioral learning rates (α+, α−), neural prediction error signals (δ+, δ−), cocaine use, and desire to use were assessed.

      Results

      During cocaine deprivation, cocaine-dependent individuals exhibited heightened positive learning rates (α+), heightened neural positive prediction error (δ+) responses, and heightened association of α+ with neural δ+ responses. The deprivation-enhanced neural learning signals were specific to successful loss avoidance, comparable to participants without psychiatric conditions, and mediated a relationship between chronicity of drug use and desire to use cocaine.

      Conclusions

      Neurocomputational learning signals are sensitive to drug use status and suggest that heightened reinforcement by successful avoidance of negative outcomes may contribute to drug seeking during deprivation. More generally, attention to drug use state is important for delineating substrates of addiction.

      Keywords

      In substance-dependent individuals, responses to negative environmental cues appear to both vary with state of drug use and contribute to continued drug seeking. Specifically, when drug deprived, substance-dependent individuals are adept at avoiding negative states, such as withdrawal and isolation, through drug use (
      • West R.
      • Hardy A.
      Theory of Addiction.
      ,
      • Potenza M.N.
      • Sofuoglu M.
      • Carroll K.M.
      • Rounsaville B.J.
      Neuroscience of behavioral and pharmacological treatments for addictions.
      ). At the same time, when drug using, dependent individuals ignore negative outcomes, including serious social, health, and economic costs (
      • West R.
      • Hardy A.
      Theory of Addiction.
      ,
      • Lucantonio F.
      • Stalnaker T.A.
      • Shaham Y.
      • Niv Y.
      • Schoenbaum G.
      The impact of orbitofrontal dysfunction on cocaine addiction.
      ,
      • Hyman S.E.
      • Malenka R.C.
      • Nestler E.J.
      Neural mechanisms of addiction: The role of reward-related learning and memory.
      ). The divergence between these behavioral responses to negative consequences suggests a drug state–dependent role for loss learning (i.e., learning about negative outcomes) in maintaining substance use. In particular, these clinical and behavioral data suggest the hypothesis that heightened reinforcement from avoiding negative states may facilitate drug seeking during deprivation relative to during substance use. A largely parallel but related literature linking neural dopamine (DA) systems in reinforcement learning (
      • Montague P.R.
      • Hyman S.E.
      • Cohen J.D.
      Computational roles for dopamine in behavioural control.
      ,
      • Schultz W.
      • Dayan P.
      • Montague P.R.
      A neural substrate of prediction and reward.
      ,
      • Pessiglione M.
      • Seymour B.
      • Flandin G.
      • Dolan R.J.
      • Frith C.D.
      Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans.
      ) and cocaine addiction (
      • Acevedo-Rodriguez A.
      • Zhang L.
      • Zhou F.
      • Gong S.
      • Gu H.
      • De Biasi M.
      • et al.
      Cocaine inhibition of nicotinic acetylcholine receptors influences dopamine release.
      ,
      • Volkow N.D.
      • Fowler J.S.
      • Wang G.
      • Swanson J.M.
      Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
      ,
      • Martinez D.
      • Carpenter K.M.
      • Liu F.
      • Slifstein M.
      • Broft A.
      • Friedman A.C.
      • et al.
      Imaging dopamine transmission in cocaine dependence: Link between neurochemistry and response to treatment.
      ) provides a framework within which to examine this possibility.
      Extant data show that while acute cocaine use increases striatal DA (
      • Volkow N.D.
      • Fowler J.S.
      • Wang G.
      • Swanson J.M.
      Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
      ), chronic cocaine use decreases postsynaptic DA receptor availability (
      • Volkow N.D.
      • Fowler J.S.
      • Wang G.
      • Swanson J.M.
      Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
      ). These short- and long-term neurophysiological adaptations together contribute to changes in DA signaling and detection that are sensitive to state of drug use (
      • Acevedo-Rodriguez A.
      • Zhang L.
      • Zhou F.
      • Gong S.
      • Gu H.
      • De Biasi M.
      • et al.
      Cocaine inhibition of nicotinic acetylcholine receptors influences dopamine release.
      ,
      • Volkow N.D.
      • Fowler J.S.
      • Wang G.
      • Swanson J.M.
      Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
      ,
      • Martinez D.
      • Carpenter K.M.
      • Liu F.
      • Slifstein M.
      • Broft A.
      • Friedman A.C.
      • et al.
      Imaging dopamine transmission in cocaine dependence: Link between neurochemistry and response to treatment.
      ,
      • Park K.
      • Volkow N.D.
      • Pan Y.
      • Du C.
      Chronic cocaine dampens dopamine signaling during cocaine intoxication and unbalances D1 over D2 receptor signaling.
      ) [for reviews, see (
      • Keramati M.
      • Durand A.
      • Girardeau P.
      • Gutkin B.
      • Ahmed S.H.
      Cocaine addiction as a homeostatic reinforcement learning disorder.
      ,
      • Willuhn I.
      • Wanat M.J.
      • Clark J.J.
      • Phillips P.E.M.
      Dopamine signaling in the nucleus accumbens of animals self-administering drugs of abuse.
      )]. In addition, related evidence indicates that during healthy reinforcement learning, DA release encodes prediction errors (signaling better or worse than expected) that have detectable correlates in the human striatum (
      • Montague P.R.
      • King-Casas B.
      • Cohen J.D.
      Imaging valuation models in human choice.
      ,
      • Jocham G.
      • Klein T.A.
      • Ullsperger M.
      Differential modulation of reinforcement learning by D2 dopamine and NMDA glutamate receptor antagonism.
      ). Together, this literature suggests that in the case of substance dependence, DA-related learning signals are likely to be enhanced with drug deprivation because DA receptors are relatively freed, allowing the detection of prediction errors. A few previous studies that examined neural substrates of contingency learning in cocaine dependence primarily focused on reward learning and found decreased prediction error signaling in cocaine-dependent individuals compared with control individuals [(
      • Parvaz M.A.
      • Konova A.B.
      • Proudfit G.H.
      • Dunning J.P.
      • Malaker P.
      • Moeller S.J.
      • et al.
      Impaired neural response to negative prediction errors in cocaine addiction.
      ,
      • Rose E.J.
      • Salmeron B.J.
      • Ross T.J.
      • Waltz J.
      • Schweitzer J.B.
      • McClure S.M.
      • Stein E.A.
      Temporal difference error prediction signal dysregulation in cocaine dependence.
      ); see also Stewart et al. (
      • Stewart J.L.
      • Connolly C.G.
      • May A.C.
      • Tapert S.F.
      • Wittmann M.
      • Paulus M.P.
      Cocaine dependent individuals with attenuated striatal activation during reinforcement learning are more susceptible to relapse.
      ), who reported greater baseline neural win responses in successful future abstainers].
      To evaluate cocaine state modulation of learning signals and assess the potential drug state–dependent role of loss-learning mechanisms in maintaining substance dependence, we tested cocaine-dependent individuals in a loss-learning task both during cocaine deprivation and when using cocaine as usual. Using a computational psychiatry approach, we assessed behavioral and neural learning substrates, in the form of model-derived learning rate parameters and striatal encoding of prediction errors, respectively, and tested the relationship of neural learning signals to measures of drug use and dependence.

      Methods and Materials

       Participants and Experimental Design

      A total of 22 right-handed, non-treatment-seeking male individuals who met criteria only for current cocaine use disorder without other substance dependencies or comorbid Axis I psychopathology were enrolled from a larger study on biomarkers of substance use (see Tables 1 and 2 for demographic information and self-reported craving, and see Supplemental Methods for inclusion/exclusion criteria). Following an initial lab visit to assess cocaine use and entrance criteria, eligible individuals participated in two subsequent scanning sessions in a within-subject design. In one session participants were instructed to use cocaine as usual (C+), and in a second session participants were instructed to abstain from cocaine use for at least 72 hours (C−). Cocaine use state was verified at each lab visit with urine testing for cocaine metabolites (National Institute on Drug Abuse 5-panel drug test; Alere Toxicology, Waltham, MA); C+ and C− sessions were counterbalanced for order. All participants provided informed consent, and all procedures were approved by the institutional review boards of Baylor College of Medicine and Virginia Tech.
      Table 1. Participant Characteristics
      Variable              Cocaine-Dependent Individuals (N = 22)
      Age                   45.7 (7.0)
      Education, Years      12.9 (1.0)
      WTAR                  98.4 (10.8)
      Cocaine Use, Years    17.6 (7.9)
      Values are mean (SD).
      WTAR, Wechsler Test of Adult Reading, a standardized score representing verbal IQ.
      Table 2. Participant Self-reported Cocaine Use and Craving
      Drug Use Information                          Deprived (C−)    Using as Usual (C+)
      Estimated Cocaine Intake Last 48 Hours, g     0 (0)            1.5 (0.5)
      CCQ Grand Total                               137.9 (8.7)      145.3 (8.7)
      CCQ Anticipated Positive Outcome              2.4 (0.3)        2.7 (0.3)
      CCQ Desire to Use                             3.2 (0.2)        3.4 (0.2)
      CCQ Intention to Use                          3.0 (0.3)        3.0 (0.3)
      CCQ Anticipated Withdrawal Relief             3.9 (0.4)        3.5 (0.4)
      CCQ No Control                                3.8 (0.4)        3.5 (0.4)
      Values are mean (SD). Grand total is composed of total raw score from individual items; subscale scores are averaged across items within subscale.
      CCQ, Cocaine Craving Questionnaire.
      Participants completed a probabilistic loss-learning task (Figure 1A) during functional magnetic resonance imaging (fMRI) scanning in two separate lab sessions (N = 22, with each participant scanned in both states of cocaine use; see Supplemental Methods for scanning parameters and preprocessing procedures). The task entailed learning from repeated choices between two losing options [two-arm bandit in the loss domain; see details in Supplemental Methods; adapted from (
      • Pessiglione M.
      • Seymour B.
      • Flandin G.
      • Dolan R.J.
      • Frith C.D.
      Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans.
      ,
      • Brown V.M.
      • Zhu L.
      • Wang J.M.
      • Frueh B.C.
      • King-Casas B.
      • Chiu P.H.
      Associability-modulated loss learning is increased in posttraumatic stress disorder.
      )], with one having a higher probability of producing the better outcome (i.e., smaller loss). On each trial, subjects chose between two abstract stimuli and subsequently observed the outcome (Figure 1A). Participants were instructed that one option was better than the other and that payment was related to their choices, but they were not explicitly informed of the outcome probabilities or loss framework. Each block ran for a maximum of 36 trials or ended earlier once sufficient learning had occurred (see Supplemental Methods for learning criteria). Each block used novel stimuli, so participants had to learn the contingencies between stimuli and outcomes anew within each block.
      Figure 1. Experimental design and cocaine modulation of learning. (A) Participants performed a probabilistic loss-learning task where they made a series of choices between two options and were shown the outcome of each choice. In this example, the selected option has a 75% chance of losing $0.25 and a 25% chance of losing $0.75 and is the better of the two options (less loss). Participants completed trials until learning occurred or up to 36 trials per block. (B) The reinforcement learning model incorporated positive and negative prediction errors (δ+ and δ−, respectively), which updated the subsequent expected value (Q) with separately estimated learning rates (α+ and α−, respectively). Trial-by-trial prediction errors are computed as the difference between the outcome and the expected value (R − Q). (C) The model-predicted probability of selecting the better option was a good fit with both C− and C+ participants’ actual behavior (percentage better option selected across subjects). Model prediction and actual selection for participants in the C− (blue) and C+ (red) drug use states were similar (no difference of average log likelihood per trial in a paired comparison across states; t17 = 1.47, p = .14). (D) In both states of drug use, individuals showed learning, improving in choosing the better option as trials progressed. C− individuals relative to C+ individuals showed diminished overall accuracy (t17 = 2.62, p = .01). (E) Bootstrapped group estimates for positive learning rate (α+), negative learning rate (α−), and inverse temperature (τ) suggested higher α+ and lower τ during cocaine deprivation. 
(F) To clarify whether the drug state modulation of α+ and/or τ was associated with the cocaine state differences in model-free behavioral performance (panel D), we simulated behavioral choices iterating through the observed ranges of α+ and τ parameter values and show that increasing α+ was associated with diminished total earnings, while performance did not vary with changes in τ.
      An additional group of 54 male individuals (see Supplemental Table S1A for demographics) with no history of Axis I psychopathology was used as an independent nonpsychiatric control sample to identify learning-related striatum activation that could be used as a reference against which to interpret any neural effects observed in the individuals with cocaine use disorder and to compute nonpsychiatric individual parameter estimates for model validation and parameter recovery.

       Behavioral Analyses

       Model-Agnostic Behavioral Analyses

      To verify that participants learned during the task, the behavioral choices of each individual in each drug use state were examined over time and quantified as the percentage of trials that the objectively better choice was selected. Within-sample and paired t tests on performance were implemented in MATLAB (The MathWorks, Inc., Natick, MA).
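The model-agnostic checks described above can be sketched in a few lines. This is an illustrative Python sketch only (the study itself used MATLAB); the function names and the example numbers are hypothetical and are not taken from the study data.

```python
import math
from statistics import mean, stdev

def percent_better(choices, better_option):
    """Percentage of trials on which the objectively better option was chosen."""
    hits = sum(1 for c in choices if c == better_option)
    return 100.0 * hits / len(choices)

def one_sample_t(values, mu0):
    """One-sample t statistic and degrees of freedom against a reference value mu0."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    return (mean(values) - mu0) / standard_error, n - 1

def paired_t(x, y):
    """Paired t statistic: a one-sample test on within-subject differences."""
    differences = [a - b for a, b in zip(x, y)]
    return one_sample_t(differences, 0.0)

# Hypothetical per-subject accuracies (percent), for illustration only.
deprived = [55.0, 62.0, 70.0, 58.0]
using = [68.0, 75.0, 80.0, 66.0]
t_vs_chance, df = one_sample_t(deprived, 50.0)   # test against 50% chance
t_paired, df_paired = paired_t(deprived, using)  # C- versus C+ within subject
```

Converting the t statistics to p values requires the t distribution, which is omitted here to keep the sketch dependency-free.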

       Computational Model–Based Behavioral Analyses

      To assess model-derived behavioral learning effects for each participant in each drug use state, participants’ behavioral choices were fit to a basic reinforcement Q-learning model that included two learning rates (α) that provided separate update rules for positive (δ+) and negative (δ−) prediction errors (better or worse than expectations, mapping onto successful and unsuccessful loss avoidance [δ+ and δ−, adapted from previous studies (
      • Niv Y.
      • Edlund J.A.
      • Dayan P.
      • O’Doherty J.P.
      Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.
      ,
      • Christakou A.
      • Gershman S.J.
      • Niv Y.
      • Simmons A.
      • Brammer M.
      • Rubia K.
      Neural and psychological maturation of decision-making in adolescence and young adulthood.
      ,
      • Li J.
      • Delgado M.R.
      • Phelps E.A.
      How instructed knowledge modulates the neural systems of reward learning.
      ,
      • Palminteri S.
      • Lefebvre G.
      • Kilford E.J.
      • Blakemore S.
      Confirmation bias in human reinforcement learning.
      )]) in the form of positive (α+) and negative (α−) learning rates (Figure 1B). The separate learning rates allowed us to evaluate asymmetries in learning from δ+ and δ− [e.g., (
      • Parvaz M.A.
      • Konova A.B.
      • Proudfit G.H.
      • Dunning J.P.
      • Malaker P.
      • Moeller S.J.
      • et al.
      Impaired neural response to negative prediction errors in cocaine addiction.
      ,
      • Tanabe J.
      • Reynolds J.
      • Krmpotich T.
      • Claus E.
      • Thompson L.L.
      • Du Y.P.
      • Banich M.T.
      Reduced neural tracking of prediction error in substance-dependent individuals.
      ,
      • Rose E.J.
      • Salmeron B.J.
      • Ross T.J.
      • Waltz J.
      • Schweitzer J.B.
      • McClure S.M.
      • Stein E.A.
      Temporal difference error prediction signal dysregulation in cocaine dependence.
      ,
      • Stewart J.L.
      • Connolly C.G.
      • May A.C.
      • Tapert S.F.
      • Wittmann M.
      • Paulus M.P.
      Cocaine dependent individuals with attenuated striatal activation during reinforcement learning are more susceptible to relapse.
      ,
      • Brown V.M.
      • Zhu L.
      • Wang J.M.
      • Frueh B.C.
      • King-Casas B.
      • Chiu P.H.
      Associability-modulated loss learning is increased in posttraumatic stress disorder.
      )] and assess the possibility that cocaine’s effects on DA systems differentially affect these components. The model fit participants’ choices during the loss-learning task well (Figure 1C) and fit better than a single-learning-rate model; model and parameter recovery using simulated data further verified the fit of the model to the observed behavior (see Behavioral Results in Results section). A third estimated parameter, inverse temperature (τ), provided a measure of exploration and indicated the sensitivity of choice probabilities to differences in values. See Model Fitting and Selection in Supplemental Methods for additional descriptions of model selection, model validation, and parameter recovery procedures.
      For the two-learning-rate model, the initial expected values Q(0) for the possible choices a and b were set to 0 because participants were not instructed a priori about the range of possible outcomes. For trial number t, the outcome for the chosen option a was represented by Ra(t), with the expected value represented by Qa(t). The prediction error δ(t), which measures the difference in outcome Ra(t) and expectation Qa(t) for a trial, was defined as follows:
      $$\delta(t) = R_a(t) - Q_a(t)$$


      The parameter estimation procedures included separate update rules for positive and negative prediction errors δ(t) in the form of positive (α+) and negative (α−) learning rates, respectively (Figure 1B). The learning rate parameters quantified how much weight the prediction error δ(t) from current trials was given in updating the following trials’ expected value Qa(t + 1):
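      $$Q_a(t+1) = \begin{cases} Q_a(t) + \alpha^{+}\,\delta(t) & \text{if } \delta(t) \ge 0 \\ Q_a(t) + \alpha^{-}\,\delta(t) & \text{if } \delta(t) < 0 \end{cases}$$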


      A standard softmax action selection function was used to calculate the probability of selecting choice a at time t and was implemented as follows:
      $$P_a(t) = \frac{e^{Q_a(t)\,\tau}}{e^{Q_a(t)\,\tau} + e^{Q_b(t)\,\tau}}$$


      Positive and negative learning rates (α+ and α−) and inverse temperature (τ) were free parameters, iteratively estimated in MATLAB using the function fminsearch, that were evaluated to have the maximum log likelihood (
      • Sutton R.S.
      • Barto A.G.
      ). Learning rates were bounded between 0 and 1, and inverse temperature was bounded between 0 and ∞. For the unchosen option b, the expected value of the subsequent trial Qb(t + 1) was set to the current trial’s expected value Qb(t) multiplied by an additional decay parameter (φ, bounded between 0 and ∞), similar to previous studies (
      • Niv Y.
      • Daniel R.
      • Geana A.
      • Gershman S.J.
      • Leong Y.C.
      • Radulescu A.
      • Wilson R.C.
      Reinforcement learning in multidimensional environments relies on attention mechanisms.
      ,
      • Boorman E.D.
      • Behrens T.E.J.
      • Woolrich M.W.
      • Rushworth M.F.S.
      How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action.
      ,
      • Cavanagh J.F.
      Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times.
      ,
      • Collins A.G.E.
      • Frank M.J.
      Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning.
      ).
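To make the model description concrete, the following is a minimal Python sketch of the trial-by-trial likelihood for the two-learning-rate model, assuming the definitions given above: Q(0) = 0, δ(t) = R − Q for the chosen option, α+ applied when δ is nonnegative and α− otherwise, softmax choice with inverse temperature τ, and the unchosen value multiplied by the decay parameter φ. The function name, argument layout, and example data are hypothetical; the study's own fitting was done in MATLAB with fminsearch, which this sketch does not reproduce.

```python
import math

def negative_log_likelihood(params, choices, outcomes):
    """Negative log likelihood of observed choices under the two-learning-rate model.

    params   : (alpha_pos, alpha_neg, tau, phi)
    choices  : sequence of 0/1 indices of the chosen option on each trial
    outcomes : sequence of outcomes R for the chosen option (losses are negative)
    """
    alpha_pos, alpha_neg, tau, phi = params
    q = [0.0, 0.0]  # initial expected values Q(0) = 0 for both options
    nll = 0.0
    for a, r in zip(choices, outcomes):
        b = 1 - a  # index of the unchosen option
        # Two-option softmax written as a logistic function of the value
        # difference; algebraically equal to e^(Qa*tau) / (e^(Qa*tau) + e^(Qb*tau)).
        x = max(-700.0, min(700.0, tau * (q[a] - q[b])))  # guard against overflow
        p_a = 1.0 / (1.0 + math.exp(-x))
        nll -= math.log(max(p_a, 1e-300))
        delta = r - q[a]  # prediction error for the chosen option
        # delta == 0 yields no update, so assigning it to alpha_pos is harmless.
        alpha = alpha_pos if delta >= 0 else alpha_neg
        q[a] += alpha * delta
        q[b] *= phi  # decay of the unchosen option's expected value
    return nll

# Hypothetical two-trial example: lose $0.25 on option 0, then $0.75 on option 1.
example_nll = negative_log_likelihood((0.5, 0.5, 1.0, 1.0), [0, 1], [-0.25, -0.75])
```

A function of this shape could be handed to a simplex optimizer (for example, SciPy's `scipy.optimize.minimize` with `method="Nelder-Mead"`, which is analogous to fminsearch), with the bounds described above enforced by the caller.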
      Individual variances in learning rates (α− and α+) as an effect of drug state (C− or C+) were estimated for second-level fMRI analysis (see Imaging Analysis section below). First, the prior mean and distribution of learning rates for participants in each drug state (C− or C+) were estimated using bootstrapped maximum likelihood created via sampling, approximating integration, around a bootstrapped maximum likelihood estimation across subjects (
      • Daw N.D.
      Trial-by-trial data analysis using computational models.
      ,
      • Ahn W.-Y.W.
      • Krawitz A.
      • Kim W.
      • Busemeyer J.R.
      • Brown J.W.
      A model-based fMRI analysis with hierarchical Bayesian parameter estimation.
      ). Individual learning rates were subsequently estimated by conditioning individuals’ behavioral data on the respective drug state (C− or C+) group’s prior distribution to account for drug use status differences. For each participant, individual C+ learning rates were then subtracted from the individual C− learning rates to compute a cocaine deprivation-enhanced learning rate for each participant. Group-specific bootstrapped estimates were used for the inverse temperature and decay parameters during individual estimation of learning rates.

       Imaging Analyses

      To examine neural substrates of loss learning associated with cocaine use state in dependent individuals, model-derived learning variables fit across all participants (as described above) were first correlated with fMRI data collected during the loss-learning task. Next, the model-based neural prediction error signals were related to participants’ self-reported cocaine use measures.

       First-Level fMRI Processing

      The general linear model implemented in SPM8 (
      • Friston K.J.
      • Holmes A.P.
      • Worsley K.J.
      • Poline J.-P.
      • Frith C.D.
      • Frackowiak R.S.J.
      Statistical parametric maps in functional imaging: A general linear approach.
      ) was used to perform neuroimaging analyses at the individual and group levels. For the first-level analyses, onset times for stimuli, outcome events for δ+ outcomes, and outcome events for δ− outcomes for each trial were modeled as separate punctate events. The outcomes were categorized based on the sign of the prediction error (δ > 0 or δ < 0, indicating δ+ or δ−, respectively), using the fitted estimates of the two-learning-rate model, in which trial-by-trial δs were generated (see procedures in Supplemental Methods). To examine the first-level effects of drug use status on neural representation of learning and valuation, cocaine-positive (C+, urine positive for cocaine metabolites) and cocaine-negative (C−, urine negative for cocaine metabolites) drug use states for each individual were modeled as separate first-level general linear models. Trial-by-trial expected values (Q) were modeled as parametric regressors onto the response events. Trial-by-trial δ+ and δ− and the actual outcomes were modeled as parametric regressors onto separate δ+ and δ− outcome events, respectively. Effects due to run number, time in scanner, and head movement parameters were modeled as nuisance covariates for each time point.

       Within and Paired Drug State Analyses

      The within-drug state (C− or C+) and paired-subjects (C− > C+) effects of cocaine use were compared using one-sample and paired-subjects’ second-level contrasts in SPM8. The effects of interest were neural responses to δ+ and δ−. In line with previous data demonstrating the role of the striatum and DA in learning (
      • Pessiglione M.
      • Seymour B.
      • Flandin G.
      • Dolan R.J.
      • Frith C.D.
      Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans.
      ,
      • Montague P.R.
      • King-Casas B.
      • Cohen J.D.
      Imaging valuation models in human choice.
      ,
      • Jocham G.
      • Klein T.A.
      • Ullsperger M.
      Differential modulation of reinforcement learning by D2 dopamine and NMDA glutamate receptor antagonism.
      ), the imaging analyses were masked for the striatum. Anatomical masks were constructed using WFU PickAtlas (
      • Maldjian J.A.
      • Laurienti P.J.
      • Kraft R.A.
      • Burdette J.H.
      An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets.
      ) including the structures of the caudate, putamen, and globus pallidus. Also included in the striatum mask was the nucleus accumbens [per Garrison et al. (
      • Garrison J.
      • Erdeniz B.
      • Done J.
      Prediction error in reinforcement learning: A meta-analysis of neuroimaging studies.
      )]. Results were thresholded with a voxel-level uncorrected p < .001 unless otherwise noted, and significant clusters were defined using familywise error correction.

       Correlation Analyses Between Cocaine State–Modulated Learning Rate and Neural Prediction Error Signals

      To relate drug state effects on behavioral learning rates (α+ and α−) with the corresponding neural δ signals, separate first-level and second-level general linear models were created to correlate within-subject drug-modulated α+ and α− differences (C− > C+ for α+ and C− > C+ for α−) with the corresponding neural differences for positive and negative δ (C− > C+ for δ+ and C− > C+ for δ−). Results were again thresholded with a voxel-level uncorrected p < .001, and significant clusters were defined using familywise error correction. In addition, leave-one-out cross-validation analyses were performed in regions of interest (see Supplemental Methods) to reduce bias due to nonindependence (
      • Esterman M.
      • Tamber-Rosenau B.J.
      • Chiu Y.
      • Yantis S.
      Avoiding non-independence in fMRI data analysis: Leave one subject out.
      ).

       Relationships Between Neural Prediction Error Responses and Behavioral Cocaine Use Measures

      To test relationships between the observed neural learning signals and cocaine use measures, questionnaire data characterizing individual drug use history and current cocaine craving were tested against subjects’ C− neural prediction error responses (given the primary results of interest involving enhanced δ+ from drug deprivation). Again, using leave-one-out cross-validation analysis, neural signals from trials with δ+ were correlated with years of drug use and subscales of the Cocaine Craving Questionnaire (
      • Tiffany S.T.
      • Singleton E.
      • Haertzen C.A.
      • Henningfield J.E.
      The development of a Cocaine Craving Questionnaire.
      ). The analyses identified relationships among years of drug use, δ+ neural signal, and the desire to use cocaine. Based on the results of the correlation analysis, a mediation analysis was performed testing whether neural learning signals mediated the relationship between duration of drug use and individuals’ desire to use cocaine or expected positive outcome from cocaine use (C− measures). A bootstrap approach to mediation (
      • Efron B.
      • Tibshirani R.
      Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy.
      ) was implemented in R to calculate a 95% confidence interval with 10,000 bootstrapped resamples.
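As an illustration of the bootstrap approach to mediation, the sketch below estimates an indirect effect (path a from predictor to mediator, multiplied by path b from mediator to outcome controlling for the predictor) and a percentile confidence interval from resampled participants. It is written in Python rather than R, uses ordinary least squares, and assumes a percentile interval; the source does not state which interval type was used, so that choice and all names and data here are assumptions for illustration.

```python
import random

def _cross(x, y):
    """Sum of cross-products of deviations from the means of x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def indirect_effect(x, m, y):
    """a*b, where a is the slope of m on x and b is the coefficient of m in y ~ x + m.

    Returns None when the resample is degenerate (no variance or collinear x and m).
    """
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    det = sxx * smm - sxm ** 2
    if sxx <= 0 or det <= 1e-9 * sxx * smm:
        return None
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / det
    return a * b

def bootstrap_mediation(x, m, y, n_boot=10000, seed=0):
    """Point estimate and 95% percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants with replacement
        est = indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        if est is not None:
            estimates.append(est)
    estimates.sort()
    lo = estimates[int(0.025 * len(estimates))]
    hi = estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]
    return indirect_effect(x, m, y), (lo, hi)

# Hypothetical data: x = years of use, m = neural signal, y = desire to use.
x = [float(i) for i in range(10)]
noise = [0.5, -0.5, 0.3, -0.3, 0.1, -0.1, 0.4, -0.4, 0.2, -0.2]
m = [2.0 * xi + ei for xi, ei in zip(x, noise)]
y = [3.0 * mi + xi for mi, xi in zip(m, x)]
point, (ci_low, ci_high) = bootstrap_mediation(x, m, y, n_boot=500)
```

Mediation is typically inferred when the interval for the indirect effect excludes zero.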

      Results

       Behavioral Results

       Model-Agnostic Behavioral Results

      In both drug use states, participants demonstrated learning and performed significantly above chance (percentage of trials on which the “better” option was chosen; C−: 62.76%, SE = 3.22, t17 = 3.91, d = 0.96, p < .01; C+: 73.61%, SE = 2.69, t17 = 8.70, d = 2.12, p < .01; chance: 50%) (Figure 1D). In addition, participants in the C− state showed diminished accuracy relative to C+ participants (t17 = 2.62, d = 0.30, p = .01).

       Computational Model–Derived Behavioral Results

      Computational model–based analyses, using bootstrapped group parameters [per (
      • Martinez W.L.
      • Martinez A.R.
      Computational Statistics Handbook with MATLAB.
      ); 200 estimation iterations within each drug use state with subjects drawn with replacement for each iteration] for positive and negative learning rate (α+ and α−, respectively) and inverse temperature (τ), suggested increased positive learning rates (α+) and decreased τ in C− participants relative to C+ participants; α− did not differ between participants in the C− and C+ states (Figure 1E). To clarify whether cocaine state modulation of α+ or τ was associated with the diminished behavioral accuracy in C− participants, we simulated behavioral choices holding α+ constant (iterating through the ranges of the observed parameter values) while allowing τ to vary and similarly holding τ constant and allowing α+ to vary. As shown in Figure 1F, these simulations revealed increased α+ to be associated with decreased performance and no relationship between τ and performance (see simulation details in Supplemental Methods). Together, these data provide initial evidence of drug state modulation of learning, where cocaine deprivation–related increases in positive learning rates are associated with diminished behavioral performance.
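The kind of parameter-sweep simulation described above can be sketched as follows. This is a simplified Python illustration, not the authors' simulation code: it assumes the better option loses $0.25 with probability .75 and $0.75 otherwise, that the worse option mirrors those probabilities, that every block runs the full 36 trials (no early stopping at a learning criterion), and that unchosen values do not decay. The function name and the swept parameter values are hypothetical.

```python
import math
import random

def simulate_accuracy(alpha_pos, alpha_neg, tau, n_blocks=200, n_trials=36, seed=0):
    """Proportion of simulated trials on which the better option (index 0) is chosen."""
    rng = random.Random(seed)
    better_chosen = 0
    for _ in range(n_blocks):
        q = [0.0, 0.0]  # expected values reset for the novel stimuli of each block
        for _ in range(n_trials):
            # Softmax choice between two options with inverse temperature tau.
            x = max(-700.0, min(700.0, tau * (q[0] - q[1])))
            p_better = 1.0 / (1.0 + math.exp(-x))
            a = 0 if rng.random() < p_better else 1
            # Assumed outcome probabilities: mirrored between the two options.
            p_small_loss = 0.75 if a == 0 else 0.25
            r = -0.25 if rng.random() < p_small_loss else -0.75
            delta = r - q[a]
            q[a] += (alpha_pos if delta >= 0 else alpha_neg) * delta
            better_chosen += (a == 0)
    return better_chosen / (n_blocks * n_trials)

# Sweep alpha_pos with tau held constant, then tau with alpha_pos held constant.
alpha_sweep = {a: simulate_accuracy(a, 0.3, 5.0) for a in (0.1, 0.3, 0.5, 0.7, 0.9)}
tau_sweep = {t: simulate_accuracy(0.3, 0.3, t) for t in (1.0, 2.0, 5.0, 10.0)}
```

Because of the simplifying assumptions, the sweep illustrates the procedure only and is not expected to reproduce the pattern reported in Figure 1F.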

       Imaging Results

       Effects of Cocaine Use State on Neural Prediction Error Signals

      Significant neural correlates of positive prediction error were observed in the striatum for C− participants (δ+ for C−) (Figure 2A and Supplemental Table S2A) but not for C+ participants. In addition, no significant neural correlates of negative prediction error (δ−) were found during either drug use state. Positive prediction error (δ+) responses were verified in the nonpsychiatric participants (Supplemental Figure S1 and Supplemental Table S2B), and post hoc analyses using an independent striatum region of interest indicated that δ+ in C− participants was comparable to this control cohort, whereas C+ participants showed significantly diminished δ+ responses (C− vs. nonpsychiatric control participants, t70 = 0.15, d = 0.004, p = .87; C+ vs. nonpsychiatric control participants, t70 = 3.22, d = 0.09, p = .001) (Supplemental Figure S1; see analytic details in Supplemental Methods).
      Figure 2. Cocaine use status modulates neural learning signals and reveals increased positive prediction error signaling during deprivation. (A) Positive prediction error signal (δ+) was found in the right striatum for cocaine-deprived participants (C−; peak voxel at t = 5.33; cluster familywise error corrected p = .005; thresholded at t = 2.9 for visualization) (see ). No significant neural δ+ signals were found for participants using cocaine as usual (C+). Neither C− nor C+ participants showed significant neural responses to negative prediction errors (δ−). See also and for C− > C+ contrasts that further show deprivation enhancement of learning signals. (B) Deprivation enhancement of positive learning rate (α+) was correlated with deprivation enhancement of positive prediction error (δ+) in the striatum (C− > C+ for α+ and neural C− > C+ for δ+; left striatum peak at t = 7.20, right striatum peak at t = 5.61; both ps < .0001; thresholded at t = 2.9 with striatum mask for visualization) (see ). (C) Drug state–dependent (i.e., C− > C+) neural β values were extracted for both δ+ and δ− from the bilateral striatum and correlated with their corresponding deprivation-enhanced learning rates. For positive learning rate, the degree of participants’ deprivation enhancement was significantly associated with the degree of deprivation enhancement of positive prediction error responses in the striatum (C− > C+ for α+ and neural C− > C+ for δ+) () (r = .79, p < .01)
      (
      • Boorman E.D.
      • Behrens T.E.J.
      • Woolrich M.W.
      • Rushworth M.F.S.
      How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action.
      )
      . No relationship between drug state modulation of negative learning rates and their associated neural prediction error signals was observed (C− > C+ for α− and neural C− > C+ for δ−; r = −.08, p = .72). β value differences (C− > C+; whole brain normalized) are plotted. ∗p < .05, relative to zero. ROI, region of interest.
      The specificity of the neural encoding of positive prediction errors in the C− participants (Figure 2A) was striking in its parallel with the increased positive learning rate in these participants. Thus, to test for a neural instantiation of the deprivation-increased positive learning rate, we first computed individual behavioral learning rate estimates for each participant in the C− and C+ states [see Methods and Materials and (
      • Daw N.D.
      Trial-by-trial data analysis using computational models.
      )] and generated for each participant deprivation-enhanced positive and negative learning rate metrics (i.e., C− > C+ for α+ and α−, respectively, for each individual). For positive learning rate, the degree of participants’ deprivation enhancement was significantly associated with the degree of deprivation enhancement of positive prediction error responses in the striatum (C− > C+ for α+ and neural C− > C+ for δ+) (Figure 2B and Supplemental Table S2C) (r = .79, p < .01; using leave-one-out cross-validation to avoid potential bias due to nonindependence) (
      • Esterman M.
      • Tamber-Rosenau B.J.
      • Chiu Y.
      • Yantis S.
      Avoiding non-independence in fMRI data analysis: Leave one subject out.
      ). No relationship between drug state modulation of negative learning rates and their associated neural prediction error signals was observed (C− > C+ for α− and neural C− > C+ for δ−; r = −.08, p = .72) (Figure 2C). The C− > C+ contrasts further show deprivation enhancement of learning signals (see Supplemental Figure S2A and Supplemental Table S2D). Supplemental Figure S3 shows similar imaging results when using group estimates from within-status behavioral estimates. In addition, no effects of cocaine deprivation on neural expected value signals were detected (Supplemental Figure S2B), indicating generally intact outcome valuation unaffected by drug use status.
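      The leave-one-subject-out logic used above to avoid nonindependence can be sketched as follows. This is an illustration on synthetic data rather than the authors' pipeline: the function name, the one-sample t test used to define the region, the threshold, and the simulated β values are all assumptions made for the example.

```python
import numpy as np

def loso_roi_values(betas, t_thresh=2.9):
    """Leave-one-subject-out extraction of a mean signal per subject.

    betas: array of shape (n_subjects, n_voxels) with one contrast estimate
    per subject and voxel. For each subject, the region is defined from a
    one-sample t map computed on all OTHER subjects, and the held-out
    subject's betas are then averaged inside that region. The threshold is
    a hypothetical choice for this sketch.
    """
    n_sub = betas.shape[0]
    values = np.full(n_sub, np.nan)
    for s in range(n_sub):
        train = np.delete(betas, s, axis=0)            # everyone except subject s
        sem = train.std(axis=0, ddof=1) / np.sqrt(train.shape[0])
        t_map = train.mean(axis=0) / sem
        roi = t_map > t_thresh                          # defined without subject s
        if roi.any():
            values[s] = betas[s, roi].mean()
    return values

# Synthetic example: 22 subjects, 200 voxels, signal in the first 20 voxels.
rng = np.random.default_rng(1)
betas = rng.normal(0.0, 1.0, size=(22, 200))
betas[:, :20] += 1.0
neural = loso_roi_values(betas)

# Correlate the independently extracted signal with a synthetic behavioral score.
behavior = neural + rng.normal(0.0, 0.5, size=neural.shape)
r = np.corrcoef(neural, behavior)[0, 1]
print("r =", r)
```

      Because each participant's region is chosen without reference to that participant's own data, the extracted value is not biased by the selection step.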

       Results Relating Neural Prediction Error Signals and Behavioral Cocaine Use Measures

      As described above, the specificity of drug state modulation and deprivation enhancement to positive (i.e., successful loss avoidance) prediction errors (δ+) was consistent with the hypothesis that reinforcement from successfully avoiding negative states contributes to continued drug seeking in addiction. In this case, successful loss avoidance in cocaine-deprived participants should be further related to aspects of real-world cocaine use. To test this possibility, we regressed C− individuals’ neural δ+ responses (β values from outcomes with δ+) against self-reported drug craving [subscales of Cocaine Craving Questionnaire (
      • Tiffany S.T.
      • Singleton E.
      • Haertzen C.A.
      • Henningfield J.E.
      The development of a Cocaine Craving Questionnaire.
      )] (Supplemental Figure S4A and Supplemental Table S3) and observed that neural δ+ responses were related specifically to the desire to use cocaine (Figure 3A and Supplemental Table S3) [r = .70, p < .01; correlations again performed using neural signals obtained from leave-one-out cross-validation analyses and Bonferroni corrected for multiple comparisons as described in Methods and Materials (
      • Esterman M.
      • Tamber-Rosenau B.J.
      • Chiu Y.
      • Yantis S.
      Avoiding non-independence in fMRI data analysis: Leave one subject out.
      )]. These relationships were also present using the deprivation-enhanced neural δ+ signal (i.e., C− > C+, extracted from outcomes with δ+) (r = .67, p < .01) and not observed in the C+ state (i.e., signal from outcomes with δ+ while C+) (Supplemental Figure S4B) (r = −.20, p = .42). Greater neural δ+ responses during cocaine deprivation were also associated with greater years of cocaine use (C−) (Figure 3A) (r = .64, p < .01); no relationship between neural δ+ and chronicity of use was observed for participants in the C+ state (Supplemental Figure S4C) (r = −.09, p = .71). No other subscales of the Cocaine Craving Questionnaire were correlated with striatal δ+ signals (Supplemental Table S3). Lastly, desire to use cocaine (Figure 3A) (r = .61, p < .01) was also positively correlated with participants’ years of cocaine use. Following these observed relationships, a mediation analysis [(
      • Efron B.
      • Tibshirani R.
      Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy.
      ); see Methods and Materials] revealed that the deprivation-enhanced neural δ+ signal fully mediated the relationship between years of cocaine use and desire to use cocaine while deprived (Figure 3B) (path c: β = .09, p < .01; path a: β = .06, p = .04; path b: β = .66, p = .01; path c′: β = .05, p = .07; mediation effect a × b: 95% confidence interval = 0.0008–0.0953).
      Figure 3. During cocaine deprivation, neural positive prediction error signals mediate relationship between chronicity of drug use and desire to use cocaine. In deprived cocaine-dependent individuals (C−): (A) Positive prediction error signal (normalized δ+) in the striatum is associated with greater desire to use cocaine (r = .70, p < .01), longer history of cocaine use predicted increased neural prediction error signal (normalized δ+; r = .64, p < .01), and longer history of cocaine use predicted higher desire to use cocaine (r = .61, p < .01). (B) Following these correlational results, a mediation analysis found that the relationship between individuals’ years of cocaine use and their desire to use while deprived (path c: β = .09, p < .01) was fully mediated by individuals’ deprivation-enhanced neural δ+ signal (path a: β = .06, p = .04; path b: β = .66, p = .01; path c′: β = .05, p = .07; mediation effect a × b: 95% confidence interval [C.I.] = 0.0008–0.0953).
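      The structure of the mediation analysis (paths a, b, c, and c′, with a bootstrapped confidence interval for the indirect effect a × b) can be sketched as below. The sketch assumes ordinary least squares regressions and a percentile bootstrap; the function names, the number of bootstrap samples, and the synthetic data are hypothetical, and the exact procedure used in the study is described in Methods and Materials.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept fitted, not returned)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

def mediation(x, m, y, n_boot=2000, seed=0):
    """Simple mediation x -> m -> y with a percentile bootstrap CI for a*b."""
    rng = np.random.default_rng(seed)
    c = ols_slopes(y, x)[0]              # total effect of x on y
    a = ols_slopes(m, x)[0]              # effect of x on the mediator
    b, c_prime = ols_slopes(y, m, x)     # mediator effect and direct effect
    n = len(x)
    ab_boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)      # resample participants with replacement
        a_i = ols_slopes(m[idx], x[idx])[0]
        b_i = ols_slopes(y[idx], m[idx], x[idx])[0]
        ab_boot[i] = a_i * b_i
    ci_low, ci_high = np.percentile(ab_boot, [2.5, 97.5])
    return {"a": a, "b": b, "c": c, "c_prime": c_prime,
            "ab": a * b, "ci": (ci_low, ci_high)}

# Synthetic data only (not the study's data): x = years of use,
# m = neural signal, y = desire to use.
rng = np.random.default_rng(2)
years = rng.uniform(1.0, 30.0, size=22)
neural = 0.5 * years + rng.normal(0.0, 3.0, size=22)
desire = 0.4 * neural + 0.1 * years + rng.normal(0.0, 2.0, size=22)
result = mediation(years, neural, desire)
print(result)
```

      In this framework, mediation is supported when the confidence interval for a × b excludes zero, and it is described as full when the direct path c′ is no longer significant once the mediator is included.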

      Discussion

      Using a computational psychiatry approach (
      • Maia T.V.
      • Frank M.J.
      From reinforcement learning models to psychiatric and neurological disorders.
      ,
      • Friston K.J.
      • Stephan K.E.
      • Montague R.
      • Dolan R.J.
      Computational psychiatry: The brain as a phantastic organ.
      ,
      • Friston K.J.
      • Redish A.D.
      • Gordon J.A.
      Computational nosology and precision psychiatry.
      ), we show drug state modulation of learning signals in cocaine-dependent participants, such that successful loss avoidance signals are greater during deprivation and the neural responses are associated with both longer history of drug use and greater desire for cocaine. The specificity of the deprivation enhancement to positive neural prediction error signals during loss avoidance appears to parallel clinical descriptions of addiction as a cycle maintained by negative reinforcement where drug-deprived dependent individuals seek drugs and thus successfully avoid negative states (e.g., withdrawal, isolation); such successful loss avoidance has been posited to reinforce continued drug seeking [for relevant discussions, see (
      • West R.
      • Hardy A.
      Theory of Addiction.
      ,
      • Potenza M.N.
      • Sofuoglu M.
      • Carroll K.M.
      • Rounsaville B.J.
      Neuroscience of behavioral and pharmacological treatments for addictions.
      )].
      These data are consonant with prior studies showing that with greater chronicity of cocaine use, physiological adaptations occur in DA systems (
      • Volkow N.D.
      • Fowler J.S.
      • Wang G.
      • Swanson J.M.
      Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
      ,
      • Park K.
      • Volkow N.D.
      • Pan Y.
      • Du C.
      Chronic cocaine dampens dopamine signaling during cocaine intoxication and unbalances D1 over D2 receptor signaling.
      ). In particular, the enhanced neural positive prediction error (δ+) encoding in C− participants relative to C+ participants is consistent with studies showing that humans with long-term cocaine dependency have decreased density of striatal DA receptors and lower tonic DA levels (
      • Volkow N.D.
      • Fowler J.S.
      • Wang G.
      • Swanson J.M.
      Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
      ) and that acute cocaine intake in chronically cocaine-treated mice reduces DA signaling (
      • Park K.
      • Volkow N.D.
      • Pan Y.
      • Du C.
      Chronic cocaine dampens dopamine signaling during cocaine intoxication and unbalances D1 over D2 receptor signaling.
      ). Following from these studies, δ+ signals ought to be more evident during drug deprivation (as observed here) than during drug use because DA receptors, although diminished in density, are free in the deprived state to detect δ+ fluctuations. We note that in the current cocaine-dependent participants, neural δ+ responses in the drug-deprived state are comparable to the δ+ observed in nonpsychiatric control participants, whereas δ+ signaling in the drug-using state was diminished relative to the control participants. Together, these data suggest that although learning signal impairments appear to be restored by cocaine deprivation in dependent participants, such intact learning can have increasingly detrimental consequences in the context of unhealthy reinforcers, negative environmental states, and adverse outcomes (e.g., when dependent individuals are faced with withdrawal avoidance, drug-available environments, and drug use).
      The current data are also relevant for closely related reports of significant increase in prediction error correlates following DA agonist administration (
      • Pessiglione M.
      • Seymour B.
      • Flandin G.
      • Dolan R.J.
      • Frith C.D.
      Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans.
      ) and computational model-based theories that drug use exacerbates prediction errors (
      • Redish A.D.
      Addiction as a computational process gone awry.
      ) or triggers false phasic activation of DA neurons (
      • Schultz W.
      Potential vulnerabilities of neuronal reward, risk, and decision mechanisms to addictive drugs.
      ). A key difference between these previous reports and the current findings is the incorporation of the consequences of long-term drug dependence (i.e., diminished DA functioning) into an understanding of learning in addiction (
      • Redish A.D.
      Addiction as a computational process gone awry.
      ,
      • Keiflin R.
      • Janak P.H.
      Dopamine prediction errors in reward learning and addiction: From theory to neural circuitry.
      ). In addition, the current diminished δ+ signaling in C+ individuals and enhancement in C− individuals is consistent with related work showing DA drug state modulation of learning signals in participants with Parkinson’s disease (who are known to have impaired DA function); these participants similarly show reduced prediction error-related blood oxygen level–dependent responses when on DA-enhancing medication (levodopa) and greater prediction error responses while off medication, specifically to positive prediction errors [δ+; (
      • Voon V.
      • Pessiglione M.
      • Brezing C.
      • Gallea C.
      • Fernandez H.H.
      • Dolan R.J.
      • Hallett M.
      Mechanisms underlying dopamine-mediated reward bias in compulsive behaviors.
      ,
      • Schmidt L.
      • Braun E.K.
      • Wager T.D.
      • Shohamy D.
      Mind matters: Placebo enhances reward learning in Parkinson’s disease.
      )].
      Finally, we show that greater neural loss-learning δ+ (signaling successful loss avoidance) during deprivation mediates a relationship between chronicity of drug use and desire for cocaine. This relationship supports the hypothesis that drug state–dependent learning signals play a role in maintaining drug use. The current data thus emphasize that both drug use chronicity and the context in which learning is assessed (e.g., loss, gain) may be critical for identifying neurobehavioral mechanisms that maintain drug use [for related data indicating differences in neural substrates of loss and gain learning, see (
      • Palminteri S.
      • Justo D.
      • Jauffret C.
      • Pavlicek B.
      • Dauta A.
      • Delmaire C.
      • et al.
      Critical roles for anterior insula and dorsal striatum in punishment-based avoidance learning.
      ,
      • Seymour B.
      • Daw N.
      • Dayan P.
      • Singer T.
      • Dolan R.
      Differential encoding of losses and gains in the human striatum.
      ,
      • Cox S.M.L.
      • Frank M.J.
      • Larcher K.
      • Fellows L.K.
      • Clark C.A.
      • Leyton M.
      • Dagher A.
      Striatal D1 and D2 signaling differentially predict learning from positive and negative outcomes.
      ,
      • Pessiglione M.
      • Delgado M.R.
      The good, the bad and the brain: Neural correlates of appetitive and aversive values underlying decision making.
      )].
      The limitations of the current work provide avenues for further study. First, a relatively small number of male participants were included in this study (N = 22). While the within-subjects design and advantages of sample homogeneity partially mitigate the sample size, replication in a larger, more diverse sample would address questions regarding generalizability. In addition, the current study identified drug state modulation of responses to negative outcomes but did not evaluate the degree to which the physical consequences per se (i.e., small or large monetary loss), emotions associated with the consequences, or other aspects of the outcomes contribute to the reinforcement provided by successful loss avoidance. Clarifying the role of components of negative outcomes in maintaining substance use in dependence ought to be a focus of future studies. Finally, we focused our neural analyses primarily on regions of the striatum, given previous work linking learning mechanisms and cocaine pharmacodynamics to these regions (
      • Montague P.R.
      • Hyman S.E.
      • Cohen J.D.
      Computational roles for dopamine in behavioural control.
      ,
      • Schultz W.
      • Dayan P.
      • Montague P.R.
      A neural substrate of prediction and reward.
      ,
      • Pessiglione M.
      • Seymour B.
      • Flandin G.
      • Dolan R.J.
      • Frith C.D.
      Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans.
      ,
      • Volkow N.D.
      • Fowler J.S.
      • Wang G.
      • Swanson J.M.
      Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
      ). Supplemental analyses found no effects of cocaine state on encoding of expected value (see Supplemental Figure S2B), indicating generally intact outcome valuation unaffected by drug use status; nonetheless, other neural regions implicated in learning may be of interest in future investigations.
      In summary, in cocaine-dependent participants, we show that drug deprivation enhances neural signaling of successful loss avoidance, which in turn predicts increased desire to use cocaine. The deprivation-enhanced neural prediction error is in line with prior reports of DA adaptations associated with chronic substance use (
      • Potenza M.N.
      • Sofuoglu M.
      • Carroll K.M.
      • Rounsaville B.J.
      Neuroscience of behavioral and pharmacological treatments for addictions.
      ,
      • Volkow N.D.
      • Fowler J.S.
      • Wang G.
      • Swanson J.M.
      Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
      ,
      • Park K.
      • Volkow N.D.
      • Pan Y.
      • Du C.
      Chronic cocaine dampens dopamine signaling during cocaine intoxication and unbalances D1 over D2 receptor signaling.
      ) and also points to a potential mechanism by which drug seeking is maintained. That is, when dependent individuals are at their most vulnerable (i.e., during drug deprivation), reward signals associated with successful avoidance of negative states are at their greatest and may contribute to a pernicious cycle of drug seeking in the face of quit attempts. Of note, DA dysregulation has been associated with poor response to behavioral treatments in addiction (
      • Martinez D.
      • Carpenter K.M.
      • Liu F.
      • Slifstein M.
      • Broft A.
      • Friedman A.C.
      • et al.
      Imaging dopamine transmission in cocaine dependence: Link between neurochemistry and response to treatment.
      ), and innovative behavioral training protocols have identified learning systems as potential new mechanistic treatment targets for cocaine dependence (
      • Ersche K.D.
      • Gillan C.M.
      • Jones P.S.
      • Williams G.B.
      • Ward L.H.E.
      • Luijten M.
      • et al.
      Carrots and sticks fail to change behavior in cocaine addiction.
      ). More generally, the current results support targeting learning-based therapies to identify goal-driven behaviors that provide relief from the negative outcomes of drug deprivation and indicate that attention to drug state may be critical for understanding neural mechanisms of addiction and refining learning-based therapies.

      Acknowledgments and Disclosures

      This work was supported in part by the National Institutes of Health (Grant Nos. R01MH091872 and R21DA042274 [to PHC], Grant No. R01DA036017 [to BK-C], and Grant Nos. RC1DA028387 and R01DA023624 [to RDLG]).
      PHC, BK-C, RDLG, and TN designed the experiments. JMW analyzed the data with input from LZ, VMB, PHC, and BK-C. PHC, BK-C, RDLG, and TN supervised this work. JMW and PHC drafted the manuscript with input from all authors. All authors edited and approved the final version.
      We acknowledge the technical assistance of George Christopoulos, Dongil Chung, Jacob Lee, James Mahoney, Dharol Tankersley, Katherine McCurry, Nina Lauharatanahirun, and members of the Chiu, De La Garza, King-Casas, and Newton Labs.
      The authors report no biomedical financial interests or potential conflicts of interest.

      Supplementary Material

      References

        • West R.
        • Hardy A.
        Theory of Addiction.
        Blackwell, Malden, MA, 2005
        • Potenza M.N.
        • Sofuoglu M.
        • Carroll K.M.
        • Rounsaville B.J.
        Neuroscience of behavioral and pharmacological treatments for addictions.
        Neuron. 2011; 69: 695-712
        • Lucantonio F.
        • Stalnaker T.A.
        • Shaham Y.
        • Niv Y.
        • Schoenbaum G.
        The impact of orbitofrontal dysfunction on cocaine addiction.
        Nat Neurosci. 2012; 15: 358-366
        • Hyman S.E.
        • Malenka R.C.
        • Nestler E.J.
        Neural mechanisms of addiction: The role of reward-related learning and memory.
        Annu Rev Neurosci. 2006; 29: 565-598
        • Montague P.R.
        • Hyman S.E.
        • Cohen J.D.
        Computational roles for dopamine in behavioural control.
        Nature. 2004; 431: 760-767
        • Schultz W.
        • Dayan P.
        • Montague P.R.
        A neural substrate of prediction and reward.
        Science. 1997; 275: 1593-1599
        • Pessiglione M.
        • Seymour B.
        • Flandin G.
        • Dolan R.J.
        • Frith C.D.
        Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans.
        Nature. 2006; 442: 1042-1045
        • Acevedo-Rodriguez A.
        • Zhang L.
        • Zhou F.
        • Gong S.
        • Gu H.
        • De Biasi M.
        • et al.
        Cocaine inhibition of nicotinic acetylcholine receptors influences dopamine release.
        Front Synaptic Neurosci. 2014; 6: 19
        • Volkow N.D.
        • Fowler J.S.
        • Wang G.
        • Swanson J.M.
        Dopamine in drug abuse and addiction: Results from imaging studies and treatment implications.
        Mol Psychiatry. 2004; 9: 557-569
        • Martinez D.
        • Carpenter K.M.
        • Liu F.
        • Slifstein M.
        • Broft A.
        • Friedman A.C.
        • et al.
        Imaging dopamine transmission in cocaine dependence: Link between neurochemistry and response to treatment.
        Am J Psychiatry. 2011; 168: 634-641
        • Park K.
        • Volkow N.D.
        • Pan Y.
        • Du C.
        Chronic cocaine dampens dopamine signaling during cocaine intoxication and unbalances D1 over D2 receptor signaling.
        J Neurosci. 2013; 33: 15827-15836
        • Keramati M.
        • Durand A.
        • Girardeau P.
        • Gutkin B.
        • Ahmed S.H.
        Cocaine addiction as a homeostatic reinforcement learning disorder.
        Psychol Rev. 2017; 124: 130-153
        • Willuhn I.
        • Wanat M.J.
        • Clark J.J.
        • Phillips P.E.M.
        Dopamine signaling in the nucleus accumbens of animals self-administering drugs of abuse.
        Curr Top Behav Neurosci. 2010; 11: 29-71
        • Montague P.R.
        • King-Casas B.
        • Cohen J.D.
        Imaging valuation models in human choice.
        Annu Rev Neurosci. 2006; 29: 417-448
        • Jocham G.
        • Klein T.A.
        • Ullsperger M.
        Differential modulation of reinforcement learning by D2 dopamine and NMDA glutamate receptor antagonism.
        J Neurosci. 2014; 34: 13151-13162
        • Parvaz M.A.
        • Konova A.B.
        • Proudfit G.H.
        • Dunning J.P.
        • Malaker P.
        • Moeller S.J.
        • et al.
        Impaired neural response to negative prediction errors in cocaine addiction.
        J Neurosci. 2015; 35: 1872-1879
        • Tanabe J.
        • Reynolds J.
        • Krmpotich T.
        • Claus E.
        • Thompson L.L.
        • Du Y.P.
        • Banich M.T.
        Reduced neural tracking of prediction error in substance-dependent individuals.
        Am J Psychiatry. 2013; 170: 1356-1363
        • Rose E.J.
        • Salmeron B.J.
        • Ross T.J.
        • Waltz J.
        • Schweitzer J.B.
        • McClure S.M.
        • Stein E.A.
        Temporal difference error prediction signal dysregulation in cocaine dependence.
        Neuropsychopharmacology. 2014; 39: 1732-1742
        • Stewart J.L.
        • Connolly C.G.
        • May A.C.
        • Tapert S.F.
        • Wittmann M.
        • Paulus M.P.
        Cocaine dependent individuals with attenuated striatal activation during reinforcement learning are more susceptible to relapse.
        Psychiatry Res Neuroimaging. 2014; 223: 129-139
        • Brown V.M.
        • Zhu L.
        • Wang J.M.
        • Frueh B.C.
        • King-Casas B.
        • Chiu P.H.
        Associability-modulated loss learning is increased in posttraumatic stress disorder.
        eLife. 2018; 7: 1-27
        • Niv Y.
        • Edlund J.A.
        • Dayan P.
        • O’Doherty J.P.
        Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.
        J Neurosci. 2012; 32: 551-562
        • Christakou A.
        • Gershman S.J.
        • Niv Y.
        • Simmons A.
        • Brammer M.
        • Rubia K.
        Neural and psychological maturation of decision-making in adolescence and young adulthood.
        J Cogn Neurosci. 2013; 25: 1807-1823
        • Li J.
        • Delgado M.R.
        • Phelps E.A.
        How instructed knowledge modulates the neural systems of reward learning.
        Proc Natl Acad Sci U S A. 2011; 108: 55-60
        • Palminteri S.
        • Lefebvre G.
        • Kilford E.J.
        • Blakemore S.
        Confirmation bias in human reinforcement learning.
        PLoS Comput Biol. 2017; 13: e1005684
        • Sutton R.S.
        • Barto A.G.
        Reinforcement Learning: An Introduction.
        Vol. 1. MIT Press, Cambridge, MA, 1998
        • Niv Y.
        • Daniel R.
        • Geana A.
        • Gershman S.J.
        • Leong Y.C.
        • Radulescu A.
        • Wilson R.C.
        Reinforcement learning in multidimensional environments relies on attention mechanisms.
        J Neurosci. 2015; 35: 8145-8157
        • Boorman E.D.
        • Behrens T.E.J.
        • Woolrich M.W.
        • Rushworth M.F.S.
        How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action.
        Neuron. 2009; 62: 733-743
        • Cavanagh J.F.
        Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times.
        NeuroImage. 2015; 110: 205-216
        • Collins A.G.E.
        • Frank M.J.
        Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning.
        Cognition. 2016; 152: 160-169
        • Daw N.D.
        Trial-by-trial data analysis using computational models.
        in: Delgado M.R., Phelps E.A., Robbins T.W. Decision Making, Affect, and Learning: Attention & Performance XXIII. Oxford University Press, Oxford, UK, 2011: 3-38
        • Ahn W.-Y.W.
        • Krawitz A.
        • Kim W.
        • Busemeyer J.R.
        • Brown J.W.
        A model-based fMRI analysis with hierarchical Bayesian parameter estimation.
        J Neurosci Psychol Econ. 2011; 4: 95-110
        • Friston K.J.
        • Holmes A.P.
        • Worsley K.J.
        • Poline J.-P.
        • Frith C.D.
        • Frackowiak R.S.J.
        Statistical parametric maps in functional imaging: A general linear approach.
        Hum Brain Mapp. 1994; 2: 189-210
        • Maldjian J.A.
        • Laurienti P.J.
        • Kraft R.A.
        • Burdette J.H.
        An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets.
        NeuroImage. 2003; 19: 1233-1239
        • Garrison J.
        • Erdeniz B.
        • Done J.
        Prediction error in reinforcement learning: A meta-analysis of neuroimaging studies.
        Neurosci Biobehav Rev. 2013; 37: 1297-1310
        • Esterman M.
        • Tamber-Rosenau B.J.
        • Chiu Y.
        • Yantis S.
        Avoiding non-independence in fMRI data analysis: Leave one subject out.
        NeuroImage. 2010; 50: 572-576
        • Tiffany S.T.
        • Singleton E.
        • Haertzen C.A.
        • Henningfield J.E.
        The development of a Cocaine Craving Questionnaire.
        Drug Alcohol Depend. 1993; 34: 19-28
        • Efron B.
        • Tibshirani R.
        Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy.
        Stat Sci. 1986; 1: 54-75
        • Martinez W.L.
        • Martinez A.R.
        Computational Statistics Handbook with MATLAB.
        2nd ed. Chapman and Hall/CRC, London, 2008
        • Maia T.V.
        • Frank M.J.
        From reinforcement learning models to psychiatric and neurological disorders.
        Nat Neurosci. 2011; 14: 154-162
        • Friston K.J.
        • Stephan K.E.
        • Montague R.
        • Dolan R.J.
        Computational psychiatry: The brain as a phantastic organ.
        Lancet Psychiatry. 2014; 1: 148-158
        • Friston K.J.
        • Redish A.D.
        • Gordon J.A.
        Computational nosology and precision psychiatry.
        Comput Psychiatry. 2017; 1: 2-23
        • Redish A.D.
        Addiction as a computational process gone awry.
        Science. 2004; 306: 1944-1947
        • Schultz W.
        Potential vulnerabilities of neuronal reward, risk, and decision mechanisms to addictive drugs.
        Neuron. 2011; 69: 603-617
        • Keiflin R.
        • Janak P.H.
        Dopamine prediction errors in reward learning and addiction: From theory to neural circuitry.
        Neuron. 2015; 88: 247-263
        • Voon V.
        • Pessiglione M.
        • Brezing C.
        • Gallea C.
        • Fernandez H.H.
        • Dolan R.J.
        • Hallett M.
        Mechanisms underlying dopamine-mediated reward bias in compulsive behaviors.
        Neuron. 2010; 65: 135-142
        • Schmidt L.
        • Braun E.K.
        • Wager T.D.
        • Shohamy D.
        Mind matters: Placebo enhances reward learning in Parkinson’s disease.
        Nat Neurosci. 2014; 17: 1793-1797
        • Palminteri S.
        • Justo D.
        • Jauffret C.
        • Pavlicek B.
        • Dauta A.
        • Delmaire C.
        • et al.
        Critical roles for anterior insula and dorsal striatum in punishment-based avoidance learning.
        Neuron. 2012; 76: 998-1009
        • Seymour B.
        • Daw N.
        • Dayan P.
        • Singer T.
        • Dolan R.
        Differential encoding of losses and gains in the human striatum.
        J Neurosci. 2007; 27: 4826-4831
        • Cox S.M.L.
        • Frank M.J.
        • Larcher K.
        • Fellows L.K.
        • Clark C.A.
        • Leyton M.
        • Dagher A.
        Striatal D1 and D2 signaling differentially predict learning from positive and negative outcomes.
        NeuroImage. 2015; 109: 95-101
        • Pessiglione M.
        • Delgado M.R.
        The good, the bad and the brain: Neural correlates of appetitive and aversive values underlying decision making.
        Curr Opin Behav Sci. 2015; 5: 78-84
        • Ersche K.D.
        • Gillan C.M.
        • Jones P.S.
        • Williams G.B.
        • Ward L.H.E.
        • Luijten M.
        • et al.
        Carrots and sticks fail to change behavior in cocaine addiction.
        Science. 2016; 352: 1468-1471