An extraordinary, must-read article is now available open access:
Jureidini, JN, Amsterdam, JD, McHenry, LB. The citalopram CIT-MD-18 pediatric depression trial: Deconstruction of medical ghostwriting, data mischaracterisation and academic malfeasance. International Journal of Risk & Safety in Medicine, vol. 28, no. 1, pp. 33-43, 2016
The authors had access to internal documents written with the belief that they would be left buried in corporate files. However, these documents became publicly available in a class-action product liability suit concerning the marketing of the antidepressant citalopram for treating children and adolescents.
Detailed evidence of ghostwriting by industry sponsors has considerable shock value. But the article has a broader usefulness: it lets us peek in on the usually hidden processes by which null findings and substantial adverse events are spun into a positive report of the efficacy and safety of a treatment.
We are able to see behind the scenes how an already underspecified protocol was violated, primary and secondary outcomes were switched or dropped, and adverse events were suppressed in order to obtain the kind of results needed for a planned promotional effort and the FDA approval for use of the drug in these populations.
We can see how subtle changes in analyses that would otherwise go unnoticed can have a profound impact on clinical and public policy.
In so many other situations, we are left only with our skepticism about results being too good to be true. We are usually unable to evaluate investigators’ claims independently because protocols are unavailable, deviations are not noted, and analyses are conducted and reported without transparency. Importantly, there usually is no access to the data that would be necessary for reanalysis.
The authors whose work is being criticized are among the most prestigious child psychiatrists in the world. The first author is currently President-elect of the American Academy of Child and Adolescent Psychiatry. The journal is among the top psychiatry journals in the world. A subscription is provided as part of membership in the American Psychiatric Association. Appearing in this journal is thus strategic because its readership includes many practitioners and clinicians who will simply defer to academics publishing in a journal they respect, without inclination to look carefully.
Indeed, I encourage readers to go to the original article and read it before proceeding further in the blog. Witness the unmasking of how null findings were turned positive. Unless you had been alerted, would you have detected that something was amiss?
Some readers have participated in multisite trials other than as a lead investigator. I ask them to imagine that they had received the manuscript for review and approval and assumed it was vetted by the senior investigators – and only the senior investigators. Would they have subjected it to the scrutiny needed to detect data manipulation?
I similarly ask reviewers for scientific journals if they would have detected something amiss. Would they have compared the manuscript to the study protocol? Note that when this article was published, they probably would’ve had to contact the authors or the pharmaceutical company.
Welcome to a rich treasure trove
Separate from the civil action that led to these documents and data being released, the federal government later filed criminal charges and False Claims Act allegations against Forest Laboratories. The pharmaceutical company pleaded guilty and accepted a $313 million fine.
Links to the filing and to the federal government's announcement of a settlement are available in a supplementary blog at Quick Thoughts. That blog post also has rich links to the actual emails accessed by the authors, as well as blog posts by John M Nardo, M.D. that detail the difficulties these authors had publishing the paper we are discussing.
Aside from his popular blog, Dr. Nardo is one of the authors of a reanalysis that was published in The BMJ of a related trial:
Le Noury J, Nardo JM, Healy D, Jureidini J, Raven M, Tufanaru C, Abi-Jaoude E. Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. BMJ 2015; 351: h4320
My supplementary blog post contains links to discussions of that reanalysis of data obtained from GlaxoSmithKline, the original publication based on these data, 30 Rapid Responses to the reanalysis in The BMJ, as well as federal criminal complaints and the guilty plea of GlaxoSmithKline.
With Dr. Nardo’s assistance, I’ve assembled a full set of materials that should be valuable in stimulating discussion among senior and junior investigators, as well as in student seminars. I agree with Dr. Nardo’s assessment:
I think it’s now our job to insure that all this dedicated work is rewarded with a wide readership, one that helps us move closer to putting this tawdry era behind us… – John Mickey Nardo
The citalopram CIT-MD-18 pediatric depression trial
The original article that we will be discussing is:
Wagner KD, Robb AS, Findling RL, Jin J, Gutierrez MM, Heydorn WE. A randomized, placebo-controlled trial of citalopram for the treatment of major depression in children and adolescents. American Journal of Psychiatry. 2004 Jun 1;161(6):1079-83.
An 8-week, randomized, double-blind, placebo-controlled study compared the safety and efficacy of citalopram with placebo in the treatment of children (ages 7–11) and adolescents (ages 12–17) with major depressive disorder.
The results and conclusion:
Results: The overall mean citalopram dose was approximately 24 mg/day. Mean Children’s Depression Rating Scale—Revised scores decreased significantly more from baseline in the citalopram treatment group than in the placebo treatment group, beginning at week 1 and continuing at every observation point to the end of the study (effect size=2.9). The difference in response rate at week 8 between placebo (24%) and citalopram (36%) also was statistically significant. Citalopram treatment was well tolerated. Rates of discontinuation due to adverse events were comparable in the placebo and citalopram groups (5.9% versus 5.6%, respectively). Rhinitis, nausea, and abdominal pain were the only adverse events to occur with a frequency exceeding 10% in either treatment group.
Conclusions: In this population of children and adolescents, treatment with citalopram reduced depressive symptoms to a significantly greater extent than placebo treatment and was well tolerated.
The article ends with an elaboration of what is said in the abstract:
In conclusion, citalopram treatment significantly improved depressive symptoms compared with placebo within 1 week in this population of children and adolescents. No serious adverse events were reported, and the rate of discontinuation due to adverse events among the citalopram-treated patients was comparable to that of placebo. These findings further support the use of citalopram in children and adolescents suffering from major depression.
The protocol for CIT-MD-18, IND Number 22,368 was obtained from Forest Laboratories. It was dated September 1, 1999 and amended April 8, 2002.
The primary outcome measure was the change from baseline to week 8 on the Children’s Depression Rating Scale-Revised (CDRS-R) total score.
Comparison between citalopram and placebo will be performed using three-way analysis of covariance (ANCOVA) with age group, treatment group and center as the three factors, and the baseline CDRS-R score as covariate.
The secondary outcome measures were the Clinical Global Impression severity and improvement subscales, Kiddie Schedule for Affective Disorders and Schizophrenia – depression module, and Children’s Global Assessment Scale.
Comparison between citalopram and placebo will be performed using the same approach as for the primary efficacy parameter. Two-way ANOVA will be used for CGI-I, since improvement relative to Baseline is inherent in the score.
There was no formal power analysis but:
The primary efficacy variable is the change from baseline in CDRS-R score at Week 8.
Assuming an effect size (treatment group difference relative to pooled standard deviation) of 0.5, a sample size of 80 patients in each treatment group will provide at least 85% power at an alpha level of 0.05 (two-sided).
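The protocol's sample-size statement can be sanity-checked with a few lines of arithmetic. The sketch below is only an approximation under stated assumptions: it treats the comparison as a simple two-group difference in means with a normal approximation, and it ignores the covariate and the age-group and center factors of the planned ANCOVA. The function name `approx_power` is my own and does not come from the protocol.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means.

    Uses the normal approximation; effect_size is the group difference
    divided by the pooled standard deviation, as in the protocol.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # two-sided critical value
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic under the alternative
    # Probability of rejecting in either tail
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Protocol assumptions: effect size 0.5, 80 patients per group, alpha 0.05
protocol_power = approx_power(0.5, 80)
print(f"Approximate power: {protocol_power:.3f}")
```

Under these simplifying assumptions the result comes out a little above the 85% the protocol claims, so the stated numbers are at least internally consistent for the combined sample. The sketch says nothing about power within the child and adolescent subgroups separately.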
Selective reporting of subtle departures from the protocol could easily have been missed or simply excused as accidental and inconsequential, except that there was unrestricted access to communication within Forest Laboratories and to the data for reanalysis.
The fact that Forest controlled the CIT-MD-18 manuscript production allowed for selection of efficacy results to create a favourable impression. The published Wagner et al. article concluded that citalopram produced a significantly greater reduction in depressive symptoms than placebo in this population of children and adolescents. This conclusion was supported by claims that citalopram reduced the mean CDRS-R scores significantly more than placebo beginning at week 1 and at every week thereafter (effect size = 2.9); and that response rates at week 8 were significantly greater for citalopram (36%) versus placebo (24%). It was also claimed that there were comparable rates of tolerability and treatment discontinuation for adverse events (citalopram = 5.6%; placebo = 5.9%). Our analysis of these data and documents has led us to conclude that these claims were based on a combination of: misleading analysis of the primary outcome and implausible calculation of effect size; introduction of post hoc measures and failure to report negative secondary outcomes; and misleading analysis and reporting of adverse events.
3.2.1 Mischaracterisation of primary outcome
Contrary to the protocol, Forest’s final study report synopsis increased the study sample size by adding eight of nine subjects who, per protocol, should have been excluded because they were inadvertently dispensed unblinded study drug due to a packaging error. The protocol stipulated: “Any patient for whom the blind has been broken will immediately be discontinued from the study and no further efficacy evaluations will be performed”. Appendix Table 6 of the CIT-MD-18 Study Report showed that Forest had performed a primary outcome calculation excluding these subjects (see our Fig. 2). This per protocol exclusion resulted in a ‘negative’ primary efficacy outcome.
Ultimately however, eight of the excluded subjects were added back into the analysis, turning the (albeit marginally) statistically insignificant outcome (p < 0.052) into a statistically significant outcome (p < 0.038). Despite this change, there was still no clinically meaningful difference in symptom reduction between citalopram and placebo on the mean CDRS-R scores (Fig. 3).
The unblinding error was not reported in the published article.
Forest also failed to follow their protocol stipulated plan for analysis of age-by-treatment interaction. The primary outcome variable was the change in total CDRS-R score at week 8 for the entire citalopram versus placebo group, using a 3-way ANCOVA test of efficacy. Although a significant efficacy value favouring citalopram was produced after including the unblinded subjects in the ANCOVA, this analysis resulted in an age-by-treatment interaction with no significant efficacy demonstrated in children. This important efficacy information was withheld from public scrutiny and was not presented in the published article. Nor did the published article report the power analysis used to determine the sample size, and no adequate description of this analysis was available in either the study protocol or the study report. Moreover, no indication was made in these study documents as to whether Forest originally intended to examine citalopram efficacy in children and adolescent subgroups separately or whether the study was powered to show citalopram efficacy in these subgroups. If so, then it would appear that Forest could not make a claim for efficacy in children (and possibly not even in adolescents). However, if Forest powered the study to make a claim for efficacy in the combined child plus adolescent group, this may have been invalidated as a result of the ANCOVA age-by-treatment interaction and would have shown that citalopram was not effective in children.
A further exaggeration of the effect of citalopram was to report “effect size on the primary outcome measure” of 2.9, which was extraordinary and not consistent with the primary data. This claim was questioned by Martin et al. who criticized the article for miscalculating effect size or using an unconventional calculation, which clouded “communication among investigators and across measures”. The origin of the effect size calculation remained unclear even after Wagner et al. publicly acknowledged an error and stated that “With Cohen’s method, the effect size was 0.32,” which is more typical of antidepressant trials. Moreover, we note that there was no reference to the calculation of effect size in the study protocol.
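To get a feel for how far apart an effect size of 2.9 and one of 0.32 really are, one rough translation is the "probability of superiority": the chance that a randomly chosen treated patient improves more than a randomly chosen placebo patient. The sketch below assumes normally distributed change scores with equal variance in both groups, which is my simplifying assumption and not something taken from the trial data; the function name is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def probability_of_superiority(cohens_d: float) -> float:
    """Chance that a random treated patient outperforms a random control.

    Assumes normal outcomes with equal variance in both groups, in which
    case the probability equals Phi(d / sqrt(2)).
    """
    return NormalDist().cdf(cohens_d / sqrt(2))


reported = probability_of_superiority(2.9)    # effect size in the published abstract
corrected = probability_of_superiority(0.32)  # effect size later acknowledged by Wagner et al.

print(f"d = 2.90 -> probability of superiority {reported:.2f}")
print(f"d = 0.32 -> probability of superiority {corrected:.2f}")
```

Under these assumptions, the published figure would imply that nearly every treated patient does better than a comparable placebo patient, while the corrected figure is only modestly better than a coin flip.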
3.2.2 Failure to publish negative secondary outcomes, and undeclared inclusion of Post Hoc Outcomes
Wagner et al. failed to publish two of the protocol-specified secondary outcomes, both of which were unfavourable to citalopram. While CGI-S and CGI-I were correctly reported in the published article as negative (see p1081), the Kiddie Schedule for Affective Disorders and Schizophrenia-Present (depression module) and the Children’s Global Assessment Scale (CGAS) were not reported in either the methods or results sections of the published article.
In our view, the omission of secondary outcomes was no accident. On October 15, 2001, Ms. Prescott wrote: “I’ve heard through the grapevine that not all the data look as great as the primary outcome data. For these reasons (speed and greater control) I think it makes sense to prepare a draft in-house that can then be provided to Karen Wagner (or whomever) for review and comments” (see Fig. 1). Subsequently, Forest’s Dr. Heydorn wrote on April 17, 2002: “The publications committee discussed target journals, and recommended that the paper be submitted to the American Journal of Psychiatry as a Brief Report. The rationale for this was the following: … As a Brief Report, we feel we can avoid mentioning the lack of statistically significant positive effects at week 8 or study termination for secondary endpoints”.
Instead the writers presented post hoc statistically positive results that were not part of the original study protocol or its amendment (visit-by-visit comparison of CDRS-R scores, and ‘Response’, defined as a score of ≤28 on the CDRS-R) as though they were protocol-specified outcomes. For example, ‘Response’ was reported in the results section of the Wagner et al. article between the primary and secondary outcomes, likely predisposing a reader to regard it as more important than the selected secondary measures reported, or even to mistake it for a primary measure.
It is difficult to reconcile what the authors of the original article reported in terms of adverse events with what our “deconstructionists” found in the unpublished final study report. The deconstruction article also notes that a letter to the editor appearing at the time of publication of the original paper called attention to another citalopram study that remained unpublished, but that was known to be a null study with substantial adverse events.
3.2.3 Mischaracterisation of adverse events
Although Wagner et al. correctly reported that “the rate of discontinuation due to adverse events among citalopram-treated patients was comparable to that of placebo”, the authors failed to mention that the five citalopram-treated subjects discontinuing treatment did so due to one case of hypomania, two of agitation, and one of akathisia. None of these potentially dangerous states of over-arousal occurred with placebo. Furthermore, anxiety occurred in one citalopram patient (and none on placebo) of sufficient severity to temporarily stop the drug and irritability occurred in three citalopram (compared to one placebo). Taken together, these adverse events raise concerns about dangers from the activating effects of citalopram that should have been reported and discussed. Instead Wagner et al. reported “adverse events associated with behavioral activation (such as insomnia or agitation) were not prevalent in this trial” and claimed that “there were no reports of mania”, without acknowledging the case of hypomania.
Furthermore, examination of the final study report revealed that there were many more gastrointestinal adverse events for citalopram than placebo patients. However, Wagner et al. grouped the adverse event data in a way that in effect masked this possibly clinically significantly gastrointestinal intolerance. Finally, the published article also failed to report that one patient on citalopram developed abnormal liver function tests.
In a letter to the editor of the American Journal of Psychiatry, Mathews et al. also criticized the manner in which Wagner et al. dealt with adverse outcomes in the CIT-MD-18 data, stating that: “given the recent concerns about the risk of suicidal thoughts and behaviors in children treated with SSRIs, this study could have attempted to shed additional light on the subject”. Wagner et al. responded: “At the time the [CIT-MD-18] manuscript was developed, reviewed, and revised, it was not considered necessary to comment further on this topic”. However, concerns about suicidal risk were prevalent before the Wagner et al. article was written and published. In fact, undisclosed in both the published article and Wagner’s letter-to-the-editor, the 2001 negative Lundbeck study had raised concern over heightened suicide risk [10, 20, 21].
A later blog post will discuss the letters to the editor that appeared shortly after the original study in American Journal of Psychiatry. But for now, it would be useful to clarify the status of the negative Lundbeck study at that time.
The letter by Barbe published in AJP remarked:
It is somewhat surprising that the authors do not compare their results with those of another trial, involving 244 adolescents (13–18-year-olds), that showed no evidence of efficacy of citalopram compared to placebo and a higher level of self-harm (16 [12.9%] of 124 versus nine [7.5%] of 120) in the citalopram group compared to the placebo group (5). Although these data were not available to the public until December 2003, one would expect that the authors, some of whom are employed by the company that produces citalopram in the United States and financed the study, had access to this information.
It may be considered premature to compare the results of this trial with unpublished data from the results of a study that has not undergone the peer-review process. Once the investigators involved in the European citalopram adolescent depression study publish the results in a peer-reviewed journal, it will be possible to compare their study population, methods, and results with our study with appropriate scientific rigor.
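The self-harm counts quoted in the letter above (16 of 124 on citalopram versus 9 of 120 on placebo) are easy to turn into the usual summary measures. The following is a minimal sketch of that arithmetic, using only the counts as quoted; the function and key names are my own, and the output is descriptive only, not a reanalysis of the unpublished study.

```python
def risk_summary(events_treated: int, n_treated: int,
                 events_control: int, n_control: int) -> dict:
    """Basic risk measures for a two-arm comparison of event counts."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    return {
        "risk_treated": risk_treated,
        "risk_control": risk_control,
        "risk_difference": risk_treated - risk_control,
        "relative_risk": risk_treated / risk_control,
    }


# Counts as quoted in the letter: 16/124 on citalopram, 9/120 on placebo
summary = risk_summary(16, 124, 9, 120)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The two risks reproduce the 12.9% and 7.5% given in the letter, and the relative risk puts the gap in a form that is easier to set beside other trials. With groups of this size the estimate is imprecise, so it is best read as a signal that deserved discussion, not as a settled figure.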
Conflict of interest
The authors of the deconstruction study indicate they do not have any conventional industry or speaker’s bureau support to declare, but they have had relevant involvement in litigation. Their disclosure includes:
The authors are not members of any industry-sponsored advisory board or speaker’s bureau, and have no financial interest in any pharmaceutical or medical device company.
Drs. Amsterdam and Jureidini were engaged by Baum, Hedlund, Aristei & Goldman as experts in the Celexa and Lexapro Marketing and Sales Practices Litigation. Dr. McHenry was also engaged as a research consultant in the case. Dr. McHenry is a research consultant for Baum, Hedlund, Aristei & Goldman.
I don’t have many illusions about the trustworthiness of the literature reporting clinical trials, whether pharmaceutical or psychotherapy. But I found this deconstruction article quite troubling. Among the authors’ closing observations are:
The research literature on the effectiveness and safety of antidepressants for children and adolescents is relatively small, and therefore vulnerable to distortion by just one or a two badly conducted and/or reported studies. Prescribing rates are high and increasing, so that prescribers who are misinformed by misleading publications risk doing real harm to many children, and wasting valuable health resources.
I recommend that readers go to my supplementary blog and review a very similar case concerning the efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence. I also recommend another of my blog posts that summarizes action taken by the US government against both Forest Laboratories and GlaxoSmithKline for promotion of misleading claims about the efficacy and safety of antidepressants for children and adolescents.
We should scrutinize studies of the efficacy and safety of antidepressants for children and adolescents, because of the weakness of data from relatively small studies with serious difficulties in their methodology and reporting. But we should certainly not stop there. We should critically examine other studies of psychotherapy and psychosocial interventions.
I previously documented [1, 2] interference by promoters of the lucrative Triple P Parenting in the implementation of a supposedly independent evaluation of it, including tampering with plans for data analysis. The promoters then followed up by attempting to block publication of a meta-analysis casting doubt on their claims.
But suppose we are not dealing with the threat of conflict of interest associated with high financial stakes, as with pharmaceutical companies or a globally promoted psychosocial program. There are still the less clear conflicts associated with investigator egos and the pressures to produce positive results in order to get funding renewed. We should require scrutiny of protocols and of whether they were faithfully implemented, with the resulting data analyzed according to a priori plans. To do that, we need unrestricted access to data and the opportunity to reanalyze it from multiple perspectives.
Results of clinical trials should be examined wherever possible in replications and extensions in new settings. But this frequently requires resources that are unlikely to be available.
We are unlikely ever to see anything for clinical trials resembling replication initiatives such as the Open Science Collaboration’s (OSC) Replication Project: Psychology. The OSC depends on mass replications involving either samples of college students or recruitment from the Internet. Most of the studies involved in the OSC did not have direct clinical or public health implications. In contrast, clinical trials usually do, and they require different approaches to ensure the trustworthiness of the findings that are claimed.
Access to the internal documents of Forest Laboratories revealed a deliberate, concerted effort to produce results consistent with the agenda of vested interests, even where prespecified analyses yielded contradictory findings. There was clear intent. But we don’t need to assume an attempt to deceive and defraud in order to insist on the opportunity to re-examine findings that affect patients and public health. As US Vice President Joseph Biden recently declared, securing advances in biomedicine and public health depends on broad and routine sharing and re-analysis of data.
My usual disclaimer: All views that I express are my own and do not necessarily reflect those of PLOS or other institutional affiliations.