Part one: Psychoanalytic psychotherapy uber alles?
One might conclude from a quick internet search that the benefits of long-term psychoanalytic or psychodynamic psychotherapy (LTPP) versus shorter treatments have already been shown: LTPP has a superiority that justifies the greater commitment of time and money. A meta-analysis in the Journal of the American Medical Association (JAMA) makes that claim [PDF available here.]. It is accompanied by a glowing editorial. The authors of the JAMA paper, Leichsenring and Rabung, have also repeated their claims in redundant articles published in other journals. And their claims have been echoed enthusiastically by proponents of LTPP around the world.
Read on and you’ll
- Romp with me through the many problems that apparently slipped by unnoticed in the publication of a meta-analysis in a high impact journal.
- Have your confidence shaken that publication in one of the highest impact medical journals is any reassurance of the trustworthiness of evidence and freedom from bias.
- Get tips on how to spot bad meta-analyses being shaped for marketing and propaganda purposes.
- Find a brief distraction in Bambi meets Godzilla, the animated 91-second video that has become a metaphor for confrontations between the practice of psychotherapy and the demand for evidence of cost-effectiveness.
- Be left grappling with the broader issue of just how much politics and personal connections determine whether manuscripts get published in high impact journals and with accompanying editorials.
The JAMA meta-analysis drew on 11 RCTs and 12 non-RCT naturalistic, observational case series studies of long-term psychodynamic psychotherapy. The authors concluded that LTPP is superior to shorter-term psychotherapies. For their assessment to be unseated, they claim, there would have to be 921 negative studies left unpublished.
With regard to overall effectiveness, a between-group effect size of 1.8 (95% confidence interval [CI], 0.7-3.4) indicated that after treatment with LTPP patients with complex mental disorders on average were better off than 96% of the patients in the comparison groups (P=.002).
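To put that "96%" figure in context: a between-group effect size maps onto "percent of controls the average treated patient beats" (Cohen's U3) through the normal curve. Here is a minimal sketch of that conversion, assuming normally distributed outcomes:

```python
from math import erf, sqrt

def u3(d):
    """Cohen's U3: the proportion of the comparison group that the average
    treated patient exceeds, assuming normally distributed outcomes."""
    return 0.5 * (1 + erf(d / sqrt(2)))

# The reported between-group effect size of 1.8 maps to roughly 96%:
print(round(u3(1.8) * 100))  # 96
```

An effect size of 1.8 would dwarf almost anything in the comparative psychotherapy literature, which is precisely why it should invite skepticism rather than applause.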
The accompanying editorial entitled “Psychodynamic psychotherapy and research evidence: Bambi survives Godzilla?” praised the study’s
thorough search methods, including the requirement for reliable outcome measures, and their careful assessments of heterogeneity and lack of evidence for publication bias are strengths of the study.
Does this new meta-analysis mean that LTPP has survived the Godzilla of the demand for empirical demonstration of its efficacy? The answer is a qualified yes. The meta-analysis was carefully performed and yielded a within-group effect size of 0.96 (95% confidence interval [CI], 0.87-1.05) for pretreatment-posttreatment overall outcomes, which would be considered a large effect.
The title of the editorial refers to a classic 1982 article by Morris Parloff, former head of the NIMH Psychotherapy and Behavioral Intervention Section in the Clinical Research Branch (1972-1980). Parloff correctly anticipated that if third party reimbursement were sought for psychotherapy, policy makers would demand evidence that particular therapies worked for particular problems. The encounter between the existing evidence and policy makers was “Bambi meets Godzilla.”
Parloff’s title was in turn inspired by an animation classic of the same name. For a quick digression, you can watch the complete 91-second cartoon here.
The JAMA meta-analysis claimed that LTPP was superior to shorter psychotherapies by probably one of the largest margins ever seen in the comparative psychotherapy literature. If you are new to the field or simply unfamiliar with the relevant literature, you may be reluctant to dispute conclusions that appeared in a prestigious, high impact journal (JIF=30) and, not only that, were endorsed by the editor of JAMA.
Peter Kramer, author of Listening to Prozac, blogged about this meta-analysis and suggested readers should be swayed by its sheer authority:
[R]esults are suggestive, and they are the best we have. And then, there’s the brand label: JAMA. The study passed a rigorous peer review. There is no comparably prestigious study denying that long-term therapy is the treatment of choice.
You truly disappoint me, Peter. I hope you agree by the end of my longread analysis and critique that your judgment was, uh, hasty.
We don’t have to defer to the authority of publication in JAMA, and we should not. We can decide for ourselves, making use of objective standards developed and validated by people who have no dog in this fight.
Intense criticism of these claims.
…As could be expected, letters to the editor poured into JAMA concerning the meta-analysis.
JAMA has limits on the number and length of letters that will be published in response to an article. These restrictions prompted some authors to organize and take their lengthier critiques elsewhere. I was honored to join a group comprising Aaron T Beck, Brett Thombs, Sunil Bhar, Monica Pignotti, Marielle Bassel, and Lisa Jewett.
You can find our extended critique here. I also recommend a splendid, incisive critique by Julia Littell and Aron Shlonsky that applied the validated AMSTAR checklist for evaluating meta-analyses. The JAMA article plus their brief one makes a great reading assignment for teaching purposes.
If you are interested only in a summary evaluation of the JAMA meta-analysis and are willing to trust my authoritative opinion (No, distrust all authority when it comes to evaluating evidence!), here it is–
- The meta-analysis was undertaken and reported in flagrant violation of the usual rules for conducting and reporting a meta-analysis.
- There were arbitrary decisions made about which studies to include and what constituted a control condition.
- Different rules were involved in selecting studies of LTPP versus comparison psychotherapies, giving an advantage to the LTPP.
- Results from randomized controlled trials were integrated with results from poor quality naturalistic, observational studies in unconventional ways that strongly favored LTPP.
- For some analyses, effect sizes from LTPP studies were calculated in an unconventional manner. Any benefits of these LTPP conditions being evaluated as part of a randomized trial were eliminated. This maneuver seriously inflated the estimates of effect sizes in favor of LTPP. Results are not comparable to what would be obtained with more conventional methods.
- Calculation of some effect sizes involved further inexplicable voodoo statistics, so that a set of studies in which no effect size was greater than 1.4 produced an overall effect size of 6.9. Duh!
- In the end, a bizarre meta-analysis compared 1053 patients assigned to LTPP to 257 patients assigned to a control condition, only 36 of whom were receiving an evidence based therapy for their condition. Yet, sweeping generalizations were made for advantages that should be expected for LTPP patients over those receiving any shorter psychotherapies.
- The effect size comparing LTPP to control conditions would not generalize to any credible psychotherapy, much less one that was evidence-based.
Please read on if you would like a more in-depth analysis. I hope you will. I want to thoroughly disturb your sense that you can trust an article you find in high impact journals simply because it apparently survived peer review there. And aside from encouraging a healthy skepticism, I will leave you with tips for what to look for in other articles.
A flagrant violation of established rules for conducting and reporting meta-analyses.
Meta-analyses are supposed to be conducted in an orderly, pre-specified, transparent, and readily replicable fashion, much like a randomized trial. There are established standards for reporting meta-analyses, such as PRISMA for evaluating health care interventions and MOOSE for observational studies as well as AMSTAR, a brief checklist for evaluating the adequacy of the conduct of a meta-analysis.
Increasingly, journals also require published preregistration of plans for meta-analyses, much like preregistration of the design of a randomized trial. My experience has been that JAMA encourages submission of plans for meta-analyses for preapproval. My colleagues and I have always submitted our plans. There is no indication that this was done in the case of this meta-analysis of LTPP.
Judged by the usual standards, this meta-analysis was seriously deficient.
Leichsenring and Rabung formulated an unconventionally broad research question. Systematic review objectives typically define the patient population, intervention, control treatment, outcomes, and study designs of interest. These authors defined the patient population broadly as a group of adults with mental disorders. The sole criterion that outcomes should have fulfilled is that they were reliable. Control treatments were not defined at all. Thus, analyzed diagnoses, outcomes, and control groups showed a very large clinical heterogeneity. Although the reviewers tried to account for this through subgroup analyses, their method of building clusters of heterogeneous disorders and outcomes still allowed for considerable variation. Thus, it is unclear for what disorder, for what outcome variables, and in comparison with which control groups the evidence was shown.
Trick question: how many RCTs were entered into the meta-analysis?
Probably 90% of the casual readers of this article will give the wrong answer of 11, rather than the correct answer of 8. A number of commentators on this article got this question wrong. Understandably.
The abstract clearly says 11 RCTs, and that statement is repeated in the text and figures. To find out what was actually done, you have to pay attention to the superscripts in Table 1 and read carefully the last paragraph on page 1554. You can eventually figure out that for the largest RCT, Knekt et al, 2008, the control conditions were dropped. The next largest RCT, Vinnars et al, 2005, was a comparison between two LTPPs, and these were dealt with as if they came from separate trials. The same was true for the Høglend et al study, which compared LTPP with and without allowing the psychoanalyst to make transference interpretations. Because there was no difference in treatment efficacy, conventionally calculated effect sizes for this trial would be quite small.
Finally, the control group of Huber and Klug was dropped. So, Leichsenring and Rabung started with 12 comparisons in 11 RCTs, but extracted and integrated data from 4 of these groups in ways that did not preserve the benefits of their having come from an RCT.
Leichsenring and Rabung eliminated the largest comparisons in RCTs and were left with 8 pitifully small, under-resourced trials, each of which had less than a 50% chance of detecting a moderate-sized effect, even if it were there. Yet, these trials obtained positive results at a statistically improbable rate. There is clearly wild publication bias going on here.
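To see why "less than a 50% chance" is plausible, here is a back-of-envelope power calculation using the normal approximation for a two-sample comparison. The 30-patients-per-arm figure is hypothetical, chosen only to be in the ballpark of these small trials:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sample(d, n_per_arm, z_crit=1.96):
    """Approximate power of a two-sided, alpha = .05 two-sample test
    (normal approximation, equal group sizes)."""
    noncentrality = d * sqrt(n_per_arm / 2)
    return phi(noncentrality - z_crit)

# A moderate effect (d = 0.5) with a hypothetical 30 patients per arm:
print(round(power_two_sample(0.5, 30), 2))  # ~0.49, essentially a coin flip
```

When trials this small keep coming up positive anyway, the file drawer is the first place to look.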
The table below summarizes the eight surviving trials.
Leichsenring and Rabung were interested in making sweeping statements about the value of LTPP over shorter treatments, but these 8 RCTs are all they had left to work with. Perhaps the best solution would have been to simply declare a “failed meta-analysis.” That’s not such a bad thing. It is simply an acknowledgment that the literature does not yet have enough high quality studies with large enough samples to draw conclusions.
Instead, Leichsenring and Rabung engaged in all sorts of mischief that is quite misleading to the unwary reader. They started by salvaging the LTPP groups from the excluded RCTs and quantifying effect sizes in ways that would confuse readers expecting something more conventional. They then threw in the uncontrolled case series studies which represent a much lower quality of evidence than an RCT that had been preserved intact.
Stacking the deck in favor of LTPP
The overall methodological quality of the LTPP studies that were included in the meta-analysis was quite poor, particularly the naturalistic studies that just involved collecting uncontrolled case series of patients. Investigators in these studies were free to selectively add or drop cases without reporting what they had done, or to keep the case series going, adding more cases or extending the length of treatment or follow-up as needed to obtain an appearance of an overall positive effect.
One study lost 45% of the patients originally recruited to follow-up. Results do not even generalize back to the patients who entered the study. Results from the many studies that were biased by substantial loss to follow-up were simply combined with fewer studies in which all patients who had been recruited were retained for analysis of outcome.
There were no such naturalistic observational studies of psychotherapies other than LTPP included for comparison, so only LTPP got the benefit of this badly biased data. As we will see, given the odd way in which effect sizes were calculated, the lack of such studies represented a serious bias in favor of LTPP.
Randomized trials in which LTPP was provided for less than a year were excluded from the meta-analysis. Among other studies, the 10-month trial of psychodynamic psychotherapy versus cognitive behavior therapy for anorexia would be excluded because the therapy did not last a year. That is unfortunate, because this particular study is of much better methodological quality than any of the others included in the meta-analysis.
There are no evidence-based criteria for setting duration of psychotherapy of at least a year ahead of time. Why did these authors settle on this requirement for inclusion of studies? I strongly suspect that the authors were being responsive to the need for evidence to justify a full year of insurance coverage for psychoanalytic psychotherapy.
You can probably get psychoanalysts to agree to keep patients in treatment for over a year, but many other therapists and their patients would object to committing ahead of time to such a length of treatment. There are of course many other randomized trials of psychotherapies out there, but most do not involve providing a year of treatment. One could ask the very reasonable question ‘Do these trials of shorter treatments provide comparable or better outcomes than a year of LTPP?’ but apparently these authors were not really interested in that question.
There were some seemingly arbitrary reclassifications of studies as being LTPP.
One study supportive of LTPP had previously been classified in a meta-analysis by Leichsenring and Rabung as involving short-term psychodynamic psychotherapy (STPP), but was reclassified in the present meta-analysis as involving long-term psychodynamic psychotherapy.
And arbitrary exclusion of relevant studies. Conventional standards for conducting and reporting meta-analyses call for providing a list of excluded studies, but that list was missing.
the exclusion of the study by Giesen-Bloo et al that favored schema-focused therapy over LTPP appears arbitrary. The original article defined patients in treatment for three years as completers and presented effect sizes on that basis.
Compared to what?
Overall, control conditions included in the meta analysis did not adequately represent shorter-term therapies and blurred the distinction between these therapies and no treatment at all. What went on in these control conditions cannot be generalized to what would go on in credible psychotherapies.
the designation of ‘shorter-term methods of psychotherapy,’ included five treatments that did not constitute formal psychotherapy as it is generally understood. These treatments consisted of a waitlist control condition, nutritional counseling, standard psychiatric care, low contact routine treatment and treatment as usual in the community.
In only two studies, LTPP was compared to an empirically supported treatment, as defined by Chambless and Hollon: that is, DBT for borderline personality disorder, and family therapy for anorexia nervosa. In two other studies, LTPP was compared to cognitive therapy (CT) and short term psychodynamic psychotherapy (STPP), which are established as efficacious for some disorders, but not yet validated for the disorder being treated (i.e., cluster C personality disorders, “neurosis”). In a fifth study, LTPP was compared to “cognitive orientation therapy”, an unvalidated treatment. In these original studies, statistical superiority of LTPP over control conditions was found only when control conditions involved either no psychotherapy, or an unvalidated treatment. Studies that compared LTPP to an empirically supported (e.g., DBT, family therapy) or established treatment (e.g., STPP, CT) found that LTPP was equally or less effective than these treatments despite a substantially longer treatment period.
So, Leichsenring and Rabung kept the 1053 LTPP patients in their analyses, but by a complex process of elimination reduced the number of comparison patients to 257. Of these 257 comparison patients only 36 patients were receiving treatment that was evidence based for their condition: 17 receiving dialectical behavior therapy for borderline personality and 19 receiving a family therapy validated for anorexia.
Comparison/control patients came from a variety of conditions, including no formal treatment. Aggregate estimates of outcomes would not apply to patients assigned to any of these particular conditions. Leichsenring and Rabung cannot generalize beyond the odd lot of patients they assembled, but doing so was their intention. Their efforts could only serve to put an illusory glow on LTPP.
A mixed bag of patients
The abstract stated that patients had “complex disorders,” but the term was never defined and was inconsistently applied. It is difficult to see how it applies to patients in one study having “typically presented outpatient complaints concerning anxiety, depression, low self-esteem, and difficulties with interpersonal relationships” (p. 269). The judgment that patients required a year of treatment seems, again, theoretically, not empirically driven.
Across the eight studies, LTPP was compared to other interventions for a total of nine types of mental health problems, including “neurosis”, “self-defeating personality disorder”, and anorexia nervosa [5,7] (Table 3). This is akin to asking whether one type of medication is superior to another for all types of physical illnesses.
Unconventional calculation of effect sizes
[This section is going to be a bit technical, but worth the read for those who are interested in acquiring some background on this important topic.]
As I have spelled out in an earlier blog post, psychotherapies do not have effect sizes, but comparisons do. Randomized trials facilitate comparisons between a psychotherapy and comparison/control conditions. When you calculate a conventional between-group effect size, it takes advantage of randomization and controls for background factors, like placebo or nonspecific effects. So, you focus on what change went on in a particular therapy, relative to what occurred in patients who didn’t receive it.
In another past blog post, I discussed my colleagues’ and my comparison of psychotherapies to pill placebo conditions. The between-group effect sizes took into account the difference between the change that went on in psychotherapy and in pill placebo conditions, not just the change that went on in psychotherapy. We wanted an estimate of the effects of psychotherapy, above and beyond any benefits of the support and positive expectations that went with being in a clinical trial. Of course, the effect sizes that we observed were lower than what we would have seen in a comparison between psychotherapy and no treatment.
That is not what Leichsenring and Rabung did. They calculated within-group effect sizes for LTPP that ignored what went on in the comparison/control group and the rest of the trial. Any nonspecific effects get attributed to LTPP, including the substantial improvement over time that would naturally occur without treatment. These effect sizes were then integrated with calculations from naturalistic, case series studies in which there was no control over patients lost or simply left out of the case series. Confused yet? Again, there were no such naturalistic, case series studies included for other comparison/control therapies. So the advantage was entirely with LTPP. If LTPP did not look better under these odd circumstances, could it ever?
In my last blog post, I reviewed a recent Lancet article reporting an RCT comparing cognitive behavioral therapy to focal psychodynamic psychotherapy for anorexia. Neither therapy did particularly well in increasing patients’ weight, either in absolute terms or in comparison to enhanced routine care. And the article reported within-group and between-group effect sizes, allowing a striking demonstration of how different they are. The within-group effect size for weight gain for the focal psychodynamic therapy was a seemingly impressive 1.6, p < .001. But the more appropriate between-group effect size for comparing focal psychodynamic therapy to treatment as usual was a wimpy, nonsignificant .13, p = .48 (!).
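The arithmetic behind that contrast is simple. With some invented summary statistics (illustrative only, not the Lancet trial’s actual data), you can see how a group that improves a lot over time yields a large within-group effect size even when the control group improved nearly as much:

```python
def cohen_d(mean_diff, sd):
    """Standardized mean difference: mean change divided by the SD."""
    return mean_diff / sd

# Hypothetical summary statistics (illustrative only):
# both arms improve substantially over time, by nearly the same amount.
sd_pooled = 5.0
treatment_change = 8.0   # pre-to-post change in the treatment arm
control_change = 7.4     # pre-to-post change in the control arm

within_group = cohen_d(treatment_change, sd_pooled)
between_group = cohen_d(treatment_change - control_change, sd_pooled)

print(round(within_group, 2), round(between_group, 2))  # 1.6 0.12
```

The within-group number credits the therapy with everything, including time, regression to the mean, and being in a trial at all; only the between-group number tells you what the therapy added.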
[Now, we are going to get really technical. Skim or jump down the next section if you do not want to deal with this.]
There are some bizarre calculations by Leichsenring and Rabung that are difficult to explain or replicate, but these calculations gave a clear bias to LTPP. My colleagues and I guessed that:
Leichsenring and Rabung apparently used a conversion formula intended for conversions of between-group point biserial correlations to standardized difference effect sizes in an attempt to convert their correlations of group and within-group pre-post effect sizes into deviation-based effect sizes. As a result, even though none of the eight studies reported an overall standardized mean difference > 1.45 [2, see figure 2 on p. 1558], the authors reported a combined effect size of 1.8. Similarly, these methods generated an implausible between-group effect size of 6.9, equivalent to 93% of variance explained, for personality functioning based on 4 studies [3,5,6,26], none of which reported an effect size > approximately 2.
In order to figure out what had been done, my colleagues and I generated 10 hypothetical studies in which
the pre-post effect size for the treatment group was 0.01 larger than the effect size for the control group. In the tenth study, the effect sizes were equal. Despite negligible differences in pre-post treatment effects, the method employed by Leichsenring and Rabung generates a correlation between pre-post effect size and group of 0.996 and an unreasonably large deviation-based effect size of 21.2. Thus, rather than realistic estimates of the comparative effects of LTPP, Leichsenring and Rabung based their meta-analysis on grossly incorrect calculations.
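You can reproduce the flavor of this blowup with the standard point-biserial conversion, d = 2r/√(1−r²). This is not necessarily the exact formula Leichsenring and Rabung used, but it shows what happens when a near-perfect correlation is pushed through it:

```python
from math import sqrt

def r_to_d(r):
    """Standard conversion from a point-biserial correlation r to Cohen's d."""
    return 2 * r / sqrt(1 - r**2)

# A modest correlation gives a plausible effect size...
print(round(r_to_d(0.5), 2))    # 1.15
# ...but a near-perfect correlation, like the r = .996 their method
# manufactures from trivial pre-post differences, explodes:
print(round(r_to_d(0.996), 1))  # ~22
```

Sensible inputs give sensible outputs; an effect size north of 20 is one no real trial could ever show, and that alone should have flagged the method as broken.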
It is simply mind blowing that the editor and reviewers at JAMA let these numbers get by. The numbers are so provocative that they should have invited skepticism.
Almost 1000 new studies needed to reverse the claims of this meta-analysis?
One of the many outrageous claims made in the meta-analysis was that the number of nonsignificant, unpublished, or missing studies needed to move the meta-analysis to nonsignificance was “921 for overall outcome, 535 for target problems, 623 for general symptoms, and 358 for social functioning.”
What? 921 studies? That is more than the number of control patients included in the meta-analysis! The claim is testimony to how badly distorted this meta-analysis has become.
Leichsenring and Rabung were attempting to bolster their claims using Rosenthal’s failsafe N, which, among meta-analysis methodologists, is considered inaccurate and misleading. The Cochrane Collaboration recommends against its use. Moritz Heene does an excellent job explaining what is wrong with failsafe N. He notes that among the many problems of relying on failsafe N as a check on bias are:
- Estimates are not influenced by evidence of bias in the available data.
- Heterogeneity among the studies that are available and those that might be lurking in desk drawers is ignored.
- Choice of zero for the average effect of the unpublished studies is arbitrary, almost certainly biased.
- Allowing for unpublished negative studies substantially reduces failsafe N.
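For the curious, Rosenthal’s fail-safe N is simple to compute, which makes it easy to see how a handful of modestly significant studies generates an enormous N. The eight z values below are hypothetical, not taken from the meta-analysis:

```python
from math import floor

def failsafe_n(z_values, z_crit=1.645):
    """Rosenthal's fail-safe N: how many unpublished null (z = 0) studies
    would be needed to drag the combined one-tailed p above .05."""
    total_z = sum(z_values)
    k = len(z_values)
    return floor(total_z**2 / z_crit**2 - k)

# Eight hypothetical trials, each just clearing significance (z = 2.0):
print(failsafe_n([2.0] * 8))  # 86
```

Note the built-in assumption that every missing study averages exactly z = 0; let the file drawer contain even mildly negative results and the number collapses, which is one reason methodologists reject it.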
The results of the trial described in my recent blog post comparing psychoanalysis to CBT for anorexia certainly contradict the assumption that there are no negative trials being missed.
I know, many meta-analyses of psychological interventions bring in the failsafe N to bolster claims that estimates of effect sizes are so robust that we cannot expect any negative studies lurking somewhere to change the overall results. Despite this being a common practice in psychology, failsafe N is uniformly rejected in other fields, notably clinical epidemiology, as providing an inflated, unrealistic estimate of the robustness of findings.
Coming up in my next blog:
- Leichsenring and Rabung respond to critics, dodging basic criticisms and claiming that those who reject their claims are bringing in biases of their own.
- Leichsenring and Rabung renew their claims in another meta-analysis in the British Journal of Psychiatry, for which 10 of the 11 studies were already included in the JAMA meta-analysis.
- The long-term psychodynamic/psychoanalytic community responds approvingly and echoes Leichsenring and Rabung’s assessment of skeptics.
- The important question of whether long-term psychoanalytic psychotherapy is better than shorter term therapies gets an independent evaluation by another group, which included the world-class meta-analyst and systematic reviewer, John Ioannidis.