Note: BMC Medicine subsequently invited a submission based on this blog post.
Coyne, J. C., & Kwakkenbos, L. (2013). Triple P-Positive Parenting programs: the folly of basing social policy on underpowered flawed studies. BMC Medicine, 11(1), 11.
It is now available here:
Promoters of Triple P parenting enjoy opportunities that developers and marketers of other “evidence-supported” psychosocial interventions and psychotherapies only dream of. With a previously uncontested designation as strongly supported by evidence, Triple P is being rolled out by municipalities, governmental agencies, charities, and community-based programs worldwide. These efforts generate lots of cash from royalties and license fees, training, workshops, and training materials, in addition to the prestige of being able to claim that an intervention has navigated the treacherous path from RCT to implementation in the community.
With hundreds of articles extolling its virtues, dozens of randomized trials, and consistently positive systematic reviews, the status of the Triple P parenting intervention as evidence supported would seem beyond being unsettled by yet another review. Some of the RCTs are quite small, but there are public health level interventions, including one involving 7000 children from child protective services. Could this be an instance in which it should be declared “no further research necessary”? Granting agencies have decided not to fund further evaluation of interventions on the basis of a much smaller volume of seemingly less unanimous data.
But the weaknesses revealed in a recent systematic review and meta-analysis of Triple P by Philip Wilson and his Scottish colleagues show how apparently strong evidence can evaporate when it is given a closer look. Other apparently secure “evidence supported” treatments undoubtedly share these weaknesses, and the review provides a model of where to look. But when I took a careful look, I discovered that Wilson and colleagues glossed over a very important weakness in the body of evidence for Triple P. They noted it, but didn’t dwell on it. So, the weakness in the body of evidence for Triple P is much greater than a reader might conclude from Wilson and colleagues’ review.
WARNING! Spoiler Ahead. At this point, readers might want to download the article and form their own impressions, before reading on and discovering what I found. If so, they can click on this link and access the freely available, open access article.
Wikipedia describes Triple P as
a multilevel parenting intervention with the main goal of increasing the knowledge, skills, and confidence of parents at the population level and, as a result, reduce the prevalence of mental health, emotional, and behavioral problems in children and adolescents. The program is a universal preventive intervention (all members of the given population participate) with selective interventions specifically tailored for at risk children and parents.
A Triple P website for parents advertises
the international award winning Triple P – Positive Parenting Program®, backed by over 25 years of clinically proven, world wide research, has the answers to your parenting questions and needs. How do we know? Because we’ve listened to and worked with thousands of parents and professionals across the world. We have the knowledge and evidence to prove that Triple P works for many different families, in many different circumstances, with many different problems, in many different places!
The Triple P website for practitioners declares
As an individual practitioner or a practitioner working within an organisation you need to be sure that the programs you implement, the consultations you provide, the courses you undertake and the resources you buy actually work.
Triple P is one of the only evidence-based parenting programs available worldwide, founded on over 30 years of clinical and empirical research.
Disappearing positive evidence
In taking stock of Triple P, Wilson and colleagues applied objective criteria in a way that readily allows independent evaluation of their results.
They identified 33 eligible studies, almost all of them positive in indicating that Triple P has positive effects on child adjustment.
- Most of the 33 studies involved media-recruited families, so participants in the trials were self-selected and more motivated than clients referred from community services or involuntarily getting treatment mandated by child protection agencies would be.
- 31/33 studies compared Triple P interventions with waiting list or no-treatment comparison groups. This suggests that Triple P may be better than doing nothing with these self-referred families, but doesn’t control for simply providing attention, support, and feedback. The better outcomes for families getting Triple P versus wait list or no treatment may reflect families assigned to these control conditions registering their disappointment at not getting what they had sought in answering the media ads.
- In contrast, the two studies involving an active control group showed no differences between groups.
- The trials evaluating Triple P typically administered a battery of potential outcomes, and there is no evidence for any trial that particular measures were chosen ahead of time as the primary outcomes. There was considerable inconsistency among studies using the same instruments in decisions about which subscales were reported and emphasized. Not declaring outcomes ahead of time provides a strong temptation for selective reporting of outcomes. Investigators analyze the data, decide which measures put Triple P in the most favorable light, and declare those outcomes post hoc as primary.
- Selective reporting of outcomes occurred in the abstracts of these studies. Only 4/33 abstracts reported any negative findings, and 32/33 abstracts were judged to give a more favorable picture of the effects of Triple P than the results in the body of the paper.
- Most papers only reported maternal assessments of child behavior and the small number of studies that obtained assessments from fathers did not find positive treatment effects from the father’s perspective. This may simply indicate the detachment and obliviousness of the fathers, but can also point to a bias in the reports of mothers who had made more of an investment in getting treatment.
- Comparisons of intervention and control groups beyond the duration of the intervention were only possible in five studies. So, positive results may be short-lived.
- Of the three trials that tested population level effects of Triple P, two were not randomized trials, but had quasi-experimental designs with significant intervention and control group differences at baseline. A third trial reported a reduction in child maltreatment, but examination of the results indicates that this was due to an unexplained increase in child maltreatment in the control area, not a decrease in the intervention area.
- Thirty-two of the 33 eligible studies were authored by Triple-P affiliated personnel, but only two had a conflict of interest statement. Not only is there a strong possibility of investigator allegiance exerting an effect on the reported outcome of trials, there are undeclared conflicts of interest.
The dominance of small, underpowered studies
Wilson and colleagues noted a number of times in their review that many of the trials are small, but they do not dwell on how many, how small, or with what implications. My colleagues have adopted the lower limit of 35 participants in the smallest group for inclusion of trials in meta-analyses. The rationale is that any trial that is smaller than this does not have a 50% probability of detecting a moderate sized effect, even if it is present. Small trials are subject to publication bias: if results are not claimed to be statistically significant, they will not get published, because the trial was insufficiently powered to obtain a significant effect. On the other hand, when significant results are obtained, they are greeted with great enthusiasm precisely because the trials are so small. Small trials, when combined with flexible rules for deciding when to stop a trial (often based on a peek at the data), failure to specify primary outcomes ahead of time, and flexible rules for analyses, can usually be made to appear to yield positive findings, but findings that will not be replicated. Small studies are vulnerable to outliers and sampling error, and randomization does not necessarily equalize group differences that can prove crucial in determining results. Combining published small trials in a meta-analysis does not address these problems, because of publication bias and because all or many of the trials share methodological problems.
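To make the power argument concrete, here is a minimal sketch of how power depends on group size. It rests on assumptions that are mine, not Wilson and colleagues': "moderate" is taken to mean a standardized mean difference of 0.5, the test is two-sided at alpha = .05, and a normal approximation stands in for the t-test. The function name `approx_power` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, effect_size=0.5, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means.

    Assumptions (illustrative only): equal group sizes, a standardized
    effect size of 0.5 as "moderate", and a normal approximation to the
    t-test, which is slightly optimistic for very small groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected z statistic if the true standardized difference is effect_size
    noncentrality = effect_size * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


for n in (9, 18, 35, 64):
    print(f"n per group = {n:3d}  approx. power = {approx_power(n):.2f}")
```

Under this approximation, a group size of 35 comes out only slightly above 50% power, and groups of 9 to 18 fall well below it. An exact t-based calculation would give somewhat lower figures for the smallest groups.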
What happens when we apply the exclusion criterion to Triple P trials of <35 participants in the smallest group? Looking at Table 2 in Wilson and colleagues’ review, we see that 20/23 of the individual papers included in the meta-analyses are excluded. Many of the trials are quite small, with eight trials having fewer than 20 participants (9-18) in the smallest group. Such trials should be statistically quite unlikely to detect even a moderate sized effect, and that so many nonetheless get significant findings attests to a publication bias. Think of it: with such small cell sizes, arbitrary addition or subtraction of a single participant can alter results. Figure 2 in the review provides the forest plot of effect sizes for two of the key outcome measures reported in Triple P trials. Small trials account for the outlier strongest finding, but also the weakest finding, underscoring sampling error. Meta-analyses attempt to control for the influence of small trials by introducing weights, but this strategy fails when the bulk of the trials are small. Again examining Figure 2, we see that even with the weights, small trials still add up to over 83% of the contribution to the overall effect size. Of the three trials that are not underpowered, two have nonsignificant effects entered into the meta-analysis. The confidence interval for the one moderate sized trial that is positive barely excludes zero (.06).
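The point about weights can be sketched in a few lines. The trial sizes and effect sizes below are invented for illustration and are not the figures from Table 2 or Figure 2 of the review. The sketch also assumes fixed-effect, inverse-variance weighting with the usual large-sample variance for a standardized mean difference; a random-effects model would typically spread the weights more evenly and give small trials relatively more weight still.

```python
def smd_variance(n1, n2, d):
    """Large-sample variance of a standardized mean difference d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def small_trial_weight_share(trials, cutoff=35):
    """Share of total inverse-variance weight carried by small trials.

    `trials` is a list of (n_group1, n_group2, d) tuples. A trial counts
    as small if its smallest group has fewer than `cutoff` participants.
    """
    total = 0.0
    small = 0.0
    for n1, n2, d in trials:
        weight = 1.0 / smd_variance(n1, n2, d)
        total += weight
        if min(n1, n2) < cutoff:
            small += weight
    return small / total


# HYPOTHETICAL data: 20 small trials and 3 larger ones (not from the review)
hypothetical_trials = [(15, 15, 0.6)] * 20 + [(60, 60, 0.2)] * 3

share = small_trial_weight_share(hypothetical_trials)
print(f"Weight carried by small trials: {share:.0%}")
```

With these made-up inputs, each small trial gets far less weight than each larger trial, yet the small trials together still carry more than half of the total weight simply because there are so many of them.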
Wilson and colleagues pointed to serious deficiencies in the body of evidence supporting the efficacy of Triple P parenting programs, but once we exclude underpowered trials, there is little evidence left.
Are Triple P parenting programs ready for widespread dissemination and implementation?
Rollouts of the kind that Triple P is now undergoing are expensive and consume resources that will not be available for alternatives. Yet, critical examination of the available evidence suggests little basis for assuming that Triple P parenting programs will have benefits commensurate with their cost.
In contrast to the self-referring families studied in randomized trials, the families in the community are likely to be more socially disadvantaged, often single parent, and often coming to treatment only because of pressure and even mandated attendance. Convenience samples of self-referred participants are acceptable in the early stages of evaluation of an intervention, but ultimately the most compelling evidence must come from participants more representative of the population who will be treated in the community.
Would other evidence supported interventions survive this kind of scrutiny?
Triple P parenting interventions have the apparent support of a large literature that is unmatched in size by most treatments claiming to be evidence supported. In a number of articles and blog posts, I have shown that other treatments claimed to be evidence supported often have only weak evidence. Similar to Triple P, other treatments are largely evaluated by investigators who have vested financial and professional interests in demonstrating their efficacy, in studies that are underpowered, and with a high risk of bias, notably in the failure to specify which of the many outcomes assessed are primary. Similar to Triple P, psychotherapies routinely get labeled as having strong evidence based solely on studies that involve comparisons with no treatment or waitlist controls. Effect sizes exaggerate the advantage of these therapies over patients simply getting nonspecific, structured opportunities for attention, support, and feedback under conditions of positive expectations. And, finally, similar to what Wilson and colleagues found for Triple P, there are often large gaps between the way findings are depicted in abstracts for reports of RCTs and what can be learned from the results sections of the actual articles.
In a recent blog post, I also showed that American Psychological Association Division 12 Clinical Psychology had designated Acceptance and Commitment Therapy (ACT) as having strong evidence for efficacy in hospitalized psychotic patients, only to have that designation removed when I demonstrated that the basis for this judgment was two small, flawed trials with null results. Was that shocking or even surprising? Stay tuned.
In coming blog posts, I will demonstrate problems with claims of other treatments being evidence-based, but hopefully this blog provides readers with tools to investigate for themselves.