Interpretive jiggery-pokery in The Lancet
A tale of a convenience sample with inconvenient serious limitations.
I would have dismissed this study in a brief screen, except that it appeared in The Lancet.
Roberts E, Wessely S, Chalder T, Chang CK, Hotopf M. Mortality of people with chronic fatigue syndrome: a retrospective cohort study in England and Wales from the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Clinical Record Interactive Search (CRIS) Register. The Lancet. 2016 Feb 10.
The study has too small a sample to warrant submitting a manuscript to a peer-reviewed journal. The longer you look at it, the more its problems compound.
I recommend downloading this open access article and following along as I explain its flaws.
What, too small a sample? With 2147 patients, the study is probably the largest ever of mortality among patients with chronic fatigue syndrome. But the adequacy of sample size is determined not by the total number of participants, but the number having particular events, in this case, the relatively rare events of mortality and suicide. In the seven-year follow-up, there were 17 instances of all-cause mortality, 11 among women and six among men. There were five suicides, three among women and two among men.
In order to assemble a sample of 2147 patients from an existing data set, the authors adopted relaxed diagnostic criteria:
We adopted the most inclusive criteria, and thus included all patients with a clinical diagnosis of chronic fatigue syndrome. A subsample of 755 patients had full diagnostic criteria applied prospectively of which 65% met Oxford criteria, 58% the 1994 case definition criteria, and 88% NICE criteria. All patients in this sample met at least one criterion.
Getting loose about diagnosis allowed the authors to assemble the largest possible sample, but the strategy created a host of other problems. They created a mixed (heterogeneous) sample of patients. Any generalization about the full sample might not apply to patients meeting a particular criterion. Allowing entry into the sample based on multiple criteria risks producing an overall sample in which patients differ considerably from one another, and in which they may be similar in ways that the authors wouldn’t have wanted, i.e., they share confounding variables. Note that about two thirds of the subsample that was formally assessed met the Oxford criteria, which are discredited in much of the Western world. These criteria allow patients to be included with psychiatric comorbidity that would be excluded by other criteria for chronic fatigue syndrome that were used in the study.
When it comes to talking about suicide, psychiatric disorder, including major depression, is a robust major predictor and a major confound when examining other predictors, even if most patients with major depression do not commit suicide.
But here’s the rub: some patients with major depression were entered into the study because they met the Oxford criteria, whereas other potential patients were excluded from the study because the criteria applied to them did not allow psychiatric comorbidity. Hmm, we have a developing mess on our hands.
Now let’s consider how the small number of events to be explained, suicides, compounded the problem. Basically, we are sprinkling three women and two men across these different patient groups. Pure chance is at play, but if the authors misapply sophisticated statistics, they will capitalize on this chance.
Descriptive-observational samples like this one pose challenges to epidemiologists in confidently interpreting any associations that are identified. With a larger sample – i.e., a larger number of suicides to explain – the authors might have used multivariate analyses with statistical control of possible confounds. For instance, it might be tempting to attempt to control for major depression. But that would involve inferences about what is going on among subgroupings of three women and two men, with no possibility of deciding which findings are spurious, i.e., due to chance.
Ignoring these obvious problems, the authors statistically controlled for age and sex. The authors didn’t have to control for race/ethnicity, because all of the participants in the study who died were white. But I would not make too much of race, given the small number of deaths and the small number of suicides. The authors also tried to break down (stratify) the sample according to whether participants who died had major depression and where they ranked on a measure of social deprivation. But, here again, we are getting into la-la land.
But the study gets worse with a closer look.
Compared to what? The authors wanted to make statements about relative mortality and suicides among patients with chronic fatigue syndrome. To do that, they did something very expedient: they created a ratio of deaths in this sample to deaths in England and Wales in 2011.
The denominator was the expected number of deaths, estimated by 5-year age bands, and sex-specific mortality rates for the England and Wales population in 2011 multiplied by the weighting of average person-years in the at-risk period experienced by chronic fatigue syndrome patients in each age and sex category.
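To make that denominator concrete, here is a minimal sketch of the arithmetic as I read the passage: multiply each age and sex band's reference mortality rate by the person-years the cohort contributed to that band, then add up the results. The bands, rates, and person-years below are invented for illustration. They are not taken from the paper or from any official statistics, and the function names are my own.

```python
# Indirect standardization, sketched with MADE-UP numbers.
# expected deaths = sum over strata of (reference rate x cohort person-years)

def expected_deaths(strata):
    """Sum reference_rate * person_years across age/sex strata."""
    return sum(s["rate_per_person_year"] * s["person_years"] for s in strata)

def smr(observed, expected):
    """Standardized mortality ratio: observed deaths over expected deaths."""
    return observed / expected

# Hypothetical strata: none of these values come from the study.
hypothetical_strata = [
    {"group": "women 30-34", "rate_per_person_year": 0.00004, "person_years": 4000},
    {"group": "women 35-39", "rate_per_person_year": 0.00005, "person_years": 5000},
    {"group": "men 30-34", "rate_per_person_year": 0.00012, "person_years": 1500},
]

expected = expected_deaths(hypothetical_strata)
print(f"expected deaths: {expected:.2f}")
print(f"SMR with 2 observed deaths: {smr(2, expected):.2f}")
print(f"SMR with 3 observed deaths: {smr(3, expected):.2f}")
```

The thing to notice is that when the expected count is a fraction of one death, as in this toy example, a single additional observed death moves the ratio a long way.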
The authors want to examine whether there is an excess of deaths and suicides among patients with chronic fatigue syndrome versus the general population. The obvious problem is that these patients may differ in other ways from the general population besides in chronic fatigue syndrome.
Let’s look at where these patients were recruited.
We investigated a retrospective cohort consisting of people diagnosed with chronic fatigue syndrome, using data from the national research and treatment service for chronic fatigue at the South London and Maudsley NHS Foundation Trust (SLaM) and King’s College London Hospital (KCH).
These are specialty settings and patients with chronic fatigue syndrome recruited from them may not be representative of the larger population of patients in the UK. In the discussion section, the authors concede this serious problem:
Because the referral pathway for this centre includes a full assessment including a psychiatric evaluation, an argument could be made that cases referred to the joint SLaM and KCH service may not be representative of chronic fatigue syndrome cases seen in secondary and tertiary care, and may include a referral bias, favouring patients with more severe chronic fatigue syndrome, psychiatric comorbidity, and higher socioeconomic status.
Actually, authors, I don’t see how anyone could argue against there being a strong referral bias in the sample, such that it is unrepresentative of patients being seen in other settings.
Not surprisingly, with so few events to explain, the authors found no differences in all-cause or cancer-specific deaths among patients with chronic fatigue syndrome. But they claimed to have found:
There was a significant increase in suicide-specific mortality (SMR 6·85, 95% CI 2·22–15·98; p=0·002).
Bingo! The article is saved from predictable, all-null findings by misapplication of multivariate statistics. If you believe the authors, patients with chronic fatigue syndrome are over six times more likely to die by suicide, although the confidence interval stretches from 2.2 times to 16 times.
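For readers who want to see how much rides on five events, here is a rough sketch of an SMR with an exact Poisson confidence interval. Two things are my assumptions, not facts from the paper: that the authors used an exact Poisson interval (the reported limits are consistent with one, but I have not confirmed their method), and that the expected number of suicides can be back-calculated as 5 divided by the reported SMR of 6.85. The function names are mine.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); an empty sum gives 0 when k < 0."""
    return sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k + 1))

def root_of_increasing(g, lo, hi, iterations=200):
    """Bisection for the root of an increasing function g on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_limits(k, alpha=0.05):
    """Exact two-sided limits for a Poisson mean, given k observed events."""
    hi = k + 100.0  # generous search ceiling for small counts
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= k) equals alpha / 2.
        lower = root_of_increasing(
            lambda mu: (1 - poisson_cdf(k - 1, mu)) - alpha / 2, 0.0, hi)
    # Upper limit: the mean at which P(X <= k) equals alpha / 2.
    upper = root_of_increasing(
        lambda mu: alpha / 2 - poisson_cdf(k, mu), 0.0, hi)
    return lower, upper

def smr_with_ci(observed, expected, alpha=0.05):
    """Return (SMR, lower limit, upper limit)."""
    lower, upper = exact_poisson_limits(observed, alpha)
    return observed / expected, lower / expected, upper / expected

# ASSUMPTION: expected suicides back-calculated from the reported SMR.
expected_suicides = 5 / 6.85

print(smr_with_ci(5, expected_suicides))  # the five suicides observed
print(smr_with_ci(3, expected_suicides))  # hypothetical: two fewer suicides
```

Under these assumptions, five suicides give limits close to the published 2·22 and 15·98. Rerun the calculation with three suicides instead of five and the lower limit falls below 1, which is to say that the headline finding hangs on two deaths.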
Then the qualification:
Although the suicide-specific SMR is raised compared with the general population, it is lower than for psychiatric disorders including affective disorders, personality disorders, and alcohol dependence reported in other population-based studies.
But to what population can these findings be generalized? Certainly not to patients with chronic fatigue syndrome drawn from other settings in the UK. Not to the United States, where the Oxford criteria are considered the least valid, in part because of the confounding with psychiatric disorder. The Oxford criteria even allow for psychiatric disorder to be the primary diagnosis, with chronic fatigue a secondary diagnosis.
The conclusion that the authors draw:
Although completed suicide was a rare event, the findings strengthen the case for robust psychiatric assessment by mental health professionals when managing individuals with chronic fatigue syndrome.
Ah yes, the article provides more evidence that mental health professionals should be overseeing the management of chronic fatigue syndrome and gives them new tasks for screening that might prevent the infrequent event of patient suicide. But the base rates found in this study don’t warrant formal screening efforts and I doubt that any evidence could be mustered that suicide would be reduced.
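A back-of-the-envelope sketch shows why such a low base rate undermines screening. The crude proportion, 5 suicides among 2147 patients over the follow-up, comes from the paper. The sensitivity and specificity of the screening instrument are purely hypothetical values I picked for illustration, and they are generous ones.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive screens that are true positives (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# From the paper: 5 suicides among 2147 patients (crude proportion).
base_rate = 5 / 2147

# HYPOTHETICAL instrument performance, not drawn from any study.
ppv = positive_predictive_value(base_rate, sensitivity=0.80, specificity=0.80)
print(f"base rate: {base_rate:.3%}")
print(f"positive predictive value: {ppv:.2%}")
```

With these assumed values, fewer than 1 in 100 patients who screen positive would go on to die by suicide. The rest would be false alarms, each requiring follow-up from the same mental health professionals.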
How did this article get published?
Certainly the article would not have been published in The Lancet if the authors had not had the gumption to submit it there. But do readers really believe that if someone outside a tight circle of friends and family had submitted such an article to The Lancet, it would have been accepted?
Richard Horton, passing such a manuscript through to publication and attaching a commentary may demonstrate loyalty to your friends, but it does nothing for the journal’s reputation, and it does the authors no favors to have such embarrassingly bad science made a matter of public record.
Unfortunately, I couldn’t find any information about the processing of this article from manuscript to final acceptance. Was it fast tracked? Certainly The Lancet found the article worthy of a brief commentary:
The risk of dying is increased in many illnesses, but the mortality associated with chronic fatigue syndrome is relatively unexplored. In The Lancet, Emmert Roberts and colleagues1 report results from a case register study that linked the clinical details of more than 2000 people with chronic fatigue syndrome presenting to a specialist clinic (in London and the south of England) with mortality outcomes over 7 years. This is the largest study of its type so far, and used a robust case definition.
“Robust case definition”? Give me a break!
I can’t speak for any of the other thousands of Academic Editors at PLOS One. The editors have an expressed commitment to publishing all articles that are not seriously flawed, so that post-publication peer review can establish articles’ importance and contribution to the field. But I would not have even sent this manuscript out for review. Its flaws are too obvious and unfixable, and it would be an unnecessary burden on reviewers to waste their time figuring that out. There are just too many more promising manuscripts and too few reviewers to process them all.