I was prompted by a lot of media coverage about the mental health and suicidality of airline pilots. Starting with wild headlines, the coverage struck me as exaggerated to the point of irresponsibility. So, I began an investigation. I soon lost confidence that there was much basis for the claims in social media, but I also saw that the problems arose with what the authors themselves said about their study. As is often the case, hype and distortion start with the exaggerations of authors of peer-reviewed papers, particularly in their abstracts. I suspect that journalists often don’t read anything but the abstract, if there isn’t a press release available.

Follow my progress through the paper and see how I came to my judgments.

I worked my way through the article and then I made a quick comparison to the media coverage that first annoyed me.

I track news reports back to the original studies that generate them. In the case of this report, the task was easy, both because some of the media coverage actually named the journal, Environmental Health, and because the journal turned out to be open access rather than paywalled.

Here is my running analysis, as I read through the paper. You can click on the link above to obtain the paper and follow along.

The title

The title indicates this is a cross-sectional study using an anonymous web-based survey.

What are my hypotheses based on the title alone? As this is a cross-sectional study, we can’t make causal inferences, although the authors and the media coverage will probably be tempted to do so anyway. Being an anonymous web-based survey means that it likely depended on a convenience sample of the minority of pilots willing to visit the website, with an unknown relationship to whatever larger population was being sampled. We are being set up for not being able to make any kind of statements about pilots in general, only about the biased sample of pilots who responded to this survey.

The abstract

Apparently the study was prompted by a widely covered story last year of a pilot deliberately crashing an airliner, killing all the passengers. So, the study is timely, but likely hastily put together. It describes itself as the “first study,” which is usually a warning of hype to come. It’s unclear why it brings up “especially among female airline pilots.” There are not yet many female airline pilots, and to my knowledge none has ever died by deliberately crashing a plane. My hunch is that something went wrong in the sampling, and the authors are trying to turn it into a strength of the study. But I’m open to being proven wrong.

A very broad shotgun sampling strategy is described, drawing on multiple sources. Unless the authors are careful, they are going to get a very mixed (heterogeneous) sample from which they will have trouble making valid generalizations.

A look at the results section confirms that they are making inferences about diagnosis and suicidality from weak self-report data. Diagnosis requires clinical interviews, and self-report data likely overestimate the prevalence of depression. Apparently the measure of suicidality came from an item on the depression questionnaire. Such items are notoriously invalid estimates of the likelihood that someone will die by suicide. We are heading further into trouble. I defer judgment about the importance of sexual or verbal harassment. Web-based surveys are often quite vague in their wording, and so we can’t tell the seriousness of the sexual or verbal harassment.

I suspect that the following conclusion was something the authors started with, before they analyzed their data.

“Hundreds of pilots currently flying are managing depressive symptoms perhaps without the possibility of treatment due to the fear of negative career impacts.”

Heightened scores on a self-report questionnaire are not necessarily indications of need for treatment. The authors do not know whether their sample should be getting treatment if they are not. Obviously, rates of treatment in the general population are lower than rates of depression. But a lot of depression revealed in surveys is of mild to moderate severity and formal treatment is not necessarily indicated.

The wrap-up recommendation is quite commonly made, a downright cliché. I’m not sure how they were any more confident of it having done their study than before. However, we know that screening for depression, including with self-report measures of suicidal ideation, is useless in reducing suicidality. Such screening identifies too many people as at risk and diverts resources to people at relatively low risk, rather than allowing immediate attention to persons in crisis.

We seem to have arrived at a tired old recommendation for screening and early intervention for pilots who screen positive. This is standard across many different populations, and there is little evidence that it would accomplish anything, certainly not head off an unlikely event like the crash of the Germanwings flight.

The introduction

The introduction immediately links the rationale for the paper to the deliberate crash of the Germanwings flight. It makes the erroneous assumption that, because we can trace the crash back to the copilot’s history of depression and suicidality, such a crash could have been foreseen. I’m skeptical: just because we can track this event back in time doesn’t mean that we can move forward from the circumstances and prevent another event of this kind, which is likely to be quite rare.

On March 24, 2015, Germanwings flight 4U 9525 crashed into the French Alps killing 150 people. Investigators of this tragic event report the 27-year-old co-pilot deliberately crashed the plane [1, 2]. Further examination of the co-pilot’s history found evidence suggesting the co-pilot suffered from clinical depression [3]. Previous suicide attempts and having a history of mental disorders, particularly clinical depression, are risk factors of suicide [4].

A revelation that we should keep in mind when we review the actual results of the study: only 4% of pilots are female. Unless the sample is huge or unrepresentative, we’re not going to have many female pilots to talk about. My curiosity is heightened about the authors’ mention of females in the abstract.


The description of the sampling strategy sounds broad, perhaps reflecting difficulty obtaining a sample with a much simpler strategy. It’s unlikely that we can figure out how representative the sample is or the nature of potential biases. It is not clear how female pilots were targeted, or why.

Recruitment methods included targeted e-mail, newsletters, word-of-mouth, handing postcards to pilots, and aviation publication advertisements. Airline pilot populations that gave rise to the survey population included pilot unions (>5 unions), airline representatives (>65 airlines), pilot groups (>12 groups), and aviation safety organizations (>2 organizations). We targeted female pilots in recruiting because of the small percentage of female pilots among the general airline pilot population. We downloaded 3485 surveys on December 31, 2015.

The last sentence in the methods section confirms that the authors used a cutoff on the self-report questionnaire that is likely to produce a lot of false positives and otherwise unreliable estimates of the prevalence of clinical depression:

Therefore, we refer to meeting the cut-off of having a PHQ-9 total score of 10 as depression.
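To make concrete why a questionnaire cutoff overstates clinical depression, here is a rough back-of-the-envelope sketch. The sensitivity, specificity, and true prevalence values are illustrative assumptions of mine, not figures from the paper; the point is only that when true prevalence is low, most people above a cutoff would not be diagnosed as depressed in a clinical interview.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of screen-positives who truly have the condition (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# ASSUMED, illustrative values only (not taken from the study under review).
ASSUMED_SENSITIVITY = 0.88
ASSUMED_SPECIFICITY = 0.88

for assumed_prevalence in (0.03, 0.05, 0.10):
    ppv = positive_predictive_value(
        assumed_prevalence, ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY
    )
    print(f"true prevalence {assumed_prevalence:.0%}: "
          f"{ppv:.0%} of screen-positives are true cases")
```

Under these assumed numbers, well under half of the people scoring above the cutoff would turn out to be clinically depressed, which is why I don't treat "met the cutoff" as interchangeable with "has depression."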


The opening of the results confirms that this is a sample from wide-ranging settings that presumably vary greatly in the qualifications of pilots and particularly in the screening that they underwent.

Table 1 indicates that women are much more heavily represented in the sample than in the population of pilots actually flying. It also indicates that these are disproportionately young women. This will clearly be a bias limiting generalizations about the larger population of pilots.

Table 1 also breaks down the sample by age in a way that is not particularly helpful. The breakdown actually frustrates any effort to track numbers attached to other breakdowns, most importantly males versus females.

Figure 1 reveals that there were a number of pilots in the 70 to over 90 age range. I suspect that this is a matter of carelessness on the part of the authors rather than of pilots actually being that old.

Table 3 lists the items from the self-report depression questionnaire, including the one used to measure suicidality.


The columns break down the answers to the question by age. As can be seen, over 95% of the pilots indicate no suicidal thoughts. Note that the item combines “better off dead” and “hurting oneself in some way.” This is a vaguely worded item, and with so few people answering it in a positive direction, I hesitate to make much of it. Recall that we also have data from the sample indicating that some of the pilots are from 70 to over 90 years of age. Who knows what is going on here. Responses to this item are unlikely to be clinically useful measures of the likelihood of taking one’s own life, any more than they are in other populations.

The estimates of clinical depression are highly unreliable. But I spot an anomaly. The authors indicate that the proportion of males meeting the depression threshold was slightly higher than the proportion of females. These figures are contradicted by hundreds of studies indicating that, post adolescence, twice as many females as males score above cutoffs. Rather than trying to develop some post hoc explanation, I take this as a further indication of the unreliability of this data.

Among participants who answered the PHQ-9 questions, 233 (12.6%) met threshold associated with clinical levels of depression. Two-hundred-and-four (12.8%) males and 29 (11.4%) females (χ2 p = 0.52) met depression threshold. One-hundred and ninety-two (13.6%) of the 1413 pilots who reported working as an airline pilot in the last 30 days met depression threshold (Table 4)
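For readers who want to check the male versus female comparison themselves, here is a minimal sketch of a chi-square test on a 2x2 table. The counts of 204 and 29 come from the passage above; the group sizes of 1594 males and 254 females are my own back-calculation from the reported percentages and the total of 1848, so treat them as an assumption rather than numbers the authors give directly. The result should land in the same neighborhood as the reported p = 0.52, though I would not expect it to match exactly, since I don't know precisely which test the authors ran.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_one_df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Counts meeting the threshold are from the paper's text.
# Group sizes are ASSUMED: back-calculated from 12.8%, 11.4%, and n = 1848.
males_depressed, males_total = 204, 1594
females_depressed, females_total = 29, 254

chi2 = chi_square_2x2(
    males_depressed, males_total - males_depressed,
    females_depressed, females_total - females_depressed,
)
print(f"chi-square = {chi2:.3f}, p = {p_value_one_df(chi2):.2f}")
```

Either way, the difference between the sexes is nowhere near statistically significant, and with only 29 women above the cutoff there is very little to say about female pilots at all.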


I’m inclined to give up on the study because of its methodological weaknesses, evidence of sampling bias and incorrect coding of data. The authors are making much too much of poor quality data that was likely hastily collected.


Media coverage of the study typically emphasizes what authors say about it in their abstract and their conclusion. Let’s jump to the conclusion:

This study fills an important gap of knowledge by providing a current glimpse of mental health among commercial airline pilots, which to date had not been available. Our study found 233 (12.6%) of the 1848 airline pilots responding to the PHQ-9 met criteria for likely depression. Of the 1430 pilots who reported working as an airline pilot in the last seven days at time of survey, 193 (13.5%) met these criteria. Seventy-five participants (4.1%) reported having thoughts of better being off dead or self-harm within the past two weeks. We found a significant trend in proportions of depression at higher levels of use of sleep-aid medication (trend test z = 6.74, p < 0.001) and among those experiencing sexual harassment (z = 3.18, p = 0.001) or verbal harassment (z = 6.13, p < 0.001). Although the results have limited generalizability, there are a significant number of active pilots suffering from depressive symptoms. Future studies will evaluate additional predictors such as sleep and circadian rhythm disturbances.

Poor mental health is an enormous burden to public health worldwide. The tragedy of Germanwings flight 4U 9525 should motivate further research into assessing the issue of pilot mental health. Although current policies aim to improve mental health screening, evaluation, and record keeping, airlines and aviation organizations should increase support for preventative treatment.

Really, authors, you have no business making any claims of this kind. Shame on you. If you’re intelligent enough to publish a scientific paper, you should be more aware of the limitations of your data and of your responsibility in accurately conveying those limitations in any interpretations. Yes, you do indeed have “limited generalizability” to your data, so limited that you should be silent rather than make a fuss.

Coverage of the study in social media

The study first came to my attention because of clickbait tweets with links to a British newspaper.

I had thought that a tweet about some coverage of the study was particularly irresponsible and inaccurate. But now that we compare the headlines, we see that they largely reflect what the authors inaccurately say about their study.

Hundreds of airline pilots are suicidal or thinking of self-harm, study finds

Pilots ‘are managing depression, and even suicidal thoughts, without the possibility of treatment due to the fear of negative career impacts’

Actually, it’s not a bad reproduction of the distorted statements in the authors’ abstract.

Photo from the Independent article.

