We are at a tipping point in the struggle for routine data-sharing.
We look to journals to implement policies based on a consensus of directives from governments, requirements of funding bodies, and, now, The International Committee of Medical Journal Editors (ICMJE).
But we should not confuse circulation of an ICMJE proposal with universal adoption of data sharing.
Moreover, the large group of academics who’ve long been advocating data sharing threatens to be splintered by backlash against an attempt to get release of a PLOS One data set that authors had promised would be available.
This blog post provides a quick tour of some recent developments in the push for routine data sharing. A follow-up post will identify some misunderstandings about the PLOS One PACE data set that I am trying to access. I will argue that there should be an immediate, unconditional release of the data set.
In defense of publicly available data sets for published papers
Alex Holcombe’s brief nugget in The Conversation, Science is best when the data is an open book, is one of the better statements about the necessity of journals’ requiring authors to provide public access to data as a condition for publishing. Some excerpts:
Many scientific societies recognise this. For many years now, some of the journals they oversee have had a policy of requiring authors to provide the raw data when other researchers request it.
Unfortunately, this policy has failed spectacularly, at least in some areas of science. Studies have found that when one researcher requests the data behind an article, that article’s authors respond with the data in fewer than half of cases. This is a major deficiency in the system of science, an embarrassment really.
The well-intentioned policy of requiring that data be provided upon request has turned out to be a formula for unanswered emails, for excuses, and for delays. A data before request policy, however, can be effective.
Implicit in Alex’s blog post is the assumption that authors of published papers should not have veto power over who re-analyzes their data.
My colleagues and I have had positive experiences when data was available to us at a public depository. In one instance, we were able to show we could replicate investigators’ original findings in Proceedings of the National Academy Of Sciences using their data, but we were also able to show that random numbers yielded essentially the same effects. This has to be an important qualification to the authors’ claims about their original data.
Getting data sets to reanalyze can be a frustrating task
Negative experiences in requesting data are more likely to occur when access depends on the willingness of investigators to release their data in response to a formal request. One of my negative experiences came when I asked for data to reanalyze results that the principal investigator claimed demonstrated an effect of group therapy attendance on the survival of patients with early breast cancer. My colleagues and I had published a critique of the study showing that any such claims depended on multivariate analysis of dubious appropriateness. But to delve further into the study, I needed access to some simple statistics that should have been included in the original report. When I formally requested these data, the principal investigator refused me, her university claimed that the data were intellectual property, and the US Office of Research Integrity responded with a statement that they had no authority to enforce sharing, despite sharing being mandated.
Premature optimism about data sharing and a false sense of progress
Some of us were prematurely optimistic about a new age of routine data-sharing having arrived when a joint publication occurred across 14 member journals, Sharing clinical trial data: a proposal from the International Committee of Medical Journal Editors. The article declared in no uncertain terms:
As a condition of consideration for publication of a clinical trial report in our member journals, the ICMJE proposes to require authors to share with others the de-identified individual-patient data (IPD) underlying the results presented in the article (including tables, figures, and appendices or supplementary material) no later than 6 months after publication.
What seemed a bold call for data sharing is only aspirational, not mandated at this point. We needed to remind ourselves of the resistance of high-impact journals to previously “mandated” changes concerning both registration of trials and declaration of conflicts of interest. Despite endorsing registration of trials, some journals like The Lancet routinely allow registration after data collection has started, as well as blatant outcome switching, as in the PACE trial. New England Journal of Medicine has undertaken a vigorous effort to escape from previous commitments to declarations of conflict of interest.
And now, notoriously, an editorial in New England Journal of Medicine published about the same time as the ICMJE proposal warns:
A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”
The editorial shows what those who advocate universal data-sharing are up against. It was met with appropriate ridicule on social media, including a stinging takedown in Richard Lehman’s Journal Review at the BMJ’s blog that deserves to be remembered longer than the editorial that inspired it. Some excerpts:
Having signed the remarkably radical ICMJE proposal, Jeff Drazen, editor of NEJM, co-authors an editorial simply titled “Data Sharing.” It begins by mocking a delusive biscuit-tin landscape of everybody happily sharing data, moves on to grave warnings about “research parasites” feeding off other people’s hard work, and ends up advocating “research symbiosis,” by which the original researchers collaborate with others in moving beyond the data already gathered. I think they are trying to be witty, and I don’t want to discourage that, but they make it clear that they regard re-analysis and meta-analysis as little better than what a tapeworm gets up to in the bowel, as illustrated on p.234.
Personally, I think we need all the data parasites we can get, as well as symbionts and all sorts of other creatures which this ill-chosen metaphor can’t encompass. What this piece really shows, in my opinion, is how far the authors are from understanding and supporting the true opportunities of clinical data sharing.
PLOS journals’ solution
Public Library of Science Journals (PLOS) is a large player in scientific publishing and a prime mover in the push for routine data sharing. PLOS is a family of seven journals. The largest of them, PLOS One, publishes over 30,000 articles a year. As a measure of its prestige, data published from the UK’s 2014 Research Excellence Framework (REF) [https://www.plos.org/uk-researchers-consider-their-plos-publications-among-their-best/] indicate that when UK scientists had to choose only four of their best articles to submit for the REF exercise, PLOS was the only Open Access biomedical sciences publisher listed in the top 10, and PLOS ONE articles accounted for nearly half of the approximately 2,400 PLOS submissions.
A February 24, 2014 post, PLOS’ New Data Policy: Public Access to Data, declared:
In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.
This does not represent a radical departure from what has always been the case with PLOS journals.
“PLOS journals have requested data be available since their inception, but we believe that providing more specific instructions for authors regarding appropriate data deposition options, and providing more information in the published article as to how to access data, is important for readers and users of the research we publish.”
What data do I need to make available?
We ask you to make available the data underlying the findings in the paper, which would be needed by someone wishing to understand, validate or replicate the work. Our policy has not changed in this regard. What has changed is that we now ask you to say where the data can be found.
On August 12, 2012, the PACE investigators published a paper in PLOS One. In doing so, they incurred a responsibility to make their data available.
It’s been almost 100 days since I requested the data from the PLOS One article. I have not received the data. My request has been treated as a Freedom of Information Act request, which is decidedly what it was not. I’ve been deemed “vexatious” for having made it.
Further to your recent request for information held by King’s College London, I am writing to confirm that the requested information is held by the university. The university is withholding the information in accordance with section 14(1) of the Act – Vexatious Request.
“The university considers that there is a lack of value or serious purpose to your request. The university also considers that there is improper motive behind the request. The university considers that this request has caused and could further cause harassment and distress to staff”.
Several readers have raised concerns regarding the analyses reported in this article. We are also aware that there have been requests for the data from this study.
The article was published in 2012; the PLOS data policy that applies to the article is that for submissions prior to March 3, 2014, which is outlined here: . The policy expects authors ‘to make freely available any materials and information described in their publication that may be reasonably requested by others for the purpose of academic, non-commercial research’. The policy also notes that access to the data should not compromise confidentiality in the context of human-subject research.
PLOS ONE takes seriously concerns raised about publications in the journal as well as concerns about compliance with the journal’s editorial policies. PLOS staff are following up on the different concerns raised about this article as per our internal processes. As part of our follow up we are seeking further expert advice on the analyses reported in the article, and we will evaluate how the request for the data from this study relates to the policy that applies to the publication. These evaluations will inform our next steps as we look to address the concerns that have been noted.
In a follow-up blog post, I will review restrictions that have been suggested for making the PLOS One PACE data available. Some of these proposed restrictions are irrelevant. Others make false assumptions about what was promised to patients in the consent process for the study. I will show why the data should be unconditionally available. Not doing so is a bad precedent for future data-sharing.