The names of the six finalists for the ASAP awards are now out, and I was pleased to see Daniel Mietchen’s name in the list. Daniel Mietchen, Raphael Wimmer and Nils Dagsson Moskopp have been working on a really valuable project. They saw an opportunity in using open access literature to illustrate articles in Wikipedia.
Many scientific articles have a “supplementary” materials section, which can be rich in multimedia, but these artifacts may not be as easy to find as those that make up the main body of scientific manuscripts. What Daniel, Raphael and Nils did is maximise the impact of those research outputs by putting them in a place where they can be found, explored and reused by scientists and non-scientists alike.
They developed a tool called Open Access Media Importer (OAMI) that searches Open Access articles in PubMed Central for multimedia files and uploads them to Wikimedia Commons. This tool exemplifies the added value of papers published under open access using a libre copyright licence such as CC-BY. Not only are the articles available to read, but they can also be repurposed in other contexts. The files that the OAMI bot uploaded now illustrate more than 200 English Wikipedia pages, and many more in other languages.
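To give a rough idea of what such a search involves, here is a minimal sketch in Python. It is not OAMI's actual code. It assumes that an article record is available as JATS-style XML, where supplementary files appear as `media` elements and the licence as a `license` element with an `xlink:href` attribute. The sample record, the file names and the helper names (`is_cc_by`, `reusable_media`) are made up for illustration, and the licence check is deliberately simplistic.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree uses for xlink:* attributes.
XLINK = "{http://www.w3.org/1999/xlink}"

# Hypothetical, heavily trimmed article record in JATS-style XML.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <permissions>
        <license xlink:href="http://creativecommons.org/licenses/by/3.0/"/>
      </permissions>
    </article-meta>
  </front>
  <body>
    <supplementary-material id="S1">
      <media xlink:href="movie1.mov" mimetype="video" mime-subtype="quicktime"/>
    </supplementary-material>
    <supplementary-material id="S2">
      <media xlink:href="table1.xls" mimetype="application" mime-subtype="vnd.ms-excel"/>
    </supplementary-material>
    <supplementary-material id="S3">
      <media xlink:href="call.wav" mimetype="audio" mime-subtype="x-wav"/>
    </supplementary-material>
  </body>
</article>"""


def is_cc_by(url):
    """Illustrative check only: accept licence URLs under /licenses/by/."""
    return "creativecommons.org/licenses/by/" in url


def reusable_media(xml_text):
    """Return hrefs of audio/video supplements if the article is CC-BY."""
    root = ET.fromstring(xml_text)
    licenses = [el.get(XLINK + "href", "") for el in root.iter("license")]
    if not any(is_cc_by(url) for url in licenses):
        return []  # licence missing or not recognised: skip the article
    return [
        media.get(XLINK + "href")
        for media in root.iter("media")
        if media.get("mimetype") in ("audio", "video")
    ]


print(reusable_media(SAMPLE))
```

A real importer would also have to download and convert the files and cope with records that declare their licence or media types inconsistently.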
Q: How did you get started with this project?
DM: My PhD was on Magnetic Resonance Imaging, which primed me to work with videos, and my first postdoc was on music perception, which naturally involved a lot of audio. Both made me aware of all the audiovisual material that was hidden in the supplements of scholarly articles, and I found that the exposure of that part of the literature left much to be desired. For instance, every video site on the Web provides thumbnails or other forms of preview of video content, but back then, no scholarly publisher exposed video content this way. Wikimedia Commons did. I also noticed that Wikipedia articles on scientific topics were rarely illustrated with multimedia. So the two fit well together. Nils, Raphael and I met online, and then sent our first funding proposal in 2011 in order to automate the import of supplementary audio and video files from scholarly articles into Wikimedia Commons.
Q: Where did you start?
DM: We chose to start with PubMed Central. It is one of the largest repositories of scholarly publications, many of which have supplementary materials, and it has an API we could use.
Q: How far have you come?
DM: We have now imported basically all audio and video materials from suitably licensed articles available from PubMed Central, save a few where there were technical difficulties with file conversion or upload. Initially, we did not know how many files this would be. Since there is no easy way to search for supplementary video or audio files, we had roughly estimated the number back in 2011 at somewhere between 5,000 and 10,000. The bot now adds several hundred files from newly published articles every month and passed 14,000 uploads to Wikimedia Commons earlier this week. So if you are going to publish multimedia with a suitably licensed paper in a journal indexed in PubMed Central, you – and anyone else – can find it on Commons shortly thereafter.
Q: How does that compare to other Wikimedia content?
DM: Most of the uploaded files are videos, and given that there are about 36,000 video files on Commons in total, about one third of them now have scientific content. That is a much higher proportion than, say, that of scientific articles out of all articles on any Wikipedia. However, the number would be even higher if more authors or journals decided, or more funders mandated, to put their materials under a Wikimedia-compatible license. If materials from their papers cannot be reused on Wikimedia Commons, they are not Open Access.
Q: Were there any hurdles along the way?
DM: Sure. The project actually evolved more slowly than we had anticipated because we had underestimated the extent to which the standards for machine readability of manuscripts deposited in PubMed Central are being ignored by publishers, or interpreted in a rather inconsistent fashion. We put forward a number of suggestions to PubMed Central – who are very cooperative – in order to monitor standard compliance and to facilitate reuse by us and others, and we’ll present a paper on that at a conference during Open Access Week.
Q: What else can OAMI do, and how can people have access to it?
DM: The software is available on GitHub and was built to be both reusable and extendable, so if someone wants to write a plugin to export the videos from PubMed Central to places like YouTube, they can start doing that right now (in fact, work on a YouTube pipeline has already started). Or we could think about harvesting in places other than PMC, or materials other than audiovisuals. If anyone has ideas in this regard, they would be most welcome.
Q: What comes next for you?
DM: This was and is a spare time project and will likely continue as such for some time. It was a perfect fit to my Wikimedian in Residence project at the Open Knowledge Foundation Germany, which ended this summer, and I am continuing to work at the interface between research, openness and the public. I am now at the Natural History Museum in Berlin, working on the pro-iBiosphere project, which aims to lay the ground for integrating biodiversity research with the Web. That will require a greater degree of openness than what we are used to now, as well as better machine readability of the relevant information, a topic that I am currently focusing on.
I met Daniel online a few years ago, and he has been a source of motivation and inspiration for a lot of us. It makes me very happy to see that his work has not gone unnoticed, and I look forward to seeing the outcome of his next projects.