Debate over PLoS data-publishing demands

The Public Library of Science (PLoS), the open access publisher of PLoS One, the largest scientific journal in the world, has announced that from the 3rd of March, authors of an article in any of their seven journals must make all data related to the manuscript publicly available immediately upon publication. While many have welcomed this policy, others are concerned that it could be difficult to implement and could dissuade people from publishing in PLoS journals.

Open access to results has been a driving force behind the development and growth of PLoS, and the new open data policy is a natural continuation of that. PLoS hope that sharing data will encourage collaborations and make old data easier to obtain by relieving the burden on the scientists who produced it. They also believe it will permit third parties to validate and replicate results, and to merge data sets to give new insights that would not have been possible with summary data alone. Advocates of open access have praised the policy, with Dan Gezelter of OpenScience.org stating that it makes PLoS “a much more attractive place to send our next paper”.

In many fields, such as genomics, submission of data to online repositories is already commonplace. Ian Dworkin, a geneticist at Michigan State University, has benefitted in the past from data sharing and believes the policy will allow other scientists to “address interesting and novel questions in the future” using the archived data.

However, not everyone has greeted the news so happily. Many authors are concerned about the size of the data, which PLoS states must be supplied in its raw form, before any processing. The raw data produced when sequencing a single human genome can easily exceed 300 GB, meaning authors could have to make terabytes of data available. Exactly how this will be achieved is not clear, and the same problem will apply to many other fields that handle large volumes of data.

Others are concerned about data privacy. Some data cannot be released ethically, such as data relating to individual patients, or legally, if obtained from a third party. In these cases, PLoS requires the data to be available upon request, but not necessarily made public.

In addition, many groups use the same data to produce multiple papers, each investigating the data in a different way. If the data are made publicly available after a single paper is published, other groups will be able to use them, and the original authors could be ‘scooped’ on an interesting finding. Writing on her blog, neuroscientist Erin McKiernan describes how in many cases “data acquired are like gold”, particularly in labs with less funding. For such labs, releasing data could reduce the mileage they are able to get from a data set and harm future funding opportunities.

This new policy is a brave step along the path to open science, and only time will tell how it impacts PLoS journals, authors, and the wider scientific community.