Open Data

PLOS (the Public Library of Science) is a non-profit open access publisher of science articles. Their goal is to make scientific data accessible to everyone, in the name of transparency and open communication. Now they have taken their approach one step further, announcing a policy that authors of all articles published in a PLOS journal must submit their original data so that anyone can access and analyze it for themselves.

“In an effort to increase access to this data, we are now revising our data-sharing policy for all PLOS journals: authors must make all data publicly available, without restriction, immediately upon publication of the article. Beginning March 3rd, 2014, all authors who submit to a PLOS journal will be asked to provide a Data Availability Statement, describing where and how others can access each dataset that underlies the findings. This Data Availability Statement will be published on the first page of each article.”

They allow for exceptions: when subject confidentiality is an issue, when the data include sensitive information related to endangered species, and when the authors do not own the data. In such cases, however, data must be available upon request, and not controlled by the authors. Otherwise the raw data must be made available.

I think this is a fabulous idea, for many reasons. We frequently write here at SBM about the challenges faced by the various institutions of science to maintain high standards of quality and transparency. Those challenges include publication bias, the literature being flooded with preliminary or low quality research, researchers exploiting degrees of freedom (also referred to as “p-hacking”) without their questionable behavior being apparent in the final published paper, conflicts of interest, the relative lack of replications and lack of desire on the part of editors to publish replications, frequent statistical errors and the occasional deliberate fraud.

There are many ways to erode the quality of scientific research, or to manipulate research to achieve a desired end (rather than discover what is real). In the end, however, I am not nihilistic. Science can and does still move forward, although slowly. We just have to recognize how messy the process is so that we can best sift out the noise and find the reliable evidence.

It does feel as if we are in an era of self-examination and increased efforts to identify and correct the failings of modern scientific research. It further seems as if the transparency and immediate access to data afforded by the internet are largely responsible for this. This is also an era of experimentation where various models are being proposed or tried as potential solutions to the various problems faced by science.

The Open Access movement is one such experiment, and PLOS has been its flagship. In my opinion, the experiment has been a partial success, but has created some of its own problems. It is certainly extremely useful to have immediate access to a full published article when researching a topic. This facilitates post-publication peer review, and the discussion within the community about the research. PLOS has also managed to maintain a reasonably high quality among its journals.

However, open access journals without such high standards have also proliferated. The business model of most open access journals is that they do not have subscriptions (by definition) so they pay for themselves by charging authors a publication fee. This can create a perverse incentive to publish lots of low quality papers in order to garner those fees, and since publication is only online (without the expense of print journals), creating minimalist open access journals with terrible quality control can be profitable. Last year Science published the results of a “sting” operation exposing the poor quality of many open access journals (PLOS, to its credit, did not fall for the sting).

Open access is therefore not a panacea, and comes with its own challenges. It can, however, address the issues of transparency and universal access to facilitate review and discussion. I therefore think it is a great idea for PLOS to go “all in” on this strategy. If you are going for transparency, then make the raw data transparent, not just the final worked-over data.

In fact we have proposed this previously as one strategy to combat the problem of p-hacking. If researchers disclose the process by which they collected data, all the data they collected, and every way it was analyzed, then p-hacking would become more transparent, and this would hopefully discourage the practice. At the very least it would make it easier for other researchers to reanalyze the data to see if the results are genuine or an artifact of creative analysis.

In fact, I think all journals should adopt this policy. Researchers should make available to journal editors all their raw data, so that the journal can make it available either online or on request to other researchers who want to review or reanalyze the data, or just to help them replicate the study.


I think we are in an exciting time in the evolution of the institutions of science. Many problems that have been festering for a long time are being exposed and discussed. This may be unsettling – to learn about all the flaws in the practice of science. But these flaws all have potential solutions, and many of them are not difficult at all; they just need the will to execute.

Journals and their editors are largely the gatekeepers for the official record of scientific research: the published peer-reviewed literature. Therefore many of the solutions to these problems rest with them. Open access is one approach that I feel will have a long future and play an important role in reforming the institutions of science. Requiring open access to data is a great move that capitalizes on the strength of the open access movement.

Print journal editors, in fact, would be well-advised to follow suit.

In recent years it has become policy for researchers to disclose funding and potential conflicts of interest. Clinical trial registries have also been created so that companies cannot hide research whose results they don’t like. But more reforms still are needed.

Universities should institute more uniform education of researchers so that they are more aware of problems such as p-hacking and know how to minimize error and bias and maximize scientific rigor.

Journal editors should publish more negative studies and more replications (including exact replications). Not everything has to end up as a full article in the print version of the journal. Online supplements can provide the space needed to publish whatever is necessary to maximize the unbiased flow of quality scientific information.

Science is often characterized as a self-corrective process. The process of science itself needs to be self-corrective. These fixes do not require massive resources, just reasonable changes in policy. PLOS should be commended for helping lead the way, at least with respect to open access to data.

Posted in: Science and Medicine

43 thoughts on “Open Data”

  1. windriven says:

    ” This may be unsettling – to learn about all the flaws in the practice of science.”
    “The process of science itself needs to be self corrective. ”

    Print journals are generally the products of large and quite profitable publishing houses such as Elsevier. Their prime directive is to return dividends to shareholders. This predictably biases against replications, yet replication is a cornerstone of good science.

    The peer review process used is enough of a mess that it is discussed in the popular press.

    The print journal paradigm has arguably run its course. Models like PLOS, especially with the demand for free access to underlying data, are likely to deliver better science more efficiently. Quality peer review post hoc is better than lukewarm peer review a priori.

    1. Self Skeptic says:

      Thanks for the link.

  2. john says:

    This is the first and only worthwhile thing I’ve read on this blog all year.

    Step. It. Up, Docs.

    1. WilliamLawrenceUtridge says:

      Feel free to submit a guest post, and help bring up the overall standard of the blog.

    2. Harriet Hall says:

      “This is the first and only worthwhile thing I’ve read on this blog all year.”

      Is that because it’s the most critical of science-based medicine? Do you think some non-science-based approach is better? Would you care to explain why you think everything else on this blog is not worthwhile?

    3. windriven says:

      “This is the first and only worthwhile thing I’ve read on this blog all year. ”

      Is this the first entry you’ve read on this site this year? If not, what has kept you around?

  3. rork says:

    1) “all the data they collected, and every way it was analyzed,”
    Unenforceable. PLoS isn’t saying that, I think. There’s a reason. Example: If an experiment did not look so good, and we repeat it and the data looks much better, we just don’t mention the first one. Also, it’s not clear that the PLoS statement goes very far in asking us to divulge how data was analyzed. So I’ll have their data, but perhaps never be able to reproduce their results, and won’t know what fool thing they did. Saying their analysis is wrong is difficult in that situation, since I don’t know what it was.

    More trivial:
    2) Data availability and open access are pretty separate issues in my mind.
    3) I’ve had papers where we measured protein abundances using arrays, and also using massive plate-read sandwich ELISA to follow up interesting candidates. Journals make us divulge the array data but not the ELISA data, even if the ELISAs are on 6 times more samples. I’ll bet a lot of people can’t even extract the raw data out of their plate-readers – folks record the estimates after having run data through a calibration curve. Also note: current standard for Affy arrays is to share the .CEL files, not the raw images, and we find that acceptable enough.
    4) I think people will have trouble with showing their analysis methods. Some biologists near me use (and if naive, aren’t too embarrassed to say that they used) Prism. What tests they actually did and if they knew what they were doing is usually unknown. I often can’t reproduce their results. For example, they have to decide whether to assume “sphericity” or not – it took me a while and reverse-engineering to figure out that this is asking if the variances in the groups are assumed equal or not. For fancier folks there’s stuff like Sweave (embedded R code in LaTeX documents) – bit of a learning curve. The PLoS requirements are terribly vague.
    5) It’s gonna be fun to see what people leave us in those supplements.

    1. Angora Rabbit says:

      “If an experiment did not look so good, and we repeat it and the data looks much better, we just don’t mention the first one. ”

      Rork, you are exactly right. Our posts crossed in the ether.

      Besides, there is the old joke that if you ask two statisticians a question, you get three answers. I’ve sat on study sections where ten minutes is spent arguing about a statistical test, and no one agrees. Damn, I’ve had to teach reviewers statistics. I can’t imagine the fresh hell wherein every internet idiot wants to debate my data. I’m happy to share my data with a knowledgeable scientist, but to share it with some of the posters here in SBM??? Egads.

      1. windriven says:

        “I can’t imagine the fresh hell wherein every internet idiot wants to debate my data. ”

        To share is required. To debate is not.
        – An internet idiot

  4. Scott says:

    In some fields, it’s not viable to make the original data available. Take experiments like the LHC – the overwhelming majority of the data is immediately discarded instead of being persisted to disk (out of necessity – no storage hardware ever made could keep up). Even what’s left over is something like 15 petabytes a year, which is just too much to try and make publicly available.

    I won’t argue that what can be, should be – but there are a lot more limitations than mentioned here, or in the PLOS announcement.

    1. David Gorski says:

      Indeed. The same thing applies with a lot of basic molecular biology studies. Is PLoS going to require, for instance, that the original, uncropped autoradiographs be included in the supplements? Original lab notebook analyses, either copies or transcribed to print? Or the step-by-step analysis of data, which in some cases can be many, many pages long? Data transparency is great in concept, but when you start considering the nuts and bolts of what, exactly, data transparency means, it gets very, very messy very quickly.

      Moreover, as rork points out, the policy as currently written is so vague as to be almost completely unenforceable, and it’ll be really interesting to see what gets dumped in those supplemental data sections. I’m all for data transparency, but this is a mess. My guess is that fewer investigators are going to submit their work to PLoS journals. It’s already enough of a pain as it is to do so, and this is just one more hurdle. For example, I just had one paper published by PLoS One, with another one in the can to appear soon. It was a big enough pain in the rear to submit to PLoS to begin with, not even counting the $1,300 or so per manuscript in page charges due to the journal being open access. If I wasn’t sure I would be doing it again before this announcement, now I really don’t know if I will do it again.

      Finally, there is going to be one drawback to this that perhaps the PLoS powers-that-be didn’t think of. (Actually, I’m not sure they thought through the consequences of this all that well.) The cranks, quacks, and antivaccinationists will have a field day with this. They already do their damnedest to get the original datasets for various studies they don’t like, the better to “analyze” them the way they want or to find flaws in them. You could argue that knowing that anyone, including cranks, can see their original data will motivate scientists to produce a higher standard in their publication. Maybe so in some cases. In fact, probably so in some cases. However, what’s more likely to happen in most cases is that scientists in controversial fields frequently attacked by cranks just won’t publish in PLoS journals anymore because, however rigorous their analyses are, they’ll have to put up with the hassle of cranks “re-analyzing” their data to discredit them. Most scientists care far more about what other scientists think about them than what cranks do, but on the other hand it’s understandable not to want the hassle of dealing with, say, antivaccinationists.

      I think DrugMonkey nailed it:

      1. Angora Rabbit says:

        David, our two hearts beat as one. :) Along with Drugmonkey (thanks for the link, Liz).

        Here’s an even greater horror show: Now that PLoS is doing it, Congress now has a model in which they can consider expanding the program. And they already are. This will mean yet another half-dozen (minimum) university compliance officers to track that the data are posted, because the university will be fined into the next century if even one mistake is found. Their salary $ will mean another bump-up in Indirects, and less money spent actually doing research.

        And, if you think you’ll get off with no effort on this new initiative, guess again. Plan to spend yet more precious days making sure the student’s data are spit-assed perfect and you will have to do all the uploads and curating yourselves. Raise your hand, everyone who has never published a paper with an error in it.

        My Tier1 institution currently has 20 – yes, twenty – distinct compliance committees mandated by the Feds that focus just on research. We have no idea the size of the iceberg under this.

        1. David Gorski says:

          Yup. I can see it not being long before the federal government requires something like this of all federal contracts—with no additional financial support to cover the cost of the additional data curation that will be necessary to comply with such a policy. I can also foresee failure to adequately abide by this policy gradually becoming equivalent to scientific fraud in the eyes of the government.

          On the other hand, the requirement that federally-funded investigators deposit their manuscripts into PubMed Central has been fairly widely ignored without much consequence, probably because the NIH and other federal funding agencies are overstretched enough as it is and can’t adequately monitor compliance. Ditto the requirement that all clinical trial results be deposited in

  5. Angora Rabbit says:

    I’m sorry but I really have to disagree with the PLoS policy, and I am speaking as a published PLoS ONE author. The opportunity for this “transparency” to cause significant mischief is, in my opinion, woefully understated if not ignored.

    If you publish something controversial, then a competitor who disagrees, or worse, a commercial operation with a significant financial interest, could make the author’s life hell. In my field of toxicology, this is absolutely happening and does happen to me and my colleagues. PLoS is offering another opportunity for harassment. And although the charges will be unfounded, just the stress itself can make an author’s life miserable – been there, got the medal.

    Point the Second is that who would offer the raw, raw data? Any thinking person is going to present sanitized data that can’t be argued against. This in my opinion is bad for science. Reviewers already insist on the perfect western and the perfect immunostain, when the reality is such a thing seldom exists due to biological variance. I strongly argue that this naive policy is going to promote fraud, not reduce it. It’s the Law of Unintended Consequences.

    Any reader of SBM should understand that this is a two-edged sword. Sure, you can go after the PLoS chiropractic paper (as if there would be such a thing), but they can in turn come after you.

    From now on, the only thing I will submit to PLoS journals are topics that are “safe.” Which frankly is bad for science. But I don’t need every idiot on the internet trying to argue about my data. I have better things to do with my time, like conduct more experiments.

    1. David Gorski says:

      I strongly argue that this naive policy is going to promote fraud, not reduce it. It’s the Law of Unintended Consequences.

      Excellent point. I agree. I hadn’t thought of it, but this policy arguably, through the law of unintended consequences, perversely increases the incentives for scientific fraud, to make the conclusions completely “bullet-proof.”

    2. rork says:
      is about the Witwer paper, saying you get the intended consequences – but it’s just correlation between data availability and paper quality.
      The unintended ones are just theoretical so far.

  6. LIz Ditz says:

    Also see Drugmonkey

    The first problem with this new policy is that it suggests that everyone should radically change the way they do science, at great cost of personnel time, to address the legitimate sins of the few. The scope of the problem hasn’t even been proven to be significant and we are ALL supposed to devote a lot more of our precious personnel time to data curation. Need I mention that research funds are tight and that personnel time is the most significant cost?

    And then DM goes on to identify 3 more problems with this new PLoS policy. You should read it.

  7. I understand the potential practical problems with this approach. It is an experiment, and it will be interesting to see how it plays out and how it is enforced. I’m sure if PLOS submissions plummet they will tweak the policy.

    The cranks, however, will always abuse open access. They do it now. Do we want to craft policy to avoid being harassed by cranks, or do we want to craft policy to optimize the transparency and review of science, and then figure out ways to deal with the cranks?

    This is analogous to the internet and open access publishing itself. The cranks now have easy access to published papers and are using that to make mischief. But I wouldn’t go back to the old ways just because of this.

    The issues raised about practicality are separate and will obviously have to be addressed. PLOS will likely have to craft and iterate a more detailed policy to deal with the range of specific situations.

    1. Angora Rabbit says:

      I understand that, Steve, but now it is easier for them to make mischief because they are being handed the opportunity to declare fraud or shout fire, which is a very different response than if someone misquotes my papers on the internet. I have the luxury of ignoring the misquoter. I am not trying to give cranks ideas but this is a very real concern, because the individual and the institution must pay attention, as required by law. There are cranks and then there are cranks who get the attention of the Feds. I’m going to guess that you have not yet had to deal with the latter. Trust me, you don’t want it.

      Now extend this to animal research – my god, the animal activists must be drooling at the possibilities (a nice Pavlovian behavior).

      I’ve always been happy to share my data with legitimate investigators with legitimate interest. This does not give that same courtesy to idjits who can’t tell a hawk from a handsaw, let alone correctly distinguish paired and unpaired t-tests.

      In my opinion, this is a cannonball to swat a mosquito. There are already well-established avenues to deal with legitimate concerns about data integrity. This ain’t it.

  8. Harriet Hall says:

    Hasn’t the real problem been that some authors have not been cooperative with legitimate researchers who might be better qualified to analyze the data or whose own research would benefit from access to it? Might there be another way around that problem without making all that material available to everyone? Probably not. It would be hard to define and impossible to enforce. Maybe the PLOS policy is the best we imperfect humans can do at the moment. As Steve says, it will be interesting to see how it plays out.

  9. WilliamLawrenceUtridge says:

    My first thought was “oh man, someone is going to release a massive privacy breach. At some point, individual patients and their personal medical information are going to be identified as part of this.”

    Oy vay, the prize and perils of Big Data will be another factor. At some point, somebody will dump a massive amount of this into some sort of statistical analysis package and pull out gold, or feces.

  10. The concerns all seem legitimate. I just think we should explore solutions that still allow for easier access to data. Perhaps journals can hold the data and grant access to those with a legitimate interest, as a compromise.

    I think that no matter what, we are going to have to deal with the abuses of greater transparency and access. Look at the debacle with climate change and access to raw temperature data.

    My sense is that we need to figure out solutions that do not involve limiting access to data. That will be a losing battle in the age of computers and information access. That genie, basically, is out of the bottle.

    Limiting access to data just drives more conspiracy mongering and crankery. So does granting access to data. We need to deal with the cranks, not let them determine policy with regard to transparency.

    In terms of intrusive government oversight, the burden of data handling, etc. – those are also legitimate concerns, but don’t seem unsolvable.

  11. Mike says:

    If you didn’t see, here is a PhD who looked back at his studies and questions them:

  12. Crankyepi says:

    I don’t have much new to add except that I agree with Rork, Angora Rabbit and William Lawrence Utridge. We should also consider the possibility that the released datasets will be used to generate new manuscripts from the incompetent/lazy passed off as “original” research. (see RetractionWatch)
    If the authors are honest I can typically spot statistical and design errors from the “printed” article alone. If the authors are not honest then they will provide a dataset which matches their findings. And yes, there are areas of statistical methods where fierce battles are raging among the various “camps.”
    I think I would prefer that instead of a data dump the authors be allowed to expand in great detail so that experiments and studies can be replicated properly. Journals have such strict word count limits that many times you can’t truly explain the study; perhaps the actual nuts-and-bolts details of the methods could be provided in a supplement? And then if someone had a question even after that the authors could address it?

  13. rork says:

    It’s with trepidation that I maybe-contradict some of the heavy people here but I think divulging data (and code) is a great thing. Credentials of the consumer be damned. I’ve had to cough up the array data and clinical variables, along with statistical analysis, for years. It helps ensure you are honest. I could name groups who fail to divulge, and how I think I know why, but that would be suicidal. (For really big data, they will even use extortion: if I want to see the data we have to make them a co-author; or we don’t get to see the data but can ask them for a particular analysis – and this is data that journal guidelines say they were obliged to make public. Maybe they made array data public, but not the clinical variables like time to death – most reviewers don’t catch that.)

    The biologist or statistician holding and analyzing data should have it in a good format in reasonably good order to begin with (biologists may not though), and I’m going to like that I get to better confirm that my own coauthors aren’t cheating before we submit. My colleagues are often very shy about sending me their data, cause I don’t shrug off fudging (if my stats reputation is that of a cheater, it discredits all of my past and future collaborations). I’ve asked for data when suspicious and been dropped from projects several times – and that’s a blessing, I want to get the hell away from such people before I know for sure they are outlaws, because if I know and don’t tell on them, I am also an outlaw (scientific misconduct).

    The array community debated these kinds of things a lot in the “Duke Scandal” era, and it was no doubt covered here. John Ioannidis had a paper out where they try to reproduce the first result in high-profile papers, and usually fail, probably also covered here.

  14. Bryan says:

    I see a huge practicality issue here. In my lab we do a fair bit of superresolution microscopy. The raw data of a single colour channel of a single image is, on average, about 8 gigabytes. Meaning your standard 3-colour image’s raw data is 20-24 gigabytes in size. For any one experiment, for each condition, we try to image a large number of cells – 30 or so in general. That means that for one experiment containing only a negative control and the intervention, you’re talking about 1 terabyte of data. The most recent study we’ve submitted had a total imaging data set (i.e. not counting blots, etc) of over 10 terabytes. It cost us about $1.5K just for the RAID infrastructure to store that data.

    Who is going to archive that for open access? Who is going to pay the cost of transmitting that data? How are you going to distribute it? If I make the data available via my servers, who’s going to pay for the extra bandwidth? Is publishing in PLoS going to obligate me to keep multiple terabytes of data available on-line, at my expense, indefinitely? Alternatively, is PLoS really going to pay to do that on their end? While the idea is laudable in some cases, it is completely impractical in the era of big data.

    1. rork says:

      I think solutions can be found in many cases via public repository of data according to standards set by the community expert in the methods. GEO and ArrayExpress meet this need for divulging “array data”. Protein mass-spec and massive sequencing are other examples, perhaps less nailed down. None of these want the full truly-raw data, but something a bit more compact. The journals quickly require that big data should go there rather than to them.
      If there’s no such standard and public repository for your experiments, the journal might let you write a few sentences in the paper about how whatever you are offering for data is a good compromise between precision and the internet catching on fire.

  15. Andrey Pavlov says:

    I certainly feel a bit out of depth to really weigh in on this topic as I am a pretty novice researcher myself and have never worked on any federally funded research projects.

    However, I happen to be procrastinating working on my first first-authorship and it happens to include a rather large database from which I am culling my data. The raw data is 1134 patients, all prospectively enrolled, and with a lot of parameters such that it originally contained ~25,000 cells of data. To that I added another ~30,000 data points using retrospective chart abstraction to calculate APACHE scores for each patient. In the process of doing this I discovered errors in the database – nothing huge or systematic, just some incomplete data, some data that was duplicated in a manner in which I couldn’t tell which set of data actually belonged to which patient, and some cases where the patient simply somehow ceased to exist in our computer system for me to look up additional information. Between those and deciding to exclude patients presenting with cardiopulmonary arrest, my final n=979.

    Along the way, I had to make judgment calls on precisely which values to include in computing APACHE scores. I certainly wasn’t about to copy *all* the data into some master database and then use the one value I wanted (justifiably so, IMO, of course). This is an example of those “researcher degrees of freedom” that Dr. Novella discusses.

    Along the way, we had meetings, I explained the issues with the database, how I fixed them, and what my rationale was. I had my PI do periodic spot checks to make sure I agreed and we had all the criteria for the chart abstraction laid out a priori.

    So now that I am procrastinating actually authoring the dang thing, what would I submit as my raw data to PLoS if that is where we were looking to publish (we aren’t, so it is purely an academic point for me)? The raw-raw data is sort of a mess and uses a lot of internal shorthand. The cleaned up data looks different and is, in our opinion, justifiably “cleaned up,” but is not raw. And my own chart abstraction data is simply whatever the hell I put in there.

    At any point, if someone wanted to contest my results, they could question my rationale for excluding this or that or argue that dropping irreconcilable patient data skewed the results or would simply be confused as to how raw data became the data I actually used. All of it would be, IMO, explainable and justifiable. And luckily my research topic is not hotly contested by ideologues and trolls (thankfully there is no “anti-EGDT in sepsis” group like the anti-vaxxers). But if there was, and someone did want to expound on the details in such a manner, what would I do? As first author I would be the one to have to field and address these issues, which would take enormous time. If I ignore it (could I?) then what would that say?

    I really don’t know where I stand on it given the comments. I tend to agree with Dr. Novella in terms of transparency and making data more open. And I think it is also inevitable. And I think that holding data “hostage” for ransom is not the funding model we should use in science. But I can also see the very real concerns presented by Angora and others. I can see how if I did have to submit my dataset, it would, at a minimum, mean that I would spend many extra hours making it look really clean and polished and easy to understand. I would do it with integrity – as would, I believe, the vast majority of researchers – but those who would do otherwise will have the perfect excuse to present, as Dr. Gorski says, “bullet proof” data.

    I guess there is just a lot of space in the “unintended consequences” aspect that may just have to be sussed out. At the end of the day, I think the most reasonable approach is a trial run – with the foreknowledge that it is a trial run – and see what happens. Once it is tweaked and improved on a smaller scale it can be expanded. The caveat is that this needs to be done intentionally rather than just saying “Data transparency is paramount, submit all data.” and thinking it is an effective bandaid slapped on the problem and then ignoring it some more until we can no longer ignore it again. In other words, being proactive and seeking laudable goals rather than being reactive when shit goes awry.

    Or I could be totally off in my thinking.

  16. PMoran says:

    What kinds of scientific questions are in the minds of those involved in the above discussions?

    I would have thought that wherever there are important practical medical consequences to the conclusions being drawn there is little excuse for not having primary data immediately available upon request (perhaps not necessarily immediately published).

    If the matter relates to tiny correlations between this anomaly and that influence and, as is being now implied by some, expert statisticians cannot agree on how to decide the matter anyway, there seems little point to publishing massive amounts of data; the strategy would be to rely upon replication (or not).

    Why would not various fields develop their own conventions, based upon reasonable compromises for their own purposes? Rarely is any study the final word on anything.

    I can agree with keeping government committees out of it.

    1. rork says:

      Big clinical trials rarely cough up data – let others correct me if I err – I’ve never been satisfied.
      For the other 99% of papers, more data is good, and I’d expect more honesty to result. Even earthbound statisticians agree that there are way over 50 ways to be vague, suboptimal, or false.
      Massive amounts of data is the perfect example of sharing being good – maybe I don’t care what you think about your data, I have been charged with different questions that your data may shed light on.
      The reasonable compromises sentence is very good.

  17. Coco says:

    As a social scientist who works with qualitative data, this just sounds like a waking nightmare. Journals in my field already have a huge bias towards quantitative projects; this will just reduce the willingness of qualitative researchers to consider PLoS as an alternative to the established journals.

  18. windriven says:

    So far I see many complaints in the comments but little in the way of constructive suggestions. If the problem isn’t being discounted and if this isn’t the solution, then what is?

    1. rork says:

      It’s a project we have to work toward solving. I am not an expert on data in every domain, but people who are have opinions about what is appropriate. It may be hard. For most of the data I deal with (like array data, PCR data, protein assays, other routine stuff like size of tumors vs. time, repeated measures of biomarkers, number of cells vs. time, when and how the mice died or got tumors, how invasive the cells were in an assay) I will be able to comply easily, and I agree the reader deserves no less. It could actually be an opportunity to up the game of every lab, by encouraging best practices in reporting and analysis.

      1. windriven says:

        With all due respect, this isn’t a new issue. If this is “a project we have to work toward solving” I would expect a bit more constructive input, even if no less bitchin’.
        I’m sensitive to the many concerns raised above. But those concerns amount to ‘don’t gore my ox’ rather than ‘(the PLoS initiative) was one idea, here’s a better one …’

  19. Frederick says:

    Good read. It reminds me of the article Dr. Novella posted on the JREF website:
    Also a good article.

    The problem with openness and too much transparency is the exact same problem that the internet has created. As good a tool as the internet can be, it has helped fearmongers, conspiracy theories, rumours of all sorts, and fraud to persist. And as Dr. Gorski points out, anti-vax, pro-CAM, and all sorts of CAM advocates will twist that to their advantage; they already “own” the media and manipulate them as they want.
    On this subject, Gerald Bronner, a French sociologist, released a good book in 2012, “La démocratie des crédules”; I don’t think it has been translated into English. He explores how the internet and the media help spread rumours, lies, and fraudulent facts, and how the mainstream media, which have difficulty competing with the internet, do not filter and verify information as much as they used to.
    Anyway, even if PLOS is a different medium, the increasing mass of information on the internet (true, false, and every other kind) is a good tool, and good for democracy, but it also creates big problems.
    I think with all the trolls that come here, we can all see it with our own eyes.

  20. Angora Rabbit says:

    Correction to an earlier post: just learned my campus is already convening an expert group to figure out how to meet apparently-forthcoming Federal mandates to deposit data. So that train is already on the track.

    I don’t know what the solution is. If I did know, they’d put me on the damn committee, which is no reward.

    I personally think the solution is collaboration and replication, as it has been for many decades. Moreover, collaboration and replication should be rewarded, not discouraged. I am happy to share, for example, our lab’s terabytes of RNA Seq data, provided we’re given a co-authorship (could be buried) because we’re the ones who hustled to generate the funds and work effort to generate the data. (That’s an answer for Rork.) And for people who don’t share? Well, in the old days they used to get ostracized and word got out.

    1. rork says:

      Rabbit – we hustled to get the tumors and create the array data too, but we’ve had to cough it up upon first publication since about 2006, and voluntarily before that. To have it be otherwise means that the people who do not share their data are rewarded. It also improves honesty – the people not sharing are the same ones with the most dubious claims in their papers, but they hide behind “it was so much work and expense to get that data”, as if that weren’t the case for all the other groups. (They use our data early and often, but I can’t get theirs.) I refuse to give in to inappropriate extortion since I don’t want to reward outlaws for their misdeeds, thus encouraging others to be outlaws. I think their actions are bad for science and the patients. That can be disputed I admit, but I’ll bet you NIH doesn’t dispute it – many grants given are more to buy the data (and code) generated for the broader community, than to fund a few particular labs’ investigation of that data. There’s been no public backlash in the array community, just isolated outlaws.

    2. Sawyer says:

      “Well, in the old days they used to get ostracized and word got out.”

      I find it odd that scientists don’t make it known to the public that this wonderful little tool is already in place and that it works pretty damn well. Every field of science seems to have that one jackass that people whisper about – “Don’t work with Professor Plum, he’s really sloppy and takes all the credit for your discoveries. You’ll manage to get one good paper out of the collaboration and then his data will turn out to be complete crap.”

      I suppose scientists don’t like to reveal the ugly truth because it tarnishes the image that academia is a perfectly objective beacon of light and hope, but I don’t think anyone buys into this illusion anyway. Just tell people that cheaters are already punished by their peers and that no one saves them donuts at morning meetings.

  21. Self Skeptic says:

    Dr. Novella,
    Good post.

    Readers, see also:
    by Dr. Novella.

    That post starts with a link to this article, in The Economist:

    The last paragraph of the Economist article:

    “And scientists themselves, Dr Alberts insisted, “need to develop a value system where simply moving on from one’s mistakes without publicly acknowledging them severely damages, rather than protects, a scientific reputation.” This will not be easy. But if science is to stay on its tracks, and be worthy of the trust so widely invested in it, it may be necessary.”

    1. Self Skeptic says:

      Note: acknowledgment is due to Frederick above, who already linked to the randi dot org post by Dr. Novella. Thanks – that’s how I got there.

  22. DE Sheridan says:

    Here’s the catch to PLOS.
    PLOS charges a publication fee to the authors, institutions or funders for each article published.
    Publication fees vary by journal and are payable for articles upon acceptance.

    PLOS Biology $2,900 USD
    PLOS Medicine $2,900 USD
    PLOS Computational Biology $2,250 USD
    PLOS Genetics $2,250 USD
    PLOS Pathogens $2,250 USD
    PLOS Neglected Tropical Diseases $2,250 USD
    PLOS ONE $1,350 USD

    I commend PLOS for coming up with a different method of publication – to allow for free public access to scientific information.

    But I see this type of paywall – at the researcher level – as a further invitation to skewed data. This is already a problem in the main scientific media, where the likelihood that readers will pay to read an article is used as a tool to select which articles get published.

    Are researchers going to pay to publish research to check someone else’s findings? Are researchers going to pay to publish research where their hypothesis is proven incorrect? Doesn’t this mean that the researchers and institutes or industries with the most money will likely determine what is published?

    I don’t think the PLOS method of publication is the answer.
    I think this scheme will add to the skewed data that is out there.

    A better scheme would involve universities publishing all of their research data, without charge, online in their own digital journals.

    Businesses and industry should be required to pay universities a fee for their research data to be published in a university’s journal (and industry-funded research should only be allowed to be published in one journal – to limit this as a potential advertising method). Their data (and all data) should be subject to scientific peer review that scrutinizes method, analyses and conclusions.

    Disclosure of the funding source(s) for all research should be obligatory at the end of the paper.

    If scientific information was made publicly available free of charge like this – under the umbrella of educational institutions – this would have multiple advantages:

    1. Scientists and the public would have free access to a wide range of peer reviewed data. This is likely to encourage further research and advances in knowledge.

    2. This would reduce the perverse selection methods by publishers – and the skewed effect arising from these choices.

    3. Just as Paid Propagandist News is anathema, so are Paid Propagandist Research papers. Using this method would also likely increase scrutiny of research made by businesses and industries and published in their interests.
