Sunday 8 December 2013

Pre-print servers: dangerous ways to lose your ideas, or a way to accelerate science and your career?

In this post Dr Jacob Scott, a third-year D.Phil. student at the WCMB, discusses the use of pre-print servers in biology. You can find Jake's own research blog here.

Dr Jacob Scott, WCMB

My name is Jake Scott and I am a third-year D.Phil. student in the WCMB. I'm a whole lot more B than I am M, or, as we say, I am a mathematical Biologist rather than a Mathematical biologist. Really, that's not fair either: I'm not a biologist, I'm a medic, a member of the part of the scientific community least likely to share its data and knowledge until they are safely published...

However, I'm writing today to support the use of pre-print servers (something I've blogged about before): open access repositories for our work BEFORE it is officially published. The mathematics and physics communities have known about and used pre-print servers for a long time, in the form of the arXiv, as a way to speed up the pace of science. It was the high-energy physicists who started the whole thing, because they found it frustrating that it could take a year or more before their results were available to their colleagues, a lag that leads to duplicated work and wasted effort. The practice has since spread into other fields, such as computer science, quantitative finance and, more recently, quantitative biology, but you'll notice the common theme: it's all quantitative. So, how do these quantitative theories and hypotheses get communicated to the experimentalists? In short: they don't.

The open access movement, starting with the PLoS family of journals, has begun to chip away at the problem of access in general, but these journals do nothing to deliver the speed-up that pre-print servers allow; they still have the lengthy peer-review process. There have been a number of attempts to get biologists involved, most notably by Nature with "Nature Precedings", which offered much the same service as the arXiv: a place to post research that had not been peer reviewed. This venture failed, however, because a significant proportion of the papers posted to the server early on were thinly veiled pseudo-science: creationism and the like. There are a number of other options, including F1000, PeerJ and Cancer Commons, but they haven't taken off the way the arXiv did either. (These are nicely reviewed in PLoS Biology, here.) I think part of the reason is that they are all journals with a pre-print repository on the side; pre-prints aren't their primary focus.

To address this, Cold Spring Harbor Laboratory, a highly regarded biology lab in New York, has started a dedicated biological pre-print server called the bioRxiv. They have adopted a system designed to weed out pseudo-science: "affiliates" quickly screen pre-prints but, importantly, DON'T REVIEW THEM. Articles are given DOIs and are date-stamped. You can choose your level of copyright according to your own comfort (as on the arXiv). Once the pre-print is up, it can be cited, shared with potential collaborators and found through Google Scholar. They have also implemented a number of altmetrics, as well as the ability to share your articles on the social networking platform of your choice. Further, scientific precedence is established as soon as the pre-print is uploaded.

For me, it is a no-brainer. When I am ready to submit a manuscript to a journal, I post the pre-print and submit in the same breath. After each round of revisions from reviewers, I upload a new version to the pre-print server. You aren't allowed to upload the journal's typeset version of the manuscript, but beyond that, the versions of my papers you'll find on the arXiv (or now the bioRxiv) will be the same as the published ones, circumventing the paywall for those who can't afford it and making the research available that much sooner. I personally don't understand ANY of the excuses I hear for not posting; I think they are mostly based on habit and unfounded fears. I ran a survey about this, and you can find the results here.

I think adopting this practice is especially beneficial for young scientists. Obviously, before you post anything, you have to obtain permission from all authors, so your PI's decision still takes precedence, but using these servers can get your name and your work out to the community months or even years before they would reach it through the normal channels. Further, you can get valuable feedback from potential reviewers and others in your field, especially if you use one of the pre-print discussion forums, such as Haldane's Sieve for population and evolutionary genetics or Warburg's Lens for mathematical oncology.

In summary, pre-print servers speed up the pace of science, get your work out to everyone free of charge, date-stamp your work to establish precedence, and are just generally good practice. If you work in a purely quantitative field, the arXiv is probably your best bet, but if your theory has any biological underpinnings and you want experimentalists to have a chance to see it, I suggest giving the new bioRxiv a try.


  1. Interesting stuff. Personally, I still have to make up my mind about the preprint submission idea. I wonder whether it will have an impact on citation dynamics. Take, for example, a manuscript that will turn into a highly regarded, ground-breaking report. People will start citing it very soon, probably before the peer-reviewed version comes out in a 'regular' journal. Once the preprint has collected citations, many people will keep citing it instead of the final paper. This is not necessarily bad, but I wonder if it's going to lead to a bit of chaos.

    1. My view is that citation counters are, or will become, good at including citations for the preprint when counting article citations. Plus, the preprint can easily be updated to the peer-reviewed version later on, and this frequently happens.

    2. I agree with Linus: Google Scholar already picks up citations to preprints, and you can already merge the preprint with the standard publication yourself so that the citations sum up. I'm sure it won't be long before Google Scholar does this by itself, and then Scopus and PubMed will have to catch up!

    3. I see the ability to post new revisions to preprint servers as the ONLY sensible approach to split citations. As a programmer, I'm baffled by how the publishing process fixates on the *event* of publishing (a single version frozen in time) and ignores the paper as an evolving artifact.

      If the paper is corrected, expanded or published in another venue, it's left to others to discover all the versions across the internet and guess which is "best" to read, or cite. There is no single "home" that covers the whole life of a paper.
      Preprint servers allow citing a specific revision but make older/newer revisions easily accessible, which is the closest we now have to such a "home".

      (In many cases the authors never update a paper, but if tens of people have read it, surely they had useful comments! Accumulating comments is another way for a paper to evolve; it's a shame that commenting is under-supported, and under-used where it is supported...)
