What is reproducibility, anyway?

One of the first things many people learn in science class is that reproducibility and replicability are two central tenets of science and of the validity of the scientific method. Replicability is being able to take someone's experimental design, run their experiment again from scratch, and get the same results. Reproducibility, on the other hand, is just being able to take someone's final results from their experiment, re-create their analysis, and end up with the same answer (Begley & Ioannidis, 2015).

For practical reasons, we often talk about reproducibility rather than replicability. Researchers do repeat each other's experimental procedures to check that they are replicable, but re-doing a researcher's experiment purely to confirm that you can get the result again is often cost-prohibitive. There are also many reasons you might get different results at a different time or in a different place, especially in biological research (Begley & Ioannidis, 2015). What's important is that the methods are thorough enough for someone else to use them, and that a general inspection of the data turns up no evidence of fabrication.

So then reproducibility should be easy, right? At the very least, it should be a lot easier than replicating the whole experiment. And it's true that replicating the experiment on top of doing the bioinformatic analyses would cost us far more time, money, and inconvenience. However, in computational work, much of the workflow actually happens after all of the physical samples have been processed. And because of things like software versioning, system compatibility, and the plain difficulty of explaining code, especially in its early stages, reproducibility in computational fields can be hard.
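
To make the software-versioning problem a bit more concrete, here is a minimal sketch of recording the environment an analysis ran in, so that someone else can compare it against their own. It is only an illustration, not part of any particular paper's workflow: the function name `describe_environment` is hypothetical, and the package names `numpy` and `pandas` are placeholders for whatever your analysis actually depends on.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict describing the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "system": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names; list the ones your own analysis imports.
    print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

Saving this output next to your results gives a reader a starting point when their numbers don't match yours. It doesn't fix version or compatibility differences, but it does make them visible.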

This tutorial is a step-by-step tour of reproducing a bioinformatics paper. If you like what you see, you can do the whole thing, end-to-end, for the Reproducibility Challenge!

Begley, C. G., & Ioannidis, J. P. A. (2015). Reproducibility in science: improving the standard for basic and preclinical research. Circulation Research, 116(1), 116–126.