What kind of a bioinformatics project is this?

So far, we've heard about zebrafish larvae, but where does the actual bioinformatics come in?

This paper is a transcriptomics study: the authors used RNA-seq to measure gene expression. Even if you don't do transcriptomics, the methods in this analysis can be relevant to your own research. Ultimately, the workflow is about how to chain software tools together and process many sequences in batch.

The authors explain their workflow under "Bioinformatic analysis of RNA-seq data" in the paper (Long et al., 2013), and that is the series of steps we will later reproduce using bash scripting. This is a form of pipelining: sending your data through a series of steps that process, manage, and analyze it. In this project, the steps go from raw reads to merged reads, and then use a merged reference file to perform differential expression analysis on each of the original files. Differential expression analysis is the bread and butter of transcriptomics (and really of most sequencing studies). The basic idea is to compare how many transcripts of a given gene the organisms produce under different conditions.
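To make the idea of pipelining concrete, here is a minimal Python sketch of sending several samples through the same ordered series of steps. The step functions (`clean_reads`, `map_reads`, `count_reads`) and the sample names are hypothetical placeholders invented for illustration; they are not the authors' tools and they only rename a label rather than touch real sequence data.

```python
from typing import Callable, Dict, List

# Hypothetical placeholder steps: each takes a label for its input and
# returns a label for its output. Real steps would call external tools.
def clean_reads(name: str) -> str:
    return name + ".cleaned"

def map_reads(name: str) -> str:
    return name + ".mapped"

def count_reads(name: str) -> str:
    return name + ".counts"

def run_pipeline(samples: List[str], steps: List[Callable[[str], str]]) -> Dict[str, str]:
    """Send every sample through the same steps, in the same order."""
    results = {}
    for sample in samples:
        current = sample
        for step in steps:
            current = step(current)  # output of one step feeds the next
        results[sample] = current
    return results

# Invented sample names, processed in batch.
samples = ["control_1", "cold_1"]
results = run_pipeline(samples, [clean_reads, map_reads, count_reads])
for sample, final in results.items():
    print(sample, "->", final)
```

The point of the design is that the list of steps is written once and applied to every sample, which is the same thing a bash loop over input files accomplishes.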

If, for example, the zebrafish is under cold stress, we might expect genes involved in metabolism to be downregulated as the organism slows its growth in response to adverse conditions: a last-ditch effort to survive. So if we run a differential expression analysis comparing cold-stressed organisms to controls, we might pick out genes responsible for metabolism, even without knowing where in the genome or the transcriptome these genes are, or even what they do. Genes that show substantially higher or lower expression are the interesting ones that warrant further investigation.
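The comparison described above can be sketched with a log2 fold change, a common way to express how much a gene's expression differs between two conditions. This is a simplified illustration, not the authors' method: the gene names and counts are invented, the pseudocount of 1 and the threshold of 1.0 are assumptions, and a real analysis would also normalize for sequencing depth and apply a statistical test.

```python
import math

def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of treated to control counts; the pseudocount avoids dividing by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Invented read counts for three hypothetical genes.
counts = {
    "gene_a": {"control": 99, "cold": 24},    # lower under cold stress
    "gene_b": {"control": 49, "cold": 199},   # higher under cold stress
    "gene_c": {"control": 100, "cold": 100},  # unchanged
}

threshold = 1.0  # assumed cutoff: at least a two-fold change in either direction

fold_changes = {
    gene: log2_fold_change(c["cold"], c["control"]) for gene, c in counts.items()
}
changed = {gene: lfc for gene, lfc in fold_changes.items() if abs(lfc) >= threshold}

for gene, lfc in changed.items():
    direction = "up" if lfc > 0 else "down"
    print(f"{gene}: log2 fold change {lfc:.2f} ({direction}regulated)")
```

Note that nothing here requires knowing what a gene does or where it sits in the genome; the counts alone are enough to flag it for follow-up.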

The authors deposited their sequences in the Sequence Read Archive (SRA), managed by the National Center for Biotechnology Information (NCBI). This lets us go back to the beginning and retrace their steps. They have also published a number of output files, including p-values from the statistical tests they ran. All of these data are in the data folder of the GitHub repository, both as the original Excel files provided by the authors and as the transformed CSV files that we used in our reproduction.
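As a rough sketch of how a CSV file of p-values might be used, the snippet below keeps only the genes that fall under a significance cutoff. The column names (`gene_id`, `p_value`), the values, and the 0.05 cutoff are assumptions made for illustration; the actual files in the repository may be laid out differently, so an in-memory string stands in for a real file here.

```python
import csv
import io

# Invented stand-in for a CSV file; the real column names may differ.
csv_text = """gene_id,p_value
gene_a,0.001
gene_b,0.20
gene_c,0.03
"""

def significant_genes(handle, alpha: float = 0.05) -> list:
    """Return gene IDs whose p-value is below alpha, in file order."""
    reader = csv.DictReader(handle)
    return [row["gene_id"] for row in reader if float(row["p_value"]) < alpha]

hits = significant_genes(io.StringIO(csv_text))
print(hits)
```

To run this against a file on disk, the same function can be given a file opened with `open(path, newline="")` in place of the `io.StringIO` object.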

Long, Y., Song, G., Yan, J., He, X., Li, Q., & Cui, Z. (2013). Transcriptomic characterization of cold acclimation in larval zebrafish. BMC Genomics, 14(1), 612.