Bioinformaticians use a lot of jargon when they write up their methods, and for relatively good reason: soon you'll be a Unix acrobat and a coding master, and some of those terms will save you a lot of time. At the same time, bioinformatics is a unique field within biology because almost the whole paper is a series of methods done inside the computer. Often, as in our sample paper, there is not a lot of experimental information presented, because there wasn't much to it. We can get a ton of information out of just one sample sent for sequencing, so in this case, the authors cold shock their larvae and call it a day. The bioinformatics part of the methods, though, is a whole other story.
This is everything that the authors say about the bioinformatics bit of their methods:
The raw reads were trimmed and filtered using PRINSEQ (version 0.19.3). Low quality (Q < 20) and ambiguous bases (N) were first trimmed from both ends of the reads and the trimmed reads were filtered with Phred quality score (Q ≥ 20 for all bases) and read length (≥ 25 bp). Paired reads were extracted using cmpfastq (http://compbio.brc.iop.kcl.ac.uk/software/cmpfastq.php). Read mapping, transcript assembly and differential expression analysis were performed according to the protocols described previously. Briefly, the preprocessed reads were mapped to the genome sequence of zebrafish (Zv9.68) using TopHat (version 2.0.4) with default parameters except “--segment-mismatches 1” and “--segment-length 18”. The aligned reads were assembled into transcripts using Cufflinks (version 2.0.2) with the following parameters “--frag-bias-correct, --multi-read-correct, --library-type fr-unstranded, --upper-quartile-norm, --total-hits-norm”. The assembled transcripts were merged with the reference annotation (Danio_rerio.Zv9.68.gtf, downloaded from Ensembl) using cuffmerge. Differential expression analysis was performed using cuffdiff with the parameters “--upper-quartile-norm” and “--total-hits-norm”; the merged assembly and the fragment alignments generated by TopHat were used as input files. Calculation of mapping statistics, sorting and indexing of the read alignment files were performed using SAMtools (version 0.1.18). The mapping and assembling results were viewed via the IGVtools (version 2.1). (Long et al., 2013)
Unfortunately for us, that's a whole lot of steps. Don't worry, in the next part of the tutorial, we'll go through each piece, one by one. After that, we'll compare what we got to the files that they provide in their supplementary data, and try our hand at making some of the visualizations that they include in their paper.
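To make the very first step a little less abstract before we get there, here is a minimal Python sketch of the trimming and filtering rule the authors describe: trim low-quality (Q < 20) and ambiguous (N) bases from both ends, then keep only reads where every base has Q ≥ 20 and the length is at least 25 bp. This is not PRINSEQ and not the authors' code, just an illustration of the rule as worded in the methods. The function names are made up for this example, and the Phred+33 quality encoding is an assumption, since the paper does not say which encoding their reads used.

```python
MIN_QUALITY = 20   # Phred threshold quoted in the methods (Q >= 20)
MIN_LENGTH = 25    # minimum read length quoted in the methods (>= 25 bp)
PHRED_OFFSET = 33  # ASSUMPTION: Phred+33 encoding; not stated in the paper


def phred_scores(quality_string):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(char) - PHRED_OFFSET for char in quality_string]


def trim_read(sequence, quality_string):
    """Trim low-quality (Q < 20) and ambiguous (N) bases from both ends."""
    scores = phred_scores(quality_string)

    def is_bad(i):
        return scores[i] < MIN_QUALITY or sequence[i].upper() == "N"

    start, end = 0, len(sequence)
    while start < end and is_bad(start):
        start += 1
    while end > start and is_bad(end - 1):
        end -= 1
    return sequence[start:end], quality_string[start:end]


def passes_filter(sequence, quality_string):
    """Keep a trimmed read only if all bases have Q >= 20 and length >= 25."""
    scores = phred_scores(quality_string)
    long_enough = len(sequence) >= MIN_LENGTH
    return long_enough and all(score >= MIN_QUALITY for score in scores)


if __name__ == "__main__":
    # A made-up read: one N at the start, one low-quality base at the end.
    seq = "N" + "ACGTA" * 6 + "G"
    qual = "#" + "I" * 30 + "#"   # '#' is Q2 and 'I' is Q40 under Phred+33
    trimmed_seq, trimmed_qual = trim_read(seq, qual)
    print(len(trimmed_seq), passes_filter(trimmed_seq, trimmed_qual))
```

In this made-up read, trimming removes the leading N and the low-quality final base, and the 30 bases that remain pass both filters. A read with a single low-quality base in the middle would survive trimming but fail the "all bases" filter, which is why the two steps are kept separate here.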
- Long, Y., Song, G., Yan, J., He, X., Li, Q., & Cui, Z. (2013). Transcriptomic characterization of cold acclimation in larval zebrafish. BMC Genomics, 14(1), 612.