{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Breaking Down the Process\n",
"\n",
"Here, we'll make sense of the bioinformatics process used by the authors, step by step, and we'll provide one example of a workflow that you can use to do the initial \"bioinformatic analysis\" that the authors are talking about yourself. To do this, we'll use the `bash`, which is a way to talk to the machine when you're using an operating system that's driven by some form of _Unix_. This includes Mac OSX, if you're using the Terminal on Mac, or the command line interface on some kind of Linux system. `bash` isn't so much of a programming language as it is a way to talk to the computer. It's almost an `interpreter`, or a translator of commands, but it also supports the types of patterns we expect in code, including _loops_, _if-statements_, and _variable assignment_. "
]
},
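{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick preview, here is a minimal sketch that uses all three of those patterns at once. The variable names (`animal`, `curr`) and the words in the loop are made up for illustration; variables and loops are explained further down the page.\n",
"\n",
"```shell\n",
"# Variable assignment (note: no spaces around the equals sign)\n",
"animal=\"frog\"\n",
"\n",
"# Loop over three words\n",
"for curr in dog cat frog\n",
"do\n",
"    # If-statement: is the current word the same as our variable?\n",
"    if [ \"$curr\" = \"$animal\" ]\n",
"    then\n",
"        echo \"found the $animal\"\n",
"    else\n",
"        echo \"$curr is not a $animal\"\n",
"    fi\n",
"done\n",
"```"
]
},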
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Don't worry, we won't do anything too crazy in `bash`. On this page, you'll find everything you need to know to be able to use `bash` well enough to create the script that we did for analyzing the authors' data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Starting a `bash` script: `bash` headers\n",
"\n",
"Especially if you have any experience coding in other languages, the `bash` header can look pretty confusing. Below is what our header looks like for this tutorial and something that you might expect to see or use when you're building `bash` code more generally. \n",
"\n",
"```shell\n",
"#!/bin/bash\n",
"#SBATCH --partition=scavenger\n",
"#SBATCH --qos=unlim\n",
"#SBATCH --time=10000\n",
"#SBATCH -n 8\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first line, \n",
"\n",
"```shell\n",
"#!/bin/bash\n",
"```\n",
"\n",
"signals that we're using `bash` to run this file. If we were running `Perl` or another language, we could also put that path here. `/bin/bash` is the path to the _bash application_ , which most of the time we assume is installed at that default path (but if, for some reason, you only had `bash` installed somewhere else, you would change that. \n",
"\n",
"The `#!` character combination has an awesome name, the **_Shebang_**! It's just the script's way of saying \"_Run me using XX application_\", in this case `bash`. \n"
]
},
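{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you're not sure where `bash` lives on your machine, a quick check like the one below can help. This is only a sketch and not part of the authors' workflow; `bashPath` is a name we made up, and `command -v` is a standard shell built-in that prints the path of the program a name refers to.\n",
"\n",
"```shell\n",
"# Look up the path of the bash program that would be run\n",
"bashPath=$(command -v bash)\n",
"\n",
"# Show it, so it can be compared with the path after the #!\n",
"echo \"bash is installed at: $bashPath\"\n",
"```\n",
"\n",
"If this prints something other than `/bin/bash`, that path can be used after the `#!` instead."
]
},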
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The rest of the lines in the header relate to the scheduler. In this example, we're using `SLURM`, a very popular scheduler, or job submission manager, used on _high-performance computing_ systems. \n",
"\n",
"The first line tells the scheduler what _partition_ we want the run to happen on, which is basically just one of a series of sub-computers available on the big computing system. The second line tells the scheduler about our permissions, the third tells the scheduler the maximum amount of time we want this job to be able to run (it will stop early if the job finishes) and the last gives the number of _threads_ we want to be able to use, for tasks that we can run in _parallel_, or multiple tasks at the same time."
]
},
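{
"cell_type": "markdown",
"metadata": {},
"source": [
"To tie each explanation to its line, here is the same header again with a comment above each directive. Treat it as a sketch: the partition and QOS names are the ones from the example above, and the last two lines are our own addition. They read `SLURM_NTASKS`, an environment variable that SLURM sets inside a job when a task count is requested, into a made-up variable called `tasks`.\n",
"\n",
"```shell\n",
"#!/bin/bash\n",
"# Partition (group of nodes) to run on\n",
"#SBATCH --partition=scavenger\n",
"# Quality of service for the job\n",
"#SBATCH --qos=unlim\n",
"# Time limit; a bare number is read as minutes\n",
"#SBATCH --time=10000\n",
"# Number of tasks (the long form is --ntasks=8)\n",
"#SBATCH -n 8\n",
"\n",
"# Inside a SLURM job this holds the task count; otherwise use a fallback message\n",
"tasks=\"${SLURM_NTASKS:-unknown (not running under SLURM)}\"\n",
"echo \"Tasks requested: $tasks\"\n",
"```\n",
"\n",
"You would typically hand a script like this to the scheduler with `sbatch`, followed by the script's file name."
]
},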
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Defining Variables\n",
"\n",
"`bash` is a little particular about how certain variable types are defined. \n",
"\n",
"* Never leave spaces between the variable name, the equals sign, and what you're assigning the variable as\n",
"* Variables never have to be typed (think Java, or don't worry about it if you have no idea what this is talking about)\n",
"* Lists are defined as space-separated strings\n",
"\n",
"Here's an example list: \n",
"\n",
"```shell\n",
"myList=\"dog cat frog\"\n",
"```"
]
},
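{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a short sketch of those rules in action. The names `sampleName` and `threads` are hypothetical, chosen only for illustration.\n",
"\n",
"```shell\n",
"# Correct: no spaces around the equals sign, and no type declarations\n",
"sampleName=\"sample_01\"\n",
"threads=8\n",
"myList=\"dog cat frog\"\n",
"\n",
"# Put a dollar sign in front of a name to read its value back\n",
"echo \"$sampleName\"\n",
"echo \"$threads\"\n",
"echo \"$myList\"\n",
"\n",
"# Incorrect (left commented out): with spaces, bash would try to run\n",
"# a command called sampleName instead of assigning a variable\n",
"# sampleName = \"sample_01\"\n",
"```"
]
},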
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loops in `bash`\n",
"\n",
"Loops in `bash` have three major lines that they must have: `for`, `do`, and `done` (yay!)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```shell\n",
"for curr in $myList\n",
"do\n",
" echo \"$curr\"\n",
"done\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You'll notice we put a dollar sign before \"myList\". This is how we refer to a variable in `bash`. We also have to use quotations to call our variable in the loop, because this differentiates the variable name from the commands, which can be easily called from `bash`, unlike in programming languages like Python, where you probably have to call a command from a package, as in `os.system` (Python; again, don't worry about it if you don't know what this means! You'll learn!)."
]
},
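{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the quotation marks do, compare the two loops in this sketch. The counter variables (`unquotedCount`, `quotedCount`) are our own additions for illustration, and the example assumes `bash`'s default word-splitting settings.\n",
"\n",
"```shell\n",
"myList=\"dog cat frog\"\n",
"\n",
"# Unquoted: bash splits the value at the spaces, so the loop runs three times\n",
"unquotedCount=0\n",
"for curr in $myList\n",
"do\n",
"    echo \"item: $curr\"\n",
"    unquotedCount=$((unquotedCount + 1))\n",
"done\n",
"\n",
"# Quoted: the whole string stays together as one item, so the loop runs once\n",
"quotedCount=0\n",
"for curr in \"$myList\"\n",
"do\n",
"    echo \"item: $curr\"\n",
"    quotedCount=$((quotedCount + 1))\n",
"done\n",
"\n",
"echo \"unquoted: $unquotedCount, quoted: $quotedCount\"\n",
"```\n",
"\n",
"The last line should report three passes for the unquoted loop and one for the quoted loop."
]
},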
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}