Programming

The programming languages that we most commonly use for computational biology research are Bash, R, and Python. We build pipelines for processing raw data in Bash, together with a workflow language such as WDL. We generate plots, perform statistical analyses, and analyze preprocessed genomic data using R. We develop novel statistical models in R, often in combination with Stan. We use Python for text processing and data formatting, as well as for constructing deep learning models. We sometimes re-implement parts of a program in C++ to improve its speed and memory efficiency; we also use C++ and Rust to write programs for problems in which computational efficiency is key.

General

To ensure that our work is reproducible and our code is of high quality, we follow common software development best practices, including unit testing and version control.
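A unit test checks that a small, isolated piece of code returns the expected result for known inputs. The sketch below is a minimal illustration using Python's built-in `unittest` module. It is not taken from any lab codebase: the `gc_content` function and the test cases are hypothetical examples.

```python
import unittest


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (hypothetical helper)."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()  # treat lowercase (e.g. soft-masked) bases the same as uppercase
    return (seq.count("G") + seq.count("C")) / len(seq)


class TestGcContent(unittest.TestCase):
    def test_all_gc(self):
        # A sequence made only of G and C has GC content 1.0
        self.assertEqual(gc_content("GCGC"), 1.0)

    def test_no_gc(self):
        # A sequence with no G or C has GC content 0.0
        self.assertEqual(gc_content("ATAT"), 0.0)

    def test_mixed_case(self):
        # Lowercase input is handled the same way as uppercase
        self.assertAlmostEqual(gc_content("atgc"), 0.5)

    def test_empty_raises(self):
        # An empty sequence is rejected rather than causing a division by zero
        with self.assertRaises(ValueError):
            gc_content("")
```

If this code is saved as, say, `test_gc.py` (a hypothetical file name), `python -m unittest test_gc` discovers and runs every method whose name starts with `test`. Tests like these can be rerun after every change and are usually committed to version control alongside the code they check.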

Software Carpentry offers many high-quality lessons for beginners and self-learners on the Unix shell (Bash), Python, and R. After working through these lessons, you can use the resources below for deeper dives.

Bash

Quick Bash

R

Python

Other languages

The niche languages below can be very useful for specific applications.

Rust

Rust Programming Language

Julia

Introduction to Julia

Exercises

R
Python
C++
Bioinformatic pipelines: exome-seq and RNA-seq
Statistical modelling
