Why should biologists use GitHub?


Today, almost all biologists use computers daily to produce and analyse data, and many of us now have to learn at least some programming.

I am not a computer scientist or engineer, and I am far from being a bioinformatician (yet). Still, I have started to discipline myself to make my published research as reproducible as possible. For me, that means not only depositing DNA sequence data in NCBI, but also making my analysis pipelines and command lines publicly available on GitHub.

GitHub is a great hosting service for publishing source code in public repositories, but it can also be used to document your analysis pipeline and code, and even to write tutorials on software or pipelines for others.

For example, the Trinity tutorial by Brian Haas was a lifesaver for a biologist like me who had never touched next-generation sequencing data.

To support my recent publication on scale insect phylogenetics, I created a GitHub repository that details every step and command line used in the MrBayes and R analyses. This gives readers full transparency and makes the results quicker to reproduce.

Tip: If you are worried about having your analysis pipeline or data online while the manuscript is under review, academic researchers can apply for five free private repositories. For more information, check here.

Nowadays, many biologists end up working in multidisciplinary environments, which means learning new skills. In bioinformatics in particular, workshops are available, but the internet is also a great resource for teaching ourselves. GitHub helps in two ways: it shows how a piece of software works in practice, and it makes the details of computational methods available to other biologists who are learning the same tools (from the command lines for de novo assembly with Trinity to making a simple plot in R).