Training course: Tools for reproducible research
One of the key principles of proper scientific procedure is the ability to repeat an experiment or analysis and reach similar conclusions. Published research based on computational analysis, e.g. in bioinformatics or computational biology, has often suffered from incomplete method descriptions (e.g. no list of the software versions used); unavailable raw data; and incomplete, undocumented and/or unavailable code. This essentially prevents any possibility of reproducing the results of such studies. The term “reproducible research” has been used to describe the idea that a scientific publication should be distributed along with all the raw data and metadata used in the study, all the code and/or computational notebooks needed to produce the results from the raw data, and the computational environment or a complete description thereof.
Reproducible research not only reflects proper scientific conduct but also enables other researchers to build upon previous work. Most importantly, anyone who organizes their work with reproducibility in mind will quickly notice the immediate personal benefit: an organized and structured way of working. The person who most often has to reproduce your analysis is your future self!
In this course you will learn how to make your data analyses reproducible.
In particular, you will learn:
- Good practices for data analysis and management
- How to use the version control system Git to track changes and collaborate on code
- How to use the package and environment manager Conda
- How to use the workflow manager Snakemake
- How to use R Markdown to generate automated reports
- How to use Jupyter notebooks to document your ongoing analysis
- How to use Docker and Singularity to distribute containerized computational environments
NBIS Reproducible research course from ELIXIR Sweden