Reproducible Research is Research Software Engineering
David Mawdsley, Robert Haines and Caroline Jay
RSE 2017 Conference
Reproducible research
Publishing reproducible research
- How it changes the publication model
- Why this is a good thing for research
- Why this is a good thing for research software engineers
Self-contained reproducible research papers
- We can write our paper in LaTeX or Markdown, including R code as required
- Everything is in R; the only external dependency is the data
- In principle (pretty much) a solved problem
- Some friction points:
  - Software versions (addressed with Docker or Packrat)
  - Formatting, e.g. tables
  - Collaboration, e.g. working with Overleaf
What if you can’t do everything in R?
- Complex dependencies
- Time-consuming analyses
- Long pipelines
Our approach
- Make modular by containerising each step using Docker
- Reusable, reproducible
- The final module makes the paper
- Join the outputs of the containers with a Makefile, or with a workflow management tool
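The approach above can be sketched in a few lines of Python. This is a minimal illustration only, not the Makefile the authors used: the step names, the image names (`example/tracking` and so on) and the `/data` mount point are all hypothetical. It shows the two ideas on the slide: each step is a container, and something outside the containers decides the order in which they run.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Step:
    """One containerised pipeline stage (all names here are hypothetical)."""
    name: str
    image: str                                  # Docker image implementing the step
    needs: list = field(default_factory=list)   # names of upstream steps


def docker_command(step, workdir):
    """Build the `docker run` call for a step; workdir is mounted at /data."""
    return ["docker", "run", "--rm", "-v", f"{workdir}:/data", step.image]


def run_order(steps):
    """Return step names so that every step comes after the steps it needs."""
    graph = {s.name: set(s.needs) for s in steps}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-step pipeline; the final module makes the paper.
PIPELINE = [
    Step("paper", "example/paper", needs=["analysis"]),
    Step("tracking", "example/tracking"),
    Step("analysis", "example/analysis", needs=["tracking"]),
]

if __name__ == "__main__":
    by_name = {s.name: s for s in PIPELINE}
    for name in run_order(PIPELINE):
        # Print rather than execute, so the sketch runs without Docker.
        print(" ".join(docker_command(by_name[name], "/tmp/work")))
```

The commands are printed rather than executed so that the sketch can be tried without Docker installed. A Makefile or a workflow manager does the same ordering job, and additionally skips steps whose outputs are already up to date.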
Example - IDInteraction
- Automate the coding of behaviours
- Coding behaviours by hand is slow and tedious
Docker images
- Each module contains its own Makefile
- Example: object tracking
Challenges
- Additional complexity
- Pipeline can be difficult to debug
- Requires Docker
- Error handling
- Top-level Makefile can become unwieldy
- We have started using Nextflow for parts of the process
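One way to ease the debugging and error-handling points above is to have the driver stop at the first failing step and report which step failed, with what command and what error output. The following is a hedged sketch under that assumption, not the authors' implementation; `StepFailed`, `run_step` and `run_pipeline` are hypothetical names, and each command would in practice be a `docker run` call.

```python
import subprocess


class StepFailed(RuntimeError):
    """Raised when a pipeline step exits with a non-zero status."""


def run_step(name, command):
    """Run one step, capturing output so a failure can be reported with context."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise StepFailed(
            f"step '{name}' exited with status {result.returncode}\n"
            f"command: {' '.join(command)}\n"
            f"stderr: {result.stderr.strip()}"
        )
    return result.stdout


def run_pipeline(steps):
    """Run (name, command) pairs in order, stopping at the first failure."""
    completed = []
    for name, command in steps:
        run_step(name, command)
        completed.append(name)
    return completed
```

Failing fast keeps a broken intermediate output from being passed silently to later containers, which is where long pipelines tend to become hard to debug.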
Benefits
- Transparency
- Allows others to re-run and extend our analyses:
  - Re-run, to verify
  - Re-run, to modify (e.g. bounding box)
  - Re-run, to extend (e.g. new data, new tracking methods)
- Moves away from the static “one-shot” publication
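Re-running to modify or extend is cheaper when only the affected steps are repeated. The sketch below shows one hypothetical way to decide that: fingerprint a step's parameters and input files, and re-run when the fingerprint changes. This is not how Make decides (Make compares file timestamps) and it is not taken from the authors' pipeline; the function names and the stamp-file idea are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(params, input_files):
    """Hash a step's parameters together with the bytes of its input files."""
    digest = hashlib.sha256()
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    for path in sorted(str(p) for p in input_files):
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def needs_rerun(stamp_file, params, input_files):
    """True when no stamp exists or the stored fingerprint no longer matches."""
    stamp = Path(stamp_file)
    current = fingerprint(params, input_files)
    return not stamp.exists() or stamp.read_text() != current


def record_run(stamp_file, params, input_files):
    """Store the fingerprint after a step has completed successfully."""
    Path(stamp_file).write_text(fingerprint(params, input_files))
```

With this in place, changing a parameter such as the bounding box, or adding new data, marks the step as stale, while an unchanged step can be skipped.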
A new publication model
- Improves reliability
- Improves efficiency
- Improves effectiveness
- Accelerates progress
Reproducible research / research software engineering
- Narrative remains important
- Paper construction is becoming less a literary work and more a software engineering project
- RSEs are integral to this process