Reproducible Research is Research Software Engineering

David Mawdsley, Robert Haines and Caroline Jay

RSE 2017 Conference

Reproducible research

Publishing reproducible research

  • How it changes the publication model
  • Why this is a good thing for research
  • Why this is a good thing for research software engineers

Self-contained reproducible research papers

  • We can write our paper in LaTeX or Markdown, including R code as required
  • Everything is in R; the only external dependency is the data
    • In principle (pretty much) a solved problem
  • Some friction points:
    • software versions → Docker or Packrat
    • formatting, e.g. tables
    • collaboration, e.g. working with Overleaf

What if you can’t do everything in R?

  • Complex dependencies
  • Time-consuming analyses
  • Long pipelines

Our approach

  • Make the pipeline modular by containerising each step with Docker
    • Reusable, reproducible
    • The final module makes the paper
  • Join the containers' outputs with a Makefile
    • Or with a workflow management tool
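What the top-level Makefile contributes is the rule that decides which steps must re-run: a step is rebuilt only when an output is missing or older than one of its inputs. The sketch below reimplements just that rule in Python so it can be seen in isolation. It is an illustration, not the authors' pipeline: the step names and file names are hypothetical, and each `action` is a plain Python function standing in for "run this Docker container".

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One pipeline step: files it reads, files it writes, and how to build them."""
    name: str
    inputs: list[str]
    outputs: list[str]
    action: Callable[[], None]  # hypothetical stand-in for launching a container


def is_stale(step: Step) -> bool:
    """Make's rule: rebuild if an output is missing or older than the newest input."""
    if not step.outputs:
        return True  # nothing to compare against, so always run
    if not all(os.path.exists(p) for p in step.outputs):
        return True
    if not step.inputs:
        return False
    newest_input = max(os.path.getmtime(p) for p in step.inputs)
    oldest_output = min(os.path.getmtime(p) for p in step.outputs)
    return oldest_output < newest_input


def run_pipeline(steps: list[Step]) -> list[str]:
    """Run the steps in the order given (assumed to respect dependencies)
    and return the names of the steps that actually ran."""
    ran = []
    for step in steps:
        if is_stale(step):
            step.action()
            ran.append(step.name)
    return ran
```

Because an up-to-date step is skipped, a long or time-consuming analysis runs only once, and changing an early input re-runs everything downstream of it, ending with the module that builds the paper. In practice `make` (or a workflow tool) applies this rule for you.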

Example - IDInteraction

  • Automate the coding of behaviours
  • Coding behaviours by hand is slow and tedious

Docker images

  • Each module contains its own Makefile
  • Example: object tracking
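When each module ships its own Makefile, the top level only needs to start the right container and ask it to build a target. Below is a minimal sketch of assembling that `docker run` call. It assumes a hypothetical image name (`example/object-tracking`), assumes the image has `make` installed and no entrypoint that would swallow the `make` arguments, and assumes data is exchanged through a host directory mounted at `/data`. Only the standard `docker run` flags `--rm`, `-v` and `-w` are used.

```python
from __future__ import annotations

import os
import shlex


def docker_make_command(image: str, data_dir: str, target: str = "all") -> list[str]:
    """Return an argv list that runs `make <target>` inside `image`,
    with the host directory `data_dir` mounted at /data (assumed layout)."""
    host_dir = os.path.abspath(data_dir)  # bind mounts need an absolute path
    return [
        "docker", "run",
        "--rm",                     # remove the container when it exits
        "-v", f"{host_dir}:/data",  # share inputs and outputs with the host
        "-w", "/data",              # run make from the mounted directory
        image,
        "make", target,             # delegate to the module's own Makefile
    ]


if __name__ == "__main__":
    # Hypothetical image and target; prints the shell-quoted command.
    print(shlex.join(docker_make_command("example/object-tracking", "data", "tracking.csv")))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute the module. Keeping the per-module build logic inside the image means the top level only needs to know image names and targets.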

Challenges

  • Additional complexity
    • Pipeline can be difficult to debug
  • Requires Docker
  • Error handling
  • Top-level Makefile can become unwieldy
    • Have started using Nextflow for parts of the process
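Error handling and debugging are easier when a failure names the stage that broke and keeps its error output. One way to do that, sketched here under the assumption that each stage is an external command such as a `docker run` call, is to run the stages in order, stop at the first non-zero exit status, and raise an exception carrying the stage name and its stderr. The `StageFailed` class and the stage names are hypothetical.

```python
from __future__ import annotations

import subprocess


class StageFailed(RuntimeError):
    """Raised when a pipeline stage exits with a non-zero status."""

    def __init__(self, name: str, returncode: int, stderr: str):
        super().__init__(f"stage {name!r} failed with exit code {returncode}:\n{stderr}")
        self.name = name
        self.returncode = returncode
        self.stderr = stderr


def run_stages(stages: list[tuple[str, list[str]]]) -> list[str]:
    """Run each (name, argv) stage in order and return the completed names.
    Stops and raises StageFailed at the first stage that exits non-zero."""
    completed = []
    for name, argv in stages:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep the failing stage's stderr so the error can be traced.
            raise StageFailed(name, result.returncode, result.stderr)
        completed.append(name)
    return completed
```

Stopping early avoids building later stages on partial outputs. Workflow tools such as Nextflow provide this kind of per-task reporting themselves.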

Benefits

  • Transparency
  • Allows others to re-run and extend our analyses
    • Re-run, to verify
    • Re-run, to modify (e.g. bounding box)
    • Re-run, to extend (e.g. new data, new tracking methods)
  • Moves away from the static “one-shot” publication
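Re-running to modify, for example with a different bounding box, is easier if each parameter set writes to its own location. The sketch below shows one possible scheme, which is an assumption and not taken from the project: the output directory is keyed on a hash of the step's parameters. A modified re-run then cannot overwrite earlier results, and an unchanged re-run finds its previous directory. The step name `tracking` and the `bbox` parameter are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def run_dir(base: str, step: str, params: dict) -> Path:
    """Return base/step/<hash>, where <hash> depends only on the parameters."""
    canonical = json.dumps(params, sort_keys=True)  # key order must not matter
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return Path(base) / step / digest


def prepare_run(base: str, step: str, params: dict) -> tuple[Path, bool]:
    """Create the run directory if needed and record its parameters.
    Returns (directory, True if this parameter set was not prepared before)."""
    path = run_dir(base, step, params)
    record = path / "params.json"
    if record.exists():
        return path, False
    path.mkdir(parents=True, exist_ok=True)
    # Store the parameters next to the outputs so every result is traceable.
    record.write_text(json.dumps(params, sort_keys=True, indent=2))
    return path, True
```

Storing `params.json` alongside the outputs also helps transparency, since every result records the settings that produced it.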

A new publication model

  • Improves reliability
  • Improves efficiency
  • Improves effectiveness
  • Accelerates progress

Reproducible research / research software engineering

  • Narrative remains important
    • Paper construction is becoming less a literary work and more a software engineering project
  • RSEs are integral to this process