Reproducibility in Statistics and Data Science (Discussion)
Yihui Xie, RStudio
2016/08/03 @ JSM Chicago
Me (not really a nice guy)
Jenny
- I admire Jenny’s courage to tackle this problem
- problems with spreadsheets
- can store any mess
- actions not recorded
- The Importance of Reproducible Research in High-Throughput Biology (Keith Baggerly) https://youtu.be/7gYIs7uYbMo
- two possible solutions
- spreadsheet users, we hate you
- or perhaps we can help
- I wish spreadsheets could die eventually, but this is probably not possible or will take an extremely long time
- how do we encourage people to switch to better tools/approaches?
Karl
- clever shortlink http://bit.ly/jsm2016
- steps toward reproducible research
- personally I tend to avoid the phrase “reproducible research” (e.g. in my book “Dynamic Documents with R and knitr”)
- click click click vs type type type
- In Code We Trust
- Karl is pretty good at writing short tutorials, so please don’t buy my book but read this instead http://kbroman.org/steps2rr/ (and a few other ones)
Karthik
- computational reproducibility
- reproducible != correct, but better than not being able to reproduce
- testing in a nutshell
if (output != expected) stop("did not get expected output")
- e.g. you can use testthat from the Hadleyverse (I use my little package testit)
- research with big data / intensive computing?
- making reproducible research tools more accessible
- incentives
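To make the "testing in a nutshell" one-liner above a bit more concrete, here is a minimal sketch in base R. The function `standardize()` and the helper `check_equal()` are hypothetical names made up for illustration; they are not from the talk or from any package.

```r
# Hypothetical function under test (an assumption, not from the talk).
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# Hypothetical helper in the spirit of the one-liner: stop unless the output
# matches what we expect. all.equal() tolerates tiny floating-point
# differences and yields a single TRUE for vectors, whereas `!=` compares
# element by element and is exact.
check_equal <- function(output, expected) {
  if (!isTRUE(all.equal(output, expected))) {
    stop("did not get expected output")
  }
  invisible(TRUE)
}

z <- standardize(c(1, 2, 3))
check_equal(z, c(-1, 0, 1))  # passes silently
check_equal(mean(z), 0)      # passes silently
```

With a testing package the same check would probably look like `testthat::expect_equal(z, c(-1, 0, 1))` or `testit::assert("standardize() centers and scales", isTRUE(all.equal(z, c(-1, 0, 1))))`; the base R version is used here only so the sketch runs without installing anything.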
Mine
- perspective from education
- reminded me of my (dark) life as a PhD student
- an old and evil idea to make students realize the convenience of reproducible research
- technological advances