Reproducibility in Statistics and Data Science (Discussion)
Yihui Xie, RStudio
2016/08/03 @ JSM Chicago
Me (not really a nice guy)
Jenny
- I admire Jenny’s courage to tackle this problem
- problems with spreadsheets
  - can store any mess
  - actions not recorded
- The Importance of Reproducible Research in High-Throughput Biology (Keith Baggerly) https://youtu.be/7gYIs7uYbMo
- two possible solutions
  - spreadsheet users, we hate you
  - or perhaps we can help
- I wish spreadsheets would die eventually, but that is probably impossible, or will take an extremely long time
- how do we encourage people to switch to better tools/approaches?
Karl
- clever shortlink http://bit.ly/jsm2016
- steps toward reproducible research
- personally I tend to avoid the phrase “reproducible research” (e.g. in my book “Dynamic Documents with R and knitr”)
- click click click vs type type type
- In Code We Trust
- Karl is pretty good at writing short tutorials, so please don’t buy my book; read this instead: http://kbroman.org/steps2rr/ (and a few of his other tutorials)
Karthik
- computational reproducibility
- reproducible != correct, but better than not being able to reproduce
- testing in a nutshell
if (output != expected) stop("did not get expected output")
- e.g. you can use testthat from the Hadleyverse (I use my little package testit)
- research with big data / intensive computing?
- making reproducible research tools more accessible
- incentives
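The "testing in a nutshell" one-liner above can be fleshed out a little. What follows is only a minimal sketch in base R, not code from the talk: `check_output()` and `center()` are hypothetical names made up for illustration. It swaps `!=` for `all.equal()`, on the assumption that you usually want to tolerate tiny floating-point differences and compare whole vectors at once.

```r
# Hypothetical helper (not from the talk): stop unless output matches expected.
check_output <- function(output, expected) {
  # all.equal() tolerates tiny numeric differences and compares whole vectors;
  # it returns TRUE or a character description of the difference, hence isTRUE().
  if (!isTRUE(all.equal(output, expected))) {
    stop("did not get expected output")
  }
  invisible(TRUE)
}

# A toy function under test (also hypothetical).
center <- function(x) x - mean(x)

check_output(center(c(1, 2, 3)), c(-1, 0, 1))  # passes silently
check_output(sum(c(0.1, 0.2)), 0.3)            # passes despite floating point

# A failing check raises an error; capture its message instead of halting.
res <- tryCatch(check_output(1 + 1, 3), error = function(e) conditionMessage(e))
```

In practice the same checks would probably be written as `testthat::expect_equal(center(c(1, 2, 3)), c(-1, 0, 1))` or `testit::assert("center() works", all.equal(center(c(1, 2, 3)), c(-1, 0, 1)))`, so that they can live in a package's test directory and run automatically.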
Mine
- perspective from education
- reminded me of my (dark) life as a PhD student
- an old and evil idea to make students realize the convenience of reproducible research
- technological advances