Reflections on the 12 Years of R Markdown from a Developer’s Perspective

1 What’s R Markdown?

A 10-second introduction:

Write _narratives_ in Markdown, and _computer code_ in backticks:

```{r}
mean(rnorm(100))
```

Write **more** code chunks and narratives.

The idea comes from Knuth’s Literate Programming.

Literate programming = documentation language + computing language
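
To make the equation concrete, here is a minimal sketch that compiles a tiny document from a character vector. It assumes the knitr package is installed; the document content is made up for illustration, and the fence is built with `strrep()` only so that it does not clash with the fence around this example.

```r
# A minimal sketch, assuming the knitr package is installed.
fence <- strrep("`", 3)  # three backticks, built programmatically

# A tiny R Markdown document: one line of narrative and one code chunk
rmd <- c(
  "The sum of one and one:",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

# knit() runs the chunk and weaves its output back into the Markdown text
out <- knitr::knit(text = rmd, quiet = TRUE)
cat(out)
```

The result is plain Markdown again: the narrative is passed through unchanged, and the chunk is replaced by its source code followed by its printed output.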

2 How it started

I was fascinated by Sweave in R when I came across it around 2007, but found it too limited after using it for 4 years (for most of my homework assignments)
- Sweave = LaTeX + R
I started developing the knitr package in 2011, which became a backend of R Markdown
LaTeX as the documentation language is too difficult for beginners, and Markdown is much simpler
R Markdown = Markdown + computing languages (not limited to R)

3 Did Markdown’s simplicity work?

Appears so.

Number of files on GitHub:
- *.Rnw (R + LaTeX): 0.02M
- *.Rmd (R Markdown): 0.78M
- *.ipynb (Jupyter): 5.3M
Another piece of evidence: 1600+ books on bookdown.org.

4 R Markdown in 2012 vs 2024

| | 2012 | 2024 |
|---|---|---|
| Output formats | HTML | HTML, LaTeX, PDF, Word, PowerPoint, EPUB, … |
| Applications | Reports | Reports, slides, articles, books, websites, dashboards, … |
| Backend | sundown (a tiny C library) | Pandoc |
| R package dependencies | 0 | 25 |
| Package size | ~2 MB | ~83 MB (Pandoc: 152 MB) |

The above comparison does not include R Markdown extension packages such as bookdown and blogdown, which are even heavier.

5 What makes reproducibility hard?

Reproducibility is hard without stability, i.e., we don’t want something that worked yesterday to break today.
As a software developer, I’ve gradually become tired of adding features endlessly and managing dependencies.
It feels like playing an infinite game.

5.1 Features

We chose Markdown for simplicity, right?
If we make Markdown do everything, is it still simple?
- To some degree, we are turning the simple Markdown into something complicated like XML.
The more features we add, the more challenging it is to maintain the software. We may change our minds or come to regret a feature, and deprecate it in the future, which means… breakage!

We all hate breakage

The ActivePapers case

[…] While it has achieved its mission of demonstrating that unifying computational reproducibility and provenance tracking is doable and useful, it has also demonstrated that Python is not a suitable platform to build on for reproducible research. Breaking changes at all layers of the software stack are too frequent. The ActivePapers framework itself (this project) uses an API that was removed in Python 3.9, and while it can be updated with reasonable effort, there is little point in doing so: Published ActivePapers cannot be expected to work with a current Python stack for more than a year.

[…] If you came here to learn about reproducible research practices, the best advice I can give is not to use Python.

5.2 Dependencies

Some examples in my software development career:

Pandoc (relatively stable, but breakage/regression happens)
Bootstrap (v2, v3, v4, v5, …)
jQuery (v1, v2, v3, …, and security issues)
Hugo (moving too fast and hard to follow)
GitBook (impossible to maintain after importing into bookdown)

5.3 Hey, I know how to fix the dependency hell

Yes, users could adopt tools like renv, virtualenv, or even Docker to manage dependencies, but that could bring another problem:

If your results are reproducible only in a highly specific environment, can we really call them reproducible?
Are they really useful?
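
At their core, these tools record exact versions so that they can be restored later. A minimal base R sketch of that recording step follows; `record_versions()` is a hypothetical helper written for this illustration, not part of renv or any other tool.

```r
# A minimal sketch in base R; record_versions() is a hypothetical helper.
record_versions <- function(pkgs) {
  # Look up the installed version of each package, as a named character vector
  vapply(pkgs, function(p) as.character(utils::packageVersion(p)), character(1))
}

# Record the R version plus the versions of the packages an analysis uses
env <- list(
  r        = R.version.string,
  packages = record_versions(c("stats", "utils"))
)
str(env)
```

Every entry in such a record is one more thing that must match before the results can be reproduced.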

5.4 In praise of CRAN

Computer languages often have package repositories, such as CPAN, CRAN, CTAN, PyPI, and NPM.
A perhaps unique feature of CRAN is that package maintainers must check their reverse dependencies before they are able to publish new versions of their packages to CRAN, which means normally you are not allowed to break packages that depend on your package.
This is a huge pain for me as the maintainer of some popular R packages including knitr and rmarkdown (which have > 10K reverse dependencies), but is enormously beneficial to the whole R community.
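
To give a sense of what finding reverse dependencies involves, here is a minimal sketch that uses `tools::package_dependencies()` on a toy package table. All package names are made up, and the sketch only lists the packages that would need to be re-checked; it is not CRAN's actual checking procedure.

```r
# A toy package database with the columns that available.packages() provides.
# All package names here are hypothetical.
db <- cbind(
  Package   = c("corepkg", "reportpkg", "bookpkg", "otherpkg"),
  Depends   = c("R (>= 3.5.0)", "corepkg", NA, NA),
  Imports   = c(NA, NA, "corepkg, reportpkg", "stats"),
  LinkingTo = NA_character_
)

# Which packages declare a direct dependency on corepkg?
revdeps <- tools::package_dependencies("corepkg", db = db, reverse = TRUE)
revdeps$corepkg
```

With the default `db = NULL`, the same call queries the configured repositories through `available.packages()`, which requires network access.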

6 Minimalism: my new journey

litedown: https://github.com/yihui/litedown
The goal: litedown = min{knitr + evaluate + rmarkdown + bookdown + blogdown + pagedown + xaringan + tufte} - Pandoc - Hugo - GitBook - Bootstrap - jQuery

Re-implementing 12 years of work with fewer features and dependencies
- Easier to develop, install, and manage
- Also easier to be embedded in other applications (e.g., WebAssembly)
Be determined to say No (if you want rich features, you should use rmarkdown or Quarto instead)
Choose a stable foundation to develop software on top of
- e.g., base R, commonmark
Hoping to declare “feature-complete” in a few months

Yihui Xie

2024-05-31