class: center, middle, inverse, title-slide

# A Tour Through the R Markdown Ecosystem
## Creating Documents, Presentations, Papers, Dashboards, Books, and Websites
### Yihui Xie, RStudio
### 2017/05/30 @ Data Science Week, Melbourne

---

# About myself

- First time visiting Australia

--

- Native Chinese (1st language: Chinese; 2nd? 3rd?)

--

- Statistician by training (Iowa State University; advisors: Prof Di Cook & Heike Hofmann)

--

- Software engineer working remotely for RStudio (Omaha, NE)

--

- Authored and co-authored a few R packages (knitr, animation, rmarkdown, shiny, bookdown, blogdown, and so on)

--

- [Enjoys](https://yihui.shinyapps.io/voice/) programming more than data analysis (no, I'm not a data scientist)

```r
# open a graphics window and animate a quincunx (bean machine)
x11()
library(animation)
ani.options(nmax = 213, interval = 0.03)
quincunx(balls = 200, col.balls = rainbow(200))

# play Minesweeper in an R graphics device
library(fun)
x11(type = "Xlib"); mine_sweeper()
```

---

# About this tutorial

--

.pull-left[
![](gif/dozing-duck.gif)
]

--

.pull-right[
![](gif/ghost-in-water.gif)
]

---
background-image: url(https://images.unsplash.com/photo-1428908728789-d2de25dbd4e2?dpr=2&auto=format&fit=crop&w=1500&h=1000&q=80&cs=tinysrgb&crop=)
background-size: cover

# Outline

Part I: Inside [R Markdown](http://rmarkdown.rstudio.com)

- The Knit button

- [Pandoc's Markdown](http://pandoc.org/MANUAL.html)

- Built-in R Markdown output formats

Part II: R Markdown Extensions

- [**rticles**](https://github.com/rstudio/rticles): LaTeX journal articles

- [**tufte**](https://github.com/rstudio/tufte): the Tufte style

- [**flexdashboard**](https://github.com/rstudio/flexdashboard): HTML output with dashboard layouts

- [**xaringan**](https://github.com/yihui/xaringan): using the JS slideshow library remark.js

- [**bookdown**](https://github.com/rstudio/bookdown) / [**blogdown**](https://github.com/rstudio/blogdown): multiple R Markdown documents

---
background-image: url(http://rmarkdown.rstudio.com/images/bandOne.png)
background-size: contain
class: inverse, center, middle

# Inside R Markdown

## The rmarkdown Package

---
class: middle, center

# http://rmarkdown.rstudio.com

---

# An R Markdown document

    ---
    title: "A Simple Regression"
    author: "Yihui Xie"
    output:
      html_document:
        toc: true
    ---

    We built a linear regression model.

    ```{r}
    fit <- lm(dist ~ speed, data = cars)
    b   <- coef(fit)
    plot(fit)
    ```

    The slope of the regression is `r b[1]`.

???

In fact, I have used this minimal example for a few years, and one day I suddenly realized I made a mistake here: the slope should be `b[2]` instead of `b[1]`. Although it is a little ironic, it is also a good example of why you should generate reports from source code instead of pasting a fixed number. If it is just a fixed number, say, 3.87, it will be difficult to realize that you copied a wrong coefficient.

---
class: middle, center

![](gif/duck-dog-mask.gif)

---

# The Knit button

- It calls `rmarkdown::render()`

- R Markdown ≈ knitr (R) + Pandoc (Markdown)

- `rmarkdown::render()` ≈ `knitr::knit()` + a `system()` call to `pandoc`

- R Markdown (`.Rmd`) -> `knit()` -> Markdown (`.md`) -> `pandoc` ->
    - `.html`
    - `.pdf` (LaTeX)
    - `.docx`
    - `.epub`
    - `.rtf`
    - ...

---

.pull-left[
![](https://raw.githubusercontent.com/yihui/knitr-talks/master/useR2014/images/grommit.jpg)

- **knitr**: https://yihui.org/knitr

- the most important page: https://yihui.org/knitr/options
]

--

.pull-right[
![the knitr book](https://db.yihui.org/imgur/yYw46aF.jpg)
]

---

# Original Markdown

- by [John Gruber](http://daringfireball.net/projects/markdown/)

- primarily for HTML

- paragraphs, `# headers`, `> blockquotes`

- `**bold**`, `_italic_`

- `- unnumbered lists`

- `1. numbered lists`

- `[text](url)`

- `![text](image)`

- code blocks (indent by four spaces)

---
class: middle, center

![](gif/dog-reach.gif)

---

# Pandoc's Markdown

- You should read the Pandoc Manual at least once to learn the possibilities of Pandoc's Markdown: http://pandoc.org/MANUAL.html#pandocs-markdown

- Two major contributions of Pandoc: Markdown extensions and multiple output formats

---

# Pandoc's Markdown

- Markdown extensions

    - YAML metadata

    - LaTeX math `$\sum_{i=1}^n \alpha_i$` = `\(\sum_{i=1}^n \alpha_i\)`

    - syntax highlighting of code blocks (three backticks followed by the language name, e.g. ```` ```r ````)

    - raw HTML/LaTeX (limitation: raw HTML only works for HTML output, and raw LaTeX only for LaTeX/PDF output)

            <div class="my-class">
              ![image](url)
            </div>

            _emphasis_ and \emph{emphasis}

    - tables

    - footnotes `^[A footnote here.]`

    - citations `[@joe2014]` (I recommend using BibTeX)

---
background-image: url(http://pandoc.org/diagram.jpg)

# Pandoc's Markdown

- Types of output documents
    - LaTeX/PDF, HTML, Word (MS Word, OpenOffice)
    - beamer, ioslides, Slidy, reveal.js
    - E-books
    - ...

--

- probably one of the worst visualizations: http://pandoc.org/diagram.jpg

---
class: inverse

# Examples of Documents and Presentations

- Links at https://github.com/rbind/yihui/issues/14

- Tables

- Citations

- Notebooks

- Different language engines

- Other issues?

---

# Command-line usage of Pandoc

Some examples:

```sh
pandoc test.md -o test.html
pandoc test.md -s --mathjax -o test.html
pandoc test.md -o test.odt
pandoc test.md -o test.rtf
pandoc test.md -o test.docx
pandoc test.md -o test.pdf
# Pandoc 2.0 and later renamed this flag to --pdf-engine
pandoc test.md --latex-engine=xelatex -o test.pdf
pandoc test.md -o test.epub
```

To run system commands in R, use functions `system()` or `system2()`. The **rmarkdown** package provides a helper function `rmarkdown::pandoc_convert()` to convert Markdown documents to other formats using Pandoc.
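
As a rough sketch of the two approaches just mentioned, the R code below writes a tiny Markdown file and converts it to HTML, first through `system2()` and then through `rmarkdown::pandoc_convert()`. The file name `test.md` mirrors the shell examples above; the temporary directory, the output file names, and the availability checks are assumptions made for illustration only.

```r
# A minimal sketch: assumes the rmarkdown package and a Pandoc binary are
# installed; both conversions are skipped quietly if they are not.
dir <- tempfile("pandoc-demo")
dir.create(dir)
input <- file.path(dir, "test.md")
writeLines(c("# Hello", "", "Some **bold** text."), input)

# 1. Call the pandoc executable directly,
#    the equivalent of `pandoc test.md -o direct.html`
if (nzchar(Sys.which("pandoc"))) {
  system2("pandoc", c(shQuote(input), "-o", shQuote(file.path(dir, "direct.html"))))
}

# 2. Let rmarkdown locate Pandoc and assemble the command line
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(
    input, to = "html", output = file.path(dir, "helper.html"),
    options = "--standalone",
    verbose = TRUE  # print the pandoc command that is executed
  )
}
```

With `verbose = TRUE`, `pandoc_convert()` prints the command it runs, which is handy for comparing against a hand-written `pandoc` call.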
When you click the Knit button in RStudio, you will see the actual (usually very long) command that is executed.

???

You can use `rmarkdown::render(clean = FALSE)` to keep the intermediate .md, and run the `pandoc` command for debugging purposes.

---

# R Markdown output formats

- An output format is an abstraction in **rmarkdown** as a uniform (programming) interface to deal with
    - **knitr** options (chunk options, hooks, package options, ...)
    - pandoc options (`--from`, `--to`, `--latex-engine`, ...)
    - pre/post-processors
    - and other options (e.g., whether to keep the intermediate .md)

- Can be created via `rmarkdown::output_format()`

- Note the `base_format` argument: output formats are _extensible_. If you only want to modify a few options of an existing format, you can use it as the base, e.g., you can add a custom post-processor on top of the existing one.

---

# Built-in formats

- `beamer_presentation`
- `github_document`
- `html_document`
- `ioslides_presentation`
- `md_document`
- `odt_document`
- `pdf_document`
- `rtf_document`
- `slidy_presentation`
- `word_document`

---

# YAML options for output formats

The YAML metadata

```yaml
---
output:
  html_document:
    toc: true
    theme: "united"
    fig_height: 6
---
```

will be translated to

```r
rmarkdown::render(
  'input.Rmd',
  html_document(
    toc = TRUE,
    theme = "united",
    fig_height = 6
  )
)
```

---

# Example: html_document()

```r
str(rmarkdown::html_document())
```

```
List of 11
 $ knitr                  :List of 5
  ..$ opts_knit    : NULL
  ..$ opts_chunk   :List of 5
  .. ..$ dev       : chr "png"
  .. ..$ dpi       : num 96
  .. ..$ fig.width : num 7
  .. ..$ fig.height: num 5
  .. ..$ fig.retina: num 2
  ..$ knit_hooks   : NULL
  ..$ opts_hooks   : NULL
  ..$ opts_template: NULL
 $ pandoc                 :List of 6
  ..$ to          : chr "html"
  ..$ from        : chr "markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash"
  ..$ args        : chr [1:11] "--smart" "--email-obfuscation" "none" "--self-contained" ...
  ..$ keep_tex    : logi FALSE
  ..$ latex_engine: chr "pdflatex"
  ..$ ext         : NULL
 $ keep_md                : logi FALSE
 $ clean_supporting       : logi TRUE
 $ df_print               : chr "default"
 $ pre_knit               :function (...)
 $ post_knit              :function (...)
 $ pre_processor          :function (...)
 $ intermediates_generator:function (original_input, encoding, intermediates_dir)
 $ post_processor         :function (metadata, input_file, output_file, clean, verbose)
 $ on_exit                :function ()
 - attr(*, "class")= chr "rmarkdown_output_format"
```

---

# Example: html_document()

Some options:

- `theme`: you can set it to `NULL` to reduce the HTML file size significantly (because of Bootstrap)

- `css`: tweak the styles of certain elements (learn to use the Developer Tools of your web browser)

- `template`: a custom Pandoc template

---

# Pandoc templates

- Official Pandoc templates: https://github.com/jgm/pandoc-templates

- **rmarkdown**'s templates: https://github.com/rstudio/rmarkdown/tree/master/inst/rmd

- A minimal example

```html
<html>
  <head>
*   <title>$title$</title>
    $for(css)$
    <link rel="stylesheet" href="$css$" type="text/css" />
    $endfor$
  </head>
  <body>
*   $body$
  </body>
</html>
```

---

# Simple customization

There are many options you can set in YAML. Two types of options:

- Options for Pandoc: make sure you read the Pandoc manual to know the possible options (e.g., for LaTeX output: http://pandoc.org/MANUAL.html#variables-for-latex).

    ```yaml
    fontsize: 12pt
    documentclass: book
    monofont: "Source Code Pro" # for XeLaTeX output
    ```

- Options for an R Markdown output format under the `output` field in YAML: consult the specific R help page.

You can certainly create your own template, but it may not be necessary to do so if your problem can be solved by setting a few options in YAML.

---

# Deeper customization

A common use case: inject a snippet of code into the HTML `<head>` (e.g., JS/CSS code), or the LaTeX preamble (e.g., load some LaTeX packages before `\begin{document}`).
```yaml
output:
  html_document:
    includes:
      in_header: "header.html"
      before_body: "before.html"
      after_body: "after.html"
  pdf_document:
    includes:
      in_header: "preamble.tex"
```

Even deeper customization? Sure, write a package with custom output formats! Let's study a few relatively simple examples in **rmarkdown** first.

---
background-image: url(gif/jump-off.gif)
background-size: contain
class: bottom, center

## Take a deep breath and read some source code!

---

# Example: odt_document

```r
rmarkdown::odt_document
```

```
function (fig_width = 5, fig_height = 4, fig_caption = TRUE,
    template = "default", reference_odt = "default", includes = NULL,
    keep_md = FALSE, md_extensions = NULL, pandoc_args = NULL)
{
    knitr <- knitr_options(opts_chunk = list(dev = "png", dpi = 96,
        fig.width = fig_width, fig.height = fig_height))
    args <- c()
    if (!is.null(template) && !identical(template, "default"))
        args <- c(args, "--template", pandoc_path_arg(template))
    args <- c(args, includes_to_pandoc_args(includes))
    if (!is.null(reference_odt) && !identical(reference_odt,
        "default")) {
        args <- c(args, "--reference-odt", pandoc_path_arg(reference_odt))
    }
    args <- c(args, pandoc_args)
    output_format(knitr = knitr, pandoc = pandoc_options(to = "odt",
        from = from_rmarkdown(fig_caption, md_extensions), args = args),
        keep_md = keep_md)
}
<environment: namespace:rmarkdown>
```

---

# Example: rtf_document

- https://github.com/rstudio/rmarkdown/blob/master/R/rtf_document.R

- pre-processor (protect raw RTF content)

- post-processor (restore raw RTF content)

- raw RTF looks like this

```rtf
{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}
```

---
background-image: url(https://images.unsplash.com/photo-1466779561253-0a08336ba2ab?dpr=2&auto=format&fit=crop&w=1500&h=844&q=80&cs=tinysrgb&crop=)
background-size: cover
class: inverse, center, middle

# R Markdown Extensions

---
background-image: url(https://images.unsplash.com/photo-1477125804055-781b73bcd980?dpr=2&auto=format&fit=crop&w=1500&h=1002&q=80&cs=tinysrgb&crop=)
background-size: cover

# rticles

- The key idea: LaTeX templates for Pandoc (through the `template` argument of output format functions)

- Currently supported journals: https://github.com/rstudio/rticles/tree/master/inst/rmarkdown/templates

- Most rticles format functions just call `rmarkdown::pdf_document()` and modify the returned list (a natural hack), instead of using `rmarkdown::output_format()`

---

# tufte

- https://github.com/rstudio/tufte

- History: originally `rmarkdown::tufte_handout`, which only supported PDF articles

- **tufte** extended the style to PDF books and HTML documents

- Key pieces:
    - Tufte-LaTeX
    - Tufte-CSS
    - **knitr** output hooks

---

# flexdashboard

- https://github.com/rstudio/flexdashboard

- Responsive and interactive dashboards

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/b/be/Sharingan_triple.svg)
background-position: 90% 10%
background-size: 15%

# xaringan

- https://github.com/yihui/xaringan

- Porting the JavaScript library remark.js for slideshows into R

- Dirty hacks

- Its Markdown is not Pandoc's Markdown (bypasses Pandoc; how?)

- Still uses the **rmarkdown** output format framework because a lot of things are easier: customize CSS, add HTML dependencies (HTML widgets), write out the initialization JS code of remark.js (`remark.create()`), and so on

???

A lot of cool things about remark.js: background images, alignment of elements, keyboard shortcuts, ...

---
background-image: url('gif/karl-broman.jpeg')
background-size: contain
class: center, bottom, inverse

# You only live once!

### [`yolo = TRUE`](https://twitter.com/_inundata/status/798970002992873472)

???

This is the Mona Lisa of the rstats world!
---
background-image: url(https://images.unsplash.com/photo-1464958674501-4dc3d73d8f46?dpr=2&auto=format&fit=crop&w=1500&h=1000&q=80&cs=tinysrgb&crop=)
background-size: cover

# bookdown

- https://github.com/rstudio/bookdown

- Organize multiple R Markdown documents into a "book"

- Multiple output formats

- Markdown extensions (figure/table numbers, equations, theorems, ...)

- Extensive use of regular expressions (`grep()`, `gregexpr()`, `regmatches()`, ...)

- <https://slides.yihui.org/2016-CSU-bookdown-Yihui-Xie.html#(10)>

---

# blogdown

- https://github.com/rstudio/blogdown

- Creating websites based on R Markdown and Hugo

- https://slides.yihui.org/2017-rstudio-webinar-blogdown-Yihui-Xie.html

---

# My life as a software developer

- [Saying no](https://twitter.com/xieyihui/status/805898395231617024) is hard, but I have to do it. [Sometimes](https://github.com/rstudio/bookdown/issues/295) it [is](https://github.com/rstudio/bookdown/issues/259) easy to make a deal with the users, but most of the time [it](https://github.com/rstudio/bookdown/issues/238), [is](https://github.com/rstudio/bookdown/issues/251), hard.

- Bottom line: I don't want to turn R Markdown into another version of LaTeX.

.center[[![](https://imgs.xkcd.com/comics/standards.png)](https://xkcd.com/927/)]

- The [bookdown FAQ](https://bookdown.org/yihui/bookdown/faq.html)

---

# Your life (after this tutorial)?

--

.pull-left[
![](gif/get-into-the-car.gif)
]

--

.pull-right[
<img src="gif/kangroo-kick.gif" width="400" />
]

---
background-image: url(gif/logo.png)
background-position: 90% 50%
background-size: 20%

# Contact info

- Yihui Xie
    - https://yihui.org
    - Twitter [@xieyihui](https://twitter.com/xieyihui) GitHub [@yihui](https://github.com/yihui)

- RStudio
    - https://www.rstudio.com
    - Twitter [@rstudio](https://twitter.com/rstudio) GitHub [@rstudio](https://github.com/rstudio)