class: center, middle, inverse, title-slide

# Advanced R Markdown
## Behind the Knit Button
### Yihui Xie, RStudio
### 2017/01/14 @ rstudio::conf, Kissimmee, FL

---
class: middle

> PSA #2 for [#rstudioconf](https://twitter.com/hashtag/rstudioconf?src=hash) Git(Hub) tutorial folks.
> GitHub up tomorrow? 🎉 GitHub still down? Dodgeball tournament on the lawn.🏆 Come prepared.
>
> — Jenny Bryan (@JennyBryan) [January 13, 2017](https://twitter.com/JennyBryan/status/819951486172090369)

---
background-image: url(https://images.unsplash.com/photo-1428908728789-d2de25dbd4e2?dpr=2&auto=format&fit=crop&w=1500&h=1000&q=80&cs=tinysrgb&crop=)
background-size: cover

# Outline

Part I: R Markdown Internals

- The Knit Button

- [Pandoc's Markdown](http://pandoc.org/MANUAL.html)

- R Markdown Output Formats

Part II: Examples of R Markdown Extensions

- [**rticles**](https://github.com/rstudio/rticles): LaTeX journal articles

- [**tufte**](https://github.com/rstudio/tufte): the Tufte style

- [**xaringan**](https://github.com/yihui/xaringan): using the JS slideshow library remark.js

- [**bookdown**](https://github.com/rstudio/bookdown) / [**blogdown**](https://github.com/rstudio/blogdown): multiple R Markdown documents

---
background-image: url(http://rmarkdown.rstudio.com/images/bandOne.png)
background-size: contain
class: inverse, center, middle

# R Markdown Internals

---

# An R Markdown document

    ---
    title: "A Simple Regression"
    author: "Yihui Xie"
    output:
      html_document:
        toc: true
    ---

    We built a linear regression model.

    ```{r}
    fit <- lm(dist ~ speed, data = cars)
    b <- coef(fit)
    plot(fit)
    ```

    The slope of the regression is `r b[1]`.

???

In fact, I have used this minimal example for a few years, and one day I suddenly realized I made a mistake here: the slope should be `b[2]` instead of `b[1]`. Although it is a little ironic, it is also a good example of why you should generate reports from source code instead of pasting a fixed number. If it is just a fixed number, say, 3.87, it will be difficult to realize that you copied a wrong coefficient.
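
A quick check in the R console makes the mix-up visible. This is only a minimal sketch that reuses `fit` and `b` from the example above; selecting the coefficient by name instead of by position is a suggestion added here, not part of the original example.

```r
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)

# the first element is the intercept, the second one is the slope
names(b)  # "(Intercept)" "speed"

# selecting by name avoids picking the wrong position
slope <- b[["speed"]]
slope
```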

---

# The Knit button

- It calls `rmarkdown::render()`

- R Markdown ≈ knitr (R) + Pandoc (Markdown)

- `rmarkdown::render()` ≈ `knitr::knit()` + a `system()` call to `pandoc`

- R Markdown (`.Rmd`) -> `knit()` -> Markdown (`.md`) -> `pandoc` ->
    - `.html`
    - `.pdf` (LaTeX)
    - `.docx`
    - `.epub`
    - `.rtf`
    - ...

---

# Pandoc's Markdown

- You should read the Pandoc Manual at least once to learn the possibilities of Pandoc's Markdown: http://pandoc.org/MANUAL.html#pandocs-markdown

- Original Markdown ([John Gruber](http://daringfireball.net/projects/markdown/))
    - primarily for HTML
    - paragraphs, `# headers`, `> blockquotes`
    - `**bold**`, `_italic_`
    - `- lists`
    - `[text](url)`
    - `![text](image)`
    - code blocks (indent by four spaces)

---

# Pandoc's Markdown

- Markdown extensions
    - YAML metadata
    - LaTeX math `$\sum_{i=1}^n \alpha_i$` = `\(\sum_{i=1}^n \alpha_i\)`
    - syntax highlighting of code blocks (three backticks followed by the language name, e.g. ```` ```r ````)
    - raw HTML/LaTeX (limitation: raw HTML only works for HTML output, and raw LaTeX only for LaTeX/PDF output)

            <div class="my-class">
              ![image](url)
            </div>

            _emphasis_ and \emph{emphasis}

    - tables
    - footnotes `^[A footnote here.]`
    - citations `[@joe2014]` (I recommend using BibTeX)

---
background-image: url(http://pandoc.org/diagram.jpg)

# Pandoc's Markdown

- Types of output documents
    - LaTeX/PDF, HTML, Word (MS Word, OpenOffice)
    - beamer, ioslides, Slidy, reveal.js
    - E-books
    - ...

--

- probably one of the worst visualizations: http://pandoc.org/diagram.jpg

---

# Command-line usage of Pandoc

Some examples:

```sh
pandoc test.md -o test.html
pandoc test.md -s --mathjax -o test.html
pandoc test.md -o test.odt
pandoc test.md -o test.rtf
pandoc test.md -o test.docx
pandoc test.md -o test.pdf
pandoc test.md --latex-engine=xelatex -o test.pdf
pandoc test.md -o test.epub
```

To run system commands in R, use the functions `system()` or `system2()`.
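
As a rough sketch of the `system2()` route, the first command above could be run from R as follows. It assumes a `pandoc` executable is on the `PATH`; the temporary file paths and the skip-if-missing guard are illustrative additions, not part of the original slides.

```r
# A minimal sketch: run pandoc from R with system2().
# Assumption: a `pandoc` executable is on the PATH; the file names are illustrative.
md   <- file.path(tempdir(), "test.md")
html <- file.path(tempdir(), "test.html")
writeLines(c("# Hello", "", "Some *Markdown* text."), md)

pandoc <- Sys.which("pandoc")  # "" if pandoc cannot be found
if (nzchar(pandoc)) {
  # same as running: pandoc test.md -o test.html
  status <- system2(pandoc, c(shQuote(md), "-o", shQuote(html)))
  message("pandoc exit status: ", status)
} else {
  message("pandoc not found on the PATH; nothing was converted")
}
```

`system2()` takes the command and its arguments separately, which makes quoting file paths with `shQuote()` less error-prone than pasting one long string for `system()`.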
The **rmarkdown** package provides a helper function `rmarkdown::pandoc_convert()` to convert Markdown documents to other formats using Pandoc.

When you click the Knit button in RStudio, you will see the actual (usually very long) command that is executed.

???

You can use `rmarkdown::render(clean = FALSE)` to keep the intermediate .md, and run the `pandoc` command for debugging purposes.

---

# R Markdown output formats

- An output format is an abstraction in **rmarkdown** as a uniform (programming) interface to deal with
    - **knitr** options (chunk options, hooks, package options, ...)
    - pandoc options (`--from`, `--to`, `--latex-engine`, ...)
    - pre/post-processors
    - and other options (e.g., whether to keep the intermediate .md)

- Can be created via `rmarkdown::output_format()`

- Note the `base_format` argument: output formats are _extensible_. If you only want to modify a few options of an existing format, you can use it as the base, e.g., you can add a custom post-processor on top of the existing one.

---

# Built-in formats

- `beamer_presentation`
- `github_document`
- `html_document`
- `ioslides_presentation`
- `md_document`
- `odt_document`
- `pdf_document`
- `rtf_document`
- `slidy_presentation`
- `word_document`

---

# YAML options for output formats

The YAML metadata

```yaml
---
output:
  html_document:
    toc: true
    theme: "united"
    fig_height: 6
---
```

will be translated to

```r
rmarkdown::render(
  'input.Rmd',
  html_document(
    toc = TRUE,
    theme = "united",
    fig_height = 6
  )
)
```

---

# Example: html_document()

```r
str(rmarkdown::html_document())
```

```
List of 11
 $ knitr                  :List of 5
  ..$ opts_knit    : NULL
  ..$ opts_chunk   :List of 5
  .. ..$ dev       : chr "png"
  .. ..$ dpi       : num 96
  .. ..$ fig.width : num 7
  .. ..$ fig.height: num 5
  .. ..$ fig.retina: num 2
  ..$ knit_hooks   : NULL
  ..$ opts_hooks   : NULL
  ..$ opts_template: NULL
 $ pandoc                 :List of 6
  ..$ to          : chr "html"
  ..$ from        : chr "markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash"
  ..$ args        : chr [1:11] "--smart" "--email-obfuscation" "none" "--self-contained" ...
  ..$ keep_tex    : logi FALSE
  ..$ latex_engine: chr "pdflatex"
  ..$ ext         : NULL
 $ keep_md                : logi FALSE
 $ clean_supporting       : logi TRUE
 $ df_print               : chr "default"
 $ pre_knit               :function (...)
 $ post_knit              :function (...)
 $ pre_processor          :function (...)
 $ intermediates_generator:function (original_input, encoding, intermediates_dir)
 $ post_processor         :function (metadata, input_file, output_file, clean, verbose)
 $ on_exit                :function ()
 - attr(*, "class")= chr "rmarkdown_output_format"
```

---

# Example: html_document()

Some options:

- `theme`: you can set it to `NULL` to reduce the HTML file size significantly (because of Bootstrap)

- `css`: tweak the styles of certain elements (learn to use the Developer Tools of your web browser)

- `template`: a custom Pandoc template

---

# Pandoc templates

- Official Pandoc templates: https://github.com/jgm/pandoc-templates

- **rmarkdown**'s templates: https://github.com/rstudio/rmarkdown/tree/master/inst/rmd

- A minimal example

```html
<html>
  <head>
*   <title>$title$</title>
    $for(css)$
    <link rel="stylesheet" href="$css$" type="text/css" />
    $endfor$
  </head>
  <body>
*   $body$
  </body>
</html>
```

---

# Simple customization

There are many options you can set in YAML. Two types of options:

- Options for Pandoc: make sure you read the Pandoc manual to know the possible options (e.g., for LaTeX output: http://pandoc.org/MANUAL.html#variables-for-latex).

    ```yaml
    fontsize: 12pt
    documentclass: book
    monofont: "Source Code Pro" # for XeLaTeX output
    ```

- Options for an R Markdown output format under the `output` field in YAML: consult the specific R help page.
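
For the second type, the arguments of a format function are the options that can go under `output` in YAML. A small sketch for listing them, assuming the **rmarkdown** package is installed (`html_document` is just the format used as the example here):

```r
# the arguments of a format function are the options accepted under `output:`
opts <- names(formals(rmarkdown::html_document))
print(opts)

# the help page explains each of them:
# ?rmarkdown::html_document
```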

You can certainly create your own template, but it may not be necessary to do so if your problem can be solved by setting a few options in YAML.

---

# Deeper customization

A common use case: inject a snippet of code into the HTML `<head>` (e.g., JS/CSS code), or the LaTeX preamble (e.g., load some LaTeX packages before `\begin{document}`).

```yaml
output:
  html_document:
    includes:
      in_header: "header.html"
      before_body: "before.html"
      after_body: "after.html"
  pdf_document:
    includes:
      in_header: "preamble.tex"
```

Even deeper customization? Sure, write a package with custom output formats!

Let's study a few relatively simple examples in **rmarkdown** first.

---
background-image: url(gif/jump-off.gif)
background-size: contain
class: bottom, center

## Take a deep breath and read some source code!

---

# Example: odt_document

```r
rmarkdown::odt_document
```

```
function (fig_width = 5, fig_height = 4, fig_caption = TRUE,
    template = "default", reference_odt = "default", includes = NULL,
    keep_md = FALSE, md_extensions = NULL, pandoc_args = NULL)
{
    knitr <- knitr_options(opts_chunk = list(dev = "png", dpi = 96,
        fig.width = fig_width, fig.height = fig_height))
    args <- c()
    if (!is.null(template) && !identical(template, "default"))
        args <- c(args, "--template", pandoc_path_arg(template))
    args <- c(args, includes_to_pandoc_args(includes))
    if (!is.null(reference_odt) && !identical(reference_odt,
        "default")) {
        args <- c(args, "--reference-odt", pandoc_path_arg(reference_odt))
    }
    args <- c(args, pandoc_args)
    output_format(knitr = knitr, pandoc = pandoc_options(to = "odt",
        from = from_rmarkdown(fig_caption, md_extensions), args = args),
        keep_md = keep_md)
}
<environment: namespace:rmarkdown>
```

---

# Example: rtf_document

- https://github.com/rstudio/rmarkdown/blob/master/R/rtf_document.R

- pre-processor (protect raw RTF content)

- post-processor (restore raw RTF content)

- raw RTF looks like this

```rtf
{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}
```

---
background-image: url(https://images.unsplash.com/photo-1466779561253-0a08336ba2ab?dpr=2&auto=format&fit=crop&w=1500&h=844&q=80&cs=tinysrgb&crop=)
background-size: cover
class: inverse, center, middle

# Examples of R Markdown Extensions

---
class: center

# How to make a nice-looking package

.pull-left[
![](gif/nice-vol-pic1.jpg)
]

--

.pull-right[
![](gif/nice-vol-pic2.jpg)
]

### focus, and work hard

---
class: center

# How to make a nice-looking package

.pull-left[
![](gif/nice-ski-pic1.jpg)
]

--

.pull-right[
![](gif/nice-ski-pic2.gif)
]

### sometimes you just cannot avoid tricks or hacks

---
background-image: url(https://images.unsplash.com/photo-1477125804055-781b73bcd980?dpr=2&auto=format&fit=crop&w=1500&h=1002&q=80&cs=tinysrgb&crop=)
background-size: cover

# rticles

- The key idea: LaTeX templates for Pandoc (through the `template` argument of output format functions)

- Currently supported journals: https://github.com/rstudio/rticles/tree/master/inst/rmarkdown/templates

- Most rticles format functions just call `rmarkdown::pdf_document()` and modify the returned list (a natural hack), instead of using `rmarkdown::output_format()`

---

# tufte

- https://github.com/rstudio/tufte

- History: originally `rmarkdown::tufte_handout`, which only supported PDF articles

- **tufte** extended the style to PDF books and HTML documents

- Key pieces:
    - Tufte-LaTeX
    - Tufte-CSS
    - **knitr** output hooks

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/b/be/Sharingan_triple.svg)
background-position: 90% 10%
background-size: 15%

# xaringan

- https://github.com/yihui/xaringan

- Porting the JavaScript library remark.js for slideshows into R

- Dirty hacks
    - Its Markdown is not Pandoc's Markdown (bypasses Pandoc; how?)
    - Still uses the **rmarkdown** output format framework because a lot of things are easier: customize CSS, add HTML dependencies (HTML widgets), write out the initialization JS code of remark.js (`remark.create()`), and so on

???

A lot of cool things about remark.js: background images, alignment of elements, keyboard shortcuts, ...

---
background-image: url(https://github.com/yihui/xaringan/releases/download/v0.0.2/karl-moustache.jpg)
background-size: contain
class: center, bottom, inverse

# You only live once!

### [`yolo = TRUE`](https://twitter.com/_inundata/status/798970002992873472)

???

This is the Mona Lisa of the rstats world!

---
background-image: url(https://images.unsplash.com/photo-1464958674501-4dc3d73d8f46?dpr=2&auto=format&fit=crop&w=1500&h=1000&q=80&cs=tinysrgb&crop=)
background-size: cover

# bookdown

- https://github.com/rstudio/bookdown

- Multiple R Markdown documents

- Multiple output formats

- Markdown extensions (figure/table numbers, equations, theorems, ...)

- Extensive use of regular expressions (`grep()`, `gregexpr()`, `regmatches()`, ...)

---

# Life as a software developer

<img src="gif/ride-donkey.jpg" width=250 style="float:right;" />

- Make [one person](https://github.com/tidyverse/reprex/blob/3f824b842/README.md#other-work) extremely happy first. Do not aim at making everybody (even mildly) happy.

- Occasionally little [Easter eggs](http://www.njtierney.com/r/rbloggers/2017/01/11/reprex-magic/) and [humor](https://github.com/yihui/knitr/issues/1322).

- Stand on the shoulders of giants.
    - I'm not a JavaScript expert, but I love discovering cool libraries like remark.js. I know little about C or C++,<sup>\*</sup> but I know how to make use of other people's work, such as the **httpuv** package, so I wrote the **servr** package.

.footnote[[\*] Some people feel relieved on hearing this. I used C [t](https://github.com/jeroenooms/jsonlite/pull/85)w[i](https://github.com/viking/r-yaml/pull/32)ce, and Python [o](https://github.com/yihui/ubuntu-bin/blob/master/delete.py)nce in my life.]

--

- [Saying no](https://twitter.com/xieyihui/status/805898395231617024) is hard, but you will have to do it at some point. [Sometimes](https://github.com/rstudio/bookdown/issues/295) it [is](https://github.com/rstudio/bookdown/issues/259) easy to make a deal with the users, but most of the time [it](https://github.com/rstudio/bookdown/issues/238), [is](https://github.com/rstudio/bookdown/issues/251), hard.
    - Keep calm, and say no. And say sorry.

- Alone, we are all weak. Luke, use the ~~force~~ source! (and pull requests)

???

- Since you are here attending the Advanced R Markdown tutorial, hopefully you want to be a developer, so let's talk about some non-technical stuff.

- There are many types of humor. The most effective one is unexpectedness. Think differently, even if a question has an obvious answer, e.g., what is the center of the universe?

- Also say no to the roller coaster.

---
background-image: url(gif/tiny-human.jpg)
background-size: contain

???

- We are so weak compared to the universe.

- Here is the center of my universe.

---
background-image: url(gif/logo.png)
background-position: 90% 50%
background-size: 20%

# Contact info

- Yihui Xie
    - https://yihui.org
    - Twitter [@xieyihui](https://twitter.com/xieyihui) GitHub [@yihui](https://github.com/yihui)

- RStudio
    - https://www.rstudio.com
    - Twitter [@rstudio](https://twitter.com/rstudio) GitHub [@rstudio](https://github.com/rstudio)