class: center, middle, hide-count count: false # Machine Learning in R ### Dynamic Documents with Rmarkdown ___ **Simon Schölzel** Winter Term 2021/2022 .small[(updated: 2021-09-27)] <br><br> <a href="https://www.wiwi.uni-muenster.de/"><img src="https://www.wiwi.uni-muenster.de/fakultaet/sites/all/themes/wwucd/assets/images/logos/secondary_wiwi_aacsb_german.jpg" alt="fb4-logo" height="45"></a> <a href="https://www.wiwi.uni-muenster.de/ctrl/aktuelles"><img src="https://www.wiwi.uni-muenster.de/ctrl/sites/all/themes/wwucd/assets/images/logos/berenslogo5.jpg" alt="ftb-logo" height="45"></a> <a href="https://www.wiwi.uni-muenster.de/iff2/de/news"><img src="https://www.wiwi.uni-muenster.de/iff2/sites/all/themes/wwucd/assets/images/logos/logo_iff2_en2.jpg" alt="ipb-logo" height="45"></a> --- ## Agenda **1 The Story So Far: `R` Scripts** **2 R Markdown Documents** > 2.1 The `YAML` Header 2.2 The Text Body 2.3 Code Chunks & Output 2.4 Generating High-Quality Reports **3 Interactive Data Science in `R`** > 3.1 Introduction to Computational Notebooks 3.2 R Markdown Documents vs. R Notebooks --- ## 1 The Story So Far: `R` Scripts <img src="./img/r-script.PNG" width="70%" style="display: block; margin: auto;" /> ??? - file extension `.R` - top right: see the name of the opened RStudio project - explain the different panes - a script is really just plain R code, maybe annotated by comments - the code is executed by sending it from the editor pane to the console - output is either shown in the console or in plot/viewer pane - send the whole code to console or just chunks - not really a good approach because you are going back and forth between console and script the whole time - and what does this workflow look like if you have to write a report based on your code (e.g., your thesis)? -- .center[ When using `R` scripts (`.R`), a report of your analysis, including plots as well as results, is usually separated from your code (e.g., contained in a Microsoft Word document). As a result, each change of your code base requires a whole review of the written report! ] --- background-image: url(https://camo.githubusercontent.com/de0519dd8e4ebc982eb0ddfaa9c6cd0924149e6c/68747470733a2f2f626f6f6b646f776e2e6f72672f79696875692f726d61726b646f776e2f696d616765732f6865782d726d61726b646f776e2e706e67) background-position: 97.5% 2.5% background-size: 7.5% layout: true --- ## 2 R Markdown Documents .pull-left[ **The `rmarkdown` package:** - Markdown-based authoring framework for data science - Intertwine `R` code and plain text in one document - Generate high-quality reports from `.Rmd`-files <br> **Document elements:** - The `YAML` Header - The text body - Code chunks & output ] .pull-right[ <iframe src="https://player.vimeo.com/video/178485416?color=428bca" width="640" height="400" frameborder="0" allow="autoplay; fullscreen" allowfullscreen></iframe> ] --- ## 2 R Markdown Documents ### 2.1 The `YAML` Header The `YAML` header appears at the top of each `.Rmd` document. It contains a bunch of metadata as well as formatting options for your final report. The syntax for the header is based on [YAML](https://en.wikipedia.org/wiki/YAML). ``` --- title: "Rmarkdown Demo" # Title/headline of your markdown report author: John Doe # Name of the author date: 2021-10-11 # Date of the report output: # YAML options specifying the output html_document: # Output format toc: true # Table of contents (toc) toc_float: true # Floating toc on the left side of the document toc_depth: 2 # Depth of headers to generate toc --- ``` Importantly, `.Rmd` can be converted to a plethora of different file formats, e.g., HTML, PDF, Word, even Powerpoint slides, by specifying the `output` argument (for more details see [Xie/Allaire/Grolemund (2021)](https://bookdown.org/yihui/rmarkdown/html-document.html)). ??? YAML: "YAML Ain't Markup Language" Only a small extract of options you can specify as part of the YAML header. --- ## 2 R Markdown Documents ### 2.2 The Text Body In principal, `.Rmd`-documents allow you to format text in the same way as Microsoft Word would, only that you insert and format text using a special syntax: *markdown*. .panelset[ .panel[.panel-name[Text Formatting] .pull-left[ Text in \*italics\* or in \*\*bold\*\* - Text in *italics* or in **bold** Text in S^uperscript^ or S~ubscript~ - Text in S<sup>uperscript</sup> or S<sub>ubscript</sub> Text as `` ` ``fancy R code`` ` `` - Text as `fancy R code` ] .pull-right[ \> Meaningful quote by some wise person > Meaningful quote by some wise person Hyperlink to the \[cheat sheet\]( < URL > ) - Hyperlink to the [cheat sheet](https://raw.githubusercontent.com/rstudio/cheatsheets/master/rmarkdown.pdf) ] ] .panel[.panel-name[Equations] *Inline LaTeX equation:* Text before `$x_{1/2} = -\frac{p}{2}\pm \sqrt{(\frac{p}{2})^2-q}$` Text after Text before `\(x_{1/2} = -\frac{p}{2}\pm \sqrt{(\frac{p}{2})^2-q}\)` Text after <br> *Centered LaTeX equation:* ` `$$x_{1/2} = -\frac{p}{2}\pm \sqrt{(\frac{p}{2})^2-q}$$` ` `$$x_{1/2} = -\frac{p}{2}\pm \sqrt{(\frac{p}{2})^2-q}$$` ] .panel[.panel-name[Headlines] \# You can use headers (H1) #### You can use headers (H1) \#\# You can use headers (H2) ##### You can use headers (H2) \#\#\# You can use headers (H3) ###### You can use headers (H3) ] .panel[.panel-name[Images] .pull-left[ External image: !\[\]( < URL > ) ] .pull-right[ External image: ![](https://i.pinimg.com/originals/32/49/65/32496554a4f599228677bbf7919024bb.png) ] ] .panel[.panel-name[Lists] .pull-left[ ``` - You can - create - an unordered - list of items ``` - You can - create - an unordered - list of items ] .pull-right[ ``` 1. Or even 2. better 3. you create an ordered list 1. which should 2. also look familiar! ``` 1. Or even 2. better 3. you create an ordered list 1. which should 2. also look familiar! ] ] .panel[.panel-name[Footnotes] And finally, since we are in an academic context you might also want to use some footnotes. [^1] And finally, since we are in an academic context you might also want to use some footnotes.<sup>1</sup> <br> [^1]: Note: Find more information about the `rmarkdown` syntax by consulting the official [cheat sheet](<url>). <br><br><br><br> .footnote[ [1] Note: Find more information about the `rmarkdown` syntax by consulting the official [cheat sheet](https://raw.githubusercontent.com/rstudio/cheatsheets/master/rmarkdown.pdf). ] ] ] --- ## 2 R Markdown Documents ### 2.3 Code Chunks & Output Code chunks are self-contained parts of `R` code for which, upon code evaluation, the output of your code is shown directly below the chunk instead of in your console, the RStudio *Viewer* pane or the *Plots* pane. .pull-left[ ```` ```{r fibo} fib <- function(n) { x <- numeric(n) x[1:2] <- c(1, 1) for(i in 3:n) { x[i] <- x[i-1] + x[i-2] } return(x) } fib(10) ``` ```` ``` > [1] 1 1 2 3 5 8 13 21 34 55 ``` ] .pull-right[ - Create a new code chunks using the `Ctrl + Alt + I` keyboard shortcut (on Windows).<br><br> - Code chunks even allow you to write and evaluate code in other languages, such as Python, SQL or C++, depending on the computational engine (e.g., `{r}`).<br><br> - You can even insert code output dynamically into your text using inline code: The 5. element of the series is `` ` `` r fib(5)[5] `` ` `` The 5. element of the series is 5 ] ??? The code is send from the chunk to your console where it is evaluated. The output is then send back to your code chunk where it is displayed below. Note: - each chunk is surrounded by three tick-marks (at start and end) - first word in curved bracket indicates computational engine - second word gives a name to code chunks for referencing --- ## 2 R Markdown Documents ### 2.3 Code Chunks & Output Chunk options are (mostly) set as boolean (**`T`**`rue`/**`F`**`alse`) arguments in the chunk header `{r ...}` to further customize the behavior of your code chunks when they are rednered to the final output format. <table style='width:100%;'> <thead> <tr> <th style="text-align:left;"> Argument </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> include=F </td> <td style="text-align:left;"> Code and Output do not appear in the final report (yet, code is still evaluated). </td> </tr> <tr> <td style="text-align:left;"> eval=F </td> <td style="text-align:left;"> Output does not appear in the final report as the code is not evaluated. </td> </tr> <tr> <td style="text-align:left;"> results="hide" </td> <td style="text-align:left;"> Output does not appear in the final report but the code is evaluated. </td> </tr> <tr> <td style="text-align:left;"> echo=F </td> <td style="text-align:left;"> Code does not appear in the final report. </td> </tr> <tr> <td style="text-align:left;"> message=F </td> <td style="text-align:left;"> Messages generated by the code do not appear in the final report. </td> </tr> <tr> <td style="text-align:left;"> warning=F </td> <td style="text-align:left;"> Warnings generated by the code do not appear in the final report. </td> </tr> </tbody> </table> --- ## 2 R Markdown Documents (`.Rmd`) ### 2.4 Generating High-Quality Reports Render the `.Rmd` document into your desired output format by using the `rmarkdown::render()` command or click the *Knit* button in your RStudio toolbar. <img src="https://d33wubrfki0l68.cloudfront.net/61d189fd9cdf955058415d3e1b28dd60e1bd7c9b/b739c/lesson-images/rmarkdownflow.png" width="50%" height="50%" style="display: block; margin: auto;" /> **Workflow in the background:** 1. `knitr` takes your `.Rmd`-file, executes the code chunks as well as inline code, and converts everything into plain markdown format (`.md`). If you have an error somewhere in your code, `knitr` will revolt! 2. The `.md`-file is processed by the document converter `pandoc` and converted into the final output format (e.g., PDF, HTML, Word). .footnote[ *Note: For an intuitive distinction between `markdown`, `rmarkdown`, `knitr` and `pandoc` see this [stackoverflow post](https://stackoverflow.com/a/40563480).* ] ??? pandoc is a software tool itself which is installed along with `rmarkdown` --- ## 2 R Markdown Documents (`.Rmd`) <img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/rmarkdown_wizards.png" width="70%" height="70%" style="display: block; margin: auto;" /> --- layout: false class: center, middle # Live Demo<br><br>👨💻 🕹 👩💻 --- background-image: url(https://camo.githubusercontent.com/de0519dd8e4ebc982eb0ddfaa9c6cd0924149e6c/68747470733a2f2f626f6f6b646f776e2e6f72672f79696875692f726d61726b646f776e2f696d616765732f6865782d726d61726b646f776e2e706e67) background-position: 95% 5% background-size: 7.5% layout: true --- ## 3 Interactive Data Science in `R` ### 3.1 Introduction to Computational Notebooks Computational notebooks are virtual interfaces used for [Literate Programming](https://en.wikipedia.org/wiki/Literate_programming). They combine features of word processors (e.g., Microsoft Word) and the programming language kernel (e.g., the `R` shell which you find in the RStudio *Console* pane). .pull-left[ **Features:** - Interactivity - Iteration - Sharing & Communication - Transparency - Reproducibility ] .pull-right[ .pull-left[ <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Jupyter_logo.svg/518px-Jupyter_logo.svg.png" width="80%" height="80%" /> ] .pull-right[ <img src="https://www.r-project.org/logo/Rlogo.svg" width="100%" height="100%" /> ]] ??? As emphasized before: knitting only works, if your code is free of errors Literate Programming: - Writing human-readable computer programs - Boils down to having the source code as well as natural language documentations in one file Notebook Alternatives: Jupyter Notebook (language agnostic, but especially widespread in the Python area), R Notebook, Google Colab --- ## 3 Interactive Data Science in `R` ### 3.1 Introduction to Computational Notebooks .center[This sounds pretty familiar after learning about R Markdown documents right? <br> So what is an R Notebook all about?] <br> <img src="https://tenor.com/view/obama-wtf-why-president-wut-gif-12221156.gif" width="40%" height="40%" style="display: block; margin: auto;" /> --- ## 3 Interactive Data Science in `R` ### 3.2 R Markdown Documents vs. R Notebooks Primarily, both formats are based on `.Rmd`-files. A notebook (in the `R` universe) can be viewed as a special version of an R Markdown document (it has a special way of executing the code included in the document). .pull-left[ **R Markdown documents:** ``` --- title: "R Markdown Document" author: John Doe date: "27 September, 2021" output: html_document --- ``` - All code is executed when clicking the `Knit` button (batch execution). - Hence, R Markdown documents require your code to be free of errors! ] .pull-right[ **R Notebooks:** ``` --- title: "R Markdown Notebook" author: John Doe date: "27 September, 2021" output: html_notebook --- ``` - `Knit` is replaced by a `Preview` button. - A self-contained HTML-snapshot of your R Markdown document is created whenever you save it or click `Preview` (see `.nb.html`-file). - No `R` code is executed - the preview is created as-is (including errors). ] ??? The use case is determined by your goal: - generating high-quality reports with R Markdown documents - doing interactive data science / machine learning using R Notebooks --- layout: false class: center, middle # Live Demo<br><br>👨💻 🕹 👩💻 --- layout: true --- ## Additional Resources **RStudio (2021):** Introduction to `Rmarkdown`. URL: https://rmarkdown.rstudio.com/lesson-1.html. **Xie, Y./Allaire, J.J./Grolemund, G. (2021):** R Markdown: The Definitive Guide. URL: https://bookdown.org/yihui/rmarkdown/. **Xie, Y./Dervieux, C./Riederer, E. (2021):** R Markdown Cookbook. URL: https://bookdown.org/yihui/rmarkdown-cookbook/.