This post has been written by Gunnvant Singh, Jigsaw Academy Faculty.
A typical data science project involves the following stages:
Data preparation and exploration
Data analysis and model building
Report making and presentation
Usually, the report-making part of the whole exercise is carried out at the end of the data analysis and is kept separate from the model-building and analysis process. Reports are created painstakingly in Microsoft PowerPoint and Word, with a lot of copying and pasting of statistical results from a SAS or R console. If, God forbid, any changes are required in the report, one has to go back, run the R or SAS code again, and paste the results into the presentation and the Word document all over again! Can't there be an easier way to deal with this painfully mundane exercise? It turns out there is: one can use the ‘knitr’ package available in R.
knitr provides two broad report-generation frameworks: reports can be prepared using either LaTeX or Markdown. In this post I will illustrate how reports with embedded R code can be generated using both Markdown and LaTeX.
Generating a LaTeX report:
To generate dynamic reports, one needs to install the package ‘knitr’ from CRAN and also install a LaTeX distribution. MiKTeX is a popular LaTeX distribution and can be downloaded from https://miktex.org/download.
Once this is done, RStudio needs to be set up to use knitr and MiKTeX as the default report-generation tools. Go to ‘Tools’, then ‘Global Options’, and select ‘Sweave’ in the pane. In the dialog box that appears, make sure that “knitr” is chosen for the option “Weave Rnw files using” and “pdfLaTeX” is chosen for the option “Typeset LaTeX into PDF using”.
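A minimal sketch of this setup step from the R console follows. It installs knitr from CRAN only when it is missing (the cloud.r-project.org mirror is an assumption; any CRAN mirror works) and then uses base R's `Sys.which()` to check whether a LaTeX engine such as MiKTeX's `pdflatex` can be found on the PATH.

```r
# Install knitr from CRAN if it is not already available (a one-time step).
# The mirror URL is an assumption; any CRAN mirror can be used instead.
if (!requireNamespace("knitr", quietly = TRUE)) {
  install.packages("knitr", repos = "https://cloud.r-project.org")
}

# Sys.which() returns the full path of an executable, or "" if it is not on the PATH
pdflatex_path <- Sys.which("pdflatex")

if (nzchar(pdflatex_path)) {
  message("LaTeX engine found: ", pdflatex_path)
} else {
  message("pdflatex not found: install a LaTeX distribution such as MiKTeX")
}
```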
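Outside RStudio, roughly the same weave-then-typeset pipeline can be run from the console with `knitr::knit2pdf()`, which knits the .Rnw file and then compiles the resulting .tex file; its `compiler` argument names the LaTeX engine. In this sketch `report.Rnw` is a hypothetical file name, and the call is skipped when the file or `pdflatex` is not available.

```r
library(knitr)

# Hypothetical input file; replace with the path to your own .Rnw report
rnw_file <- "report.Rnw"

if (file.exists(rnw_file) && nzchar(Sys.which("pdflatex"))) {
  # Weave with knitr (instead of Sweave), then typeset the .tex file with pdfLaTeX
  pdf_file <- knit2pdf(rnw_file, compiler = "pdflatex")
  message("Report written to ", pdf_file)
}
```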
Once this is done, one can begin creating a LaTeX report. A good thing about LaTeX is that the output is always a PDF, and if your report contains a lot of mathematical formulae, LaTeX has very good support for them too.
To create a LaTeX report, go to “File”, then “New File”, and select “R Sweave”. A new editor window will appear.
Let’s make a sample report; type in the following code:
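A minimal file of this kind might contain the lines stored in `rnw_lines` in the R sketch below: ordinary LaTeX for the text and a formula, one chunk opened with `<<...>>=` and closed with `@`, and `\Sexpr{}` for an inline R value. The file name, the chunk label `cars-summary` and the use of R's built-in `cars` data set are illustrative assumptions. The sketch writes the file to disk and calls `knitr::knit()`, which runs the chunk and produces the .tex file.

```r
library(knitr)

# Contents of a minimal, illustrative .Rnw file: LaTeX text plus one R chunk.
# Backslashes are doubled because these lines are R strings.
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "\\section*{Stopping distances}",
  "The sample mean is $\\bar{x} = \\frac{1}{n}\\sum_{i=1}^{n} x_i$.",
  "",
  "<<cars-summary, echo=TRUE, eval=TRUE>>=",
  "summary(cars$dist)",
  "@",
  "",
  "The mean stopping distance is \\Sexpr{mean(cars$dist)} feet.",
  "\\end{document}"
)

writeLines(rnw_lines, "sample_report.Rnw")

# Weave: run the R chunk and inline code, writing sample_report.tex
tex_file <- knit("sample_report.Rnw", quiet = TRUE)
```

RStudio's “Compile PDF” button performs this weave step and then runs pdfLaTeX on the .tex file to produce the PDF.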
After typing this code, click “Compile PDF” and you will see your first report. If you look carefully, the code looks a bit different from vanilla R code. In fact, the only recognizable R code is in the grey box. This grey box is called a “chunk”; in the .Rnw file it begins with a line of the form <<...>>= and ends with a line containing only @. R code always goes inside a chunk, while the text goes into the normal LaTeX code. One has a lot of control over the formatting of a report. If the chunk option echo is set to FALSE, the R code will not appear in the generated PDF; if eval is set to TRUE (the default), the code is run and its output appears in the report, whereas eval=FALSE shows the code without running it. The upside of doing the analysis this way is that one can change the code and see the change in the report itself: no more running the code separately and copy-pasting the results. Reports can also be edited very quickly.
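The effect of `echo` and `eval` can be checked directly with `knitr::knit(text = ...)`, which weaves a character vector and returns the result instead of writing a file. In this sketch the two chunk labels are hypothetical; the first chunk hides its code but shows its output, and the second shows its code but is never run.

```r
library(knitr)

# Two Rnw chunks: echo=FALSE hides the code, eval=FALSE skips running it
chunk_demo <- c(
  "<<hide-code, echo=FALSE>>=",
  "1 + 1",
  "@",
  "<<do-not-run, eval=FALSE>>=",
  "10 * 10",
  "@"
)

# knit(text = ...) returns the woven LaTeX as character output
woven <- knit(text = chunk_demo, quiet = TRUE)
cat(woven, sep = "\n")
```

To change the default for every chunk in a report, `opts_chunk$set(echo = FALSE)` can be called in an early chunk; individual chunks can still override it.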
Generating a Markdown report:
To generate a Markdown report, go to “File”, then “New File”, and select “R Markdown”. A new editor window appears. Write the following code:
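A minimal R Markdown file of this kind might contain the lines in `rmd_lines` in the R sketch below: a YAML header, a Markdown heading, one R chunk and one piece of inline R code. The title, the chunk label `dist-summary` and the `cars` data set are illustrative assumptions, and `fence` simply holds three backticks so the chunk markers can be built as ordinary strings. `knitr::knit()` turns the file into plain Markdown.

```r
library(knitr)

# Three backticks, built with strrep() so they can be pasted into strings
fence <- strrep("`", 3)

# Contents of a minimal, illustrative R Markdown file
rmd_lines <- c(
  "---",
  "title: \"Stopping distances\"",
  "output: html_document",
  "---",
  "",
  "## Summary",
  "",
  paste0(fence, "{r dist-summary, echo=TRUE}"),
  "summary(cars$dist)",
  fence,
  "",
  "The mean stopping distance is `r mean(cars$dist)` feet."
)

writeLines(rmd_lines, "sample_report.Rmd")

# Weave: run the chunk and inline code, writing plain Markdown (sample_report.md)
md_file <- knit("sample_report.Rmd", quiet = TRUE)
```

Converting this Markdown into HTML is a separate step that RStudio's knit button carries out after weaving.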
Click the “Knit HTML” option and you will see the HTML output. Here the R code is written within a chunk that starts with three backticks followed by {r} and ends with three backticks. The options for controlling R output are the same as discussed above: the “echo” and “eval” options control whether code and output are shown in the report.
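From the console, a comparable HTML report can be produced with `rmarkdown::render()`, which knits the .Rmd file and then calls Pandoc to convert the Markdown into HTML. This sketch assumes the rmarkdown package and Pandoc (bundled with RStudio) are installed, and `sample_report.Rmd` is a hypothetical file name; the call is skipped if any of these is missing.

```r
# Hypothetical input file; any R Markdown file can be used
rmd_file <- "sample_report.Rmd"

if (file.exists(rmd_file) &&
    requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # Knit the chunks, then let Pandoc convert the result to HTML
  html_file <- rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE)
  message("HTML report written to ", html_file)
}
```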
The upside of Markdown reports is that, unlike LaTeX reports, you don’t need to know anything about LaTeX tags and commands. The downside is that you can’t control the formatting of your text as finely as you can with a LaTeX report.
knitr is not limited to R; in fact, it can be used with Python as well as SAS. So the next time you start an analysis, don’t forget to use knitr to ease your workflow.
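knitr runs non-R chunks through “language engines”, chosen by the name after the opening backticks (for example {python} instead of {r}). The sketch below lists the registered engine names with `knit_engines$get()` and builds the text of a Python chunk; actually running such a chunk assumes a working Python installation (knitr typically uses the reticulate package for this), and a SAS chunk likewise assumes SAS is installed locally.

```r
library(knitr)

# Names of all language engines knitr currently knows about
engines <- names(knit_engines$get())
print(c("python", "sas") %in% engines)

# Text of an illustrative Python chunk for an .Rmd file
fence <- strrep("`", 3)
python_chunk <- c(paste0(fence, "{python}"), "print(sum(range(10)))", fence)
cat(python_chunk, sep = "\n")
```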