Pen & Paper

Rmarkdown for Scientific Papers

14 Mar 2015

  library(knitcitations); library(bibtex); cleanbib()
  cite_options(citation_format = "pandoc", check.entries=FALSE)
  write.bibtex(c(citation("bibtex"), citation("knitr")[1], citation("knitcitations"), 
    citation("xtable"), citation("RefManageR"), citation("rmarkdown")), file="init.bib")
  bib <- read.bibtex("init.bib")
  bib.1 <- read.bibtex("citavi.bib")


This document was created from a .Rmd template described in a blog post by Keil (2017). The R package “knitr” (Xie 2015) converts the Rmd file to HTML, PDF or Word format. “knitr” has its strengths in reproducible research, but it is not designed to produce citations for a scientific paper. The .Rmd template therefore uses an XML file for formatting citation styles, together with “knitcitations” (Boettiger 2017) and “bibtex” (Francois 2017) for generating citations from DOI lookup or bibtex entries.


The template gives examples of writing mathematical equations in $\LaTeX$, formatting R output with either knitr::kable (Xie 2015) or “xtable” (Dahl 2016), producing plots with the base “plot” function, and generating citations from automatic DOI lookup or “bibtex” (Francois 2017) with knitcitations::citep and knitcitations::citet (Boettiger 2017).


Rmarkdown (Allaire et al. 2017) has full documentation for its syntax. Statistical analyses with plots and tables can be created in a .Rmd file by embedding and running “R code chunks”, while math equations are written in $\TeX$ or $\LaTeX$. Rmarkdown (v2) has built-in support for citations because it is based on Pandoc, but it does not have automatic DOI lookup. It is better suited to working with a citation manager that can generate and export bibliography files for it to use.


Inline equations are enclosed in single $ signs, with no space after the opening $ or before the closing $. Equations set in a separate paragraph are enclosed in $$, with a single space after the opening $$ and before the closing $$. Sharelatex has detailed documentation for creating mathematical expressions.

The binomial coefficient is defined as

$$ \binom{n}{k} = \frac{n!}{k!(n-k)!} $$

These are all Greek letters: α, β, θ0, ε2, η, λ2, μ, τ, σ

In least squares prediction models, we estimate β0, β1, β2, ..., βp by minimizing the residual sum of squares (RSS)

$$ RSS=\sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p}\beta_{j}x_{ij}\Big)^2 $$
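
As a quick numerical check of the RSS formula, the sketch below fits a linear model and evaluates the double sum directly. It uses R's built-in mtcars data, with mpg as the response and wt and hp as the predictors; these choices are assumptions made for illustration and are not part of the original template.

```r
# Illustrative data: built-in mtcars (an assumption, not from the template)
fit_rss <- lm(mpg ~ wt + hp, data = mtcars)

beta <- coef(fit_rss)                      # beta[1] is the intercept beta_0
X <- as.matrix(mtcars[, c("wt", "hp")])    # n x p matrix of predictors x_ij

# RSS = sum_i ( y_i - beta_0 - sum_j beta_j * x_ij )^2
rss_manual <- sum((mtcars$mpg - beta[1] - drop(X %*% beta[-1]))^2)

# deviance() of an lm fit returns the same residual sum of squares
rss_builtin <- deviance(fit_rss)

all.equal(rss_manual, rss_builtin)
```

The hand-computed sum and deviance() should agree up to floating-point tolerance, which is what all.equal checks.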


  library(ISLR); library(knitr)  # Wage data set; kable()
  # Fit a degree-4 polynomial regression of wage on age
  fit <- lm(wage ~ poly(age, 4), data=Wage)
  kable(summary(fit)$coef, digits=2, caption="This is a 4th degree polynomial. Coef output knitr::kable")

This is a 4th degree polynomial. Coef output knitr::kable

                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        111.70        0.73   153.28      0.00
poly(age, 4)1      447.07       39.91    11.20      0.00
poly(age, 4)2     -478.32       39.91   -11.98      0.00
poly(age, 4)3      125.52       39.91     3.14      0.00
poly(age, 4)4      -77.91       39.91    -1.95      0.05
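
The template also mentions “xtable” as an alternative to knitr::kable. A minimal sketch of that route is given below, assuming the xtable package is installed. The model here is a small illustrative fit on the built-in mtcars data, not the Wage fit, and the name fit_demo is hypothetical.

```r
library(xtable)  # assumes the xtable package is installed

# Illustrative model on built-in data (an assumption, not the Wage fit)
fit_demo <- lm(mpg ~ wt, data = mtcars)

# Convert the coefficient matrix to an xtable object
tab <- xtable(summary(fit_demo)$coef, digits = 2,
              caption = "Coefficient table rendered with xtable")

# In an Rmd chunk, set results='asis' so the HTML is passed through unchanged
print(tab, type = "html")
```

When knitting to PDF, type = "latex" would be the corresponding choice.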


  ## fit<-lm(Wage$wage~poly(Wage$age,4),data=Wage)
  agelims <- range(Wage$age)  # observed age range
  age.grid <- seq(from=agelims[1], to=agelims[2])
  preds <- predict(fit, newdata=list(age=age.grid), se.fit=TRUE)
  # Pointwise bands: fitted value plus/minus 2 standard errors
  se.bands <- cbind(preds$fit+2*preds$se.fit, preds$fit-2*preds$se.fit)
  ## par(mfrow=c(1,2), mar=c(4.5,4.5,1,1),oma=c(0,0,4,0))
  plot(Wage$age, Wage$wage, xlim=agelims, cex=.5, col="darkgrey", xlab="age", ylab="wage")
  title("Degree-4 Polynomial", outer=F)
  lines(age.grid, preds$fit, lwd=2, col="blue")
  matlines(age.grid, se.bands, lwd=1, col="blue", lty=3)
Fig. 1 - Degree-4 Polynomial. Relationship between Wage and Age (data(Wage) in ISLR). The dotted lines are approximate 95% confidence intervals (fit ± 2 standard errors).
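
The dotted bands are drawn at the fitted value plus or minus two standard errors, which only approximates a 95% interval. The sketch below compares that shortcut with the t-based interval returned by predict(). It uses the built-in cars data and a quadratic fit, both assumptions chosen so the example runs without the ISLR package.

```r
# Illustrative fit on built-in data (an assumption, not the Wage model)
fit_band <- lm(dist ~ poly(speed, 2), data = cars)
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed)))

# Shortcut used for the figure: fit +/- 2 standard errors
preds <- predict(fit_band, newdata = grid, se.fit = TRUE)
approx_band <- cbind(lower = preds$fit - 2 * preds$se.fit,
                     upper = preds$fit + 2 * preds$se.fit)

# t-based 95% confidence interval for the mean response
exact_band <- predict(fit_band, newdata = grid,
                      interval = "confidence", level = 0.95)

# Largest difference between the two upper bands
max(abs(approx_band[, "upper"] - exact_band[, "upr"]))
```

With 47 residual degrees of freedom the t quantile is slightly above 2, so the t-based band is marginally wider than the shortcut.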


In “The Elements of Statistical Learning”, Hastie, Tibshirani, and Friedman (2009) explain the application of ridge regression and the lasso with practical examples. The book covers some advanced material in data mining, inference and prediction. For a less technical treatment of the same subjects, “An Introduction to Statistical Learning” (James et al. 2013) should be a good start.

The two types of citation above are respectively generated by

citet(bib.1[["Hastie.2009"]]) and
citep("DOI 10.1007/978-1-4614-7138-7")

citet and citep accept either a DOI or a bibtex entry. citet(bib.1[["Hastie.2009"]]) generates Hastie, Tibshirani, and Friedman (2009), where “bib.1” is an R object created by bib.1 <- read.bibtex("name of bibliography file") and “Hastie.2009” is the bibtex entry ID.


Plots, tables, math equations and citations are indispensable elements of any scientific paper. The Rmd template is a quick and convenient way to produce them. The finished Rmd file can then be knitted in RStudio to HTML, PDF or Word format for submission. To publish this in a Jekyll blog, I knitted it to HTML and included the HTML file in a post.

Finally, the reference list below is produced by using “bibtex” to write out all citations made in the paper: write.bibtex(file="references.bib"). references.bib and a style file are declared in the front matter.
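
For reference, a front matter declaring these two files might look like the sketch below. bibliography and csl are the standard Pandoc metadata fields for this purpose. The output format and the style file name style.csl are placeholders, since the post does not say which ones it uses.

```yaml
---
title: "Rmarkdown for Scientific Papers"
output: html_document        # placeholder; pdf_document or word_document also work
bibliography: references.bib # file written by write.bibtex()
csl: style.csl               # placeholder name for the citation style file
---
```

With these fields set, Pandoc formats the citations and appends the reference list at the end of the document.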


Allaire, JJ, Joe Cheng, Yihui Xie, Jonathan McPherson, Winston Chang, Jeff Allen, Hadley Wickham, Aron Atkins, Rob Hyndman, and Ruben Arslan. 2017. Rmarkdown: Dynamic Documents for R.

Boettiger, Carl. 2017. Knitcitations: Citations for ’Knitr’ Markdown Files.

Dahl, David B. 2016. Xtable: Export Tables to LaTeX or HTML.

Francois, Romain. 2017. Bibtex: Bibtex Parser.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning. Springer Series in Statistics. Dordrecht: Springer.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Springer New York. doi:10.1007/978-1-4614-7138-7.

Keil, Petr. 2017. “Simple Template for Scientific Manuscripts in R Markdown.” Petr Keil.

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC.