Help for Collaborators for Editing Reproducible LaTeX / knitr Documents

NOTE: More up to date information may be found in BiostatisticianResponsibilities

The Department of Biostatistics practices reproducible research by creating dynamic, professional statistical reports using LaTeX along with the R knitr package. This involves creating documents that mix regular text, to be typeset in a PDF report, with chunks of R code, each enclosed by a line containing <<>>= and a line containing @. When knitr is run it produces a regular LaTeX document with all the results of the R computations and graphics inserted at the appropriate spots. knitr also works with the easy-to-learn Markdown markup language, which is especially suitable for student homework assignments and simple reports. But Markdown lacks the ability to handle complex tables, vector graphics that survive magnification, marginal notes, user extensions via macros and complex custom styles, and a full range of bibliographic citation styles. Also, Markdown will not produce manuscripts that meet journal style rules.
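As a minimal sketch of such a source file (the simulated data, the variable name x, and the chunk label summary-stats are hypothetical and not taken from any departmental template), ordinary LaTeX text surrounds an R chunk that opens with a <<>>= line and closes with an @ line:

```latex
\documentclass{article}
\begin{document}

% Regular text, typeset as written
We simulated 100 observations from a standard normal distribution.

% R chunk: knitr runs the code and inserts its output at this spot
<<summary-stats>>=
set.seed(1)       # make the simulated data reproducible
x <- rnorm(100)   # hypothetical example data
summary(x)
@

\end{document}
```

Assuming the file is saved with an .Rnw suffix, running knitr on it should yield a plain .tex file in which the chunk has been replaced by the R code and its printed output.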

knitr with LaTeX or markdown makes statistical reports fully reproducible, but when writing a manuscript several problems arise, including
  1. collaborators insert parts of the report into the manuscript using sometimes error-prone copy and paste operations
  2. collaborators include a good many derived numeric quantities in sentences, and statisticians have difficulty telling who produced which calculations, plus they don't know which portions of the manuscript need checking
  3. the manuscript is not reproducible, and if anything is recomputed several manual operations need to be repeated

To create reproducible manuscripts, it would be ideal if a single master file could be edited by both subject matter experts and statisticians, with statisticians being responsible for all the code in the document in addition to descriptions of statistical methods and interpretation of analyses. With a reproducible manuscript, all calculations can be redone, tables and graphs recreated, and the new final manuscript produced by issuing a single command.

LaTeX has been the premier system for typesetting documents since the 1980s. It has its own markup language and is very easy to customize. The end result is PDF or HTML. LaTeX is used by many journal and book publishers and is heavily used in the physical and mathematical sciences. An excellent introduction to LaTeX may be found here and the LaTeX Cheat Sheet may be found here.

When you are working with a statistician, the statistician will have already created the document and the bibliographic database to cite, so collaborators need only concern themselves with editing paragraphs, sentences, section and subsection titles, and figure and table captions, and with citing from the database. Here are some notes that show what non-LaTeX users need to know to do such editing. The chunks of R code are edited by the statistician(s) on the team and should not be modified by collaborators, except for figure captions.

The goal here is to produce entire manuscripts that are reproducible, with final editing (including possible conversion to Word) deferred until submission time. During final editing, the journal's style is incorporated, and any statistical results that are supporting documentation rather than part of the body of the paper are either suppressed or printed in an appendix at the end.

Manuscript writing involves a great deal of back-and-forth thinking and writing by collaborators. The philosophy here is that comments, questions, answers, to-do lists, and marginal notes are contained in the body of the paper. At the very end, a command at the top of the document is changed so that these elements are not rendered in the final document. This process uses a custom LaTeX style we call spaper, for "paper with statistical results". The style may be downloaded from the attachments at the bottom of this page.

Here are the main considerations that non-statistician collaborators will need to know when editing/adding sentences, paragraphs, sections, subsections, and figure captions or when citing papers or books. Most of the sentences you write will be just ordinary text as you always write. There are a few exceptions:
  • The symbols $ % & # _ have special meaning to LaTeX. If you want one of them to appear in the text, escape it by preceding it with a backslash. Here is an example sentence: The treatment reduced elasticity by 23\% but cost \$9,580 per patient.
  • To keep a line break from happening at a space, replace the space with ~, e.g., See Figure~\ref{fig:anova}.
  • To put a phrase in italics use \emph{some text}, and use \textbf{for boldface}
  • To start a new paragraph, leave a blank line (no indenting needed)
  • To double quote a phrase, begin it with two single back ticks (``) and close it with two apostrophes (''). To single quote, use one of each of these characters.
  • To cite an article use \cite{key} where keys will be given to you by the statisticians
  • A subscript is written as $_{thing to subscript}$ . What is between $ and $ (no escape \) is typeset in math mode.
  • A superscript is specified as $^{thing to superscript}$.
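  • LaTeX normally treats a period followed by a space as the end of a sentence and inserts extra space there. To prevent this after a word or letter that ends in . without ending the sentence (e.g., an abbreviation), follow the period with a backslash and then a space, as in et al.\ reported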
  • To put a comment into the document that is to be ignored in every way when typesetting/printing the document, start the line with %. You can also put a % in any line of regular text, without escaping it with \. The text to the right of % will be ignored.
  • If you want to compose a numbered list, use the following example:
      \begin{enumerate}
      \item First thing
      \item Second thing
      \item Third thing
      \end{enumerate}
    To compose a bullet list instead, replace enumerate with itemize.
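Several of the rules above can be seen together in a short passage. This is only an illustrative sketch: the citation key smith2010, the reference to Smith et al., and the measurements mentioned are made up, and the figure label fig:anova is reused from the example above.

```latex
% This whole line is a comment and never appears in the PDF.
The treatment reduced elasticity by 23\% but cost \$9,580 per patient.
See Figure~\ref{fig:anova} for details. % text after the percent sign is ignored

% A blank line (as above) starts a new paragraph.
% smith2010 is a made-up citation key; real keys come from the statisticians.
We call this the ``primary'' analysis; it was \emph{prespecified} and
\textbf{not} data-driven, following Smith et al.\ in earlier
work~\cite{smith2010}.

Oxygen saturation (SpO$_{2}$) was recorded, and body surface area
was expressed in m$^{2}$.
```

Everything outside the commands is ordinary prose, so most edits amount to typing text as usual and adding a backslash or a pair of braces where one of the rules applies.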

Here are some special features of the spaper style for inclusion of comments / discussions / questions / to do lists / short marginal flags in the document.
  • To put a short string of text in the margin of the document at a certain point, type for example \snote{your initials} or \snote{CHECK}.
  • To write a comment of any length at a certain point, use the \com* command that the statistician defined for each author, e.g. \comfh{Some text}. To use the general \com command for an undefined author use \com{author initials or name}{comment of any length, line breaks ignored}.
  • To turn off comments and marginal notes so that they do not appear in the pdf document, put the following line anywhere in the document after \usepackage{spaper}. This way you do not need to remove the \com* or \snote commands in case you ever want to see their output again.
\ignorecom
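A hedged sketch of how these commands might sit in a working manuscript follows. It assumes that spaper.sty and Sweavel.sty are installed where LaTeX can find them and that spaper loads without options in an article-class document, neither of which has been checked against the style file; the initials AB and the sentences are hypothetical.

```latex
\documentclass{article}
\usepackage{spaper}   % style file from the attachments; requires Sweavel.sty
% \ignorecom          % uncomment at submission time to hide comments and notes

\begin{document}

Mortality was lower in the treated group.\snote{CHECK}
\com{AB}{Should we report the absolute risk difference here as well?}

\end{document}
```

Leaving the \snote and \com commands in place and toggling only the \ignorecom line keeps the discussion history in the source file while producing a clean PDF.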

The following are special to R and knitr
  • To reference a figure produced by R in the text, write e.g. \ref{fig:chunkname} where the referenced R chunk started with <<chunkname>>=
  • Sometimes the statistician will tell R to compute something such as a sample size or P-value. Once you know the names of the variables holding these values, you can insert them dynamically into the text using for example The evidence for a treatment effect was strong (P=\Sexpr{pvalue}).
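The two features can be sketched together in a fragment of an .Rnw file. The chunk label doseplot, the simulated data, and the caption are hypothetical; the fragment relies on knitr's default behaviour of labelling a captioned figure with fig: followed by the chunk label.

```latex
% The chunk label 'doseplot' plus a caption make knitr emit \label{fig:doseplot}
<<doseplot, fig.cap='Hypothetical scatterplot of outcome by dose'>>=
set.seed(2)                        # reproducible simulated data
dose    <- runif(50)
outcome <- 2 * dose + rnorm(50)
plot(dose, outcome)
# P-value for the slope, rounded to 2 significant digits
pvalue <- signif(summary(lm(outcome ~ dose))$coefficients[2, 4], 2)
@

Figure~\ref{fig:doseplot} shows the relationship. The evidence for a
treatment effect was strong (P=\Sexpr{pvalue}).
```

Collaborators would normally edit only the caption and the two sentences at the end; the R code between the <<...>>= and @ lines stays with the statistician.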

In the attachments below you will find the script for an example report (such scripts have a file type suffix of .Rnw or .rnw), the resulting final PDF file, and the spaper style file that is referenced by the document.

spaper.sty requires Sweavel.sty which is available here
Topic attachments
  • spaper.Rnw (2.7 K, 07 Jun 2015 - 09:00, FrankHarrell): Example LaTeX/knitr working manuscript
  • spaper.pdf (232.5 K, 07 Jun 2015 - 09:00, FrankHarrell): Example working manuscript using LaTeX, R, knitr, spaper style
  • spaper.sty (2.1 K, 07 Jun 2015 - 09:02, FrankHarrell): LaTeX style definition file for manuscripts with statistical analysis with R/knitr
Topic revision: r7 - 10 Jul 2017, FrankHarrell
 
