Statistical Reporting, Linking S Output with Report Documents, Literate Programming, Managing Analyses, and Documenting Programs and Data
Note: This page is out of date. Go to https://hbiostat.org/rr
- Statistical Tables and Plots using S and LaTeX; FE Harrell (PDF with hyperlinks and bookmarks). This document shows all of the LaTeX and S code needed to produce the entire document. Heavy use is made of the Hmisc library's summary.formula function for semi-advanced table making and conversion of selected tables to graphics. The document shows how to automatically get hyperlinks in the final .pdf file using the LaTeX hyperref style. It also shows how easy it is to use the LaTeX \input command to include tables and computed values created by S, which allows a report to easily be updated, reassembled, re-cross-referenced, etc., if any component tables or plots change. The Heiberger-Harrell latex function in Hmisc is used to interface S with LaTeX.
- Zipped Word version of summary.pdf created by http://www.pdftoword.com | PDF version created by Word
- OpenOffice Example of Table 12 of the above document, created by opening the HTML document produced by the htlatex command of the TeX4ht package (see SweaveConvert) and removing links to images so they are saved in-line. LaTeX code for that table was first saved into a file and a prologue containing \usepackage{color,calc,epic} was inserted (for tables using the ctable style, ctable would have to be included in this prologue).
- Examples of David Whiting's additions to the latex function to allow finer control of typesetting
- The rreport package for reporting clinical trials analyses
Features of LaTeX that make it excellent for report composition include:
- the \input{filename} command for assembling LaTeX-coded tables created by S or SAS, or for including definitions of variable values to be included in the report (e.g., a P-value to insert in the text)
- the \verbatimtabinput{filename} command (part of the LaTeX moreverb package) for including verbatim tabular listings that are not to be reformatted by LaTeX
- the \includegraphics{filename} command for including graphics files (usually PostScript for LaTeX and PDF for pdflatex)
- automatic recompilation of the entire report by re-running the latex or pdflatex command whenever any of the component figures, variable values, or tables change, with automatic regeneration of cross-references, page numbers, etc.
- excellent facilities for cross-referencing, table of contents, table of tables, table of figures, indexes, and bibliographic citations
- the hyperref style in LaTeX makes all bibliographic citations, references, sections, subsections, tables of contents, indexes, etc., automatically hyperlinked in the final PDF document, with no work by the user; the table of contents is bookmarked
- you can have "if-then" constructs in LaTeX for conditional output generation
- by adding the pdfscreen style to the LaTeX master file you can produce a hyperlinked PDF document specially formatted for on-screen viewing
- you can produce somewhat complex tables, e.g., tables showing small numerators and denominators next to percents, that are very readable
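The \input item above, for computed values such as a P-value, can be illustrated with a minimal sketch. It is written in Python purely for illustration, not in S, and it is not the Hmisc latex function. The function names latex_macro and write_definitions, the file name values.tex, and the numbers are all hypothetical.

```python
import tempfile
from pathlib import Path


def latex_macro(name: str, value: str) -> str:
    """Return one LaTeX definition line, e.g. \\newcommand{\\pvalue}{0.032}."""
    # LaTeX control words may contain letters only, so reject anything else.
    if not (name.isascii() and name.isalpha()):
        raise ValueError("LaTeX macro names may contain letters only")
    return f"\\newcommand{{\\{name}}}{{{value}}}\n"


def write_definitions(path: Path, values: dict[str, str]) -> None:
    """Write one definition per computed value to a file meant for \\input."""
    path.write_text("".join(latex_macro(k, v) for k, v in values.items()))


if __name__ == "__main__":
    # Hypothetical output file and made-up values, for illustration only.
    out = Path(tempfile.gettempdir()) / "values.tex"
    write_definitions(out, {"pvalue": "0.032", "nsubjects": "214"})
    print(out.read_text(), end="")
```

Under these assumptions, the master file would contain \input{values} in its preamble and refer to \pvalue in the text, so rerunning the analysis and then latex or pdflatex updates the reported number without hand editing.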
One of the many features of S that make it an excellent tool for producing statistical reports is its object orientation. By writing both a table-producing method and a plot method for each type of analysis, you can convert tables to plots. See the Hmisc library's summary.formula function for many examples of this.
Click here for information about how the University of Wisconsin Statistical Data Analysis Center uses S and LaTeX for producing primarily graphical reports for Safety and Data Monitoring Committees.
- Literate programming: Writing documentation containing computer code. Documentation (and perhaps a statistical report) and code are maintained in one file. An extractor program such as FunnelWeb splits out the code to compile. H.P. Wolf and P. Naeve have done a lot of work in this area. In Peter Wolf's words:
- Some years ago we have developed a system for reporting the steps of a data analysis. The system is based on the ideas of literate programming. Noweb and LaTeX are used to generate nice output. The result of the tangle path can be reloaded by our function revive() into the S-Plus interpreter. Then you can select and extract the elements of the old analysis, you can modify them and you can activate the statements again. Therefore, our tool can be used for teaching, demonstrations, case studies, ... We have constructed a lot of papers for our statistics courses in this way.
Click here for their papers that are written in English. These include a nice "live" statistics text (A Revivable Book of Statistics) in which their revive() function is used to extract S-Plus code from the book for interactive replay by the user who is connected to an S-Plus session.
- The Sweave approach to literate programming using R | SweaveLatex
- Converting Sweave LaTeX documents for use in Word and posting on web pages, and using the R odfWeave package to create OpenOffice documents
- The Statdocs project at UC Berkeley, based on integrating XMLS, HTML, JavaScript, and R
- Managing analysis projects using conditional processing of sections of S code: see Chapter 13 of Alzola & Harrell for information about the S do function and using Makefiles on Windows. (This text also contains an example of using another, more flexible tool for managing program execution: Perl.) The do function makes it easy to run only the sections of the analysis that you want to redo. Each section can automatically generate its own listing output file, which is not overwritten by output files from other sections. The graphics files generated by each code section can automatically be given a section-specific file name prefix. The do function works especially well with batch job processing.
- Reproducible electronic documents from Matt Schwab and Jon Claerbout of Stanford University. This approach is based on the make utility readily available for Unix, Linux, and Win95/98/NT. Final figures and calculations are easily regenerated by running make, which senses file dependencies and creation/modification dates to re-run whatever needs to be re-run to build the final product. Quoting Schwab and Claerbout, "It takes some effort to organize your research to be reproducible. We found that although the effort seems to be directed to helping other people stand up on your shoulders, the principal beneficiary is generally the author herself. This is because time turns each one of us into another person, and by making effort to communicate with strangers, we help ourselves to communicate with our future selves."
- Example of a reproducible publication and here
- Literate statistical practice by Anthony Rossini and Friedrich Leisch
- Papers on reproducible research from the Bioconductor project
- Reproducible research: The Bottom Line by Jan de Leeuw
- Charles Geyer's excellent page on literate programming and related areas
- Roger Peng's examples of reproducible research
- Why should you avoid using point-and-click methods in statistical software packages by C Baum and S Sirin, Boston College
- PERSPECTIVE, Hypertext Data Analysis Mapping software from Pharmaceutical Outcomes Research, Inc. This software allows analysts to track, review, communicate, and document analysis results using HTML.
- Reproducibility in Econometrics Research by Roger Koenker. A document on that page describes many useful approaches, including an S-Plus function how.created that makes it easy to attach to an object the following information: comments, user name, date, and the environment in effect (e.g., the search list) when the object was created.
- Tony Rossini, a developer of ESS ("Emacs Speaks Statistics"), is working on using Noweb with Emacs bookmarking facilities for tying S output chunks and figures to reports.
- University of Michigan ICPSR Guide to Social Sciences Data Preparation and Archiving - includes information on data entry, quality control, data management, codebooks and other documentation, and archiving
- Emacs/LaTeX/UNIX tools Information
- Globally Accessible Statistical Procedures (web-based statistical programs)
- "A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data" by Titus Brown, a benchmark example of reproducible research in genomics
- Excellent knitr examples from The Statistical Sleuth
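The make-based approach in the Schwab and Claerbout item above rests on one rule: a product is rebuilt when it is missing or older than something it depends on. The following is a minimal sketch of that rule in Python, used here only for illustration. It is not how make is implemented and not the authors' tooling. The function name needs_rebuild and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target: Path, dependencies: list[Path]) -> bool:
    """True if target is missing or older than any of its dependencies."""
    if not target.exists():
        return True
    target_time = target.stat().st_mtime
    return any(dep.stat().st_mtime > target_time for dep in dependencies)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"      # hypothetical input file
        figure = Path(tmp) / "figure.pdf"  # hypothetical derived product
        data.write_text("x,y\n1,2\n")
        print(needs_rebuild(figure, [data]))  # figure does not exist yet

        figure.write_text("placeholder")
        os.utime(data, (1_000, 1_000))        # set data timestamp: old
        os.utime(figure, (2_000, 2_000))      # figure built after the data
        print(needs_rebuild(figure, [data]))

        os.utime(data, (3_000, 3_000))        # data changed after the figure
        print(needs_rebuild(figure, [data]))
```

A real Makefile expresses the same dependencies declaratively and also chains them, so that a changed data file triggers the figure and then the report that includes it.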