Statistical Reporting, Linking S Output with Report Documents, Literate Programming, Managing Analyses, and Documenting Programs and Data

Note: This page is out of date. Go to https://hbiostat.org/rr

  • Statistical Tables and Plots using S and LaTeX; FE Harrell (PDF with hyperlinks and bookmarks). This document shows all of the LaTeX and S code needed to produce it. It relies heavily on the Hmisc library's summary.formula function for semi-advanced table making and for converting selected tables to graphics. It shows how to get hyperlinks automatically in the final .pdf file using the LaTeX hyperref style, and how the LaTeX \input command can include tables and computed values created by S, so that a report can easily be updated, reassembled, and re-cross-referenced if any component table or plot changes. The Heiberger-Harrell latex function in Hmisc is used to interface S with LaTeX.
    • Zipped Word version of summary.pdf created by http://www.pdftoword.com | PDF version created by Word
    • OpenOffice Example of Table 12 of the above document. It was created by opening the HTML document produced by the htlatex command of the TeX4ht package (see SweaveConvert) and removing links to images so that they are saved in-line. The LaTeX code for that table was first saved into a file, and a prologue containing \usepackage{color,calc,epic} was inserted (for tables using the ctable style, ctable would also have to be included in this prologue).
    • Examples of David Whiting's additions to the latex function to allow finer control of typesetting
  • The rreport package for reporting clinical trials analyses
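As a rough sketch of the prologue step described for the OpenOffice example above, a standalone wrapper file might look like the following. The file name table12.tex and its contents are hypothetical stand-ins (written here with filecontents* so that the sketch compiles on its own); only the \usepackage{color,calc,epic} line comes from the description above.

```latex
% Hypothetical stand-in for the table code that the latex function would write from S.
\begin{filecontents*}{table12.tex}
\begin{tabular}{lrr}
\hline
Variable & Group A & Group B \\
\hline
Age (years) & 52 & 55 \\
\hline
\end{tabular}
\end{filecontents*}

\documentclass{article}
\usepackage{color,calc,epic}  % the prologue mentioned above
% \usepackage{ctable}         % add this if the table uses the ctable style
\begin{document}
\input{table12}               % pull in the saved table code
\end{document}
```

Running htlatex on such a wrapper (for example, htlatex wrapper.tex, where wrapper.tex is an assumed file name) should then produce the HTML document that is opened in OpenOffice.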

Features of LaTeX that make it excellent for report composition include:
  • the \input{filename} command for assembling LaTeX-coded tables created by S or SAS, or for including definitions of variable values to be included in the report (e.g., a P-value to insert in the text)
  • the \verbatimtabinput{filename} command (part of the LaTeX moreverb package) for including verbatim tabular listings that are not to be reformatted by LaTeX
  • the \includegraphics{filename} command for including graphics files (usually PostScript for latex and PDF for pdflatex)
  • automatic recompilation of the entire report by re-running the latex or pdflatex command whenever any of the component figures, variable values, or tables change, with automatic regeneration of cross-references, page numbers, etc.
  • excellent facilities for cross-referencing, table of contents, table of tables, table of figures, indexes, and bibliographic citations
  • the hyperref style in LaTeX makes all bibliographic citations, references, sections, subsections, tables of contents, indexes, etc., automatically hyperlinked in the final pdf document, with no work by the user; the table of contents is bookmarked
  • you can have "if-then" constructs in LaTeX for conditional output generation
  • by adding the pdfscreen style to the LaTeX master file you can produce a hyperlinked pdf document specially formatted for on-screen viewing
  • you can produce somewhat complex tables, e.g., tables showing small numerators and denominators next to percents, that are very readable
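Several of the features listed above can be seen together in a small master file. The following is a minimal sketch, not the layout used in the documents linked on this page: the file names (computed.tex, tab-events.tex, listing.txt, fig-survival.pdf), the macro names, the interim flag, and all numbers are hypothetical, and the filecontents* blocks merely stand in for files that S would normally write.

```latex
% Stand-ins for files that S would normally write (all names and values hypothetical).
\begin{filecontents*}{computed.tex}
\newcommand{\nSubjects}{120}
\newcommand{\pvalueTreatment}{0.03}
\end{filecontents*}
\begin{filecontents*}{tab-events.tex}
\begin{tabular}{lcc}
\hline
 & Placebo & Active \\
\hline
Any adverse event & 40\% {\scriptsize $\frac{24}{60}$} & 30\% {\scriptsize $\frac{18}{60}$} \\
\hline
\end{tabular}
\end{filecontents*}
\begin{filecontents*}{listing.txt}
id   age  sex
 1    54    F
 2    61    M
\end{filecontents*}

\documentclass{article}
\usepackage{graphicx}   % \includegraphics
\usepackage{moreverb}   % \verbatimtabinput
\usepackage{ifthen}     % \ifthenelse, \newboolean
\usepackage{hyperref}   % hyperlinks and bookmarks; loaded last

\input{computed}        % defines \nSubjects and \pvalueTreatment
\newboolean{interim}    % flag for conditional output
\setboolean{interim}{true}

\begin{document}
\tableofcontents
\listoftables

\section{Results}\label{sec:results}
\ifthenelse{\boolean{interim}}{\textbf{Interim analysis: results are preliminary.}}{}

A total of \nSubjects\ subjects were analyzed ($P = \pvalueTreatment$);
see Table~\ref{tab:events}.

\begin{table}
\centering
\caption{Adverse events by treatment}\label{tab:events}
\input{tab-events}      % table code written by S
\end{table}

% Include the figure only if it has already been generated.
\IfFileExists{fig-survival.pdf}
  {\includegraphics[width=\textwidth]{fig-survival}}
  {\fbox{fig-survival.pdf not yet generated}}

\section{Listing}
\verbatimtabinput{listing.txt}  % verbatim listing, not reformatted by LaTeX
\end{document}
```

Running pdflatex twice on this file resolves the cross-references and the table of contents. When S rewrites any of the component files, re-running pdflatex picks up the new values without any edits to the master file.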

One of the many features of S that make it an excellent tool for producing statistical reports is its object orientation. By writing both a table-producing method and a plot method for each type of analysis, you can convert tables to plots. See the Hmisc library's summary.formula function for many examples of this.

The University of Wisconsin Statistical Data Analysis Center uses S and LaTeX to produce primarily graphical reports for Safety and Data Monitoring Committees.
  • Literate programming: Writing documentation containing computer code. Documentation (and perhaps a statistical report) and code are maintained in one file. An extractor program such as FunnelWeb splits out the code to compile. H.P. Wolf and P. Naeve have done a lot of work in this area. In Peter Wolf's words:
    • Some years ago we have developed a system for reporting the steps of a data analysis. The system is based on the ideas of literate programming. Noweb and LaTeX are used to generate nice output. The result of the tangle path can be reloaded by our function revive() into the S-Plus interpreter. Then you can select and extract the elements of the old analysis, you can modify them and you can activate the statements again. Therefore, our tool can be used for teaching, demonstrations, case studies, ... We have constructed a lot of papers for our statistics courses in this way.
Their papers written in English include a nice "live" statistics text (A Revivable Book of Statistics), in which their revive() function is used to extract S-Plus code from the book for interactive replay by a user connected to an S-Plus session.
  • The Sweave approach to literate programming using R | SweaveLatex
  • Converting Sweave LaTeX documents for use in Word and for posting on web pages, and using the R odfWeave package to create OpenOffice documents
  • The Statdocs project at UC Berkeley, based on integrating XMLS, HTML, JavaScript, and R
  • Managing analysis projects using conditional processing of sections of S code: see Chapter 13 of Alzola & Harrell for information about the S do function and about using Makefiles on Windows. (This text also contains an example of using another, more flexible tool for managing program execution: Perl.) The do function makes it easy to run only the sections of the analysis that you want to redo. Each section can automatically generate its own listing output file, which is not overwritten by output files from other sections. The graphics files generated by each code section can automatically be given a section-specific file name prefix. The do function works especially well with batch job processing.
  • Reproducible electronic documents from Matt Schwab and Jon Claerbout of Stanford University. This approach is based on the make utility readily available for Unix, Linux, and Win95/98/NT. Final figures and calculations are easily regenerated by running make, which senses file dependencies and creation/modification dates to re-run whatever needs to be re-run to build the final product. Quoting Schwab and Claerbout, "It takes some effort to organize your research to be reproducible. We found that although the effort seems to be directed to helping other people stand up on your shoulders, the principal beneficiary is generally the author herself. This is because time turns each one of us into another person, and by making effort to communicate with strangers, we help ourselves to communicate with our future selves."
  • Examples of reproducible publications
  • Literate statistical practice by Anthony Rossini and Friedrich Leisch
  • Papers on reproducible research from the Bioconductor project
  • Reproducible research: The Bottom Line by Jan de Leeuw
  • Charles Geyer's excellent page on literate programming and related areas
  • Roger Peng's examples of reproducible research
  • Why should you avoid using point-and-click methods in statistical software packages by C Baum and S Sirin, Boston College
  • PERSPECTIVE, Hypertext Data Analysis Mapping software from Pharmaceutical Outcomes Research, Inc. This software allows analysts to track, review, communicate, and document analysis results using HTML.
  • Reproducibility in Econometrics Research by Roger Koenker. A document on that page describes many useful approaches, including an S-Plus function how.created that makes it easy to attach to an object the following information: comments, user name, date, and the environment in effect (e.g., the search list) when the object was created.
  • Tony Rossini, a developer of ESS ("Emacs Speaks Statistics"), is working on using Noweb with Emacs bookmarking facilities for tying S output chunks and figures to reports.
  • University of Michigan ICPSR Guide to Social Sciences Data Preparation and Archiving - includes information on data entry, quality control, data management, codebooks and other documentation, and archiving
  • Emacs/LaTeX/UNIX tools Information
  • Globally Accessible Statistical Procedures (web-based statistical programs)
  • "A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data" by Titus Brown, a benchmark example of reproducible research in genomics
  • Excellent knitr examples from The Statistical Sleuth
Topic attachments
  • latexFineControl.pdf (187.3 K, 19 Dec 2005 - 08:01, FrankHarrell): Fine Control of LaTeX Typesetting with latex()
  • s6a.odt (16.6 K, 20 Jul 2007 - 16:17, FrankHarrell): OpenOffice version of advanced table translated from LaTeX
  • summary.doc.pdf (561.0 K, 24 Nov 2009 - 13:35, FrankHarrell): summary.pdf -> summary.doc -> pdf created with Word
  • summary.pdf (567.0 K, 20 Jul 2007 - 16:46, FrankHarrell): Statistical Tables and Reports using S and LaTeX
  • summary.zip (357.9 K, 28 Oct 2009 - 18:01, FrankHarrell): summary.pdf from pdflatex converted to Word using PDFtoWord - zipped .doc file
  • summaryPDF2doc2PDF.pdf (770.8 K, 21 Dec 2011 - 13:21, FrankHarrell): summary.doc converted to pdf using wordtopdf.com
Topic revision: r29 - 23 Jun 2020, FrankHarrell
 
