Suggested Guidelines for Rigorous, Reproducible Research: Pre-specifying a Statistical Analysis Plan

Randomized controlled clinical trials (RCCTs) require a pre-specified statistical analysis plan (SAP) to ensure the study's integrity and credibility while enabling reproducibility of the final analysis. A pre-specified plan may be even more important for observational research, in which there is no unique path to choosing the statistical analysis. Regardless of the study design, the investigator and collaborating biostatisticians should follow a rigorous scientific approach [1, 2]. This is aligned with the ASA's Ethical Guidelines for Statistical Practice, which encourage professional citizenship and responsibility.

One step in operationalizing this approach is to formulate a full analysis plan before looking at the data. The only modifications to the plan should result from finding, through objective examination, too much missing data or unreliability in the variables involved. (These kinds of examinations can also be done before the analysis plan is written.) As of November 2011, it is the policy of the Department of Biostatistics that all studies and experiments in which department biostatisticians are involved, other than brief and/or simple consultations, have a dated, pre-specified SAP on file that has been approved by the investigator and the lead statistician (ideally on our wiki under the central projects area, or on a server), with any changes documented with date and reason. Care should be exercised when dealing with confidential plans.
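
The objective examination of missing data mentioned above can be done without touching the outcome-exposure relationship. The following is a minimal sketch, assuming records are stored as Python dictionaries; the function names, the variable names, and the 50% threshold are all hypothetical and would be replaced by whatever the plan itself pre-specifies.

```python
import math


def is_missing(value):
    """Treat None and floating-point NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fractions(rows, variables):
    """Fraction of records with a missing value, per variable."""
    n = len(rows)
    return {
        v: sum(is_missing(row.get(v)) for row in rows) / n
        for v in variables
    }


def flag_variables(rows, variables, max_missing):
    """Variables whose missing fraction exceeds a pre-specified threshold."""
    fractions = missing_fractions(rows, variables)
    return sorted(v for v, f in fractions.items() if f > max_missing)


# Illustrative records only (made-up values, hypothetical variable names).
records = [
    {"age": 50, "crp": None},
    {"age": None, "crp": None},
    {"age": 61, "crp": 2.1},
    {"age": 47, "crp": float("nan")},
]
print(missing_fractions(records, ["age", "crp"]))
print(flag_variables(records, ["age", "crp"], max_missing=0.5))
```

Because the screen looks only at completeness, not at associations with the outcome, it does not amount to data snooping.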

Analysis Plan Strategy

A detailed analysis plan is vital to conducting quality research. It communicates the goals, hypotheses, and methodology of the study, supports scientific integrity, and helps establish reproducibility. The researchers should spend adequate time designing the study to address the research question. The plan should be limited to the scientific goals, questions, and/or hypotheses. Bias caused by exploration should be estimated (e.g., using the bootstrap or 100 repeats of 10-fold cross-validation), repeating all analysis steps for each resample [3]. It is generally not sufficient to end an article with a "need to replicate" statement. The researchers should:

  • Get buy-in from senior collaborators [2]
    • Here are some steps for meeting with researchers to develop analysis plans, and for following through with them.
    • Deviations from the plan should be discussed with the appropriate individuals before work is begun. All agreed changes should be documented.
  • Pre-specify strategy for formulating the final analysis
    • Specify a priority order for analyses and reporting.
    • The plan should include but is not limited to the scientific goals, strategies, hypotheses, outcome(s), covariates, and confounders.
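
The recommendation to estimate the bias caused by exploration, repeating all analysis steps for each resample, can be sketched with one common form of the bootstrap, the optimism bootstrap. The code below is a hedged illustration, not a prescribed implementation: the "exploration" is assumed to be picking the single candidate predictor most correlated with the outcome, the performance measure is assumed to be R-squared, the data are simulated pure noise, and every function name is hypothetical.

```python
import random
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation; 0.0 if either variable is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def select_and_fit(X, y):
    """The whole analysis: choose the predictor with largest |r|, fit a line."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    j = max(range(len(cols)), key=lambda k: abs(pearson_r(cols[k], y)))
    mx, my = fmean(cols[j]), fmean(y)
    sxx = sum((a - mx) ** 2 for a in cols[j])
    sxy = sum((a - mx) * (b - my) for a, b in zip(cols[j], y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return j, my - slope * mx, slope


def r_squared(model, X, y):
    """R-squared of a fitted (index, intercept, slope) model on any data."""
    j, intercept, slope = model
    my = fmean(y)
    sse = sum((yi - (intercept + slope * row[j])) ** 2 for row, yi in zip(X, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 0.0 if sst == 0 else 1.0 - sse / sst


def optimism_corrected_r2(X, y, n_boot=200, seed=0):
    """Return (apparent R-squared, optimism-corrected R-squared)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = r_squared(select_and_fit(X, y), X, y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Selection AND fitting are redone inside every resample.
        model_b = select_and_fit(Xb, yb)
        optimism.append(r_squared(model_b, Xb, yb) - r_squared(model_b, X, y))
    return apparent, apparent - fmean(optimism)


def make_noise_data(n, p, seed):
    """Simulated data in which no predictor is related to the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return X, y


X, y = make_noise_data(n=60, p=20, seed=1)
apparent, corrected = optimism_corrected_r2(X, y)
print(f"apparent R^2 = {apparent:.3f}, corrected R^2 = {corrected:.3f}")
```

The key design choice is that the variable selection sits inside the resampling loop. If only the final fitted line were resampled, the bias introduced by the selection step would go unmeasured.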

Changes to Plan

The analysis plan is not considered to be a contract, but rather a set of goals and guidelines for the analysis of research outputs. Nevertheless, deviations from the plan should be documented so that they may be taken into consideration when statistical analyses and modeling are carried out.

Possible circumstances under which the plan might be changed include:

  • Changes in the objectives of the research team that were not prompted by data snooping
  • Loss of data, or failure to acquire intended data
  • Excessive amount of missing data in a pre-specified analysis variable, without an opportunity to improve data completeness
  • Additional data that become available and were not expected to be available when the analysis plan was formulated
  • The inability of acquired data to satisfy the assumptions anticipated in the analysis plan
  • Results suggest that the research phenomenon is more or less general than had been anticipated
  • Results suggest that estimates will be insufficiently precise to meet objectives
  • Results suggest that the planned statistical approach will be inappropriate (e.g., a more complicated model is required, perhaps with additional covariates)

While changes due to these (and other) eventualities are sometimes inevitable, failing to document such changes can result in invalid statistical inference in some circumstances.
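
One way to keep changes documented with date and reason is a small append-only log stored alongside the plan. The following is only a sketch under stated assumptions: the class names, field names, dates, and example entries are hypothetical and do not describe the department's actual filing system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Amendment:
    """One documented change to the plan (hypothetical structure)."""
    changed_on: date
    reason: str
    description: str
    approved_by: Tuple[str, ...]


@dataclass
class AnalysisPlan:
    """An approved, dated plan plus its documented changes."""
    title: str
    approved_on: date
    approved_by: Tuple[str, ...]
    amendments: List[Amendment] = field(default_factory=list)

    def amend(self, changed_on, reason, description, approved_by):
        """Record a change; refuse entries without a reason or a valid date."""
        if not reason.strip():
            raise ValueError("every change needs a stated reason")
        if changed_on < self.approved_on:
            raise ValueError("a change cannot predate the approved plan")
        entry = Amendment(changed_on, reason.strip(),
                          description.strip(), tuple(approved_by))
        self.amendments.append(entry)
        return entry

    def history(self):
        """Chronological, human-readable record of the plan and its changes."""
        signers = ", ".join(self.approved_by)
        lines = [f"{self.approved_on.isoformat()}  plan approved by {signers}"]
        for a in sorted(self.amendments, key=lambda a: a.changed_on):
            lines.append(f"{a.changed_on.isoformat()}  {a.description} "
                         f"(reason: {a.reason})")
        return lines


# Illustrative entries only; the study, dates, and roles are made up.
plan = AnalysisPlan("Example cohort study", date(2020, 1, 15),
                    ("investigator", "lead statistician"))
plan.amend(date(2020, 3, 2),
           reason="excessive missing data in a pre-specified covariate",
           description="covariate removed from the primary model",
           approved_by=["investigator", "lead statistician"])
print("\n".join(plan.history()))
```

Making each entry immutable and rejecting changes without a reason keeps the log usable later, when deviations need to be taken into account in interpreting the analysis.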


[1] Donald B. Rubin. The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized studies. Stat Med, 26:20-36, 2007.
[2] Laine Thomas and Eric D. Peterson. The value of statistical analysis plans in observational research: Defining high-quality research from the start. JAMA, 308:773, 2012.
[3] Frank Harrell Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer New York, 2001.


American Statistical Association statement on how to improve grant applications that use statistics.
Topic revision: r11 - 01 Jun 2017, FrankHarrell
 
