Good Statistical Practice


Analysis File Sharing

Sometimes a researcher requests a copy of the analysis files we produce for long-term projects. If the researcher uses them only for administrative analyses (e.g., analysis of patient accrual by site and over time, or descriptive statistics on baseline variables), this usually presents no problem. Otherwise, the statistician needs to ensure that the following principle is respected: any analysis that involves inference or the discovery of features must be reproducible from a single script. The statistician needs to know about, and document, every analysis ever performed that was used to discover features or to make inferences (e.g., to compute P-values, confidence limits, or Bayesian posterior probabilities). The statistician also needs to be aware that analyses done by others may turn up promising leads that, when analyzed further by the statistician, create undocumented multiplicity problems that are difficult to correct for. In many cases the only way to practice reproducible research, one of the key philosophies of the Department of Biostatistics, is for the statistician to be the guardian of the analysis file. In some cases, having the statistician and the subject-matter researcher work from a common, detailed, written statistical analysis plan can provide adequate protection.
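The multiplicity problem can be made concrete with the simplest textbook case. Assume k independent tests of true null hypotheses, each performed at level α. The chance of at least one false-positive "discovery" is 1 − (1 − α)^k, and a Bonferroni correction would test each hypothesis at α/k. Both quantities depend on k, and the statistician cannot know k if some looks at the data were never documented. The sketch below rests on that independence assumption. Real exploratory analyses are usually correlated, so their exact numbers will differ. The function names are hypothetical.

```python
# Minimal sketch: why the number of analyses performed (k) must be known.
# Assumes k independent tests of true null hypotheses (a simplification).

def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among k independent
    tests of true null hypotheses, each performed at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error
    rate at or below alpha; it cannot be computed without knowing k."""
    return alpha / k


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"k={k:2d}  FWER={familywise_error_rate(k):.3f}  "
              f"Bonferroni per-test alpha={bonferroni_threshold(k):.4f}")
```

With α = 0.05, the family-wise error rate is 0.05 for a single test but roughly 0.64 for 20 tests. This is why every inferential look at the data needs to be documented.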


At a department meeting on 18 October 2006, statisticians in the department voted in favor of the following policy:
  • In a small dataset (e.g., fewer than 15 observations per category), it is mandatory to show the raw data in a graphic in a publication, and this cannot be done with dynamite plots (bar charts with error bars).

The vote was 22 in favor and 3 against. All three who voted against would have been in favor if "mandatory" were replaced with "usually". Therefore, the following is department policy:
  • Dynamite plots often hide important information. This is particularly true of small or skewed data sets. Researchers are strongly discouraged from using them, and department members have the option to decline participation in papers whose lead author requires the use of these plots.
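The point about small or skewed data sets can be shown with two hypothetical groups of five observations. These values are invented for illustration and are not real data. One group is symmetric. The other has four identical values and one outlier. Both groups have mean 3 and standard error 1, so their dynamite plots would be indistinguishable, yet their medians and raw values differ. The helper name `dynamite_summary` is hypothetical. It returns only the two numbers that a bar with an error bar displays.

```python
import math
import statistics


def dynamite_summary(values):
    """Return (mean, standard error of the mean): the only two numbers
    a dynamite plot (bar plus error bar) shows."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean, se


# Hypothetical data, invented for illustration only.
symmetric = [0, 2, 3, 4, 6]
skewed = [2, 2, 2, 2, 7]  # four identical values plus one outlier

if __name__ == "__main__":
    for name, group in [("symmetric", symmetric), ("skewed", skewed)]:
        mean, se = dynamite_summary(group)
        print(f"{name:9s} mean={mean:.2f} SE={se:.2f} "
              f"median={statistics.median(group)} raw={sorted(group)}")
```

Both groups produce the same bar and error bar. Plotting the raw points, for example as a dot plot or strip chart, would reveal the outlier and the clustering at 2 immediately.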
Topic revision: r5 - 19 Nov 2008, FrankHarrell

Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.