Recommendations, Analyses, and Data for Health Services Research, Diagnosis, and Prognosis Clinic
Notes 2015

2015 Dec 14

Sachin Patel, Psychiatry

  • Animal model for exposure to stress, looking at differential response to stress
  • Interested in susceptibility to stress
  • Measure of anxiety is a key measure (high = more anxious)
  • Each animal has a baseline measure
  • Would be good to do a Tukey mean-difference plot (Bland-Altman plot) to be sure that the delta is an adequate summary of the two measures
    • Also watch for floor and ceiling effects
  • Using the delta as a continuous stress response measure will optimize power and minimize arbitrariness
  • Discussed regression to the mean
  • Problem with choice of anxiety measure out of many
  • A composite measure may help, e.g., average z-score or average rank; can do Spearman rho rank correlation on the result, against another variable; can describe variability in ranks across anxiety measures
  • Otherwise analyses of disparate measures can be hard to reconcile
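The ideas above (mean-difference plot coordinates, a composite z-score, and a Spearman rank correlation) can be sketched in a few lines. This is a minimal illustration in Python using only the standard library; the data and function names are hypothetical and not taken from the study.

```python
from statistics import mean, stdev

def mean_difference_points(baseline, post):
    """(pair mean, delta) per animal: x/y coordinates of a Tukey
    mean-difference (Bland-Altman) plot."""
    return [((b + p) / 2, p - b) for b, p in zip(baseline, post)]

def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite_z(measures):
    """Average z-score per animal across several anxiety measures.
    Assumes every measure is oriented so that high = more anxious."""
    per_measure = [z_scores(m) for m in measures]
    return [mean(col) for col in zip(*per_measure)]

def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))

# Hypothetical anxiety scores for five animals (illustration only).
baseline = [10.0, 12.0, 9.0, 15.0, 11.0]
post = [14.0, 13.0, 15.0, 16.0, 18.0]
points = mean_difference_points(baseline, post)
deltas = [d for _, d in points]
# A trend of delta against the pair mean would suggest that the simple
# difference is not an adequate summary of the two measures.
trend = spearman([m for m, _ in points], deltas)
```

Plotting `points` (pair mean on the x-axis, delta on the y-axis) gives the mean-difference plot; `composite_z` takes a list of measures, each a list with one value per animal.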

2015 Dec 7th

Pierce Trumbo

  • Shade Tree Clinic, where patients who have no insurance or not enough insurance can get medical services.
  • Primary outcomes: number of ER visits, hospital length of stay. Will compare before and after patients visited the clinic.
  • N=680 patients; an estimated ~300 meet the inclusion criterion (time span between first and last visit of at least 1 year).
  • Needs an estimate for a VICTR application. Suggest $5000.

2015 Nov 30th

Andrew T. Hale, Medical Scientist Training Program

  • Need a quote for biostatistical support for a VICTR grant submission
  • Want to assess the association between brain tumor grade (total 128, 93 1s and 35 2s) and gender, age at diagnosis, edema (0-3), draining vein, necrosis, and location (8 different locations).
  • Will apply for a VICTR voucher in the amount of $2000.

Christopher John Prendergast, Tracy McGregor

  • We will specifically be seeking some guidance regarding graphical representation of data related to statin doses in children and adolescents.

Christopher Lee Brown

  • Discussed analysis for reviewer's comments

2015 Nov 23rd

Mark A. Clay, Divisions of Cardiology and Critical Care

  • The purpose of the study was to evaluate whether patients with single ventricle physiology undergoing the second stage of surgical palliation, whose length to weight ratio was >90%, were at higher risk for increased ICU length of stay, ventilator times, and increased non-invasive ventilation when compared to those whose length for weight was <90%. Analyzing the data with the Mann-Whitney U test, there was a statistically significant difference in ICU length of stay and ventilator hours for those with weight for length >90% compared to those <90%. However, I attempted to analyze the data again with Spearman's correlation to see if there was a correlation with increasing z-score percentile, and there was no statistically significant correlation.
  • Clinic question: Has the data been analyzed appropriately to answer the question? Should I be concerned that Spearman's correlation did not show a statistically significant correlation between the variables even though there was a statistically significant difference between the groups? Should I use linear regression, and how might I best demonstrate association or risk related to weight for length z-score >90% with it?

Rebekah Griesenauer (Conley), Biomedical Engineering

  • I am designing a study for a small group of human subjects to test the feasibility of a new tool that I designed for breast cancer assessment using medical images. I would like some guidance on effective study designs for a small number of patients and for determining the accuracy of a new tool when there is no current clinical equivalent to compare to.
  • Need a measurable outcome to calculate the required sample size

2015 Nov 16th

Aaron C. Shaver, M.D., Ph.D. Assistant Professor of Pathology, Microbiology, and Immunology

  • The csv consists of sample ID, the covariates I want to test (age as an integer and categorical variable; poor.risk through transcription, which are all categorical variables; and num.muts, which is an integer) and the OS and PFS data (for censoring rows, 0=censored and 1=dead). I would like to include the interaction between age and poor.risk, because I have biological reason to believe that that interaction is relevant. My questions concern: measuring goodness of fit of the model; how to interpret the interaction term; how to estimate power, given the large number of covariates and small sample size

2015 Nov 9th

Fernanda Maruri

  • "If possible I would like some help interpreting results of 2 Wilcoxon Rank Sum tests in which one is significant and the other is not."
  • Compare

Jessica Kaitlin Campbell

  • The goal of the project is to examine the impact that the palliative care unit has had on the medical intensive care unit in terms of patient length of stay and mortality. I have collected data regarding some parameters pre and post opening of the palliative care unit. I am interested in the best approach to analyzing the data.
  • Have data a year before and a year after the unit opened. Want to compare LOS and mortality in MICU. Both groups had palliative consult, only some patients after went to the palliative care unit.
  • Will apply for VICTR biostat support. Suggest $5000 for the study.

2015 Nov 2nd

Gabriella D. Cozzi

  • GDM project analysis. Association of household income and education with the five primary endpoints.

Michael Chomat

  • To discuss experimental study design and data analysis for a project within REDCap
  • Two REDCap databases can be merged on a common identifier.
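A merge on a common identifier can be sketched as follows, assuming both databases have been exported to CSV. The column names (`record_id`, `age`, `outcome`) and the data are hypothetical, and the sketch assumes one row per identifier in the second export.

```python
import csv
import io

def merge_on_id(rows_a, rows_b, key="record_id"):
    """Inner-join two lists of dict rows on a shared identifier column.
    Assumes each identifier appears at most once in rows_b."""
    by_id = {row[key]: row for row in rows_b}
    merged = []
    for row in rows_a:
        match = by_id.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged

# Hypothetical CSV exports (illustration only).
export_a = "record_id,age\n1,34\n2,51\n3,47\n"
export_b = "record_id,outcome\n2,yes\n3,no\n4,yes\n"

rows_a = list(csv.DictReader(io.StringIO(export_a)))
rows_b = list(csv.DictReader(io.StringIO(export_b)))

merged = merge_on_id(rows_a, rows_b)

# Records present in only one export are worth listing before analysis.
ids_b = {row["record_id"] for row in rows_b}
unmatched_a = [row["record_id"] for row in rows_a if row["record_id"] not in ids_b]
```

Listing the unmatched identifiers before analysis helps catch typos or inconsistent ID formats between the two databases.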

Jamie Robine

  • R questions about fitting logistic regression model and plotting the figure.

2015 Oct 19th

Rebecca Cox, Psychology

  • I am working with the data from the National Comorbidity Survey Replication, a nationally representative sample used to estimate prevalence rates of psychological disorders. I have questions about what types of analyses to use with a complex sampling design that includes strata, clusters, and weights.

  • Suggest reviewing the survey documentation to specify the correct strata, cluster, and weight variables.
  • Set up the complex survey design in the SPSS Complex Samples module.
  • Use the subpopulation command instead of subset analysis.

2015 Oct 5th

Stevenson, David, Health Policy

  • Stratified cluster randomized trial.
  • Intervention group: predicted mortality risk score obtained for all the patients, based on which "top patients" will be provided with hospice and will be expected to get better life quality. Control group: standard care. Individual agencies (50 in total) will be randomized to intervention/control group.
  • Cutoff of risk score may vary within and across the sites. Information obtained from prediction model: median life expectancy, probabilities of death during certain time periods.
  • If the primary outcome is continuous, would need SD to calculate sample size. If it's the time to event, we will need expected median time in each group.
  • Consider the flexible/sequential design, having pilot sites included in the final analysis.
  • Biostat resources: VICTR Voucher (35 hours). Dr. Matt Shotwell
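The sample size inputs mentioned above can be illustrated with standard normal-approximation formulas. This is a rough sketch, not the calculation used for the study: the SD, effect size, cluster size, intracluster correlation (ICC), and median times below are all hypothetical, and the events formula (Schoenfeld's approximation with exponential survival) does not itself account for clustering.

```python
import math
from statistics import NormalDist

def z_sum(alpha=0.05, power=0.80):
    """z_(1 - alpha/2) + z_(power) for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)

def n_per_arm_continuous(sd, delta, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-group comparison of means,
    assuming individual randomization and a common SD."""
    return 2 * (z_sum(alpha, power) * sd / delta) ** 2

def design_effect(cluster_size, icc):
    """Variance inflation for cluster randomization with equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

def events_needed(median_control, median_treat, alpha=0.05, power=0.80):
    """Total events for a 1:1 log-rank comparison (Schoenfeld approximation).
    Under exponential survival the hazard ratio equals the ratio of medians."""
    hazard_ratio = median_control / median_treat
    return 4 * z_sum(alpha, power) ** 2 / math.log(hazard_ratio) ** 2

# Hypothetical inputs (illustration only).
n_individual = n_per_arm_continuous(sd=10, delta=5)
deff = design_effect(cluster_size=20, icc=0.05)
n_clustered = math.ceil(n_individual * deff)   # patients per arm
clusters_per_arm = math.ceil(n_individual * deff / 20)
events = math.ceil(events_needed(median_control=6, median_treat=9))
```

With these made-up inputs, the design effect nearly doubles the per-arm sample size, which is why an ICC estimate matters as much as the SD in a cluster randomized design.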

Conor McWade, ED, PhD student

  • Apply for Voucher (90 hours)
  • Have collected car collision/victims data, demographics of the passengers, road characteristics.
  • Define collision as fatal vs serious.
  • Aim to develop a prediction model to predict the severity of collision based on location, time, etc.

2015 Sep 28

Katrina, Electrical Engineering

  • Our study is on incidence of eye disease seen at Vanderbilt. We have data on 33,000 patients looking at incidence of disease and I would like to discuss how to best analyze this data.
  • Whether incidence at Vanderbilt can represent incidence in Nashville


  • I was hoping to come to biostats clinic today to get some help with sample size calculations for my project.
  • Cross over design. Within subject correlation is 0.7. Need power calculation.
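For a two-period crossover analyzed through within-subject differences, the within-subject correlation enters through the SD of the difference. The sketch below uses a normal approximation and ignores period and carryover effects; the SD and effect size are hypothetical, and only the correlation of 0.7 comes from the notes.

```python
import math
from statistics import NormalDist

def crossover_n(sd, delta, rho, alpha=0.05, power=0.80):
    """Number of subjects for a paired (crossover) comparison of means.
    The variance of the within-subject difference is 2 * sd^2 * (1 - rho)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    var_diff = 2 * sd ** 2 * (1 - rho)
    return z ** 2 * var_diff / delta ** 2

# Hypothetical SD and effect size; rho = 0.7 as stated in the clinic notes.
n_subjects = math.ceil(crossover_n(sd=10, delta=5, rho=0.7))
# For comparison: the same calculation with uncorrelated measurements.
n_uncorrelated = math.ceil(crossover_n(sd=10, delta=5, rho=0.0))
```

The higher the within-subject correlation, the smaller the required sample, which is the main efficiency argument for a crossover design.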

Wes Clord

  • Power analysis of survival analysis

2015 Sep 21

Jose A Arriola, PGY 3 - Psychiatry

  • "I plan to implement a different type of interview in the first episode psychosis outpatient clinic at Vanderbilt Psychiatric Hospital and investigate how it contributes to improve adherence and management. The type of interview is called Shared decision making approach which is a little bit different to what we are used to. I am planning to train the MDs and providers on this technique and then compare measurable outcomes before and after the training. The outcomes would be no-show clinic rates, hospitalizations, etc. (things that are recorded automatically on the patient's chart). "
  • Statistical tests: Wilcoxon signed rank test for continuous outcomes, and McNemar's test for binary outcome
  • Primary outcome: number of times that the patient did not show up within 3 months, ranging 0-4. Use a proportional odds logistic model to analyze: number of no-shows after intervention ~ number of no-shows before intervention + age + gender
  • Sample size calculation: use PS for paired binary outcome
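McNemar's test for a paired binary outcome depends only on the discordant pairs. A minimal sketch of the exact (binomial) version, with hypothetical counts:

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.
    b = pairs that changed from 'yes' to 'no', c = pairs that changed from
    'no' to 'yes'. Under the null, b follows Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical example: 10 patients improved and 2 worsened after training.
p_value = mcnemar_exact(10, 2)
```

Concordant pairs (the same outcome before and after) do not enter the test, so a study with few discordant pairs has little power no matter how many patients it enrolls.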

Tamara Moyo, Hem/Onc

  • Want to correlate the resistance to therapy based on imaging with cell signal
  • Wilcoxon signed rank test (nonparametric counterpart of the paired t-test); Wilcoxon rank sum test (nonparametric counterpart of the two-group t-test)
  • Mixed-effects model to adjust for other covariates.

2015 Sep 14

Mhd Wael Alrifai, Neonatology

  • Name of project: Parenteral Protein Calculator (PPC)
  • Type: Randomized controlled clinical trial, un-blinded
  • Help needed: Discussing the primary and secondary outcomes, designing the database
  • Study status: IRB approved, enrollment starting next week
  • Research question: the effect of intervention on the accuracy of protein prescription. The primary endpoint is the ratio of target days to total days (target days are the days when prescriptions are given with correct amount).

Sudipa Sarkar

  • My research topic is on the effect of statins on non-alcoholic fatty liver disease
  • retrospective cohort study.

Michael C. Dewan, Department of Neurological Surgery

  • I am interested in discussing sample size calculations. We are conducting a clinical trial evaluating the effectiveness of a prophylactic antiepileptic drug (levetiracetam) in brain tumor patients. For 14 days following surgery, patients will be randomized to either drug or no drug. The primary outcome is the development of a clinical seizure and the follow-up time to primary endpoint is 14 days.
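For a binary primary outcome such as seizure within 14 days, a per-arm sample size for comparing two proportions can be sketched with the usual normal approximation. The seizure rates below are hypothetical placeholders, not estimates from the trial.

```python
import math
from statistics import NormalDist

def n_per_arm_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided comparison of two independent
    proportions (normal approximation, equal allocation, no continuity
    correction)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    term_null = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return (term_null + term_alt) ** 2 / (p1 - p2) ** 2

# Hypothetical rates: 20% seizures without drug vs 10% with drug.
n_arm = math.ceil(n_per_arm_two_proportions(0.20, 0.10))
```

The required size grows quickly as the two rates move closer together, so the assumed control-arm seizure rate deserves careful justification from prior data.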


  • I would like to address a few questions regarding sample size calculation for a translational study on the role of alternate complement activation in sickle cell lung disease

2015 Aug 24

Christopher Brown

  • Retrospective cross-sectional study on heart failure patients.
  • Outcome is the low potassium, related to urine output per hour.

Maya Yiadom, Emergency Medicine

  • Criteria for giving EKG to diagnose STEMI.
  • Trigger criteria: typical symptom, atypical S

Megan Pask, Tricia Russ, BME

  • Compare CT values between four groups.
  • Use nonparametric tests: Kruskal-Wallis test (counterpart of one-way ANOVA), Wilcoxon rank sum test (counterpart of the two-sample t-test)
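The Kruskal-Wallis statistic for a four-group comparison can be computed directly from the pooled ranks. The sketch below includes the standard tie correction; the values are hypothetical, and the result is compared against the chi-square critical value for 3 degrees of freedom, an approximation that is rough for very small groups.

```python
from collections import Counter

def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    counts = Counter(values)
    rank_of = {}
    seen = 0
    for v in sorted(counts):
        t = counts[v]
        rank_of[v] = seen + (t + 1) / 2
        seen += t
    return [rank_of[v] for v in values]

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:  # every observation identical
        return 0.0
    return h / correction

# Hypothetical measurements in four groups (illustration only).
groups = [
    [18.1, 19.4, 20.0],
    [21.3, 22.8, 23.5],
    [25.0, 26.2, 27.9],
    [30.1, 31.6, 33.0],
]
h_stat = kruskal_wallis_h(groups)
CHI2_CRIT_DF3 = 7.815  # 0.95 quantile of chi-square with 3 df
significant = h_stat > CHI2_CRIT_DF3
```

If the overall test suggests a difference, pairwise Wilcoxon rank sum tests between groups are the usual follow-up, with some adjustment for multiple comparisons.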

2015 Aug 17

Karl Zelik, Assistant Professor of Mechanical Engineering, Assistant Professor of Physical Medicine & Rehabilitation

  • sample size calculations for a grant proposal

2015 Aug 3

Lan Wu, PMI

  • Had questions about VICTR proposal review. Suggest using the Wilcoxon rank sum test or Wilcoxon signed rank test to compare between and within subjects, respectively
  • Try to identify a subset of b-cells in this set of subjects - will be able to provide descriptive statistics
  • Will consent 60 subjects, expecting to end up with about 30. Will quantify b-cells and compare them between two different locations. First get the percentage of b-cells in the mixture, then calculate the absolute number of b-cells per gram of tissue.
  • Will find SD from preliminary data and calculate required sample size based on that.

2015 June 29

Aaron Noll, VMS IV

  • Retrospective chart review of 1750 patients. Correlation between screening exam results and 15 diseases. 923 patients had actual visits within two years (gold standard of disease).
  • Analysis data set: two-by-two tables based on the 923 patients. Compare demographics between the 923 patients and the remaining 827 (1750-923) patients.
  • I am currently finishing a research project that is regarding various diagnoses that are able to be picked up on a screening exam (for diabetic retinopathy). To this point, I have calculated the following values for the 16 diagnoses of relevance: true positives/negatives, false positives/negatives, positive/negative predictive values, and sensitivities/specificities. However, I am unsure what the best test is to determine statistical significance or importance of these numbers--eg, do I use a 95% CI, odds ratio, etc. One issue with these results is that although I have a very large sample size for the initial screened population (over 900), many of the diagnoses have less than 5-10 true positive results.
  • Some cells have zero or near-zero counts. Use the Wilson confidence interval, e.g., binom.confint() in the R binom package.
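The Wilson interval can also be computed directly from its closed form. The sketch below is a standard-library Python version for illustration, separate from the R function named above; the counts are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_ci(successes, n, conf=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical sensitivity: 4 true positives among 5 patients with disease.
sens_low, sens_high = wilson_ci(4, 5)
# A cell with zero events still yields a usable interval.
zero_low, zero_high = wilson_ci(0, 10)
```

Unlike the simple Wald interval, the Wilson interval stays within [0, 1] and does not collapse to zero width when the observed count is 0, which is the situation described for diagnoses with very few true positives.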

2015 June 22

Nelleke van Wouwe, Department of Neurology

  • We are working on a grant and we have some questions about a power calculation for a Repeated Measures ANOVA (based on effect size from a previous study).

2015 June 15

Daniel J. Miller, Department of Psychology, Psychological Sciences

  • Discussion about microstimulation data to develop a test of the hypothesis that stimulating two areas in the brain from which evoked movements differ produces a blend of those movements (endpoint neuronal encoding)
  • Need help understanding how to organize the data in order to build a model to explain physiological results (e.g., how the dual stimulation sites interact)
  • Suggest apply for a $5000 VICTR voucher.

Kendall Anne Ulbrich, Pediatrics

  • I am requesting assistance in figuring out statistical significance. We see a trend in the data with the diagnosis of chronic lung disease leading to increased risk of death after trach placement vs other diagnosis.
  • Babies in NICU, outcome is alive/died, want to compare chronic lung disease to other diagnosis.
  • There were ~15 diagnoses, among whom 12 had chronic lung disease.
  • Total 115 babies (25 died in NICU). Primary outcome is the death in NICU. 8 (or 11) babies who had lung disease and died.
  • Plot Kaplan-Meier curve first for description, use log-rank test.
  • Can use Cox proportional hazard model to analyze the association between lung disease and survival in NICU.
  • Could also apply for a $2000 VICTR voucher.
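The Kaplan-Meier estimate suggested above can be sketched in a few lines. The follow-up times and event indicators are hypothetical, and treating babies who leave the NICU alive as censored is an assumption of this illustration, not something settled in the clinic.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    times  : follow-up time per subject
    events : 1 if the subject died at that time, 0 if censored
    Returns [(time, survival)] at each distinct death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in days (illustration only).
times = [5, 8, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
```

Computing one curve per diagnosis group (chronic lung disease vs other) gives the descriptive plot; the log-rank test and Cox model then formalize the comparison.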

2015 June 1

Robert Lentz, cardiology

  • My project is looking at radiation-induced atrial fibrillation, specifically in patients with breast and lung cancers. I have raw data extracted from the Synthetic Derivative and am hoping for some guidance regarding my data analysis plan and how I might be able to best display my data.
  • There are ~3000 breast cancer pts (125 had AF), ~2000 lung cancer pts.
  • To test the association between radiation and AF, include all pts (y=AF, x=radiation y/n, cancer side); then take subset of pts who had radiation, fit a model of radiation dose/side with AF.
  • Length of follow up is different for all patients. Can use survival analysis. If certain proportion of pts died before developing AF, should treat those pts as competing risks events.
  • If apply for VICTR voucher, suggest $5000.

Robert K. Tunney, Jr., Cardiology Resident

  • Email: My research is investigating statin dose intensification according to the ACC/AHA 2013 Cholesterol Guidelines in post-ACS patients. I am interested in performing logistic regression analysis on ~300 patients and potentially Spearman rank r correlation coefficient.
  • Two groups: historic control and intervention group. Binary outcome. Primary aim is to assess the outcome difference between groups.
  • Chi-sq test and multivariable logistic regression can be used to test the primary hypothesis.
  • Suggest propensity score adjustment.
  • Will apply for a VICTR voucher in the amount of $2000.

2015 May 11

Zac Cox, PharmD, BCPS

  • Email: I would like to request a biostats clinic reservation on Monday 5/11 from 12-1 for a comparative effectiveness research project. The main question is selection of the best primary outcome to maximize power for a population size that will be fixed (secondary to funding and patient enrollment). Our second question is the best statistical analysis method for 3 independent continuous variables (ANOVA vs the 2 experimental groups independently compared to the standard of care comparison arm). Please let me know if you would like me to send anything in advance.
  • Use continuous outcome to maximize power
  • Wilcoxon rank sum test to compare two new treatment groups to the standard care group
  • Multivariable linear model adjusting for baseline weight and treatment regimen

Michelle K. Roach, Obstetrics and Gynecology

  • Email: We will be completing a retrospective chart review looking at pregnancy and delivery outcomes in women with gestational diabetes. We plan to use a REDCap database for data entry.

2015 Apr 27

Aaron Noll, Medical Student

  • I am a third year medical student working on a research project that is evaluating the teleretinal imaging program at the Nashville VA Hospital. I attended one of your clinics about a month and a half ago and greatly appreciate the help I received at that time. I have now completed my data collection and am moving on to the data analysis portion of the research, and would like to discuss my revised project with you to see what the best way is for me to proceed.
  • As an overview, I am looking into the teleretinal screening program to evaluate its efficiency and its accuracy at diagnosing abnormalities other than diabetic retinopathy (the true purpose). I have recorded the data on the following topics:
    • Demographics (Age, sex, ethnicity)
    • Months from consult entry to screening
    • Days from screening until note loaded to chart
    • Screening diagnoses, diagnoses found at subsequent visits, and diagnoses found at previous visits
    • No-show rate for the screenings
    • Consult timing
    • Months since prior screenings and clinic visits
  • Had imaging readings and clinic diagnosis on ~1700 subjects. There were 18 diagnosis categories, looking at their agreement.
  • Will apply for VICTR voucher. Suggest $2000 for up to 35 hours

2015 Apr 20

Lexy Morvant, Pediatrics

  • NICU data analysis
  • Time trend of gestational age when receiving ECMO (2004-2014) for C-section babies. To evaluate the effect of a policy change (increased gestational age for C-section babies in 2007) on ECMO.
  • Only birth year is available. Fit a linear regression model.
  • The total number of all ECMO babies is also available. Assuming that the proportion of C-section babies remains the same, could fit a Poisson regression model.

2015 Apr 13


  • I have a retrospective dataset of patients who underwent a new cochlear implant programming procedure. The data contain pre- and post-intervention objective performance data, demographic data, and information about the cochlear implant type and location. I am trying to develop model(s) that can answer the following questions: 1) How can we predict whether a patient will be a responder to re-programming? 2) Which variables are most predictive of change in performance from baseline?
  • 177 patients.
  • Endpoint: measurement performance (0-100)
  • Predictors: 15 ~ 20
  • Fit a multivariable linear regression model. Predictor importance can be measured based on the model.

2015 Mar 9

Taylor Leath

  • We attended a biostats clinic on February 23rd to develop a statistical plan. Now that we have a dataset completed, we are having difficulty with our regression models and would appreciate your input.

2015 Feb 23

Katie Rizzone, M.D., Clinical Instructor, Orthopaedics and Rehabilitation

  • I would like to request a methods clinic (to review my methods) for a retrospective chart review study on female college athletes and stress fractures I am writing an IRB for.

Taylor Leath

  • I would like to reserve a time on Monday, February 23rd to develop an appropriate statistical plan for our study and dataset. I've attached the study protocol which details our specific aims and hypotheses. Our primary questions:
    • 1) Is linear regression the appropriate model to use? Predictors would be sex, age, years of education, participant's current health, trauma exposure and religiosity (all continuous except for sex), and the outcome variable would be each of the individual health states (GOSE 2-8). If so, this would mean six different regression models for the six health states?
    • 2) Alternatively, would it be more appropriate to develop one regression model that includes the health state (GOSE 2-8) as an additional predictor?
    • 3) Do we have sufficient sample size to answer our study questions? Current n=2156 after exclusions.
    • 4) We would also like to show whether the utility values for each of the six health states are significantly different from one another; would that simply be a within-subjects ANOVA with pairwise comparisons?
    • 5) Should we consider transforming the worse-than-death values?

2015 Jan 12

Dr. Heidi J. Silver, Ph.D., R.D Research Associate Professor of Medicine

  • Study of diet intervention, body composition, insulin resistance, lipo.
  • Could apply for a VICTR voucher of $4000.
Topic revision: r1 - 15 Jan 2021, DalePlummer
