Recommendations, Analyses, and Data for Health Services Research, Diagnosis, and Prognosis Clinic
Notes 2018


Shawniqua Williams Roberson, Neurology

  • Seeking VICTR biostatistics voucher
  • We conducted a 35-question survey among epilepsy patients in the outpatient clinic to explore racial and socioeconomic differences in attitudes toward epilepsy care. Hypothesis: African Americans express less trust in their providers and perceive greater danger from surgery than other populations. Question: would like assistance in developing a statistical analysis plan and statistician support for completing the analysis.
  • Prelim analysis done on 36 subjects. Would like to complete analysis and produce paper.
  • Survey pulled from literature (previously published in Canada). n=144 (123 able to be analyzed; 20 unable to complete, 1 aborted during interview). Survey delivered as an interview. Qs about epilepsy are categorical/binary. Qs about providers are Likert scale. Demographic Qs are categorical/ordinal. Data are in REDCap, exported into Excel.
  • Goals: validate survey, produce demographics, inferential analysis looking at relationships between race, attitudes towards providers and towards surgery.
  • Next steps: In StarBrite, go to Funding > Apply Here. At one point (under Resources part of application) it'll ask for the type of support you want, specify a biostatistics voucher. The VICTR voucher is flat-priced and will automatically populate the budget. In Documents, will need to put together a 5-page written application. (Tom will send a template for this 5-page document by email.) Correspond with Tom to agree on stats section, before submitting application.

Brenda Pun, Pulmonary

  • Seeking VICTR biostatistics voucher
  • As part of my DNP dissertation I worked on a survey of ICU interprofessionals about teamwork and healthy work environment. My dissertation focused on those data from one site as a pilot study. Since then I have worked with a national professional society to collect the same data from 6000+ ICU professionals nationally. I am planning to submit a VICTR resource request for the funding to support the statistical analyses of the national dataset.
  • Goal: implement a critical care bundle. Premise: teamwork matters. Resurveyed staff 14 months after the initial survey. AITCS and HWE scales given to all staff in critical care in 68 hospitals. The collaborative is all anonymous; incorporated the dan-rosh (sp?) method to pair pre/post responses (30% of post-collaborative responses possibly able to be paired with pre responses).
  • Now: secondary analysis in this project. 1. descriptive at baseline. 2. what factors influence teamwork scores? 3. is there a difference before/after collaborative? 4. are there any predictors of this change pre/post?
  • Funding: funding secured through professional organization. Would need a contract (through the cost center): funding would go to you, the researcher, then would come to biostats as the analysis is done. Able to apply for VICTR voucher, if you like.
  • Deadline: aiming to have manuscript out by end of spring 2019.
  • Thomas Stewart to be in touch via email to follow-up.


Sophia Delpe, Urology

  • Seeking VICTR biostatistics voucher, mentor confirmed.
  • Our study is a cross sectional survey sent to women >18. We would like to look at the prevalence of fecal incontinence and the relationship between that and psychosocial disorders/social interaction.
  • Questionnaire on REDCap assessing toileting behavior. Approx. 4789 patients.
  • More of a descriptive study, so results should describe the distribution of responses (e.g., histograms). To assess bivariate relationships, recommended presenting cross-tabulations for categorical and Likert-scale questions. Could use regression models: tendency to stay home modeled by symptoms, etc. Next step for future work would be to control for covariates (e.g., age).
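
A minimal sketch of the recommended cross-tabulation using pandas, assuming the REDCap export is loaded as one row per respondent. The column names (`fi_frequency`, `stays_home`), the response categories, and the eight rows are hypothetical placeholders, not the study's actual fields or data.

```python
import pandas as pd

# Hypothetical extract: one row per respondent (illustrative values only)
df = pd.DataFrame({
    "fi_frequency": ["Never", "Rarely", "Often", "Never", "Often", "Rarely", "Never", "Often"],
    "stays_home":   ["No",    "No",     "Yes",   "No",    "Yes",   "Yes",    "No",    "No"],
})

# Likert levels in their natural order (crosstab would otherwise sort alphabetically)
levels = ["Never", "Rarely", "Often"]

# Counts with row/column totals ("All")
counts = pd.crosstab(df["fi_frequency"], df["stays_home"], margins=True).reindex(levels + ["All"])

# Row percentages: distribution of the outcome within each symptom level
row_pct = pd.crosstab(df["fi_frequency"], df["stays_home"], normalize="index").reindex(levels) * 100

print(counts)
print(row_pct.round(1))
```

Row percentages answer "among women reporting this symptom level, what share stay home?", which is usually the comparison readers want from a descriptive table.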


Caroline Thomas, Pediatric Pulmonology

  • Seeking VICTR biostatistics voucher.
  • Retrospective chart review of pediatric patients with obstructive sleep apnea (OSA) who underwent tonsillectomy and adenoidectomy and were then placed on positive airway pressure (PAP). We would like to determine whether there are predictors of adherence to PAP, specifically looking at: sex, race, insurance, weight, BMI, developmental status, presence of a genetic disorder, autism, and/or psychiatric disorder, age at diagnosis of OSA, initial findings on sleep report, time to initiation of PAP post-surgery, other surgeries, presence of a PAP titration study, presence of comorbid sleep disorders, follow-up visits to sleep clinic, use of auto or fixed PAP settings, use of psychotropic medications, and nightly usage data from PAP downloads.
  • Adherence outcome measured as hours of use in the first 6 months; adherence is defined as at least 4 hours per night. n=117, with download data for 67. Other variables of interest: development/neurodevelopment (prior diagnosis) and a binary verbal variable.
  • Statistical software recommendations: previous use of Stata so will continue to use. Missing data will likely be approached with multiple imputation.
  • Suggestions: Descriptive statistics by adherence. Covariates: baseline CPAP score, age, development, verbal, (possibly) weight, and an interaction between age and developmental status. Limit the model to 6-8 parameters because of the sample size of 67.
  • Next steps: 1) Send Tom Stewart an email to get started with VICTR biostats voucher and work to get something together for abstract (due December).
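
The 6-8 parameter limit comes from an observations-per-parameter rule of thumb. A minimal sketch, assuming roughly 8-10 observations per parameter (a convention, not a formal criterion). For a binary outcome the limiting count is the smaller of the adherent and non-adherent groups, which is why dichotomizing hours at 4 per night shrinks the budget; the adherent count of 30 below is hypothetical.

```python
def parameter_budget(n, events=None, obs_per_param=10):
    """Rough maximum number of model parameters (degrees of freedom).

    n: number of subjects with outcome data.
    events: for a binary outcome, the number with the outcome (e.g., adherent);
            leave as None for a continuous outcome such as hours of use.
    obs_per_param: assumed observations needed per parameter (rule of thumb).
    """
    effective_n = n if events is None else min(events, n - events)
    return effective_n // obs_per_param

print(parameter_budget(67))                   # -> 6  (continuous hours of use)
print(parameter_budget(67, obs_per_param=8))  # -> 8  (slightly more lenient ratio)
print(parameter_budget(67, events=30))        # -> 3  (binary >= 4 h/night, hypothetical 30 adherent)
```

Each interaction term, extra category, or spline knot counts against this budget, so the age-by-development interaction uses at least one of the 6-8 slots.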

Alexander Sherry, Radiation Oncology

  • Seeking VICTR voucher
  • Prospective trial of concurrent chemoradiation in adjuvant treatment of breast cancer. Our question regards a power calculation for our primary aim. Would be happy to provide more details (protocol) prior to meeting.
  • Feasibility study. Primary aim: grade 3/4 clinician-derived toxicity during treatment (binary endpoint). Sample size calculation gave 17, but there were questions from the VICTR studio.
  • Recommendations: Perform a precision analysis to estimate what the study will yield (how precisely the toxicity rate can be estimated), regardless of how big a difference there is. In a feasibility study, the main objectives are to show that you can enroll patients (within reasonable time, resources, etc.) and that you can measure what you're trying to measure. Deriving and validating another quantitative measure in the feasibility study could allow for a more efficient full study: a "feasibility/measurement study".
  • Concerns: Distinguishing grade 3 from grade 4 toxicity would require more samples. Could possibly consider ordinal regression, depending on the proportions of grade 3 and grade 4. Ask what is being estimated, and either bump the sample size up by a factor of 10 or be aware of and transparent about what the current sample size can show. Noise requires more samples. A typical sample size is 384 for a margin of error of ±0.05 (a 95% confidence interval about 0.1 wide) around a proportion near 0.5. Non-inferiority sample sizes are even larger.
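
To make the precision analysis concrete, a minimal sketch that computes the 95% CI n = 17 would give for a few hypothetical toxicity counts, plus the normal-approximation sample size for a target margin of error. The Wilson interval is our choice here (it behaves better than the simple Wald interval at small n); the commonly quoted 384 is 1.96² × 0.25 / 0.05² before rounding up.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile

def wilson_ci(x, n, z=Z95):
    """Wilson score 95% CI for a proportion (x events out of n)."""
    p = x / n
    d = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / d
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / d
    return centre - half, centre + half

def n_for_margin(moe, p=0.5, z=Z95):
    """Normal-approximation sample size so the 95% CI half-width is at most moe."""
    return math.ceil(z**2 * p * (1 - p) / moe**2)

# What n = 17 can show: CIs for a few hypothetical grade 3/4 toxicity counts
for x in (1, 3, 5, 8):
    lo, hi = wilson_ci(x, 17)
    print(f"{x}/17 with toxicity: 95% CI {lo:.2f} to {hi:.2f}")

print(n_for_margin(0.05))  # -> 385 (about 384 before rounding up)
print(n_for_margin(0.10))  # -> 97
```

Intervals roughly 0.4 wide at n = 17 are the kind of "what the current sample size can show" statement the recommendation asks to make transparent.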


Alan Tate, ENT Clinical Instructor Faculty

  • Trying to export REDCap data with certain criteria and then categorize. Previously attended a REDCap clinic.

  • Study involves voice patients, about five years of data. Four groups: voice therapy (VT) alone, PT alone, and combined VT and PT. Observational study; patients essentially selected their own group. Two questions: how were the groups different at baseline, and how were they different after therapy? Could look at baseline group differences using a bivariate approach, then perhaps a multivariable approach for the second question.

  • Possible biostat voucher. Email Tom Stewart for VICTR application.

Christopher Gray, Neuro/Stroke

  • Requests assistance with data interpretation for a review of current Kcentra protocol for intracranial hemorrhage.
  • Previous clinic visit on Thurs, Sep 13, 2018: Requested advice on how to present data meaningfully. Advised to 1) look at outcome (death at 30 days) in a logistic regression with the size of the bleed as the independent variable, and 2) look at severity of Rankin score at 30 days using proportional odds (ordinal) regression.
*Feedback: Don't use correlation for binary variables. Try to show change in Rankin score with a profile plot; the current plot does not show change well. The question still needs to be defined: right now all subjects got Kcentra with weight-based dosing, and the hospital may switch to standard dosing if weight-based dosing is not effective. Stick to the outcome of probability of treatment success.
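
A self-contained sketch of the first recommendation (30-day death modeled on bleed size by logistic regression), fitted by Newton-Raphson in NumPy so each step is visible. The data are simulated, and the volume scale (mL, reported per 10 mL) is an assumption; for the real analysis a standard routine such as Stata's `logit` or R's `glm(family = binomial)` would be the usual choice.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    X: n x p design matrix including an intercept column; y: 0/1 outcomes.
    Returns coefficients and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1 - p))[:, None])    # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))  # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1 / (1 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))

# Simulated data (not the Kcentra cohort): bleed volume in mL and 30-day death
rng = np.random.default_rng(0)
volume = rng.uniform(0, 60, size=2000)
death = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.06 * volume))))

X = np.column_stack([np.ones_like(volume), volume])
beta, se = fit_logistic(X, death.astype(float))

# Report the odds ratio per 10 mL, a more meaningful unit than per 1 mL
or10 = np.exp(10 * beta[1])
ci10 = np.exp(10 * (beta[1] + np.array([-1.96, 1.96]) * se[1]))
print(f"OR per 10 mL: {or10:.2f} (95% CI {ci10[0]:.2f} to {ci10[1]:.2f})")
```

The same predictor would enter the proportional odds model for the 30-day Rankin score; only the outcome model changes.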


David W. Bearl, Pediatric Cardiology

  • My proposed project is evaluation of liver studies (labs, MRI, elastography) before heart transplant in Fontan patients (all have liver disease before transplant, which is known) and then evaluating those patients after heart transplant (their post-transplant liver course is not known).
  • n = 31 since pediatric heart transplants started in 1987. Repeat evaluation at 6 mo and 1 yr post-transplant.
  • Two steps: (1) feasibility: show you can actually collect the data for the larger study. Estimate pt-to-pt variability; rates/patterns of missing data; (2) larger study with other hospitals: proper sample size needed for this (powered based on feasibility study).
  • Best option for small population: ask 'based on where the patient started, where did they end up?' Can make use of gap between follow-up evaluations as a variable, if the gap varies by patient.
  • With half the patients, better to show descriptive statistics (graphs and tables).

Shawniqua Williams Roberson, Neurology

  • Purpose: Preparing preliminary data for an upcoming career development award submission. Several quantitative EEG metrics have been recorded in patients with ICU delirium of varying etiologies. Would like to use these preliminary data to build a model that uses the qEEG metrics to predict the etiology of delirium. Need guidance on: 1) how much data is needed to build this model, and 2) what statistical tools to use (multivariable logistic regression?)
  • Applying for Faculty Research Scholarship in next cycle (February). Hoping for guidance on how to analyse preliminary data with respect to an aim in the grant.
  • Aims of research: Evaluating quantitative frontal EEG to monitor for delirium continuously, producing numeric output. (1) Is it better than traditional EEG? (2) Can we distinguish different etiologies for delirium; Is there a dominant one, at which we can direct clinical decision-making? (3) Does qfEEG predict adverse outcomes? (Note: qfEEG is a subset of traditional EEG which doesn't require an EEG reader.)
  • Data: 25 patients, 89 assessments. (Measurements taken at least twice per day, over the course of up to two weeks. Summarised down to a single number within the window for each assessment.)
  • Suggested analyses: (1) Random-intercept model (accounts for the fact that observations within a patient will be more highly correlated with each other): RASS ~ covariates + random intercept per patient. In scatterplots, put RASS score on the y-axis since it's the outcome. Depending on sample size, you could possibly allow for a non-linear association between variables. Could possibly focus on hypoactive patients only. Consider an ordinal regression model with a random intercept, since RASS is ordinal and the scale is small. (Not every statistical program has ordinal regression with a random intercept, so you may have to revert to linear regression with a random intercept.) With 4 different predictors from the EEG and 89 assessments, there should be enough data to look at non-linear associations. (2) To differentiate etiologies, you need patients with all types of etiologies and their qfEEG. Regression is one method for creating that prediction tool. Once the tool is developed, collect more data and see how the tool performs when making predictions on the new data. Preliminary step for etiologies: see how etiologies show up in graphical displays. (3) Depends on what you do during steps 1 & 2. Potentially its own separate predictive model with different features seen in the data. A lot of data will be needed for all models.
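
A sketch of suggestion (1) as a linear random-intercept model (the fallback mentioned above when ordinal mixed models are unavailable), using `statsmodels` in Python. The data are simulated to roughly match 25 patients with about 4 assessments each; the single predictor `qeeg` stands in for one of the four qfEEG metrics, and RASS is treated as continuous, which is the simplification the note warns about. In R, `clmm()` from the `ordinal` package is one option for the ordinal version.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data shaped like the pilot: 25 patients, 3-5 assessments each
rng = np.random.default_rng(1)
rows = []
for pt in range(25):
    u = rng.normal(0, 1.0)                     # patient-level random intercept
    for _ in range(rng.integers(3, 6)):
        qeeg = rng.normal(0, 1)                # one standardized qfEEG metric
        rass = -1 + u - 1.0 * qeeg + rng.normal(0, 0.8)
        rows.append({"pt": pt, "qeeg": qeeg, "rass": rass})
df = pd.DataFrame(rows)

# RASS ~ qfEEG metric + random intercept per patient
fit = smf.mixedlm("rass ~ qeeg", data=df, groups=df["pt"]).fit()
print(fit.summary())
print("slope:", round(fit.fe_params["qeeg"], 2))
```

The random intercept absorbs each patient's baseline level, so the `qeeg` slope is estimated without treating the 89 assessments as 89 independent patients.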


Joseph Wong, Biomedical Informatics

  • Purpose: Building upon a prior project, we have measured satisfaction, health literacy, and computer attitude regarding the patient portal prior to the eStar EHR migration. We now want to measure these same factors with the new, eStar-based patient portal. From the original 6000 survey respondents, 3000 have volunteered to be contacted again.
  • Previous clinic visits on Th 8/16 & Th 9/06: Investigating determinants of patient satisfaction with an online patient portal (My Health at Vanderbilt). Had previously built univariate linear regression models for satisfaction score and had selected factors for a multiple linear regression model. Recommended adding histograms. Recommended including at most a linear and a quadratic term in the model. Recommended using a square root transformation on the Health Result Function, rather than the logarithm, and including both the square root and linear terms in the model.
  • Previously modeled satisfaction by usage before the eStar update (using the old patient portal) with ordered logistic regression. Next step is to test satisfaction with eStar. Needs to know what to measure and get thoughts on data collection. The same individuals agreed to be followed up for eStar satisfaction. Satisfaction scale is 12-60.
  • To make odds ratios interpretable in ordered logistic regression, transform the count (click) variables (square root or cube root; log cannot be used because of zeros) and then express effects per interquartile range rather than per raw value, since a change of one click is negligible. Do the transformation before fitting the regression model. Easiest: transform, divide by the IQR, and use those values in the regression; interpretation is then per IQR change. (Demonstration of restricted cubic spline in Stata: can calculate odds ratios, but the beta values cannot be interpreted directly.) All count variables can be modeled this way. The test statement allows testing of the overall effect.
  • Compare satisfaction pre-EPIC to post-EPIC with a Wilcoxon signed-rank test or paired t-test. Not interested in testing computer literacy, as it may not have changed in the previous six months. 3000 individuals agreed to be recontacted. Best to keep the survey short; only plan to ask the 12 satisfaction questions. Keep in mind that people may respond differently to the satisfaction questions now that they appear in a short survey rather than at the end of a longer one. Other options: use the same model already built, a Bland-Altman plot, and identifying what may correlate with a decline in satisfaction. Calculate session times to include in the model. Could model post- scores as a nonlinear function of pre- scores. Could compare domains of the satisfaction score to see if the weighting is equal.
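
A minimal sketch of the "transform, then divide by the IQR" step for one count variable. The click counts are hypothetical; the rescaled column would replace the raw count in the ordered logistic model (e.g., Stata's `ologit`), so its odds ratio reads as "per IQR change in sqrt(clicks)".

```python
import numpy as np

def sqrt_per_iqr(counts):
    """Square-root transform a non-negative count (zeros allowed, unlike log)
    and rescale so one unit equals the IQR of the transformed values."""
    t = np.sqrt(np.asarray(counts, dtype=float))
    q1, q3 = np.percentile(t, [25, 75])
    if q3 == q1:
        raise ValueError("IQR of transformed counts is zero; choose another scaling")
    return t / (q3 - q1)

clicks = np.array([0, 0, 1, 3, 4, 9, 16, 25, 49, 100])  # hypothetical click counts
x = sqrt_per_iqr(clicks)
print(np.round(x, 3))
```

The guard matters for heavily zero-inflated click variables, where the quartiles of the transformed values can coincide.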


Brooklynn Bailey, MMC Dept of Family & Community Medicine

  • Clinic Follow-up from 8/20:
  • We are exploring the relationships among PTSD symptoms in our sample of young women exposed to interpersonal violence. 17 symptoms are assessed via clinical interview and are scored from 0-8. Prior to our first clinic visit, I had run network analyses in R, with concerns pertaining to sample size. We have since run hierarchical cluster analyses, as recommended to us, to compare to the network results. We are returning to get feedback on these results and recommendations for next steps.
  • Current state: cross-sectional data; n = 68; 17 PTSD symptoms assessed through interview (each scored 0-8; not likely to be any 1s, since each is scored for presence and severity); histograms have been created by cluster, as recommended last time (zero-inflated data: report the number of 0 responses, but overlay the probability distribution only over results of 1+); results of cluster analysis (Ward's method) with bootstrapping (open to suggestions on this front); a second cluster analysis based on presence of symptoms alone. * Analysis performed in R: the clustering uses Ward's method (ward.D) with Euclidean distance. The bootstrap method used is unknown (code was not available during clinic).
  • Concerns: some variables (C8, B2, C12) may be grouped only because of low variation or the small sample. Are these clustered together because of low variability? These are all rare symptoms; are they grouped together only because of this, or are they actually correlated? How stable is the bootstrapping?
  • Previously there was a question about sample size, so wanted to view variability of responses in available data. Build a matrix of pairwise probabilities: the % of bootstrap samples in which each pair of symptoms is clustered together. Probabilities should be close to either 1 or 0, so you'll see which symptoms often go together; values in the middle of the range indicate more variability in the way variables are clustered.
  • Need to think about the cutoff for what determines a cluster. There are algorithms for determining this cutoff, but they are very computational/use cross-validation. This may be something to handle via email after looking at code. Also need to think about adjusting the number of clusters.
  • Q: How could we tie these back to original network analyses and validate? A: There is a very tight connection between the clustering and the network analyses because both based off correlation matrices. Once you've ID'd that you have stable clusters, you can do network analyses within each cluster to generate partial correlations.
  • Q: Is the last symptom to separate out more central than other symptoms? (Interested in identifying centrality.) A: With longitudinal data centrality of the symptoms makes more sense, so not useful in cross-sectional data. (If unable to reject null, unable to detect clusters in stable way. Clusters perform better with more data, therefore sample size may be issue.)
  • Next steps: Brooklynn to send code to Dr. Stewart for review. Stability question to be answered after viewing code.
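
A sketch of the bootstrap co-clustering matrix described above, in Python with SciPy rather than the R code under review. Subjects are resampled with replacement, symptoms (columns) are clustered with Ward linkage on Euclidean distance, and each cell records how often two symptoms share a cluster. The data and the choice of 2 clusters are hypothetical (the cutoff question is still open). As we understand it, SciPy's `method="ward"` corresponds to R's `hclust(..., method = "ward.D2")`, not `ward.D`, so results may not match the R analysis exactly.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def coclustering(data, n_clusters, n_boot=200, seed=0):
    """Share of bootstrap resamples in which each pair of symptoms is
    assigned to the same cluster. data: subjects x symptoms array."""
    rng = np.random.default_rng(seed)
    n_subj, n_sym = data.shape
    together = np.zeros((n_sym, n_sym))
    for _ in range(n_boot):
        boot = data[rng.integers(0, n_subj, size=n_subj)]          # resample subjects with replacement
        tree = linkage(boot.T, method="ward", metric="euclidean")  # rows = symptoms
        labels = fcluster(tree, t=n_clusters, criterion="maxclust")
        together += labels[:, None] == labels[None, :]
    return together / n_boot

# Simulated example: 68 subjects, two blocks of three correlated symptoms
rng = np.random.default_rng(42)
f1, f2 = rng.normal(size=(2, 68, 1))
data = np.hstack([f1 + 0.3 * rng.normal(size=(68, 3)),
                  f2 + 0.3 * rng.normal(size=(68, 3))])
M = coclustering(data, n_clusters=2)
print(np.round(M, 2))
```

With clear structure the off-diagonal entries sit near 0 or 1; entries near 0.5 flag symptom pairs (such as rare ones) whose grouping is not stable.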


Christine Rukasin, Medicine/Allergy, Pulmonary and Critical Care

  • I am doing a survey-based study evaluating anxiety and drug allergy testing. This is a series of surveys with repetition of questions at different points in time. I would like assistance in strategies to best analyze the results, visualization/diagrams of results and suggested sample size. * Expected Outcome: Protocol with no expected funding support, Abstract, Other. Possible VICTR voucher? Still time before analysis is needed. *Graphical display of data. Could sum questions for a total score. 100 is a reasonable number of subjects. Could also plot mean score by number of tests/measures per subject to assess learning effects. Compute correlations between subject characteristics and total score. Tom will send email with VICTR application.

Satya (Nanu) Das, Medicine/Oncology

  • We are performing a retrospective analysis assessing whether gastrointestinal cancer patients (at Vanderbilt) who experience immune-related adverse events while on immunotherapy have improved outcomes (PFS, OS, duration of response) compared to patients who do not experience these events. I would like to briefly touch on my data collection and the statistical methodology for my future analysis.
  • Expected Outcome: Abstract, Other
  • All subjects on immune therapy are eligible. How to disentangle treatment for event and treatment? We don't know how long they need to be on therapy. Could do "landmark" analysis, analyze one outcome (AE), then the next outcome. This is all subjects--with smaller sample, focus on high resolution variables.
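
A sketch of the landmark idea, assuming a hypothetical 90-day landmark and a simple per-patient record (field names are illustrative). Patients must survive long enough to develop an irAE, so comparing "ever had an irAE" with "never" builds in immortal-time bias; a landmark fixes irAE status at one time point and restarts follow-up there. The resulting rows would then go into Kaplan-Meier curves or a Cox model (e.g., the `lifelines` package in Python or `survival` in R).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Patient:
    pid: str
    followup_days: float              # days from immunotherapy start to event or censoring
    event: bool                       # True if progression/death observed
    irae_day: Optional[float] = None  # day of first immune-related AE; None if none

def landmark_cohort(patients: List[Patient], landmark_day: float) -> List[Tuple[str, bool, float, bool]]:
    """Keep patients still under follow-up at the landmark, classify irAE
    status using information up to the landmark only, and restart the clock.
    Returns (pid, irae_by_landmark, time_from_landmark, event)."""
    rows = []
    for p in patients:
        if p.followup_days <= landmark_day:
            continue  # event or censoring at/before the landmark: excluded
        exposed = p.irae_day is not None and p.irae_day <= landmark_day
        rows.append((p.pid, exposed, p.followup_days - landmark_day, p.event))
    return rows

patients = [
    Patient("A", 400, True, irae_day=30),
    Patient("B", 60, True),                  # event before the landmark: dropped
    Patient("C", 200, False, irae_day=120),  # irAE after the landmark: counted as no irAE
    Patient("D", 300, True),
]
print(landmark_cohort(patients, landmark_day=90))
# [('A', True, 310, True), ('C', False, 110, False), ('D', False, 210, True)]
```

Repeating the analysis at several landmarks is a common way to show the conclusion does not hinge on one arbitrary cut point.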

WALK IN: Parisa Samimi, Urogynecology

  • Possible VICTR; prospective study looking at the correlation between patient satisfaction and AM labs (no labs vs. routine labs). Do not know the sample size needed. Do not know baseline satisfaction, or any baseline data. The question needs refinement: need to specify the question and definitions. Could also search the current literature for a baseline satisfaction level to get baseline data.


Mallory Hacker, Neurology

  • Study Objective: To improve the identification and referral of patients who may have spasticity to a physician who is an expert in the diagnosis and treatment of spasticity through the development of a bedside physical exam referral tool for primary care physicians and nurse practitioners. * Hypothesis: A simple limited bedside physical examination guide enhances the ability of primary care providers to correctly and reliably identify residents in a long-term care facility who may have spasticity and appropriately refer them to a specialist for spasticity evaluation. * Question: Are the sensitivity and negative predictive values the most appropriate to report for this study? * Expected Outcome: Other * Present as a 2x2 table (what most readers will want to see) and report PPV and NPV. Calculate specificity and sensitivity, but not as the primary numbers. Could do a figure to show the proportion correctly diagnosed. Should we use kappa? No, that's not the point. Also check the journal's instructions for authors.
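
A minimal sketch of the recommended 2x2 summary with 95% confidence intervals, assuming rows are the tool's decision (refer / not refer) and columns are the specialist's evaluation as the reference standard. The counts are hypothetical, and the Wilson interval is our choice of CI. PPV and NPV depend on how common spasticity is among the facility's residents, so they should be reported alongside that prevalence.

```python
import math

def wilson_ci(x, n, z=1.959963984540054):
    """Wilson score 95% confidence interval for x successes out of n."""
    p = x / n
    d = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / d
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / d
    return centre - half, centre + half

def diagnostic_summary(tp, fp, fn, tn):
    """Estimate and 95% CI for each measure from a 2x2 table."""
    measures = {
        "PPV": (tp, tp + fp),
        "NPV": (tn, tn + fn),
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "proportion correct": (tp + tn, tp + fp + fn + tn),
    }
    return {name: (x / n, *wilson_ci(x, n)) for name, (x, n) in measures.items()}

# Hypothetical counts for 100 residents
for name, (est, lo, hi) in diagnostic_summary(tp=18, fp=6, fn=4, tn=72).items():
    print(f"{name}: {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```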

Sean Collon, VUSM Global Health

  • Teleophthalmology screening in Nepal: comparing in-person decision making of ophthalmic technicians with limited screening resources to decision making of ophthalmologists reviewing photographs of the same patients. For each patient, the technician and MD record a diagnosis for each eye and a plan for each patient based on their respective information (in-person exam with limited equipment vs. viewing photos remotely); diagnoses and plans are grouped into broad categories, and agreement is then compared to determine the utility of the device in the screening camp setting. * Expected Outcome: VICTR Biostatistics voucher *Could separate by anterior and posterior, which would make sense in this context. Could also do each diagnosis separately, then order by agreement. Agreement on treatment plan is not useful when the diagnosis did not agree; limit it to cases where the diagnosis agreed. For agreement, could do a 2x2 table (MD/Tech) and calculate agreement. Two eyes from each patient are correlated measures, but could treat them as independent. Can compute confidence intervals for all measures.
*Could put voucher in under local mentor name--although students may be eligible.
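
A sketch of the agreement calculation: observed technician/MD agreement with a Wilson 95% CI, plus Cohen's kappa as a chance-corrected companion statistic (kappa is our addition, not something the note specified). The category labels and six eyes are hypothetical. Eyes are treated as independent, as discussed; accounting for the two eyes per patient (e.g., a bootstrap that resamples patients) would widen the intervals.

```python
import math
from collections import Counter

def agreement_stats(tech, md, z=1.959963984540054):
    """Observed agreement (Wilson 95% CI) and Cohen's kappa between two raters'
    categorical diagnoses for the same eyes. Eyes are treated as independent."""
    n = len(tech)
    p_obs = sum(a == b for a, b in zip(tech, md)) / n
    d = 1 + z**2 / n
    centre = (p_obs + z**2 / (2 * n)) / d
    half = z * math.sqrt(p_obs * (1 - p_obs) / n + z**2 / (4 * n**2)) / d
    # Agreement expected by chance from each rater's category frequencies
    ft, fm = Counter(tech), Counter(md)
    p_exp = sum(ft[k] * fm[k] for k in ft) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return {"agreement": p_obs, "ci": (centre - half, centre + half), "kappa": kappa}

# Hypothetical broad diagnostic categories for six eyes
tech = ["normal", "cataract", "normal", "refer", "normal", "cataract"]
md   = ["normal", "cataract", "refer",  "refer", "normal", "normal"]
print(agreement_stats(tech, md))
```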


Brooklynn Bailey, Meharry Vanderbilt Alliance

  • I recently was introduced to the network approach to psychopathology at a conference this year. I would like to explore the network structure of DSM-IV PTSD symptoms in my sample of young adult women who have recently experienced interpersonal violence. I have taught myself how to conduct network and related analyses in R; however, I have some questions related to my small sample size and the adequacy of my findings given this limitation. In general, I could use guidance on methods for analyzing this symptom data to better understand the presentation of posttraumatic psychopathology in this population. *Possible VICTR voucher-contact Tom if interested. * Research question: how are PTSD symptoms related to each other in this sample of young women? *Sample size ~70 *Using Lasso right now * Consider using histograms to examine structure of symptom data. Could do simple variable clustering with bootstrapping with replacement, less complex than current approach.

Rohini Chakravarthy, Meharry Vanderbilt Alliance

  • We have surveyed a cohort of 3000 patients using an IOM survey on social determinants of health. We are interested in seeing which are most predictive of outcomes (as measured by A1C at time of study and potentially its progression). I think multilevel modeling may be useful but am not sure how to proceed and whether this makes sense for a VICTR voucher application. Data collection is complete.
*First step is further refining the question: are we looking at med adherence, incidence, or A1C? *Multiple regression may be the right approach, even in the presence of collinearity. Could do a voucher or continue to come to clinic for more assistance.


Jessica Heft, Urology/Urogynecology

  • We will be conducting a survey of young women and assessing their physical activity and how that relates to pelvic floor dysfunction. We will be using several standardized questionnaires and need assistance with methodology/patient recruitment expectations/statistical planning. Project is in the design phase.
  • VICTR voucher to cover biostatistics support; can set up the database independently. May ask for VICTR support for gift cards.
  • Propose email based survey examining relationship between athleticism and stress incontinence.
  • Concern is over the representativeness of the respondents.
  • Recommend using slider bar for questions when possible.


Zeb White, Hearing & Speech Sciences

  • A new, experimental 40-question parent-report measure was developed by our lab to better understand parent-child interaction in stuttering. This instrument was administered to 68 parents of children who do and do not stutter. We are attempting to understand the differences between the two groups (parents of children who stutter vs. parents of children who do not stutter) and to identify whether the instrument correlates with other parent-report measures of stuttering severity and consequences. * We would like guidance in selecting appropriate statistical tests to answer relevant research questions. * Data collection is complete. * Stuttering severity is a range, not a "true" group: kids could range from ~4% of words to ~15%, so it really is a continuum (0% to ---). About 30 in each "group". BUT, could include previously excluded kids, which would increase the sample size considerably. * Wish to reduce items on the survey, perhaps by grouping questions? * Questions were developed from advice given to parents and from the literature on parent intervention. The survey was administered at time 0, prior to intervention/therapy. * Could look at correlation: parent response by RYCS; does the degree of stuttering correlate with the RYCS? Could use Goodman-Kruskal gamma. * The sample size is small for 40 analyses; use caution with multiple tests such as the Wilcoxon. * May consider dropping the extreme responses, "never" and "always". * A 40 x 40 correlation matrix could show which questions are highly correlated, so that highly correlated ones can be dropped. * Redundancy analysis could work. * Cronbach's alpha on questions that should measure the same thing. * Could force questions into groups based on clinical content (e.g., timing).
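
For the suggestion to compute Cronbach's alpha on questions meant to measure the same thing, a minimal NumPy sketch; the five-parent, three-item score matrix is hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()  # sum of individual item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the summed score
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical scores: five parents answering three items from one intended group
scores = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
print(round(cronbach_alpha(scores), 3))  # -> 0.929
```

Alpha rises with the number of items, so a high value within a forced clinical grouping is not by itself evidence that the items are non-redundant.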


Margaret Adgent, General Pediatrics

I am updating an analysis from an observational cohort study regarding maternal prenatal vitamin use and childhood asthma. Pregnant women were enrolled and interviewed; they were recontacted 4+ years later to answer questions about their children’s health. There is substantial loss to follow up (70%), and I am interested in applying inverse probability weights to address possible selection bias due to loss to follow up.
~1900 met inclusion for the secondary analysis; ~500 responded and had exposure data. Goal is to compare those who used folic acid before pregnancy to those who started after. Should summarize the differences between groups (those with and without follow-up) to see how different they are. There is a fairly even split between groups. Use inverse probability weights (1/probability of follow-up); tabulate the weights to check values, looking for large values. Contact Jill Shell re: collaboration in peds. Possibly Chris Slaughter.
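
A minimal sketch of inverse-probability-of-follow-up weights with the suggested check for large values. In practice P(follow-up) would usually come from a logistic regression of follow-up on several baseline covariates; here it is estimated as the follow-up proportion within one hypothetical baseline stratum, only to keep the example self-contained.

```python
from collections import Counter, defaultdict

def response_weights(strata, responded):
    """Inverse-probability-of-follow-up weights.

    P(follow-up) is estimated as the observed follow-up proportion within each
    baseline stratum; responders get weight 1/P, non-responders weight 0.
    """
    n = defaultdict(int)
    r = defaultdict(int)
    for s, y in zip(strata, responded):
        n[s] += 1
        r[s] += int(y)
    prob = {s: r[s] / n[s] for s in n}
    weights = [1 / prob[s] if y else 0.0 for s, y in zip(strata, responded)]
    return weights, prob

# Hypothetical baseline education stratum and follow-up status for 10 women
strata = ["<HS"] * 4 + ["HS+"] * 6
responded = [1, 0, 0, 0, 1, 1, 1, 0, 1, 1]
weights, prob = response_weights(strata, responded)

# Tabulate the weights and flag large ones
print(Counter(round(w, 2) for w in weights if w > 0))
print("max weight:", max(weights))
```

The weights reweight the responders back to the enrolled cohort (they sum to the original sample size); a single responder standing in for several lost women, as in the first stratum, is exactly the kind of large weight worth flagging.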


Laurie Samuels, Biostatistics

  • The project uses Medicare claims data to look at regional rates of variation in a particular surgical procedure, and I would love to get feedback from more senior biostatisticians. Looking at regional variation in colon resection. Have three years of data. Several issues, one is difficulty in identifying denominator. Dartmouth health atlas could be useful for methods.


Brenda Pun, Pulmonary

  • As part of my DNP dissertation I worked on a survey of ICU interprofessionals about teamwork and healthy work environment. My dissertation focused on those data from one site as a pilot study. Since then I have worked with a national professional society to collect the same data from 6000+ ICU professionals nationally. I am planning to submit a VICTR resource request for the funding to support the statistical analyses of the national dataset.

  • Stage of project (select one): Data collection completed

  • Data collection method (select one): Survey

  • Data management system (select one): REDCap

  • Expected outcome (check all that apply): VICTR Biostatistics voucher

  • Investigator experience (select one): Independent investigator


Natalie Covington, Hearing & Speech Sciences

  • We are planning a study in which we would like to sub-classify patients with traumatic brain injury based on their memory “profiles” (patterns of impaired and intact memory performance across a battery of tasks); we would like to discuss possible methods for classifying patients into subtypes (e.g. latent profile analysis; k-means clustering; etc).

  • Stage of project (select one): Design

  • Data collection method (select one): Other

  • Data management system (select one): Spreadsheet

  • Expected outcome (check all that apply): Protocol with no expected funding support

  • Investigator experience (select one): Graduate/Medical Student


Wendi Mason, Medicine / Pulmonary

  • To compare a new practice model (prospective) employing telehealth strategies of telemonitoring and telesupport to previous year’s model of standard practice (retrospective chart review) to determine effect on hospitalization rate, illnesses and other complications, compliance, and rate of decline in patients with Idiopathic Pulmonary Fibrosis.

  • Stage of project (select one): Design

  • Data collection method (select one): Case report form/data form

  • Data management system (select one): REDCap

  • Expected outcome (check all that apply): VICTR Biostatistics voucher

  • Investigator experience (select one): Independent investigator


Jessica Heft, ObGyn/Urogyn

  • Retrospective cohort comparing two surgical approaches (open vs. laparoscopic). Will be looking at perioperative complications and outcomes.
  • Stage of project (select one): Design
  • Data collection method (select one): Data are exported in electronic format
  • Data management system (select one): REDCap
  • Expected outcome (check all that apply): VICTR Biostatistics voucher
  • Investigator experience (select one): Resident or fellow
Discussion & Action Items:
  • Perfect confounding between surgeon and surgical technique.
  • Jessica will coordinate with Thomas Stewart to develop a statistical analysis plan for submission of an application for VICTR voucher.


Yolanda McDonald, Human and Organizational Development

  • The editor-in-chief of the American Journal of Public Health asked us to test for an interaction indicating heterogeneity across the 4 size-specific ORs. I found some information on ResearchGate. However, I would still like to discuss the test option(s). The manuscript is in Minor Revision status.

  • Stage of project (select one): Data collection complete

  • Data collection method (select one): Other

  • Data management system (select one): Other

  • Expected outcome (check all that apply): Other

  • Investigator experience (select one): Independent investigator
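
One common way to test heterogeneity across stratum-specific ORs is Cochran's Q on the log odds ratios with inverse-variance weights. A sketch, assuming only the published ORs and 95% CIs are at hand and that those CIs are Wald intervals symmetric on the log scale (so the SE can be back-calculated); the four ORs below are hypothetical. With the individual-level data, a likelihood-ratio test of the exposure-by-size interaction terms in the logistic model (or a Breslow-Day test on the stratified 2x2 tables) answers the editor's question more directly.

```python
import math
from scipy.stats import chi2

def or_heterogeneity(ors, lower, upper, z=1.959963984540054):
    """Cochran's Q test for heterogeneity across k stratum-specific odds ratios.
    Returns Q, degrees of freedom, p-value, and I^2."""
    logs = [math.log(o) for o in ors]
    ses = [(math.log(u) - math.log(l)) / (2 * z) for l, u in zip(lower, upper)]
    w = [1 / s**2 for s in ses]                                # inverse-variance weights
    pooled = sum(wi * b for wi, b in zip(w, logs)) / sum(w)    # fixed-effect pooled log OR
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, logs))
    df = len(ors) - 1
    p = chi2.sf(q, df)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, p, i2

# Hypothetical ORs and 95% CIs for four size categories
q, df, p, i2 = or_heterogeneity(
    ors=[1.2, 1.5, 2.1, 3.0],
    lower=[0.9, 1.1, 1.5, 2.0],
    upper=[1.6, 2.0, 2.9, 4.5],
)
print(f"Q = {q:.2f} on {df} df, p = {p:.3f}, I^2 = {i2:.0%}")
```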


James Andry, Neurology - Sleep

  • Please provide a short description of your project and the questions you’d like to address: The primary goal of this study is to evaluate whether the features measured by an aggregated set of consumer-grade activity monitors can predict a given patient’s successful treatment with CPAP. Our study design also supports the secondary goal of validating the sleep parameters measured by these devices in aggregate. Would like to discuss statistical methods for measuring correlation between sleep parameters from consumer-grade devices (test device) and polysomnography (gold-standard).

  • Stage of project (select one): Design complete but no enrollment/data collection

  • Data collection method (select one): Data are exported in electronic format

  • Data management system (select one): Spreadsheet (e.g. Excel)

  • Expected outcome (check all that apply): Protocol with no expected funding support, VICTR Biostatistics voucher

  • Investigator experience (select one): Independent investigator
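
Correlation with polysomnography measures association rather than agreement: a device that consistently overestimates sleep by an hour can still correlate almost perfectly. A Bland-Altman analysis (mean difference and 95% limits of agreement) is the usual complement. A minimal sketch with hypothetical total sleep times; the variable names are illustrative.

```python
import numpy as np

def bland_altman(device, psg, z=1.96):
    """Bias and 95% limits of agreement between paired measurements
    (device minus polysomnography), plus the values for a Bland-Altman plot."""
    device = np.asarray(device, dtype=float)
    psg = np.asarray(psg, dtype=float)
    diff = device - psg
    mean_pair = (device + psg) / 2          # x-axis of the Bland-Altman plot
    bias = diff.mean()
    sd = diff.std(ddof=1)
    limits = (bias - z * sd, bias + z * sd)
    return bias, limits, mean_pair, diff

# Hypothetical total sleep time (minutes) for five nights
device = [400, 380, 420, 390, 410]
psg = [380, 370, 400, 385, 395]
bias, (lo, hi), mean_pair, diff = bland_altman(device, psg)
print(f"bias {bias:.1f} min, limits of agreement {lo:.1f} to {hi:.1f} min")
# bias 14.0 min, limits of agreement 1.2 to 26.8 min
```

Plotting `diff` against `mean_pair` also shows whether the device's error grows with sleep duration, which a single correlation coefficient hides.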


Alexander Langerman, Otolaryngology

  • Please provide a short description of your project and the questions you’d like to address: Using qualitative research, we’ve identified subgroups of patients who have differing opinions on how they trust their physicians. I’d like to develop a quantitative diagnostic of these perceptions.

  • Stage of project (select one): Design

  • Data collection method (select one): Survey

  • Data management system (select one): REDCap

  • Expected outcome (check all that apply): Other

  • Investigator experience (select one): Independent investigator


Ellen Kelly

  • Please provide a short description of your project and the questions you’d like to address: We have developed an instrument to assess parents’ perceptions of their communicative interactions with their children. We need assistance with evaluating the instrument and analyzing the data we have collected to date.

  • Stage of project (select one): Data collection underway

  • Data collection method (select one): Case report form/data form

  • Data management system (select one): REDCap

  • Expected outcome (check all that apply): Protocol with no expected funding support, Other

  • Investigator experience (select one): Independent investigator
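A common first step in evaluating a multi-item instrument is internal consistency. The sketch below computes Cronbach's alpha from a respondents-by-items score matrix. It assumes complete responses and items scored in the same direction (reverse-coded items flipped first); the ratings shown are hypothetical. Alpha is only one part of instrument evaluation; factor structure and test-retest reliability are typically examined as well.

```python
import statistics
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with at least two items")
    # Sample variance of each item, and of each respondent's total score.
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical 1-5 ratings from six parents on four items.
    ratings = [[4, 5, 4, 4], [3, 3, 2, 3], [5, 5, 5, 4],
               [2, 3, 2, 2], [4, 4, 3, 4], [3, 2, 3, 3]]
    print(f"alpha = {cronbach_alpha(ratings):.2f}")
```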

Briana Furch, Infectious Disease

  • Please provide a short description of your project and the questions you'd like to address: I'm not sure which type of analysis I should use to compare 4 different disease states and their associated biomarkers (variables) at different time points. I also want to examine these disease states to assess normal variance.

  • Eligible for departmental collaboration plan, if in place?: no

  • Stage of project (select one): Design

  • Data collection method (select one): Other

  • Data management system (select one): REDCap

  • Expected outcome (check all that apply): Protocol with no expected funding support, Grant

  • Investigator experience (select one): Independent investigator

  • Name of Mentor: John Koethe


Paul Slocum with William Stuart Reynolds (mentor), OB/GYN

  • Please provide a short description of your project and the questions you'd like to address:
Assessing pain in women with synthetic pelvic mesh and outcomes after treatment.

We would like to come to the biostatistics clinic to obtain a VICTR research voucher. We have a prospectively collected case series of patients with pelvic pain who underwent mesh removal.

  • Eligible for departmental collaboration plan, if in place?: no

  • Stage of project (select one): Data collection completed

  • Data collection method (select one): Data are exported in electronic format

  • Data management system (select one): REDCap

  • Expected outcome (check all that apply): VICTR Biostatistics voucher

  • Investigator experience (select one): Resident or fellow



Ray Blind, faculty, Department of Medicine, with two undergraduate students

There is a large body of data linking IV drug use with hepatitis, and hepatitis with liver cancer, but to our knowledge no studies have directly examined the association between IV drug use and liver cancer. We used the Synthetic Derivative to look for an association between IV drug use and liver cancer and need help deciding which statistical tests to apply to the data.

  • Best approach would be to follow a cohort of IV drug users to see whether they develop liver cancer; next-best would be a case-control study comparing the odds of being an IV drug user among people with liver cancer compared to people in a reasonable comparison (control) group. The hard part is deciding what that control group should be.
  • Here is an introductory explanation of case-control studies:
  • To visualize 2x2 data (for example, IV drug use by cancer) graphically, you can make a jittered scatterplot. This gives the same information as a table, but it can be helpful to see the information presented in more than one way.
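For a case-control comparison, the basic summary is the odds ratio from the 2x2 table of exposure (IV drug use) by case status (liver cancer). The sketch below computes it with a Woolf (log-based) 95% confidence interval. The counts are hypothetical, and the sketch makes no adjustment for confounders, which in practice would call for a model such as logistic regression.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf 95% CI for a 2x2 case-control table.

    a = cases exposed, b = cases unexposed,
    c = controls exposed, d = controls unexposed (all counts must be > 0).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the log-based CI")
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 cases and 10 of 100 controls were IV drug users.
    est, lo, hi = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR = {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```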



Dupree Hatch, Pediatrics

  • Please provide a short description of your project and the questions you’d like to address:

I have two projects whose statistical analysis designs I would like to discuss (if there is time):

We have a large national database that contains data on ~20% of all very low birth weight infants. We would like to a) describe the use of mechanical ventilation in these infants (number of days, etc.), b) quantify the inter-center variation in the number of ventilator days per infant, and c) define the contributions of specific practice variables (ventilator modalities, sedation regimens) to the observed variation in ventilator days per patient. I would like to discuss the statistical analysis to quantify the variation and to test the practice factors to determine which, if any, of them are driving the variation.

The second study, which I would like to discuss if time allows, concerns alarms from mechanical ventilators. We have built an internal database of ~30,000 hours of ventilator alarms. I would like to describe some of the factors associated with high alarm burden (patient size, ventilator mode, time of day, etc.) as groundwork for a future effort to intervene on the factors that are modifiable. I would like to discuss how to handle clustering at the patient level, since we have hundreds, sometimes thousands, of alarms within a single patient, and how to adjust for it when I look at the different patient and practice factors.

  • Stage of project (select one): Design

  • Data collection method (select one): Data are exported in electronic format

  • Data management system (select one): Spreadsheet (e.g. Excel)

  • Expected outcome (check all that apply): Protocol with no expected funding support, Grant, Other

  • Investigator experience (select one): Independent investigator

  • Notes from clinic:
    • Chris Slaughter has a collaboration plan with Pediatrics but is currently busy; Dr. Hatch came to clinic for a preliminary discussion.
    • For the first project:
      • 200–300 centers; 10–200 very-low-birthweight births per center per year; 6–7 years of data. Temporal trends are likely, but seasonal trends are not.
      • Interested in quantifying the resource utilization (number of ventilator days)
      • Even the descriptive statistics are challenging for this project, because some of the babies die, and ventilator use is a measure of both how sick the baby is and of usual practice at that particular center. It's possible that the best approach will consist of a mixture model that incorporates both time to death and ventilator days while alive.
      • Some babies are transferred from their original NICU. Rather than excluding these babies from the study cohort, we recommend including them in the cohort, but censoring them at the time of transfer, to minimize bias.
    • For the second project:
      • The dataset contains 40k ventilation hours for 400–500 babies. About 15% of the patients get switched from one mode to another.
      • It will definitely be important to include patient characteristics in the model for this project; it may be less important to do patient-level clustering, depending on the data structure and the overall goal of the analysis.
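As a rough starting point for quantifying inter-center variation (first project) and gauging how much patient-level clustering matters (second project), the sketch below estimates an intraclass correlation coefficient (ICC) from a one-way random-effects ANOVA and converts it to the Kish design effect 1 + (m − 1) × ICC for average cluster size m. It is a simplification: it ignores death and transfer, which the notes above flag as central for ventilator days, and a mixed-effects model would usually be fitted for the actual analysis. The group data and the ICC of 0.1 are hypothetical; the cluster size of about 89 simply divides the 40k ventilation hours among 450 babies.

```python
import statistics
from typing import Sequence


def icc_oneway(groups: Sequence[Sequence[float]]) -> float:
    """ANOVA (method-of-moments) ICC(1) for possibly unbalanced groups.

    Can be negative when between-group variation is smaller than expected by chance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(n * (statistics.mean(g) - grand_mean) ** 2
                     for n, g in zip(sizes, groups))
    ss_within = sum(sum((y - statistics.mean(g)) ** 2 for y in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size n0 for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    denom = ms_between + (n0 - 1) * ms_within
    if denom == 0:
        raise ValueError("no variation in the data")
    return (ms_between - ms_within) / denom


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation from clustered observations."""
    return 1 + (mean_cluster_size - 1) * icc


if __name__ == "__main__":
    # Hypothetical ventilator days for infants in three centers.
    centers = [[3, 5, 4, 6], [10, 12, 9], [2, 1, 3, 2, 4]]
    print(f"ICC across centers = {icc_oneway(centers):.2f}")
    # Hypothetical: ~89 observations per baby (40,000 hours / 450 babies), ICC = 0.1.
    deff = design_effect(40000 / 450, 0.1)
    print(f"design effect = {deff:.1f}; effective n = {40000 / deff:.0f}")
```

Even a modest ICC inflates the variance roughly tenfold at this cluster size, which is one way to see why ignoring within-patient clustering would overstate precision.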


Nitya Venkat, Undergraduate Student, Vanderbilt Brain Institute.

In our study, we programmed a MATLAB script and an Arduino microcontroller to deliver visual and tactile stimuli to subjects. We then collected responses (numerical: 0–100) from subjects, as well as questionnaire responses on a Likert scale (1–6). We would like advice on how to deal with non-normality in the data when using parametric tests. We would also like to ask which post-hoc tests to use, which means to correlate, and, generally, how to make the most of our data. We also have questions about how to correlate our Likert responses with our measurement data, which are numerical but not on an interval scale.
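For relating ordinal Likert responses (1–6) to the 0–100 ratings, a rank-based measure such as Spearman's rho avoids assuming normality or interval scaling. The sketch below computes it as the Pearson correlation of average ranks, with tied values sharing the mean of their positions. The data are hypothetical, and a library routine such as `scipy.stats.spearmanr` would normally be used in practice.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are tied
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical Likert answers (1-6) and 0-100 stimulus ratings for eight subjects.
    likert = [2, 5, 3, 6, 1, 4, 4, 5]
    rating = [20, 70, 35, 90, 15, 50, 65, 60]
    print(f"rho = {spearman_rho(likert, rating):.2f}")
```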

W. Stuart Reynolds, postdoctoral fellow, Urology.

My project is concerned with baseline clinical characteristics of women with and without overactive bladder, including general demographic and clinical data, along with condition-specific data and results of quantitative sensory testing, with which to phenotype participants. I am interested in phenotyping, specifically using data-driven statistical methods such as clustering, and would like advice regarding these and other novel techniques, including machine learning, that may be applicable to my data. I am planning to submit for a VICTR voucher for biostatistical support.
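As one illustration of data-driven phenotyping, the sketch below runs k-means (Lloyd's algorithm) on z-scored features with a deterministic farthest-point initialization. The features and the choice of k are hypothetical. In practice, handling mixed continuous and categorical clinical data, choosing k, and checking cluster stability would all need discussion, and k-means is only one of several candidate methods (for example, hierarchical clustering or latent class models).

```python
import math
import statistics
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column so no single feature dominates the distances."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns a cluster label for each point."""
    # Farthest-point initialization: start at the first point, then repeatedly
    # add the point farthest from its nearest existing center.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [statistics.mean(dim) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical features per participant: (age, symptom score, pain threshold).
    data = [[45, 12, 3.1], [52, 14, 2.8], [48, 13, 3.0],
            [60, 30, 1.2], [58, 28, 1.5], [63, 31, 1.0]]
    print(kmeans(standardize(data), k=2))
```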


Lou Posey, medical student

The acute phase response is the body’s biological response to combat bleeding, infection, hypoxia, and tissue dysfunction following an injury. This system is tightly regulated, such that a post-injury response that is either too small or too robust can result in deleterious patient outcomes. This pattern has long been observed in clinical practice, yet validation of clinical markers of the acute phase response (also known as acute phase reactants) as correlates of poor outcomes is underreported. Using the Synthetic Derivative database, we aim to correlate vascular complications (namely venous thromboses) with elevations in acute phase markers such as CRP. We will also record platelet levels surrounding the vascular complications to characterize a consumptive coagulopathy.

Currently, there are no quantitative markers to predict the risk of deep vein thrombosis (DVT); we therefore hope to show that the divergence of elevated CRP and a platelet trough is a novel predictor of thrombosis. This could change venous thromboembolism (VTE) prophylaxis guidelines in both pediatric and adult populations.

Lauren Marlar, PUBH student

Sample size calculation and test selection for a class assignment.

Problem: Calculate the sample size required for a randomized controlled trial comparing two treatment groups and a control group. The primary end point is a 5% weight loss by the last session. Assume that only 5% of the participants in the control group will lose at least 5% of their body weight. Assume that 20% of the people who start the program will drop out.

Note: We were instructed to go to the Clinic if we needed assistance. (Note from Laurie Samuels: Dan Byrne confirmed that he suggested that students in this class attend clinic for help with power calculations.)
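The problem does not state the expected proportion reaching 5% weight loss in the treatment groups, so the sketch below treats it as a hypothetical input (20% in the usage block). It applies the standard normal-approximation formula for comparing two proportions to each treatment-versus-control comparison, then inflates the per-group size for the stated 20% dropout. The two-sided alpha of 0.05 and 80% power are assumptions; with two treatment arms each compared with control, a multiplicity adjustment such as alpha = 0.025 may be appropriate.

```python
import math
from statistics import NormalDist
from typing import Tuple


def n_two_proportions(p_control: float, p_treat: float, alpha: float = 0.05,
                      power: float = 0.80, dropout: float = 0.0) -> Tuple[int, int]:
    """Per-group sample size for a two-sided comparison of two proportions.

    Returns (n completing the study, n to enroll after inflating for dropout).
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treat) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p_control * (1 - p_control)
                                   + p_treat * (1 - p_treat))) ** 2
    n = numerator / (p_treat - p_control) ** 2
    return math.ceil(n), math.ceil(n / (1 - dropout))


if __name__ == "__main__":
    # Control: 5% reach 5% weight loss (given). Treatment: 20% (hypothetical).
    n_complete, n_enroll = n_two_proportions(0.05, 0.20, dropout=0.20)
    # Three arms (two treatments + control) of equal size.
    print(f"{n_complete} completers per arm; enroll {n_enroll} per arm, "
          f"{3 * n_enroll} total")
```

Because the treatment proportion drives the result, it is worth rerunning the calculation over a plausible range of values rather than relying on a single guess.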
Topic revision: r1 - 15 Jan 2021, DalePlummer

This site is powered by Foswiki. Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.