Data and Analysis for Clinical and Health Research Clinic Notes (2016)

2016 December 22

Autumn Bagwell, Vanderbilt Specialty Pharmacy

  • "We are a group of novice researchers that recently began a number of outcomes projects that are primarily retrospective reviews of cohorts of patients on specialty medications. We have completed some of the project proposals and know our endpoints and project aims, but need assistance in a couple of areas: 1) estimating the time/funds required to complete the stats for our projects to better build our budget proposals for potential project sponsors, and 2) ensuring we are applying the appropriate statistical tests based on our endpoints."
  • Project 1: Have 200 osteoporosis patients who received Forteo in the Endocrinology clinic. Collected data from VSP and non-VSP patients. Outcomes are drug treatment completion (Y/N) and clinical outcomes (DEXA risk score, FRAX score, number of fractures). Planning univariate analyses comparing patients who completed drug treatment vs. patients who did not. Recommend also collecting data on duration of treatment completed and adverse events. Dr. Frank Harrell will send a rough budget estimate anticipating 12 projects per year.

Miriam Lense, Otolaryngology

  • Asking for assistance with conducting a power analysis for a growth curve analysis in preparation for a grant. This project is looking at longitudinal trajectories of language development (using questionnaire data) across 2 samples of children (total n~70) with 2-4 variables of interest. I expect to use a random intercept and slopes model and expect to include both linear and quadratic growth terms.
  • Previous studies have shown that approximately 20% of children who have a sibling with a language developmental issue will also be diagnosed with a language developmental issue. Children are divided into 2 groups: low-risk or high-risk of language developmental issues. Language questionnaires will be completed by the parent at 4 time points (9, 12, 15, and 18 months of age, +/- 2 weeks around target time). Planning to do growth curve analysis across all patients, without stratification by risk group.
  • Recommend treating the 9-month language measure as baseline and including it as a covariate in the regression model. Can estimate means at each of the other 3 time points. Need to estimate the correlation using previous data and plug it into a formula to calculate the standard error. Since time points are equally spaced, may be able to assume an AR1 correlation structure. Recommend modeling actual time (age in months) when the language questionnaire was completed. When estimating trajectories, can include confidence bands around them. When comparing the two curves, look for differences in coefficients at any of the 3 time points. Another option is to randomize children to the number and timing of repeated language questionnaires.
  • Recommend writing proposal and returning to another Biostatistics clinic for additional feedback.
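A minimal sketch of how an AR1 correlation could feed a standard-error calculation, assuming equally spaced visits, a common SD sigma at every visit, and an equal split of the ~70 children into two groups of 35 (all placeholders, not estimates from previous data). Under AR1, visits k steps apart have correlation r^k; with the 9-month score as a covariate, the residual variance at a later visit is roughly sigma^2 (1 - rho^2), where rho is that visit's correlation with baseline.

```python
import math

def ar1_corr(n_times, r):
    """AR(1) correlation matrix for equally spaced visits: corr[i][j] = r ** |i - j|."""
    return [[r ** abs(i - j) for j in range(n_times)] for i in range(n_times)]

def se_adjusted_diff(sigma, rho, n1, n2):
    """Approximate SE of a baseline-adjusted (ANCOVA-style) difference in group means
    at one visit: sigma * sqrt((1 - rho^2) * (1/n1 + 1/n2)).
    Ignores the small loss of degrees of freedom for the baseline covariate."""
    return sigma * math.sqrt((1 - rho ** 2) * (1 / n1 + 1 / n2))

# Visits at 9 (baseline), 12, 15, 18 months. r, sigma, n1, n2 are hypothetical placeholders.
R = ar1_corr(4, r=0.6)
for k, months in enumerate([12, 15, 18], start=1):
    se = se_adjusted_diff(sigma=10.0, rho=R[0][k], n1=35, n2=35)
    print(f"{months} months: corr with baseline = {R[0][k]:.3f}, SE of difference = {se:.2f}")
```

Visits farther from baseline have lower correlation under AR1, so the baseline adjustment buys less precision there; the mixed model with random intercepts and slopes would replace this hand calculation once real data are available.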

2016 December 15

WITHDREW: Melissa Henry, Department of Hearing and Speech Sciences

Jennifer Erves, Internal Medicine, Meharry Medical College

  • Interpretation of ordinal logistic regression. Race (1=black, 2=other, 3=white, reference). Barriers score consists of 7 questions, and Benefits score consists of 7 questions. May consider dropping age, income, grade, and race from the model and looking at the impact on odds ratios for scores. Need to report descriptive statistics for scores stratified by the ordinal outcome variable. Since Barriers score is significant, it will be useful to tease apart which barriers are significant. Correlation among the independent variables does not affect the validity of the model, although strong collinearity inflates standard errors. The degree of collinearity can be assessed by calculating the variance inflation factor (VIF).
  • Second model includes 4 interactions between race and scores. Need to exclude race=2 from the model and to test proportional odds assumption in the model. Recommend using STATA syntax to generate interaction terms and using 'lincom' command to calculate meaningful odds ratios. May calculate and plot predicted probabilities for each level of ordinal outcome stratified by race group. Also recommend adding figures to explain model graphically (ex. boxplots for each score stratified by outcome and by race).
  • Additional options: May consider a stratified analysis within race groups. Dichotomizing the outcome variable for binary logistic regression is appropriate to simplify interpretation. Power will still be an issue if polytomous regression is used.
  • Reference: Statistical Modeling for Biomedical Researchers by William D. Dupont
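The predicted probabilities suggested above for each level of the ordinal outcome come directly from the fitted cut points and linear predictor. A minimal sketch, assuming Stata's ordered-logit convention P(Y <= j) = expit(cut_j - xb) and hypothetical cut points and linear predictors for two race groups (not values from this study); in Stata itself, `predict` with the `pr` option after `ologit` should return the same quantities.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def po_category_probs(cutpoints, xb):
    """Category probabilities under a proportional odds (ordered logit) model,
    using the cut-point convention P(Y <= j) = expit(cut_j - xb).
    cutpoints must be increasing; returns len(cutpoints) + 1 probabilities."""
    cum = [expit(c - xb) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical fit: 4 cut points for a 5-level outcome, and linear predictors
# for two race groups evaluated at the same Barriers/Benefits score values.
cuts = [-2.0, -0.5, 0.8, 2.2]
for group, xb in [("white (reference)", 0.0), ("black", -0.4)]:
    probs = po_category_probs(cuts, xb)
    print(group, [round(p, 3) for p in probs])
```

Evaluating this over a grid of score values for each race group gives the curves for the suggested figure.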

2016 December 8

Brent Cameron, Radiation Oncology Resident

  • "Our questions are going to be centered around the specific statistical methods for analyzing our data. We plan on applying for VICTR voucher to have a biostatistician analyze the data. The trial involves quality of life metrics as well as more objective clinical metrics for evaluation of stereotactic radiosurgery for medically refractory tremor patients. Patients fill out questionnaires at baseline enrollment of the trial, then 3, 6, 9, and 12 months after the trial. We have enrolled 22 patients thus far. For some patients they have completed the 12 month period. Others have just enrolled and may only have 3 month time point. As in any trial, some patients did not show up for all the appointments so not every patient may have all data points. The questionnaires have 30+ questions at each time point. We don’t know for sure which questions on the survey may be positive. There is published data on similar studies using a different technique. We would like to report the Vanderbilt experience to show that our method is equivalent to others."
  • Goal is to enroll 30 patients total. Clinical assessment of tremor (ordinal 0-4) and handwriting (ordinal 0-4) at baseline and 3, 6, 9, and 12 months. Psychological assessment at baseline and 6 months. There are slightly different questions on the questionnaires. Aims are to show improvement in quality of life and clinical assessments after the procedure and to show efficacy is similar to gamma knife surgery (using published data).
  • You will need to develop a primary analysis strategy. Can generate a combined score for the most important factors. Recommend calculating the area under the curve for this score over time and dividing it by the length of time the patient was followed (a time-weighted average). Can do a t test or signed-rank test on this calculated average. An exploratory analysis plotting scores for the factors may inform which factors can be clustered together. Recommend applying for 90-hour VICTR award for biostatistics support.
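The area-under-the-curve summary described above can be sketched with the trapezoidal rule. This assumes the combined score has already been built, that visit times are months since baseline, and that the curve is taken on change from baseline so that a negative average means improvement; the patients and scores are hypothetical. Missed visits are simply dropped, which is what makes this summary usable when follow-up lengths differ.

```python
def time_averaged_change(months, scores):
    """Area under the change-from-baseline curve (trapezoidal rule) divided by the
    length of follow-up. months[0] must be the baseline visit (month 0); missed
    visits are left out of both lists."""
    if len(months) != len(scores) or len(months) < 2:
        raise ValueError("need baseline plus at least one follow-up visit")
    change = [s - scores[0] for s in scores]
    auc = 0.0
    for k in range(1, len(months)):
        auc += (months[k] - months[k - 1]) * (change[k] + change[k - 1]) / 2.0
    return auc / (months[-1] - months[0])

# Hypothetical combined tremor scores (lower = better); pt2 missed the 6-month visit.
patients = {
    "pt1": ([0, 3, 6], [3, 2, 2]),
    "pt2": ([0, 3, 9, 12], [4, 3, 2, 2]),
}
averages = {pid: time_averaged_change(m, s) for pid, (m, s) in patients.items()}
print(averages)
```

With one average per patient, a one-sample signed-rank test against zero (for example `scipy.stats.wilcoxon` on the list of averages) or a one-sample t test would follow.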

Jennifer Erves, Internal Medicine, Meharry Medical College

  • "I am analyzing data to identify factors influencing parental willingness of adolescent participation in a clinical trial. The current analysis is an ordinal logistic regression, and we are inquiring if we should use a binary logistic regression. We are also inquiring about the Chi-Square analyses we propose to use to identify if racial differences exist in parental willingness of adolescent participation in clinical trials."
  • Started with 290 participants and were advised to remove patients with missing data (n= 51). Sample size is 239 participants. Willingness to participate is ordinal on a 5-point scale. You can use a proportional odds model for this ordinal outcome. Dichotomizing this outcome variable for binary logistic regression is appropriate to simplify interpretation. May want to use a global chunk test comparing models with and without interactions with race.
  • If race is not a significant predictor in the regression model, then it is not recommended to continue with the chi-square test. If do continue with chi-square test, you may need to use Bonferroni's adjustment for multiple comparisons (although this is a very conservative approach). You will need to clearly explain your approach in the methods section.
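The global chunk test mentioned above compares a model with and without all race-interaction terms using a likelihood ratio statistic. A minimal sketch, assuming both models have already been fitted and their log-likelihoods are known (the values below are hypothetical); the chi-square tail probability is computed from the closed forms of the regularized incomplete gamma function for integer degrees of freedom. Stata's `lrtest` command performs the same comparison on stored estimates.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k < df/2} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start at df = 1, erfc(sqrt(y)), and add one gamma-function term per step.
    total = math.erfc(math.sqrt(y))
    for k in range(df // 2):
        a = k + 0.5
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
    return total

def lr_chunk_test(loglik_reduced, loglik_full, df):
    """Likelihood ratio ("chunk") test of nested models; df = number of added terms."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: model without vs. with 4 race-by-score interactions.
stat, p = lr_chunk_test(loglik_reduced=-350.2, loglik_full=-345.1, df=4)
print(f"LR chi-square = {stat:.2f} on 4 df, p = {p:.4f}")
```

Testing all interaction terms in one chunk avoids running a separate test for each term.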

2016 December 1

Adrienne Roman, Department of Hearing and Speech Sciences

  • "I am a postdoc in the Department of Hearing and Speech Sciences in Audiology. We are applying for a VICTR grant and received feedback on our application asking for us to consult with a biostatistician to better plan analyses. I have attached the pre-review questions we need to respond to as well as our application to VICTR. Our methodology is a bit complicated, but the goal of the study is to tell if individuals with cochlear implants (CIs) are affected when we increase individual's access to sound by having them sleep with their CIs on during the night. It will be a 5-week study in which, for 2 weeks, individuals will sleep with their CIs on. We will also have other physiological data (cortisol and actigraphy measures) in addition to self-report surveys. Any feedback or recommendations regarding the statistical analysis would be greatly appreciated."

Miller Tracy, Department of Psychiatry

  • Our project is analyzing data from the Bruininks-Oseretsky Test of Motor Proficiency (BOT2), which is an individually administered test used to measure gross and fine motor skills. I have attached a copy of the scoring page with the subtests that we use circled. We would like to do our analysis separately for kids (7-10 years) and adults (18-35 years). Within these groups we are comparing scores between participants that are typically developing and those that are on the autism spectrum (1 = kid w autism, 2 = TD kid, 3 = adult w autism, 4 = TD adult). We would like to compare differences in point scores between the typically developing and autism groups while controlling for age and possibly WASI score, which is a partial cognitive assessment that measures IQ. Note that point scores are derived from raw scores on the scoring sheet. We’d like to compare point score differences for subtest totals but it might also be beneficial to test differences on tasks within subtests. We are having trouble deciding which model would be best to use for this kind of data and analysis.
  • Each subtest score is ordinal. There are 20-30 subjects per group.
  • Recommend using rank-sum test and proportional odds regression (or polytomous regression, which requires a large number of parameters). May consider generating scatterplots and boxplots to visualize data. In a permutation test, the test statistic (e.g., a difference in means) measures the magnitude of the effect, and the p-value is the proportion of permutations in which the statistic is as extreme as or more extreme than the observed value. Can conduct a separate regression analysis within the ASD group using the ordinal severity score.
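A permutation test as described above can be sketched in a few lines. This assumes a difference in mean point scores as the statistic (a rank-based statistic could be swapped in for ordinal scores), Monte Carlo relabeling rather than full enumeration since 20-30 subjects per group makes enumeration impractical, and hypothetical scores.

```python
import random

def permutation_test(x, y, n_perm=10000, seed=2016):
    """Two-sided permutation test for a difference in means between groups x and y.
    The observed difference is the effect estimate; the p-value is the proportion of
    random relabelings whose |difference| is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / (len(pooled) - n_x)
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float ties
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical subtest point scores: autism group vs. typically developing group.
asd = [12, 15, 11, 14, 13, 10, 16, 12]
td = [17, 16, 18, 14, 19, 17, 15, 18]
diff, p = permutation_test(asd, td)
print(f"difference in means = {diff:.2f}, permutation p = {p:.4f}")
```

Adjusting for age or WASI score is not possible in this simple form; that is where the proportional odds regression comes in.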

2016 November 17

Tenisha Hinners, Pathology Resident

  • "I have been working on a retrospective case-control pilot study with Dr. Alison Woodworth, who was previously a faculty member in the Department of Pathology, Microbiology, and Immunology. The objective of the study is to determine the diagnostic utility of four serum markers (hCG, AFP, CA-125, CRP) and maternal age to predict ectopic pregnancy (EP) at presentation to the emergency department in pregnant women with symptoms of vaginal bleeding, abdominal pain, or cramping. Specifically, we want to know if a combination of these markers into a multivariable logistic regression model will provide a more powerful predictor of pregnancy outcome (viable pregnancy vs. spontaneous abortion vs. ectopic pregnancy) than any one marker on its own. We have collected data on a small sample size of 122 (16 EP, 30 spontaneous abortions (SA), 51 viable intrauterine pregnancies (VIP)) and want to know if this type of analysis is feasible with our numbers."
  • Gold standard for EP diagnosis is transvaginal ultrasound or laparoscopic surgery for EP documented in the medical chart. Can look for patterns in scatterplots of each pair of markers, with dots color-coded by diagnosis. Compare means of biomarkers among diagnosis groups in an exploratory analysis. Recommend regressing each of the 4 markers against EP individually. May consider creating a weighted score of the markers but may still not achieve statistical significance. Given the sample size (16 EP cases), will not have adequate power for a multivariable logistic regression model.
  • Recommend applying for 35-hour VICTR voucher for statistical support.
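One way to relate each marker to EP on its own is a univariable logistic regression of EP status (1 = EP, 0 = not) on that marker. A minimal sketch fitted by Newton-Raphson maximum likelihood, assuming hypothetical marker values; in practice a package routine (R's `glm` or statsmodels' `Logit`) would be used, and with only 16 EP cases complete separation is a real risk that this sketch does not guard against.

```python
import math

def fit_univariable_logistic(x, y, n_iter=50, tol=1e-10):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson. Returns (b0, b1).
    No safeguard for complete separation."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient of the log-likelihood and entries of the information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical log-transformed marker values and EP status (1 = ectopic pregnancy).
marker = [1.2, 2.5, 0.8, 3.1, 2.2, 1.9, 0.5, 2.8, 1.1, 3.4]
ep = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]
b0, b1 = fit_univariable_logistic(marker, ep)
print(f"odds ratio per unit of marker = {math.exp(b1):.2f}")
```

Repeating the fit for each of the 4 markers (and maternal age) gives the individual associations without overloading the small number of events.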

Logan LeBlanc, 3rd year medical student

  • The Vanderbilt Street Psychiatry Program assists mentally ill homeless patients in a street setting. Through a partnership with a local non-profit, qualifying patients are also applied for disability+medicaid coverage, and a small cohort of patients (~60) have been approved and received this coverage. Experientially, our program has recognized that patients have a significant reduction in frequency of VUMC ED visits and hospitalizations once they receive disability+medicaid coverage. We are hoping to better investigate and quantify this reduction.
  • We are conducting a retrospective cohort study among patients who have been approved for disability+medicaid coverage through our program. We have used starpanel records to total the number of ED visits, hospitalizations, and total length of stay for each patient in the cohort in the 1-year period preceding medicaid approval and the 1-year period following approval. We have also collected similar data for the 6-month and 18-month periods before and after approval, and I have questions about the best method of analysis.
  • Recommend calculating the rate of ED use per 100 person-months before and after approval. Use total calendar time since approval even if the patient has not been seen in the ED for a period of time. This assumes that the likelihood of transiency is the same for a pre-specified time window (ex. 6, 12, or 18 months) before and after approval. May want to consider limiting the analysis to patients with a documented ED visit at least 6 months (or 12 or 18 months) prior to approval; this would establish that the patient lived in the area during that time.
  • Alternatively, could vary the time window based on the duration of each patient's records. Potential biases: ED utilization may be underestimated before approval, and patients may be more likely to stay in the area once they have approval and are receiving care.
  • Can return to clinic for additional assistance or apply for 35-hour VICTR voucher for statistical support.
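The rate recommended above is total events divided by total person-months, scaled to 100 person-months. A minimal sketch, assuming hypothetical ED visit counts and a fixed 12-month window on each side of approval; every patient contributes the full calendar window even with zero ED visits, as recommended.

```python
def rate_per_100_person_months(event_counts, months_observed):
    """Pooled event rate: total events / total person-months * 100."""
    return 100.0 * sum(event_counts) / sum(months_observed)

# Hypothetical cohort of 3 patients with 12-month windows before and after approval.
ed_before = [5, 0, 3]
ed_after = [1, 0, 2]
months = [12, 12, 12]

before = rate_per_100_person_months(ed_before, months)
after = rate_per_100_person_months(ed_after, months)
print(f"before: {before:.1f}, after: {after:.1f} ED visits per 100 person-months")
print(f"rate ratio (after/before): {after / before:.2f}")
```

The same function works for the 6- and 18-month windows by changing the months contributed per patient.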

2016 November 10

Jocelyn Durlacher, Medical Student

  • "My faculty mentor is Cecilia Chung MD MPH; she is in the Division of Rheumatology and Immunology. I am hoping to attend Biostats Clinic to discuss my project on Resistant Hypertension in patients with Lupus. I am trying to apply for VICTR Biostatistics funding and would like a quote for the VICTR process. Additionally, we are in the process of extracting data from the synthetic derivative and would like advice on how to best format our dataset for analyses.
  • "The project is based in the Synthetic Derivative, so we have extensive de-identified longitudinal data on a cohort of n=1136 patients with Lupus as defined by an algorithm that has been validated to find patients with Lupus. The primary outcome of the study is to look at the incidence rate and prevalence of resistant hypertension (Blood pressure that is not controlled on 3 or more blood pressure medications) in patients with Lupus. We eventually hope to compare this to a matched control population (but for now we are just focusing on establishing the incidence rate and prevalence in the Lupus cohort). When establishing the incidence rate of resistant hypertension- we will pay close attention to the temporal relationship of Lupus and resistant hypertension. Patients will be considered to have resistant hypertension only in cases where this occurred AFTER the first ICD9 code for Lupus. Incidence rate will be defined as: patients with a history of SLE and development of resistant hypertension after first SLE ICD9 code / person years of observation time. We will extract data on patient age, gender, race, ethnicity, BMI, cholesterol (and the rest of the lipid panel), Creatinine, GFR, Lupus related labs including: ANA, anti-dsDNA (yes versus no), antiphospholipid antibodies (yes versus no), C3, C4. We are also interested in extracting data on comorbidities including Type 2 Diabetes (marked with a flag in the synthetic derivative), end stage renal disease, myocardial infarction, stroke (all based on ICD9 codes). We will also extract data on medications as listed in the synthetic derivative- categories of medications we are extracting are: Lupus medications (immunomodulators), Anti-malarials (used in treatment of Lupus), Corticosteroids and Blood pressure medications. We will possibly compare differences in the covariates listed above in patients with resistant hypertension versus patients without resistant hypertension. 
We will possibly compare differences in the covariates listed above in patients with resistant hypertension versus patients with controlled hypertension versus patients without hypertension.
  • "I have additionally attached a word document with a draft that has empty tables showing how I was thinking of displaying the data."
  • Recommend applying for 35-hour VICTR voucher for statistical support (data analysis May 2017)
  • Excel database structure for repeated SCr measurements. Recommend using REDCap database if possible.
  • Retrospective study from 1995 to present. Expect ~200 of 1136 patients to have resistant hypertension. Goal is to establish incidence and prevalence rates. Survival analysis models: 1) unadjusted and 2) adjusted for renal disease, hyperthyroidism, and sleep apnea.
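The incidence-rate definition in the request (resistant hypertension after the first SLE ICD9 code, divided by person-years of observation) can be sketched from per-patient dates. This assumes hypothetical dates, that follow-up ends at resistant hypertension onset for cases, that patients whose resistant hypertension precedes the first SLE code are excluded as not at risk, and that "prevalence" here means the proportion of at-risk patients who develop it during follow-up (one possible definition, an assumption).

```python
from datetime import date

def incidence_and_prevalence(patients):
    """patients: list of (first_sle_code_date, resistant_htn_date_or_None, end_of_followup).
    Returns (incident cases, person-years, rate per 1000 person-years, proportion with RH)."""
    cases = 0
    person_days = 0
    at_risk = 0
    for sle_date, rh_date, end_date in patients:
        if rh_date is not None and rh_date <= sle_date:
            continue  # resistant HTN on/before first SLE code: excluded (assumption)
        at_risk += 1
        if rh_date is not None and rh_date <= end_date:
            cases += 1
            person_days += (rh_date - sle_date).days  # follow-up stops at onset
        else:
            person_days += (end_date - sle_date).days
    person_years = person_days / 365.25
    return cases, person_years, 1000 * cases / person_years, cases / at_risk

# Hypothetical patients: (first SLE code, resistant HTN onset or None, end of follow-up).
cohort = [
    (date(2000, 1, 1), date(2005, 1, 1), date(2010, 1, 1)),
    (date(2002, 6, 1), None, date(2012, 6, 1)),
    (date(2008, 3, 1), None, date(2016, 3, 1)),
    (date(2010, 5, 1), date(2009, 1, 1), date(2016, 1, 1)),  # RH before SLE: excluded
]
cases, py, rate, prop = incidence_and_prevalence(cohort)
print(f"{cases} cases over {py:.1f} person-years = {rate:.1f} per 1000 person-years")
```

The same start and stop dates, with a 0/1 status, are the inputs the planned survival models would need.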

Timothy Hegeman, Cardiovascular Medicine Fellow

  • "I have a project and would like to review the feasibility if possible. The practice of carotid artery stenosis screening using duplex ultrasound is common, but the impact of such screening is unknown. Question: Does screening with carotid ultrasound improve outcomes in cardiac surgery? Patients: All patients undergoing CABG, SAVR, MVR, MVRe, TAVR, aortic arch repair, LVAD, or transplant. Exposure: Receiving a carotid ultrasound at VUMC within 12 months of the procedure. Primary outcomes: Incidence of perioperative mortality and/or stroke. Secondary outcomes: mortality, stroke, for the CABG subgroup the number of grafts, LOS, reimbursement/cost."
  • Have 10 years of data and 10,000 cardiac surgeries, with 40% having had an ultrasound. Expect event rates of 2-4% stroke and 2-4% mortality. Collecting age and status of carotid disease, smoking status, hyperlipidemia, and hypertension in the 12 months prior to the procedure. Do not have data on patients who had an ultrasound after which the physician decided to change the procedure or not to do the planned procedure, which will bias results. Recommend altering the research question or narrowing the population to reduce bias. Not knowing the reason for the ultrasound is a limitation of the study.
  • Patients at higher risk of stroke/mortality (ex. having carotid disease) may be more likely to have ultrasound. Certain procedures (ex. valve surgery) have a higher risk of stroke. Urgency of procedure may be related to higher mortality. TAVR really only done in last 4 years. May be difficult to control for large number of procedures in statistical model. May consider adding some procedures to exclusion criteria.
  • Recommend applying for 90-hour VICTR award for statistical support. Will need to control for calendar year to account for changes in surgery procedure and the fact that fewer ultrasounds are being done in the last few years.

2016 November 3

Scott Karpowicz, Pharmacy Administration Resident

  • "We previously presented to the Monday clinic in August regarding my project proposal to examine the impact of a discharge prescription service on hospital readmissions at Vanderbilt. We’re currently in the application process for a VICTR grant (VR22383), and we’d like to request biostatistics assistance with data analysis. Dan Byrne provided some initial feedback on our VICTR application and requested that we visit another clinic for further discussion. Our responses to his questions are attached."
  • Note potential for physicians to recommend Meds-to-Beds service more often for sicker patients (indication bias). Possibility that patients who decline service cannot pay for meds at the bedside. Also unable to determine whether patients who decline service actually have the prescription(s) filled at an outside pharmacy.
  • Recommend calculating a propensity score and either 1) including the propensity score in the regression model or 2) weighting the regression by the inverse of the propensity score. Intention-to-treat analysis will help with indication bias.
  • Will need statistical support to calculate propensity scores and build regression models.
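Weighting by the propensity score usually means inverse probability of treatment weights. A minimal sketch, assuming the propensity scores have already been estimated (for example by a logistic model of Meds-to-Beds use on pre-discharge covariates) and using hypothetical scores and readmission outcomes; a weighted regression would use the same weights.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights (average treatment effect):
    1/ps for patients who used the service, 1/(1 - ps) for those who did not."""
    weights = []
    for t, ps in zip(treated, propensity):
        if not 0.0 < ps < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / ps if t else 1.0 / (1.0 - ps))
    return weights

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical data: treated = 1 if Meds-to-Beds was used; readmit = 30-day readmission.
treated = [1, 1, 1, 0, 0, 0]
ps = [0.7, 0.5, 0.4, 0.6, 0.3, 0.2]
readmit = [0, 1, 0, 1, 0, 0]

w = iptw_weights(treated, ps)
rate_t = weighted_mean([r for r, t in zip(readmit, treated) if t],
                       [wi for wi, t in zip(w, treated) if t])
rate_c = weighted_mean([r for r, t in zip(readmit, treated) if not t],
                       [wi for wi, t in zip(w, treated) if not t])
print(f"weighted readmission: service {rate_t:.3f}, no service {rate_c:.3f}")
```

Very small or very large propensity scores produce extreme weights, so the weight distribution is worth checking before fitting the outcome model.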

Yaa Kumah-Crystal, Department of Pediatrics

  • "I would like to review my criteria for identifying good matching controls for a cluster analysis I will be doing and I would like to run some thoughts by you statisticians to see if you have any additional suggestions.
  • "Aim: This study aims to improve communication between families managing pediatric diabetes and their providers. We hope to demonstrate that using before-visit questionnaires to help families identify their barriers to adherence will lead to better communication, and increased documentation of the family’s barriers to adherence in the provider’s clinical notes.
  • "Population: We have 17 provider participants. We have 102 intervention patients who will complete before-visit questionnaires to identify their diabetes barriers before their clinic visit. Each of the providers will have encounters with 6 different intervention patients that have completed a before-visit questionnaire. We will evaluate the providers' notes after the encounter with their 6 intervention patients to see if their notes show an increased frequency of documentation of barriers to diabetes adherence. We will compare the frequency of this documentation of barriers to adherence for the providers' notes in their 6 intervention patients that completed the before visit questionnaire prior to their clinic visit compared to the frequency of documentation of barriers for 12 control patient notes that did not complete a before visit questionnaire.
  • "Data collection: We will match the intervention patients to patients seen by the same provider during the intervention, and we will also use historical notes from the intervention patients seen by the same provider prior to the intervention as a basis of comparison. Patients will be matched based on clinical criteria relevant to their diabetes management and related to their potential barrier to adherence. The matching criteria will include: Age, gender, A1C, and duration of diabetes. We will use cluster sampling to determine the changes in documentation per provider for each cluster group. We will perform analysis on clinical notes generated from notes for patients that are in the intervention compared to notes for patients that are not in the intervention, that are generated during the intervention period. Notes generated by providers from patients that are not participating in the intervention will be analyzed to compare changes in documentation.
  • Sample Size Justification and Statistical Analysis Plan: In this study we plan to enroll 17 eligible providers with 103 patients in the intervention group and 17 providers with 206 patients in the control group within a 6 month duration period. We target enrollment in order to achieve an average of 6 intervention patient participants for each provider participant, and 12 control patients per provider participant. We will use Generalized Estimating Equation (GEE) method to adjust for the cluster effect within provider and to determine the intervention effect on the outcome of barriers to adherence, which is a grading scheme on a scale from 0 to 5. The average degree of documentation in the preliminary study was 1.55 (SD 1.72). Based on our preliminary study, in the worst case scenario for our cluster evaluation where the correlation coefficient in the clusters is 1, the valid sample size will reduce to the number of providers, which is 17 in both intervention and control groups. Assuming that the difference in the experimental and control means is 1.5 with standard deviation 1.7, based on the two-sample t-test statistics we will be able to reject the null hypothesis that the population means of the experimental and control groups are equal with probability (power) .829. In a best case scenario where the correlation coefficient is 0 within the clusters, it is equivalent to having a valid sample size of 103 in the experimental group and 206 in the control group. The power to reject the null hypothesis that the population means of the experimental and control groups are equal will increase to 1.000. The Type I errors associated with the previous two power calculations are 0.05. This analysis was performed using PS: Power and Sample Size Calculation version 3.0.43, by William D. Dupont and Walton D. Plummer, Jr."
  • Recommend a cluster randomized trial where the physicians are randomized to intervention or control (8 in each group) without any crossover. Control group will fill out another form or complete no forms. Will need at least 6 patients per physician. Outcome is whether physician documents certain information in the medical chart (ordinal variable with 6 levels). Will need to review physician documentation in a patient's chart even if the patient did not agree to complete the form.
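The worst-case (correlation 1) and best-case (correlation 0) scenarios above are the two ends of the standard design effect for equal cluster sizes, n_eff = n / (1 + (m - 1) ICC). A minimal sketch, assuming 6 patients per provider and a few hypothetical intra-cluster correlations in between; an ICC estimated from the preliminary notes would replace these placeholders.

```python
def effective_sample_size(n_total, cluster_size, icc):
    """Effective sample size under clustering with equal cluster sizes:
    n / (1 + (m - 1) * ICC)."""
    return n_total / (1.0 + (cluster_size - 1) * icc)

# 17 providers x 6 intervention patients = 102; and 8 providers per arm x 6 patients = 48.
for n_total in (102, 48):
    for icc in (0.0, 0.1, 0.3, 1.0):  # hypothetical intra-cluster correlations
        n_eff = effective_sample_size(n_total, 6, icc)
        print(f"n = {n_total}, ICC = {icc}: effective n = {n_eff:.1f}")
```

With ICC = 1 the effective sample size collapses to the number of providers, which is why the number of physicians randomized matters more than the number of patients per physician.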

2016 October 27

Mali Schneiter, DO

  • Review of protocol: "Risk of Endometrial Hyperplasia and Carcinoma in Marathon Runners: A Cross Sectional Survey"
  • Risk factors for endometrial cancer include polycystic ovarian syndrome, early menarche, late menopause, high estrogen exposure, anovulation, and obesity.
  • Sample size will depend on incidence of endometrial cancer in general population. Control data from WHO, but this does not include BMI data. This could result in confounding by BMI, if BMI affects cancer risk. It would be ideal to match on age and BMI. At which point in time does BMI matter?
  • May consider collecting pilot data demonstrating reduced risk of cancer in marathon runners. May also consider using available data in Synthetic Derivative on endometrial cancer diagnosis. This information could be used in a power calculation to assess feasibility of larger study.

Kelsey Gregory, Pediatrics Resident

  • Review of protocol: "iSLEEP (Improving Safe Sleep Learning and Education in the Early Period)"
  • Plan to enroll 200 mothers in a 2-month period at VUMC nursery. Randomized to 1 of 3 interventions: standard oral teaching, standard oral teaching + Video A, standard oral teaching + Video B.
  • May consider weighting questions when calculating survey summary score. Expect to see improvement in scores and one intervention to show higher level of improvement.

Rebekah Nevel, Pediatric Pulmonary Fellow

  • "I am doing a project on growth in children with one type of rare interstitial pediatric lung disease. I have a specific question regarding completion of a Kaplan Meier curve on duration of continuous supplemental oxygen requirement in those with and without failure to thrive."
  • Nested retrospective cohort within prospective study. Need to be cautious of immortal time bias when patients have been tracked since birth. Patients were enrolled at diagnosis (generally first 1-2 months of life).
  • Measurements: age at coming off supplemental O2, whether currently on supplemental O2, current age, and initial weight percentile.
  • Must define time of cohort entry (initiation of O2), time of exit (come off O2), end of study (8/2016), status (0 = still on O2, 1 = came off O2). Patients still on O2 at end of study are censored at that time (8/2016).
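With entry, exit, and status defined as above, the Kaplan-Meier estimate of the probability of still being on O2 can be computed directly. A minimal sketch, assuming hypothetical months from O2 initiation for children with and without failure to thrive; packages such as R's `survival::survfit` or Python's lifelines `KaplanMeierFitter` would normally be used, along with a log-rank test to compare the groups.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) = P(still on O2 beyond t).
    times: months from O2 initiation to coming off O2 or to end of study (8/2016).
    events: 1 = came off O2, 0 = still on O2 at end of study (censored).
    Returns [(event time, survival probability)] at each distinct event time.
    Subjects censored at an event time are counted as at risk at that time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= d + c
    return curve

# Hypothetical months on continuous O2, by failure-to-thrive status.
ftt = kaplan_meier([6, 14, 20, 30, 30], [1, 1, 0, 1, 0])
no_ftt = kaplan_meier([3, 5, 8, 12, 18], [1, 1, 1, 0, 1])
print("FTT:", [(t, round(s, 3)) for t, s in ftt])
print("no FTT:", [(t, round(s, 3)) for t, s in no_ftt])
```

Starting the clock at O2 initiation (rather than birth) is what keeps time before cohort entry from being counted as immortal time.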

2016 October 13

Joey Starnes, MD/MPH Student

  • I have been helping with a project in the Department of Pediatric Surgery/Trauma under Purnima Unni. I was hoping to come to clinic to briefly discuss analysis of the dataset we have collected. This is a small dataset consisting of golf cart accidents identified from the medical record. We hope to characterize the nature of these accidents and the injuries associated with them, primarily through descriptive statistics and relative risk. We have also considered doing a heat map or GIS with zip code data.
  • Trauma database from 2008-2015 with 30 events (golf cart accidents) out of 3300 trauma cases. About half of the cases were referred to Vanderbilt from another hospital. Variables include diagnosis, role of child in event, age, gender, treatment, disposition, zip code of injury location, and injury location on body. National dataset includes 1500 events with diagnosis and injury location on body.
  • Primary research question: characterize golf cart accidents involving children. Recommend reporting descriptive statistics (including percentages of broad diagnoses) and discussing the limitation that these data include tertiary cases.
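Percentages in a small descriptive study are easier to interpret with a confidence interval. A minimal sketch, assuming the Wilson score interval (a choice made here, not one stated in the clinic notes) applied to the 30 golf cart accidents among 3300 trauma cases; the same function applies to the percentage of each broad diagnosis among the 30 events.

```python
import math
from statistics import NormalDist

def wilson_ci(events, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# 30 golf cart accidents among 3300 trauma cases, 2008-2015.
low, high = wilson_ci(30, 3300)
print(f"{100 * 30 / 3300:.2f}% (95% CI {100 * low:.2f}% to {100 * high:.2f}%)")
```

Because about half of the cases are referrals, the interval describes this trauma database, not the general pediatric population.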

Joe Wick, Medical Student

  • "I am working on a project with Dr. Clint Devin in the department of orthopedic surgery. Our question regards inter-rater reliability. We are working on a project in which we intend to send a survey to physicians at other institutions asking them to determine whether patients need surgery based on imaging (CT scans, MRIs) that we send with the survey. Physicians will be able to select one of three answer choices on the survey. I am hoping that the biostats clinic can help us to determine the proper inter-rater reliability calculation to use (e.g. is Cohen’s kappa appropriate?), the proper number of patients to include on the survey, and the proper number of respondents/raters to send the survey to."
  • Recommendations can include surgery, back brace, or no additional treatment.
  • From registry of 800 cases that were initially treated with a brace based on initial images (supine scans), identified 13 cases for which the recommendation changed to surgery after review of follow-up images (upright scans). Survey assumes initial images are sufficient and upright scans are not required. Do upright scans add anything to treatment decision? What is the utility of the follow-up images?
  • Surgeons identified to complete the survey are PI's colleagues
  • The kappa statistic adjusts for agreement expected by chance, which depends on how often each response is chosen (the frequency-of-event issue). High levels of agreement would yield low power, but this research question does not warrant a sample size calculation.
  • A prospective design could incorporate a patchwork assignment of reviewers (surgeons) to patients.
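Cohen's kappa is defined for two raters; with many surgeons rating each case, Fleiss' kappa is a common multi-rater generalization (named here as an assumption, not as the clinic's recommendation). A minimal sketch, assuming every case is rated by the same number of surgeons into the three categories and using hypothetical ratings; the chance-agreement term p_e is where the frequency of each response enters.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N subjects each rated by the same number of raters.
    counts[i][j] = number of raters assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    n_categories = len(counts[0])
    # Overall proportion of all ratings falling in each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    # Observed pairwise agreement for each subject, then averaged.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 4 cases, 5 surgeons; columns = surgery, brace, no further treatment.
ratings = [
    [5, 0, 0],
    [1, 4, 0],
    [0, 2, 3],
    [0, 5, 0],
]
print(round(fleiss_kappa(ratings), 3))
```

If nearly all ratings fall in one category, p_e approaches 1 and kappa becomes unstable even when raw agreement is high.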

2016 September 29

Jeeyeon Cha, MD, PhD

  • I'm preparing a VICTR grant for a small clinical trial. I'd like to discuss experimental/study design, appropriate measurements and data analysis, and interpretation of results, but am open to discussing other measures/aspects as indicated. Please let me know if you require any further information.
  • Previous study of preterm birth in mice. Want to know if pathway is relevant to preterm birth in humans. Methods involve staining of placental sample. Plan to compare staining among 4 groups, preterm laboring vs. preterm non-laboring vs. term laboring vs. term non-laboring. Study has already been done in Japan with 6-8 samples per group.
  • What is the required sample size? Look at the standard deviations from the published Japanese study and assume a 25% higher SD for the new study. Can use PS: Power and Sample Size Calculation software (t-test tab: output = sample size, independent design, alpha = .05, power = .9, delta = 2, sigma = 3, m = 1; graph difference in population means with x-axis range 1-4 and y-axis range 0-200 (max sample size); the description explains the required sample size per arm).
  • Recommend applying for 90-hour VICTR award for Biostatistics support beginning with study design through manuscript publication
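The PS inputs above (alpha = .05, power = .9, delta = 2, sigma = 3, equal allocation) can be cross-checked with the normal-approximation formula for a two-sample comparison of means. A minimal sketch; PS uses the t distribution, so its answer may be slightly larger than this approximation, and the 25% SD inflation is applied as a second call.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for a two-sided, two-sample comparison
    of means with equal allocation: 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2."""
    z = NormalDist()
    za = z.inv_cdf(1 - alpha / 2)
    zb = z.inv_cdf(power)
    return math.ceil(2 * (za + zb) ** 2 * sigma ** 2 / delta ** 2)

print(n_per_arm(delta=2, sigma=3))         # inputs from the PS example
print(n_per_arm(delta=2, sigma=3 * 1.25))  # SD inflated by 25%
```

This covers one pairwise comparison; comparing all 4 staining groups would need an adjustment for multiple comparisons or an ANOVA-based calculation.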

Lanier Sachs, Special Education

  • "I work for Drs. Laurie Cutting and Sheryl Rimrodt-Frierson at the Education and Brain Sciences Research lab in the department of Special Education. We are interested in attending the clinic to discuss participant randomization procedures for our upcoming IRB-approved clinical trial. We are seeking help determining best practices for randomizing 99 patients into three equal groups matched on both age and gender, and help in creating a randomization schedule to provide to VU IDS and CTC."
  • "Our study is a clinical trial involving both behavioral and pharmacological intervention in children ages 10-17 with Neurofibromatosis Type 1 and comorbid reading problems. The question we are asking is whether there are facilitative effects of the pharmacological agent on responsiveness to reading tutoring that improve learning, both at short-term and long-term time points. We plan to use slope-as-outcome hierarchical linear model in our analysis."
  • NF is a genetic disorder with a prevalence of 1 in 3500. The goal is to address the learning disabilities associated with the disease.
  • Plan to recruit 99 subjects over 4 years based on a previous power analysis. For randomization, plan to match on age and gender. How should this be done? IDS requested a randomization document explaining how to randomize patients as they enroll.
  • The classic method is a randomized block design, which produces a randomization scheme per block (ex. males aged 10-13y). REDCap has built-in support for randomization. Can generate blocks per year of study enrollment, but this will add another level of complexity to the statistical analysis.
  • Use a computer program (ex. R) to create an ordered list of treatment assignments within each block. After setting the seed, use the function sample() to randomly permute the treatment labels within each block of the chosen size, generating the condition assignments for every block.
  • May consider applying for 35-hour VICTR voucher for Biostatistics support beginning with study design through manuscript publication
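The set-seed-then-permute procedure can be sketched in Python. Here `random.Random(seed).shuffle` plays the role of R's `set.seed()` plus `sample()`. The arm labels, the two age bands, the block size of 6, and the seed are all hypothetical choices. Each stratum's list is generated longer than needed, and patients take the next unused slot in their own stratum as they enroll.

```python
import random

# Hypothetical labels: three study groups and four gender-by-age strata.
ARMS = ["A", "B", "C"]
STRATA = [(sex, age) for sex in ("female", "male") for age in ("10-13", "14-17")]

def block_schedule(n_blocks, block_size=6, seed=2016):
    """Permuted-block randomization list for each stratum.

    Every block contains each arm equally often in a random order, so the
    groups are balanced within a stratum after each completed block.
    """
    if block_size % len(ARMS) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed -> reproducible, auditable list
    schedule = {}
    for stratum in STRATA:
        sequence = []
        for _ in range(n_blocks):
            block = ARMS * (block_size // len(ARMS))  # e.g. A A B B C C
            rng.shuffle(block)                         # random order within block
            sequence.extend(block)
        schedule[stratum] = sequence
    return schedule

schedule = block_schedule(n_blocks=5)
for stratum, sequence in schedule.items():
    print(stratum, " ".join(sequence))
```

A fixed block size can let staff guess the last assignment in a block. Mixing block sizes (e.g. 3 and 6) is a common way to reduce that predictability.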

Wyatt McDonnell, Pathology, Microbiology, & Immunology

  • Planning pilot study with 24 participants measured at 4 time points. Collecting blood sample and sequencing to develop antibody profile; there are 7 features of sequencing that will be documented. Do sequencing changes (biomarkers) predict co-infection state (categorical outcome: no infection, TB only, AIDS only, both TB & AIDS)? Group 1 will hold AIDS status constant; Group 2 will hold TB status constant.
  • Want to calculate power given sample size.

2016 September 22

Tarsheen Sethi, Clinical Fellow in Hematology-Oncology

  • "I am a hematology/oncology fellow and MSCI student and am working on a project titled "MYD88 and PD-1 Pathways in Central Nervous System Lymphoma" and need help with streamlining my data analysis."
  • Recommend constructing Kaplan-Meier curves.
  • Recommend collaborating with researchers at other care centers in order to increase sample size.
  • Use initial data to estimate the sample size needed for future studies.
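A Kaplan-Meier curve is the product-limit estimate. At each distinct event time, survival is multiplied by (1 - events / number at risk). The sketch below implements this with plain Python. Follow-up times and event indicators are hypothetical. By the usual convention, a patient censored at an event time is still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times:  follow-up time for each patient (e.g. months from diagnosis)
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, S(time)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Count everyone with this time; censored ones are still at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Hypothetical follow-up (months) and event indicators for six patients.
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```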

Ryan Castoro, Physical Medicine

  • Preparing grant for VICTR. Need grantsmanship advice for a grant that does not plan to implement statistical analyses.
  • Recommend pursuing valid, small-sample exploratory analyses.
  • Recommend providing reviewers a plan to pursue scientific studies after initial exploratory phase.

2016 September 15

Grace Umutesi, MPH Student

  • "I am leaving for Kenya this weekend (Sat Sept 17th) to work on the Evaluation of Kijabe Nurse Anesthetist training program and I had a few questions concerning a sample size calculation before I leave."
  • Planning to conduct an evaluation of 3 clinics since the placement of CRNAs with advanced training. This will include a facility assessment (supplies, personnel) and interviews with mothers to determine community perception of obstetric care received (ex. where was baby born, what influenced decision for this location, knowledge of available resources such as CRNA nurse). Want to compare historical opinions to opinions after CRNAs were placed (within the last 3 years).
  • How to decide number of individuals to interview? Want to compare rate of mothers who chose to deliver at hospital between historical and current groups. Recommend interviewing as many mothers as possible. We can later generate the necessary power curves.
  • Which individuals to interview? Want representative sample. Have eligibility form for inclusion criteria (age, multiple pregnancies). Will interview mothers at the clinic or visit homes. Limiting interviews to patients in the clinic would limit generalizability to entire community. Churches could be another potential source for interview. May want to look at national data (birth records, immunization records) or consider sampling 6-year-olds and interviewing their mothers. May also want to interview mothers at facilities without CRNAs.
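Once the number of interviews is known, power curves for comparing the proportion of mothers who delivered at the hospital (historical vs. current) can be generated with the standard normal approximation for two independent proportions. This is a minimal sketch. The 40% and 55% rates are placeholders, not estimates from this evaluation, and clustered sampling (e.g. by clinic or church) would reduce the effective sample size.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two independent
    proportions, using the unpooled standard error under the alternative."""
    z = NormalDist()
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p2) / se
    # Probability the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical hospital-delivery rates: 40% historically vs. 55% now.
for n in (50, 100, 150, 200, 300):
    print(n, round(power_two_proportions(0.40, 0.55, n, n), 2))
```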

Andrew McKown, Fellow in Pulmonary and Critical Care

  • "I would like some help please in interpreting an analysis. I have a dataset of ICU patients in which I am assessing whether steroid use lowers the risk of ARDS. Using multivariate logistic regression, there is a significant reduction in risk, but a reviewer requested an analysis accounting for competing risk with the outcome death. I am using the cmprsk package in R, but I need some assistance in interpreting the output."
  • Outcome of interest for acute respiratory distress syndrome (ARDS) is death within 96 hours of admission to ICU. Documented whether patient was on steroids prior to admission (prescribed for immunosuppression or asthma treatment). Transplant patients were excluded.
  • Do steroids reduce risk of ARDS? ARDS is a disease of inflammation and is sometimes treated with steroids.
  • Cohort includes 1080 patients, 30-40% developed ARDS, some died within 96 hours of admission. On the morning of ICU Day 2, a patient was enrolled in the study if (s)he had at least one risk factor for ARDS. Days are by calendar day, not 24 hours.
  • Time 0 (inception point) should be Day 2 to avoid immortal time bias. You cannot look forward (after Time 0) to define the cohort. This will exclude patients who had ARDS upon admission.
  • Options for outcome of interest: 1) death within 96 hours of Day 2; 2) ARDS or death; 3) death without ARDS and death with ARDS; 4) given patient died or had ARDS, what is probability that ARDS was diagnosed.
  • Because of confounding by indication, recommend propensity score analysis adjusted for ~50 variables (include splines for continuous variables)
  • Resources:
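For the competing-risk question, the quantity to report is the cumulative incidence of ARDS, i.e. the probability of developing ARDS by time t while death without ARDS competes. The cuminc() function in the cmprsk package estimates this kind of curve. The sketch below computes the nonparametric (Aalen-Johansen) estimate by hand to show what it does. The 0/1/2 cause coding and the data are hypothetical. Treating deaths as censored and reporting 1 - Kaplan-Meier would overstate ARDS incidence.

```python
def cumulative_incidence(times, causes, cause=1):
    """Nonparametric (Aalen-Johansen) cumulative incidence for one event type
    when other event types compete with it.

    times:  days from the inception point (e.g. ICU Day 2)
    causes: 0 = censored, 1 = ARDS, 2 = death without ARDS (hypothetical coding)
    Returns (time, CIF) pairs at each time an event of type `cause` occurs.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    event_free = 1.0  # all-cause Kaplan-Meier just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_any = d_cause = leaving = 0
        while i < len(data) and data[i][0] == t:
            code = data[i][1]
            d_any += int(code != 0)
            d_cause += int(code == cause)
            leaving += 1
            i += 1
        if d_cause:
            # A patient must still be free of any event just before t.
            cif += event_free * d_cause / at_risk
            curve.append((t, cif))
        if d_any:
            event_free *= 1 - d_any / at_risk
        at_risk -= leaving
    return curve

# Hypothetical days from Day 2 and event types for eight patients.
days = [1, 1, 2, 2, 3, 4, 4, 5]
types = [1, 2, 1, 0, 1, 2, 0, 0]
print("ARDS:", cumulative_incidence(days, types, cause=1))
print("death without ARDS:", cumulative_incidence(days, types, cause=2))
```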

2016 September 8

William Martinez, Internal Medicine

  • "I would like to review my analysis of survey data we collected from 837 physicians. I have done most of the analysis in SAS. My primary question now is how to do poststratification weights to adjust for survey nonresponse. We have demographics for the total group of physicians surveyed and the demographics of the respondents."
  • Surveyed interns and residents from six different institutions regarding attitudes and behaviors toward speaking up on safety issues. The overall response rate was 50%. Found differences in response rates by gender and postgraduate year (PGY, across institutions). Survey question responses were on a 5-point Likert scale.
  • Demographics included gender, specialty, PGY, study site, self-reported formal training in patient safety. Respondent demographics were self-reported, and some values are missing. Demographics of physicians surveyed were from administrative data (no missing data).
  • Generated 56 strata from institution, gender, PGY, and specialty. A few strata had no physicians represented in the sample.
  • Conducted analysis both with and without weights. Recommend comparing standard errors between weighted (expect to be larger) and unweighted analyses.
  • We have assumed that any missing data are missing at random.
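In its simplest form, a poststratification weight is the number of physicians surveyed in a stratum divided by the number who responded in that stratum. The sketch below computes these weights from one stratum label per physician. The labels are hypothetical tuples standing in for (institution, gender, PGY, specialty). Strata with surveyed physicians but no respondents cannot be weighted. The sketch returns them separately so they can be collapsed with similar strata first.

```python
from collections import Counter

def poststratification_weights(population, respondents):
    """Weight per stratum = number surveyed in the stratum / number responding.

    population, respondents: one stratum label per physician.
    Returns (weights, empty), where `empty` lists strata with surveyed
    physicians but no respondents; those need to be collapsed first.
    """
    pop = Counter(population)
    resp = Counter(respondents)
    weights = {s: pop[s] / resp[s] for s in resp}
    empty = sorted(s for s in pop if resp[s] == 0)
    return weights, empty

# Hypothetical strata: (institution, gender).
surveyed = [("A", "F")] * 10 + [("A", "M")] * 10 + [("B", "F")] * 4
responded = [("A", "F")] * 5 + [("A", "M")] * 2
print(poststratification_weights(surveyed, responded))
```

Each respondent then carries the weight of their stratum, so the weighted respondents reproduce the surveyed group's distribution across the covered strata.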

2016 September 1

Dr. Karl Moons, Visiting Scholar

Please do not schedule any clients.

2016 August 25

WITHDREW: Jeeyeon Cha, MD, PhD

I'm preparing a VICTR grant for a small clinical trial. I'd like to discuss experimental/study design, appropriate measurements and data analysis, and interpretation of results, but am open to discussing other measures/aspects as indicated. Please let me know if you require any further information.

2016 August 18

Whitney Muhlestein, Medical Student

  • I am doing outcomes research in the Neurosurgery Department, and my mentor is Dr. Lola Chambless. I am using machine learning techniques to predict whether a patient is discharged to home or not following a particular neurosurgical procedure based on preoperative conditions.
  • I was interested in trying a machine learning approach because I hadn't seen it used a lot in neurosurgery outcomes research, and I wanted to see if I could build a more predictive model than a basic logistic regression using different classes of machine learning models in an ensemble approach. I trained my models (34 different machine learning models, including a logistic regression) using a training data set (67% of the data), and then validated those models on a holdout dataset (the remainder of the data). I ranked the predictive power of the models based on the AUC of the ROC curve from the holdout dataset. The model with the highest AUC ended up being an ensemble model combining a Random Forest Classifier, an Elastic Net Classifier (which is a regularized regression), and a Nystroem Kernel SVM.
  • I am also analyzing my data with classic statistics. Specifically, I am comparing characteristics of patients who discharge home and those who do not to look for statistical significance. I did some basic statistics comparing preoperative characteristics of patients who do go home and those who do not. I want to make sure that I am using the correct types of statistical tests and that I am treating missing data appropriately.
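The holdout AUC used to rank the models equals a concordance probability. It is the chance that a randomly chosen patient who was not discharged home gets a higher predicted risk than a randomly chosen patient who was, with ties counting one half. This is the Mann-Whitney form of the AUC. The predicted probabilities below are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC computed as the concordance probability: the chance that a
    randomly chosen positive case gets a higher predicted score than a
    randomly chosen negative case (ties count as one half)."""
    concordant = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos_scores) * len(neg_scores))

# Hypothetical holdout predictions of the probability of NOT going home.
not_home = [0.81, 0.64, 0.55, 0.40]
home = [0.62, 0.30, 0.22, 0.15, 0.10]
print(round(auc(not_home, home), 3))
```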

2016 August 11

Juan Pablo Arroyo, Internal Medicine

  • Assistance with VICTR application for a study on the role of chloride as a predictor of residual kidney function after donor nephrectomy. The PI is Dr. Gautam Bhave.
  • Have a sample of 850 patients. Chloride (Cl) and creatinine (Cr) were measured pre-surgery, immediately following surgery, and post-surgery.
  • Planning to use linear regression model. The dependent variable will be post-surgery Cr, and the independent variables will be pre-surgery Cl, post-surgery Cl, and pre-surgery Cr. It would be ideal to have lag in Cl and Cr measurements and to study longitudinal measurements from healthy controls.
  • Recommend applying for 90-hour VICTR award for Biostatistics support

Joseph Conrad, Chemistry

  • "I’m working with my research group on a human subjects study design to compare the performance of standard of care rapid diagnostic tests for malaria and enhanced versions of these tests. The study will collect primary blood specimens from individuals in malaria endemic regions in rural Zambia and will be incorporated into the resubmission of an upcoming R01 application. This is a resubmission and previously received a priority score of 37 with criticism that the 900 person human subjects study was too ambitious for the apparent early stage technology. I’d like to attend an upcoming Biostats Clinic to discuss this study and receive feedback on study design and proposed statistical analysis and suggestions for improvement."
  • There is an issue with the rapid diagnostic test having low sensitivity with low parasitemia levels; these infections go unrecognized. Sample will be comprised of people who present to a local clinic with malaria symptoms or people in households with a case of diagnosed malaria. We will not select cases and controls based on gold standard test. We will have paired data.
  • Goal is to demonstrate that the enhanced rapid diagnostic test performs better than standard of care rapid diagnostic test. The gold standard for malaria diagnosis is thin smear microscopy or PCR (reference). Plan to compare results of rapid diagnostic tests to gold standard truth, then compare accuracies, sensitivities, and specificities of each rapid diagnostic test. Need to decide what margin of error for the difference in proportions would be acceptable to make conclusions.
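Both RDTs are run on every participant, so sensitivities are compared as paired proportions among gold-standard-positive participants. Specificities are compared the same way among gold-standard negatives. The sketch below computes the difference in sensitivity and a Wald confidence interval for paired proportions. Its half-width is the margin of error to be chosen. The counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def paired_sensitivity_difference(a, b, c, d, conf=0.95):
    """Difference in sensitivity (enhanced minus standard RDT) among
    gold-standard-positive participants who received both tests, with a
    Wald confidence interval for a difference of paired proportions.

    a: both RDTs positive                 b: enhanced +, standard -
    c: enhanced -, standard +             d: both RDTs negative
    """
    n = a + b + c + d
    diff = (b - c) / n  # equals (a + b)/n - (a + c)/n
    se = sqrt(b + c - (b - c) ** 2 / n) / n
    margin = NormalDist().inv_cdf(0.5 + conf / 2) * se  # CI half-width
    return diff, margin, (diff - margin, diff + margin)

# Hypothetical counts among 100 PCR-positive participants.
diff, margin, ci = paired_sensitivity_difference(a=60, b=25, c=5, d=10)
print(f"difference {diff:.2f}, margin of error {margin:.3f}, "
      f"CI {ci[0]:.3f} to {ci[1]:.3f}")
```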

Brad Christensen, Internal Medicine

  • Retrospective study of bone marrow disorder MDS. Sample data from 2005-2015 includes 250 patients.
  • Outcome: measure of scar tissue/fibrotic marrows on scale 0-3. Patients with 0 or 1 generally have 14 years to AML, and patients with 2 or 3 have 5 years.
  • Assess trend in degree of fibrosis using rank correlation. Sample size calculation based on acceptable margin of error for half the width of the confidence interval (see 8-14).
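With a 0-3 fibrosis grade there will be many tied values, so a rank correlation needs mid-ranks (tied values share the average of their ranks). The sketch computes Spearman's correlation as the Pearson correlation of the mid-ranks. It assumes, hypothetically, that the trend is measured against year of diagnosis. The data are made up.

```python
def midranks(x):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the mid-ranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical diagnosis years and fibrosis grades (0-3).
years = [2005, 2006, 2008, 2010, 2012, 2013, 2015]
grades = [0, 1, 0, 2, 1, 3, 2]
print(round(spearman(years, grades), 3))
```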

2016 August 4

Tyler Casey, PharmD, PGY-2 Psychiatric Pharmacy Resident

I am a psychiatric pharmacy resident and I am putting together a study proposal and am looking to get advice from the biostats clinic. My study will be looking at whether patients who are CYP2D6 poor metabolizers are more likely to have experienced adverse effects from antidepressants. The study is still in its early stages, I am looking to discuss: whether the study design I've selected is appropriate for my aims; what is the proper statistical analysis; and how do I find the appropriate patient sample size.
  • This is a retrospective study of patients who have experienced a major depression episode and non-response to an antidepressant. The BioVU database will be used to identify the patient's genotype and categorize the patient as a poor, intermediate, extensive, or ultrarapid metabolizer. It is cost-prohibitive to have genotyping done for patients who are not already in the BioVU database.
  • Can conduct a paired case-control study matching cases (poor metabolizers) and historical controls on pre-specified factors (ex. age, sex, race, medication adherence, and other factors that may be indicative of non-response). Match the patients based on probability of non-response. If a validated score has already been developed, this can be used. When calculating the sample size, note the prevalence of the genotype of interest (poor metabolizer) and determine a reasonably sized difference you want to be able to detect.
  • Another option is to adjust for additional factors in a regression analysis on all eligible patients. This will require 10-20 patients per degree of freedom in the model.
  • Need to develop a clear definition of adverse effects (ex. cause the patient to stop the drug, change therapy, or change dosage) because they can be highly varied.
  • Planning to apply for VICTR award (90 hours). Review and fill out the "VICTR Resource Request" at

2016 July 28

Laurel Teller, Doctoral Student, Hearing and Speech Sciences

  • "My research project entitled "Does Complex Syntax in Parent Input Vary by Child Language Status?" will compare parent language input variables for children with different language levels. I need help to develop my regression analysis and correlations. I do not have data yet, but would like to talk through my research questions and how to set up the analyses. My research questions are as follows: 1) What is the relation between measures of parent complex syntax input and child language outcomes? (planning to use mixed effects linear regression); 2) How does the proportion of specific types of complex syntax compare among parents of children categorized in three language groups 16 months prior to the outset of the study? (planning to use ANOVA with linear contrasts); and 3) What is the relation between parent complex syntax and associated parent language measures and maternal education level? (planning to use Pearson's R)."
  • Sample includes families (parent and child pairs) in one of three child language levels. Ten "typical language" children were matched with ten "delayed receptive language" children. There were also five un-matched children with problems expressing themselves. The child's language level was assessed 16 months prior to assessing the complex syntax in the parent's input. To categorize the child's language level, cut points were applied to the child's performance on a test. It is recommended to research whether any published journal articles have justified the cut points that were used. Currently, there is no known global assessment of a child's ability to communicate.
  • The parent and child were recorded speaking in their home for one day. Portions of the recording were transcribed by the researcher until 200 utterances by the parent were transcribed. Complex syntax is defined as a sentence with more than one verb. The proportion of parent utterances classified as complex syntax (out of 200) was documented. The proportions of the different types of complex syntax were also documented. Complex syntax does not account for speech rate or repetitiveness of speech. An algorithm was applied to account for the amount of background noise.
  • The researcher was not blinded to the child's language group when transcribing the recordings. It would be better to randomize the order of processing the tapes. Given the small sample size, the proportions need to have been measured with high precision (high test-retest reliability, high observer reliability, and low inter-observer variability).
  • Research Question (RQ) 1: It is recommended to use linear models (with all fixed effects) for each of the dependent variables 1) parent's proportion of complex syntax (mean 25%, SD 15) and 2) number of complex syntax types. The independent variables are child language score, gender, age, and ethnicity. Note that mixed effects linear models could be used if the data were longitudinal (repeated recordings) and clustered on families.
  • RQ 2: As a guideline, there should be ~15 families per research question, so the analysis should be simplified given the small sample size. It is recommended to use variable clustering to explain how the different types of complex syntax (proportions) run together. If any of the proportions were correlated, this would make the analysis more complex and require an even larger sample size to tease out the relationships. As a solution, you can use cluster analysis for the proportions and create cluster scores instead of disentangling the relationships.

2016 July 21

Chirayu Patel, Radiation Oncology

  • I need help to determine the appropriate sample size for a randomized clinical study of the impact of educational intervention in the clinic (using visual presentation during clinic consultation) on regret regarding decision to undergo radiation therapy vs. surgery, perceived side effects, and satisfaction with cancer care. We plan to use the EPIC side effects scale (EPIC 26; after treatment), Ottawa regret scale, and SCA service satisfaction scale for cancer. My mentors are Eric Shinohara and Austin Kirschner.
  • Planning session can happen at various durations after consultation; patients can forget radiation side effect discussion
  • Need to choose a central outcome measure; probably the Ottawa regret scale
  • Can size the study for power or for precision (margin of error; 1/2 the width of the confidence interval for the treatment difference)
  • A review paper of 5 or so studies using the Ottawa scale gives means and SDs from each study; an SD of 16 is a safe estimate
  • The number of patients in each of 2 groups that is necessary to achieve a margin of error of 6 in estimating the difference in means with 0.95 confidence is __
  • If you wanted to achieve half that margin of error, you would need 4x as many subjects
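Sizing for precision follows from margin = z × SD × sqrt(2/n) for a difference in two means with n per group, so n = 2(z × SD / margin)^2. This is a normal-approximation sketch using the SD of 16 from the review. Other software may round differently or use the t distribution. The squared term is why halving the margin of error quadruples the number of subjects.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_for_margin(sd, margin, conf=0.95):
    """Subjects per group so that the CI half-width for a difference in two
    means equals `margin` (equal group sizes, common SD, normal approximation):
    margin = z * sd * sqrt(2 / n)  =>  n = 2 * (z * sd / margin) ** 2
    """
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return 2 * (z * sd / margin) ** 2

n_6 = n_per_group_for_margin(sd=16, margin=6)
n_3 = n_per_group_for_margin(sd=16, margin=3)
print(ceil(n_6), ceil(n_3))
print(n_3 / n_6)  # halving the margin multiplies n by (6 / 3) ** 2
```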

2016 July 14

Amanda Currie, Research Intern, Department of Neurology.

  • Regarding the appropriate use of matching for a clinical trial utilizing a historical control group. The data is collected from a prospective pilot clinical trial testing the safety and tolerability of deep brain stimulation (DBS) in early stage Parkinson’s disease (PD). The trial randomized 30 subjects to treatment with DBS + optimal drug therapy (ODT) or ODT alone, and the primary analysis at 2 years is reported in the attached manuscript (Charles et al., 2014). Fourteen subjects with DBS were followed for 3 additional years to gather long-term data.
  • We hope to compare data from subjects in the DBS + ODT group to a historical control group (treated with ODT). I believe that data from subjects randomized to the optimal medical management group of a 5-year trial of creatine in early stage PD are the best available control group. We have requested access to this dataset and are awaiting a reply (we hope to gain access to patient-level data). The primary outcomes paper for this trial is attached (NETPD, 2015). Although this was the best available control group I could find, some differences exist between the populations. Most notably, the inclusion criteria for the DBS study require an antiparkinsonian medication duration of 6 months – 4 years (mean 2.0 years), whereas the inclusion criteria for the creatine study require an antiparkinsonian medication duration of 3 months – 2 years (mean 0.8 years).
  • During this clinic, I would like to gain guidance on the best way to select a control group from this study of creatine in early PD. Specifically, I have the following questions:
    1. How many patients should be included in the control group for this analysis, and which factors should be used to select them (age, sex, antiparkinsonian medication duration, etc.)? As a reference, five-year data is available for approximately 345 subjects in the creatine study.
    2. Would it be possible to use some of the data for the 132 patients who completed six years of follow-up by creating a new “baseline” at their 1-year visit?
    3. Is there a way to create a model based on the 5-year data from the creatine study that could be used to predict patients’ scores on a number of measures at a time point that is comparable to the 5-year mark in the DBS study?

Jill Chafetz, Center for Professional Health

  • Wants to learn more about proportional odds regression

2016 July 7

Michael Ripperger, Student

  • "I have general questions regarding the optimal inferential statistical methods I could use to analyze a pre/post intervention program the hospital is undergoing. I will be interpreting program effectiveness with the available clinical data, but this is not a clinical study. I am working under Dr. Colin Walsh in the HARBOR Lab in the Department of Bioinformatics."
  • Subjects are high utilizers of ER services. This study is a paired design matching the same patient's pre- and post-intervention data. Cases are post-intervention; controls are pre-intervention.
  • Intervention is a specific care plan; outcome is the number of ER visits. Also planning to collect length of stay, type of visit, and cost to hospital.
  • Can generate a spaghetti plot of the number of ER visits pre and post. Can use a Wilcoxon signed rank test to compare number of ER visits between pre and post. Data are likely appropriate for a negative binomial model.
  • Would have been ideal to randomize subjects to intervention or control (no specific care plan) and to compare the number of ER visits between the two groups.
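The signed-rank comparison works on each patient's post-minus-pre change in ER visits. Zero changes are dropped, the absolute changes are ranked, and the test looks at the sum of ranks for increases. The sketch uses the normal approximation with the usual tie correction, since visit counts tie often. With few patients an exact p-value is preferable. The visit counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_test(pre, post):
    """Wilcoxon signed-rank test on paired counts (post - pre).

    Zero differences are dropped; tied |differences| get mid-ranks and the
    variance uses the standard tie correction. Returns (W+, two-sided p) from
    the normal approximation, which is rough for small samples.
    """
    d = [b - a for a, b in zip(pre, post) if b != a]
    n = len(d)
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        t = j - i + 1  # size of this group of tied |d|
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, dk in zip(ranks, d) if dk > 0)  # ranks of increases
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(var)
    return w_plus, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical ER visits in the year before and after the care plan.
pre = [12, 8, 15, 6, 9, 20, 7, 11]
post = [6, 8, 9, 7, 4, 12, 3, 10]
print(signed_rank_test(pre, post))
```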

Jill Chafetz, Center for Professional Health

  • "I have questions about interpreting the SPSS output from the severity study (earlier clinic). I did not see values for the independent variables as a whole (such as cohesion or flexibility), only for the levels, plus the regression did not show all of the levels. I also need information about comparing prevalence rates of ACE (Adverse Childhood Events) scores from a sample of MDs to a much larger sample of the general public. The larger sample data come from the CDC, but I have not found either raw data or published statistics that would allow me to run comparisons."
  • For the independent variable 'Threshold', SPSS automatically set category 4 (highest) as the baseline rather than category 0 (lowest). The log odds for the baseline category is equal to the estimated intercept (alpha) in the model. The odds ratio for category 0 compared to category 4 is e^beta0, where beta0 is the estimated coefficient for category 0. To make things easier to interpret, there is a way to specify category 0 as the baseline in SPSS.
  • Recommend collapsing dependent outcome 'Severity' into 2 categories to learn how SPSS handles the independent variable(s) in a binary logistic regression model. Then continue with proportional odds regression model.
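Changing the baseline category does not change the fitted model. Each dummy coefficient is a log odds ratio against the reference, so odds ratios against category 0 are exp(beta_k - beta_0). The coefficients below are hypothetical. Point estimates can be re-expressed this way, but confidence intervals for the new contrasts need the coefficient covariance matrix, so refitting with category 0 as the baseline is usually simpler.

```python
from math import exp

# Hypothetical dummy-variable coefficients for 'Threshold' with category 4 as
# the reference, so category 4's coefficient is 0 by construction.
beta_vs_4 = {0: -1.2, 1: -0.8, 2: -0.5, 3: -0.1, 4: 0.0}

# Odds ratios comparing each category with category 4.
or_vs_4 = {k: exp(b) for k, b in beta_vs_4.items()}

# Same model re-expressed with category 0 as the reference: beta_k - beta_0.
or_vs_0 = {k: exp(b - beta_vs_4[0]) for k, b in beta_vs_4.items()}

for k in sorted(beta_vs_4):
    print(f"category {k}: OR vs. 4 = {or_vs_4[k]:.2f}, OR vs. 0 = {or_vs_0[k]:.2f}")
```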

2016 June 9

Jill Chafetz, Center for Professional Health

  • There were 279 physicians referred to a course in maintaining boundaries. Those with a very extreme violation were fired and not referred to the course. The collected data include severity of the boundary violation classified by the investigators (ordinal, 1-4, "other", "harassment" (toward colleague/staff only), "impropriety" (toward patient), "violation"(toward patient)) and type of family background assessed by 25-question instrument FACES (ordinal, 1-3, "balanced", "midrange", "extreme"). The 3 family background types were further subdivided into 16 subtypes (categorical), and one of the subtypes ("disengaged rigid") accounts for 30% of the physicians. The sixteen subtypes have been validated with good reliability. There is no control group with zero violations.
  • The question is whether the subtypes predict severity of the boundary violation using ordinal logistic regression.
  • Recommend using proportional odds regression lumping together "impropriety" and "harassment" categories due to concerns regarding appropriateness of ordering in severity of violation
  • Due to small cell counts, recommend re-categorizing the 16 family types into 2 variables: 1) cohesiveness ("separated", "connected", "disengaged", or "enmeshed") and 2) flexibility ("structured", "flexible", "rigid", or "chaotic")
  • Recommend including other covariates in the regression model to adjust for possible confounding: age, gender, specialty, race, marital status
  • May want to add control group with zero violations from previous study (n=117)
  • May want to contact Bill Cooper at Vanderbilt's CPPA regarding their program to address multiple complaints about a physician
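A proportional odds model has one intercept per cutpoint of the ordinal outcome and a single coefficient per predictor. The sketch below turns hypothetical intercepts and a hypothetical coefficient into probabilities for each severity level. It uses the parameterization P(Y >= j) = 1 / (1 + exp(-(alpha_j + x'beta))). Some software models P(Y <= j) instead, which flips the sign of the coefficients, so check the parameterization in the output before reading signs. The final loop checks the "proportional odds" property: the log odds ratio is the same at every cutpoint.

```python
from math import exp, log

def category_probs(intercepts, lin_pred):
    """Category probabilities under a proportional odds model written as
    P(Y >= j) = 1 / (1 + exp(-(alpha_j + lin_pred))) for j = 2..k.

    intercepts: alpha_2 > alpha_3 > ... > alpha_k (one per cutpoint)
    lin_pred:   x'beta for one physician (family type, age, gender, ...)
    """
    cum = [1.0] + [1 / (1 + exp(-(a + lin_pred))) for a in intercepts] + [0.0]
    return [cum[j] - cum[j + 1] for j in range(len(cum) - 1)]

def logit(p):
    return log(p / (1 - p))

# Hypothetical intercepts for a 4-level severity outcome and a coefficient of 0.7.
alphas = [1.0, -0.5, -2.0]
beta = 0.7
p0 = category_probs(alphas, 0.0)   # reference profile
p1 = category_probs(alphas, beta)  # one-unit higher predictor
print([round(p, 3) for p in p0])
print([round(p, 3) for p in p1])

# Log odds ratio for P(Y >= j) is beta at every cutpoint.
for j in range(1, len(alphas) + 1):
    print(f"Y >= {j + 1}:", round(logit(sum(p1[j:])) - logit(sum(p0[j:])), 6))
```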

2016 June 2

Alyssa Hasty, Molecular Physiology and Biophysics

  • Assistance with VICTR application and future R01 submission
  • "We have found in mice, that adipocyte iron concentration is associated with obesity-related metabolic disease. In humans, indices of overall body iron overload also correlate with metabolic disease. I would like to design a study using lean and obese subjects to determine whether their adipocyte iron concentrations relate to metabolic phenotypes. In addition, we are interested in the adipose tissue macrophage iron content and handling. I would love statistical assistance to determine how many subjects I will need and how I will perform the statistical analyses once we have all of the data."
  • Preliminary work on human adipose tissue samples from gastric bypass patients. Planning pilot study of subcutaneous adipose tissue samples from 5 lean and 5 obese subjects.
  • Plan to compare macrophage iron concentration and handling between two groups adjusting for 20 covariates. This would require a minimum of 200 subjects for a linear regression model (guideline of 10-15 observations per degree of freedom). Can consider using propensity scores for dimensionality reduction if relationship between covariate(s) and outcome is not of primary interest.
  • Pilot data collection will not be completed prior to August R01 submission. Can utilize PS software to generate power curves using prior mice data. Requesting VICTR voucher for statistical support to prepare R01 submission.
  • For analysis of grant data, can contact Richard Peek (GI) regarding Biostatistics collaboration.

Matt Lenert, Biomedical Informatics

  • Cases previously hospitalized for congestive heart failure (CHF) and controls previously hospitalized for another reason. Outcome is unplanned readmission within 30 days; have 643 events. May want to consider time to readmission as secondary outcome.
  • Collected information on treatments (ex. schedule follow-up with PCP, follow-up telephone from hospital 1 week post-discharge), discharge location, and risk profile. Only have date of death if occurred during hospital stay.
  • Goal to determine how treatments decrease risk of readmission and incorporate this into treatment decisions for future patients.
  • Can use logistic regression to compare unplanned readmission within 30 days between the two groups. Time to readmission can be analyzed using Cox model.
  • Have already tested second-order interactions in logistic regression model and calculated area under the ROC curve (AUC). Can use bootstrap to account for possible overfitting with stepwise selection method that was used.
  • May want to contact Dan Byrne in Biostatistics regarding similar studies and read the following article for more information:

2016 May 26

Viraj Mehta, Ophthalmology

  • "I'm evaluating eye motility outcomes after surgery for orbital floor fractures in children. I have collected all the data, and needed help figuring out the best way to analyze it."
  • Viraj Mehta and his mentor Elizabeth Mawn came to clinic today. They have a small data set of children with orbital floor fractures and are interested if time to surgery affects improvements in eye motility. There may be some confounding in their data between time to surgery and external referrals, as children who present at Vanderbilt are operated on immediately.
  • They would like a small VICTR grant to help with their analysis, which is appropriate. I said that I would refer them to Amy for help with this process

2016 May 5

Lauren Heusinkveld, Neurology Division of Movement Disorders

  • Questions concerning appropriate methods for statistical analysis and handling missing data in quality of life measures in a recently completed clinical trial testing deep brain stimulation in Parkinson’s disease patients. Faculty mentors for this project are Mallory Hacker, PhD, and David Charles, MD.
  • Randomized 30 patients to treatment (DBS plus ODT or ODT alone); 28 completed 2-year trial. Dose increased in both treatment groups as trial progressed through time. At some point, increasing dose will no longer improve quality of life.
  • Outcome is quality of life measured by Parkinson's Disease Questionnaire (PDQ-39).
  • Missing 3-year data for 12 patients for the extension study (Year 3-5). Generate spaghetti plots and conduct linear regression on each individual patient's QoL scores. Collect slope of regression line for each patient and use rank-sum test to compare slopes between the two treatment groups. If the data are non-linear, then calculate area under the curve.
  • How should 4 patients who worsened and crossed over to DBS during the extension study be handled in the statistical analysis? Censor control patients at the time of treatment crossover.
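The slopes-then-rank-sum approach can be sketched in a few lines. First fit a least-squares line to each patient's PDQ-39 scores over time, using whatever years were observed. Then compare the slopes between arms with the Wilcoxon rank-sum (Mann-Whitney) test, using the normal approximation and no tie correction. The patient IDs and scores are hypothetical. A slope needs at least two observed time points.

```python
from math import sqrt
from statistics import NormalDist

def ols_slope(times, scores):
    """Least-squares slope of score on time for one patient."""
    n = len(times)
    mt, ms = sum(times) / n, sum(scores) / n
    sxy = sum((t - mt) * (s - ms) for t, s in zip(times, scores))
    sxx = sum((t - mt) ** 2 for t in times)
    return sxy / sxx

def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney) test via U = number of (x_i, y_j)
    pairs with x_i > y_j, ties counting 1/2. Two-sided p-value from the
    normal approximation (no tie correction)."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical (year, PDQ-39) pairs; missed visits are simply absent.
dbs = {"p01": [(0, 30), (1, 28), (2, 27), (3, 25)],
       "p02": [(0, 22), (2, 21), (5, 20)],
       "p03": [(0, 35), (1, 33), (4, 30)]}
odt = {"p04": [(0, 28), (1, 30), (2, 33)],
       "p05": [(0, 25), (3, 29), (5, 31)],
       "p06": [(0, 31), (1, 31), (2, 34), (3, 36)]}

def slopes(group):
    """One slope per patient with at least two observed years."""
    return [ols_slope(*zip(*visits)) for visits in group.values() if len(visits) >= 2]

print(rank_sum_test(slopes(dbs), slopes(odt)))
```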

2016 Apr 28

Bianca Flores, Student

  • "I am trying to use R for a 2 way ANOVA weighted means, however my dataset is not being read. My data I am analyzing pertains to mice (wildtype, heterozygotes, and homozygotes) to assess their performance on a motor task."

2016 Apr 21

Elizabeth Martinez, clinical fellow in pathology

  • The study is a clinicopathologic investigation into Acute Vascular Lesions in the Kidney Transplant and Relationship to Cellular and Antibody-Mediated Rejection through a retrospective review of kidney transplant biopsies with acute vascular rejection and assessment of the concurrence of T-cell mediated rejection by two different criteria (CCTT and Banff). Our cohort includes about 390 biopsies from a span of a decade (diagnosis rendered by VUMC Renal pathology division) all with vascular rejection. I have obtained data on patient demographics and clinical characteristics at time of biopsy, allograft characteristics, timing of rejection episode from date of transplant, C4d status, and when available donor specific antibody status, and followup information on clinical outcome/graft function/survival available on a subset within the cohort.
  • Some of my main concerns relate to the use of survival (K-M) curves to show graft survival from event of rejection episode and also the overall graft survival from time of transplant. I want to ensure I am going about this most soundly and appropriately. I would also like to depict graphically the timing distribution of the rejection episodes.
  • Dataset includes repeated biopsies for select patients who received a second graft after the first failed; subjects were followed for a 5-year period.
  • Censoring is acceptable as long as it is noninformative, i.e., a censored subject's risk of graft loss would have been the same had the subject not been censored.
  • Recommend generating a cumulative morbidity curve for time to graft loss using the Kaplan-Meier estimator, with time of transplant as time 0. Using time of rejection as time 0 is not recommended because the analysis would then need to account for the level of severity/progression at the time of rejection. A table with the percentage of patients whose rejection occurred within prespecified time intervals after transplant can show the variability in time to rejection.
  • Recommend submitting a VICTR application for 90 hours of biostatistical consultation
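For the recommended cumulative morbidity curve, a minimal sketch of the Kaplan-Meier (product-limit) estimator in Python is below; the cumulative morbidity at each event time is 1 - S(t). The follow-up times and graft-loss indicators are hypothetical, time is measured in years from transplant (time 0), and in practice an established implementation (for example R's `survival::survfit` or the Python `lifelines` package) would normally be used instead of hand-rolled code.

```python
import numpy as np


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t).

    times  : follow-up from transplant (time 0) to graft loss or censoring
    events : 1 if graft loss was observed, 0 if censored
    Returns (distinct event times, S(t) just after each event time).
    """
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=int)
    event_times = np.unique(t[d == 1])
    surv, s = [], 1.0
    for u in event_times:
        at_risk = np.sum(t >= u)                # grafts still under observation just before u
        losses = np.sum((t == u) & (d == 1))    # graft losses at time u
        s *= 1.0 - losses / at_risk
        surv.append(s)
    return event_times, np.array(surv)


# Hypothetical follow-up (years from transplant) and graft-loss indicators
years = [1, 2, 2, 3, 4, 5]
graft_loss = [1, 1, 0, 1, 0, 0]
event_times, surv = kaplan_meier(years, graft_loss)
cumulative_morbidity = 1.0 - surv  # estimated proportion with graft loss by each event time
```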

Cherie Fathy, MD/MPH Candidate

  • I will be requesting your help on my project on Pediatric Ocular Involvement in SJS/TEN
  • All 48 subjects have SJS/TEN, including 36 who developed ocular involvement (OI). The goal is to identify independent risk factors for OI. A model of recurrent SJS/TEN would be limited because only 6 subjects had a recurrence and recurrence status is difficult to determine for all subjects.
  • There is concern about overfitting a multivariable logistic regression model because the effective sample size is only 12 (the smaller outcome group: 48 - 36 = 12 subjects without OI). Any estimated effects are at risk of being exaggerated.
  • Do not recommend including length of hospital stay as a covariate, since it is unknown at the time of admission. Recommend the uncorrected (Pearson) chi-square statistic over Fisher's exact test. If logistic regression models are used, confidence intervals for the odds ratios and ROC curves may be calculated.
  • There is not enough data to make definitive conclusions. This will be an exploratory analysis of a small dataset, so be cautious in discussing the results.
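The effective sample size and the uncorrected chi-square test mentioned above can be illustrated as follows. For a binary outcome the effective sample size is the size of the smaller outcome group, here min(36, 12) = 12; dividing by the 10-20 subjects-per-factor rule of thumb used elsewhere in these notes suggests room for at most about one candidate parameter. The 2x2 table is invented for illustration (its margins match 36 OI vs. 12 non-OI subjects), and `scipy.stats.chi2_contingency` with `correction=False` gives the Pearson chi-square without Yates' continuity correction.

```python
from scipy.stats import chi2_contingency

# Outcome counts from the notes: 36 of 48 subjects developed ocular involvement (OI)
n_oi, n_no_oi = 36, 48 - 36
effective_n = min(n_oi, n_no_oi)  # limiting sample size for a binary outcome

# Rule of thumb: 10-20 subjects in the smaller outcome group per candidate parameter
max_params_strict = effective_n // 20
max_params_lenient = effective_n // 10

# Hypothetical 2x2 table (rows: risk factor present / absent; columns: OI yes / no).
# Counts are invented for illustration and are not study data.
table = [[20, 3],
         [16, 9]]

# Pearson chi-square without Yates' continuity correction
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
```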

2016 Apr 14

Jonathan Kropski, Medicine, Pulmonary/Critical Care Fellow

  • Assistance with VICTR application
  • My primary need is assistance with the statistical analysis plan, and a quote for Biostatistics support for a Phase Ib clinical trial grant we are submitting to VICTR. The primary outcome is to demonstrate safety and tolerability of the proposed treatment 12 weeks after randomization. Our plan is to randomize 30 subjects 2:1 to active drug vs. placebo and follow them for 12 weeks to reach the primary endpoint (safety - proportion of patients who permanently discontinued therapy due to adverse events). Secondary clinical and biomarker endpoints will be assessed after 12 weeks, and 6, 9 and 12 months after randomization.
  • I have uploaded our proposed study protocol
  • Dr. Harrell recommended a 90-hour request given the multiple secondary analyses and time points.
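The notes do not specify how the 2:1 allocation would be generated; one common approach is permuted-block randomization, sketched below under stated assumptions (block size 6, the labels `active`/`placebo`, and the fixed seed are illustrative choices, not part of the protocol). Each block contains exactly four active and two placebo assignments, so 30 subjects yield 20 active and 10 placebo.

```python
import random


def permuted_block_schedule(n_subjects, block_size=6, ratio=(2, 1), seed=None):
    """Allocation list for active:placebo = ratio using permuted blocks.

    block_size must be a multiple of sum(ratio) so each block holds the exact ratio.
    """
    unit = sum(ratio)
    if block_size % unit != 0:
        raise ValueError("block_size must be a multiple of sum(ratio)")
    k = block_size // unit
    template = ["active"] * (ratio[0] * k) + ["placebo"] * (ratio[1] * k)
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        block = template[:]
        rng.shuffle(block)  # random order within each block
        schedule.extend(block)
    return schedule[:n_subjects]


schedule = permuted_block_schedule(30, block_size=6, seed=2016)
```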

Oakleigh Folkes, Student

  • "I have data from a behavior paradigm that I ran in which two mice enter a tube and the mouse that backs out is the loser and the mouse that remains in is said to be dominant and the winner. Each time a mouse wins I give that mouse one point. I do not know how to look at this data statistically, or if I need to. On the third day of the test I gave the mice a drug treatment. I do not know how to look at this data other than just based on observation."
  • "I also have data from a three chamber social approach, in which one mouse explores three chambers and one of the chambers contains a mouse. Time spent in each chamber is measured. I gave half the cohort a drug treatment, and the other vehicle. In this instance I do not know if I should use a one-way or a two-way ANOVA."

2016 Apr 7

Chenjie Zeng, Epidemiology

  • "I plan to build a predictive model with newly-found risk factors and previously known factors using a cohort study data. The outcome is binary. I wish to know what would be the best way to test the added predictive values of the newly-found factors."
  • Needs assistance responding to comments from BioVU committee review; also planning to apply for VICTR voucher
  • Recommend clarifying that, because the sample size is insufficient for definitive conclusions, the study will gather preliminary data to plan a larger, adequately powered study in the future
  • Planning to use logistic regression for binary outcome; covariates age, sex, race, and comorbidities
  • Selected 40 SNPs to include in the model based on biological plausibility; plan to use weights from previous analyses to calculate a genetic risk score. Recommend clarifying that the model will not be overfit when the genetic risk score is included as a covariate
  • Use the bootstrap to estimate the optimism in the c statistic
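A minimal Python sketch of the two steps discussed above, under hypothetical assumptions: a genetic risk score computed as the weighted sum of risk-allele counts over 40 SNPs (the allele counts, weights, ages, and outcomes here are all simulated, not BioVU data), entered with age into a logistic regression, and the Efron-Harrell bootstrap used to estimate optimism in the c statistic. The logistic fit is a plain Newton-Raphson implementation to keep the example self-contained; R's `rms` package (`lrm` with `validate`) is a commonly used packaged alternative.

```python
import numpy as np


def c_statistic(score, y):
    """c statistic (ROC area): P(score of a case > score of a control), ties count 1/2."""
    score = np.asarray(score, dtype=float)
    y = np.asarray(y, dtype=int)
    cases, controls = score[y == 1], score[y == 0]
    diff = cases[:, None] - controls[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def bootstrap_optimism_c(X, y, n_boot=200, seed=0):
    """Return (apparent c, optimism-corrected c); the model is refit in every bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = c_statistic(X @ fit_logistic(X, y), y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample subjects with replacement
        beta_b = fit_logistic(X[idx], y[idx])
        c_train = c_statistic(X[idx] @ beta_b, y[idx])  # performance where the model was fit
        c_test = c_statistic(X @ beta_b, y)             # same model on the original sample
        optimism.append(c_train - c_test)
    return apparent, apparent - float(np.mean(optimism))


# Simulated (hypothetical) data
rng = np.random.default_rng(2016)
n, n_snps = 300, 40
allele_counts = rng.binomial(2, 0.3, size=(n, n_snps))  # risk-allele counts 0/1/2
snp_weights = rng.normal(0.05, 0.02, size=n_snps)       # stand-in for weights from previous analyses
grs = allele_counts @ snp_weights                       # genetic risk score: weighted allele sum
age = rng.normal(55.0, 10.0, size=n)
true_logit = -1.0 + 0.03 * (age - 55.0) + 1.0 * (grs - grs.mean())
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

X = np.column_stack([np.ones(n), (age - 55.0) / 10.0, grs - grs.mean()])
apparent_c, corrected_c = bootstrap_optimism_c(X, y)
```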

Jennifer Madu, Graduate Student

  • "I need help in deciding what type of statistics I need for my research results. I am evaluating improvement of nurse's knowledge and attitudes on end of life utilizing a clinical course called the ELNEC(End of Life Nursing Education Consortium). Pre/post tests and surveys are to assess their knowledge before and after the ELNEC."
  • Tests included true/false questions, but there is no overall scoring mechanism. Pre/post tests can be linked to individuals. Recommend univariate analyses for each question
  • May consider generating a total knowledge score by summing the number of correct responses; pre/post knowledge scores can then be compared with the Wilcoxon signed-rank test, since the scores are paired
  • Recommend plotting a line connecting each individual's pre- and post-score; plot improvement scores as well. See parallel coordinates plot (Wikipedia has examples).
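The pre/post comparisons above could look like this in Python. The response matrices are invented (rows are nurses, columns are true/false items, 1 = correct). Total scores are compared with the paired Wilcoxon signed-rank test (`scipy.stats.wilcoxon`). For the item-by-item analyses the notes do not name a test; the sketch uses an exact McNemar test, computed as a binomial test on the discordant pairs with `scipy.stats.binomtest`, as one reasonable choice for paired binary responses.

```python
import numpy as np
from scipy.stats import binomtest, wilcoxon

# Hypothetical responses: rows = nurses (linked pre/post), columns = items; 1 = correct
pre = np.array([
    [1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
])
post = np.array([
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1],
])

# Total knowledge score = number of correct answers; paired signed-rank test on post vs. pre
res = wilcoxon(post.sum(axis=1), pre.sum(axis=1))

# Item-level exact McNemar test: binomial test on the discordant pairs
item_p = []
for j in range(pre.shape[1]):
    b = int(np.sum((pre[:, j] == 0) & (post[:, j] == 1)))  # wrong before, right after
    c = int(np.sum((pre[:, j] == 1) & (post[:, j] == 0)))  # right before, wrong after
    item_p.append(binomtest(b, b + c, 0.5).pvalue if b + c > 0 else 1.0)
```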

Paul Yoder, Dept. of Special Education

  • Needs help understanding methods of rate estimation in a partial-interval-estimated count framework.
  • Conducted data simulation using known rate of events
  • Given interval duration and true count, how do I know how much correction is applied by Poisson?
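  • Estimated lambda (mean number of events per interval) is calculated as -log(N0/N), where N0 is the number of intervals with no event, N1 is the number of intervals with at least one event, and N = N0 + N1 is the total number of intervals. This follows from the Poisson probability of zero events in an interval, P(0) = exp(-lambda).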
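The correction implied by the Poisson model can be checked by simulation, as in the sketch below. Under the assumption that events in each interval follow a Poisson distribution with mean lambda, P(no event) = exp(-lambda), so lambda is estimated by -log(N0/N); the naive partial-interval estimate N1/N undercounts because an interval with several events is scored only once. The true rate and interval duration are arbitrary values chosen for the simulation, and the ratio of the two estimates is the multiplicative correction the Poisson model applies.

```python
import numpy as np


def poisson_partial_interval_rate(n0, n):
    """Events per interval implied by partial-interval data under a Poisson model.

    P(no event in an interval) = exp(-lambda), so lambda_hat = -log(N0 / N).
    """
    if not 0 < n0 <= n:
        raise ValueError("need 0 < N0 <= N")
    return -np.log(n0 / n)


# Simulation with a known rate (assumed values, for illustration only)
rng = np.random.default_rng(42)
true_rate_per_sec = 0.07
interval_sec = 10.0
n_intervals = 100_000

counts = rng.poisson(true_rate_per_sec * interval_sec, size=n_intervals)  # true counts per interval
n0 = int(np.sum(counts == 0))
n1 = n_intervals - n0                 # intervals scored as "event occurred" (>= 1 event)

naive = n1 / n_intervals              # partial-interval proportion; ignores multiple events
lam_hat = poisson_partial_interval_rate(n0, n_intervals)
rate_hat = lam_hat / interval_sec     # back to events per second
correction = lam_hat / naive          # multiplicative correction applied by the Poisson model
```

With a true mean of 0.7 events per interval, the expected correction is 0.7 / (1 - exp(-0.7)), roughly 1.39; the correction grows as intervals get longer relative to the event rate.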

2016 Mar 31

Paul Yoder, Dept. of Special Education

  • Needs help understanding methods of rate estimation in a partial-interval-estimated count framework.

Mariana Ciobanu, Pediatric Neurology

  • Statistics help with headache quality improvement grant proposal
  • Pediatric Neurology clinic headache patients comprise 40% of clinic visits, and there is seasonality, with more headache visits during the school year.
  • Plan to evaluate current referral/triage system and identify multiple interventions to implement in Pediatric Neurology clinic, General Pediatrics clinic, and ED
  • Aims are to reduce the time spent waiting for an appointment (internal vs. external referral) and to decrease the number of headache-related ED visits
  • Recommend a factorial cluster randomization approach to evaluate interventions rather than an interrupted time series design; residents can be randomized by rotation
  • Do not recommend use of SPC charts as their purpose is to show uniformity over time, and the objective for this project is to compare randomized groups.
  • Discussed VICTR award options
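A minimal sketch of a 2x2 factorial cluster randomization, assuming two hypothetical interventions (labelled A and B) and 12 hypothetical resident-rotation clusters; neither the number of clusters nor the interventions come from the notes. Each cluster, not each patient, is randomized to one of the four on/off combinations, and using a multiple of four clusters keeps the cells balanced.

```python
import itertools
import random
from collections import Counter


def factorial_cluster_assignment(clusters, seed=None):
    """Randomly assign clusters to the four cells of a 2x2 factorial design.

    Cells are balanced when the number of clusters is a multiple of 4.
    """
    cells = list(itertools.product(("no A", "A"), ("no B", "B")))
    pool = [cells[i % len(cells)] for i in range(len(clusters))]
    rng = random.Random(seed)
    rng.shuffle(pool)  # random order of cell labels across clusters
    return dict(zip(clusters, pool))


# Hypothetical clusters: 12 resident rotation blocks
rotations = [f"rotation_{k:02d}" for k in range(1, 13)]
assignment = factorial_cluster_assignment(rotations, seed=2016)
cell_counts = Counter(assignment.values())
```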

2016 Mar 24

Mark Tyson and Rohan Bhalla, Urologic Surgery

  • Assistance with VICTR application -- "We are using a national dataset (NSQIP) to study length of stay after cystectomy."
  • The biostatistics conference room computer was not responsive during this consulting session, so please excuse the brevity. Mark Tyson and Rohan Bhalla would like to apply for biostatistics support through VICTR. They introduced their study and their main objective is to assess determinants of length of stay following surgery in a large (n=2000) national database. There are over 100 potential covariates and they would like to build a prediction model. Given the complexity of the data and the potential for some iterative work between the statistician(s) and researchers, we recommend they apply for 90 hours of biostat support (a VICTR award).

2016 Mar 17

Stephanie Moore, Pharmacology Graduate Student

  • Assistance with VICTR application
  • "We are beginning an investigation into SNPs of specific genes of interest to us in populations that develop aberrant mineralization following injury."
  • Future goal is to identify patients at risk of mineralization at time of injury
  • May want to consider a continuous response for mineralization severity score or an ordinal response for count of instances where mineralization is referenced in the subject's medical record
  • Recommend a minimum of 10-20 subjects per factor when fitting the model
  • Recommend working on analysis plan with Dr. Quinn Wells who may contact Dr. Frank Harrell for guidance

Cherie Fathy, Medical Student

  • Retrospective study in Ophthalmology to assess whether increasing provider age is associated with an increased risk of receiving unsolicited patient complaints.
  • Complaints are a rare event; do not expect risk to be proportional over time
  • Plan to record complaint (repeated measure) in dataset along with provider's age at the time of the complaint
  • Follow-up is censored after the last recorded complaint
  • Recommend a Cox model with a cluster sandwich (robust) variance estimator and age as a time-dependent covariate
  • Reference: Modeling Survival Data: Extending the Cox Model (2000) by Terry Therneau & Patricia Grambsch
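The recommended model needs the data in counting-process (start, stop] form, one row per interval, as in the Andersen-Gill approach described by Therneau and Grambsch. The sketch below splits one provider's follow-up at each complaint and at each whole year so that age is updated annually; the function name, fields, and example values are hypothetical, and the end of follow-up is whatever censoring time the study defines. Rows built this way could then be fit in R with `coxph(Surv(start, stop, event) ~ age + cluster(id), data = rows)` from the `survival` package, where `cluster(id)` requests the robust sandwich variance.

```python
def counting_process_rows(provider_id, age_at_entry, follow_up_years, complaint_times):
    """Split one provider's follow-up into (start, stop] rows for an Andersen-Gill Cox model.

    Time 0 is the start of observation and follow_up_years is the censoring time.
    Rows end at each complaint (event = 1) and at each whole year, so age is updated annually.
    Complaints recorded at the same time collapse into a single row.
    """
    complaints = sorted(float(c) for c in complaint_times)
    if any(c <= 0 or c > follow_up_years for c in complaints):
        raise ValueError("complaint times must lie in (0, follow_up_years]")
    yearly_cuts = [float(y) for y in range(1, int(follow_up_years) + 1) if y < follow_up_years]
    cuts = sorted(set(complaints + yearly_cuts + [float(follow_up_years)]))
    rows, start = [], 0.0
    for stop in cuts:
        rows.append({
            "id": provider_id,
            "start": start,
            "stop": stop,
            "event": int(stop in complaints),
            "age": age_at_entry + int(start),  # age in completed years at interval start
        })
        start = stop
    return rows


# Hypothetical provider: aged 45 at entry, observed 2.5 years, complaints at 0.4 and 1.5 years
rows = counting_process_rows("provider_A", 45, 2.5, [0.4, 1.5])
```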

2016 Mar 3

Arion Kennedy, Molecular Physiology and Biophysics

  • Assistance with VICTR response
  • I am interested in quantifying immune cells in liver biopsies of obese patients with various pathologies of nonalcoholic fatty liver disease (NAFLD).
  • Previously published data provide SEMs for CD8 counts (analyzed on the wrong scale, i.e., untransformed rather than logged)
  • Compute the SD by multiplying the SEM by the square root of the number of subjects used in the SEM calculation
  • See Section 5.8.3 of Biostat for Biomedical Research at
  • The t critical value is 2.1 when n1 = n2 = 10 (18 degrees of freedom)
  • Get the margin of error for estimating the difference between any two of the means (using a 0.95 confidence level)
  • Can probably use WebPlotDigitizer to digitize the raw data
  • This would also allow taking logs and computing SD(log CD8 count)
  • Once you have SD(log) you can compute the margin of error on the log CD8 count scale
  • Antilog of this margin of error provides the multiplicative margin of error (fold change margin of error)
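The arithmetic in the steps above can be sketched in Python. The SEM, group sizes, and SD of log counts are hypothetical placeholders rather than values from the published paper, and the margin-of-error formula assumes both groups share a common SD: margin = t(0.975, n1 + n2 - 2) * SD * sqrt(1/n1 + 1/n2). On the log scale, exponentiating the margin gives the fold-change margin of error.

```python
import math

from scipy.stats import t


def sd_from_sem(sem, n):
    """SD = SEM * sqrt(n), where n is the number of subjects behind the SEM."""
    return sem * math.sqrt(n)


def margin_of_error_diff(sd, n1, n2, conf=0.95):
    """Half-width of the CI for a difference of two means, assuming a common SD."""
    df = n1 + n2 - 2
    t_crit = t.ppf(1 - (1 - conf) / 2, df)  # about 2.10 when n1 = n2 = 10 (18 df)
    return t_crit * sd * math.sqrt(1 / n1 + 1 / n2)


# Raw scale (hypothetical SEM of 12 cells from 10 subjects)
sd = sd_from_sem(sem=12.0, n=10)
moe = margin_of_error_diff(sd, 10, 10)

# Log scale: with a (hypothetical) SD of natural-log CD8 counts,
# the antilog of the margin of error is a multiplicative (fold-change) margin
sd_log = 0.5
fold_moe = math.exp(margin_of_error_diff(sd_log, 10, 10))
```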

Dafina Krasniqi

  • "I am uncertain about the test to use when selecting the sample size for a three arm test."

Mia Keeys, Sociology Graduate Student

  • "The problem has to do with calculating sample size in a three arm randomized control trial."

2016 Feb 25

Juan Pablo Arroyo, Internal Medicine

  • Assistance with VICTR application
  • "The goal of our study is to evaluate if there is a correlation between the levels of serum Cl, the creatinine and the hospital admission rate in patients with congestive heart failure in steady state conditions. The idea would be to perform a longitudinal retrospective analysis."
  • Also have retrospective data collected from kidney donors (5 visits over a 1-year period). Recommend change-point linear regression methods to quantify relationships between Cl, creatinine, and hospital admissions.
  • In the CHF population, Cl is known to affect kidney fill volume which is lowered by administration of diuretics. Plan to use training and testing data sets with k-fold cross-validation to develop tool to predict hospital readmissions. Recommend using Cox Proportional Hazards model for time to hospital readmission with predictors Cl, creatinine, and additional covariates.
  • Dan Byrne's group already utilizes Cl in models predicting VUMC hospital readmissions.
  • The next steps are to develop a protocol, research available patient numbers in Synthetic Derivative, and return to Biostatistics clinic for additional feedback and power analysis discussion before submission to VICTR.

2016 Feb 18

Cherie Fathy, Medical Student

  • "The topic is on the epidemiology and risk factors for ocular involvement of Pediatric EM/SJS/TEN (severe allergic reactions). The statistics are done, but just wanted to make sure that I did the right tests."

Bill Heerman, Pediatrics & Internal Medicine

  • Discussion of sample size calculation for prospective cohort study
  • "I am hopeful that we will be able to use a latent class analysis to identify growth trajectories that are associated with asthma incidence and severity."

2016 Feb 11

Katherine McDonell, Neurology

  • Assistance with VICTR application

Jim May

  • Discussion of pilot RCT proposal

2016 Feb 4

Ben Theobald, Medical Student

  • Discussed use of the PLCO risk assessment model for lung cancer when data are incomplete

Caitlin Ridgewell, MPH Student

  • Study of personality and psychotic symptomology
  • Question regarding clustering of patients into groups in the most efficient way vs. examining the data dimensionally

2016 Jan 28

Akshitkumar Mistry, Neurological Surgery

  • Discussion of meta-analysis on survival time between SVZ-positive and SVZ-negative glioblastomas

May Ou, Chemical Engineering

  • Voucher question

Kimberly Albert, Center for Cognitive Medicine – Vanderbilt Psychiatry

  • Voucher question

2016 Jan 7

Chris Brown

  • I have data that was analyzed by Li Wang and have a few questions I was hoping to get answered.

Sudipa Sarkar, Endocrinology

  • Discussion of grant application to study statin exposure and non-alcoholic fatty liver disease

Topic revision: r1 - 18 Jan 2021, DalePlummer

This site is powered by Foswiki. Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.