Data and Analysis for Clinical and Health Research Clinic Notes (2017)


2017 December 21

Paul Slocum, OB/GYN Fellow

  • "The project is evaluating two groups of patients: a group with intrinsic sphincter deficiency (ISD, as defined by urodynamic parameters) and a group without ISD. We are investigating whether patients with ISD have an increased rate of urgency urinary incontinence resolution after a midurethral sling when compared to those without ISD. All patients have mixed urinary incontinence (stress and urge incontinence together) and subsequently had a midurethral sling for the stress component. Would like to have some help with analyzing some data that I have collected. Data is clean and in Stata already."
  • Patients filled out survey in the clinic before and after surgery. Primary outcome is urgency leading to urinary incontinence. ISD is diagnosed based on maximum urethral closure pressure (MUCP) <20 cm H2O or Valsalva leak-point pressure (VLPP) <60 cm H2O. Recommend collecting MUCP and VLPP as continuous variables. Report descriptive statistics and plot histograms of MUCP and VLPP. Can use a logistic regression model for urgency leading to urinary incontinence; include MUCP and VLPP as independent variables.
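A minimal sketch of the ISD rule stated above, assuming MUCP and VLPP are stored as continuous values in cm H2O. The function name and the handling of missing measurements are assumptions for illustration, not part of the study protocol. Keeping the raw values and deriving the ISD flag from them means the continuous measurements remain available for the regression model.

```python
from typing import Optional

MUCP_CUTOFF = 20.0  # cm H2O, threshold stated in the notes
VLPP_CUTOFF = 60.0  # cm H2O, threshold stated in the notes


def has_isd(mucp: Optional[float], vlpp: Optional[float]) -> Optional[bool]:
    """Derive the ISD flag from the continuous measurements.

    True if either value is below its cutoff. None (undetermined) if no
    criterion is met and at least one value is missing; this treatment of
    missing data is an assumption.
    """
    below_mucp = mucp is not None and mucp < MUCP_CUTOFF
    below_vlpp = vlpp is not None and vlpp < VLPP_CUTOFF
    if below_mucp or below_vlpp:
        return True
    if mucp is None or vlpp is None:
        return None
    return False
```

For example, has_isd(15.0, 80.0) flags ISD on the MUCP criterion alone, while has_isd(None, 70.0) is left undetermined.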

Christian Okitondo, Psychiatry Staff

  • "We want to know whether the fact that we see altered thresholds in adults but not kids with ASD could be explained by kids with ASD having lower verbal IQ (which could lead to difficulty understanding instructions) or worse fine motor skills (which could lead to slower responding with the mouse or keyboard) than TD kids. The response variable is the threshold. The categorical variables are the Diagnosis and age group. The continuous variables are the Fine Motor and Verbal IQ. Note that the data fails the normality assumption. We need a way to test this that is robust to violations of normality."
  • Goal is to determine whether there are sensory differences in adults (n=50) and children (n=85) with ASD and IQ >=70 compared to controls. Completed 10 trials for each subject (5 warm and 5 cold). Planning to analyze the warm and cold trials separately and to throw out the highest and lowest measurements and average the middle three. Recommend using semi-parametric model (ex. proportional odds) rather than data transformation because more than two attempts at the transformation yield biased results (see On the Cost of Data Analysis by Faraway, 1992). The independent variables will be clinical group, age (continuous), verbal IQ, and fine motor ability.
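The planned per-subject summary (drop the highest and lowest of the five trials, average the middle three) can be sketched as follows. The function name and example values are hypothetical; the warm and cold trials would each be passed through separately.

```python
def middle_three_mean(trials):
    """Average the middle three of five threshold measurements.

    The highest and lowest values are discarded, as described in the notes.
    """
    if len(trials) != 5:
        raise ValueError("expected exactly 5 trials per condition")
    ordered = sorted(trials)
    middle = ordered[1:-1]  # drop the minimum and the maximum
    return sum(middle) / len(middle)


# Hypothetical warm-trial thresholds for one subject
warm_summary = middle_three_mean([30.0, 32.0, 31.0, 40.0, 29.0])
```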

2017 December 14

NO CLINIC: Department Meeting

2017 December 7

Maria Powell, Postdoctoral Fellow, Otolaryngology & Lea Sayce, Senior Research Specialist

  • "The purpose of this study is to investigate the effectiveness of the most common treatment approaches for phonotraumatic benign vocal fold lesions. Additionally, we are interested in the barriers to access and utilization of services from a tertiary care voice clinic. Our aims are: AIM I: We will evaluate the treatment history of patients with phonotraumatic lesions prior to referral to a tertiary care voice clinic in order to develop a comprehensive understanding of common community based treatment approaches, health care utilization, and associated costs of treatment in this patient population. AIM II: We will determine the quality of life impact of patients with phonotraumatic lesions at initial study center presentation (to establish baseline) and approximately 6 months later (to measure pre to post treatment specific changes in quality of life). AIM III: We will perform a multivariate analysis of factors influencing responses to: 1) voice therapy alone versus response to 2) surgical treatment."
  • "Questions to be addressed: 1) Are the proposed stats appropriate for our dataset? 2) We have collected data from 53 of the proposed 150 patients. We would like to determine if our study is adequately powered with this smaller number, and if not, what our target enrollment should be based on our preliminary data."
  • Recruited eligible patients from clinics at three different sites. Patients completed baseline and follow-up (~6 months) REDCap surveys. Patients were emailed three times prior to considering them lost to follow-up. Have complete data for 48 patients collected from surveys or EHR; primary outcome is VHI quality of life score. Recorded whether a patient uses their voice professionally (high voice user and/or singer). The treatment type and number of voice therapy sessions are determined by physician recommendation, so it is possible that patients were transferred to another treatment arm.
  • Recommend adding plots of the data rather than reporting only p-values. Can use EHR data to validate the survey data within patients. May be able to include more patients with a retrospective chart review. Recommend concluding that you have not been able to demonstrate a difference. Is there a way of quantifying therapy by the number of sessions? Would then be able to assess correlation between therapy and VHI score.

Shelby Ploucher, Neurology/Movement Disorders & Mallory Hacker (PI) & Max Turchan

  • "Our goal is to obtain a quote required for VICTR voucher request. The project is analyzing the relationship between active contact locations of DBS leads and clinical outcomes. We are requesting the voucher for statistical support in analyzing the data."
  • Treatment is deep brain stimulation in a distinct STN location compared to outside that location (3 other possible locations) in early stage Parkinson's Disease (PD) patients. Location is determined by physician recommendation. Primary outcome is 24-month UPDRS-III OFF motor score. Covariates in linear regression model include 24-month amplitude (stimulation voltage), 24-month LEDD (medications), 24-month active contact position, an interaction between amplitude and position, baseline age, baseline disease duration, and baseline motor score. Have enrolled 14 patients.
  • Due to small sample size, recommend focusing on data visualization rather than a linear regression model. There is not enough data to adequately estimate the effects. Can start with the number of locations, calculate the minimum distance between these locations and the actual DBS location for each patient, and use this summary variable to test whether the distance is different in the good responders compared to the poor responders. Will then be able to determine the median distance to a sweet spot. Can weight the patients based on the amplitude.
  • Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.
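The distance summary suggested for the DBS data can be sketched as below: for each patient, take the minimum Euclidean distance from the active contact to any candidate location, then compare the median of that distance between good and poor responders. All coordinates, field names, and responder labels here are hypothetical, and amplitude weighting is left out.

```python
import math
from statistics import median


def min_distance(contact, candidates):
    """Smallest Euclidean distance from one active contact to any candidate location."""
    return min(math.dist(contact, c) for c in candidates)


def median_distance_by_group(patients, candidates):
    """Median of the per-patient minimum distance, split by responder group."""
    groups = {}
    for p in patients:
        d = min_distance(p["contact"], candidates)
        groups.setdefault(p["responder"], []).append(d)
    return {g: median(ds) for g, ds in groups.items()}


# Hypothetical coordinates (mm) for candidate locations and active contacts
candidates = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
patients = [
    {"contact": (0.0, 1.0, 0.0), "responder": "good"},
    {"contact": (2.0, 0.5, 0.0), "responder": "good"},
    {"contact": (5.0, 0.0, 0.0), "responder": "poor"},
]
result = median_distance_by_group(patients, candidates)
```

With only 14 patients, these group medians are best read alongside a plot of the individual distances rather than as a formal test.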

2017 November 30

Giovanna Giannico, Pathology

  • "Study design in retrospective studies addressing outcome prediction based on a variable. In the case of predicting biochemical recurrence (BCR) in prostate cancer based on the expression of a marker, is it correct to design the study to include: 1) cases that have a minimum follow up that is arbitrarily designated; 2) enrich the cohort with patients that experienced the event (BCR)? Would it rather be correct to select patients consecutively to avoid introducing a bias?"
  • Collected pilot data on 78 patients. Primary outcome is time to BCR following treatment. Recommend taking a consecutive series of patients who meet the eligibility criteria (cross-sectional study) to determine whether presence of the biomarker is associated with morphology. Potential issue with patients who follow up with their local oncologist rather than VUMC after surgical procedure. Some patients will be censored but should not be removed from the study.
  • Selecting for patients that had a BCR event is a case-control study design; this design can only determine whether a signal exists. Can also incorporate matching cases to controls on select covariates (ex. stage, grade, treatment type). A cohort study is required to assess the predictive nature of the biomarker (using survival analysis). For a retrospective cohort study, it will be better to define the cohort to increase likelihood of follow-up (only use information that is known at cohort entry).

WITHDREW: Maria Powell, Postdoctoral Fellow, Otolaryngology

2017 November 16

Claire Kelsey, Injury Prevention Intern & Purnima Unni, MPH, CHES, Pediatric Trauma Injury Prevention Manager

  • "I have spent a large portion of my semester researching for and crafting a pilot program addressing firearm safety that is geared towards both children and their parents. This pilot is part of an NIH grant our department is applying for. As I believe Purnima has told you, an aspect of the parent piece includes a few questions to be incorporated in the routine PCP injury prevention questionnaire about the storage and safety practices in regard to firearms for those who may own them. These questions would ideally be asked in conjunction with other home safety questions including those related to car seats, various poisonous cleaners and drowning prevention. I would like to get feedback about how many participants we would need to recruit in order to get significant data if we are aiming for a 25%-30% response rate on the survey also taking into account that some parents we ask for permission to participate may say no."
  • Tier 1 will take place in PCP clinic with 3 groups (randomized to control with no education, active education with PCP verbally explaining statistics and communicating safe practices, or passive education with pamphlets and posters in waiting room). Survey questions include how many firearms, how stored, and where stored (score range 0-3). Will survey parents again 2-4 months after intervention. Concern with response bias based on survey topic. Recommend adding survey questions to assess effectiveness of education type. Can do a paired analysis within parent before and after intervention. Is there a positive response (ex. individual parent score goes from 3 to 2)? For the statistical analysis, can exclude parents who do not own firearms or who score 3 on the before survey. How many subjects are needed to have 80% power to determine a difference? What is largest sample size that is feasible? Preliminary data will yield information on the proportion of parents who own a gun and will benefit from education (intervenable). Should incorporate this information into the power calculation for control vs. passive and passive vs. active education, and can include a power curve in the grant proposal. Multiple PCP offices will recruit subjects.
  • Tier 2 will be an in-school curriculum and behavioral skills training. Schools will be selected based on various socioeconomic levels in Davidson County and a rural county. Could select schools in known counties of concern.
  • Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.

2017 November 2

Adoma Manful, MPH Student

  • Aim is to determine which factors are associated with initiation of treatment for latent TB in a refugee population in Middle Tennessee. Refugees are required to complete a TB screen using a skin or blood test at Siloam Health Clinic within 90 days of arriving in U.S. (N = 1300), but some people do not attend the follow-up visit at the Metro Public Health Department (N = 748). Approximately 400 people actually initiated treatment. Primary hypothesis is that people who have more severe comorbidities (diabetes, hepatitis, HIV, etc.) are less likely to initiate treatment because they are focused on treating the more severe comorbidity. Also expect that people are less likely to initiate treatment after obtaining employment. Relocation or death is an issue with follow-up. Plan to match patient data across databases using first and last name and DOB. Able to rule out patients who develop active TB. Plan to use Charlson Comorbidity Index (range 0-6).
  • Recommend using the Elixhauser Index (EI) or another more comprehensive score rather than the CCI, because the CCI makes an arithmetic error by adding hazard ratios. Note that the database has unreliable application of the ICD-9 codes used to calculate the EI. May want to research the clinical trial of the Johnson & Johnson/Janssen Pharmaceuticals treatment for multi-drug resistant TB.

2017 October 19

James Cook, Medical Student

  • 51 nodules, 45 patients, 148 features. 35 patients have cancer. There are 10 outcomes.
  • Goal is to build a predictive model. Issue is that the model needs to work for patients outside of this cohort and there is an insufficient amount of data. There is access to a validation data set with n=100, although the majority likely have cancer.
  • May be able to answer a different question. 6 features across cancer that come up as statistically significant in the literature. It might be worthwhile to look at those 6 features rather than doing data reduction from the 148 features. A drawback is that effect size would need to be huge in order to pick up the difference in current data set.

Sarah Osmundson, Obstetrics and Gynecology

  • Attended clinic on the recommendation of Daniel Byrne. Working on K-23 application. Aim is to predict what women need in terms of opioid supply after release from the hospital following C-section delivery. Surveyed 150 women, looked at demographics and characteristics and how much opioid was used. Need to make clear in write-up that will be doing calibration analysis, i.e., to what extent the model is really working when applied to new data. First, do internal calibration to assess how well the model appears to be doing by splitting the data into training and test sets. If the internal calibration looks good, then validate it on external data.

Leah Acosta, Neurology/Cognitive Behavioral

  • "Analysis of implementation of a protocol to assess patients with normal pressure hydrocephalus (NPH), comparing pre-protocol to protocol patients, on different measures (e.g., percent who received a spinal tap, percent who reported improvement in certain symptoms). I want to make sure I did the analysis correctly, particularly since I’m dealing with small numbers (30 pre-protocol, 40 protocol) and some variables with outliers. I have my Stata commands and data is from REDCap, which I can send ahead of time if preferred."
  • Normal pressure hydrocephalus is a condition that results in cognitive decline, gait decline, and incontinence. Interested in assessing outcomes in patients who were assessed using the protocol and patients who were not. Main purpose of using the protocol is to undergo a spinal tap. In order to be considered in the population, the patient must be assessed by neurology and have one of gait decline, cognitive decline, or incontinence. Currently using Wilcoxon test, t-test for continuous variables and Fisher test, chi-square test for categorical variables. One suggestion is to show graphically the degree of improvement for quantitative tests for cognitive improvement rather than "improved" versus "didn't improve". Boxplots may be more informative to display skewness and outliers. In write-up, make sure it is clear what questions you were aiming to answer before looking at results.

2017 October 12

Heidi Silver, Medicine/Gastroenterology

  • "To meet with William Dupont regarding reviewer comments on a submitted manuscript."

Jacob Fleming, Medical Student

  • "Retrospective database of TACE procedures; investigating outcome (post-embolization syndrome, readmission) by age (64 and younger vs. 65 and older). Want to clarify best way to address multivariate analysis."
  • TACE is a procedure for patients with liver cancer. Have 161 unique patients with 221 procedures. Outcomes of interest are readmission, post-embolization fever, nausea, pain, or portal vein thrombosis within 30 days of TACE. The Pugh score (continuous) was categorized as A vs. B or C. Recommend regression or principal components analysis to determine differences between the age groups. For example, a logistic regression (or probit) model for post-embolization nausea adjusting for age group and other covariates. Can use restricted cubic splines for variables with non-linear relationships. Due to small sample size, should limit model to most important covariates. An exploratory analysis can be done by plotting logit(P(disease)) vs. age. A good reference is Regression Modeling Strategies by Frank Harrell, Jr., PhD.
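One way to prepare the exploratory logit-vs-age plot is to bin patients by age and compute an empirical logit in each bin. The sketch below is an illustration under assumptions: the 10-year bin width, the 0.5 continuity correction (used so that bins with no events or all events stay finite), and the function name are choices made here, not part of the clinic recommendation.

```python
import math
from collections import defaultdict


def empirical_logits(ages, events, bin_width=10):
    """Empirical logit of the event proportion within each age bin.

    Returns {bin_start: log((y + 0.5) / (n - y + 0.5))}, where y is the
    number of events and n the number of procedures in the bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [events, total]
    for age, event in zip(ages, events):
        start = int(age // bin_width) * bin_width
        counts[start][0] += int(event)
        counts[start][1] += 1
    logits = {}
    for start in sorted(counts):
        y, n = counts[start]
        logits[start] = math.log((y + 0.5) / (n - y + 0.5))
    return logits


# Hypothetical ages and nausea indicators (1 = event within 30 days)
example = empirical_logits([50, 55, 60, 65], [1, 0, 1, 1])
```

Plotting these values against the bin midpoints gives a rough picture of whether the relationship with age looks linear on the logit scale before deciding on splines. Because some patients contribute more than one procedure, the points are not independent, and the plot should be treated as descriptive only.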

2017 October 5

Nicolas Baddour, Medical Student

  • "Our research question: Which factors are associated with higher risk of hospitalization for adult patients with CKD and which will be useful in the development of a hospitalization risk prediction model? My plan is to gather variables on a cohort we’ve defined from the RD, do univariate analysis on the variables to help filter significant ones for a model, then fit a Cox proportional hazard regression with our significant variables. I’m curious about potential confounders as well as limitations to the ways we are defining our cohorts."
  • Primary outcome is frequency of nephrology outpatient care. Limit sample to patients who established care prior to 2013. Time 0 is 1/1/2013. Covariates include hypertension, diabetes, etc. Concern with immortal time bias.
  • For secondary outcome as time to first hospitalization, may use a time-varying covariate Cox proportional hazards model. Time-varying covariates include age and laboratory measurements. If plan to use data collected at Time 0 to predict time to first hospitalization, a standard Cox PH model could be used.

Maureen Saint Georges, Pediatric Emergency Medicine Fellow

  • "We are starting a study looking at pediatric lacerations and randomizing them to sutures (control), Dermabond or Steri-Strips. While applying for a VICTR grant, they had some concerns about our sample size and so I wanted to go over our stats plan to make sure that it is appropriate."
  • Primary outcome is appearance score of scar (range 0-100, 100 is best). Sample size of 30 per arm, but reviewer is concerned that this calculation is for a superiority trial instead of a non-inferiority trial. Want confidence interval of the difference between the groups to have a half-width of 7.5 given the standard deviation is 15 and average appearance score is 60; the sample size should be 64 for 80% power or 86 for 90% power. Could also conduct an adaptive trial and periodically analyze the data.
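A rough check of these sample sizes, under stated assumptions: the sketch treats 7.5 as the difference to detect in a two-sided, two-sample comparison at alpha = 0.05 with SD 15, and uses the normal approximation. These choices are assumptions made here; the notes do not say which formula or software produced 64 and 86, or whether they are per-arm figures.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sd, power, alpha=0.05):
    """Normal-approximation sample size per arm for a two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


n_80 = n_per_arm(delta=7.5, sd=15.0, power=0.80)
n_90 = n_per_arm(delta=7.5, sd=15.0, power=0.90)
```

The normal approximation gives 63 and 85 per arm. A calculation based on the t distribution is slightly larger, which would be consistent with the 64 and 86 recorded above if those are per-arm figures.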

2017 September 28

Maxim Turchan, Health and Policy/Services Analyst II, Department of Neurology, Movement Disorders

  • "I am interested in determining whether or not a 3-way interaction exists between 2 categorical variables and 1 continuous variable using a linear mixed effects model with an autoregressive covariance structure (due to repeated measures with unequal number of longitudinal follow-up per subject) via the “nlme” package in R. I am still relatively new to both R and mixed effects modeling, and while I believe that I carried out the analysis correctly (both methodologically and philosophically), I would love to review my logic, code, and interpretation of the results with someone with significantly more experience."
  • Have N=95 subjects and a total of 370 observations. Outcome is quality of life score (range 0-100, 100 is poor). Want to assess three-way interaction between disease duration, genotype, and neurosurgical intervention (p = .01). Linear mixed model includes 3 main effects, 3 two-way interactions, and 1 three-way interaction.
  • Suspect that correlation structure is actually compound symmetry blended with AR1 when include random effects. Look at plot of standardized residuals vs. fitted values. Note residual standard deviations for models stratified by genotype were not very different. Small sample size is adequate to include only one main effect in the model. May have an influential point in the mutant with DBS group; plan to rerun model after excluding this data point and to compare results.
  • Recommend fitting one model and estimating the contrast within the model to get more stability. Output the predicted values (mean on the square root scale) and the design matrix that gives you those values. Then subtract those 2 vectors and use formula to calculate standard error of the contrast. Look at correlation structure using disease duration on raw scale. Can look into how to specify covariance structure with respect to time. May also consider a generalized least squares model.
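The contrast calculation described above can be sketched as follows: subtract the two design-matrix rows to get a contrast vector c, then the estimate is c'b and its standard error is sqrt(c'Vc), where b is the vector of fixed-effect estimates and V their covariance matrix. The numbers below are hypothetical; in practice b and V would be extracted from the single fitted model.

```python
import math


def contrast(x_a, x_b, beta, cov):
    """Estimate and standard error of the difference between two predictions.

    x_a, x_b: design-matrix rows for the two settings being compared
    beta:     fixed-effect estimates
    cov:      covariance matrix of the fixed-effect estimates
    """
    c = [a - b for a, b in zip(x_a, x_b)]
    estimate = sum(ci * bi for ci, bi in zip(c, beta))
    variance = sum(
        c[i] * cov[i][j] * c[j] for i in range(len(c)) for j in range(len(c))
    )
    return estimate, math.sqrt(variance)


# Hypothetical three-coefficient model: intercept, group indicator, covariate
beta = [2.0, 0.5, 1.0]
cov = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.00],
    [0.00, 0.00, 0.16],
]
est, se = contrast([1, 1, 0], [1, 0, 0], beta, cov)
```

Estimating the contrast inside one model uses the full covariance matrix, including covariances between coefficients, which separate stratified fits cannot provide.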

2017 September 21

Courtney Zola, Infectious Diseases Clinical/Research Fellow

  • "Retrospective analysis in the Synthetic derivative comparing HIV-infected subjects with echocardiograms to matched controls with echocardiograms to determine rate of pulmonary hypertension and mortality. Preliminary data show an increased mortality rate with HIV and PH, but it seems to be out of proportion to what you would expect. Trying to assess if HIV is an independent risk factor for PH and then analyze contributing or mitigating factors (nadir lifetime CD4 count, viral load, treatment of HIV, etc)."
  • Out of 8,500 HIV-infected patients, 1,050 had an echocardiogram (25% have PH based on RVSP >40). In general, retention for patients undergoing HIV treatment is good. In the general population, there are approximately 30,000 eligible patients with an echocardiogram. Some patients have repeated echocardiograms (especially for cardiovascular concerns). Plan to use 1:2 or 1:4 matching on age, race, and sex. The SD uses the Tennessee Death Index to gather mortality data. May be able to categorize cause of death for patients with PH. Also want to know if HIV treatment impacts PH, to compare change in RVSP to change in viral load in HIV-infected patients, and to determine attributable risk for HIV and HIV with PH.
  • Bryan Shepherd may have additional information on HIV-related mortality resources. May want to collect data on patient workup, including right heart catheterization. Should start with a research question and plan study design and data collection around the question.

No Show: Justin Shinn, Otolaryngology Resident

  • "Retrospective review evaluating botox injections for patients with synkinesis. Primary assessment is for patient outcomes (improvements based on validated questionnaires) in addition to dosing information, muscles injected, dosing over time, etc. Predominantly need statistical assistance as well as help using Stata."

2017 September 14

No Show: Alice Hoyt, Medicine/Allergy

  • "The aims of this project are to determine the preparedness and knowledge of K-12 schools on the topics of asthma and food allergy, then to pilot an asthma telemedicine program."

Shelby Blalock, Pharmacy Resident

  • "My project is regarding gabapentin effect on opioid usage in orthopedic trauma patients. The purpose of this study is to evaluate the safety and efficacy of gabapentin use in patients with traumatic open fractures. I would like to attend the biostatistics clinic in order to get questions answered regarding statistical tests and analyses and overall methodology as I am interested in applying for a VICTR grant. I would greatly appreciate your advice in moving forward with this project."
  • Gabapentin is currently prescribed on a continuous dose based on physician preference. Goal is to see whether the continuous dose reduces need for opioids for pain management. Primary outcome is median morphine equivalent. The distribution of median morphine equivalent is not likely to be normal, so it could be treated as an ordinal variable in an ordinal logistic regression model. To avoid treatment by indication bias, recommend querying five experts to list cues they use to decide whether to prescribe gabapentin. Take all unique cues and make sure this data is collected and adjusted for in a multivariable analysis. Ideally want 10 patients per covariate in the ordinal logistic regression model. Sample size should be at least 200, but recommend gathering additional information on distribution of median morphine equivalent to calculate a more precise sample size.
  • Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.

Theresa Chikopela, MSCI Student

  • "I am looking at endothelial dysfunction (ED), plasma nitric oxide and body fat mass in HIV infected individuals in Zambia. I would like to find out if the lean and the obese have increased ED compared to normal BMI HIV positive individuals. I would also like to find out if this increase is associated with the increased nitric oxide in these individuals as the patho-physiology states. I would like to address which design would best answer these questions and the statistical tests possible. I am also interested in verifying the calculation of sample size for this study."
  • ED is known to result in cardiovascular disease. This study will compare ED (ICAM1, VCAM1) and BMI. Plan to take a convenience sample of patients who visit clinic. If need to conserve resources, may want to set quotas for BMI ranges to avoid oversampling in any given BMI range. Can utilize regression and correlation on BMI and body fat mass (continuous variables). To calculate sample size for correlation between BMI and endothelial dysfunction, use graph to determine desired sample size for a specific margin of error (1/2 width of 95% CI) in estimating the correlation coefficient (ex. 0.1 yields 200 patients). See biostat.mc.vanderbilt.edu/ClinStat and locate graph in Biostatistics for Biomedical Research by searching for keyword "precision".
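For readers without the graph at hand, the margin of error for a correlation coefficient can be approximated with Fisher's z transformation. This is a sketch under assumptions, not a reproduction of the referenced graph: the margin depends on the assumed true correlation r, so it need not match a value read off the graph, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_margin(n, r=0.0, level=0.95):
    """Approximate half-width of the confidence interval for a correlation.

    Uses Fisher's z transformation, whose standard error is 1 / sqrt(n - 3),
    and back-transforms the interval limits to the correlation scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half = z_crit / math.sqrt(n - 3)
    z = math.atanh(r)
    lower, upper = math.tanh(z - half), math.tanh(z + half)
    return (upper - lower) / 2


# Margin of error over a range of candidate sample sizes, assuming r = 0
margins = {n: correlation_margin(n) for n in (50, 100, 200, 400)}
```

The margin is widest when the true correlation is near zero and narrows as the correlation moves away from zero, so the required sample size should be stated together with the assumed correlation.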

Freeman Chabala, Biochemistry

  • For HIV patients, ART is the first line of treatment. Acute Kidney Injury status for new HIV patients is unknown. Goal is to develop model to predict AKI status at 3-month follow-up visit using baseline biomarkers and demographics (age, BMI, CD4 count, etc.) collected at the initial visit. Plan to enroll patients with normal serum creatinine (SCr) at initial visit. SCr is the primary outcome (continuous). Depending on distribution of SCr, may need to use ordinal logistic regression model. Use baseline SCr, biomarkers, age, BMI, and CD4 count to predict SCr at 3-month follow-up visit. Note that using baseline data will not provide much information on how patients respond to ART by 3 months because ART is initiated at baseline. Can also use landmark analysis for dynamic prediction. Start with every patient enrolled, take those who make it to 3-month visit and set this as new baseline. Then take those who make it to 6-month visit and set this as new baseline, etc. Dataset will have multiple rows for each patient (one per visit).
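The stacked landmark dataset described above (one row per patient per visit reached, with that visit treated as the new baseline) can be sketched as below. The data layout, field names, and values are hypothetical.

```python
def landmark_rows(visits_by_patient, landmarks):
    """Build one row per patient per landmark visit the patient attended.

    visits_by_patient: {patient_id: {month: {measurement_name: value}}}
    landmarks:         visit months to use as successive baselines
    """
    rows = []
    for patient_id, visits in visits_by_patient.items():
        for month in landmarks:
            if month in visits:  # only patients who made it to this visit
                row = {"patient_id": patient_id, "landmark_month": month}
                row.update(visits[month])
                rows.append(row)
    return rows


# Hypothetical data: patient B did not return after the initial visit
visits = {
    "A": {0: {"scr": 0.9}, 3: {"scr": 1.1}, 6: {"scr": 1.0}},
    "B": {0: {"scr": 0.8}},
}
stacked = landmark_rows(visits, landmarks=[0, 3, 6])
```

Each landmark subset can then be modeled with the measurements known at that visit as predictors of the outcome at the following visit.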

2017 September 7

Wendy Bottinor, Cardio-Oncology Fellow/MSCI

  • "We are looking for predictors of cardiovascular dysfunction in patients receiving VEGF inhibitors for treatment of renal cancer. I would like to look over the data set to make sure I am collecting the right information in the correct manner. I have an excel spreadsheet currently but I am planning to switch to a REDCap database that I have created but not put in production. I would also like to have a better understanding of how to analyze the data once it is collected."
  • Patients who receive VEGF inhibitors can also develop proteinuria. Laboratory data are collected at 3 time points (baseline, 2 and 4 weeks after starting treatment). Can use REDCap calendar to schedule lab draws for each patient. Goals are to assess how vascular function changes and baseline predictors of development of treatment side effects [hypertension (primary outcome) and proteinuria (continuous variable)]. This study design confounds treatment with temporal effects. Since the course of treatment is fairly long, it becomes hard to unravel the effects due to treatment versus the natural course of renal cancer. Patients may be prescribed an anti-hypertensive drug to treat hypertension or an ACE/ARB to treat proteinuria.
  • Recommend setting date of treatment start as Time 0. Before treatment is started, need to establish the extent to which each patient exhibits increasing SBP (or proteinuria) over time. Create spaghetti plot of each patient's SBP over time; change line color at time treatment is started. Decide whether to censor patient if SBP worsens. Calculate confidence band for the trend. Daily (or weekly) SBP measurements will provide more information before treatment is started. If patients are self-reporting SBP, it is useful if all standard machines are calibrated. Can build separate longitudinal models for degree of hypertension and proteinuria as a function of time. Can calculate margin of error in estimating mean SBP (ex. +/- 3-4 mmHg) using standard deviation of the first SBP measurement from each patient (similarly for proteinuria).
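The margin-of-error calculation for mean SBP can be sketched with the usual normal-approximation formula, margin = z * SD / sqrt(n). The SD of 15 mmHg used below is a hypothetical placeholder; the notes recommend estimating it from the first SBP measurement of each patient.

```python
import math
from statistics import NormalDist


def margin_of_error(sd, n, level=0.95):
    """Half-width of the confidence interval for a mean (normal approximation)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return z_crit * sd / math.sqrt(n)


def n_for_margin(sd, margin, level=0.95):
    """Smallest n whose margin of error does not exceed the target."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z_crit * sd / margin) ** 2)


# Hypothetical SD of 15 mmHg; target margin of +/- 3 mmHg
needed = n_for_margin(sd=15.0, margin=3.0)
achieved = margin_of_error(sd=15.0, n=needed)
```

The same two functions apply to proteinuria once its standard deviation has been estimated.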

2017 September 5

Paul Slocum, OB-GYN

  • 182 women received sling surgery. About 30% of these had ISD. The goal is to see if the urgency component of their leakage post-surgery differs between those with and without ISD before operation.
  • Post operation measurements at 2wks, 6wks, 24wks, and up to 52wks.
  • Primary endpoint is receiving treatment for urgency incontinence (yes/no).
  • There will be important demographic data to consider and to adjust for, some of it may be missing (i.e., probably involves multiple imputation or some other method for handling missing data)
  • Analyses could include multivariable logistic regression.
  • Another analysis option would be to account for the length of follow-up using an offset and perform some version of multivariable Poisson regression.
  • There are several secondary endpoints that will also be considered for analyses. Most of them are similar yes/no outcomes; there is also some survey data pre- and post-operation that is of interest.
  • Data has already been collected and is in REDCap, so should be no to very limited data management. Just analyses.
  • Bryan Shepherd was statistician at biostat clinic.

2017 August 31

Sarah Osmundson, Obstetrics and Gynecology/Maternal Fetal Medicine

  • "I am writing an NIH career development award (K23). I need help with the proposed analysis plan for creating a clinical prediction tool using already collected data. The purpose of my award will be to learn about predictive modeling with the mentorship of Frank Harrell. However I need to put a basic overview in my application and provide a rough sample size calculation."
  • Study will look at opioid use after C-section in patients who are opioid naive and did not have major C-section complications. Have pilot data on how much of the opioid prescription was used within 2 weeks of hospital discharge; 22% of patients said they finished the prescription. Surveyed mothers' emotional wellbeing and whether pain needs were met. Want to reduce unused tablets, so primary outcome is amount of leftover opioid medication. Goal is to develop clinical prediction model to use as prescribing tool at hospital discharge.
  • May want to change outcome to whether pain needs were met. Recommend creating causal diagram to specify how variables are likely related. Since ibuprofen is standard of care, could give every woman a sealed pill bottle with 0-30 pills (randomized). For example, patient randomized to 10, 20, or 30 pills. Instruct patient only to break the seal if she has unmanageable pain, then determine proportion who broke the seal. There are additional pill bottle options to send wireless signal if one pill was taken from the bottle or to weigh itself automatically to know how many pills were removed.
  • Already use Meds to Beds Program. Do have option to provide an extra paper prescription that is good for 30 days and to follow-up to see if it was filled. Could randomize patients to group that does or does not receive an extra prescription. Another approach is to write a consensus guideline and gather data on whether pain needs were met.
  • If want to focus on local population, could reduce sample size to maximize resources to visit patients. It would also be more feasible for these patients to come in for another prescription. Structured counseling may be used to determine actual need. May want to change goal to predict which patients need the most pills.
  • If applying for a VICTR voucher, biostatistics assistance is very likely to take less than 90 hours, so a voucher should work.

WITHDREW: Brian Adkins, Pathology Resident

  • "Allo-antibodies against red cell antigens in pregnant women lead to poor fetal outcomes. As such OB/GYNs follow serial antibody titers. Traditional tube titration is slow and subjective. Automated gel titration is available but testing requires further understanding before clinical implementation. We are trying to figure out sample size and number of tests we should be running to determine clinical cut offs for antibody levels."

2017 August 17

David Kent, Otolaryngology

  • Planning to conduct a meta-analysis of several observational studies and one RCT that assessed changes in AHI and ODI before and after treatment for sleep apnea patients. Recommend obtaining subject-level data from published studies and conducting a paired t test or building a regression model. Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.

Whitney Muhlestein, Medical Student

  • Goal is to determine whether medical student attitudes toward underserved populations change from beginning to end of one year working in a student-run clinic. Pre- and post-surveys were used to gather data on student attitudes and demographics; 30 out of 60 total students completed both the pre- and post-surveys. Gender and number of hours worked in the clinic were recorded for all students. Recommend looking at pre-survey empathy scores for students who did not complete the post-survey. May also predict post-survey completion using demographics and hours worked in the clinic.

2017 August 3

Quique Heurta, Clinical Fellow Allergy, Pulmonary & Critical Care Medicine

  • "Briefly, it’s a retrospective cohort of all hospital-acquired central line infections in adults at VUMC, and we are looking at factors associated with a poor outcome (specifically, the primary outcome is 60-day mortality or recurrence). We were mostly interested in whether prolonged antimicrobial treatment was associated with better outcomes. However, there’s an obvious issue here, which is that patients who die soon after diagnosis don’t have a chance to get a full course of antibiotics. I think this is a competing risks problem, but I’m not sure exactly how to get around it. I have the exact dates of diagnosis, discharge/death, and start/stop dates for antibiotic treatment, so all the times can be calculated out."
  • Have ~400 subjects with measured outcomes. Collected information on blood cultures, immunosuppression status, and SOFA severity of illness score (range 0-30). Duration of antibiotics is not the same for all patients, and proportion of course completed should be included as a covariate in the model. Cannot confirm compliance for patients who were discharged on an antibiotic. Most patients are switched to oral antibiotics by discharge. Some patients have to stay on IV antibiotics after discharge; this is dependent upon the organism and provider recommendation. Mortality due to central line infection is included in the primary outcome. May want to consider restricting cohort to patients who completed course of antibiotics (and selected organisms with clear guidelines), then a logistic regression model would be appropriate. Time 0 can be day that antibiotic course was completed. Recommend including ICU stay and probability of death (because related to outcome) as covariates in the model. Watch for time-dependent covariates; a state transition model may be worth considering.

Maya Yiadom, MSCI Student

  • Currently enrolling patients in a clinical trial; the primary outcome is readmission within 30 days. An error has prevented enrollment of any Palliative Care or Geriatric patients. An interim analysis at the 50% enrollment time point revealed an actual readmission rate that is much lower than what was predicted in study design. Under what conditions can you change your expected detectable effect? Do not recommend extending enrollment for Palliative Care and Geriatric patients. Limited generalizability will be more accepted than changing current enrollment procedure. Need to make sure intervention is not watered down for patients from the two medicine services if Palliative Care and Geriatric patients are added.

2017 July 27

Bianca Flores, Neuroscience Graduate Student

  • "I would like to double check if I am using the right formula to find sample size for my animal studies."
  • Sample size of mice (mutant and wild type) and number of neurons per mouse. Will use fluorescence to measure change in intracellular chloride or cell volume between Time 1 (baseline/isotonic) and Time 2 (intervention) and between Time 2 and Time 3 (baseline/isotonic). The three measurements will be collected 10 minutes apart. Also plan to collect time to return to baseline (Time 3 - Time 2, i.e. rate of unswelling); this is dependent upon highest point of response to intervention. Start with repeated measures ANOVA likelihood ratio test (global test) for whether there is a difference in response between the two mice groups at any time point. If there is a difference, then test individual differences at each time point. If there is no difference, do not test at each time point. Recommend doing a simulation study (incorporating standard errors for neurons and mice) and reporting the approximate power for a given number of mice and neurons based on limited resources. To compare time to task completion between the two mice groups, a sample size calculation based on the two-sample t test is appropriate.
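
A minimal sketch of the kind of simulation study recommended above, under simplifying assumptions that are not in the notes: each neuron contributes one change score (for example Time 2 minus Time 1), mice differ from one another by a normally distributed mouse effect, and the two groups are compared with an exact permutation test on per-mouse averages rather than the repeated measures ANOVA. All function names, effect sizes, and standard deviations are hypothetical placeholders to be replaced with pilot estimates.

```python
import itertools
import random
import statistics


def simulate_mouse_means(n_mice, n_neurons, group_mean, sd_mouse, sd_neuron, rng):
    """Simulate neurons nested within mice and return one average per mouse."""
    means = []
    for _ in range(n_mice):
        mouse_level = rng.gauss(group_mean, sd_mouse)  # between-mouse variation
        neurons = [rng.gauss(mouse_level, sd_neuron) for _ in range(n_neurons)]
        means.append(statistics.fmean(neurons))
    return means


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    extreme = 0
    count = 0
    for idx in itertools.combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        extreme += diff >= observed - 1e-12  # small tolerance for float ties
        count += 1
    return extreme / count


def estimate_power(n_mice, n_neurons, effect, sd_mouse, sd_neuron,
                   n_sims=200, alpha=0.05, seed=1):
    """Proportion of simulated experiments in which the test rejects at alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        wild_type = simulate_mouse_means(n_mice, n_neurons, 0.0, sd_mouse, sd_neuron, rng)
        mutant = simulate_mouse_means(n_mice, n_neurons, effect, sd_mouse, sd_neuron, rng)
        rejections += permutation_p_value(wild_type, mutant) <= alpha
    return rejections / n_sims


# Hypothetical scenario: 5 mice per group, 10 neurons per mouse.
power_large_effect = estimate_power(5, 10, effect=3.0, sd_mouse=1.0, sd_neuron=1.0)
power_no_effect = estimate_power(5, 10, effect=0.0, sd_mouse=1.0, sd_neuron=1.0, seed=2)
```

Running `estimate_power` over a grid of mouse and neuron counts gives the approximate power table that could be reported for the available resources. Because mice are the independent units, adding neurons per mouse helps less than adding mice once neuron-level noise has been averaged down.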

2017 July 6

Lauren Lee Wray, Clinical Pharmacology Research Analyst

  • "I have some methodological questions about subsequent analyses following a latent class analysis. Here is a little background information: N = 1,580 and 4 clusters were derived with participants who have different trigger symptoms. All clusters have the same underlying medical issue under study. Now, we are adding SNPs to the analysis plan. I need help choosing analyses to run for this data."
  • All patients have atrial fibrillation (AF). The triggers are dichotomous variables for caffeine intake, sleep, etc. The 4 clusters are no trigger, vagal, adrenal, and combination; smallest cluster has 92 patients. Plan to test 30 SNPs to determine whether patients in the same cluster have similar genetic markers for AF.
  • Recommend looking at distribution of variables (used to determine clusters) within each cluster to verify clusters are homogeneous. Can run logistic regression model for a given trigger (ex. caffeine intake) with 30 SNPs as covariates; then rank SNPs based on correlation coefficient or chi-square statistic. Use additive model on logit scale with SNPs coded as 0, 1, or 2 to account for dominant and recessive genes. The SNPs could be co-expressed, so recommend doing a correlation analysis of SNPs to assess collinearity. Use variable clustering to visualize redundancies among variables. It will be better if the number of SNPs in the model can be reduced.

WITHDREW: Margaret Taylor, School of Nursing Melrose Faculty Practice

  • "I have a deadline to meet and think y'all could answer some questions. Honestly, it's really simple stuff, so much so you will probably laugh but to me it's a big deal."

2017 June 22

Ritu Banarjee, Pediatric Infectious Disease

  • We are planning a prospective observational cohort study of procalcitonin levels (a hormone) and its kinetics in infants and children, and would greatly appreciate assistance in determining the number of subjects needed to get accurate ROC curves.
  • Meeting notes: Blood test used primarily for adults. Interested in how tests perform in children. Infants through 18 year-olds. Interested in identifying cut-offs for sensitivity, specificity. Primary interest is negative predictive value. Can compare to microbiology culture. Will enroll patients with infection to get both blood test and culture. Localized infection location. In adults, test discrimination is 81-94%.
  • If hypothesized gold-standard defined infection rate is 10%, can produce 95% confidence intervals for estimation of predictive value for comparison tests. Using estimates of prevalence, sensitivity, specificity, positive predictive value, and negative predictive value, a sample size can be calculated based on the confidence interval desired. Email william.dupont@vanderbilt.edu for calculations. Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.
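
As a rough illustration of a precision-based sample size of this kind, the sketch below uses the normal-approximation formula for a proportion and then inflates the number of infected children by the assumed prevalence. The 10% infection rate comes from the notes; the anticipated sensitivity of 0.90 and the target half-width of 0.10 are hypothetical inputs, and the function names are illustrative only. The actual calculation provided for the study may use a different method.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, half_width, conf_level=0.95):
    """Subjects needed so a proportion near p has the given CI half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)


def total_n_for_sensitivity(sensitivity, half_width, prevalence, conf_level=0.95):
    """Total enrollment so that enough infected subjects are expected."""
    n_infected = n_for_proportion(sensitivity, half_width, conf_level)
    return math.ceil(n_infected / prevalence)


# Hypothetical inputs: sensitivity 0.90, half-width 0.10, prevalence 0.10.
n_infected_needed = n_for_proportion(0.90, 0.10)
n_total_needed = total_n_for_sensitivity(0.90, 0.10, prevalence=0.10)
```

The same function applies to specificity by dividing by one minus the prevalence instead. Predictive values depend on prevalence in the enrolled sample, so their precision is better judged from the expected numbers of test-negative and test-positive children.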

2017 June 15

Daniel Markwalter, Center for Biomedical Ethics and Society

  • We are conducting a study to better understand family perceptions of transitions in care in the pediatric critical care unit as well as facilitators and obstacles to family preparedness for transitions in this setting. We are using a grounded theory methodology and have data describing the percentages of certain subgroups that reference particular themes. We want to learn about the methods for comparing proportions between groups. For instance, if one group of 20 people reference a theme 85% of the time and a separate group of 25 people reference the theme 20% of the time, can we say these are statistically different?
  • Meeting notes: 4 main transitions in care. Read transcripts to develop complete collection of themes and grouping for words referenced. Consistent interview structure (conversational); might be slight nuances between interviews. Same process repeated with parent and physician. Want to compare the percentage of parents identifying a theme to the percentage of physicians identifying the same theme.
  • May not be necessary to specify that groups are statistically similar or different. Description may be sufficient. Recommend graphical display. Can produce confidence bounds for proportion estimates (Wilson binomial confidence interval estimator). Hypothesis test null assumes that physicians and parents should be exactly aligned and they may not be.
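
A short sketch of the Wilson interval applied to the example from the request (17 of 20 in one group, 5 of 25 in the other). The function name is illustrative; only the Python standard library is used.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, conf_level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Example from the request: 85% of 20 people and 20% of 25 people.
group_a = wilson_interval(17, 20)
group_b = wilson_interval(5, 25)
```

For these counts the intervals are roughly 0.64 to 0.95 and 0.09 to 0.39. Plotting each theme's proportion with its interval for parents and physicians side by side gives the recommended graphical display without requiring a formal hypothesis test.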

Ian Setliff, Pathology, Microbiology & Immunology

  • We have many features (100) of the antibody repertoire of each of several donors. Some of these are independent of each other, while others are not. We have a question of how to normalize our data correctly for subsequent analysis.
  • Meeting notes: Longitudinal dataset of 6 donors.

2017 June 1

Zachary Cox, Cardiology

  • "We need help determining the number of patients to enroll to have the power to determine a difference in our primary outcome in a randomized, prospective, parallel-design proof-of-concept clinical trial. We are comparing inhaled milrinone (investigational arm) to intravenous milrinone (control arm) on the primary outcome of cardiovascular hemodynamic variables."
  • Will include hospitalized patients with advanced heart failure who are undergoing evaluation for heart transplant. Non-inferiority design; the outcome is continuous hemodynamic output. A 20% change from baseline is considered standard by Medicare.
  • Recommend adjusting for baseline hemodynamic output in model to increase power. However, if correlation is < 0.5 between baseline and 72-hour measurements, then adjusting for baseline will just add noise. To calculate power, can utilize change from baseline. To estimate precision, need an estimate of the within-subject standard deviation at baseline. Recommend recruiting 20 patients per arm for a pilot study that can provide additional information to plan a large-scale trial.
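
One way to see where a 0.5 threshold can come from, shown here for the change-from-baseline outcome. This sketch assumes (not stated in the notes) that the baseline measurement $Y_0$ and the 72-hour measurement $Y_1$ have a common variance $\sigma^2$ and correlation $\rho$.

```latex
\begin{align*}
\operatorname{Var}(Y_1 - Y_0) &= \sigma^2 + \sigma^2 - 2\rho\sigma^2 = 2\sigma^2(1-\rho), \\
\operatorname{Var}(Y_1 - Y_0) < \operatorname{Var}(Y_1) = \sigma^2 &\iff \rho > 0.5 .
\end{align*}
```

Under these assumptions, subtracting the baseline value reduces the outcome variance only when the correlation exceeds 0.5; below that, the change score is noisier than the 72-hour measurement alone.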

Dan Ayers, Biostatistics

  • We are conducting a sensitivity analysis on a prediction model, adding patient outcomes from patients excluded from the original model build and comparing c-index and calibration slopes for goodness-of-fit. Our current process is,

    1. The sensitivity analysis will be conducted on the 2 sensitivity sets described below.
    1. Parameters of the full model will not be re-estimated.
    1. 20 datasets will be imputed using the new outcome set and all x-variables as in the original analysis. For sensitivity set 1, all new outcomes for the 65 added patients will equal 1. For sensitivity set 2, all new outcomes for the 65 added patients will equal 0.
    1. Prediction sets, using the original parameters of the full models, will be computed for each of the 20 datasets.
    1. The average prediction per patient will summarize that patient's prediction.
    1. The average prediction set will be compared to the observed data to derive a calibration curve. The normal parameters, Dxy, c-index, etc will be used to summarize the goodness of fit.
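
The averaging and discrimination steps in the list above could be sketched as follows. This is an illustration only: the function names are hypothetical, predictions are assumed to be probabilities from the already-fitted model, and outcomes are assumed to be coded 0/1. The calibration curve itself is not shown.

```python
from statistics import fmean


def average_predictions(imputed_predictions):
    """Average each patient's prediction across imputed datasets.

    imputed_predictions holds one list per imputed dataset, each with one
    predicted probability per patient in the same patient order.
    """
    return [fmean(per_patient) for per_patient in zip(*imputed_predictions)]


def c_index(predictions, outcomes):
    """Concordance for a binary outcome: share of (event, non-event) pairs
    in which the event has the higher prediction; ties count one half."""
    concordant = 0.0
    pairs = 0
    for p_event, y_event in zip(predictions, outcomes):
        if y_event != 1:
            continue
        for p_non, y_non in zip(predictions, outcomes):
            if y_non != 0:
                continue
            pairs += 1
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / pairs


def somers_dxy(c):
    """Somers' Dxy rank correlation derived from the c-index."""
    return 2 * (c - 0.5)
```

For sensitivity set 1 the outcome vector would carry a 1 for each of the 65 added patients, and for sensitivity set 2 a 0, with the same averaged predictions scored against each version.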

  • Capture reasons physician did not order MRI to determine whether MRI data are missing at random. If data are not MAR, will need to do multiple imputation or exclude these observations. Use continuous probability (DOC) rather than categorizing as positive or negative. Can report median DOC stratified by MRI status and proportion MRI+ in DOC deciles.

Oscar Ayala, Graduate Student Biomedical Engineering

  • "I am working with Dr. Anita Mahadevan-Jansen. I am currently analyzing spectral data collected using Raman spectroscopy and trying to implement data reduction techniques to ultimately classify my results."
  • Goal to classify bacteria. Have 6 types of bacteria; some are wild type, and others are single gene mutations. Record number of photons (intensity) for 917 bacteria features. Collected multiple measurements from multiple colonies for a total of 162 intensity plots. Recommend grouping bacteria features into 50 intervals and summarizing average height in each interval. May consider PCA with penalization (smooth or non-smooth which penalizes some loadings to zero) and variable clustering analysis. Separately, could fit spline function with 30 knots and associate with classification. Preferable to use bootstrapping rather than cross-validation.

2017 May 25

Tanya Marvi, Medical Student

  • "My project is looking at platelet count in patients with musculoskeletal infection. I am working in stata and would like some assistance with fitting splines and building a predictive model for using platelets and crp to predict severity of infection. My goal is to build a model using an interaction term for interpreting the platelet count in the context of the CRP. Additionally, I would like help with fitting a proportional odds model with robust standard errors for platelet count over time. If it is possible to work with someone familiar with Stata that would be very helpful."
  • Have 150-250 previously healthy pediatric patients who came to ED, had orthopedic consult, and were admitted to hospital. Diagnosis can be inflammation (excluded from study), localized bacterial infection in joint (septic joint), or infection in muscle. Infection is confirmed with two positive blood cultures. Goal to use CRP biomarker (which increases during infection) and blood platelet level (BPL, which decreases during infection but rebounds over time) to predict outcome. BPL can be confounded with treatment, so antibiotic administration was documented. The outcomes are death (but 0 patients died), complications (n=15 patients), LOS in hours (surrogate for infection severity), and Peds charge weight (standardized cost which is indicative of infection severity). Make sure there is a temporal relationship between outcome and predictors.
  • Recommend using response feature analysis which uses a biologically meaningful summary of data for each patient (ex. area under the curve or slope coefficient). This removes the correlation in the data. With independent data, a simple fixed effects analysis can be done. For example, use hospitalization Day 5 CRP and BPL to predict LOS. Include absolute value of change in BPL from baseline in model. Can also do a landmark analysis; take patients who survive to time t and look at relative importance of BPL. Remember to take logarithm of ratios and show raw spaghetti plots (specify alpha saturation or use grayscale based on LOS). Stata 'mkspline' program can use linear or cubic splines. See William Dupont's book Statistical Modeling for Biomedical Researchers (2009). It is also possible to use loess regression and to calculate confidence intervals with bootstrapping.
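
A minimal sketch of the response feature step: each patient's repeated measurements are reduced to a slope and an area under the curve, giving one independent row per patient for a later regression. The record layout (patient ID mapped to day and value pairs) and the function names are assumptions for illustration; the analysis itself was planned in Stata.

```python
def slope(times, values):
    """Least-squares slope of values on times for one patient."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def auc_trapezoid(times, values):
    """Area under the measurement curve by the trapezoid rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
    return area


def response_features(records):
    """Reduce each patient's (day, value) series to one summary row."""
    features = {}
    for patient_id, observations in records.items():
        ordered = sorted(observations)
        days = [day for day, _ in ordered]
        values = [value for _, value in ordered]
        features[patient_id] = {
            "slope": slope(days, values),
            "auc": auc_trapezoid(days, values),
        }
    return features


# Hypothetical platelet series for one patient (day, count).
example = response_features({"patient_1": [(0, 1.0), (1, 3.0), (2, 5.0)]})
```

Patients with a single measurement have no defined slope, so they would need to be handled separately before applying this summary.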

2017 May 18

Rita Pfeiffer, Graduate Student Department of Hearing and Speech Sciences/Program for Music, Mind, and Society

  • "I am investigating the feasibility of using a frame difference method to analyze movement interactions during social interactions between adults and preschoolers (who present with and without Autism). We obtained a segment of a social communication assessment, in which the experimenter points to posters to bid the child to jointly attend. After down sampling the video to 10 fps, we used a frame difference method to determine the number of pixel changes per video frame, thereby indicating the amount of movement. The output results in two time series: one capturing the movement from the experimenter, and the other capturing the child's movement. We are utilizing cross-correlation and coherence analyses to determine the relationship between the two time series."
  • Have enrolled 13 children aged 2-3 years who have or have not been diagnosed with autism. Recorded at most four 30-second videos per child. Planning time series analysis of movement data for experimenter and child; want to look at relationship between the two and potential differences in children with autism. Should we use cross-correlation or cross-covariance values to compare our data set? How can we best use our coherence analysis to compare our data sets?
  • How does one determine the appropriate window size to do our analyses (for FFT, coherence, etc)? Recommend using two-fold cross-validation. Split data into 2 halves, fit models with different tuning parameters (window size in this case) with first half, and see how well the model predicts the data in the second half (e.g. calculate observed - predicted values). Choose window size that gives the best performance for primary analysis. Can report sensitivity analyses using different window sizes to explain that primary analysis results are not entirely dependent on choice of window size.
  • May contact Hakmook Kang to discuss additional time series or functional data analysis questions during a Tuesday clinic.
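
The two-fold cross-validation recommended for choosing a window size could look roughly like the sketch below. The "model" here is a deliberately simple stand-in, not the coherence analysis from the project: the child's movement is predicted by a straight-line fit on the trailing mean of the experimenter's movement over the candidate window. The series are synthetic, and every name and number is a hypothetical placeholder.

```python
import random
from statistics import fmean


def trailing_mean(series, window):
    """Mean of the current and previous (window - 1) values."""
    return [fmean(series[t - window + 1:t + 1]) for t in range(window - 1, len(series))]


def fit_line(x, y):
    """Ordinary least-squares intercept and slope."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def holdout_mse(train, test, window):
    """Fit on one half, return mean squared (observed - predicted) on the other."""
    x_train = trailing_mean(train[0], window)
    y_train = train[1][window - 1:]
    x_test = trailing_mean(test[0], window)
    y_test = test[1][window - 1:]
    a, b = fit_line(x_train, y_train)
    return fmean([(yi - (a + b * xi)) ** 2 for xi, yi in zip(x_test, y_test)])


def choose_window(experimenter, child, candidate_windows):
    """Two-fold cross-validation: each half serves once as the test set."""
    half = len(experimenter) // 2
    first = (experimenter[:half], child[:half])
    second = (experimenter[half:], child[half:])
    scores = {}
    for window in candidate_windows:
        scores[window] = (holdout_mse(first, second, window)
                          + holdout_mse(second, first, window)) / 2
    return min(scores, key=scores.get), scores


# Synthetic data for illustration only: the child follows a 3-frame average.
rng = random.Random(0)
experimenter = [rng.gauss(0, 1) for _ in range(400)]
child = [0.0, 0.0] + [fmean(experimenter[t - 2:t + 1]) + rng.gauss(0, 0.05)
                      for t in range(2, 400)]
best_window, cv_scores = choose_window(experimenter, child, [1, 3, 5, 9])
```

The per-window scores in `cv_scores` are also what a sensitivity analysis would report, to show how much the conclusions depend on the chosen window. Splitting a time series into two contiguous halves, as done here, keeps neighboring frames together rather than scattering them across folds.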

2017 May 11

Vivian Kawai, Medicine/Clinical Pharmacology Research Assistant

  • "We are conducting a candidate gene study for gestational weight gain in BioVU. Unfortunately we have several patients with missing prepregnancy weight and would like to see if it is feasible to impute this information using pregnancy weights at different gestational ages. If so, what weights are needed for this."
  • Defined gestational diabetes using criteria or previous diagnosis in medical record. Need pre-pregnancy and pre-delivery weights to calculate gestational weight gain. 10% of controls and 25% of cases (with gestational diabetes) are missing pre-pregnancy weight. Matched cases and controls on age and number of previous pregnancies. Collected repeated weight measurements during pregnancy, but do not have information on baby's gender or birth weight. Plan to calculate risk score for gestational diabetes.
  • Recommend using multiple imputation to generate values for missing pre-pregnancy weights. Starting June 1st, can apply for VICTR voucher for 90 hours of biostatistics support.

Jiancong Liang, Pathology, Microbiology and Immunology

  • Have 30 cases of papillary thyroid carcinoma with classic variant. This cancer has a 100% cure rate. Noted an unusually high proportion of cases (40%) with Hashimoto thyroiditis. Endpoints were collected at time of cancer diagnosis and include tumor size, tumor stage, tumor multiplicity, and metastasis. Lymph node status and treatment response were collected later.
  • Already used Fisher's exact test to compare dichotomized tumor size between Hashimoto groups. Recommend not dichotomizing tumor size and using Wilcoxon rank-sum test to compare the continuous variable between Hashimoto groups. Given small sample size, survival analysis will not be informative. Can generate Kaplan-Meier curves to observe trends. May want to use logistic regression model to predict Hashimoto status. Contact William Dupont for additional support.

2017 April 20

Bhumika Piya, PhD Student Sociology

  • "I am examining the relationship between arsenic content in drinking water and body weight status (underweight, healthy weight, and overweight) using multinomial logistic regression. Since I use survey data, I have some questions regarding Stata's svy function and robust standard errors (esp. linearized standard errors and how that affects statistical significance). P.S. My mentor is no longer at Vanderbilt and won't be able to attend the session with me."
  • 2500 household surveys conducted in 8 communities. Body weight status was self-reported on survey. Arsenic content and salinity were measured in each community several months after surveys were completed (mean of several measurements dichotomized into high/low). Other covariates include demographics (age, sex, religion, health status), environmental stress, and community economic development index.
  • Do not recommend using sampling methodology. Can use observational clinical study analysis to determine effect of arsenic on body weight status. Arsenic should be a continuous variable in the model, rather than high/low level. Compare the sum of squares for arsenic with and without community in the model. Do not include the community variable if continuous community characteristics (e.g. economic development index) are included in the model. When reporting the final results, if the communities are no different other than arsenic level, then this is the effect of arsenic on body weight status. Another option is to fit two ordinal models, one for BMI amount lower than ideal BMI and another for BMI amount higher than ideal BMI.

2017 March 30

Hannah Dietrich, Student

  • "We are part of a QI project gathering data on health literacy levels in the pediatric general surgery clinic. We have finished gathering data for this project, and would like to focus our questions mainly on graphics and statistical analyses for our data. We are primarily examining correlations between health literacy scores and other factors such as income, no-show rates, etc."
  • Health literacy score measured on a 15-point scale (range 3-15). Have collected 60 surveys. Survey was validated at the VA but not in this clinic population. Do not have gold standard data to compare with survey data. If a new patient did not show, then could not gather survey data.
  • Goal to assess relationship between health literacy score and other patient characteristics (time between VUMC system entry and surgery, clinic no-show status). Can create histogram for health literacy score. Can use Wilcoxon rank-sum test (2 groups) or Kruskal-Wallis test (3 groups) to compare health literacy scores among groups. Recommend using a logistic regression model to predict clinic no-show status (outcome) using health literacy score and adjusting for patient characteristics (race, etc.). Need to decide on a desired level of precision to calculate required sample size.

Celestine Wanjalla, Infectious Diseases Postdoctoral Fellow

  • "Analysis of cross-reactive T cells in human PBMCs. Calculation of sample no and power and best statistical analysis for my first two aims."
  • Looking at T-cell (CD8alpha) responses to different peptides (tetramer NLV, 2B9, 2A12) in 7 subjects with confirmed CMV infection. Planning to submit grant proposal. Can include CMV negative subjects as negative controls. Sample size will depend on costs and desired level of precision; including more than 7 subjects (ex. 20 in each group) will increase statistical power. Can include graph of power curve in proposal.

2017 March 23

Kristy Broman, Surgery Resident

  • "The question I am trying to answer is whether there is a way to compare two incidence ratios. I am using the SEER database and SEER Stat which has built in modules for calculating age standardized incidence ratio for specific events. The output I get is the total N, the total event number, and the standardized incidence ratio. This is essentially the ratio of observed to expected, but I cannot know how the expected is determined (this is a "black box" within the module). So I want to know if there is a way to essentially compare the already calculated standardized incidence ratios."
  • In cohort of patients with previous colon cancer, the outcome of interest is subsequent GI cancer. Want to know if SIR is significant. Module does not provide standard error or confidence interval for SIR. Reviewed formulas to calculate E* using SIR and given D along with the confidence interval.
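
A sketch of the back-calculation described above: since SIR = D / E, the expected count is E = D / SIR, and a confidence interval for the SIR follows from treating D as Poisson. The notes do not say which interval formula was reviewed; Byar's approximation is used here as one common choice, and the function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def expected_from_sir(observed, sir):
    """Recover the expected count E from the observed count D and the SIR."""
    return observed / sir


def sir_confidence_interval(observed, expected, conf_level=0.95):
    """Byar's approximation to the exact Poisson interval for an SIR.

    Requires observed > 0.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    d = observed
    lower = d * (1 - 1 / (9 * d) - z / (3 * math.sqrt(d))) ** 3 / expected
    upper = ((d + 1) * (1 - 1 / (9 * (d + 1)) + z / (3 * math.sqrt(d + 1))) ** 3
             / expected)
    return lower, upper


# Hypothetical module output: 10 observed events with a reported SIR of 2.0.
expected_count = expected_from_sir(10, 2.0)
sir_lower, sir_upper = sir_confidence_interval(10, expected_count)
```

An SIR is usually called significant when this interval excludes 1. In the hypothetical example the lower bound falls just below 1, so an SIR of 2.0 based on only 10 events would not reach that bar.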

Matthew Duvernay, Pharmacology

  • "I am initiating a pilot study to measure DNA methylation at the F2RL3 gene site in whole human blood samples and correlate this with blood cell function. There is a body of data comparing the levels of methylation in smokers and non-smokers at the particular gene that I am interested in. I would like to learn about how to estimate the optimal sample size needed in each of these two groups based on the variability in the published data sets."
  • Published data demonstrated that expression levels of receptor can change based on methylation status of gene. Hypomethylation is highly correlated with smoking. There is variation in methylation among smokers. Planning to look at platelets and monocytes from smokers and non-smokers. Evaluators will need to be blinded to smoking status. Expect different staining levels within a given cell. Classify cells as positive or negative and calculate proportion that are positive. Need to establish protocol for assessing intensity of expression.
  • Recommend using standard errors from published data to calculate sample size. Include multiple scenarios with a range of variances in proposal. Can also use possible number of samples given specific grant amount to conduct power analysis. With new pilot data, can calculate correlation between methylation and gene expression. Planning to apply for VICTR voucher in the future.
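
To illustrate presenting several scenarios over a range of variances, the sketch below applies the usual normal-approximation formula for comparing two group means. The standard deviations and the difference of interest are hypothetical placeholders; the published smoker and non-smoker estimates would replace them.

```python
import math
from statistics import NormalDist


def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two means
    (normal approximation, equal group sizes and equal SDs assumed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)


# Hypothetical scenarios: methylation SD from 0.5 to 2.0 for a difference of 0.5.
scenarios = {sd: n_per_group(sd, difference=0.5) for sd in (0.5, 1.0, 1.5, 2.0)}
```

Because the required size grows with the square of the standard deviation, tabulating the scenarios side by side shows how sensitive the plan is to the variability assumed from the published data. A t-based calculation would give slightly larger numbers for small groups.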

2017 March 16

WITHDREW: Tiffany Sarell, Clinical Pharmacist

2017 March 9

Dillon O'Neill, Medical Student

  • "I have a series of measurements made by 3 different readers. The measurements are a continuous variable. Each reader read essentially all of the images in the dataset. Each reader also re-read 30 of the measurements 2 times. Need guidance as to most appropriate statistic for inter- and intra- observer reliability."
  • Retrospective look at pre and post angles using films from SCFE patients. Measure increased rate of avascular necrosis (AVN) in hips that were manipulated vs. hips that did not move at all. Determination of stable vs. unstable SCFE uses 15 degree difference between pre and post angles as the cutoff point. The cutoff point was determined a priori.
  • Recommend looking at correlation of deltas (difference between pre and post angles) between the 3 readers. High correlation between readers indicates consistency among readers. Krippendorff's alpha (Stata KRIPPALPHA) allows for multiple readers who do not have to rate every patient; values close to 1 indicate strong agreement, while values near 0 indicate agreement no better than chance. Goal to determine AVN rates in stable and unstable SCFE patients. Only 3 patients developed AVN. Recommend creating plots of data (ex. Bland-Altman-type plot for observed deltas from each reader (y) vs. mean delta (x)). Can also plot individual deltas. Should not compare observed rate to published rate because confidence interval for observed rate is so wide.

2017 March 2

Miguel Cuj, Graduate Student Latin American Studies

  • "My research project is a cross-sectional survey about health status in a rural area of Guatemala. 1) How to compute the sample size in a target group in three small communities with only two selection criteria: a) beneficiary of social program and b) older population >50 years old. 2) Which statistical analysis could you suggest beyond descriptive statistics? 3) Some issues about use of SP in clinical trial."
  • Concerns with chronic diseases in older population (diabetes, heart disease, etc.). Planning to collect survey data on demographics and health status opinions about armed conflict. Have a list of 150 potential subjects who meet eligibility criteria and their contact information. From this sampling frame, can select sample and randomize order of approaching subjects. For a qualitative study, can continue to collect surveys until reach saturation point for information. To calculate sample size based on desired margin of error for estimate of proportion, sample size is approximately 1/e^2, where e is the margin of error (a conservative value for a 95% confidence interval). Similarly, can calculate expected standard errors based on sample size you can reach given limited resources. Do not need a power calculation because you are not testing a specific hypothesis. Can utilize information gathered from surveys to identify potential questions for focus groups.
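
The 1/e^2 rule follows from the usual margin-of-error formula for a proportion, taking the worst case $p = 0.5$ and rounding the 95% normal quantile $z = 1.96$ up to 2:

```latex
n = \frac{z^2\, p(1-p)}{e^2}
\quad\xrightarrow{\;p = 0.5,\ z \approx 2\;}\quad
n \approx \frac{4 \times 0.25}{e^2} = \frac{1}{e^2}
```

For example, a margin of error of $e = 0.10$ gives $n \approx 100$, and $e = 0.05$ gives $n \approx 400$. This simple version ignores any finite population correction.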

2017 February 23

Aaditi Naik, Undergraduate Student

  • I am an Undergraduate Research Assistant in Dr. David Charles's Movement Disorders lab in the VUMC Neurology Department. Our project aims to identify the prevalence of four previously-identified non-motor markers – spatial discrimination threshold, temporal discrimination threshold, vibration-induced illusion of movement, and kinesthesia – in a population of cervical dystonia patients, unaffected family members, and healthy volunteers (control group). Consenting participants will receive a neurological examination performed by a movement disorders neurologist, followed by an assessment of the four non-motor symptoms. Through analysis of the concurrence of the non-motor features across the three groups of participants, we hope to identify a combination of non-motor symptoms which is more prevalent in the cervical dystonia group, and therefore may be indicative of disease development. Our specific questions are:

    1. What is the appropriate statistical test to assess the association between multiple non-motor features (2, 3, or 4 features) and participant group (patient, non-affected family member, healthy volunteer)?

    1. Based upon the suggested analysis for point 1, what would be an appropriate sample size and power? Expect to enroll 80 subjects per group.

    1. Would Kruskal Wallis (ordinal/interval)/Chi-square (categorical)/Kaplan-Meier (survival) be appropriate statistical tests to assess the difference between three groups for an individual variable identified during the study, e.g. age of onset, gender, income, etc.?

    1. Is a logistic regression model the best statistical test to assess the association of non-motor features with cervical dystonia, based on prevalence rates in the participant groups (patient, non-affected family member, healthy volunteer)? If so, which type of logistic regression would be most appropriate? Recommend using multinomial regression for all three groups or binary logistic regression (looking at two groups at a time) to predict group membership. Given two categories for outcome, take number of subjects in smaller group and divide by 15. This is the number of variables that can be included in the model. Can use bootstrapping to assess stability of model.

    1. What are the best methods through which to report qualitative data related to clinical features of sensory tricks, such as the types of sensory tricks used, frequency of use, effectiveness of use, etc.? Descriptive statistics on any number of clinical features.
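
As a quick worked example of the divide-by-15 rule in item 4, using the expected enrollment of 80 subjects per group from item 2. The function name is illustrative only.

```python
def max_model_variables(n_smaller_group, subjects_per_variable=15):
    """Rule-of-thumb limit on the number of variables in a binary logistic model."""
    return n_smaller_group // subjects_per_variable


# With 80 subjects in each of the two groups being compared:
variable_budget = max_model_variables(80)
```

With 80 subjects in the smaller group the rule allows 5 variables, which is enough for the four non-motor markers plus one adjustment variable.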

Jessica Grahl, Pharmacy Resident

  • "Antimicrobials and Delirium Questions we were asked to answer: 1) Do we have the culture collected at time of giving antibiotic so we might differentiate whether the infection or the antibiotic caused delirium? 2) Do we have the CAM assessment in continuous scale not just delirium yes/no? Additionally we would like to address the statistical analysis plan associated with this project. Dr. Mayur Patel and Joanna Stollings will be accompanying me."
  • CAM ICU used to determine delirium status (positive, negative, or unable to assess) twice daily. Total number of patients is 521; 150 patients did not receive an antimicrobial. Defined 3 antimicrobial groups. In ICU, sepsis status based on SIRS criteria was collected daily. Other infection status unknown.
  • Results are difficult to interpret if a large number of patients die and are removed from the analysis. Recommend including death and coma as outcome categories. Specify time windows for the primary analysis (ex. 2 days of antimicrobial use, 3 days of delirium assessment). Is it necessary to have a blank-out period between antimicrobial administration and delirium outcome assessment? Can analyze the outcome in 12-hour sliding time windows.

2017 February 9

Rany Octaria, MPH Student

  • "I am currently finishing my thesis conducting Social Network of Hospital Patient Sharing in TN to identify at-risk facility for multidrug resistant organism spread. My biostatistics advisor, Yuwei Zhu, recommended me to use your service to get input regarding the statistics I can use for my social network analysis."
  • For Table 1, you may want to conduct an ERGM analysis (for a binary network, or for a count network using the method proposed in a paper that appeared in the Electronic Journal of Statistics) instead of linear regression. For Figure 2, it is possible to run a community detection algorithm based on either modularity or pseudo-likelihood to see whether the automatically detected communities match the geographic distance between these hospitals.
  • Want to identify which facilities are highly connected and why certain facilities are highly influential. Preliminary analysis used UCInet software. Recommend using StatNet software to run ERGM and using permutation tests. An R software package is available to use maximum likelihood for estimates. Recommend doing a sensitivity analysis with different thresholds (ex. 0, 1, >1). Data likely follow a Poisson or zero-inflated distribution; this can be verified by plotting a histogram. Binary network has degree (count of interactions); weight facilities by rank.
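
As a quick complement to the histogram suggested above, the observed share of zeros can be compared with the share a Poisson distribution with the same mean would predict. This is a minimal sketch: the function names are hypothetical and the patient-sharing counts are made up for illustration, not taken from the thesis data.

```python
import math
from collections import Counter


def zero_inflation_check(counts):
    """Compare observed zeros with the Poisson expectation exp(-mean)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return {
        "mean": mean,
        "variance": variance,  # Poisson implies variance close to the mean
        "observed_zero": counts.count(0) / n,
        "expected_zero": math.exp(-mean),
    }


def text_histogram(counts):
    """Plain-text histogram: one row per count value."""
    freq = Counter(counts)
    return "\n".join(f"{k:>3} | {'#' * freq[k]}" for k in range(max(counts) + 1))


# Hypothetical numbers of shared patients between facility pairs.
shared = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 5, 0, 0, 3, 0, 1, 0, 0, 8, 0]
summary = zero_inflation_check(shared)
print(text_histogram(shared))
print(summary)
```

If the observed proportion of zeros is far above the Poisson expectation, or the variance is much larger than the mean, a zero-inflated or overdispersed count model is worth considering.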

Ricardo Lugo, Cardiology Fellow

  • "Comparing Serum BNP level with 1) myocardial scar and 2) Ventricular Tachycardia recurrence rates after catheter ablation. I have a basic familiarity with R and am requesting assistance with 1) ensure I am using the appropriate analysis methods and 2) creating descriptive table (specifically patient characteristics stratified by BNP tertiles)."
  • For a patient to have VT, they have to have something wrong with their heart. All patients in the study have some form of cardiomyopathy. The criteria for performing CA have been refined, and 1-3 per week are performed currently. Blood samples were collected at the time of the CA procedure to measure BNP levels. There are a total of 59 patients and 30 events. The goal is to use biomarkers to identify which patients are good candidates for CA.
  • Preliminary analysis used a Cox PH model for time to recurrent VT. Collected data on patient characteristics (comorbidities) and characteristics of the CA procedure. Created BNP tertiles for Table 1; each category has ~20 patients. LVEF and Endocardial Area were shown to differ among the tertiles. Recommend reporting descriptive statistics for the overall cohort; do not need to categorize BNP. Can still compare BNP levels between gender groups.
  • Ran Cox PH and multivariable Cox PH models with log(BNP). Given 30 events, you have 1-2 degrees of freedom. Can try a non-linear term for BMI (ex. restricted cubic spline with 3-5 knots, which will use 2-4 degrees of freedom). Plot overall Kaplan-Meier curve by log(BNP). Plot survival from the multivariable model [ex. plot(predict(f, time=c(30, 60, 90)))]. Report the concordance index for the model. Cross-validation using bootstrapping may not perform well with only 59 patients.
  • May be covered by Cardiology collaboration plan with Meng Xu and Shi Huang.
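
The concordance index recommended above can be illustrated with a small pure-Python version of Harrell's C. This is a sketch that ignores tied event times; the function name and the four-patient data set are hypothetical, and a real analysis would take the index from the fitted Cox model in R.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs, the fraction in which the
    subject with the earlier observed event has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (days), VT recurrence indicator, and model risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of recurrence times by predicted risk.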

2017 February 2

Miriam Lense, Otolaryngology

  • "At my previous visit, it was suggested I get information about the reliability of the measure I am using for a power analysis for growth curve analyses."
  • Collecting parent-reported vocabulary measure at 4 time points and global language and communication assessment and social entrainment measurement at 9 months. Test-retest reliability of measure at 12 months is low (0.61) but higher at 18m (~0.87).
  • Recommend starting with PS software (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize), but it assumes repeated measures are not correlated. The software produces power curves and text for a grant proposal; its sample size calculations are optimistic. Another option is to run simulations of time series data in Stata or R to determine the appropriate sample size.

Joseph Kuebker, Endourology Fellow

  • "We are going to retrospectively look at two groups of patients who underwent a ureteral stent for a ureteral stone and subsequent ureteoscopy. Our aims are to identify the rate of stone passage and thus subsequent negative ureteroscopy and predictors of this event."
  • In 89-92% of cases, the stone does not pass after the stent is placed, and a second surgery is required. Outcome of interest is the result of ureteroscopy (positive vs. negative). Also plan to collect BMI, stone size, time between stent and URS, sex, side, use of alpha blocker, size of stent (2-3 categories), and stone location.
  • Recommend calculating sample size using PS software (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize). Select the Dichotomous tab: output=power, design=independent & case control & odds ratio & uncorrected chi-square test, alpha=.05 (Type I error), n=100, p_0=.11 (event rate in controls), m=1 (ratio cases to controls), psi=2; yields power=.407. Software produces power curves and text for grant proposal.
  • Second project looks at radiation dose from KUB plain film x-rays used to evaluate kidney stone disease. Patients generally have 2 x-rays during evaluation. Imaging techniques have changed and can lead to dose creep. Techs sometimes use a higher radiation dose for better exposure and interpretability of x-rays. Goal is to compare radiation dose between historical plain film x-rays (average 0.7-0.8 mSv) and digital x-rays. Plan to collect age, gender, BMI, and diameter. Recommend a meta-analysis and calculation of confidence intervals.
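
The PS power calculation for the ureteroscopy study above can be approximated with the standard normal-approximation formula for comparing two proportions. This is a sketch under stated assumptions: it is not necessarily the algorithm PS uses internally, the function name is hypothetical, and m is treated here as the number of controls per case (with m=1 the direction does not matter).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p0, odds_ratio, n_cases, m=1.0, alpha=0.05):
    """Approximate power of an uncorrected chi-square test of two proportions."""
    # Proportion in the comparison group implied by the odds ratio psi.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    n_controls = m * n_cases  # assumption: m = controls per case
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    se_alt = sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2)
    diff = abs(p1 - p0)
    # Two-sided power: probability of rejecting in either tail.
    return norm.cdf((diff - z * se_null) / se_alt) + norm.cdf((-diff - z * se_null) / se_alt)


# Inputs from the notes: p_0 = .11, psi = 2, n = 100, m = 1, alpha = .05.
power = two_proportion_power(p0=0.11, odds_ratio=2, n_cases=100)
print(round(power, 2))
```

With the inputs from the notes this gives a power of about 0.41, in line with the .407 reported by PS. Rerunning the function over a range of n values shows how many patients would be needed to reach a conventional target such as 80% power.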

2017 January 26

Drs. Andrew Link & Kristen Hoek, Pathology, Microbiology, Immun

  • VR24294 VICTR 010917 "In an influenza vaccine clinical trial, we discovered a large number of both expected and unexpected differentially expressed human genes in primary innate immune cells 1 day after vaccination1. A group of these genes encode RNA-binding proteins and lncRNAs. Numerous studies have shown that these two classes of genes function as posttranscriptional regulators of gene expression2,3. The prolonged expression of pro-inflammatory genes can cause host tissue damage and has been implicated as the cause of autoimmune and other human diseases4. As a consequence, the innate response is tightly regulated and typically short-lived. We hypothesize that the RNA-binding protein and lncRNA genes expressed in human innate immune cells 1 day after stimulation function as either positive or negative posttranscriptional regulators of the innate response. Using human innate cell line models combined with functional and mechanistic experiments, this proposal experimentally tests the ability of candidate RNA-binding protein and lncRNA genes to function as early innate response genes that posttranscriptionally regulate and modulate the human innate response."
  • Collected 6 immune types at 1, 3, 7, and 28 days post-vaccination from 35 subjects given placebo or influenza vaccine. The 840 samples were analyzed by a third party. Received a list of 80 genes likely involved in innate response and regulating the immune response. Next study will shut down one gene at a time. Will look at those 80 genes that are arrayed on a plate. Assessing each gene is an individual experiment to determine whether gene expression goes up or down. Plan to do this in triplicate and calculate false discovery rate.
  • When you calculate the false discovery rate when comparing two subjects, the power is equal to the alpha (ex. 0.05). What is the level of confidence that a gene labeled as a loser is actually a winner? Recommend analyzing all genes together and ranking genes in order of strength of association. Bootstrap to calculate the false negative rate using Efron's method.
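
For reference, the investigators' plan to control the false discovery rate across the 80 genes is commonly carried out with the Benjamini-Hochberg step-up procedure, sketched below. The notes do not say which FDR method was intended, so the choice of Benjamini-Hochberg is an assumption, and the p-values are made up. This does not implement the bootstrap ranking approach recommended above.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for 10 knock-down experiments.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p_vals, q=0.05)
print(sum(flags), "genes flagged")
```

In this made-up example only the two smallest p-values pass the step-up threshold, even though five are below 0.05 individually.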

Maya Yiadom

  • Outcome: time to readmission. Primary analysis: intention to treat analysis for not intend vs. intend to call. Secondary analyses: controls vs. reached vs. not reached.
  • Initial power calculation using PS software with alpha = 0.005 for interim analysis and alpha = 0.048 for final analysis, baseline time to readmission of 11.51 days, 90% power, and minimum detectable effect of 2 days (or 1 day) yielded 1524 and 3048 subjects, respectively. Another sample size calculation assuming a conservative 2% difference yielded 4344 patients per arm enrolled over 1.5 years for 80% power or 5805 patients per arm enrolled over 2 years for 90% power.
  • Recommend adding more looks to the analysis. What is the largest sample size that could be enrolled within, say, 1 year? Collect this pilot data, then plan a larger study. Another option is to continue enrolling patients until the research question is answered. You should study readmissions at 90 days for better power. If there is a difference at 90 days, then there is also a difference at 30 days.

2017 January 19

Kathryn McCrystal Dahir, Medicine

  • "In essence this is a study where we examined rare variants in the ALPL gene that were available in BioVU and looked for expected associations as well as a new associations which were discovered via PheWAS. We had 180 cases plus matched controls which were manually reviewed by two reviewers that were blinded. The manual chart review of the patients record in the SD is in Excel. Additionally we have some very basic statistics in R and summarized here. We would appreciate some advice from bio-stats on better visualization/representation of the data."
  • Location of point mutation in ALPL gene is important in function of hypophosphatasia (HPP). Interested in heterozygous autosomal recessive mutation. New treatment recombinant alkaline phosphatase for pediatric cases. Found 13 rare ALPL variants in BioVU; excluded variant with 11% occurrence because too common. In BioVU, searched for oral surgery clinic visits, dental visits, bone fracture x-rays, and hysterectomies.
  • Want to determine if some SNPs are more pathogenic than others. Recommend using a logistic regression model and reporting odds ratios for disease for each SNP (or compound heterozygous models) along with the confidence intervals.
  • Have funding for statistical support. Contact Frank Harrell, PhD regarding collaboration options.
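
For a single SNP with no other covariates, the odds ratio from a logistic regression equals the cross-product ratio of the 2x2 table, and the Wald confidence interval can be computed by hand. The sketch below uses hypothetical counts and a hypothetical function name; the 0.5 continuity correction for empty cells is an added assumption that matters for rare variants.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Wald CI from a 2x2 table.
    a: carriers with disease      b: carriers without disease
    c: non-carriers with disease  d: non-carriers without disease"""
    if min(a, b, c, d) == 0:
        # Assumption: add 0.5 to every cell when any cell is empty.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se)


# Hypothetical counts for one rare ALPL variant.
or_est, ci_low, ci_high = odds_ratio_ci(a=12, b=8, c=30, d=130)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

Plotting the per-SNP odds ratios with their intervals on a log scale (a forest plot) is one way to address the visualization question raised in the request.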

Leon Scott, Clinical Orthopaedics & Rehabilitation

  • Recommend collecting repeated measures of force from each subject (multiple steps). Calculate average maximum force for each subject. Then calculate confidence intervals for the two devices (want intervals to overlap). To prove equivalence, need to establish what the likely difference between the two devices is (want the difference to be clinically trivial). Define a priori what a clinically significant difference is. Can use inter-rater reliability to estimate the percentage of variation that is due to the difference in steps; report the intraclass correlation coefficient. Available R software package 'CCRM'. Also recommend generating Bland-Altman plots for each subject or combining data across all subjects. Ideally, the plot will look like a straight line; otherwise, it may show where the wearable sensor is not producing a good approximation of the force as measured by the laboratory instrument.
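
The quantities behind a Bland-Altman plot are simple to compute: for each paired measurement, the mean of the two devices (x-axis) and their difference (y-axis), plus the mean difference (bias) and the 95% limits of agreement. A minimal sketch follows, with made-up peak forces in newtons and hypothetical names throughout.

```python
from statistics import mean, stdev


def bland_altman(sensor, lab):
    """Bias and 95% limits of agreement between two measurement methods."""
    diffs = [s - l for s, l in zip(sensor, lab)]
    means = [(s + l) / 2 for s, l in zip(sensor, lab)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "points": list(zip(means, diffs)),  # (x, y) pairs for the plot
    }


# Hypothetical average maximum force per subject from each device.
wearable = [702, 655, 810, 742, 690]
force_plate = [700, 660, 800, 750, 685]
agreement = bland_altman(wearable, force_plate)
print(agreement["bias"], agreement["lower_loa"], agreement["upper_loa"])
```

The limits of agreement can then be compared with the clinically trivial difference defined a priori: if both limits fall inside that margin, the two devices can be considered interchangeable for this purpose.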

2017 January 5

Sandip Chaugai, Clinical Pharmacology

  • "I only have a couple of quick questions on meta-regression analysis of calcium channel blockers in hypertension. I spoke to Daniel Byrne, and he suggested me to attend the clinic."
  • Have long-, intermediate-, and short-acting drug classes; outcomes include mortality and heart failure. Each study randomized and matched patients prior to estimating odds ratios. For each study, need to verify on which covariates patients were matched (ex. diabetes). Can only combine subgroups of studies that adjusted for same covariates in estimation of odds ratio. Potential issue given that the control drug was different in each of the studies. Also concerned with validity of comparing odds ratios among 3 drug classes because the drugs are used in different populations.
  • Need to look at confidence intervals from fixed and random effects models. Recommend including additional tick marks and labels on the x-axis (log odds scale). Note that patient-level conclusions cannot be drawn from population-level data (i.e. ecological fallacy). The number of covariates in the meta-regression model will be limited by the number of studies that can be combined. Recommend deciding which predictors are most important and including them in the meta-regression model. The standard errors will reflect the number of studies that were combined (larger SEs with fewer studies). Can assess the stability of the model by looking at how results change when 1 or 2 of the studies are removed from the meta-regression model.
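
To make the fixed versus random effects comparison concrete, the sketch below pools study-level log odds ratios with inverse-variance weights and, for the random effects model, the DerSimonian-Laird estimate of between-study variance. The choice of DerSimonian-Laird is an assumption (the notes do not name an estimator), the function name is hypothetical, and the four studies are made up.

```python
from math import exp, sqrt


def pool_log_odds_ratios(log_ors, ses):
    """Inverse-variance fixed effect and DerSimonian-Laird random effects pooling."""
    k = len(log_ors)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    se_fixed = sqrt(1 / sum(w))
    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    random_eff = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_random = sqrt(1 / sum(w_re))
    return {"fixed": fixed, "se_fixed": se_fixed,
            "random": random_eff, "se_random": se_random, "tau2": tau2}


# Hypothetical log odds ratios and standard errors from four trials.
log_ors = [0.10, 0.35, -0.05, 0.50]
ses = [0.10, 0.15, 0.12, 0.20]
pooled = pool_log_odds_ratios(log_ors, ses)
for model in ("fixed", "random"):
    est, se = pooled[model], pooled["se_" + model]
    print(model, round(exp(est), 2),
          round(exp(est - 1.96 * se), 2), round(exp(est + 1.96 * se), 2))
```

Calling the function repeatedly with one study left out each time gives the leave-one-out stability check described above.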

Topic revision: r2 - 18 Dec 2023, IneSohn
 
