Data and Analysis for Clinical and Health Research Clinic Notes (2018)

2018 December 20

Stephen Halliday, Allergy, Pulmonary, Critical Care Medicine

  • We have measured six-minute walk distances for 200+ healthy participants ages 18-50. We would like to use this data to develop reference equations for predicted six-minute walk distance, incorporating known predictors (height, BMI, gender, age) while also incorporating an objective measure of effort, such as peak heart rate, % of max heart rate achieved, or change in heart rate from start to finish. Mentor confirmed, VICTR voucher.
  • Have a subgroup of 50 patients who repeated the test.
  • Recommend not including post-test heart rate in the model because this is a secondary outcome. Can calculate prediction interval where width is related to how much patients disagree with each other. Can include pre-treatment/prior walking distance as a predictor. Recommend including log(height) and log(weight) predictors together in the model.
  • Recommend applying for VICTR Award for biostatistics support (90 hours).

Sarah Diehl, Hearing and Speech Sciences

  • The current study aims to characterize the cognitive communication deficits, particularly those which impact discourse abilities, in people with HD. Individuals will be characterized at pre-manifest stages (genetically confirmed but prior to onset of motor symptoms), early disease stage, and middle disease stages. Individuals in the late stage of HD are often unintelligible or nonverbal, therefore, this advanced stage will not be included in this study. Mentor confirmed, abstract.
  • Research Questions: 1) What are the patient and close other reported discourse deficits associated with Huntington’s disease? 2) What is the relationship between patient and close other report of discourse deficits? 3) What is the influence of clinical characteristics (e.g., motor score, Total Functional Capacity score, disease stage) on the presentation of discourse deficits? 4) What is the relationship of our findings compared to neurotypical control data (other ratings compared)?
  • Scale to compare responses from patient and caregiver is on a 4-point scale.
  • Recommend comparing total scores or domains using Wilcoxon signed-rank test and comparing individual questions using McNemar's test (for paired comparison of categorical data). Kappa and generalized kappa will provide a measure of agreement between patient and caregiver. Can plot kappa statistic over time or severity over time to visualize switch in scores between disease stages. Can compare total scores or domains between cases and controls using Mann-Whitney U-test. Spearman's rank correlation can be used to assess relationship between two variables. Regression methods should be used to assess relationship while adjusting for covariates. Describe yield of the study for the reader to address margin of error the sample size can achieve (see Figure 8.5 in

2018 December 13

Danny Zakria, Plastic Surgery

  • My project is an RCT comparing the pain patients experience with steroid injections with and without lidocaine. We would like assistance with a power analysis to determine how many patients we need to enroll. Data collection underway.

  • Mentor confirmed

2018 December 6

Demetra Hufnagel, Epidemiology

  • Double checking selection of statistical tests: Pearson vs. Spearman correlation. Mentor confirmed.
  • Have 2 markers of interest in ovarian cancer tissues (measured by H-score, range 0-300); they are part of a similar signaling pathway. Another 2 markers are part of a different signaling pathway. Have 200 samples, and the distribution is non-normal. Goal is to assess independence of markers. Also plan to complete regression analysis but want to start with bivariate associations.
  • Do not recommend testing for normality; power is limited with a sample size of 200. Spearman's rho rank correlation is preferred method to assess degree of monotonic association. Recommend using a proportional odds (ordinal logistic) regression model which can adjust for multiple covariates and is very robust. Can plot distribution of H-scores to determine if it is appropriate to include restricted cubic splines or quadratic terms in the model. Continuous variables should not be dichotomized. Likelihood ratio test is a goodness-of-fit test. RMS course notes available at

Maxim Turchan, Department of Neurology; Movement Disorders Division & Mallory Hacker

  • We are attempting to assess a hypothesized three-way interaction in an independent, publicly available, dataset from an observational study and would like to confirm our understanding of the best practice with respect to controlling for our outcome of interest's baseline value. I have previously attended Dr. Koyama's CRC regression course which did a great job of explaining best practices in regression analyses controlling for baseline scores where there is no interaction term, and we wanted to make sure that this was still in the best practice in the context of our analysis including interaction terms. Data collection is complete.
  • Three-way interaction is between disease duration, treatment status (Y/N), and genotype (mutant/normal). Time 0 is when patient starts to be followed (baseline). There are 98 subjects, but the design is unbalanced. Goal is to determine whether patients with a certain genotype should have surgery earlier to improve quality of life.
  • Recommend to look first at whether quality of life score is impacted by disease duration, treatment status, or genotype. Since there were some patients who started treatment before Time 0, the first score for these patients is not a true baseline. Time 0 should be defined as treatment initiation, date of diagnosis, or date of first symptom. May want to review Laurie Samuel's visual pruner method. A landmark analysis looks at patients who survived 1 year (2 years, 3 years, etc.) and what happened after this point; this method is good to uncover patterns. May consider posting this question on Causal inference epidemiologists may have another solution.

CANCELLED Siobhan Hartigan, Urology

  • Our project will compare 25 subjects with severe OAB who have not yet undergone advanced therapies and 25 healthy controls. Baseline data will be collected in the form of demographic information, subject voiding diaries and multiple validated questionnaires as well as uroflow and post-void residual data. Subjects will then present for an fMRI session using a 7T magnet which is available in a suite fitted for patient care and clinical investigation at the Vanderbilt University Institute of Imaging Science. Subjects will undergo a resting state MRI with an empty bladder and then a foley catheter will be inserted. Using an infusion pump, fMRI sequences will be then obtained at increasing levels of bladder fullness in 50cc increments and the subject will alert investigators to a sensation of urgency.
    Would like to address specific biostats plan for analysis.

  • Mentor confirmed, VICTR voucher request

2018 November 29

Dan Ayers, Biostatistics

  • Imputation recommendations for ongoing clinical trial about to close.

2018 November 15

Emily Matijevich, Mechanical Engineering

  • We are hoping to gain insight into the analysis and interpretation of findings regarding a publication on cyclically loaded bone samples.

  • Mentor confirmed
  • Suggest visual inspection using Bland-Altman plot. Not specifically interested in the form of the relationship, just whether they are correlated. Recommend using z-scores for the purposes of variance consistency.
  • For replication question, look at mean difference and confidence interval so you can see if there are overlaps. Suggest bootstrapping to get confidence bands.
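The bootstrap suggestion above can be sketched as follows. This is a minimal illustration, assuming two independent sets of measurements; the function name `bootstrap_mean_diff_ci` and the data values are hypothetical and are not taken from the bone study.

```python
import random
import statistics


def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b).

    Each group is resampled with replacement independently, which assumes
    the two sets of samples are independent of each other.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(a) - statistics.fmean(b), lower, upper


# Hypothetical measurements (illustration only).
original = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
replicate = [10.6, 11.1, 10.4, 12.4, 10.2, 11.8]

diff, lo, hi = bootstrap_mean_diff_ci(original, replicate)
print(f"mean difference = {diff:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

With very small samples the percentile interval can be too narrow, so it is better treated as a descriptive aid than as a formal test.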

George Xu, Department of Pathology, Microbiology, and Immunology

  • Our project is testing whether immune gene signatures of thyroid tumors can be used to predict characteristics such as malignancy, histologic type, tumor stage, and lymph node metastasis. We have collected clinical samples, performed RNA seq, and used TIMER, TIDE, and CIBERSORT software to make estimations of tumor-immune infiltrates. We are currently collecting mutation, lymph node, and tumor stage data. We have used logistic regression in R to identify a number of correlations between specific immune infiltrates and thyroid disease types, and we would like to learn more about statistical approaches that might work best for our data.

  • VICTR Biostatistics voucher

  • Mentor confirmed

  • Suggest looking at ordinal or alternate measures to identify associations. Will collect more data and return to clinic.

2018 November 8

Yolanda McDonald, Human & Organizational Development/Peabody College

  • In this project, we will conduct a geospatial analysis of non-fatal overdose rates as compared to naloxone distribution, opioid prescribing, and population demographics at the county level in the state of Massachusetts. The demographics that will be examined include race/ethnicity, age, percent of the population over age 65, primary industry in the area, and gender. We will examine the distribution of overdoses between rural, urban, and suburban areas to determine if there are any significant patterns by area types based on increasing percent of overdose rates. Lastly, we will conduct proximity analysis by mapping treatment center locations to examine if overdose rates increase as distance to treatment increase (note, we are still working on getting the data for this). Through this research, we hope to gain a better understanding of the most at-risk populations in the opioid epidemic.

    We want to review our data analysis strategy of using Geographic Weighted Regression and stratification of overdoses by geographic areas. We want to discuss a data analysis strategy for mapping treatment center locations to see if areas where treatment is not close by have higher overdose rates.

  • Meeting notes: 14 counties in MA. Outcome of non-fatal/fatal overdose rates. Interested in demographics, naloxone prescribing, geographic factors. Identify populations at risk by location. 2010-2017 years available. Recommend descriptive and visual output. May consider a model looking at Prob(dying) ~ demographics + variable of interest. Splines can be used when continuous variables don't behave the same way across all values and may affect the probability differently at different points along the scale; see ?rcs in R for restricted cubic splines.
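To make the spline recommendation concrete, here is a minimal sketch of the truncated-power basis for a restricted cubic spline (linear beyond the outer knots). It is an illustration only: the function name `rcs_basis` and the knot locations are hypothetical, and implementations such as R's rcs may rescale the nonlinear terms, which changes the coefficients but not the fitted curve.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for a restricted cubic spline.

    Uses the truncated-power form, with each nonlinear term built so that
    the cubic and quadratic parts cancel beyond the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos_cubed(u):
        # (u)+^3: zero when u is negative, u cubed otherwise
        return max(u, 0.0) ** 3

    last_gap = t[-1] - t[-2]
    terms = [x]  # the linear term is always included
    for j in range(k - 2):
        terms.append(
            pos_cubed(x - t[j])
            - pos_cubed(x - t[-2]) * (t[-1] - t[j]) / last_gap
            + pos_cubed(x - t[-1]) * (t[-2] - t[j]) / last_gap
        )
    return terms


# Hypothetical knots for a continuous county-level covariate.
example_knots = [20, 40, 60, 80]
print(rcs_basis(50, example_knots))
```

The columns returned for each observation would be entered into the regression model in place of the single linear term.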

Patrick Kelly, Neurosurgery

  • We are submitting an initial request for industry funding for a randomized phase II study, and need to determine an efficient study design (accrual will be slow).

    In simple terms, patients with resectable brain metastases undergo surgery, then stereotactic radiosurgery (SRS), then systemic therapy. Osimertinib is a new first-line drug for EGFR-mutated non-small cell lung cancer and it has much higher CNS penetration. The aim of the trial is to assess if treatment with surgery+osimertinib is equivalent/non-inferior to surgery+SRS+osimertinib with respect to local brain recurrence (either proportion at 1-year or time-to-local recurrence).

    Mentor confirmed

  • Meeting notes: Estimate local recurrence rates lower and better QoL with SRS (precise application of radiation) compared to surgery alone. Interested in applying new drug to everybody. Randomizing to radiation or not. In slightly different population, recurrence ~28%. Estimated 1 person per month. Suggest working from how many people and determine what difference can be detected based on that sample size.
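The suggestion to start from the feasible sample size can be sketched as a precision calculation. The example below assumes 1:1 randomization, accrual of about one patient per month, a 1-year local recurrence proportion of 28% in both arms, and a normal-approximation (Wald) interval. The accrual durations and the function name `ci_half_width_diff` are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def ci_half_width_diff(p1, p2, n_per_arm, conf=0.95):
    """Half-width of the Wald CI for a difference of two proportions."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return z * se


# Hypothetical accrual durations at roughly 1 patient per month.
for months in (24, 36, 48):
    n_per_arm = months // 2  # 1:1 randomization
    hw = ci_half_width_diff(0.28, 0.28, n_per_arm)
    print(f"{months} months, n={n_per_arm}/arm: margin of error +/- {hw:.2f}")
```

A wide margin of error relative to any plausible non-inferiority margin would argue for a longer accrual period or a time-to-event outcome, which uses more of the information in each patient.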

WITHDRAWN Lyly Nguyen, Plastic Surgery

  • I took over a project, and there was already a sample size in the approved IRB protocol. I need assistance in how they did the power calculation for a study. Design complete but no enrollment/data collection

  • Mentor confirmed

2018 November 1

Erik Lamers, Mechanical Engineering

  • We want to determine if a wearable assistive device (intervention) can affect the rate at which six individual lumbar muscles fatigue during a 90 second, static leaning task. We want to know what statistical analysis would be appropriate to determine if the intervention had an effect on the rate of individual muscle fatigue relative to the control trials.
  • Percent initial median frequency for an individual muscle decreases over time. Smaller slope indicates less fatigue. The intervention is always applied second (between two control trials), and there are 20-minute resting periods between the trials.
  • The order in which the intervention is applied leads to time effects. Ideally, the order in which the intervention is applied should be randomized. Recommend using linear regression to test whether slopes of the 3 trials are the same. Compare control trials, but expect them to be the same. ARIMA model allows for different levels of dependence; use autoregressive or moving average correlation structure (try orders 0 and 1). Calculate Akaike Information Criterion (AIC) for each model; report model with smallest AIC because it indicates best fit. In a response feature analysis, can subtract slope of intervention from the average slope of the two control trials to calculate how effective the intervention was for a particular subject; then test differences in this metric across subjects. Test each muscle separately and adjust for multiple comparisons.
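The response feature calculation described above can be sketched for one subject and one muscle. The time points and percent-initial median frequency values are hypothetical, as are the function names; the sketch ignores the within-trial autocorrelation that the ARIMA approach is meant to address.

```python
from statistics import fmean


def ols_slope(t, y):
    """Least-squares slope of y on t."""
    t_bar, y_bar = fmean(t), fmean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    return sxy / sxx


def response_feature(t, control_1, intervention, control_2):
    """Average control slope minus intervention slope for one subject."""
    control_avg = fmean([ols_slope(t, control_1), ols_slope(t, control_2)])
    return control_avg - ols_slope(t, intervention)


# Hypothetical percent-initial median frequency over a 90-second trial.
seconds = [0, 30, 60, 90]
control_1 = [100, 94, 88, 82]
with_device = [100, 97, 94, 91]
control_2 = [100, 93, 86, 79]

effect = response_feature(seconds, control_1, with_device, control_2)
print(f"control slope minus intervention slope: {effect:.4f} % per second")
```

One such value per subject would then be tested against zero (for example with a Wilcoxon signed-rank test), repeated for each of the six muscles with a multiplicity adjustment.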

Eric Honert, Mechanical Engineering

  • Soft tissues throughout the human body deform and perform negative work (energy absorption) after foot contact in human walking. The magnitude of the absorption performed by all soft tissues has been found to increase with walking speed, however it is not known how different soft tissues sources contribute to the overall soft tissue absorption. In this study we sought to characterize the soft tissue absorption performed by the foot+shoe (i.e., subcalcaneal fat pad and shoe crepe) vs. the rest of the body (e.g., viscera, intervertebral disks). We collected ground reaction forces and whole-body kinematics while ten subjects walked at different speeds and slopes. We computed various estimates of mechanical power, then utilized a previously-published Energy-Accounting analysis, to estimate the amount of soft tissue work immediately following foot contact. Next we extended this prior analysis by parsing out how much of this soft tissue work was due to deformation of the foot (and shoe) vs. work done by soft tissues elsewhere in the body.
    I would like to discuss the approach (or discuss an alternative) that I am taking to test whether there are statistically significant correlations between the soft tissue absorption work and increasing speed, and between the work and increasing slope. My current approach is as follows:
    I first assessed the normality of the foot soft tissue absorption work across all speeds (0.8, 1.0, 1.2, 1.4, 1.6 m/s) for each slope (-9, -6,-3, 0, 3, 6, 9) and vice versa through one-sample Kolmogorov-Smirnov tests. If the data were normally distributed, I examined the correlation through a Pearson’s correlation coefficient and related p-value. If the data were not normally distributed, I examined the correlation through a Kendall tau correlation and related p-value. Finally, we performed a Bonferroni post-hoc error correction; as such, significance was evaluated with a family-wise alpha=0.05, with a per comparison alpha=0.0041 (i.e., 0.05/12).
  • Study included 10 subjects. Recommend creating a heat map of data in Table 1: Foot Soft Tissue work (J) during Foot Absorption at different speeds and slopes. Do not recommend testing for normality; just use non-parametric methods because they are more robust (ex. Spearman’s correlation coefficient for continuous data). Kendall’s Tau correlation coefficient is more appropriate for discrete data. Family-wise error rate correction used to keep error rate below 5%.
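As an illustration of the recommendation above, the sketch below computes Spearman's correlation (Pearson correlation of average ranks) without any normality pre-test and shows the Bonferroni threshold for 12 comparisons. The work values are hypothetical, and the helper names are not from the study.

```python
import math
from statistics import fmean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


speeds = [0.8, 1.0, 1.2, 1.4, 1.6]      # m/s, from the request
work = [-2.1, -2.6, -3.4, -3.3, -4.0]   # hypothetical work values (J)

rho = spearman_rho(speeds, work)
per_comparison_alpha = 0.05 / 12        # Bonferroni across 12 comparisons
print(f"rho = {rho:.2f}, per-comparison alpha = {per_comparison_alpha:.4f}")
```

With only five speeds per slope the attainable p-values are coarse, so the heat map is likely to be more informative than the individual tests.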

2018 October 25

Marshall Guo, Medicine/Pulmonary Critical Care

  • We plan to conduct a PheWAS analysis of two SNPs (rs291102, rs2275531) in the PIGR gene. The MAF are 11.5% and 37.97% respectively. rs291102 has been associated with IgA nephropathy and we hypothesize other disease associations. Record Counter reports 30313 patients with genotyping information for both SNPs.
    Prior to requesting data, we want to discuss statistical analysis methods.

  • VICTR Biostatistics voucher, abstract
  • Mentor confirmed
  • Meeting notes: Correlation between the two SNPs and BioVU info. Association with Buerger's disease. Specifically, power testing for outcomes. Multiple testing correction? Use broad non-gene specific resource for comorbidity relationships, e.g., relationship with COPD and nephropathy. Discussed limitations of cross validation, using other cohorts. Suggest doing a large panel PheWAS using BioVU as an exploratory analysis. Discussed tool useful for finding relationships (Nancy Cox?)
  • Recommend VICTR voucher for getting support for analysis. (Up to $5000). Export more variables for writing the paper.

Robert McKnight, Internal Medicine

  • Assessing the effect of smoking on IPF disease phenotype, specifically the effect of smoking and latency between quit date and diagnosis. Progression and severity assessed using continuous variable of PFT parameters and survival. Need help with eliminating survival bias and lead-time bias.

  • Protocol with no expected funding support, Abstract

  • Mentor confirmed

  • Meeting notes: People diagnosed with IPF who have smoking habits. Also looking at people at risk. Looking at smoking/non-smoking and quitting association with outcome. Number of pack-years. Data shows relationship with protective nature of quitting. Bias problems: smokers more likely to be seen, more likely to have pack-years. If you quit smoking, does your trajectory change?

  • For the at-risk cohort study, can do a simple survival analysis to determine time to disease progression. Could consider competing risks models or using "death or disease" as outcome for supplemental analysis. Suggest 2 separate cohort studies. For those diagnosed, time to death. For those at risk, time to disease.

2018 October 18

WITHDREW: Joseph Lambert, Special Education

Amosy M’Koma, Biochemistry, Cancer Biology, Neuroscience and Pharmacology

  • Defining normal reference interval of blood human alpha-defensin 5 (DEFA5/HD5). We will use blood collected from healthy volunteers to define, establish and verify reference intervals from 240 subjects (120 males and 120 females). VICTR Biostatistics voucher, VICTR voucher not including biostatistics
  • Eventual goal is to measure DEFA5/HD5 from a blood sample to diagnose UC or CD. Have already collected tissue and blood samples from 10 cases (5 UC, 5 CD).
  • Recommend collecting samples from healthy volunteers until confidence interval (90th, 95th, or 99th) reaches pre-specified width (+/- x).

Roxanne Rashedi, Osher Center

  • We are preparing a VICTR grant to further investigate the efficacy of mindfulness-based teaching as a catalyst for children’s development of their habits of mind. The data collection has been completed for this project and we developed a multiple methods, retrospective and investigational study. This study explored the long-term effects of a mindfulness training program, which was implemented in the 1990s. We would like to get an estimate for a VICTR biostatistics voucher and review the data analysis plan in the proposal.
  • Primary outcome is self-reported perceived stress level. More than half of the cases did not respond or consent to follow-up study. Used magazine advertisements to recruit controls who attended the same school/district and were the same age range (grades 3-5) in the 1990s but did not receive mindfulness training. Were unable to obtain class rosters from school districts to contact the controls directly. Collected complete data from 80 subjects (40 in each group). Identified some demographic differences between cases and controls.
  • Concern for self-selection bias given controls volunteered for the study. There is confounding between the mindfulness intervention and method of recruiting control subjects. Can present descriptive statistics and boxplots by group and use t tests. Correlation analysis is less impacted by non-response bias (ex. compare intervention between males and females).
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Email draft of proposal to Dan Byrne.

2018 October 11

Kristen Yancey, Otolaryngology

  • Data collection and basic summary/descriptive stats completed. Seeking guidance on which correct higher level tests to run to evaluate for clinical significance. Planning to submit an abstract.
  • Adult chronic sinusitis patients complete two surveys about symptoms at every follow-up appointment (3m, 6m, 12m, 24m, but some patients do not complete survey at all time points). Want to know if age impacts symptom score (range 0-60 or 0-22 for each survey, respectively).
  • Recommend developing a composite symptom score based on the two surveys (ex. average across all time points). Plot score vs. age. If relationship is non-linear, may consider using restricted cubic spline. Can then build a linear regression model for composite score by age. May consider categorizing patients into four age groups. The non-parametric Kruskal-Wallis test will then be appropriate.
  • Can apply for a VICTR award for Biostatistics support, if needed.

Joseph Lambert, Special Education

  • In Applied Behavior Analysis (ABA), extinction is a term that describes the discontinuation of reinforcement for a previously reinforced behavior which decreases some measurable aspect of its occurrence (e.g., frequency, duration) to some pre-reinforcement level. Extinction bursts are the most commonly observed collateral effect of extinction and are characterized by a temporary increase in responding (relative to baseline) at the onset of extinction. Although the prevalence of extinction bursts can be determined through post-hoc analyses, no method, to date, exists to predict or control their occurrence.
  • The objective of this exploratory project would be to determine whether there is a correlation between a reinforcer’s baseline (pre-extinction) unit price and the relative probability of extinction bursts. The study is designed to be a low-stakes translational investigation in which the reinforcement histories and response patterns of arbitrary behavior, which serve as proxies for challenging behavior, can be carefully controlled and analyzed during parametric analyses focused on any interactions that may exist between baseline unit prices and extinction bursts.
  • I want to know: (1) Whether the probability of extinction bursts decreases as baseline unit prices systematically increase. (2) Whether Pmax (a number that quantifies reinforcer value in behavioral economics) predicts the occurrence/non-occurrence of extinction bursts for all participants in the study. For bullet #2, I’ll place every participant in either a “burst” or “no burst” group based on their performance during my first analysis. Then, I’ll determine the location of each participant’s baseline unit price, relative to Pmax on a demand-curve analysis (i.e., an analysis meant to identify Pmax). My prediction is that baseline unit prices that fall to the left of Pmax will produce extinction bursts and baseline unit prices that fall to the right of Pmax will not. If this turns out to be the case, there are a number of useful applications for the assessment and treatment of severe challenging behavior that could be the focus of future research initiatives. Currently, I need help identifying an appropriate sample size and statistic that I can use to test the significance of any correlations that I find. Planning to submit a grant.
  • Planning to recruit 25 subjects into 5 different groups (differ based on number of times the button needs to be pushed). The assessment will be based on individual characteristics and behaviors; will then calculate Pmax for each subject.
  • Recommend calculating kappa statistic to measure agreement between observed and predicted extinction bursts. Can download nQuery Advisor software to calculate sample size for kappa statistic. Another option is to use McNemar's test and calculate an odds ratio.
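A minimal sketch of the agreement calculation, assuming each of the 25 planned participants yields one predicted and one observed classification ("burst" or "none"). The counts are hypothetical, and the odds ratio shown is the ratio of the two discordant counts that accompanies McNemar's test.

```python
def cohen_kappa(pairs):
    """Cohen's kappa for a list of (rating_1, rating_2) pairs."""
    n = len(pairs)
    categories = {c for pair in pairs for c in pair}
    observed = sum(1 for a, b in pairs if a == b) / n
    # Chance agreement: product of the two marginal proportions per category.
    expected = sum(
        (sum(1 for a, _ in pairs if a == c) / n)
        * (sum(1 for _, b in pairs if b == c) / n)
        for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical (predicted, observed) classifications for 25 participants.
pairs = (
    [("burst", "burst")] * 10
    + [("burst", "none")] * 4
    + [("none", "burst")] * 2
    + [("none", "none")] * 9
)

kappa = cohen_kappa(pairs)
pred_only = sum(1 for p, o in pairs if p == "burst" and o == "none")
obs_only = sum(1 for p, o in pairs if p == "none" and o == "burst")
discordant_odds_ratio = pred_only / obs_only
print(f"kappa = {kappa:.3f}, discordant-pair odds ratio = {discordant_odds_ratio:.1f}")
```

Kappa summarizes agreement beyond chance, while the discordant-pair ratio indicates whether the prediction errs more often in one direction than the other.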

2018 September 27

Prathima Anandi, Rheumatology

  • We are looking at polymorphisms associated with adverse effects of azathioprine.
    We are planning to do logistic regression but wanted to know other inputs.
    Want to understand more about sample size calculation using PS software. Expected outcome: Grant
  • Meeting notes: Immunosuppressant has side effects including leukopenia. Looking at SNPs predictive of leukopenia. TPMT 25% predict rate. Wants to look at others. Discovery cohort of 500 Caucasian patients on the immunosuppressant with WBC available. Data includes phenotype (demographics), dosage, other meds, 55 SNPs for analysis. Logistic regression Leukopenia <4000 case/control. TPMT haplo and SNPs didn't show association. Added information about if they were treated for leukopenia and excluded (23%). Repeated analysis. Negative paper. Initially used power calculation to get sample size. Recommend penalized models. Consider transformations that may be useful for examining WBCs. Separate genetic information from other factors.

2018 September 20

Marjan Rafat, Chemical and Biomolecular Engineering

  • I am interested in studying the effect of lymphopenia after radiotherapy on local recurrence in triple negative breast cancer. My initial study at Stanford allowed us to identify the problem, but we did not have enough patients to stratify the patients correctly. I would like to be assisted with a power calculation to determine the number of patients necessary for properly answering this question. Expected outcome: Grant
  • Outcome of local recurrence based on lymphocyte count measured repeatedly starting at 5 months post-radiotherapy. Patients must have 5 years of follow-up to be included in the study. Expect to have 300 non-recurrent patients and 75 recurrent patients eligible for the study, and 30% of these patients are obese.
  • Recommend using a time-to-event outcome (time to recurrence) and include lymphocyte count as a time-varying covariate, stage (I, II, III) at baseline, and obesity status at baseline in the Cox PH model. Can use desired HR for baseline covariates to calculate sample size, but inclusion of time-varying covariates requires complex simulation to calculate sample size. Rule-of-thumb is 15 events per parameter in the model. Want a good spread of patients by stage and obesity status.
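The events-per-parameter rule of thumb can be worked through with the numbers in the notes. The parameter counts below are an assumption: they treat lymphocyte count as a single linear term, stage as two indicator variables, and obesity as one indicator.

```python
def max_parameters(n_events, events_per_parameter=15):
    """Largest number of model parameters supported by the rule of thumb."""
    return n_events // events_per_parameter


expected_events = 75  # recurrent patients expected to be eligible
budget = max_parameters(expected_events)

# Assumed parameterization of the Cox model (illustration only).
planned = {
    "lymphocyte count (time-varying, linear)": 1,
    "stage I/II/III (two indicators)": 2,
    "obesity at baseline": 1,
}
used = sum(planned.values())
print(f"parameter budget = {budget}, planned = {used}, spare = {budget - used}")
```

Under these assumptions the planned model fits within the budget, with one parameter to spare; a spline for lymphocyte count or a stage-by-obesity interaction would exceed it.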

Lee Wheless, Dermatology

  • I am looking at comparing between individuals the cumulative incidence of skin cancers. Since each person can develop numerous cancers, it’s not a simple binary outcome, and the slope of the incidence curve is not always linear, otherwise I would compare the beta’s. My goal is to develop a model to predict a person’s incidence over time, adjusting for demographics and other variables. Expected outcome: Grant
  • Each lesion is considered a separate skin cancer event. Have data for ~1000 patients aged 0-99 years. Plan to evaluate whether patient has any other types of cancer.
  • Recommend restricting data to a specific period of follow up. Model count of skin cancers using a zero-inflated Poisson/negative binomial model adjusting for age and genetic covariates. Accounting for correlation between skin cancer events will add complexity to the model.

WITHDREW: Shari Barkin, Pediatrics

  • Assistance with randomization. Expected outcome: Grant

2018 September 13

Christopher Gray, ED

  • Mentor to attend.

  • Outcome: Abstract with no expected funding support

  • Need to formulate calculations for ranges of variables. Would like advice on how best to present the data to show how meaningful the results are.
  • Meeting notes: n=41. Size of initial bleed predictive of outcomes. Look at death outcome at 30 days as a logistic regression with size of bleed as the independent variable. Also possible to do a proportional odds regression looking at severity on the Rankin scale at 30 days.

Sahana Kalburgi, Neuroscience

  • Outcome: Other

  • I am comparing resting state and task based EEG data in typical and autism populations. I would like to compare changes within group and across groups across various parameters.
  • Meeting notes: Study is interested in validating microstates in 28 children eyes closed vs eyes open. More interested in clustering of topographical output rather than actual topographical data. Visual comparison shows similar microstates to literature. Recommend boxplot to compare pairs or spaghetti plot to show trajectory. Treat each microstate separately and compare closed vs open. Because the same children are measured in both conditions, do a Wilcoxon signed-rank test or paired t-test. Add an additional state to account for all other states so a proportional test can be performed. Also recommend Hotelling's T-squared test.


2018 September 6

Matthew Lenert, Biomedical Informatics

  • Mentor to attend via phone.

  • Outcome: Protocol with no expected funding support

  • Evaluating the effect of a differential diagnosis educational app for medical students. Need guidance with the experimental design.
  • Second year medical students complete 6-8 clerkships prior to taking a comprehensive shelf exam. Have developed a quiz game and are planning to develop a case simulator. An RCT will be used to evaluate the app. Students will be randomized based on whether they own an iPhone (~60%). Plan to adjust for quiz score(s).
  • Phone type could be a confounding variable. Recommend an intention-to-treat analysis. An observational study is unlikely to produce strong evaluation data. Should run RCT long enough to yield adequate sample size rather than a set duration. Do not recommend including dose response (number of questions answered) in the primary analysis. Can motivate students to use app more frequently.

Joseph Wong, Biomedical Informatics

  • Attended clinic with mentor on 8/16. He would like to follow up on recommendations made at prior clinics regarding regression analyses. Investigating determinants of patient satisfaction with an online patient portal (My Health at Vanderbilt). Plan to submit an abstract of the analysis. 12,000 patients completed a 12-question patient satisfaction survey prior to the EPIC implementation. The questions were based on a 5-point Likert scale (score range 12-60). Built univariate linear regression models for satisfaction score and selected important factors for a multivariable linear regression model. Also built a linear regression model with polynomial terms for Computer Attitude Measurement (CAMpc) score. Can run a chunk (omnibus) test for multiple variables at one time. Recommend adding histograms for percent satisfied by EUCS, CAMpc, and Brief Health Literacy Score (BHLS). Hierarchy principle states one must not include higher order terms without including lower order terms. Recommend including only up to a quadratic term and the linear term in model. Should take the square root of the Health Result Function rather than the logarithm; include terms for the square root and the square of the square root (linear). May want to include sunflower plots, hexagonal binning, or heat maps. Restricting the analysis to patients with a satisfaction score <25 may yield different results.
  • A proportional odds model does not require a cutoff and permits various distributions (Stata ologit). Current Table 1, violin plots, and sunflower plots are appropriate.

2018 August 30


2018 August 23

Peter Louis, Pathology, Microbiology and Immunology

  • PD-L1 is upregulated in several cancers such as melanoma, leukemia and non-small cell lung cancer (NSCLC). This overexpression allows cancer cells to escape elimination by tumor-specific cytotoxic T lymphocytes. PD-L1 is measured in a variety of tumor types to help guide best therapy. We will review those test results with the goal of determining whether there is an association between tobacco exposure and PD-L1 expression.

  • Outcome: Protocol with no expected funding support
  • Meeting notes: Discussed outcome being non-continuous. Can treat as ordered response. Proportional odds model recommended. Also discussed percentage score as trigger for billing and other decision making.

WITHDREW: Gowri Satyanarayana, Internal Medicine/Infectious Diseases

  • Attending this clinic for assistance in statistical design for a Pfizer grant: Project design: we will be giving inpatient Clinical Providers feedback on antimicrobial utilization (targeting overall antimicrobial use and specific “high risk” antimicrobials). We would like to examine if overall use and use of specific antimicrobials decrease after the feedback period.

  • Outcome: Grant

James Gay, Pediatrics

  • Looking at behavioral logs for potentially suicidal patients. Data include 2700 patients over 3 years who meet the psychiatric, sitter service, or dedicated manager inclusion criteria.

  • Outcome: Abstract
  • Meeting notes: Interested in time trend. Recommended length of stay analysis using proportional odds. Prior to analysis, should consider grading severity to gain power.

2018 August 16

No Show: Sepideh Shokouhi, Psychiatry

  • I am interested in the design of Bayesian adaptive clinical trials. My specific question is how to use WinBUGS (or similar codes) to estimate the drug response model, similar to Satlin A et al. Alzheimer’s & Dementia 2016. Outcome: Abstract.

Joseph Wong, Biomedical Informatics

  • Investigating determinants of patient satisfaction with an online patient portal (My Health at Vanderbilt). Plan to submit an abstract of the analysis. 12,000 patients completed a 12-question patient satisfaction survey prior to the EPIC implementation. The questions were based on a 5-point Likert scale (score range 12-60). Built univariate linear regression models for satisfaction score and selected important factors for a multivariable linear regression model. Also built a linear regression model with polynomial terms for Computer Attitude Measurement (CAMpc) score.
  • Can run a chunk (omnibus) test for multiple variables at one time. Recommend adding histograms for percent satisfied by EUCS, CAMpc, and Brief Health Literacy Score (BHLS). Hierarchy principle states one must not include higher order terms without including lower order terms. Recommend including only up to a quadratic term and the linear term in model. Should take the square root of the Health Result Function rather than the logarithm; include terms for the square root and the square of the square root (linear). May want to include sunflower plots, hexagonal binning, or heat maps. Restricting the analysis to patients with a satisfaction score <25 may yield different results.

2018 August 9

Douglas Brinkley, Cardiology

  • Propose a propensity-matched retrospective analysis of exposure (drug vs. no drug) on primary outcome of survival (Cox regression) and secondary outcome of adverse event frequency (logistic). Have access to a registry of 16,000 U.S. and European patients with heart failure who have received a left ventricular assist device (LVAD) implant. There are 4 exposure groups based on RAAS inhibitor and MRA status at post-operative month 3 (neither drug, RAAS only, MRA only, both drugs). Primary outcome is all-cause mortality censored for transplant or explant for recovery. Mortality rate is generally 15% at 1 year post-operation. Adverse event rates will be calculated at 12 and 24 months. Plan to use 1:1 propensity score matching and to run two separate models for RAAS vs. neither drug and for MRA vs. neither drug. The Cox PH regression model will adjust for the propensity score and important covariates that have significant effects (p <0.1) in univariate analysis.
  • May want to remove certain exclusion criteria and simply run the statistical analysis on a subset of patients or use statistical methods to determine exclusion criteria.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Email draft of proposal to Chang Yu.

2018 July 26

Chris Guidry, Trauma Surgery

  • Surviving Sepsis guidelines state that antibiotics should be initiated within one hour of suspected sepsis/culture sent to lab. Most QI studies examine how well this guideline is adhered to. Previous pre-post study at one hospital looked at two groups of patients in whom antibiotics were initiated aggressively (n=100) or cautiously/waited (n=101). Baseline characteristics were similar between the two groups. All-cause mortality for waiting was 15% lower than aggressive. Proposing a smaller cluster-randomized crossover trial at multiple hospital sites. Sites will be randomized to which treatment is used during the first time period. Plan to include twice as many patients in the aggressive arm. Expect to complete one interim analysis.
  • Recommend using a superiority design. Cluster randomization requires many more patients than randomizing at the patient level. Recommend having at least 10 hospital sites. Power calculation should not be done assuming results will mimic previously published data. Should use a clinically meaningful change in mortality (an effect you do not want to miss, minimum clinically meaningful effect) to calculate the power. If unable to recruit adequate number of patients, can use a surrogate endpoint (ex. bacterial load, hospital length of stay) to gather data for future proposal. Additionally, may want to survey expert practitioners regarding equipoise. May want to contact Gordon Bernard regarding sepsis studies. Option to apply for a VICTR Studio to gather additional experts to discuss study design, grantsmanship, and statistics.
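
A minimal sketch of why cluster randomization needs more patients, assuming a simple parallel cluster design with equal allocation (not the proposed crossover design with 2:1 allocation). The mortality rates, cluster size, and intracluster correlation (ICC) below are hypothetical placeholders, not values from the study; the minimum clinically meaningful difference should be chosen by the investigators.

```python
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Patients per arm for comparing two proportions (normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters instead of patients."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical inputs: 20% vs. 15% mortality, 100 patients per site, ICC = 0.02.
n_individual = n_per_arm(0.20, 0.15)
n_cluster = n_individual * design_effect(cluster_size=100, icc=0.02)
print(f"per arm, patient-level randomization: {n_individual:.0f}")
print(f"per arm, cluster randomization:       {n_cluster:.0f}")
```

Even a small ICC inflates the requirement substantially when sites are large, which is the reason for recommending at least 10 sites.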

2018 July 12

Wendy Bottinor, Cardiology

  • Childhood cancer survivors are at risk for long-term adverse effects of treatment. One common adverse event is cardiotoxicity. As a result, prominent medical societies recommend cardiac screening of asymptomatic patients. Two recent studies have called the cost-effectiveness of screening into question. These studies used simulated populations. We would like to analyze cost-effectiveness using our population here at Vanderbilt. VICTR voucher request. Data collection is underway.
  • Intervening before the patient is symptomatic could delay the progression of disease. In Vanderbilt population that had an abnormal echo, goal is to determine if/what changes were made to the management plan (prescribe heart failure medications, recommend exercise plan, order additional diagnostic tests, etc.). Two published studies have concluded that the screening is not cost-effective. Also want to calculate the number of echoes needed to find one case that had a change to the management plan.
  • Recommend estimating the probability of getting a treatment (or another diagnostic test). Can calculate proportion of patients who had a change to their management plan. Another option is to compare time to cardiac event (or time to abnormal MRI) between patients who had an abnormal vs. normal echo. If the echo results are quantified, what predicts ordering an MRI at Vanderbilt? May want to contact Josh Peterson regarding cost-benefit analysis.

  • Recommend applying for VICTR Studio and VICTR Award for biostatistics support (90 hours). Email draft of proposal to Chang Yu.
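
The "number of echoes needed to find one case with a management change" is the reciprocal of the proportion of echoes that led to a change. A minimal sketch, with hypothetical counts (the function name and the numbers are illustrative only):

```python
def echoes_needed_per_change(n_echoes, n_changed):
    """Number of screening echoes per one change in management plan."""
    if n_changed <= 0:
        raise ValueError("need at least one management change to estimate")
    return n_echoes / n_changed

# Hypothetical: 500 screening echoes, 25 of which changed management.
needed = echoes_needed_per_change(500, 25)
print(needed)  # 20.0
```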

WITHDREW: Jessie Sellers, Neurology

  • We plan to collect preliminary data for effect and sample size calculations to inform a U01 grant. We will review charts of 5 Huntington’s disease patients taking 3 different atypical antipsychotics (AP) (n=15 total) and determine change in neuropsychiatric and chorea scores after AP initiation.

2018 June 28

Jennifer Hartfield, Center for Men's Health Studies

  • Walk-in seeking assistance with R. New user, attempting to use R for logistic regression. Suggested using this clinic time to discuss modeling.
  • Outcome is ordinal, predictor is ordinal.
  • Suggest starting with a visual representation, such as a bubble plot, and perhaps a five-by-five table.
  • Need rank correlation between the variables to determine strength of association, e.g. Spearman's rho, Kendall's tau, Somers' D.
  • Use a proportional odds model to add additional variables.
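
As one illustration of a rank-based association measure for two ordinal variables, here is a minimal sketch of Somers' D computed by brute-force pair counting. The data are hypothetical 1-5 ratings, and in practice an established package would be preferable to this hand-rolled version.

```python
def somers_d(x, y):
    """Somers' D of y given x: (concordant - discordant) / pairs not tied on x."""
    concordant = discordant = untied_x = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                continue  # pairs tied on the predictor are excluded
            untied_x += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    if untied_x == 0:
        raise ValueError("all pairs are tied on x")
    return (concordant - discordant) / untied_x

# Hypothetical ordinal predictor and outcome, both on a 1-5 scale.
predictor = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
outcome = [1, 2, 2, 1, 3, 4, 3, 5, 4, 5]
print(round(somers_d(predictor, outcome), 3))
```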

2018 June 21

Staci Sudenga, Epidemiology

  • Liver cancer incidence and mortality differs by race/ethnicity and gender in the US. Variations in liver cancer trends may be due to differences in risk factors associated with liver cancer etiology among the different race/ethnicities. We propose to use the Southern Community Cohort Study to examine racial/ethnic differences in liver cancer risk. Need assistance with power calculation.
  • Hypothesis testing for an interaction effect requires one to posit the effect size. Instead of a power calculation, can use a precision calculation for the interaction effect (ratio of HRs or difference in HRs) based on the estimated standard error (margin of error) of the quantity you want to estimate. Also need to know the number of cancer cases for a given binary risk factor; use the risk factor that is most unbalanced and most important (Hep C). Create a 2x2 contingency table to determine n1, n2, n3, and n4 (ex. the number of incident cancer cases with Hep C is 24). On the log scale, var(log ratio of HRs) ≈ 1/n1 + 1/n2 + 1/n3 + 1/n4. Take the square root to obtain SE(log ratio of HRs), which is used to calculate the 95% confidence interval (multiplicative margin of error with 0.95 confidence).
  • Reference "precision"
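
The precision calculation above can be sketched as follows. Only the count of 24 comes from the notes; the other three cell counts are hypothetical, and the function name is illustrative.

```python
import math

def interaction_margin(n1, n2, n3, n4, z=1.96):
    """Return (SE of the log ratio of HRs, multiplicative margin of error).

    n1..n4 are the numbers of incident cases in the four cells of the
    2x2 table (risk factor by group).
    """
    var_log_ratio = 1 / n1 + 1 / n2 + 1 / n3 + 1 / n4
    se = math.sqrt(var_log_ratio)
    return se, math.exp(z * se)

# 24 is the example count from the notes; 300, 200, and 150 are hypothetical.
se, mmoe = interaction_margin(24, 300, 200, 150)
print(f"SE(log ratio of HRs) = {se:.3f}")
print(f"multiplicative margin of error = {mmoe:.2f}")
```

The smallest cell dominates the variance, which is why the notes suggest working from the most unbalanced risk factor.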

Francis Prael, Pharmacology, and David Weaver, Faculty Mentor

  • Questions surrounding criteria for statistical significance of high-throughput screen. Specifically, criteria required for a VICTR Resource Request.
  • Planning pilot project with 10,000 screens. False negatives will be addressed by running the screen in duplicate. Pick out 100 to get 10 active. To address robustness of statistical methods, recommend Wilcoxon 2-sample (Mann-Whitney U) test (does not assume normal data, 95% efficient compared to t-test, evidence for normality with smaller datasets is not always clear-cut) rather than t-test. Also recommend using Wilcoxon signed-rank test rather than paired t-test. If data are normally distributed, median is only about 64% efficient (2/π) compared to mean; recommend reporting the mean in this case. Recommend plotting effect ratios rather than P-values.
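
For readers unfamiliar with the Wilcoxon 2-sample test, a minimal standard-library sketch follows. It uses midranks for ties and a normal approximation without tie or continuity correction, so it is illustrative only; the readings are hypothetical, and a statistics package should be used for real analyses.

```python
import math
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from a normal approximation)."""
    combined = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average rank shared by tied values
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

# Hypothetical screen readings for control wells vs. compound wells.
control = [0.98, 1.02, 1.05, 0.95, 1.01, 0.99]
compound = [1.20, 1.35, 1.10, 1.28, 1.04, 1.31]
u, p = mann_whitney_u(control, compound)
print(u, round(p, 4))
```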

2018 June 14

Jennifer Hartfield, Center for Research on Men’s Health

2018 May 31

Zeb White, Hearing and Speech Sciences

  • We are investigating young children who stutter and how their parents react to their speech. We have collected data with an unverified, experimental 40-question parent-report instrument, and we would like help refining and understanding the data we have acquired.
  • Meeting Notes: Developed tool to measure parent interaction with children (RYCS), with a 5-point scale (0-4) for exhibiting behaviors. 3 subgroups of 40 questions: Timing, Emotion, Language. Interested in narrowing the questions by removing uninformative or repeated items.
  • Compare parents of kids who stutter with those who don't. Data are exported from REDCap. Stuttering diagnosis based on clinical tests, severity. Also interested in frequency/severity associations.
  • Mann-Whitney U test to test for differences per question. Logistic regression for multivariable analysis. Spearman correlation among all questions to see how they match each other. Classification tree (recursive partitioning) to determine which question is best at separating the groups. Recommend R clinic.

2018 May 17

Stephen Wilson, Hearing and Speech Sciences

  • I am analyzing a dataset in order to write up a paper. We evaluated language function in 21 stroke patients with aphasia (acquired language disorder), 2-6 times (every ~2-3 days) during the first 2 weeks after stroke. There are 79 data points in total, and 8 different language measures per data point. I would like to address the following questions: 1) Which language measures show improvement over this time period, and what is the rate of improvement? 2) For each measure, does rate of improvement depend on initial severity? 3) For each measure, is improvement linear, or does it have a “decelerating” time course? I have carried out an analysis in R using mixed models. I have not done this kind of analysis before so I am not sure if I did it correctly. So my hope is to evaluate my analysis and fix it if necessary.
  • Generated summary score, QAB overall, based on all components. Should evaluate whether the components are correlated. Calculate mean rate per unit improvement for each patient using Day 2 and 14 scores. Can fit a regression line using splines for each patient. A linear mixed effects model can also be fit using lme() function in 'nlme' R package. Can look for variation in slopes among patients using simulate.lme() function.

2018 May 3

Vickie Hannig, Pediatrics/Genetics

  • Our project utilizes a 24-question validated survey with a 7-point Likert scale, completed by the patient before and after a genetic counseling visit to assess the effects of the visit. We would like to find out how many completed surveys we should collect to have enough data to draw some conclusions. We have >40 completed surveys so far, and ~20 in which only the “before” survey was completed. We would also like advice on the best way to score the data (Excel spreadsheet or REDCap?) and an estimate of the cost of statistical analysis needed for publication, so we can apply for VICTR funds.
  • Expected outcome: Protocol with no expected funding support, VICTR Biostatistics voucher, Abstract, Other
  • The survey collects some demographic information (ex. education level, reason for counseling) but not the name of the genetic counselor (5 counselors involved) nor the patient's medical diagnosis (of which there are a variety). A successful outcome is defined as the patient feeling empowered. Is it possible to include a control group that does not receive genetic counseling? May consider excluding patients who previously received a diagnosis.
  • The large number of non-responders to the "after" survey is a source of non-response bias. Providing an incentive to patients may improve the response rate. May compare baseline responses of responders to the baseline responses of the non-responders. REDCap is able to randomize the order of the survey questions. May want to ask an important, global question first. Asking a smaller number of questions of each patient, and combining the data, may also yield a better response rate because each patient has to answer only a few questions. In the limitations section of the manuscript, it should be stated that these are the findings among patients willing to complete the surveys.
  • May want to apply for VICTR Award for future biostatistics support (90 hours).

2018 April 19

Seth Rhoades (mentor Jake Hughey)

  • "I used a multi-level modeling procedure taken with my data, and would like to review the approach and its validity."
  • Data collection complete.
  • Expected outcome: Abstract
  • Collected de-identified data from the Synthetic Derivative: 3600 subjects with data available (2002-2018). Clinic data include self-reported duration, patterns, and demographics.
  • Interested in what factors affect sleep in this community. Concerns about selective cohort bias. Patient subset has trouble sleeping, referred, possible apnea.
  • Models so far look at demographics and compare to literature. Look visually at distributions. Recommend treating skewed data as ordinal to account for zero-inflation. Recommend proportional odds with outcome as a function of age or gender interacted with effect of interest. Sensitivity analysis: sleep duration, sleep start time/midpoint correlation.

2018 April 5

Jason Cook, Cardiology Fellow & Joseph Fredi

  • "Intravenous fluids are given to patients undergoing cardiac catheterization to prevent kidney damage from contrast induced nephropathy. The volume and rate of fluid administration and volume has been established in the literature (POSEIDON trial). The aim of our study is to assess whether balanced crystalloids is superior to normal saline in preventing kidney injury in patients undergoing cardiac catheterization."
  • Expected outcome: VICTR Biostatistics voucher, Grant. Generally have 40-50 cardiac catheterization patients each day, and plan to exclude emergent cases. Outcome is post-cath SCr level (kidney injury) and at 1-month and 6-month follow-up visits compared to baseline SCr.
  • Recommend randomizing patients to IV fluid type. This assignment should be blinded; pharmacy can distribute numbered fluid bags. A statistician can generate the randomization table to upload to REDCap. A pragmatic clinical trial that does not require collection of additional data on a CRF and applies cluster randomization by cath lab may be another option. Power analysis depends on event rate and expected difference between IV fluid groups. Threshold for kidney injury (binary outcome) needs to be validated. Continuous outcome will require fewer subjects. Recommend completing power analyses for both the continuous and binary outcomes. Proportional odds model may be used if assumptions are met. Recommend contacting Cheryl Gatto, Learning Health Systems, regarding whether this qualifies for a design studio. Can also seek biostatistical support through Cardiology collaboration plan or VICTR award.
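
A minimal sketch of how a randomization table might be generated, assuming permuted blocks of 4 with 1:1 allocation. The block size, seed, and arm labels are illustrative assumptions; in a blinded trial the table would map to numbered fluid bags, and the study statistician would choose the actual scheme before uploading to REDCap.

```python
import random

def block_randomization(n_patients, block_size=4,
                        arms=("balanced crystalloid", "normal saline"),
                        seed=2018):
    """Return a list of arm assignments using permuted blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed so the table is reproducible
    schedule = []
    while len(schedule) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_patients]

schedule = block_randomization(40)
for patient_id, arm in enumerate(schedule[:8], start=1):
    print(patient_id, arm)
```

Blocking keeps the two arms balanced throughout enrollment, which matters if the trial stops early.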

2018 March 29

Megan van der Horst, Chemistry Graduate Student

  • "I develop diagnostic tests for tuberculosis and want to know how many samples I need to be able to detect a difference between my test and the commercially available alternative."
  • Expected outcome: other. Goal is to calculate sensitivity and specificity of new test. Test result displays 0-2 lines (0 lines or left line positive = invalid test (must retest), right line positive = no TB, both lines positive = TB).
  • Recommend looking at precision of estimated proportion (ex. 0.1 margin of error yields 96 samples per TB +/- HIV group). Search for "96" in
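
The figure of 96 can be reproduced with the usual normal-approximation formula for the margin of error of a proportion, assuming the worst case p = 0.5 and 0.95 confidence (both assumptions on my part, since the notes state only the result).

```python
import math

def n_for_margin(margin, p=0.5, z=1.96):
    """Sample size so a 95% CI for a proportion has the given half-width."""
    return z ** 2 * p * (1 - p) / margin ** 2

def margin_for_n(n, p=0.5, z=1.96):
    """Half-width of the 95% CI for a proportion with n samples."""
    return z * math.sqrt(p * (1 - p) / n)

n = n_for_margin(0.1)
print(round(n))                      # about 96 samples per group
print(round(margin_for_n(96), 3))    # about 0.1
```

Sensitivity and specificity are each estimated within one group (TB positive or TB negative), so the sample size applies per group rather than overall.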

Chris Guidry, Trauma and Critical Care Fellow

  • Visualizing outcomes data for quality reporting. Standard is to report O/E ratio. Want to create risk plot for continuous O/E using ACS NSQIP data. Plan to apply for VICTR Award for Biostatistics support.
  • O/E shown to have anomalies when E is very small. Recommend plotting E vs. O in 100 bins and adding a 45 degree reference line (1.0). Recommend scheduling a meeting with Frank Harrell to write a statistical analysis plan for VICTR application. Sharon Phillips may have knowledge of NSQIP data format.
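
A minimal sketch of the binning step behind the recommended plot, using simulated data rather than NSQIP records. Patients are sorted by expected risk, split into 100 equal-sized bins, and the mean expected and mean observed values per bin give the points to plot against the 45 degree line. The simulated risk distribution is an assumption for illustration.

```python
import random

def binned_observed_expected(expected, observed, n_bins=100):
    """Return (mean expected, mean observed) per bin, sorted by expected risk."""
    pairs = sorted(zip(expected, observed))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # can happen if there are fewer records than bins
        mean_e = sum(e for e, _ in chunk) / len(chunk)
        mean_o = sum(o for _, o in chunk) / len(chunk)
        points.append((mean_e, mean_o))
    return points

# Simulated data: expected risks mostly small, outcomes drawn from those risks.
random.seed(0)
expected = [random.betavariate(1, 9) for _ in range(20000)]
observed = [1 if random.random() < e else 0 for e in expected]
points = binned_observed_expected(expected, observed)
print(points[0], points[-1])
```

Plotting observed against expected avoids dividing by a very small E, which is the source of the anomalies in the O/E ratio.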

2018 March 22

Sophia Yu, Medicine/Endocrinology Fellow & John Stafford

  • "I am investigating the effects of exercise in lean, obese, and diabetic subjects. I am performing a HDL proteomic assay looking at HDL’s protein composition and the posttranslational modifications of its proteins before and after exercise. I am not sure about the statistics: 1) how do I know the power? 2) what tests do I perform to determine if the results are significant? Of note, I have 3 groups (lean, obese, diabetic) with 6 subjects in each group, and I have an intervention (exercise). I anticipate quantifying the levels of 20 pre-determined proteins before and after exercise, as well as looking at the posttranslational modifications of those proteins."
  • Expected outcome: VICTR Biostatistics voucher, Grant, and Abstract. Multiple reaction monitoring method to quantify levels of proteins. Limited to pooled runs of 3 subjects each instead of a run for each individual subject. There are 5 females and 1 male in each group.
  • Recommend separate analysis for each protein. Ideally, data will be normally distributed because non-parametric tests will not work well with 2 samples per group. Present results of exploratory data analysis. Recommend talking with Chris Lindsell regarding biostatistics support for grant.

Neil Newman, Radiation Oncology Resident & Evan Osmundson

  • "I have completed an analysis of the NCDB comparing two radiation treatments on the outcome of OS and would like to review some key questions regarding my methods/codes and perhaps find someone who would like to serve as a co-author. If this required funding I do not imaging it would take many hours."
  • Expected outcome: VICTR Biostatistics voucher. For patients with early-stage small cell lung cancer, the current standard of care (n=1150) is treatment with conventional dose of radiation (6-8 weeks) and chemotherapy. SBRT (n=176) has recently gained interest for these patients; only half received follow-up chemotherapy. Have already built a multivariable regression model with backward variable selection. Adjusted for clinically relevant variables (with p < 0.3 in univariate analysis) and other variables with p < 0.05 in univariate analysis.
  • Since the rate of disease progression is not constant, then you likely have bias with patients detected at screening who are different from patients detected symptomatically. R package 'glmnet' incorporates penalized variable selection (lasso). Propensity score analysis is ideal if you have enough subjects. Recommend scheduling a meeting with Li Wang to write a statistical analysis plan for VICTR application.

2018 March 15

Josh Lander, Medical Student

  • Conducted a survey of medical students (243 of 435 responded) regarding their opinions of and attitudes toward point-of-care ultrasounds (POCUS); also collected student demographics. Goal to create and validate perceived value and usefulness scores for ultrasounds.
  • May not have a rich outcome set. Recommend creating graphs to summarize opinions and attitudes; evaluate whether data points are valid. Generate a connected line graph that includes percentages for every response stratified by class year. Can generate scatterplots for value and usefulness summary scores. A factor analysis to determine the principal components for value and usefulness may be worthwhile.

WITHDREW: Jessica Grahl, Pediatric Pharmacy Resident & Jennifer Hale, NICU Clinical Pharmacist

  • "We are aiming to assess the impact of a non-pharmacologic pilot program on morphine utilization (retrospective before-and-after study). Would like help figuring out which statistical tests to utilize. Additionally I do not have access to statistical software system (i.e. Stat or R) therefore I will need to run stats in excel."

2018 March 8

Cynthia Arvizo, OB/GYN Fellow

  • Multi-site retrospective cohort study of AVM patients treated with artery embolization vs. standard treatment. Treatment is constant over time. Condition is very rare; expect to have 70 subjects in a 10-year period (2/3 with artery embolization and 1/3 with standard treatment). Outcome is occurrence of second treatment (ex. hysterectomy, another embolization). Goal is to estimate event rate in each treatment group (expect 40% in each).
  • May want to apply for VICTR Award for future biostatistics support.
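
With roughly 70 subjects split 2:1, the event rate in each group will be estimated with limited precision. A minimal sketch using the Wilson score interval; the group sizes (47 and 23) and event counts are hypothetical values chosen to be close to the expected 40% rate.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score 95% confidence interval for a proportion."""
    p = events / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Hypothetical: 19 of 47 with embolization, 9 of 23 with standard treatment.
embo_lo, embo_hi = wilson_interval(19, 47)
std_lo, std_hi = wilson_interval(9, 23)
print(f"embolization: {embo_lo:.2f} to {embo_hi:.2f}")
print(f"standard:     {std_lo:.2f} to {std_hi:.2f}")
```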

Greg Terry, Radiology

  • Applying for VICTR HIV/AIDS CFAR Grant (due 3/9). Will use data from HATIM Study (Kerta). Plan to collect CTA scans of coronary arteries to measure artery size and plaque buildup. Goal is to compare plaque levels between patients with HIV and controls with cardia; expect to enroll 100 patients. Will stratify patients into 3 categories by artery size and review CTAs for 13 patients with coronary calcium (highest category) and 12 patients without coronary calcium (lowest category). Can calculate expected power assuming a 30% (20%, 10%) difference between the groups.

2018 February 8

Linda Sealy, Associate Dean for Diversity, Equity, Inclusion / Basic Sciences

  • "Graduate student climate and culture survey likert scale data sorted by gender and racial/ethnic identity. Significance of differences?"
  • Possible responses to the questions were completely satisfied, somewhat satisfied, somewhat dissatisfied, completely dissatisfied or excellent, good, fair, poor. 106 out of 250 students completed the survey. The goal is to determine whether there are significant differences between females (n=65) and males or between underrepresented minorities (n=19) and all other students. Also plan to look at whether there are changes over time.
  • Using males as a control group, can report descriptive statistics, such as mean satisfaction score, with confidence limits and plot histograms or stacked bar charts. Recommend randomizing mentors to workshop or control group and collecting surveys at random time points during the year. Mario Davidson may be able to provide further guidance.

2018 January 25

Sarah Grayce, Child and Adolescent Psychiatry Fellow & Greg Plemmons, Faculty Mentor

  • "Collected demographic and clinical data on children who presented to the VCH ED with psychiatric concerns and would like to analyze that data with the goal of answering a number of questions about how/why those children came to the ED and predictors for their clinical outcomes."
  • Population: Children <18 presented to ED with psychiatric concerns 330 separate visits.
  • Data collected: Who referred them? Geographic areas? Presenting complaint. Safety, Trauma, DCS cases, Hospitalizations, previously seen.
  • Longitudinal Study: August to November REDCap database collection, mix of retrospective and in-person completion.
  • Research Questions. How to present data. % complaints, other care,
  • Is there a temporal relationship between number of visits?
  • Predictors of presentation to ER. Retrospectively get controls for people who haven't presented for psychiatric concerns
  • Suggestions: Visual presentation. X-axis: age, Y-axis different presenting reasons. Colored by density for each corresponding age-complaint combo.
  • X-axis: enrollment, Y-axis: cumulative number, separated by color, vertical line for school start.
  • Future plans: Email Ahra to discuss data format and access to REDCap.
  • Zipcode privacy:

2018 January 18

Yolanda McDonald, Human and Organizational Development

  • "Description of Health-based U.S. Drinking Water Dataset (2011-2015)
  • Database Structure: The database consists of public water systems violations obtained from the EPA and we linked sociodemographic variables from the U.S. Census. The rows of our dataset are public water systems (pwsid) and the columns are various variables (i.e. violation type, violation code, health-based (N/Y)). The dataset does not have counts of violations; the field is binominal yes or no.
  • Purpose of Study: We want to measure whether vulnerable populations at the county-level (i.e. minorities, elder, and uninsured) have a greater incident rate of health-based violations. The sociodemographic information has raw counts and proportion of the population. We are also interested in population size served by water system because it is a proxy for the training, funding, and managerial resources of the community water system. The population served by community water system based on stratum used by federal agencies for funding purposes:=50,001 (i.e. large). We have disseminated the dataset to see the health-based violations (Arsenic, Coliform, and TTHM) by pwsids by population size served by water system.
  • Statistical Analysis Strategy: We are considering either Poisson regression or negative binomial regression for data analysis based on data structure and purpose of study? The unit of analysis is county-level. As noted, we also want to stratify results based on population served by water system but are not sure our sample size is large enough. Are there any other recommendations that you have?"
  • May want to build separate models for naturally occurring contaminants and agricultural/industrial contaminants. If dispersion in Poisson model is too high, then should use a negative binomial model. An ordinal logistic regression model may also be useful for comparison.
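
A quick first check for overdispersion, before fitting any model, is to compare the variance of the counts to their mean, since a Poisson distribution has variance equal to its mean. The sketch below uses hypothetical county-level violation counts; a full check would use the Pearson dispersion statistic from the fitted Poisson regression.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)

# Hypothetical health-based violation counts for 15 counties.
counts = [0, 0, 1, 0, 3, 0, 0, 7, 1, 0, 2, 0, 12, 0, 1]
ratio = dispersion_ratio(counts)
print(round(ratio, 2))
```

A ratio well above 1, as in this made-up example, would point toward the negative binomial model.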

2018 January 11

Joshua Cockroft, Medical Student & Heather Davison, Faculty Advisor

  • "VICTR resource request pre-review: interview-based qualitative protocol on educational outcomes of student patient advocacy project with questions regarding sample size and saturation."
  • Goal is to develop and publish a conceptual framework of educational outcomes for the hot-spotting program at VUMC. Plan to use qualitative interviews of 15 students to assess educational outcomes. There are three cohorts of students who completed the program at different times. Should explain what will be done if saturation is not reached with 15 students. In case there is an order effect (ex. interview fatigue), can randomize the order of interview questions. After the themes have been identified, may want to re-interview students using more specific questions.

2018 January 4

David Vago, Physical Medicine and Rehab/Osher Center

  • "We will need appropriate consultation for powering a clinical trial appropriately and establishing go/no-go criteria that are based on establishing a clear effect size in the primary mechanistic outcome measure(s) by the proposed intervention, the best methods for demonstrating the safety and tolerability of the intervention(s) used in the study by a set of metrics, establishing a pre-planned timeline for reaching the randomization target and estimated drop-out rate, and demonstrating the appropriate analysis for demonstrating the changes in the primary mechanistic outcome measures are associated with changes in the clinical or functional outcomes in the specified population."
  • Can calculate power using the maximum possible sample size (45 per group) and Hamilton-D (HD) score (degree of response). Support vector machine requires a minimum of 200 patients per candidate feature. Do not recommend looking at changes from baseline or using 50% improvement threshold to categorize success (responder analysis). Subtracting the baseline HD score from the follow-up HD score is not a valid change score. Outcome should be follow-up HD score in continuous form (not dichotomized), and model should adjust for baseline HD score (use spline to allow to be non-linear). Should also assess the interaction between feature and treatment effect (double difference or ratio of ratios). Can reword goal to "understand severity of depressive symptoms and what relates to that severity." Hakmook Kang has used high dimensional models to adjust for random effects for voxels.

Topic revision: r1 - 18 Jan 2021, DalePlummer
