Recommendations, Analyses, and Data for Health Services Research, Diagnosis, and Prognosis Clinic
Notes 2019


Ronnie Beaulieu, Infectious Diseases

  • Retrospective analysis of antimicrobial utilization over time relative to the implementation of a stewardship intervention. *I would like to discuss the analysis plan, and get assistance with applying to VICTR for biostatistics support.
  • VICTR voucher, mentor confirmed
  • Outcomes: total use in days of therapy (DOT) overall, and by drug; spectrum score, duration, acceptance rates, LOS, mortality, cost
  • estimate of 90 hours
  • deadline: end of June


Emily Sedillo, Master’s of Public Health

  • The project consists of doing analysis on survey data surrounding contraceptive prevalence, contraceptive type preference and attitudes surrounding contraception. The data is from Lwala Community Alliance’s “household survey” that was completed between 2018-2019 and includes survey responses from 5 rural regions in western Kenya. The overall research question is “What are the factors that influence the utilization of contraceptives in order to reduce unplanned pregnancies for women in Lwala’s catchment areas within Migori County, Kenya?”
  • VICTR voucher, mentor confirmed
  • Timeline very short, we will need information on the readiness of the dataset.
  • Estimate of 40-80 hours required. Single dataset with probability weighting.
  • Non-response likely due to sensitive data/subject.
  • Apply for VICTR ASAP and hope it's approved by end of year. If data is ready immediately, quick project timeline is likely doable. Issues with data will cause delays. Once the application is received, put in the queue for VICTR help from biostatistics. For application, put together detailed analysis plan and protocol.
  • Tables with missing data patterns, table of descriptive data, any comparisons/models requested.

Raymond Zhou, Infectious Disease

  • This is a retrospective chart review of refugee patients being treated at the Siloam Health Clinic. We are motivated by a desire to improve the efficacy and efficiency of the Hepatitis B screening and vaccination efforts at Siloam and the refugee camps they receive patients from. Of note, Hepatitis B vaccination is ideally completed with multiple doses.
  • Primary questions we want to answer are whether the average number and spacing of vaccination doses differs according to various demographic variables, such as country of origin and age. Other questions include whether the average dose/spacing differs across patients positive vs. negative for HBV, HCV, and HIV.
  • Evaluate data integrity without manually reviewing each chart; what analyses are most practical and useful.
  • Categorize immunization adherence into four groups, calculate mean across variables of interest, and calculate (and emphasize) the confidence interval surrounding that mean.


Jourdan Holder, Hearing and Speech

  • Population: Cochlear Implant Recipients
  • Research Question: Does more consistent device use drive better speech understanding?
  • Question(s): We are looking to get some advice regarding a power analysis/sample size justification to address the following: We completed a correlation study in which the correlation between hours of daily device use (std. dev. = 4.2) and speech recognition (std. dev. = 22.5) was 0.6 with a slope estimate of 3.3 for n = 290. In the present study, we are interested in assessing causality (i.e., does increased device use cause better speech recognition?) via an intervention study in which the independent variable will be an increase in hours of daily device use and the dependent variable will be speech recognition.
  • For F32 grant to be submitted next week. SS calc can be simpler than actual analysis used. For SS calc, use one-sample t-test on the difference (within-person) between timepoints; or could estimate SS needed for a certain CI width you'd like to obtain. Make a plot of the current patients with both timepoints existing, to give grant reviewers an idea of the data. For analysis, untransformed Spearman correlation. In model, control for the average number of hours spent in quiet time per day. Also examine adherence to usage increase.
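
A minimal sketch of the two simpler sample-size approaches mentioned above (one-sample t-test on the within-person difference, or a target CI width), using the normal approximation. The function names and the example inputs (mean difference of 5 points, SD of differences of 10, half-width of 3) are illustrative assumptions, not values from the study. The SD of the within-person differences would need to come from existing paired data.

```python
from math import ceil
from statistics import NormalDist


def n_for_paired_difference(delta, sd_diff, alpha=0.05, power=0.80):
    """Approximate n for a one-sample t-test on within-person differences.

    Normal approximation: n = ((z_{1-alpha/2} + z_{power}) * sd_diff / delta)^2.
    This slightly underestimates the exact t-based answer when n is small.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) * sd_diff / delta) ** 2)


def n_for_ci_half_width(half_width, sd_diff, conf=0.95):
    """Approximate n so the CI for the mean difference has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sd_diff / half_width) ** 2)


# Hypothetical inputs, for illustration only.
print(n_for_paired_difference(delta=5, sd_diff=10))
print(n_for_ci_half_width(half_width=3, sd_diff=10))
```

Both functions ignore dropout and non-adherence to the usage increase, so the enrolled number would normally be inflated beyond what they return.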


Jay Bagai, Cardiovascular Medicine

  • Previous visit: 2019-10-28
  • VA outcomes study seeking to determine the current usage and utility of nuclear stress tests for cardiac assessment prior to low-risk surgery.

Karampreet Kaur, Otolaryngology

  • This project is a retrospective cohort study of patients >18 y/o who received a tonsillectomy by Dr. Kim Vinson. Currently, no guidelines exist for when to offer tonsillectomy to adult patients. The aim of the project is to describe this patient population, analyze factors that contributed to the decision to offer tonsillectomy, and to analyze outcomes for these patients. I would like to run analyses looking at indications for tonsillectomy in patients who were students vs. those who were not, and patients who were singers vs. those who were not. Additionally, I would like to look at the relationship between indication for tonsillectomy and age to see if there is a correlation. * Mentor confirmed


Jacob Beckstead, Pharmacology

  • Our interest is in determining whether metastatic progression in patients with colon cancer is associated (positively or negatively) with also having an atopic disease such as asthma. We have used the SD to identify patient groups with colon cancer (stage 2 or higher) with or without metastasis. Both control (no metastasis, n=361) and case (with metastasis, n= 906) groups have now been selected and have PPV of greater than 95%. We want to know how best to balance the groups and then the appropriate analyses to determine the association with asthma or such disease.
  • Ideally collect more patients with asthma (currently 10% of controls and 5% of cases). Include a negative control group.


Jay Bagai, Cardiovascular Medicine

  • Retrospective analysis of 1192 VA patients. We are looking at effect of transitioning from trans-femoral (TFI) to trans-radial PCI (TRI) in 2009. Prior to 2009, pts underwent mainly TFI and after 2009, pts underwent mainly TRI in years 2010-12. The problem is that patients were not randomized to either group, and we cannot compare TFI and TRI groups in 2006-2008 as there were too few TRI and in years 2010-2012 as there were too few TFI. In 2009- the transition year, comparisons are also difficult since pts were not randomized.
  • To determine variables to include in matching: assemble a group of experts and determine which variables are required for a person to be assigned femoral or radial (without telling them which variables are already included). This will reinforce the variables included in the model. In the model, interact calendar year with treatment. Cumulative volumes by operator are the stronger indicator of learning curve over time. Ways to proceed: (1) return to biostats clinic, (2) cardio collab plan, (3) VICTR voucher, (4) VICTR studio.

Jessica Kneib, Pathology/Transfusion Medicine

  • This project will look at the variability in ECP procedure costs across the country. We have collected data including chargemaster list prices, facility type, location (urban/rural), hospital referral regions and hospital size. We would like to determine if any of these factors are associated with variations in procedure cost.
  • Variability in cost of one procedure across hospitals/providers (n=92). For graphics, use histograms with bins, so as not to display individual hospitals. Model the prices with related variables, but use a robust method that can deal with outliers. Investigate the non-response bias.


Eva Mistry, Neurology

  • This is a prospective case-matched control study to understand the treatment effect of endovascular therapy in patients with vs without pre-existing disability. My specific questions are: (1) What is the minimum number of controls I need per case if I were to match for x number of baseline characteristics? I also need help with sample size estimation. (2) Can I perform ordinal regression if the two groups of patients are expected to have an unequal "starting point" across an ordinal scale? For example, I am comparing the two groups of patients on a scale that goes from 0 to 6 at 90 days, but one group has a baseline score of 0 or 1 and the other group has a baseline score of 2-3. So at 90 days, I expect that the scores for the patients in the baseline 0-1 group will range from 0-6 but those in the 2-3 group will range from 2-6. Can I perform an ordinal regression in this scenario? (3) If not, what are some appropriate alternate tests? (4) I also plan to perform cost-effectiveness analysis of endovascular treatment in disabled patients. Does this analysis require specific statistical expertise? If yes, who would be the most appropriate statistical collaborator?
  • Direct matching recommended, using all available samples. Sample size will depend on practical limitations, taking into account the variables you match on, in order to maintain a minimum saturation. Include baseline score as independent variable in model.

Alexander Hawkins, Surgery

  • Looking to assess the association between preoperative transfusion and disease free survival in patients undergoing rectal cancer resection. Favor a propensity adjusted analysis as the two groups are different at baseline.
  • For propensity-adjusted model, must have variation in the protocol for transfusion. Could model who gets a transfusion and who does not, then compare propensity scores to see how much they visually overlap. Could evaluate "dose-effect" of how much blood is transfused.
  • Project was evaluated for a VICTR grant for biostatistical support. Statisticians agreed that the number of staff hours required to complete the project was within the standard grant limit.


Fiona StrasserKing, Cardiology

  • Descriptive study on HF in Zambia
  • HF in pregnancy prevalence and risk factors; FU for six months; screen for hf diagnosis (diagnosis usually made one month before delivery to 5 months after delivery)
  • 16% in pilot diagnosed with hf (incidence rate)
  • SS requires precision, width of confidence interval; recommended to provide a reasonable estimate of what sample size you can obtain, then estimate the precision that that sample will provide.
  • Recommend to spend more resources on aggressive follow-up and quality of study than obtaining more samples.
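
A rough sketch of the "fix the feasible sample size, then report the precision" approach recommended above, using the 16% pilot estimate. The Wald (normal-approximation) interval is an assumption made for simplicity, and the candidate sample sizes are hypothetical; a Wilson or exact interval may be preferable for the final protocol.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci_half_width(p, n, conf=0.95):
    """Half-width of a Wald confidence interval for a proportion p with sample size n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sqrt(p * (1 - p) / n)


PILOT_PROPORTION = 0.16  # from the pilot described above

# Hypothetical achievable sample sizes.
for n in (100, 200, 400):
    half_width = proportion_ci_half_width(PILOT_PROPORTION, n)
    print(f"n={n}: 16% +/- {100 * half_width:.1f} percentage points")
```

Because the half-width shrinks with the square root of n, quadrupling enrollment only halves the margin, which is consistent with the advice to invest in follow-up quality rather than more samples.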


Monika Schmidt, Cardiology

  • Objective of the study is to examine how professional titles are used in the same mixed gender speaker introductions at national cardiology conferences.
  • Primary end point: determine whether the speaker's professional title was used during introduction of presentation.
  • Secondary end point: determine whether the speaker's professional title was used at any time during the presentation.
  • Collecting race and gender of introducer and presenter; analyzing recordings of text of conferences from 2016-2019. Analysis: 1) descriptive statistics. 2) among proportion of people who use both professional title and name, analyze data at introducer-level. Notes: age-differential between introducer and speaker plays a big role. Statistical power will be driven by proportion of introducers who use both (as introducers who are consistent will not be included in analysis). Each introduction will be an observation in the dataset, with an indicator variable of same race vs different race. Analysis: logistic regression allowing for multiple rows per introducer. Include calendar time variable in analysis; look at interaction between calendar time and probability of using professional title in secondary analysis. Incorporate role (NP, MD, PhD, etc.) as well. Possibility of getting a VICTR biostatistics voucher.


Allison Wheeler, Pathology

  • I am designing a retrospective project that will compare IUD use in women with heavy menstrual bleeding (HMB) with and without bleeding disorders. Our primary objective is to compare the efficacy of Mirena IUD as treatment of HMB in adolescents with and without diagnosed bleeding disorders by evaluating bleeding outcomes as well as to compare complications such as expulsion. Approximately 20-40% of adolescents with HMB are diagnosed with a bleeding disorder. My specific question is geared towards cases versus controls:
  • Given the expectation that for every female with a bleeding disorder and HMB there will be 2-5 without a bleeding disorder, can I collect data in a 1:1 fashion or should I have more non-bleeding disordered patients in the analysis? * Could do 2:1 or 3:1 matching--not much utility after that. Could also do matching at site, or pool data and then match.
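
One way to see the diminishing return from extra controls is the commonly cited rule of thumb that k controls per case give roughly k/(k+1) of the precision obtainable with unlimited controls. This is an approximation (it holds near the null and with a fixed number of cases), and the function name is illustrative rather than part of the clinic's recommendation.

```python
def relative_efficiency(k):
    """Approximate efficiency of k:1 matching relative to unlimited controls per case."""
    return k / (k + 1)


for k in range(1, 6):
    print(f"{k}:1 matching -> about {relative_efficiency(k):.0%} of maximum precision")
```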


Christina King, Chemistry

  • I am interested in studying racial disparities in hypertension by comparing plasma samples from middle-aged normotensive and hypertensive African Americans. This is a pilot study that will be executed by using quantitative proteomics techniques.
  • VICTR voucher/Mentor confirmed (voucher for biostat for aim 1 & geneticist on board for phenotyping) VICTR may not like that it's already funded.
  • Specialized biostat faculty may be interested in this project, if some other funding approved. VICTR statistician needed initially for aim 1, but will need to check on how funding may work.
  • Consider definition of hypertension in study.

Naeem Patil, Anesthesiology

  • I am planning animal and clinical research study aimed at recruiting septic patients and studying certain bio markers to correlate them with disease outcome. I have questions about sample size calculations and power analysis.
  • Anesthesiology collaboration within biostat dept will give more specialized expertise than clinic. (Meeting with collab later today.)


Rajiv Agarwal, Medicine/Hematology/Oncology

  • I'm a new faculty member at Vanderbilt, and am part of the MSCI/K12 program. I'm hoping to learn more about how to design effective studies related to my research - on measuring outcomes longitudinally over time from palliative care interventions in patients with cancer.

Yuxi Zheng, Ophthalmology

  • Surgically naive students interested in going into a surgical field will perform a series of pre and post tests to assess for speed and accuracy under the microscopes in the wetlab. They will be randomized to either the 2d or 3d group and assessed on various tasks pre and post intervention.


Ronald (Ronnie) Beaulieu, Infectious Disease

  • We are interested in a time-series analysis to evaluate the impact of an antimicrobial stewardship intervention on antimicrobial utilization rates. Questions: How many data points/how much time to reach power? Statistical methods to analyze outcomes (time-series vs pre-post). Feedback adherence rates. Change in utilization/time. Mortality. Cost.
  • VICTR voucher/mentor confirmed
  • pilot study done - reviewing antimicrobial use for inpatient. time series analysis before and during the antimicrobial stewardship intervention
  • Q: What time period do we need to observe? The intervention has been implemented twice weekly since April. 30-40 data points.
  • Data: 0-18 patients per week; could look at: feedback day, weekly (2 feedback days), biweekly (4 feedback days); 5 teams: disease burden within each team is approx same
  • Analysis plan: account for confounding variables over time (institutional shifts, etc.); tracking adherence to feedback recommendations; censoring data for those who receive an infectious disease consult
  • Primary objective to look at reduction of days of therapy. Possible primary outcome: change in total antibiotic utilization (per thousand days)
  • next steps: identify patient population; visualize time trend; identify primary question that you want to answer/which outcome makes more sense to audience; check data variables that shouldn't matter (like LOS) to make sure they don’t; investigate patient-level data; state-transition model may help to visualize
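
A small sketch of how the possible primary outcome could be computed per time period before plotting the time trend. It assumes the denominator is patient-days (the notes say only "per thousand days"), and the weekly numbers are hypothetical.

```python
def dot_per_1000_patient_days(days_of_therapy, patient_days):
    """Days of therapy (DOT) standardized per 1,000 patient-days."""
    if patient_days <= 0:
        raise ValueError("patient_days must be positive")
    return 1000 * days_of_therapy / patient_days


# Hypothetical weekly totals: (days of therapy, patient-days).
weekly_totals = [(450, 1500), (420, 1400), (300, 1250)]

weekly_rates = [dot_per_1000_patient_days(dot, days) for dot, days in weekly_totals]
for week, rate in enumerate(weekly_rates, start=1):
    print(f"week {week}: {rate:.0f} DOT per 1,000 patient-days")
```

Standardizing by patient-days keeps weeks with very different census (0-18 patients per week) on a comparable scale, although weeks with very few patients will still give unstable rates.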

David Wu, Cardiovascular Medicine

  • We are looking at protective association of lithium and cardiovascular diseases both epidemiologically using groundwater lithium and using patient data. We are looking for ways to finalize and strengthen our current results.
  • VICTR voucher/mentor confirmed
  • 1962 geological survey data (migration effect not accounted for - huge limitation) (bottled water/water filter use has changed over time) (area-level ecological analysis has its own problems, since different from individual data - subject to ecological fallacy) (try to adjust for age - mean age for country) (nonlinear adjustment for age and income)
  • current analysis: lithium rate/mortality correlation (county level mortality); regression adding income; currently assuming linearity (may be a threshold effect to account for); lithium levels transformed (sqrt, log); lithium also associated with prevalence of diabetes (unknown whether mediating or simultaneous)
  • "lithium protective for MI in bipolar patients" - need to dig into EMR (balanced levels of MI, why some pts are started on Li and others are not, etc.)
  • "lithium protective for MI in diabetic bipolar patients" - be careful with phe codes (how diabetes is denoted in SD) (hard to tease out confounding effects of diabetes)
  • confounding is so unknown, that p-value and "significance" not recommended. Use descriptive data, slopes, confidence intervals, etc.
  • for VICTR application, voucher would be to pull data from SD (need to talk to the SD folks to get details on that) (describe the group that didn't get lithium and for what reason)


Pingsheng Wu, Medicine/Allergy

  • Determine whether metabolite or combination of metabolites measured at birth can be markers of in utero exposure to smoking and even to specific product of smoking.
  • VICTR voucher
  • Maternal smoking or exposure-to-smoke (along with frequency of smoking). Data: pregnancy assessment monitoring system, surveillance of 4-6 months after delivery, assesses pre-pregnancy/in-utero environment. National survey. Proposing to use 2009-18 data, limited to 2 states, linked to newborn screening database (national, taken within 24 hours, 37 metabolites measured; standardized procedure done by state). e-cigarette usage and amount documented from 2016. hookah usage documented as well but not amount.
  • After quitting, metabolic pathway found to quickly recover. Never-smokers and those who quit before smoking have similar results. Third trimester has highest impact on child.
  • Can we identify metabolic pathways associated with effects of smoking in utero?
  • Subgroup analysis: can you identify second-hand smoking?
  • These biomarkers would have to work in a dose-response way.
  • Goal is to use biomarkers to determine whether intervention is needed (like vitamin C, shown to reverse detrimental effects of smoking on babies' lungs in utero).
  • Suggestions: Internal validation with bootstrapping. Use variable clustering to reduce 37 metabolites to ~7; tree with Spearman rho.
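
As a sketch of the similarity measure behind the suggested variable clustering, the code below computes Spearman's rho (Pearson correlation of average ranks) and reports squared rho for each pair of variables. It is not a full clustering procedure; the metabolite names and values are hypothetical, and in practice a dedicated variable-clustering routine would build the tree from this kind of similarity matrix.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1, with tied values given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation; assumes equal lengths and non-constant inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical metabolite measurements for five newborns.
metabolites = {
    "m1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "m2": [2.1, 3.9, 6.2, 8.1, 9.8],
    "m3": [5.0, 3.0, 4.0, 1.0, 2.0],
}

for a, b in combinations(metabolites, 2):
    similarity = spearman_rho(metabolites[a], metabolites[b]) ** 2
    print(f"{a} vs {b}: squared Spearman rho = {similarity:.2f}")
```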


Emily Ambrose, Otolaryngology

  • Returning with research mentor to discuss chronic cough triage tool.
  • Lit review non-productive. Suggested by a mentor to use synthetic derivative to examine scope of problem.
  • SD contains clinical data + free text. Harvard has a guideline for cost-effectiveness analyses. Conduct a VICTR studio to assemble experts to determine clinical pathway/proper workflow. Starbrite website > funding tab > apply for VICTR studio. Specifically ask for people from certain departments.


Dave Patrick, Cardiology/Clinical Pharmacology (NO SHOW)

  • We will examine the correlation of a novel biomarker with clinical characteristics and laboratory values in patients with lupus (SLE). I am proposing to use univariate and multivariate analysis. I am preparing a grant on this topic. During the clinic, I would like to address a method for calculating power and necessary patient enrollment numbers for this project.
  • PGY7 Resident, future faculty, mentor excused.

Mike Lowry

  • We are looking at rates of serious infections in intravenous drug users during the opioid crisis. We will have discharge data from TN in the last 5-10 years that we will use. We will also plan to use nationally compiled data (discharge codes) to see how TN compares to the rest of the nation. We will look to compare the incidence of serious infections year-by-year in TN and then compare these to incidence nationwide. Our question is: what type of statistical tests can we use to properly show this?
  • Mentor confirmed and present
  • VICTR voucher request--suggest to return to clinic before submitting given suggestions provided
  • Goal is to compare trends over time in TN, and compare to national. Anecdotally seeing more complex infections in ID from IV drug users.
  • Plan is to use HEPC+ as a marker for IV drug use. This will capture some users but will also capture non-users. Need to define group of interest (patients with infections or infection cases), and decide if hep C positive is a comprehensive enough marker for IV drug use. Need to also find out if data contain ED obs patients who are discharged from the ED.


Ashley Nassiri, ENT

  • Vestibular schwannomas are benign skull base tumors that have variable growth rates. Treatments include surgical resection, observation, or radiation. Because these tumors are benign, we are conservative with surgery and debulk, but generally leave some tissue behind if it is adherent to the facial nerve (which controls muscles of the face). Rather than damaging the nerve by trying to do a complete resection, we do a subtotal resection and have better facial nerve outcomes. This however may lead to future tumor growth, and we are interested in evaluating the factors associated with postoperative tumor growth after subtotal resection. We have collected tumor volumetric data (from surveillance MRIs after surgery for many years), patient demographics, and other important disease related metrics. We would like to analyze these factors to see which are associated with postoperative tumor growth.
  • Total resection or partial (to preserve facial nerve function). Baseline preop volume; and yearly follow-up, location of tumor, amount of tumor left after partial resection, demographic vars. Stable tumor size or growth after resection. Some patients have received radiation in follow-up (16/46), if tumor grows. Use tumor volume as longitudinal variable. Scan pre-op, then immediately post-op (within 12 hours), then every 6 months. Proportional scale or absolute scale? Adjust for initial volume, then capture change on absolute scale in post-op scans. After radiation, patients are censored. look at facial nerve outcome over time as well (scale of 1-6). (Facial nerve function affected by radiation.) Particularly interested in outcome at one year and what is happening within that first year. Will be applying for VICTR voucher. Two vouchers (analysis done at same time).


Emily Ambrose, ENT (walk-in)

  • Help in data collection for development of a chronic cough triage tool (questionnaire to take when setting up subspecialty referral appt). Questionnaire is developed and wanting to validate. Purpose is to triage pts to the right clinic, in order to decrease cost and improve practice. Tracking referrals would be difficult: referrals come from all over. Just at the beginning of the project; to work with clinical experts in various fields to determine clinical workflow.
  • VICTR support possible if project involves research (research into patient care, validating/showing impact of the questionnaire)
  • Take plenty of time to brainstorm/come up with a plan; i.e. avoid seasonal issues, find a control group. First steps: describe current workflow/current scope of problem (interview small number of patients, with clinical expert review), pilot test questionnaire
  • Come back to biostatistics clinic with your mentor when a more concrete plan is developed.


Gabriella Glassman, Plastic Surgery (Walk-in)

  • Survey results from plastic surgery programs: ~ 71 across US.
  • Recommendations: 1) Descriptive summary: Calculate distributions of responses for questions (no formal comparisons). 2) Formulate questions that you want to answer. 3) Apply for VICTR biostatistics support through Vanderbilt faculty supervisor, emphasizing why this project falls under the umbrella of VICTR ("implementation science project", looking at how training programs operate). If accepted, you'd be assigned to a staff biostatistician. VICTR voucher discussed; contact assigned faculty statistician to continue.


Stephen Gallion, Kiersten Espaillat, Neurology

  • Hypothesis: There is a positive correlation between the number of licensed county EMS vehicles per population in a given county and the prevalence of negative stroke outcomes in the same county. Secondary Hypothesis: The Social Vulnerability Index score can be a protective factor in areas with fewer trucks per population. Summary: This project seeks to analyze available data on the number of EMS vehicles, stroke patient outcomes, social vulnerability index score, and number of stroke centers in a series of Tennessee counties to identify whether or not the hypothesis (above) is supported.
  • 37 centers, county level data. Response time of more importance than distance. Group level data/ecological data tends to show reversal of data in large, urban counties
  • GIS analysis using ESRI, USGS, etc databases recommended. Small area analysis re health. Will return for clinic when a GIS expert can be present


Midya Yarwais, Pediatric Rheumatology

  • The aim of this study is to estimate the prevalence of medication non-adherence and identify demographic and disease characteristics associated with medication non-adherence in youth with childhood-onset systemic lupus erythematosus in the pediatric rheumatology clinic at Vanderbilt. Medication possession ratios (MPRs) will be calculated using pharmacy refill data for all immunomodulatory medications over a 2 year period of time to estimate medication adherence. Chart abstraction will be completed to obtain demographic and disease characteristics. We are seeking assistance from the biostatistics clinic to ensure that we collect the correct details required to accurately calculate MPRs and that we organize these details in a format that can be efficiently analyzed after export from the REDCap database. We would also like to review our planned statistical analysis to determine if it is an appropriate/feasible project for a biostatistics VICTR voucher.
  • Mental health and medication adherence in children with lupus, estimating adherence via refill status (medication possession ratio); need help calculating time period. MPR calculated as total number of doses dispensed over period of time: did they fill their prescription enough (percentage)? Retrospective 2-year chart abstraction. Interested in examining duration of disease; are patients more adherent at the beginning of their diagnosis?
  • Limited in sample size, and therefore the number of covariates - 96 patients needed to accurately estimate possession (yes/no), without covariates, for 0.1 margin of error. Pharmacy refill data only go back 2-3 years, so limited to using current patients.
  • Suggest an outcome with higher resolution (ex: hours of gap time between prescription) or measure something more often in the same group of people (ex: blood pressure measurements every ten minutes). The outcome should be clinically meaningful, helping to advance knowledge or prove feasibility. Suggest to use two sources of data: pillbox cap detection validates easier collection of pharmacy refill data.
  • Secondary endpoint to evaluate MPR on outcome of disease index/severity, with a correlation coefficient. (Requires about 400 patients to get margin of error of 0.1.)
  • Could include young adults from young adult rheumatology. Or could try to recruit another pediatric rheumatology practice. Could include other diseases with similar characteristics. Could look at severity as time-dependent covariate on adherence, after defining a good baseline. (Longitudinal data moving forward with current set-up would be an advance in current literature.)
  • Think about what kind of conclusion you'd like to present, in order to determine the type of study required.
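
The two sample-size figures quoted above (about 96 for a yes/no proportion, about 400 for a correlation coefficient, each for a 0.1 margin of error) can be roughly reproduced as follows. The sketch assumes 95% confidence, the worst-case proportion of 0.5, and a Fisher z interval evaluated at a correlation of 0 (where the interval is widest); these assumptions and the function names are not stated in the notes.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)


def n_for_proportion_margin(margin, p=0.5):
    """Unrounded n for a Wald interval on a proportion to have the given margin of error."""
    return Z_95 ** 2 * p * (1 - p) / margin ** 2


def correlation_margin(n, r=0.0):
    """Half-width of the Fisher z confidence interval for a correlation r with n patients."""
    half_width_z = Z_95 / sqrt(n - 3)
    lower = tanh(atanh(r) - half_width_z)
    upper = tanh(atanh(r) + half_width_z)
    return (upper - lower) / 2


print(f"proportion, margin 0.1: n is about {n_for_proportion_margin(0.1):.1f}")
print(f"correlation, n=400: margin is about {correlation_margin(400):.3f}")
```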


Yolanda McDonald/Kayla Anderson, Human & Organizational Development/Peabody College

  • This study (statewide survey of TN Public Water System Operators [N=3,608]) addresses the following research question: What are the current and future challenges that operators face in providing a safe drinking water supply for Tennesseans? We want to review the survey instrument (56 items) with a biostatistician to ensure that variables are optimally operationalized for descriptive and inferential statistical analyses.
  • Details: Aging infrastructure and aging workforce. Are there differences/different challenges in water quality for purpose (park, hospital, etc.), population density, education, etc.? Survey to be given to water operators. Goal of survey to address those differences, and to be used as a tool for education for the water operators, dept. of health, dept. of conservation. To be dispensed via email to the 85% who have email addresses on file and via hard copy to those who do not. Goal response rate of 80%. First time this is being done in the US statewide, so there is external interest in the results.
  • Recommendations: (1) Determine differences in those who respond and those who do not. You can include a question at the beginning asking why they do not wish to respond, if applicable. Plan for this, so that you can make judgements about the response bias. (2) Put thought into the cover letter. If there is someone they respect, a cover letter from that person encouraging them to respond could help. Often offering the results to the survey-takers is a good incentive to respond. Highlight how you plan to dispense the results to the survey-takers in the promo message. Emphasize anonymity. (3) Reformat questions: make likert questions into a matrix; change interval-scaled responses to numeric, continuous response. (5) Incentive of a raffled reward. (6) Analysis: descriptive, correlations, R package for exporting REDCap data. Means or correlation coefficients, plus confidence intervals, will be more useful than hypothesis testing for survey results. (7) Double-check with VICTR central about whether this can be funded by VICTR, emphasizing public health; future studies from this study will look at water systems and health outcomes. VICTR doesn't award grants post-grant award.


Shriya Karam, Epidemiology

  • Study on ovarian cancer and BMI among women diagnosed with ovarian cancer. Goal is to calculate mean and median of BMI values in each 6 month time interval from the primary cancer diagnosis date. Using EHR data from the Synthetic Derivative, currently being processed. Will be using SAS for analysis. A BMI measurement is used if it falls within an 8-week window around the chosen date. All ovarian cancer cases, so assumed BMI will be measured around diagnosis to determine dose. First initiative is to try and characterize what the changes in BMI are. Accessing any record a patient has, in the whole system.
  • Can use time-varying covariates. Assumes that the BMI during the whole interval is the same. Windows are not uniform for every patient.
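
A minimal sketch of the stated goal (mean and median BMI in each 6-month interval from diagnosis) for one patient. The project plans to use SAS, so this Python version is only an illustration of the logic. It approximates a 6-month interval as 183 days and drops pre-diagnosis measurements, both of which are assumptions, and the dates and BMI values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

DAYS_PER_INTERVAL = 183  # assumption: 6 months approximated as 183 days


def summarize_bmi(diagnosis_date, measurements):
    """Return {interval index: (mean BMI, median BMI, count)} for (date, bmi) pairs.

    Interval 0 covers the first ~6 months after diagnosis, interval 1 the next, and so on.
    """
    by_interval = defaultdict(list)
    for measured_on, bmi in measurements:
        days_since_dx = (measured_on - diagnosis_date).days
        if days_since_dx < 0:
            continue  # assumption: measurements before diagnosis are excluded
        by_interval[days_since_dx // DAYS_PER_INTERVAL].append(bmi)
    return {
        interval: (mean(values), median(values), len(values))
        for interval, values in sorted(by_interval.items())
    }


# Hypothetical patient.
example = summarize_bmi(
    date(2015, 1, 1),
    [(date(2015, 1, 10), 30.0), (date(2015, 3, 1), 28.0), (date(2015, 9, 1), 26.0)],
)
print(example)
```

Intervals with no measurement are simply absent from the result, which makes the non-uniform windows noted above visible rather than silently carrying a value forward.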

Jake Hughey, Biomedical Informatics

  • I’m studying the association between the presence of a preprint and the altmetric score and number of citations to the corresponding peer-reviewed article. This is an observational study, and I’d like to get feedback on my analysis and interpretations.
  • Interested in comparing metrics (altmetric score and number of citations) between papers that have a preprint and those that don't. Preprints can be updated (optionally) but are never removed. Preprints are relatively new in the life sciences. Most journals accept preprints, however some explicitly do not publish manuscripts which already have a preprint.
  • Analysis planned is regression, with log transformation on Altmetric attention score and number of citations. Number of citations x preprints, adjusting for MeSH terms (assigned to almost every peer-reviewed article). Number of MeSH terms varies from one journal to another, so principal components are calculated journal-by-journal, and the plan is to produce a different model for each journal with the top 10 PCs for each model. Random effects meta-analysis model to provide aggregate estimates. Meta-regression then used. Using 4 years' worth of data. Not including an interaction term between time since publication and preprint.
  • Suggest using a mixed model and simplifying the analysis. Suggest using weight and height rather than BMI. (Data is set up in long format with multiple observations per person.) BMI changes over time - not fixed - so this method allows you to use all observations available.
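
The random-effects step that pools the per-journal estimates could look roughly like the sketch below. The notes do not say which estimator was planned; this uses the DerSimonian-Laird method as an assumption, and the per-journal coefficients and variances are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Pool per-journal estimates with a DerSimonian-Laird random-effects model.

    estimates: per-journal coefficients (e.g. log-scale preprint effect).
    variances: squared standard errors of those coefficients.
    Returns (pooled_estimate, pooled_standard_error, tau_squared).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-journal variance, floored at 0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical coefficients from two journals with equal precision.
pooled, se, tau2 = dersimonian_laird([0.1, 0.5], [0.01, 0.01])
```

Because the outcomes are log-transformed, `math.exp(pooled)` would be read as a fold change. A single mixed model with journal as a random effect, as suggested in clinic, would replace this two-stage approach.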


Laura Wang, Dermatology

  • We are using a GVHD consortium data set to look at how skin GVHD disease progression may predict non-relapse mortality. We have the body surface percentage affected by erythema for follow-up visits at 6-month intervals, in addition to relapse date and death date. We would like to see how the rate of change correlates with non-relapse mortality.
  • Background: Bone marrow patients receive transplants which attack host cells. Skin is the most commonly affected organ (erythema or sclerosis) and is the best clinical predictor of how well the patient will do. Predictors are type of skin disease and percentage of body affected.
  • Study: Does rate of change in body surface area pct correlate with outcome? Outcome is overall survival or non-relapse mortality. There are 13 all-cause deaths in the current data: sample size becomes an issue, as you need 10-20 events per parameter in the model.
  • Suggestions: Model should include age, BSA% at visit 1 (initial erythema), and one of the change-over-time parameters (not both). Consider adding age as a non-linear term (age-squared, or restricted cubic spline) if you have enough patients in the model and can put an extra parameter into the model. Multiple univariate analyses are harder to interpret and not recommended, since we cannot tell how parameters affect outcome when accounting for one another. If including all incident cases, would need to add an additional parameter to denote acute versus chronic, but extra cases would possibly allow you to add more parameters.
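
The events-per-parameter rule of thumb quoted above can be turned into a quick budget. This is only the arithmetic behind the 10-20 events guideline; the function name is made up for illustration.

```python
def parameter_budget(n_events, events_per_parameter):
    """Largest number of model parameters the rule of thumb supports."""
    return n_events // events_per_parameter


deaths = 13  # all-cause deaths reported in the current data
liberal = parameter_budget(deaths, 10)       # 10 events per parameter
conservative = parameter_budget(deaths, 20)  # 20 events per parameter
```

With 13 events the budget is at most one parameter, which is why sample size is flagged as an issue and why additional cases would be needed before adding terms to the model.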


Alex Cheng, Biomedical Informatics

  • We are planning a prospective study to assess the relationship between treatment workload, capacity to manage care, and outcomes in patients undergoing treatment for breast cancer. We have put together a collection of surveys from PROMIS and other sources to give to patients over 5 months after the start of treatment. We need some help coming up with the proper analytical plan and sample size calculation for the study. One previous study closely resembles this one. However, we want to draw a more direct relationship between the imbalance of workload and capacity and outcomes.
  • Seeking VICTR voucher/help with study protocol
  • Survey data collected via REDCap for 1 medical center; survey is currently 96 Qs
  • Hypothesis: Imbalance of workload and capacity can result in worse outcomes in breast cancer. Patient workload (personal life + medical demands) versus patient capacity (resources available to the patient: finances, insurance, etc). Objectives: demonstrate the correlation of imbalance with health outcomes in patients undergoing treatment for breast cancer. n=104 (52 lost to FU)
  • Planning to perform MLR
  • Recommendations: likely need more cases, especially since half of the patients don't have complete data. 400 patients required to estimate a correlation with a margin of error of 0.1. Ideally reduce number of questions on survey to fewer than 30 - can give different questions to different people. High risk of non-response bias, since non-response is likely related to workload. Could collapse dimensions via factor analysis/variable clustering. Pre-specify the strategy of dimension reduction but not final summarization. Could follow up with patients only once, randomising what time they are contacted, to increase independence and reduce number of dropouts.
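
The 400-patient figure can be checked with a short calculation. The sketch below assumes the margin of error is the half-width of a 0.95 confidence interval built with the Fisher z-transformation, evaluated at r = 0 where the interval is widest; the notes do not state which method was used.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Fisher z confidence interval for a Pearson correlation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def margin_of_error(r, n, level=0.95):
    """Half-width of the interval on the correlation scale."""
    lo, hi = correlation_ci(r, n, level)
    return (hi - lo) / 2.0


moe_400 = margin_of_error(0.0, 400)  # just under 0.1
moe_104 = margin_of_error(0.0, 104)  # the planned n
moe_52 = margin_of_error(0.0, 52)    # if half are lost to follow-up
```

Under these assumptions the planned n=104 gives a margin of error of roughly 0.19, and it widens further if only the 52 patients with complete follow-up contribute.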

Audrey Bowden, Biomedical Engineering (walk-in)

  • Hypothesis: clinic OCT can identify CIS (carcinoma in situ) against inflammation in bladder cancer. Training group has received a biopsy; ideally biopsy would be avoided due to comorbidities. Recommend to search for a graded histology (rather than binary yes/no) to train group to the highest signal.
  • First need to identify those included in the study and clear study outline. For sample size, base it off prevalence of parameter of most interest in population.


Benjin Facer, Epidemiology

  • I am using the National Cancer Database to compare outcomes between laparoscopic surgery and robotic-assisted laparoscopic surgery. I have run several comparison tests, which have resulted in various p-values and confidence intervals, but I would love some guidance on if I’ve used the right tests and am interpreting them correctly. Data is in R. Time frame of 2010-2014, with follow up through 2017.
  • Comparing robotic surgery versus laparoscopic surgery. Outcomes being measured are 5-year overall survival, conversion to open percentage, and length of hospitalization. Only some have a biopsy, so not everyone has a wait time between biopsy and surgery; explain to the reader in the manuscript that this is the case. Reasons given for type of surgery are not available. Robotic surgery is available in 30% of hospitals and used in 70% of surgeries; depends on availability of robot and surgeon experience level with use of robot. No randomized trials so far.
  • Need to ask preliminary question: what is the propensity for a patient to get robotic surgery? To denote randomness of getting robotic-assisted surgery. Need robot availability data, geographical location, busy-ness of robot, surgeon experience level, patient preference, etc. First paper: propensity model to use robotic procedure. Then in second paper, analysis to compare outcomes.
  • Is there a specialty that only does laparoscopic or only does robotic? Need to consider differences in patient characteristics, institutional characteristics, surgeon characteristics; case experience volume of individual surgeons or centers/institutions...
  • Models: Logistic regression for conversion to open (yes/no), including center-level characteristics, propensity score...; Cox model for length of stay in hospital. Will report the coefficient and confidence interval of the surgery type.


Sarah Osmundson, OB/GYN & Maternal/Fetal Medicine

  • 1) Want to compare patient-reported opioid use to use documented from track caps. 2) Have dates/time of opioid use after discharge for cesarean and want to graphically present data.
  • Outcome: pill unused (Pillsy cap pill tracking)
  • Want to compare patient report to Pillsy report. Also describe pattern of opioid use over time and interaction with ibuprofen. n=~176, ~100 with Pillsy. Online survey sent two weeks after discharge. Need to consider date of delivery (people d/c on different days post delivery). Pillsy data imported into REDCap. Frank suggested event chart, plotting events over time. Time in days (or fraction) on x-axis. Could select five or so by algorithm to present in manuscript. Recommend using delivery date, not discharge date; could stratify by days in hospital. Possible time-to-event analysis, or time to milestone. Could do scatter/bubble plot of self-report vs Pillsy.


Dylan Williamson (Walk In), Ped endocrinology

  • Ashey Shumaker is PI. We need the PI to provide meaningful assistance. Question is related to z-scores and importing z-scores into the database. Lots of problems with standardization in the population.


Inga Saknite

  • Hematopoietic cell transplantation (HCT) is the only potentially curative option for an increasing number of patients with hematologic malignancies and other non-malignant conditions. 20,000 allogeneic HCTs are performed annually in the US. Graft-versus-host disease (GVHD) occurs when the transplanted immune system recognizes the host as foreign and mounts an immune response. Acute GVHD (aGVHD) develops in 30-60% of patients following HCT, is one of the leading causes of mortality in the immediate post-transplant period, and is associated with substantial morbidity and mortality. Both timing and accuracy of aGVHD diagnosis are important areas of unmet need in the first 100 days post-transplant. Although the diagnosis is relatively certain if multiple organ systems are involved (i.e., skin rash, diarrhea, and increased bilirubin), many of these correctly diagnosed patients die because it is difficult to halt the inflammatory cascade at this stage of clinical presentation. Treatment decisions are highly dependent on the diagnosis, and need to be made quickly. Early intervention is vital to reduce mortality, and identifying early signs of aGVHD before clinical presentation is an important unmet need. An imaging biomarker could lead to improved outcomes by supplementing clinical decision-making and reducing delays in treatment.
  • The pathogenesis of aGVHD involves the activation and expansion of donor leukocytes which mediate cytotoxicity against host cells. The inflammatory response causes increased expression of specialized endothelial proteins on vessel walls making leukocytes roll, adhere and eventually extravasate into the tissue at a high rate. The nature and kinetics of leukocyte migration are thus intimately connected to aGVHD pathophysiology. Other groups have described and characterized dynamic leukocyte motion by intravital microscopy in mice. Important parameters include the level of leukocyte rolling (number of leukocytes rolling per minute per vessel length), adhesion (leukocytes stationary >30 seconds), and the rolling leukocyte velocity. The level of leukocyte rolling and adhesion can be seven times higher in GVHD compared to control mice. Leukocyte-endothelial interaction has previously been observed by RCM in human skin, but has not been explored clinically. We will assess all three of these parameters as potential imaging biomarkers by testing their ability to discriminate presence from absence of aGVHD. Study ends when patient gets GVHD.
  • Aim: Test the feasibility of confocal imaging biomarkers in 30 patients to predict the development of aGVHD. We will track patients prospectively through multiple imaging sessions over the course of the first 30 days post-HCT. First, we will longitudinally image 15 patients over 30 days by using the Vivascope1500. We hypothesize that there will be a significant difference in the maximum number of rolling and adherent leukocytes between those who did and those who did not develop aGVHD within 60 days post-HCT. Second, we will image 15 more patients by using the high-speed, portable confocal microscope. We hypothesize that the high-speed, portable confocal microscope enables a more precise measurement of the quantitative parameters, and a reduced imaging time.
  • Question 1: What is the best approach to test the statistical significance of data of 2 groups (control vs. disease) when the data is acquired longitudinally (specific parameter changes over time after transplant)?
  • Question 2: We have preliminary data of a cross-sectional study (disease vs. control), 10 patients in each group, 2 parameters for each patient (number of adherent leukocytes, number of rolling leukocytes) at only 1 timepoint (NOT longitudinal data). For an R21 grant, we would like to discuss power analysis calculation.

  • Recommendations:
  • a) Investigate the correlation between adherent and rolling leukocytes. If there is some correlation, consider combining them in a model. Let the data speak for itself. There could be increased adherence and rolling prior to developing GVHD. Three observations were excluded from graphs because they later developed GVHD, but they should be included in the model. Use grade of GVHD (Glucksberg scale). Goal is to predict (with a prospective longitudinal study) GVHD; this is hard to do with a dichotomous variable, and would need a large sample size in order to do so. Determination of GVHD comes from multiple organ systems, but it is typically better to measure one system really well rather than dichotomising.
  • Think about this project as learning about trajectories; being able to classify by following trajectories or following trajectories of those who are already in either group. To estimate probability of GVHD without knowing prevalence, need at least 96 patients (just to estimate intercept of logistic reg model, without biomarkers). If you have an idea of prevalence, can estimate sample size with that known range and sample size will likely be smaller. Don't consider forced classification (GVHD or not) but rather use a tendency outcome. Typically requires minimum of 200 patients for only one biomarker.
  • b) If considering a proof of concept study, to see if something can distinguish the two groups, can search for a signal in the marker; allows for equal numbers in groups. Look at distribution of GVHD versus non. Nonparametric comparison of medians (Wilcoxon test) possible, however many observations still needed; power calculation based on Wilcoxon test required. (Large outlier in current data implies large possible variability.) Test at the 0.025 level in current data. Pay attention to confidence intervals; if you want the CI to be half as wide, you need 4x as many observations.
  • c) For longitudinal study, examine slope change. Longitudinal mixed model possible, however with pattern unknown hard to know how much data. Longitudinally, probably only able to describe data, rather than test it.
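
The "at least 96 patients" figure and the "half as wide needs 4x" rule both follow from the usual normal-approximation sample size for a proportion. The sketch below assumes the 96 refers to a 0.95 margin of error of ±0.1 at the worst-case prevalence of 0.5, which the notes do not state explicitly; the prevalence of 0.2 is a hypothetical value.

```python
from statistics import NormalDist


def n_for_proportion(margin, p=0.5, level=0.95):
    """Approximate n so that a proportion near p is estimated within +/- margin."""
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return crit ** 2 * p * (1.0 - p) / margin ** 2


n_unknown_prevalence = n_for_proportion(0.1)       # about 96
n_half_margin = n_for_proportion(0.05)             # four times as many
n_known_prevalence = n_for_proportion(0.1, p=0.2)  # hypothetical prevalence
```

The required n scales with 1/margin², which is where the factor of four comes from, and any prevalence away from 0.5 lowers the requirement.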


Garrett Booth, Pathology

  • QI project looking at US chargemaster costs for blood products. Help in statistical analysis for various blood product costs. Help in geographical mapping of cost data.
  • Background: Wants to be able to mentor others within pathology department about using biostats services. Every procedure carries a CPT code so that people can be billed. 1% of hospitals operating budget goes to path/blood products. No one knows true cost of blood.
  • Purpose of study: to identify true cost of blood that goes to patients and look for regional trends. What is the best way to look at blood products? By type of cell? By procedure (some procedures require fractionating blood, which some institutions charge for and others do not)? Can we identify differences in hospital costs? Would like to look at common procedures and look at how in line (or out of line) certain hospitals lie geographically and cost-wise. Goal to write commentary about limited biological supply and arbitrary billing structure. How much of cost can be attributed to geographical effects; how much of variation in cost is explained geographically? Geographic location captured by zip code in dataset. Goal: demonstrate difference, then speculate reasons why. 78 academic medical centers included in data.
  • Notes: hospitals in expensive cost of living areas may reasonably increase costs for indeterminable reasons. Will not be able to differentiate those reasons, so some hospitals may appear to go against regional economic trends. Useful to use relative charges (e.g. bed-days), rather than absolute charges? Rates for procedures among different insurance companies are not publicly available. Red Cross (controlling 40% of blood supply) does not charge every hospital the same.
  • Recommendations: With the hospitals spread all over the country, there is not much use in geographical mapping. Generally, geographical analysis can be performed using GIS, which will use zip codes and can bring in census bureau information. (Problem: catchment area for some hospitals is very wide, inter-state.) Could identify private/public health insurance as a proxy; tells you who is not paying out of pocket. Useful to gather population density by zip code, or by census tracts; using address/lat/long, map those to FIPS codes or shapefiles (used by GIS) which have the characteristics to be used for geographical analysis. Cost data tends to be skewed, so nonparametric methods or log-transformation to normalise data are required. Storytelling using maps (thermometer plots) for comparing single products or grouping of products. Statistical model could include rural/urban variable (determined by population density; accounting for other possible explanatory variables) and raw charges, to create raw model and map those against adjusted charges to determine what amount of variation is explained by measurable things and what is not explained. See how the amount of things not explained by variables in the model varies by region. (Rurality, number of hospitals/hospital beds per capita in catchment, etc available in census data.)
  • Next steps: come back to another clinic before applying for VICTR voucher to further develop research plan; talk to Health Policy department for information re: health economics (John Graves)
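
As a first, unadjusted look at "how much of variation in cost is explained geographically", one option is the share of variance in log-transformed charges that lies between regions. The sketch below is an assumption about how that question might be operationalised, not the model discussed in clinic (which would also adjust for rurality and other census variables); region labels and charges are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def variance_explained_by_region(charges):
    """Share of variance in log charges that lies between regions.

    charges: iterable of (region, charge) pairs with positive charges.
    Raises ZeroDivisionError if every charge is identical.
    """
    logs = [(region, math.log(charge)) for region, charge in charges]
    grand_mean = mean(v for _, v in logs)
    by_region = defaultdict(list)
    for region, v in logs:
        by_region[region].append(v)
    ss_total = sum((v - grand_mean) ** 2 for _, v in logs)
    ss_within = sum(
        (v - mean(vs)) ** 2 for vs in by_region.values() for v in vs
    )
    return 1.0 - ss_within / ss_total


# Hypothetical charges: regions differ, hospitals within a region agree.
all_between = variance_explained_by_region(
    [("A", 100.0), ("A", 100.0), ("B", 200.0), ("B", 200.0)]
)
# Hypothetical charges: same spread of charges in every region.
none_between = variance_explained_by_region(
    [("A", 100.0), ("A", 200.0), ("B", 100.0), ("B", 200.0)]
)
```

Working on the log scale addresses the skewness noted above, and differences in log charges can be read as ratios rather than dollar amounts.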


Rachel Koch, Surgery

  • I need help with coding of string variables from REDCap and then would like to confirm that I am using the correct test to compare groups given my data and perhaps also to discuss ways to find the most interesting results from all of the data. Mentor confirmed, may be late.
  • Project: Perceptions of underserved care in Kenya, by residents in program. Comparing residents who went through program before/after rotation was implemented.
  • Issues with Likert scale responses: treating as linear and using mean, ties in data.
  • Recommendations: For analysis of survey data, give difference of means for unpaired data and margin of error/confidence interval. In dataset, create 'long' data with one variable for Likert score and one variable indicating in which group each participant is. Numerically code the Likert-scaled variables (after ensuring the ordering of string variables is correct using value labels) to use in t-tests. Use IF statement to select only those post-rotation to compare two Kijabe groups.
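
The coding and comparison steps recommended above might be sketched as follows. The Likert labels and their ordering, the group names, and the responses are all hypothetical; the interval uses a normal approximation rather than the t quantile a t-test would use, so with small groups it will be somewhat too narrow.

```python
import math
from statistics import NormalDist, mean, variance

# Assumed label ordering; verify against the REDCap value labels.
LIKERT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def to_long(responses):
    """Turn (group, label) pairs into long-format rows with a numeric score."""
    return [{"group": g, "score": LIKERT_CODES[label]} for g, label in responses]


def diff_of_means(rows, group_a, group_b, level=0.95):
    """Difference of means (a minus b) for unpaired groups, with an interval."""
    a = [r["score"] for r in rows if r["group"] == group_a]
    b = [r["score"] for r in rows if r["group"] == group_b]
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # normal approx., not t
    return diff, (diff - crit * se, diff + crit * se)


# Hypothetical responses to one survey item.
rows = to_long([
    ("post", "Agree"), ("post", "Strongly agree"),
    ("post", "Agree"), ("post", "Neutral"),
    ("pre", "Disagree"), ("pre", "Neutral"),
    ("pre", "Neutral"), ("pre", "Agree"),
])
diff, (lower, upper) = diff_of_means(rows, "post", "pre")
```

Filtering `rows` on the group variable before calling `diff_of_means` plays the role of the IF statement mentioned above.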

Madison Wright, Chemistry

  • This is part of a class assignment through Bruce Damon's Experimental Design for Biomedical Research Course. My project focuses on understanding protein-protein interactions as they pertain to protein folding. I'd like to address methods to evaluate data normalization of quantitative mass spectrometry based data sets.
  • Project: protein-protein interactions as they change through disease states. What are the best methods to normalize the data? (Tuesday clinic may be more helpful.) Base protein is in all six conditions being compared. Intra-disease comparison of proteins (~1000) with base protein, and inter-disease comparison of each protein (difference between each protein and base) across disease. Thirteen (independent) runs.
  • Recommendations: Log-scale the data if copy numbers are low. Investigate correlation to determine the appropriate sample size. If the 6 measurements from the six conditions in the same run are perfectly correlated, the sample size is effectively 13. If there is no correlation, the sample size is 6 x 13 = 78. Specify compound symmetric correlation structure in protein between diseases (any pair of the 6 you measure is equally correlated) to estimate rho/correlation. In the regression model, can choose to control for the base protein; could include the raw number in the model with the starting value as a covariate. Multivariate analysis will include six dependent variables. Beta on log-term in model denotes fold change normalization.
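
The two extremes quoted above (13 versus 6 x 13) are the end points of the usual design-effect formula for equally correlated measurements. The sketch below assumes that formula is what the recommendation had in mind; the value rho = 0.5 is hypothetical.

```python
def effective_sample_size(n_runs, per_run, rho):
    """Effective number of independent observations under compound symmetry.

    n_runs: independent runs (13 here); per_run: measurements per run (6);
    rho: common correlation between any two measurements in the same run.
    """
    return n_runs * per_run / (1.0 + (per_run - 1) * rho)


fully_correlated = effective_sample_size(13, 6, 1.0)  # 13.0
independent = effective_sample_size(13, 6, 0.0)       # 78.0
in_between = effective_sample_size(13, 6, 0.5)        # hypothetical rho
```

Estimating rho from the data, as recommended, places the study somewhere between these two limits.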


Jenna Dombroski, Biomedical Engineering

  • Request: Maria and I are students in Dr. Damon’s PHAR 8328 Experimental Design course. My project is to test the efficacy of a vaccine I have developed to prevent 4T1 breast cancer in a Balb/c mouse model. Maria’s project is to synthesize a dual functionalized liposome which will target and kill circulating tumor cells in the bloodstream before they can form a distant metastasis. Our questions are related to pilot studies, sample size and avoiding bias.

  • Maria: colon cancer metastasis. Studying a protein which comes out of cancer cells when they move through blood to other parts of the body. Staining and cancer cell images; protein appears as spots (puncta) on the cell surface in imaging. Getting 10-15 images of the puncta in the cell line. Needing to analyze the puncta on image (define, number - variation of puncta across cells - and size, typical shape, distribution/location). Using ImageJ particle analysis function. Will eventually build an AI. Staining/imaging process is long. Cell sizes are approximately the same.
  • Advice: Could look at density of distribution of puncta across cells. When measuring multiple units which mimic/influence each other (where there is less variability), more cells do not necessarily contribute new information. Could look at nearest-neighbor distances to evaluate distribution/location. In presenting, state assumptions (e.g. that cells are the same size). If two measurements are highly correlated (variable clusters) don't need to compare across both measurements, only one. Will help to establish the dimensions that you need to deal with, in order to organize output of interest. Recommend to follow-up with animal research biostats clinic. Recommend displaying all raw data with current data, due to number of cell lines. Scatterplot: number of puncta by another characteristic, colored by cell line. Could look into research of characterizing data on a sphere (contact Tom Stewart for contacts), parallel coordinate plots.

  • Jenna: Testing efficacy of vaccine in mouse model. Has performed pilot study: 3 test/3 control. Initial results: reduced tumor size. Goal: reduced growth, increased survival. Primary endpoint is time until death, following all mice for 6 weeks maximum (time until established tumor size). Measuring tumor size with imaging, every 2 days. Batch effect of housing mice together is unknown; assumed no effect.
  • Advice: Longitudinal profile recording size of the tumor over time would give most information/more power, using an endpoint of tumor size. Make a decision about how long to follow the mice (e.g. at end of 6 weeks, end follow-up of all mice). For full study, the researcher taking tumor measurements will need to be blinded.


Brett Byram, Biomedical Engineering (VUSE)

  • We are interested in doing A/B testing of some images as a way to assess improvement, but we would like to have a brief chat about the experimental design and how to analyze the data before we go further.
  • Outcome: Grant/abstract
  • Two images: which one is better, or are they similar? Question: Which image is preferred by the physician?
  • Design: Want to assess consistency as well. 10 images, repeated.
  • Could assign 0.5 point to answer "C" (similar).
  • Rather than a binary answer, could use a "slider" to capture how much a physician prefers the image.
  • With a small number of readers, they need to read more images. Can then check intra-rater reliability, and assume that these readers are representative of the population of all readers.
  • Could also do three images, three sliders. Or instruct readers to assess the first, and then compare the others to the first one.
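
The 0.5-point scoring for "similar" answers and the repeated-image consistency check could be sketched as below. The answer codes ("A", "B", "C"), the mapping of "A" to a full point, and the reader's answers are assumptions for illustration; raw agreement is used as the simplest consistency measure and is not corrected for chance.

```python
SCORES = {"A": 1.0, "B": 0.0, "C": 0.5}  # "C" means the images look similar


def preference_score(answers):
    """Mean score: 1 means image A always preferred, 0.5 means no preference."""
    return sum(SCORES[a] for a in answers) / len(answers)


def intra_rater_agreement(first_pass, second_pass):
    """Fraction of repeated image pairs given the same answer both times."""
    same = sum(x == y for x, y in zip(first_pass, second_pass))
    return same / len(first_pass)


# Hypothetical answers from one reader on 10 image pairs, shown twice.
first_pass = list("AACBAABACA")
second_pass = list("AACAAABACB")
score = preference_score(first_pass)
agreement = intra_rater_agreement(first_pass, second_pass)
```

A slider response would replace the three-level score with a continuous value, which carries more information per reading.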


Lara Harvey, Gyn

  • A comparison of surgeon times and scores on 3 simulation trainer tasks before and after a training session in Haiti. Question regarding best statistical test to compare times and scores.
  • No funding support expected.
  • 7 surgeons testing 3 skill sets in laparoscopic surgery technique; want to evaluate time and OSATS score pre- and post-intervention. Recommended to use descriptive tables and figures to describe data. Inferential stats not recommended due to small sample size. Wilcoxon rank-sum may be used.

Alexander Hawkins

  • Overview: Robotic surgery, with articulated instruments and the ability to perform delicate dissection in the pelvis, has been thought to offer an advantage over traditional laparoscopy. The specific aim is to determine if there is a difference in the rate of negative margin status between patients undergoing laparoscopic versus robotic resection.
  • Data: National Cancer Database
  • Design: Retrospective cohort of laparoscopic and robotic approach for patients undergoing resection for rectal cancer.
  • Endpoint: negative margin status
  • Funding: Will apply for VICTR biostatistics voucher
  • Recommendations:
    • Adjust for potential confounding due to surgeon choice using propensity score methods.
    • Seek biostatistics voucher
    • The hours of support required for this project are projected to fit within the standard voucher.


Reza Ehsanian, PM & R

  • Design: Cross sectional population based study.
  • Data set: Comprehensive pain reports categorically defined as head, spine, trunk, and limb pain; smoking history; demographics; medical history from a total of 2,307 subjects from the 2003-2004 National Health and Nutrition Examination Survey obtained from the Centers for Disease Control.
  • Objective: Examine the interrelationship between smoking and pain.
  • Have questions about the analysis conducted. Want to double-check our methods and potentially receive input on how to improve the analysis.
  • No funding support expected, mentor to attend by phone.
  • Result of discussion: best course of action is to obtain the data and start over, in order to appropriately defend the analysis. Since smoking is a key variable of interest, use pack-years, time since quitting, multi-level smoking status (e.g. never, former, current).

Yuri Kim, General surgery

  • Would like to conduct a retrospective review comparing clinical outcomes in trauma ICU patients who received a palliative care intervention.
  • VICTR voucher, mentor confirmed.
  • Trauma subjects with palliative consult or no consult
  • Need sample size
  • Outcomes: utilization: LOS, cost
  • Propensity vs. regression: see Frank Harrell's write-up
  • Frank: back up, consider what factors are important in real time.


Jake Hughey, Biomedical Informatics

  • I am using the SD to identify medications that are associated with false positive drug screen results (where the sample initially tests positive by immunoassay, but then negative by the gold standard mass spec). I would like to know if my approach, which is based on constructing 2-way contingency tables, is reasonable.
  • Background of study: Immunoassays designed to recognize specific drug or class of drug. Then confirmed based on a more specific assay (standard practice). Immunoassays can recognize other molecules/compounds than what they're designed for. Systematically going through SD to use lab test results along with medication information to determine which drugs associated with false positive screen.
  • Question of interest: What is the probability of having a false positive screen (of a particular sample having a positive screen result and a negative confirmation result)?
  • Output measurement: 0 (screen negative) or 1 (screen pos, conf neg). Retrospective review of what medications the patient had an order for in the previous 30 days (arbitrary amount of time, drug likely to be in urine by that time). Only prereq: patient had their very first visit at least 30 days prior to screen (urine sample), in order to know what medications they're on. Analysis excludes patients who had both positive screen and positive confirmation. There is a small percentage of patients who have negative screen and positive confirmation, but they don't fit well into the study framework. Two of the medication compounds are similar/overlapping, however the medications tested are fairly distinct. Each observation is an individual screen, so the same patient may have multiple observations. Confidence of capturing medication data in patients. (OTCs are not documented, PCP may not be at Vanderbilt, brand names/generic are grouped together into same variable.) Testing 700 ingredients across screens. Looking at correlations in medication usage, calculating pairwise Pearson correlations between top ~20 ingredients.
  • Recommendations: Look at confidence intervals instead of p-values, as CIs will give information about magnitude. Candidates that need to be in the combination are the ones which are not independent of each other/those which co-occur a lot (based on raw counts): use a logistic model including all of these combinations and the second-order interaction of those which co-occur a lot. Need at least 200 events ("1" outcomes in the dataset) to stabilize the logistic model (at least 5 people - not measurements but actual people - must have had a false positive on that medication for that medication to be included.) Could extend the available data by stacking the data/combining data from all screens; correct later for faking the sample size.
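
For a single ingredient, the 2-way contingency table approach combined with the advice to report confidence intervals could look like the sketch below. It uses the standard Wald (Woolf) interval for an odds ratio; the counts are hypothetical, and the calculation treats screens as independent, which ignores the repeated screens per patient noted above.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Wald interval from a 2x2 table with no zero cells.

    a: ingredient ordered, false positive    b: ingredient ordered, screen negative
    c: not ordered, false positive           d: not ordered, screen negative
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    interval = (math.exp(log_or - crit * se), math.exp(log_or + crit * se))
    return math.exp(log_or), interval


# Hypothetical counts for one ingredient.
odds_ratio, (lower, upper) = odds_ratio_ci(20, 80, 50, 950)
```

The interval conveys magnitude in a way a p-value does not. The logistic model recommended above extends this by adjusting for co-occurring ingredients rather than examining them one table at a time.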


Aaron Brill, Radiology and Epidemiology

  • Project: 35,000 patients treated between 1946 and 1968 for hyperthyroidism with different combinations of I-131, antithyroid drugs, and surgery. Mortality data updated through 2015 on 90%. Therapy not randomized. Much comorbidity and biased treatment allocation. Known small radiation risk. To avoid potential radiation risk, antithyroid drugs were used preferentially. Need to look at how different outcomes correlate with therapy, including effects on longevity, a potentially positive effect. Data regarding I-131 risk have been analyzed in collaboration with NCI, but that analysis has not included drug and surgical therapy. As the initial study PI, I want to look at the data as a Phase 4 type study to look for unexpected correlations, and I need to find a statistical approach and a statistician interested and skilled in using the tools needed for such an analysis. Data are at NCI, and their collaboration will be needed.
  • Hoping to have a more clear analysis plan by M Jan 14
  • Advised to call it an 'epidemiological cohort study' rather than Phase IV study.
  • With many comorbidities, database will need thousands of outcome events to use individual comorbidities. May need to use comorbidity index to approximate impact of comorbidities present. Will need to choose the appropriate comorbidity index for your project.
  • If dataset has baseline information collected prior to treatment allocation, then a propensity score could be included as a covariate in a regression model. What were the physicians thinking when they made the treatment allocation? Factors may include calendar time, etc.
  • Swedish paper excludes many patients in their cohort, which may cast doubt on methods. Comorbidities could be included directly as covariates.

Jae Jeong (JJ) Yang, Epidemiology

  • Project: I am working on a cohort study to examine the associations of baseline characteristics (i.e., lifestyle and dietary factors) with weight change during follow-up using a multivariate mixed effects model. I would like to have your comments on how to select adjustment variables for our mixed models.
  • 18,000 patients in dataset with baseline time point. Outcome is continuous variable of weight. Exclude patients with severe disease at baseline. When a patient develops a severe disease, they are excluded, and when a patient reaches age 70, they are excluded from the study. Data from Southern Community Cohort Study. Follow-up data is collected at yearly intervals.
  • What are primary covariates of interest? Lifestyle, psychosocial factors, medical hx. With all covariates included in model, some are significant and some are not.
  • If goal is inference, recommended not to use a variable selection procedure and to include all variables. Automatic variable selection causes CIs to become too small and type I error rate is not protected. If goal is prediction, can use a variable selection method.
  • Due to the size of the cohort, the number of covariates included in the model is not a concern.
  • Analysis done by sex and race.
  • Outcome variable should be what you measure in the follow-up and baseline variables could be nonlinear. (For age and weight, could put variable + variable^2; or could put an interaction term in as a secondary analysis.)
  • One model: baseline covariates. Second model: baseline and follow-up covariates. (Test R^2 for change/effect of follow-up time points.) Third model: include interaction terms.
  • If lack of follow-up is related to issues other than baseline characteristics, need to state this in limitations.
Topic revision: r1 - 15 Jan 2021, DalePlummer
