Data and Analysis for Clinical and Health Research Clinic Notes (2020)


2020 December 17

Amany Alshibli, Bantayehu Sileshi, Anesthesiology

  • We are conducting a retrospective analysis of perioperative data collected in REDCap as part of the ImPACT Africa program to understand the effect of the COVID-19 pandemic on surgical care and outcomes. I have done some work analyzing missingness of the data and have some questions on how this will affect my analysis and if we should perform imputation. I would also like to go over some questions from my univariate analysis and logistic regression if time allows. We are anticipating submitting a conference abstract by mid-January. Mentor confirmed.
  • Primary outcome is 28-day mortality. Goal to assess differences among Phase 0 (pre-pandemic), Phase 1 (pandemic with restrictions), and Phase 2 (pandemic without restrictions). There were fewer elective surgeries during Phase 1.
  • A patient's 28-day mortality status is unknown if the patient could not be contacted during the follow-up period. If the outcome and covariates are Missing Completely at Random (MCAR) - based on clinical knowledge - and there are similar rates of missing data over time, then recommend running a complete case analysis and comparing the results to analysis of a multiply imputed dataset (R package 'mice'). The analysis using imputed data will be able to include more patients and have greater power. The chi-square test can be used to assess overall differences in the distribution of categorical variables among the three phases. If the result is statistically significant, then pairwise comparisons may be used to determine exactly which phases are different. Can plot histograms of case volume per day (week) separately for each phase with time on the x-axis. Recommend testing the daily (weekly) total number of cases in each time period. Can also use the Kruskal-Wallis test to assess differences in the distribution of case volume per day (week) across the phases. In the logistic regression model for 28-day mortality, should include calendar month as a covariate to adjust for seasonal time effects. Due to small referral counts by district, can categorize as Regional or Outside Region for each patient and assess differences across phases using the chi-square test. May also consider grouping districts into a smaller number of regions before testing. If the expected count in any cell is less than 5, recommend using Fisher's exact test. Referral counts can still be plotted by all districts.
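
A minimal sketch of the Regional vs. Outside Region comparison across the three phases, including the expected-count check that decides between the chi-square test and Fisher's exact test. The counts are hypothetical and only illustrate the calculation; the function name is an assumption, not part of the clinic's analysis. The closed-form p-value used here holds only for a 2 x 3 table (2 degrees of freedom).

```python
import math

def chi_square_2x3(table):
    """Pearson chi-square for a 2-row by 3-column table of counts.

    Returns (statistic, p_value, expected). With 2 degrees of freedom the
    chi-square survival function has the closed form exp(-x / 2).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count in each cell under independence of row and column.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    p_value = math.exp(-stat / 2)  # valid only for df = (2-1)*(3-1) = 2
    return stat, p_value, expected

# Hypothetical counts: rows = Regional, Outside Region; columns = Phase 0, 1, 2
referrals = [[120, 60, 90], [30, 10, 25]]
stat, p_value, expected = chi_square_2x3(referrals)

# If any expected count is below 5, Fisher's exact test is preferred instead.
small_cells = any(e < 5 for row in expected for e in row)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}, any expected < 5: {small_cells}")
```

For tables of other dimensions, or when Fisher's exact test is needed, a statistics package would normally be used rather than this hand calculation.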

2020 December 10

Jennifer Laws, Whitney Browning, Pediatrics

  • We have created a standardized nighttime curriculum focusing on subspecialty topics to enhance nighttime teaching as well as fill in topics taught during the noontime curriculum, which are missed during nightfloat rotations. We have a pre and post test that interns have completed. Our question for the VICTR clinic is whether we have the power needed to appropriately study the data and what the best statistical tool is to study the data. Mentor confirmed.
  • Recommendations: Use a paired t-test to compare scores from baseline to post 6 months for participants. Mean scores of controls who have not completed the curriculum can be described when comparing the mean scores of participants. To calculate power, try using the Power and Sample size software (https://biostat.app.vumc.org/wiki/Main/PowerSampleSize).
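
A minimal sketch of the paired t statistic for baseline vs. post scores, computed on within-person differences. The scores below are hypothetical and the function name is an assumption; a p-value would come from the t distribution with n - 1 degrees of freedom, which is left to statistical software.

```python
import math
import statistics

def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have one score per participant")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD of the differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical test scores for six interns
pre_scores = [10, 12, 9, 14, 11, 13]
post_scores = [13, 14, 10, 17, 12, 15]
t_stat, df = paired_t(pre_scores, post_scores)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```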

2020 November 19

Caitlin Jacowski, Uchenna Anani, Neonatology

  • Survey of physicians in 3 different subspecialties (OB/MFM, Neonatology, Cardiology) on their opinions regarding interventions for infants with Trisomy 18. We are trying to see if there are differences in which interventions are considered acceptable between the subspecialty groups. We are also asking whether these opinions agree with counseling styles. Mentor confirmed. VICTR biostatistics voucher.
  • Meeting Notes: Survey includes questions based on 4 different scenarios and questions on counseling styles. Plan to distribute survey to national professional associations and expect a 50% response rate (~2000 physicians).
  • Recommend Kruskal-Wallis test to assess differences in the 3 subspecialties. Dan Byrne will provide information on how to evaluate and minimize non-response bias, assuming the overall population characteristics are known. Document calendar time when participants complete the survey and subgroup into early vs. late responders (more like non-responders). In cover letter, include estimated time to complete the survey. Watch order of possible responses in columns and be consistent throughout survey. Recommend using PS software (https://biostat.app.vumc.org/wiki/Main/PowerSampleSize) to compute possible differences and sample sizes. May consider removing "neutral" as a possible response and force respondents to pick a side. Can create forest plots to analyze scenarios. Recommend having VICTR review wording of questions to verify clarity and pilot testing the survey. To improve response rate, may want to reach out to leadership to encourage participation and to highlight the benefits.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

Marissa Brakefield, Bobo Tanner, Medicine - Rheumatology / Nephrology

  • Please provide a short description of your project and the questions you’d like to address: We are performing a retrospective safety study to determine if IV infusions of zoledronic acid in kidney transplant recipients are associated with acute kidney injuries or long-term renal damage. We will do this by analyzing serum creatinine pre- and post-infusion for each zoledronic acid infusion. We have data on 424 unique patients, with many having received more than one zoledronic acid infusion. Typically, patients received one infusion approximately annually. However, most patients receive an infusion slightly more than one year after their previous one. We need help with the entirety of our statistical analyses, but especially 1.) determining which software is best to use to store the data in a way that will make statistical analysis easiest (e.g. Excel? Access?); 2.) determining how to account for the time discrepancies between infusions (e.g. patient A receives their 2nd infusion at 1 year and 7 weeks after their first infusion, patient B receives their 2nd infusion at 1 year and 3 months after their first) and determining how to set up our spreadsheet in a way that takes this into account; 3.) would it be best to do a post-hoc power calculation?; 4.) which confounding variables should we include? We have thought of several, including comorbid conditions, concomitant medications (especially nephrotoxic ones), smokers vs non-smokers, as well as other lab measures such as serum calcium, PTH, and serum phosphate if available. Mentor confirmed. VICTR biostatistics voucher.
  • Meeting Notes: Patients are prescribed zoledronic acid infusions to prevent bone density loss. Plan to use baseline serum creatinine (SCr) and have patients serve as their own control. SCr (and other labs) are drawn just prior to and following the infusion in addition to physician-recommended SCr measurements throughout the year. There is no maximum number of infusions; over 12 years one patient had 11 infusions. Plan to collect DXA (t-scores) over time.
  • Recommend using REDCap database to enter chart review data; see recommended formatting (https://biostat.app.vumc.org/wiki/Main/DataTransmissionProcedures). If repeated SCr measurements are entered into the database in a "wide" format, then the biostatistician can always translate the data to a "long" format. Recommend using survival analysis methods to assess time to AKI diagnosis. Patients who have to stop infusions should be censored on that date when calculating time to event. A long-term prospective adaptive platform study could be planned for the future.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).
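
A minimal sketch of translating repeated SCr measurements from a "wide" layout (one row per patient, one column per measurement) to a "long" layout (one row per measurement). The column names (patient_id, scr_1, scr_2, ...) and values are hypothetical and are not the project's actual REDCap fields.

```python
def wide_to_long(wide_rows, id_key="patient_id", prefix="scr_"):
    """Turn one-row-per-patient records into one-row-per-measurement records."""
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            # Keep only measurement columns that actually hold a value.
            if key.startswith(prefix) and value is not None:
                long_rows.append({
                    id_key: row[id_key],
                    "visit": int(key[len(prefix):]),  # measurement number from the column name
                    "scr": value,
                })
    return long_rows

# Hypothetical wide data: patient A has two measurements, patient B has three
wide = [
    {"patient_id": "A", "scr_1": 1.1, "scr_2": 1.3, "scr_3": None},
    {"patient_id": "B", "scr_1": 0.9, "scr_2": 1.0, "scr_3": 1.2},
]
long_rows = wide_to_long(wide)
for record in long_rows:
    print(record)
```

The long layout handles patients with different numbers of infusions without adding empty columns, which is why it is usually the form used for longitudinal models.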

2020 November 12

Matt Smith, Gary Smith, Radiology

  • We are proposing a new method to detect pulmonary perfusion using fluoroscopy. Providing an alternative screening test for pulmonary embolism with a fraction of the radiation, lower cost, no contrast required, and increased accessibility will greatly enhance our ability to detect PE and provide early intervention. A human validation trial is planned by acquiring fluoroscopy before, during, and after a balloon is inflated in a pulmonary vessel. We would like assistance in determining the best way to prove our methods detect a reduction in perfusion and how many patients we should include. Mentor confirmed.
  • Meeting Notes: Recommend to collect the data after blinding readers to reduce bias when comparing the two methods. Collect information on number of defects and area of defect for each method. Then we can calculate sensitivity, specificity, and agreement between methods. Constructing 2x2 tables (defect: yes or no, method: 1 or 2) can also help determine whether there is a relationship between these two variables.

Kevin Schey, Biochemistry

  • 5 molecules were measured in human retinas and we are interested in: 1) age differences, 2) correlation b/w right and left eye, 3) sex differences, 4) central vs. peripheral differences. VICTR biostatistics voucher.
  • Meeting Notes: To see if there is a difference in A2E by region, conduct a Wilcoxon signed-rank test on paired data. Can also use this test with A2E levels as the outcome and other covariates of interest like sex. To see whether there is a difference in A2E by age, sex, and other covariates together, may want to consider regression analysis or longitudinal analysis to account for correlation. To see the relationship between A2E and A2DHPE, start by calculating Spearman's correlation and creating a scatterplot. If there is reason to believe that this second compound affects A2E, may want to include it as a covariate in the analysis. Recommended to apply for VICTR biostatistics voucher.

2020 November 05

Katherine Black, Michael Dole, Pediatric Gastroenterology

  • We are looking at the epidemiology of ingested foreign bodies before and during COVID. The questions revolve around how best to do the statistical analysis looking at risk factors (some binary, continuous, categorical) and outcomes (binary and categorical) and stratifying based on date seen. If possible, we also have a few Stata-based questions we would like help with. Mentor confirmed.
  • Meeting Notes: Have finished data collection for time periods Mar-Jul 2019, Oct 2019-Mar 2020, and Mar-July 2020. Can bootstrap confidence interval around the median. Recommend using Fisher's exact test or likelihood ratio test for location and type of foreign body. Can also calculate expected number of events (e.g. required endoscopy) in a later time period given the proportion of events in an earlier time period. Recommend creating a binary variable for COVID time period (Y/N), then build a logistic regression model to calculate odds of endoscopy in COVID vs. non-COVID period. Can create side-by-side histograms for age in years for each time period. Can also create spline graphs for age in years.
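
A minimal sketch of a percentile bootstrap confidence interval around the median, assuming the observations within a time period can be resampled independently. The ages are hypothetical, and the function name, the number of resamples, and the fixed seed are illustrative choices.

```python
import random
import statistics

def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the median of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_boot)
    )
    lower = medians[int((alpha / 2) * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical ages in years for one time period
ages = [1.2, 2.5, 3.0, 0.8, 4.1, 2.2, 5.6, 1.9, 3.3, 2.7]
lower, upper = bootstrap_median_ci(ages)
print(f"median = {statistics.median(ages)}, 95% CI = ({lower}, {upper})")
```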

2020 October 22

Ashton Lehmann, Otolaryngology/Rhinology

  • I am seeking to conduct a mixed-methods observational pilot study on shared decision-making and outcomes in a prospective cohort of patients with sinus disease who are considering/deciding on available management options. We will collect baseline and longitudinal decision-making metrics and patient-reported quality of life outcomes metrics (all survey-based) as well as data regarding social determinants of health, general health characteristics, and sinus disease-specific characteristics. We aim to identify baseline factors that predict decisional conflict in these patients (Aim 1), what factors predict decisional satisfaction and quality of life outcomes in these patients (Aim 2), and to characterize the experience of these patients regarding their sinus care and decision making (through semi-structured qualitative interviews of a subset of this cohort, Aim 3). I would like feedback and advice on my statistical analysis plans and power calculations for these 3 Aims. I am currently compiling a VICTR proposal to apply for biostatistical assistance for this pilot study. VICTR voucher request.

  • Meeting notes: For aim 1, first assess the correlation between two variables using Pearson’s or Spearman’s correlation coefficient depending on the distribution of variables. Create a scatterplot to visualize the relationship. Then conduct linear regression at baseline adjusting for other covariates of interest. If there is reason to believe that medical and surgical patients will have different values in SDM, Decisional conflict, and QOL scores, may want to stratify the descriptive table (Table 1) by two groups. For aim 2, conduct a linear mixed effects model adjusting for baseline value as a covariate with random subject effect. Can create a spaghetti plot to see how measurements change over time in individuals. Recommend to apply for VICTR biostatistics voucher. Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

2020 October 15

Dana Brantley-Sieders, Medicine/Rheumatology

  • Please provide a short description of your project and the questions you’d like to address: Our previous work identified EphA2 as a potential mediator of resistance to HER2-targeted therapies in cell line and xenograft models, though the clinical relevance in human cancer has not yet been established. The goal of this study is to determine if elevated expression of EphA2 and/or associated Src proteins correlates with poor outcome and resistance to HER2 targeted therapies in residual disease. Clinical collaborator Dr. Brent Rexer and his staff will search patient records to identify HER2+ patients with residual/recurrent disease following treatment with HER2 targeted therapies (e.g. Herceptin, Lapatinib). Dr. Rexer will then communicate with pathologist Dr. Melinda Sanders and her group to pull FFPE tissue samples from the archives and prepare sections for Dr. Brantley-Sieders’ laboratory to stain with validated anti-EphA2 and anti-Src antibodies (de-identified samples). Dr. Brantley-Sieders’ staff will stain and score staining based on % positive cells and intensity, creating a numeric score from 0-3. After scoring, Dr. Brantley-Sieders will communicate with Dr. Rexer’s group to correlate staining score with clinical outcome based on patient records. This study will be used to generate preliminary data for a grant application in which mechanistic and pre-clinical studies will be performed to determine if EphA2 and/or Src inhibitors can alleviate resistance to HER2-targeted therapies. The ultimate goal is to identify breast cancer patients likely to benefit from an experimental therapeutic designed by Bicycle Therapeutics (BT5528, currently in early clinical trials). VICTR voucher request.
  • Meeting Notes: Have tissue samples for 96 patients. Elevated expression defined as numeric score of 2 or 3. Can build a logistic regression model for poor outcome (Y/N) with predictors EphA2 expression (2/3 vs. 0/1), Src protein expression (2/3 vs. 0/1), and an interaction between the two expression variables. A similar model can be built for resistance to HER2 targeted therapies (Y/N). May also consider building additional models with expression coded as an ordinal variable (0, 1, 2, 3).
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

Meredith Mattlin, Ray Blind, Diabetes, Endo and Metabolism

  • A student (Meredith) has gathered up Synthetic Derivative EMR data, which we want to analyse. We’ve collected all records in the SD that contain opioid, hepatitis and liver cancer ICD codes, and want to ask if there are statistical correlations between these three disease states. Mentor confirmed.
  • Meeting Notes: Should exclude cases which have liver cancer prior to diagnosis of opioid abuse. Control subjects (no opioid abuse) should be selected using a similar calendar time. Recommend collecting additional clinical variables for all patients in the cohort and reformatting database (https://biostat.app.vumc.org/wiki/Main/DataTransmissionProcedures). Recommend summarizing additional descriptive statistics stratified by groups of patients (no diagnosis, opioid abuse, opioid abuse and liver cancer, etc.). Create matched pairs of cases and controls using the variables calendar time in system, hepatitis status, etc., so the matched pair is similar in every way except the opioid abuse diagnosis. Use all matched pairs to build a logistic regression model for liver cancer with predictors opioid abuse (Y/N) and all factors used to match cases and controls. Can then calculate odds of developing liver cancer for a patient diagnosed with opioid abuse compared to an otherwise similar patient who was not diagnosed with opioid abuse.

2020 October 08

Chelsea Gorsline, Gowri Satyanarayana, Infectious Disease

  • Previous clinic sessions July 16, 2020 & July 30, 2020
  • Presenting updates to prior Biostats Clinic discussions in July. Will focus on the survey portion of a 2-part study looking at antimicrobial use in the management of febrile neutropenia. Would like to review preliminary survey data and discuss the best ways to analyze and present survey data. Will bring basic tables and/or graphs with survey data to review. Mentor confirmed.

  • Meeting notes: Add percentages for each frequency in the table. Suggest to flip rows and columns, displaying the answer choices as rows. For checkbox type questions, tally the number of "Yes" responses for each answer choice. Separate out answers to 6 columns corresponding to 6 answer choices and tally the numbers in each column.

Sean Berkowitz, Shriji Patel, VEI - Ophthalmology

  • We are working on a project evaluating demographic trends in FDA drug approvals. We would love to vet the statistical approach as well as tests we have done on the data. We would appreciate your expertise regarding the approach to tests of association/fit with large sample sizes (and alternative approaches). Mentor confirmed.

  • Meeting notes: Use a Chi-square test to compare the association between participants by race and source (study versus general population). To compare the change in proportion of race over time (White and All Others), conduct a logistic regression in R using glm. The outcome here is the proportion of Whites and Other races joined by "cbind". Run 3 models: the first is with time as the covariate to see the change over time, the second is with time and source as covariates to see if the proportion in study is less than what you would expect based on the general population, the third is a model with an interaction between time and source.

2020 October 01

Olivia Boorom, Department of Hearing and Speech Sciences

We have collected longitudinal behavioral data (at three time points) on a cohort of infants, and would like to compare growth trajectories between behavioral measures. We’re unsure of the best statistical method for both modeling these trajectories and comparing different trajectories against each other. Mentor confirmed.

  • Meeting notes: Start by creating spaghetti plots to see the trajectories over time. For univariate analysis, can compare Spearman's correlation between different measurements at one time point at a time, and also conduct Mann-Whitney U tests. For multivariable analysis, recommend using a linear mixed effects model adjusting for random subject effect. Start with complete case analysis. Need to decide whether baseline measurement will be adjusted for as a covariate. If one assessment is hypothesized to predict another assessment, may want to adjust it as a covariate in the model. Use splines or measurement variable squared if believed to be U-shaped. If doing the analysis in R, recommend using lme4 package for models and emmeans for plots.

Soha Patel, OB/GYN, Maternal-Fetal Medicine

Induction of labor methods and outcomes; previously submitted for publication but need additional statistical analyses completed. VICTR voucher request.

  • Meeting notes: Start by describing the sample and their scores pre- and post- for both groups. Conduct Wilcoxon signed rank tests to see how knowledge changes between time points for each group separately. Then, use the change in knowledge as the outcome and compare how this variable differs between groups by using a Mann-Whitney U Test.

2020 September 24

RESCHEDULED: Soha Patel, OB/GYN, Maternal-Fetal Medicine

Induction of labor methods and outcomes; previously submitted for publication but need additional statistical analyses completed. VICTR voucher request.

WITHDREW: Joshua Lawrenz, Orthopaedic Surgery

This project is seeking to improve our ability to diagnose soft tissue tumors of the extremity prior to surgery. Anticipate using logistic regression to create a predictive model that could help clinicians. Am collaborating with a medical image processing team to apply their expertise to better understand images and incorporate them into a predictive model. Questions to address: 1) Study design - predictive model build 2) Sample size needed 3) Application of model. Mentor confirmed.

2020 September 17

Dylan Knox, Julie Pingel, Pharmacy

  • Methadone is a medication that has been studied for and is associated with QTc (measurement on an EKG) prolongation in adult patients, and currently there are mixed data in pediatric patients regarding QTc prolongation. QTc prolongation is important because it can lead to serious side effects such as fatal cardiac arrhythmias. QTc prolongation when methadone and another medication known to cause QTc prolongation are used concomitantly has not been studied. The primary objective of this study is to evaluate the frequency of QTc prolongation in patients on methadone and another medication known to cause QTc prolongation. Secondary objectives are to identify the incidence of concomitant QTc interval prolonging medications with methadone, evaluate change in QTc interval from baseline to highest QTc interval, and evaluate EKG practices. There will not be a comparator group. Would like to address: appropriate statistics for endpoints, power calculation when there is no comparator group and the incidence in the general population is unknown. Mentor confirmed.
  • Clinic Notes: Plan to use standard formulas to calculate QTcB and QTcF. All patients were in the cardiac ICU. Not all patients have a pre-drug baseline EKG.
  • Ideal to use each patient's own baseline EKG as a control. Important to note the proportion of patients who do not have a baseline EKG. Recommend generating spaghetti plots of uncorrected QTc over time (baseline on x-axis, follow-up on y-axis) and marking where drugs were started. These plots can be used to discover trends. Recommend adjusting individual patient plots for baseline uncorrected QTc and changing the line color when medications change to denote days on medication(s). If less than 70-80% of patients have baseline, then exclude patients without baseline and run a longitudinal analysis to discover long-term trends. It would be a worst case scenario if data are missing because the patient got sicker (not missing at random), and this would make analysis very difficult.
  • May consider applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).
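
A minimal sketch of the standard QTcB (Bazett) and QTcF (Fridericia) corrections, assuming the QT interval is recorded in milliseconds and the RR interval is derived from heart rate in beats per minute. The function names and example values are hypothetical.

```python
def rr_seconds(heart_rate_bpm):
    """RR interval in seconds from heart rate in beats per minute."""
    return 60.0 / heart_rate_bpm

def qtc_bazett(qt_ms, rr_s):
    """QTcB: QT divided by the square root of the RR interval."""
    return qt_ms / rr_s ** 0.5

def qtc_fridericia(qt_ms, rr_s):
    """QTcF: QT divided by the cube root of the RR interval."""
    return qt_ms / rr_s ** (1.0 / 3.0)

# Hypothetical EKG: QT of 320 ms at a heart rate of 120 bpm
rr = rr_seconds(120)
qtcb = qtc_bazett(320, rr)
qtcf = qtc_fridericia(320, rr)
print(f"RR = {rr:.2f} s, QTcB = {qtcb:.0f} ms, QTcF = {qtcf:.0f} ms")
```

At heart rates above 60 bpm the RR interval is below 1 second, so the Bazett correction gives a larger value than the Fridericia correction; this is worth keeping in mind when the same prolongation threshold is applied to both.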

2020 September 10

Stephanie Rolsma, Pediatric Infectious Diseases

I will conduct a PopPK study of 100 patients on ECMO or CRRT receiving certain antibiotics, with the primary output being the effect of dose and dosing interval on achieving therapeutic drug concentrations. My attendance at this clinic is part of my required coursework for MSCI 5009-Biostatistics I with Dr. Byrne. At this clinic I would like to discuss a few simple, general questions about population PK studies and modeling such as good online or text resources and software to start learning more about this topic.

  • Meeting notes: Try to collect the data at multiple timepoints and at baseline to see the change in drug levels in the blood over time. Can fit individual PK curves and also the population PK curves from this data. May want more frequent sampling before reaching optimal level and less frequent sampling once it reaches that level. It may be interesting to look at time to reaching optimal threshold as one of the analyses. Another interesting analysis may be to characterize those who are less likely to reach optimal levels.

Katie Davis, Radiology/Women’s Imaging

We are comparing outcomes of screening mammography rates during COVID between rural and urban imaging centers in Tennessee.

  • Meeting notes: Accuracy of estimates will be an issue due to limited sample size if choosing to conduct a logistic regression. As a rule of thumb, 10-15 outcomes (number of rural facilities) are needed for each parameter included in the model. Start by first describing and summarizing the data using Chi-Square statistics stratified by setting (rural/urban), and also brainstorm which covariates are important to include in a statistical model.
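
A minimal sketch of the 10-15 outcomes-per-parameter rule of thumb, showing how many model parameters a given number of outcomes can support. The count of 45 is hypothetical and the function name is an assumption.

```python
def max_parameters(n_outcomes, outcomes_per_parameter):
    """Largest number of model parameters supported under the rule of thumb."""
    return n_outcomes // outcomes_per_parameter

# Hypothetical: 45 outcomes available for the model
n_outcomes = 45
strict = max_parameters(n_outcomes, 15)   # conservative end of the rule
lenient = max_parameters(n_outcomes, 10)  # permissive end of the rule
print(f"{n_outcomes} outcomes support roughly {strict} to {lenient} parameters")
```

Note that a categorical covariate with k levels uses k - 1 parameters, so covariates with many categories use up this budget quickly.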

2020 September 3

Ivana Thompson, Obstetrics & Gynecology

  • We are looking at the frequency of complications charted with different contraceptive methods. How should I compare the complications among different devices?
  • Can report descriptive statistics for categorical variables (n, %) and continuous variables (median, interquartile range) stratified by contraceptive device. Recommend using chi-square test for categorical variables and Kruskal-Wallis test for continuous variables to assess differences among contraceptive devices. If there are small counts, may need to collapse categories. Can fit a predictive model to determine which demographic characteristics have higher risk for complications. May want to contact Yu Shyr regarding long-term Biostatistics collaboration plan for future research.

Rand Pope, Wesley Thayer, General Surgery/Plastic Surgery

  • Examining differential cytokine and interleukin marker expression of cutaneous squamous cell carcinoma pathology samples between immune suppressed and competent individuals. Questions for clinic: are we planning on appropriate analysis methods? Can the clinic assist in analysis? Will our data extraction format be functional and easy for the statistician to utilize? What is the turnaround time on analysis? Mentor confirmed. VICTR voucher request.
  • Clinic Notes: Pilot study included 20 patients. Found TNF-alpha expression was decreased in immune-suppressed patients with no previous cancer histories. Did not follow patients after the procedure. Plan to recruit 20 patients in each group to increase power of statistical analysis.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/). Contact Dan Byrne for assistance with writing statistics portion of application. May want to contact Yu Shyr regarding long-term Biostatistics collaboration plan for future research.

2020 August 27

Raymond Zhou, Infectious Disease

  • Previous clinic session Monday 12/9/2019
  • This is a retrospective chart review of refugee patients being treated at the Siloam Health Clinic. We are motivated by a desire to improve the efficacy and efficiency of the Hepatitis B screening and vaccination efforts at Siloam and the refugee camps they receive patients from. Of note, Hepatitis B vaccination is ideally completed with multiple doses.
    • Primary Question – Are there differences in dosing schedule (timing between doses, number of doses) of HBV vaccines between asylees vs refugees vs standard of care?
    • Secondary Question – Are there differences in dosing schedule of HBV vaccines between asylees/refugees from different countries and/or camps of origin?
    • Multivariable Regression Analysis –
      • a. Primary Variable asylee vs. refugee status
      • b. Primary Outcome - # of doses
      • c. Secondary Outcome - timing between doses (continuous or dichotomous?)
      • d. Covariates (country of origin; camp of origin; sex, gender; HBV, HIV, HCV status; difference between origin camp and country of origin)
    • Questions for biostats –
      • a. Should timing of doses be characterized as continuous or dichotomous (compliant or not)?
      • b. Can we accurately compare the dosing schedules of our experimental groups and the standard of care (1, 3, 6 months)?
      • c. Would this require that we find another “standard of care” population?
      • d. Should we exclude vs. control for HBV status?

  • Meeting notes: Recommend to describe the subjects in the data first, perhaps stratify by vaccination status (yes/no) or other covariate of interest. Can run Chi-Square statistics from the stratified table. Then, create new variables to define whether vaccination was met at second and third time points using a window of +/- 14 days from expected vaccination date, and summarize number of people vaccinated on time. Can use binary logistic regression for main outcome to characterize who is more likely to receive vaccination overall and also at second or third time points. However, need to be careful of interpretation and making inferences, because reason for not receiving vaccination may be confounded by other factors.
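
A minimal sketch of the on-time indicator described above, using a window of +/- 14 days around the expected vaccination date. How the expected date is derived from the dosing schedule is left as an input here; the dates and the function name are hypothetical.

```python
from datetime import date

def on_time(expected, actual, window_days=14):
    """True if the dose was given within +/- window_days of the expected date.

    A missing dose (actual is None) counts as not on time.
    """
    if actual is None:
        return False
    return abs((actual - expected).days) <= window_days

# Hypothetical second-dose records: (expected date, actual date or None)
records = [
    (date(2020, 3, 1), date(2020, 3, 15)),  # 14 days late: inside the window
    (date(2020, 3, 1), date(2020, 3, 16)),  # 15 days late: outside the window
    (date(2020, 3, 1), None),               # dose never received
]
flags = [on_time(expected, actual) for expected, actual in records]
print(f"{sum(flags)} of {len(flags)} doses on time")
```

The resulting yes/no variable is the kind of binary outcome that can then be used in the logistic regression for each time point.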

Mentor confirmed, VICTR voucher request

Karampreet Kaur, Maternal Fetal Medicine

This is an anonymous REDCap survey study of patient and provider perspectives on the use of telehealth for prenatal care during COVID-19. All provider surveys are collected (78) and patient data collection is ongoing. Need help with powering our study, specifically how many patients we need for meaningful results. Would also like to discuss options for running stats on our data. Mentor confirmed. VICTR voucher request.

  • Meeting notes: Need a minimum of 384 respondents to estimate a proportion with a margin of error of +/- 0.05 (95% confidence). With 96 respondents, can estimate a proportion with a margin of error of +/- 0.1. With 71 respondents, this will result in a margin of error +/- 0.12. The small number of respondents is a limitation that needs to be acknowledged. First create some descriptive summaries of the two groups (patients and providers) for each covariate/survey element. Then, use a Wilcoxon rank sum test to compare whether there are differences between two groups for satisfaction and other outcomes of interest.
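
A minimal sketch of the arithmetic behind these figures, assuming the usual normal-approximation formula for a proportion with the most conservative value p = 0.5 and z = 1.96 for 95% confidence. The function names are assumptions.

```python
import math

Z_95 = 1.96  # normal quantile for 95% confidence

def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the confidence interval for a proportion with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

def required_n(margin, p=0.5, z=Z_95):
    """Respondents needed to reach a target margin of error (not rounded)."""
    return z ** 2 * p * (1 - p) / margin ** 2

print(f"n for +/- 0.05: {required_n(0.05):.2f}")
print(f"margin with 96 respondents: {margin_of_error(96):.3f}")
print(f"margin with 71 respondents: {margin_of_error(71):.3f}")
```

The unrounded requirement for +/- 0.05 is 384.16, which is commonly quoted as 384; rounding up to guarantee the target margin gives 385.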

2020 August 13

WITHDREW: Brittany Cowfer, Pediatrics

  • Our project is an educational intervention for pediatric residents on leading debriefing sessions (based on existing MedEdPORTAL curricula) after distressing events. We plan to use pre-session and post-session surveys (primarily Likert scale-based) to evaluate for session effectiveness and changes in self-efficacy. Data will be used more for session improvement/institutional purposes rather than with intent to publish. We are inquiring about cost of basic statistical support to evaluate for significant changes in comfort/likelihood of leading a debriefing session after educational intervention. Mentor confirmed. VICTR voucher request.

2020 July 30

Chelsea Gorsline, Gowri Satyanarayana, Infectious Disease

  • Previous clinic session July 16, 2020
  • Survey with retrospective review and use of DOOR/RADAR for evaluation of antibiotic practices amongst Heme/Onc providers at VUMC. This is a follow-up of clinic on 7/16 where we discussed use of DOOR/RADAR. This session will be to review our Survey in detail with the group following some suggested changes from the SRSR. Mentor confirmed. VICTR voucher request.
  • Meeting Notes: DOOR categories 1) alive without complications, 2) alive with one complication from treatment, 3) alive with two complications from treatment, 4) alive with three complications from treatment, 5) dead. Should we add a question about provider's region to account for differences in care across hospitals?
  • Recommend using REDCap slider bar (without numbers) for each question and analyze as a continuous variable. Make sure provider knows that the slider needs to be touched/moved even if their response is 50; otherwise, the data will be missing. Can attend REDCap clinic for guidance. Be clear about which answers are mutually exclusive (select only one vs. select all that apply). Include anticipated time to complete the survey in introduction. May be better to include demographics at the beginning of the survey, so you know the demographics of those who end up abandoning the survey. How will you assess differences between those who complete vs. those who do not complete the survey? It is helpful to know the characteristics of the total population of providers who receive the survey. Note that early responders to the survey will be inherently different from later responders.
  • Can apply for VICTR Award for biostatistics support (90 hours): application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

David Brooks, Justin Gregory, Neonatology

  • We are examining the accuracy of point of care glucose meters compared to gold standard plasma glucose samples in critically ill neonates admitted to the neonatal intensive care unit. The question we have is determining the “n” needed for our study. Mentor confirmed.
  • Meeting Notes: What impact will hematocrit or clinical perfusion have on the sample? Plasma glucose sample collected using capillary heel stick. Previous study in healthy infants reported 99.7% accuracy within +/- 15 mg/dL for POC glucose meter. Plan to collect two samples by each method at the same time.
  • See observer variability and agreement in http://fharrell.com/doc/bbr.pdf. Report mean absolute difference and confidence interval. Calculate sample size using current CI. If want CI to be 1/2 current width, then need 4x more subjects. Recommend calculating difference between glucose measurements and including this as a column in the database. NQuery software includes sample size calculations for bioequivalence. Can report R^2 to determine impact of hematocrit.
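
The "half the width needs 4x the subjects" rule follows from the confidence interval width for a mean being roughly proportional to 1/sqrt(n). A minimal sketch, assuming the standard deviation of the differences stays about the same as the sample grows; the function name and the example numbers are hypothetical.

```python
from math import ceil

def required_n(current_n, current_width, target_width):
    """Approximate sample size needed to shrink a CI to the target width.

    Assumes width is proportional to 1 / sqrt(n), so n scales with
    (current_width / target_width) squared.
    """
    ratio = current_width / target_width
    return ceil(current_n * ratio ** 2)

# Hypothetical pilot: 40 paired samples gave a CI 10 mg/dL wide;
# a CI half as wide (5 mg/dL) needs about 4 times as many subjects.
n_needed = required_n(40, 10.0, 5.0)
```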

  • Attendees: Amy Perkins, Frank Harrell, Shawn Garbett, Dan Byrne, Dale Plummer, Chelsea Gorsline, Gowri Satyanarayana, David Brooks, Justin Gregory

2020 July 23

Raymond Zhou, Vanderbilt Eye Institute

  • Previous clinic session April 16, 2020
  • To correlate known single nucleotide polymorphisms (SNPs) in CXCL8:CXCR1/2 with Diabetic Retinopathy (DR) susceptibility and progression in clinically-validated cohorts of patients with Diabetes Mellitus (DM). Neovascularization in DR is currently managed by targeting vascular endothelial growth factor (i.e. anti-VEGF therapy); however, a significant portion of patients are unresponsive to treatment, suggesting alternative pathways of inflammation and angiogenesis contribute to disease. We hypothesize polymorphisms in the genes CXCL8, which codes the cytokine Interleukin-8 (IL-8), and CXCR1 and CXCR2, receptors of IL-8, will be associated with DR susceptibility and progression in patients with DM. We have developed and begun validating cohorts of patients with DM with and without DR using Vanderbilt’s adult synthetic derivative (SD). These cohorts were developed using an algorithm that considers a combination of billing codes, Current Procedural Terminology (CPT) codes and availability of DNA. To assess the clinical validity of each cohort, we have manually phenotyped each record confirming algorithm parameters in conjunction with the de-identified medical record. To test our hypothesis, we will determine the prevalence of previously described SNPs in CXCL8:CXCR1/2 in our clinically validated cohorts and compare these to known SNPs in vascular endothelial growth factor and its receptor (VEGF:VEGFR). We would like assistance in executing the sample size calculation discussed during the previous meeting (4/16/2020), when mentor was present. Mentor may not be able to join for this session (Dan/Chang okayed mentor possible).

  • Meeting notes: The first step is to look at the distribution of the 3 genotypes within each outcome group and ensure there are adequate numbers for each group. May want to split the analysis into two parts because two groups are on the same ordinal level, or may want to treat those two groups as the same if using the ordinal variable as the outcome. Patients who cannot be categorized into mutually exclusive categories should probably be excluded from the analysis (can be done later). The 15:1 ratio applies to the number of parameters (coefficients); for a variable with 3 categories this will be 2 parameters. Recommend prioritizing covariates to include in the model. May want to try calculating sample size using nQuery or R software, since PS does not have the option for ordinal variables.

  • Attendees: Ahra Kim, Chang Yu, Dan Byrne, Raymond Zhou

2020 July 16

Pauleatha Diggs, Sarah Stallings, Department of Medicine

  • Proportional odds model for survey data – determining whether independent variables (survey scores) should be treated as continuous or grouped into categories for analysis.
  • Meeting Notes: Survey questions (Likert scale) regarding health-seeking behavior in African American men. Ran proportional odds model in SPSS; did not pass test of parallel lines.
  • In a regression model, add up effects of multiple factors, so do not need to worry about error regarding zero cell counts across combinations of variables. Test of parallel lines is not valid, and you do not need to report this. Can look at residual plots. Check proportional odds assumption by taking most important predictor and stratifying into 4 intervals (unless already binary) and stratifying dependent variable into 4 intervals. Take logit of cumulative proportions and plot data; verify the curves are parallel. Recommend including independent variables as continuous to maximize power. Only when there is competition among subjects (ex. grading on curve), then may want to consider categorizing independent variables into quartiles.
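
The graphical check of the proportional odds assumption described above can be sketched as follows. The data, stratum labels, and function names are hypothetical; the sketch only computes the logits of the cumulative proportions that would then be plotted.

```python
from math import log

def logit(p):
    return log(p / (1 - p))

def cumulative_logits(outcomes, cutoffs):
    """Logit of the proportion with Y >= c, for each cutoff c, within one stratum.

    Undefined (raises an error) if a proportion is exactly 0 or 1.
    """
    n = len(outcomes)
    return {c: logit(sum(1 for y in outcomes if y >= c) / n) for c in cutoffs}

# Hypothetical ordinal outcome (1-4) within two strata of the main predictor
strata = {
    "low":  [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "high": [1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
}
cutoffs = [2, 3, 4]
logits_by_stratum = {name: cumulative_logits(ys, cutoffs) for name, ys in strata.items()}

# Under proportional odds, the shift between strata is about the same at every cutoff
shift = {c: logits_by_stratum["high"][c] - logits_by_stratum["low"][c] for c in cutoffs}
```

Plotting the logits against the predictor stratum, one curve per cutoff, shows the same thing visually: roughly parallel curves support the assumption.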

Chelsea Gorsline, Gowri Satyanarayana, Infectious Disease

  • We are conducting a retrospective review and analysis of antibiotic practices in hematologic malignancy and stem cell transplant patients with febrile neutropenia at Vanderbilt University Medical Center (VUMC). Data collected will be analyzed by comparison of outcomes for those patients who received longer duration of antibiotics versus those who received shorter duration of antibiotics. Our hope is to show that a de-escalation approach is safe at VUMC. Our question is what is the best approach for extracting appropriate patients from eStar using diagnosis codes to be included in the retrospective review.
  • Meeting Notes: Plan to use DOOR/RADAR approach which uses a composite outcome of ranked efficacy and safety outcomes to assess superiority of a shorter duration antibiotic treatment strategy while weighting potential harms. Each patient is assigned a DOOR value based on severity of outcome and days of antibiotic use, then patients are ranked by value. Next calculate probability based on DOOR. Want to show that patients who received less antibiotics have similar or improved outcomes. How many physician survey responses do we need to be able to make relevant conclusions?
  • To determine whether there is a difference in days of antibiotic use between treatment arms, recommend using a Mann-Whitney U test. Use same test to assess difference in the clinical outcome. Using observational data (rather than randomized), need to balance differences in treatment arms (ex. matching). May want to consider designing a future pragmatic trial. For survey, recommend randomly assigning 5 questions to each participant (shorter survey) to improve the response rate; then combine responses to questions across entire sample.
  • Can apply for VICTR Award for biostatistics support (90 hours): application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).
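
One common way to summarize a DOOR comparison is the probability that a randomly chosen patient from one arm has a better ranking than a randomly chosen patient from the other arm, counting ties as one half. The sketch below assumes that definition and uses hypothetical ranks (1 = best outcome); it is not the study's actual scoring.

```python
def door_probability(short_arm, long_arm):
    """P(short-arm patient ranks better) + 0.5 * P(tie), over all cross-arm pairs.

    Lower rank means a more desirable outcome (an assumption of this sketch).
    """
    wins = ties = 0
    for s in short_arm:
        for l in long_arm:
            if s < l:
                wins += 1
            elif s == l:
                ties += 1
    return (wins + 0.5 * ties) / (len(short_arm) * len(long_arm))

# Hypothetical DOOR ranks for each arm
short_duration = [1, 1, 2, 3]
long_duration = [1, 2, 3, 5]
prob = door_probability(short_duration, long_duration)
```

A value near 0.5 indicates no difference between arms. This quantity equals the Mann-Whitney U statistic divided by the product of the two sample sizes, which is why the same rank-based test applies.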

  • Attendees: Frank Harrell, Dan Byrne, Dale Plummer, Amy Perkins, Pauleatha Diggs, Sarah Stallings, Chelsea Gorsline, Gowri Satyanarayana

2020 July 09

Kaushik Amancherla, Cardiology

  • A recent change was made to the United Network for Organ Sharing (UNOS) heart allocation policy in October 2018. We are seeking to assess the changes in practice patterns regarding bridging strategies (e.g. no treatment, inotropes, mechanical circulatory support) by blood group (A, B, AB, and O) between the new and old systems.
  • Questions:
    • 1) We want to make sure that the design of the study is sound.
    • 2) We would like help with data analysis once the data is obtained from UNOS.
  • Mentor confirmed, VICTR Voucher

  • Recommendations: Start with Chi-square and Mann-Whitney U tests to compare variables of interest by time periods. Conduct a series of binary logistic regressions predicting each treatment adjusted for covariates. Consider propensity score adjustment if the two groups (treatment vs control) are believed to be very different. A more interesting analysis is looking at time to treatment, using univariate Kaplan-Meier and Log-Rank tests and then multivariable Cox proportional hazards regression. May want to consider a competing risks analysis, since death is a competing event to transplant. Also consider an interaction term by time and/or blood type.

  • Attendees: Ahra Kim, Dan Byrne, Dale Plummer, Kaushik Amancherla, Sandip Zalawadiya

2020 June 25

Dakota Vaughan, Medical Student, Project in Ophthalmology Dept.

  • We are interested in assessing the results of ophthalmic exams following preschool vision screening, specifically in generating a likelihood ratio of amblyopia diagnosis as a function of estimated degree of refractive error on vision screen. Mentor confirmed
  • Recommendations: Collect information on relevant covariates such as age, height of child, ethnicity, insurance information, and parent status (single parent) if possible. Days since screening may be useful to collect. Conduct a logistic regression model including 6 measurements as covariates and an interaction term between age and each measurement.

Osarhiemen Omwanghe, Obstetrics and Gynecology

  • Our study is a retrospective analysis that evaluates the impact of telemedicine on clinic attendance / no-show rates. Furthermore, we are determining if increased access to telemedicine platforms was useful in bridging/decreasing disparities and improving clinic attendance by increasing access to convenient clinic encounters with increased opportunities for patients to engage with clinicians in virtual clinic appointments. We would like assistance with data analysis, determining which data points are statistically significant, and we would like assistance with developing graphical representation of data (or guidance with regards to how to complete presentable and legible visual representations of the data set available).

    Mentor confirmed, VICTR voucher
  • Recommendations: Need to clearly define numerator and denominator measures, such as whether to look at no-show with inactivated account or activated account. Want to collect proportions for both pre-pandemic and post-pandemic levels to do a comparison. Information on timing of scheduling and timing of setup may be interesting to look at. As a first step, collect these proportions broken down by racial groups or other covariates of interest.

  • Attendees: Ahra Kim, Chang Yu, Frank Harrell, Shawn Garbett, Dale Plummer, Dakota Vaughan, Sean Donahue, Osarhiemen Omwanghe, Ro Lister

2020 June 11

WITHDREW: Devika Nair, Nephrology

  • I have deidentified data that have already been collected, and I would like to request a VICTR voucher for biostatistics support. I have already conducted the analyses myself; this would be to confirm/validate my findings. The data are available in REDCap and were discussed at a previous Biostats clinic (project related to burnout).

Pauleatha Diggs, Medicine

  • We administered a survey composed of multiple validated sub-scales. We would like assistance in determining an analysis plan for determining the association between three independent values (the score of three different sub-scales) and one outcome variable (the PROMIS mental health score). We also would like to determine if there are any mediating or moderating co-variables affecting the association.
  • Recommendations: Do not dichotomize the outcome if there are adequate numbers of subjects in each outcome category. Try using a proportional odds model (ordinal logistic regression), adjusted for other covariates. If age groups are believed to be very different, consider a subgroup analysis. It is better to model age continuously with some type of linear, quadratic, or cubic relationship assumed.

Lee Wheless, Dermatology

  • I am conducting a multi-failure analysis of the development of new skin cancers in organ transplant recipients (range 1-110 discrete skin cancers per person, ~2000 total events). I’m using the Prentice-Williams-Peterson Gap Time model and have formatted my data for this with extensive checking to confirm. I include a censoring date for all individuals of their entire duration in the SD, though often data exist in the SD after the final skin cancer. I get a fairly strong inverse association with age and skin cancer when I include the full range of dates (HR 0.90, opposite what is expected), and significantly different results (but more in line with expectations) when I censor cases at their final skin cancer. It seems wrong to throw out perfectly good data, but the results make more sense in doing so. VICTR Biostatistics voucher.
  • Recommendations: Run a stratified Cox PH model if baseline hazards are different (non-proportional) as a start. Age at transplant is a well defined covariate, but age at first event is not and is almost a different question because it's a moving target. For time to first event, censoring is not a problem as long as the events have been observed. Time to recurrent event is a more complicated question due to issues in censoring and time-dependent covariates. Chang will be in touch to help find a statistician for the project.

  • Attendees: Ahra Kim, Chang Yu, Shawn Garbett, Dale Plummer, Lee Wheless, Pauleatha Diggs, Sarah Stallings

2020 June 4

Erin Bouquet, Internal Medicine

  • We have a project looking at colonoscopies with inadequate prep at the VA. Collected data on >500 scopes and their repeats. Goal was to see if there was a difference in polyp detection and adenoma detection rate. We also collected basic demographic data.
  • Meeting Notes: Collected patient data from 2011-2019 and recorded medications, comorbidities, procedure characteristics and outcomes on initial and repeat colonoscopies. Want to assess difference in outcomes and adequate prep between patients who repeat the next day (n=178) vs. within one year (n=397). Overall, 58 patients had inadequate prep for the repeat colonoscopy.
  • Recommend using a chi-square test (report proportion and confidence interval) or logistic regression model (report OR and CI). Can use a chi-square or Kruskal-Wallis test to assess differences between the two exposure groups. In future research, consider using a randomized controlled trial to assign patient to repeat the next day or within one year.
  • Can reach out to Chris Slaughter or apply for a VICTR award for assistance with statistical analysis.
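
For a 2x2 comparison such as inadequate prep (yes/no) by repeat timing (next day vs. within one year), the chi-square test can be computed by hand. This is a minimal sketch without continuity correction; the counts are hypothetical and are not the study data.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, observed in enumerate(obs_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = exposure group, columns = outcome yes / no
chi2, p_value = chi_square_2x2([[10, 20], [20, 10]])
```

If any expected cell count is small, Fisher's exact test is the usual alternative.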

2020 May 28

Shi Yang, Otolaryngology

  • Help with analyzing trends in data for a project looking at number of females in surgical subspecialty. Previous clinic session: Thursday, March 30th.
  • Recommendations: In Table 1, add statistics displaying median and interquartile range for continuous variables (H-index, years in practice) and p-value obtained from Wilcoxon rank-sum test. For Chi-square tests, only conduct 1 test per 2x3 or 2x2 table (i.e. 1 test for gender and rank, 1 test for gender and leadership) and not 3 tests for rank (assistant, associate, professor). Combine leadership position variables into one variable, due to small number of those in leadership positions for certain columns. May want to look at other potential confounders affecting academic rank. Conduct linear regression if outcome is H-index, binary logistic regression if outcome is leadership position, and ordinal logistic regression if outcome is rank.

  • Attendees: Ahra Kim, Chang Yu, Frank Harrell, Shi Yang

2020 May 07

John Brems, Internal Medicine

  • Designing a cross-sectional analysis of current intellectual conflict of interest among Clinical Practice Guidelines published by cardiology and pulmonary societies in past two years. We are looking for assistance with project design before starting the process of data collection. We have multiple ideas of outcomes to look at including: percentage of guideline authors/chairs/co-chairs who are an author/first-author/last-author on a study reviewed by the guideline, use of GRADE methodology, inclusion of “intellectual conflict of interest” in COI disclosure. We are tentatively planning on doing a comparison of these measures between cardiology and pulmonary guidelines. VICTR voucher request. Mentor confirmed.
  • Meeting Notes: Between 01/2018 - 12/2019 there were 36 pulmonary and 41 cardiology society guidelines published. Plan to have 2 reviewers assess pieces of each guideline. Will write a manuscript to summarize findings.
  • Recommend drafting a data collection form and planning descriptive statistical analysis. Can schedule another clinic appointment for additional guidance.

Aima Ahonkhai, Institute for Global Health

  • Preliminary analysis of HIV health outcomes among foreign born youth living with HIV in the South. VICTR voucher request.
  • Meeting Notes: Recruited subjects from three sites and recorded clinical information gathered over the past year (age, gender, disclosure status, continent of origin, adoption status, etc.). Want to assess predictors of HIV care outcomes (virologic suppression, retained in care). Data are saved in Excel. Plan to build separate regression models for engagement in care, virologic suppression, and combined engagement in care and virologic suppression. Viral load is measured up to four times during the year.
  • Recommend chi-square or Fisher's exact test for univariate analysis. Logistic regression models can assess relationships with clinical variables while adjusting for other covariates. Can use a mixed effects model to analyze the repeated viral load measurements. May need to transform viral load data prior to modeling; start by creating spline plots of viral load by time.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/). Contact Dan Byrne for assistance with writing statistics portion of application.
  • Attendees: John Brems, Ellen Clayton, Frank Harrell, Dan Byrne, Amy Perkins, Leslie Pierce, Aima Ahonkhai

2020 April 30

Jessie Gibson, Neurology

  • Ciaran Considine and I are planning to conduct a pilot project using Blue Light Therapy Glasses for sleep, mood, and social function in Huntington’s and Parkinson’s diseases. We have developed the preliminary protocol but would like protocol approval and a VICTR voucher for support with data analysis. VICTR voucher request.
  • Meeting Notes: Plan to assign 32 subjects randomly to treatment or placebo glasses which will be worn 30 minutes a day for 3 weeks. Then the subject will cross over to the other study arm for another 3 weeks. Subjects will self-report their sleep-related symptoms; the primary outcome is their Sleep Related Impairment which is measured by 8 questions. Secondary outcomes include depression, anxiety, sleep quality, and social function. Questionnaires will be completed at baseline, 3 weeks (just prior to crossover), and 6 weeks. The subject will also wear an ActiGraph to detect sleep time, time to fall asleep, etc.
  • Parkinson's patients will likely be older than Huntington's patients and have different atrophy patterns at baseline. There may be some carryover effect for those patients who start with the Blue Light Therapy glasses, so reviewers may comment on no washout period.
  • At the end, can ask the subject which glasses they prefer. Each patient serves as their own control in a crossover study. If the treatment effect does not vary by age, then the baseline measure will not be needed. May consider a washout period of a couple days.
  • Recommend using a linear regression or proportional odds ordinal model (semiparametric rank which accounts for floor and ceiling effects and can handle a variety of distributions, see course notes at http://hbiostat.org/doc/bbr.pdf). Should utilize an intention-to-treat analysis. A random effects model requires at least 3 observations per subject. May consider a subgroup analysis by disease group. Should report compliance data by disease group but do not need to adjust for this in the model.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).
  • Attendees: Jessie Gibson, Ciaran Considine, Frank Harrell, Dan Byrne, Amy Perkins

2020 April 23

Mariya Kovaleva, School of Nursing

  • I have a question about my paper that I am trying to publish, it involves longitudinal data analysis, 3 time points (used mixed linear models in SPSS). I would like to check please regarding assumptions I must check. I did check normality of the dependent variable, what other assumptions must I check? I measured psychological well-being outcomes in caregivers 3 times.
  • Recommendations: When plotting the raw data, connect the dots at the subject level to create a spaghetti plot and easily see time trends of individuals. To check assumptions, plot residuals for each model separately and also create a Q-Q plot of the residuals. Residuals may not look normal, due to small sample size. Treat time as a continuous variable rather than categorical by converting it to time in months or days. Model time as a quadratic to allow for non-linearity by including both time and time^2 in the model. Do a chunk test/composite test to see the effect of time jointly with time^2 in the model. Encouraged to include baseline levels as a covariate to adjust in the model, since many times people have different baseline levels. If question is to model the whole trajectory of scores and incorporating variety, then don't need to adjust baseline levels as a covariate.

2020 April 16

Rachana Haliyur, Vanderbilt University School of Medicine

  • To correlate known single nucleotide polymorphisms (SNPs) in CXCL8:CXCR1/2 with Diabetic Retinopathy (DR) susceptibility and progression in clinically-validated cohorts of patients with Diabetes Mellitus (DM). Neovascularization in DR is currently managed by targeting vascular endothelial growth factor (i.e. anti-VEGF therapy); however, a significant portion of patients are unresponsive to treatment, suggesting alternative pathways of inflammation and angiogenesis contribute to disease. We hypothesize polymorphisms in the genes CXCL8, which codes the cytokine Interleukin-8 (IL-8), and CXCR1 and CXCR2, receptors of IL-8, will be associated with DR susceptibility and progression in patients with DM. We have developed and begun validating cohorts of patients with DM with and without DR using Vanderbilt’s adult synthetic derivative (SD). These cohorts were developed using an algorithm that considers a combination of billing codes, Current Procedural Terminology (CPT) codes and availability of DNA. To assess the clinical validity of each cohort, we have manually phenotyped each record confirming algorithm parameters in conjunction with the de-identified medical record. To test our hypothesis, we will determine the prevalence of previously described SNPs in CXCL8:CXCR1/2 in our clinically validated cohorts and compare these to known SNPs in vascular endothelial growth factor and its receptor (VEGF:VEGFR). We would like to discuss how we can determine and justify sample size and the appropriate biostatistical plan for this study.
  • Meeting Notes: The outcome is severity of disease (ordinal variable range 0-4), so can use a proportional odds model. For sufficient power, recommend having no more than 4 levels for an ordinal variable. In addition to SNPs, can collect additional patient characteristics (ex. age, sex, race, duration of diabetes, etc.) and adjust for these covariates in the model as well. With only 250 patients, will likely be unable to complete subgroup analyses (subsetting by race or sex). Using the smallest cohort, need a minimum 15:1 ratio of patients to covariates in the model. See http://hbiostat.org/doc/bbr.pdf physical page 187 Figure 6.1 (reference for sample size). Can use PS software for sample size calculation.
  • When defining NPDR or PDR, recommend using a continuous variable rather than dichotomizing into positive or negative (binary); this will improve heterogeneity and power when comparing SNPs. At a minimum, should define severity of DM, NPDR, DME, and PDR as "mild", "moderate", or "severe" rather than positive or negative. Use all available information from ICD-10 codes, and do not do anything to lose any information (ex. categorizing or dichotomizing variables).
  • Current Excel file will need to be reformatted for data analysis. This can be done using a statistical software program (ex. R, SPSS, Stata). Dan Byrne will send recommended guidelines.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/). Contact Dan for assistance with writing statistics portion of application.
  • Attendees: Amy Perkins, Dolly Padovani, Dan Byrne, Rachana Haliyur, Frank Harrell
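
The 15:1 rule of thumb mentioned in the notes can be turned into a quick parameter budget. This sketch counts one parameter per continuous covariate (linear term) and k - 1 parameters per categorical covariate with k levels. The function names and the example covariate list are hypothetical, and 250 is used only as an illustrative n.

```python
def max_parameters(n_subjects, subjects_per_parameter=15):
    """Largest number of model parameters supported under an n:1 rule of thumb."""
    return n_subjects // subjects_per_parameter

def parameters_for(categorical_levels, n_continuous=0):
    """Parameters used: k - 1 per categorical covariate, 1 per continuous covariate."""
    return sum(k - 1 for k in categorical_levels) + n_continuous

# Illustrative budget with 250 patients
budget = max_parameters(250)

# Hypothetical model: one SNP with 3 genotypes, sex (2 levels), a 3-level
# categorical covariate, plus age and duration of diabetes as linear terms
used = parameters_for([3, 2, 3], n_continuous=2)
fits_budget = used <= budget
```

Modeling a continuous covariate non-linearly (for example with a spline) uses more than one parameter, so the count above is a lower bound in that case.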

2020 April 9

Teminioluwa Ajayi, Internal Medicine

  • Our project is a cross-sectional study to address the prevalence of imposter syndrome among minority medical residents. VICTR voucher request, mentor confirmed.
  • Want to measure the association between imposter syndrome and burnout in African American trainees using 2 validated scales. Survey will be administered through REDCap.
  • If primary question is comparing African Americans with other groups, then we need a control group. Recommend trying to recruit other racial groups using the same method to increase validity.
  • Recommend having demographics upfront or randomize the order of questions, since survey is quite long (60 questions) and there will most likely be survey fatigue. May want to use an abbreviated survey if available. Response rate is a concern for long surveys, and would ideally like to have at least 70-80% responses.
  • Need to collect an identifier to avoid duplicated people in both groups (name, email address).
  • Recommended to apply for VICTR grant and biostatistics support voucher.

Juan Pablo Arroyo, Nephrology

  • Two stages: 1. Hypothesis-generating retrospective study to assess the characteristics of kidney injury in Covid19+ patients 2. Propensity score matched retrospective study comparing kidney injury present in Covid19+ vs Flu+ patients looking at outcomes identified in #1. VICTR voucher request, mentor confirmed.
  • Recommendations: First step is to describe the subjects (admitted patients positive for Covid-19) and their BMP and presence of kidney failure. Try to use data from all subjects and all timepoints (ex: not limiting data to patients with up to X days). Use a spaghetti plot to illustrate the trajectory of raw creatinine levels throughout. Try to collect as much detailed information as possible, especially where the patients started (comorbidities, onset of symptoms), and even data for flu patients with other viruses may be helpful. Flu patients with other viruses may serve as a reference group for later analysis.
  • Second step is the longitudinal analysis, looking at time to drop in renal function measurements or time to failure.
  • If measurements are available, matching might not be the most helpful since it can lower power and sample size. If sample size is adequate, use the lab data to predict which cohort the measurement is from. Direct adjustment is recommended rather than propensity score matching in this case.

2020 March 30

Shi Yang, Otolaryngology

I have attached my data set. I will be doing analysis on the first two tabs. I am looking at gender differences in Facial Plastic Surgery. Below are some questions I want to answer ...

  1. Is there a statistically significant difference between fellowship directors based on gender?
  2. Is there a statistically significant difference between division chiefs based on gender?
  3. After controlling for years in practice, is there a statistically significant difference in academic rank (assistant, associate, professor) based on gender?
  4. Are women more likely to be assistant professor compared to associate or professor?
  5. Is there a statistically significant difference in H-index based on gender after controlling for years in practice and academic rank?
  6. Any trends in H-index based on academic rank or years in practice? Any differences with gender?
  7. Plot for
  8. Has there been an increase in number of females completing AAFPRS fellowship over recent years? (stat significant difference?)
  • Meeting Notes: Recommend recoding select categorical covariates as binary (1 = yes / 0 = no). Clean up data values to be in consistent format. Try to find out missing information and fill in values as much as possible. May need to collapse categories with small counts. Can create a table to summarize characteristics stratified by gender. Recommend using an ordinal logistic regression model for academic rank that adjusts for covariates gender, years in practice, etc. Can information be gathered regarding timing of childbirth for women? For binary variables (fellowship: yes/no), use a logistic regression and for continuous variables (H-index) use linear regression.
  • Is this clinical and translational research? Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/). Contact Dan Byrne for assistance with writing statistics portion of application.

2020 March 19

Justin Jacobse, PMI

  • Immune responses between people are not equal. The aim of this study is to study the immune response of three groups of people to proteins X1-X15. Whereas some people might yield an immune response against one of these proteins, we hypothesize that multiple people mount an immune response against the same protein(s). We are interested in comparing the baseline characteristics of these three groups of people to examine in which aspects they are similar, and in which aspects they are not. Moreover, we want to determine whether people in a certain group are more likely to mount an immune response against the same protein(s).

    VICTR voucher request, mentor confirmed.
  • Meeting Notes: Planning to recruit 90 subjects (3 groups of 30 subjects). A score is derived from a graph for each protein. Went over risk of unreliable results when examining a high number of candidate features in comparison with the number of subjects. Recommend 15 cases per candidate variable (15 proteins), so a minimum of 15 x 15 = 225 subjects would be required. A high signal-to-noise ratio requires fewer subjects than a low signal-to-noise ratio. Sample size is best based on ensuring that the apparent predictive accuracy (or group separation, which is also called predictive discrimination) does not drop very much when applied to new patients.
  • Recommend using a multivariable regression model rather than analyzing each protein separately. A pre-specified biological/biochemical ordering of scores could be developed and statistical analysis could use potential predictors in this pre-specified order. If the desire is to rank the promise of features individually, the Kruskal-Wallis test statistic (and the generalized rho^2 which can be calculated from it) is a possibility.
  • Duration of time to develop immune response needs further consideration. Immune response measurement time and time since stopping treatment will be recorded. May also control for severity of disease. May be good to eliminate some candidate features by restricting the targeted regions. If the number of subjects is not increased, the results need to be accompanied by appropriate caveats such as confidence intervals for importance ranks of features. Will have several features in the inconclusive range. Resampling validation (strong internal validation) may pay off, usually using the bootstrap.
  • For statistical collaboration the DDRC core may have a biostatistician available (Tatsuki Koyama).
  • Attended: Frank Harrell, Amy Perkins, Ahra Kim, Justin Jacobse, Jeremy Goettel
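
The recommendation to report confidence intervals for importance ranks of features can be sketched as follows. This is only an illustrative sketch on simulated data, not the study's analysis: the function names, the simulated scores, the group labels, and the choice of 500 stratified bootstrap resamples are all assumptions. Features are ranked by the Kruskal-Wallis H statistic (computed here without a tie correction), and the bootstrap shows how unstable those ranks are with 30 subjects per group.

```python
import random

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(scores, groups):
    """Kruskal-Wallis H statistic (no tie correction) for one feature."""
    n = len(scores)
    rank_sums, counts = {}, {}
    for r, g in zip(average_ranks(scores), groups):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    total = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

def rank_features(data, groups):
    """Rank features by H; rank 1 is the largest H (most promising)."""
    h = {name: kruskal_wallis_h(col, groups) for name, col in data.items()}
    ordered = sorted(h, key=h.get, reverse=True)
    return {name: pos + 1 for pos, name in enumerate(ordered)}

def bootstrap_rank_intervals(data, groups, n_boot=500, seed=0):
    """Percentile 95% intervals for each feature's rank, resampling
    subjects with replacement within each group."""
    rng = random.Random(seed)
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    draws = {name: [] for name in data}
    for _ in range(n_boot):
        idx = [rng.choice(members) for members in by_group.values() for _ in members]
        boot_groups = [groups[i] for i in idx]
        boot_data = {name: [col[i] for i in idx] for name, col in data.items()}
        for name, r in rank_features(boot_data, boot_groups).items():
            draws[name].append(r)
    intervals = {}
    for name, rs in draws.items():
        rs.sort()
        intervals[name] = (rs[int(0.025 * (n_boot - 1))], rs[int(0.975 * (n_boot - 1))])
    return intervals

# Simulated example (hypothetical): 3 groups of 30, 15 protein scores,
# with a real group difference built into X1 only.
sim = random.Random(1)
groups = [g for g in ("A", "B", "C") for _ in range(30)]
data = {f"X{j}": [sim.gauss(0, 1) for _ in groups] for j in range(1, 16)}
data["X1"] = [x + (1.5 if g == "C" else 0.0) for x, g in zip(data["X1"], groups)]

intervals = bootstrap_rank_intervals(data, groups)
for name in data:
    low, high = intervals[name]
    print(f"{name}: rank interval {low} to {high}")
```

In a run like this, features with no true signal typically have rank intervals spanning much of the 1 to 15 range, which is the "inconclusive range" the notes warn about.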

2020 March 5

Kerry Schaffer, Assistant Professor of Medicine, Hematology Oncology

  • This project is intended to offer genetic testing (both germline and somatic testing) to African American Males with metastatic prostate cancer. It is now standard of care for males with metastatic prostate cancer to get offered genetic testing as it can have treatment implications and also impact on family members, if a hereditary cancer syndrome is diagnosed. African American (AA) men suffer from a disproportionately high incidence and mortality of prostate cancer compared to Caucasians (CAs). Although a large part of overall survival (OS) differences are known to be related to a later stage at diagnosis for AA males, among late-stage prostate cancer patients there is also a racial disparity for OS, attributed in part to worse access to standard of care therapies. Given the established lower rates of testing when warranted in minorities nationally and racial disparities in delivery of care for prostate cancer, the intent of this proposal is to implement a protocol focused on increasing germline and somatic genetic testing in AA males with mPC. In the setting of increasing demand for genetic counseling yet an ongoing national shortage, numerous models for care delivery are being developed to attempt to increase access for comprehensive genetic counseling care to patients. The study will also work to assess patient comprehension of pre-counseling tools. Questions of interest are 1) what number of patients are needed and 2) what methods for data analysis are recommended?
  • Meeting Notes: What is a quantitative improvement in genetic testing rate? Current rate is 2%, but hoping to reach 50%. Collected pre- and post-surveys from 100 subjects after watching Tuyas video. The survey includes a question on favorability toward genetic testing. Future crossover study will randomize approximately 60 AA mPC patients to one of two arms (subjects watch Color video or Tuyas then Color videos) and have subjects complete surveys at baseline (before videos) and after watching each video.
  • Recommend collecting data on outcomes such as mortality, quality of life, and genetic testing status. Can use PS software to calculate sample size (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize). An increase in genetic testing rate from 3% to 30% would require 38 subjects in each group to have 90% power. Can calculate confidence interval for historical testing rate, then recommend using the upper confidence limit in hypothesis test vs. post-crossover study testing rate. Funding for biostatistics support may be available through the Learning Health System Biostatistics Shared Resource Center, Cancer Center biostatisticians (CQS), or VICTR Award. Appropriate statistical methods for the crossover study include McNemar's test or a regression model. Without any crossover, a chi-square test could be used.
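
The sample size quoted above can be approximated with the usual normal-approximation formula for comparing two independent proportions. This is a sketch under stated assumptions: a two-sided test with alpha = 0.05 (the notes do not state alpha), no continuity correction, and a function name chosen for illustration. The PS software may use a different method, so small differences from its output are possible.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Subjects per group for a two-sided test of two independent
    proportions (normal approximation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Testing rate increasing from 3% to 30%, 90% power
print(n_per_group(0.03, 0.30))  # 38
```

Under these assumptions the formula reproduces the 38 subjects per group mentioned in the notes.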

2020 February 20

Raymond Zhou, Sean Donahue, Vanderbilt Eye Institute

  • We are using two separate approaches to study patients who are forwarded to Vanderbilt Eye Institute (VEI) for ophthalmologic evaluation after a failed vision screening (a test that estimates refractive error and eye alignment). My colleague is completing a retrospective chart review of patients forwarded to VEI after failed vision screening at two specific pediatrics practices. I am working with VICTR on a data pull to look at all patients forwarded to VEI after failed vision screening. Our primary analysis for both projects will be a multivariate regression analysis to assess the effects of various demographic/clinical variables (age, race, zip code, etc.) on the presence or absence of Amblyopia Risk Factors (a defined set of eye diseases). Other analyses will try to assess the individual effect of certain variables, e.g. by comparing the positive predictive value of failed vision screening in Hispanic patients vs. non-Hispanic patients. How we can best conduct these analyses is another question we want to ask of you. Lastly, I am interested in finding out how many patients I would need to manually review to validate a dataset generated from a data pull. VICTR voucher/mentor confirmed.
  • 12/19/19 Meeting Notes: Recommend using a multivariable logistic regression model to evaluate association between risk factors and diagnosis of Amblyopia. The risk factors in the model should be pre-specified. The effective sample size is the number of patients with Amblyopia (case), and recommend 10-15 cases per predictor in the model (need 96 cases just to estimate the intercept in the model). Recommend contacting Dr. Cindy Chen regarding Biostatistics collaboration support. If this project does not qualify, then can apply for VICTR Award for biostatistics support (90 hours).
  • Today's Meeting Notes: Outcome is pre-clinical Amblyopia (yes/no). Making a diagnostic test result binary creates false positives. Recommend using diagnostic test to calculate an individual patient's probability of disease and reporting predictive value of the diagnostic test. See chapter on Diagnostic Models and chapter on Observer Variability Studies (http://hbiostat.org/doc/bbr.pdf). Can use grades of disease severity.
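
As a rough illustration of reporting an individual patient's probability of disease instead of a pass/fail result, the sketch below converts a continuous screening measurement into a predicted probability with a logistic model. The intercept, slope, and function name are hypothetical placeholders; real values would have to come from a model fitted to the study data.

```python
from math import exp

# Hypothetical coefficients for illustration only; real values would come
# from a logistic regression fitted to the study data.
INTERCEPT = -3.0
SLOPE_PER_UNIT = 1.2

def probability_of_disease(screening_value):
    """Predicted probability of disease from a logistic model that uses
    the continuous screening value rather than a pass/fail cutoff."""
    linear_predictor = INTERCEPT + SLOPE_PER_UNIT * screening_value
    return 1.0 / (1.0 + exp(-linear_predictor))

for value in (0.5, 1.5, 3.0):
    print(f"screening value {value}: P(disease) = {probability_of_disease(value):.2f}")
```

Two patients who would both "fail" a binary screen can have quite different predicted probabilities, which is the information lost by dichotomizing.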

Milner Staub, Infectious Diseases

  • I am putting in a track proposal for a CDC Epicenter grant. I met with Amber Hackstadt who helped me figure out that I need to use a 2-way ANOVA for two separate portions of my proposal to predict power; however, I am not sure how to calculate this because I don’t know some of the parameters that STATA is asking me for or how to calculate them. I would really like some help.
  • Meeting Notes: Collected 4200 surveys from TN providers to gather information on perceived barrier domains for prescribing (clinician, clinic, and community). The providers are categorized into three groups (1400 surveys in each of high, medium, and low prescribers).
  • Recommend including a provider's number of prescriptions per year as a continuous covariate in the model; expect a non-linear relationship with outcome (scale score). Can use a proportional odds model. This is an estimation study rather than a hypothesis test study. The sample size calculation is different for estimation, and a margin of error (precision) calculation is more appropriate. See http://hbiostat.org/doc/bbr.pdf and search "plotCorr".
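
A precision (margin of error) calculation for an estimation study can be sketched as below. This is a simplified illustration, not the proportional odds analysis recommended above: it assumes a roughly normal scale score, a normal-approximation confidence interval, and a standard deviation of 1 so that results are in SD units. The function names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error_mean(sd, n, confidence=0.95):
    """Half-width of the confidence interval for a single group mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)

def margin_of_error_difference(sd, n_per_group, confidence=0.95):
    """Half-width of the confidence interval for a difference between
    two group means with equal group sizes and a common SD."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd * sqrt(2 / n_per_group)

# 1400 surveys per prescriber group; SD = 1 is an assumption
print(round(margin_of_error_mean(1.0, 1400), 3))
print(round(margin_of_error_difference(1.0, 1400), 3))
```

With 1400 surveys per group, a group mean would be estimated to within about 0.05 SD and a between-group difference to within about 0.07 SD under these assumptions.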

2020 February 13

Erin Griffin, OB/GYN

  • I am using the multidimensional measure of informed choice (MMIC) to describe the choices made by women deciding on prenatal cell-free DNA screening. This measure uses a knowledge “quiz” and an attitude scale to classify decisions as informed or uninformed.
  • An informed decision requires both a sufficient knowledge score and an attitude score that matches the decision (positive attitude and test uptake OR negative attitude and no test uptake). Anyone with insufficient knowledge and/or an attitude that does not match uptake decision is considered uninformed.
  • Validation of the study is performed by measuring decisional conflict, anxiety, and/or satisfaction after the decision has been made. In the literature, those classified as having made uninformed choices have more conflict/anxiety and less satisfaction.
  • QUESTIONS:
    I’m interested in looking at other statistical measures to determine the relationship between individual knowledge questions and conflict/anxiety/satisfaction.
  • Studies using the MMIC have been published numerous times and always dichotomize knowledge into sufficient and insufficient based on arbitrary measures (median/mean/investigator cut-offs).
  • I want to know if I will be able to use statistical measures to determine the predictive value of individual knowledge items on the outcome measures. I have read that dichotomizing continuous variables is dubious; I do not have a statistics background to know what measures I should pursue.
  • VICTR Voucher, mentor confirmed
  • Recommendations: The project is a master's thesis, and data collection has not yet started. Upon completion of data collection, conduct linear regression to predict anxiety level with the 12 knowledge items as covariates. Though knowledge items are ordinal, they can be treated as linear. Use linear regression for other outcomes as well. If interested in the effect of each item on the probability of a dichotomized outcome, use logistic regression. Check out the Regression Modeling Strategies book.

2020 February 6

Dan Foster, Pharmacology

  • Grant reviews suggested working with a biostatistician with regard to analysis of some preclinical behavioral experiments that are proposed.
  • Meeting notes: Grant reviewer stated the mouse model experiments are too complex to be analyzed by one-way ANOVA with post-hoc tests. For this crossover study, have 16 mice in each genotype group (8 males, 8 females) and plan to record genotype, gender, and drug dose (randomize order for control, dose 1, dose 2, and dose 3 which are received for one week each). The most important comparison is the drug dose which will be done separately by genotype using a dose response curve.
  • Recommend using a mixed effects model since there are repeated measurements on independent mice. Can create contrasts to compare drug doses, genotypes, or genders. If the budget allows, a larger sample size is preferred to include multiple covariates in the model.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Contact Chang Yu for assistance with writing statistics portion of application.

2020 January 30

Angela Maxwell-Horn, Developmental Medicine

  • I am providing education on pharmacotherapy for residents with a pre- and post-test. I would like to know how to analyze improvement and if anything should be changed in my study design before I implement it. VICTR Voucher (independent investigator).
  • Meeting notes: Approximately 25 residents will participate. Descriptive statistics can be reported for the change in scores between the pre- and post-test. Recommend using a linear regression model for post-score and including pre-score, year of residency, and attendance of similar lecture/education as covariates in the model.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Contact Chang Yu and Dan Byrne for assistance with writing statistics portion of application.

Kayla Anderson, Human and Organizational Development

  • We implemented a statewide survey to public water operators in Tennessee, assessing the technical, financial, and managerial challenges facing the industry. We are currently working on determining whether or not the survey sample is representative of the overall population and what the correct statistical method would be in order to validate this question. Mentor confirmed.
  • Meeting notes: Recommend reporting descriptive statistics for characteristics of the survey respondents. The R package 'survey' can be used to fit a survey-weighted generalized linear model (svyglm()) to incorporate sampling weights. Rurality, location, and population size may be included as covariates. Some categories may need to be combined due to small sample size. A factor analysis will inform which questions can be combined into different subscale scores for the model. Separate models for stressed and not stressed water systems can be fit to determine any differences in the main effects.
  • Recommend applying for VICTR Award for biostatistics support (90 hours). Contact Chang Yu and Dan Byrne for assistance with writing statistics portion of application.
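
One simple way to see how sampling weights correct for an unrepresentative sample is post-stratification, sketched below. This is a stand-in illustration rather than the svyglm() analysis mentioned above: the strata, population counts, scores, and function names are all hypothetical, and the weights are computed as population count divided by number of respondents in each stratum.

```python
def poststratification_weights(population_counts, sample_strata):
    """Weight for each stratum = population count / number of respondents."""
    sample_counts = {}
    for stratum in sample_strata:
        sample_counts[stratum] = sample_counts.get(stratum, 0) + 1
    return {s: population_counts[s] / n for s, n in sample_counts.items()}

def weighted_mean(values, strata, weights):
    """Mean of values with each respondent weighted by stratum weight."""
    total_weight = sum(weights[s] for s in strata)
    return sum(weights[s] * v for v, s in zip(values, strata)) / total_weight

# Hypothetical numbers: 300 rural and 100 urban water systems statewide,
# with 3 rural and 2 urban respondents.
population_counts = {"rural": 300, "urban": 100}
strata = ["rural", "rural", "rural", "urban", "urban"]
scores = [2, 3, 4, 5, 5]

weights = poststratification_weights(population_counts, strata)
print(sum(scores) / len(scores))               # unweighted mean
print(weighted_mean(scores, strata, weights))  # weighted mean
```

Comparing the weighted and unweighted summaries is one informal check of how much the respondents differ from the overall population on the stratifying variable.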

2020 January 23

Michelle Liu, Pharmacy/VICTR

  • Question regarding how to perform statistical analysis on PK and clinical data set of transplant patients with CYP3A5 genotype information.
  • VICTR Voucher (independent investigator)
  • Want to compare two groups: expressors vs non-expressors (expressing the gene means needing higher doses); grouping can be binary or ordinal with 3 levels (undecided)
  • For a 3-level expressor group, use a Kruskal-Wallis test to assess the relationship between group and another variable. For a 2-level expressor group, use the Mann-Whitney U test or chi-square test.
  • To control for other predictors in a multivariable model, use a binary logistic regression or ordinal logistic regression.
  • Data cleaning (spreadsheet) is highly recommended.
  • Recommended to apply for VICTR voucher; certificate of completion is proof of clinic attendance (clinic notes) in pdf format.

Mohamed Khattab, Radiation Oncology

  • We are presenting a prospective trial of frameless radiosurgical thalamotomy for essential and parkinsonian tremor. We would like to analyze efficacy data (tremor severity), radiologic accuracy of targeting, quality of life data, and we would also like to discuss potentially producing a matched cohort to compare our modality to a competing modality (deep brain stimulation). We would like to apply for biostatistics support, and also understand the statistical analyses necessary for our goals.
  • VICTR Voucher/mentor confirmed and attended.
  • Recommend doing a power calculation to determine sample size of controls and type of matching (1:1, 1:3, etc.).
  • May want to consider propensity score matching to compare group receiving new treatment versus traditional treatment.
  • If interested in looking at change in each time point, recommend doing longitudinal analysis. If interested in looking at time to improvement, recommend doing Kaplan-Meier analysis and Cox Regression.
  • Recommend first prioritizing clinically important variables to include in the multivariable analysis, as well as deciding the outcome variable (using maximum score of improvement versus delta in improvement between two groups).
  • Recommend looking at all 30 questions in the questionnaire and comparing the global scores between the two groups rather than testing for a difference in each element of the questionnaire. If testing for a difference in each item, may need to apply some type of correction method to adjust for multiple comparisons.
  • Recommended to apply for VICTR voucher; certificate of completion is proof of clinic attendance (clinic notes) in a pdf format.
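
If each questionnaire item is tested separately, one possible correction for multiple comparisons is the Holm step-down procedure, sketched below next to the simpler Bonferroni adjustment. The notes do not name a specific method, so the choice of Holm, the function names, and the example p-values are assumptions for illustration.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply each by the number of tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

# Hypothetical p-values for three questionnaire items
p_values = [0.01, 0.04, 0.03]
print(bonferroni_adjust(p_values))
print(holm_adjust(p_values))
```

With all 30 questionnaire items tested at an overall 0.05 level, the Bonferroni threshold per item would be 0.05 / 30, or about 0.0017, which illustrates why comparing global scores is the preferred approach.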

2020 January 16

Keerti Dantuluri, Pediatric Infectious Diseases

  • We are conducting a retrospective cohort study to determine the rates of acute respiratory infection (ARI), antibiotic use associated with ARI, and inappropriate antibiotic use associated with ARI among young children (based upon the rurality of their county of residence) enrolled in TennCare between July 1, 2007 and June 30, 2017. We have abstracted data from the TennCare database and are using multivariable Poisson regression to measure our outcomes. We would like to review our data analysis with a biostatistician for a second opinion on our choice of regression model.
  • Meeting notes: Would like to know whether there are differences between mostly urban, mostly rural, and completely rural areas of TN. Plan to use Antibiotic Appropriateness Classification Scheme (Tier 1 & 2 considered potentially appropriate, Tier 3 considered inappropriate). Used multivariable fixed effects Poisson regression to calculate adjusted IRR.
  • Recommend generating a histogram to determine the distribution of the number of ARI (e.g. Poisson, negative binomial, zero-inflated). Can include calendar year in the model as a non-linear, continuous covariate (e.g. using restricted cubic splines) with fraction of the year calculated using the number of days into the year divided by 365. Whether you want to model rates or probabilities will determine which interval of time for observation is used.
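
The two suggestions above can be sketched in a few lines. The first function turns a date into a continuous calendar-time covariate (year plus days into the year divided by 365); counting January 1 as day 0 is an assumption. The second computes a variance-to-mean ratio as a quick numeric companion to the histogram: a ratio near 1 is consistent with a Poisson distribution, while a ratio well above 1 suggests overdispersion, where a negative binomial model may fit better. Function names and the example counts are hypothetical.

```python
from datetime import date
from statistics import mean, variance

def fractional_year(d):
    """Calendar time as year + (days into the year) / 365.
    January 1 counts as day 0 (an assumption); in a leap year,
    December 31 therefore maps to the start of the next year."""
    days_into_year = d.timetuple().tm_yday - 1
    return d.year + days_into_year / 365

def dispersion_index(counts):
    """Sample variance divided by mean of the counts."""
    return variance(counts) / mean(counts)

print(round(fractional_year(date(2007, 7, 1)), 3))

# Hypothetical ARI counts per child
ari_counts = [0, 0, 1, 1, 2, 8]
print(round(dispersion_index(ari_counts), 2))
```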

Celestine Wanjalla, Infectious Diseases

  • Project is looking at the association between inflammatory cells and carotid plaques as the dependent variable. Question is on the use of principal component analysis to reduce or pick the number of variables to include in models and for the paper, and on the use of multiple-step regression analysis versus looking at several different models.
  • Meeting notes: Study includes 70 patients with and 30 patients without HIV.
  • Recommend including log(height) and log(weight) in the model since effects could be different. If the sample size were much smaller, BMI may be necessary in order to include fewer covariates in the model. Should utilize variable clustering (varclus()) to prevent reduction in sample size due to missing data; use clinical expertise when interpreting statistical output. May also consider single variable imputation for missing data.
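
The reason log(height) and log(weight) can replace BMI follows from the definition BMI = weight / height^2, which gives log(BMI) = log(weight) - 2 log(height). A model with separate coefficients for log(weight) and log(height) therefore contains BMI as the special case where the height coefficient is exactly -2 times the weight coefficient. The short check below uses made-up values for one patient.

```python
from math import log

# Hypothetical patient: weight in kg, height in m
weight_kg = 70.0
height_m = 1.75

bmi = weight_kg / height_m ** 2

# log(BMI) is a fixed linear combination of log(weight) and log(height)
lhs = log(bmi)
rhs = log(weight_kg) - 2 * log(height_m)
print(abs(lhs - rhs) < 1e-12)  # True
```

Entering BMI alone forces that fixed ratio of coefficients and uses one covariate instead of two, which is why it may be preferred only when the sample size is very limited.
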
Topic revision: r2 - 18 Dec 2023, IneSohn
 
