Vanderbilt Biostatistics Wiki > Clinics > Monday Clinic Notes 2021 (13 Dec 2021, Dale Plummer)

Notes 2021

- Polygenic risk scores are associated with treatment response in other diseases; this has not been shown in Type 2 diabetes. What is the best way to test the association between the risk score and treatment response to metformin, measured by change in HbA1c? Population is BioVU patients in whom metformin was initiated. Covariates: age, sex, BMI, ancestry.
- Recommendations:
- Primary analysis - continuous regression, polygenic risk score interacted with baseline HbA1c, both expanded for non-linearity (e.g., restricted cubic spline). Adjusted for additional covariates. Include the list of variables used for BioVU sampling.
- Explore baseline characteristics for patients who were not followed up three months later. Also the patients who are not in BioVU (who do not have polygenic risk scores). Possible issue of any competing medications that could alter HbA1c.
- Make sure inclusion/exclusion criteria are based on information available at time 0 (initiation of metformin).
- Possible VICTR voucher project for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

- Survey for students and professors regarding methods for gathering consent. Looking for sample size & power information. Primary research question is what is the most acceptable mode of consent for students and physicians, and whether students and physicians differ in which mode of consent they prefer. There are six modes of consent (some general, some specific), and participants are asked to rate each mode on a continuous scale. Needs to work out how to translate the continuous scale to binary acceptable/not acceptable.
- Plans to do pairwise comparison between modes of consent. In total there will be 15 pairs. Needs to establish what is a meaningful difference and the level of precision.

- Collected 400+ stool samples from 123 patients - expecting approximately 1,400 stool samples from 280 patients. Babies less than 2,000 g. Will collect for 1 year; currently on week 16. Currently have 10 cases, expect about 30 cases by the end of 1 year. Variables include baby and maternal demographics. Questions about matching (i.e., how to find controls).
- We tend to match so that we can ignore variables/risk factors that we already know impact the outcome. First step: identify those risk factors (gestational age/birth weight, season, LOS). Matching forces their distribution to be the same in cases and controls, so any differences we then find are differences beyond what we matched on.
- Recommendations:
- Since the number of cases is quite small, maybe do not use too many variables for matching.
- Need to discuss rule of what is close enough for the matching variable(s) to consider it a control. Can automate the list of controls for each case after you make rules.
- Connect with Ran Tao, Jonathan Schildcrout for genomic & outcome dependent analysis
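The rule-based matching idea above can be sketched in code. Everything specific here - the variable names (gestational age in days, season) and the 7-day caliper - is a hypothetical illustration, not a rule decided in clinic:

```python
# Hedged sketch: automated rule-based selection of eligible controls
# for each case. Variable names and the caliper width are hypothetical.

def eligible_controls(case, controls, ga_caliper_days=7):
    """Return controls within a gestational-age caliper and the same season."""
    return [c for c in controls
            if abs(c["ga_days"] - case["ga_days"]) <= ga_caliper_days
            and c["season"] == case["season"]]

case = {"ga_days": 210, "season": "winter"}
controls = [
    {"id": 1, "ga_days": 214, "season": "winter"},   # within caliper, same season
    {"id": 2, "ga_days": 230, "season": "winter"},   # outside caliper
    {"id": 3, "ga_days": 212, "season": "summer"},   # wrong season
]
matches = eligible_controls(case, controls)
print([c["id"] for c in matches])  # only control 1 qualifies
```

Once the "close enough" rules are agreed on, a function like this can generate the candidate-control list for every case automatically.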

Clinic Notes:

- Comparing two medical devices (Holter monitor versus pacemaker/defibrillator) on their ability to detect PVCs. The two devices were worn by the same patient at the same time. Data go back to 2009 and the total number of patients is about 1,000. Instances per patient are 1 or 2, and almost every patient has PVCs. The multiple instances per patient create a correlation structure in the data. Patients' characteristics are unlikely to affect the detection of PVCs.
- Possible VICTR voucher project for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

- Research question: Did the integrative model impact any outcomes differently than the standard-of-care model? Timeline: April 2020 - February 2021. Cohort: COVID patients (n=279, events=200). Outcomes of interest: LOS, and time to code status change (patient's preference in end-of-life situations). One possible drawback of the study is that grouping is not the only variable affecting LOS (possibly descriptive instead of comparative): as time goes by, doctors know more about COVID treatment. Problem with time zero being the time of palliative care consult; recommend using date of admission instead, then a Cox model with time-varying covariates. Could stratify by patient location (ICU versus stepdown unit), depending on how different the populations are. Include baseline code status, probably interacted with formal consult with the palliative care team. Patients who are baseline stage 4 (DNR/DNI) will be excluded. For relative measures, death can be treated as censoring; for absolute measures it cannot, but a multi-state model can be used. Could calculate absolute measures of code change from a multi-state model.
- Possible VICTR voucher project for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

Clinic Notes:

- Interested in the number of mental health (suicidal ideation or attempt) cases within the emergency department before and during the pandemic: March 2018 versus March 2020/2021. Screened relevant variables from March 2018-Feb 2021. Included patients if they had a presenting complaint of SI/SA or were found to have SI/SA on chart review. Variables include demographics, Columbia-Suicide Severity Rating Scale, LOS. N is ~4,100 records. Patients who presented multiple times have multiple records.
- COVID could have affected the number of patients in either direction: the pandemic could increase mental health pressure, but patients may tend to avoid hospital visits because of the pandemic. This is a limitation of the study, since it is hard to distinguish the two forces that affect the number of patients. Looking at associations of clinical outcomes with patient characteristics. Need to specify the research question(s). Could describe the data, describe the data as a time series, or do inference.
- Possible VICTR voucher project for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

- Looking into differences in complications after a procedure; rare procedure/outcomes. Did Fisher's exact test due to small sample size.
- Should this be descriptive in nature because of the sample size? Any model you build needs to be constrained by the number of outcomes and the sample size of the data set. Better off, from a model stability and power perspective, to model the underlying continuous probability instead of a 3-level classification (in general, continuous variables provide more power). If a continuous number is not available, best to report descriptives. Matching helps build a more complex model, but with this data you will still probably not be able to do much. Could do matching and analyze the data as a secondary/exploratory analysis.
- Generally speaking, you need 10-15 outcomes (events) for each model parameter.

- Looking for guidance on statistical methods. EHR, retrospective cohort study, Nov 2017-Nov 2021. All pediatric patients being treated for a specific type of leukemia. Outcome is toxicity of pegaspargase. Only patients treated at Vanderbilt for leukemia are included. Race is self-reported (be careful, as EHR race sometimes is not). Primary question: Do specific ethnic groups have different toxicity? Toxicity is collected via lab values, typically on a weekly basis, with different grades of toxicity. Dose adjustment rarely happens. Patients need a baseline lab value (obtained before the first drug administration) to be included. The expected cohort is about 100-130 patients. The racial breakdown of the cohort is similar to that of VUMC patients. Challenges of the study include that estimates for some racial groups may not be precise due to the number of patients, and that the follow-up periods of patients will differ.
- Recommendations:
- Think about using severity/degree of toxicity instead of dichotomous (y/n), tend to have more statistical power this way
- If looking at differentiating between length of time before toxicity, use time to event analysis. You will have time varying covariates, modeling response profile over time.
- Descriptive analysis to show differences between racial groups. Report the average number of toxicities divided by total follow-up time. Could also report the level of toxicities by patient. This does not take into consideration any adjustments and is a very crude analysis.
- VICTR voucher for biostatistics support is an option if needed.
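The crude rate described above (total toxicity events divided by total follow-up time) can be sketched as follows; the event count and person-years are made-up numbers, and the CI uses a simple log-rate normal approximation rather than anything specified in clinic:

```python
import math

# Hedged sketch: crude toxicity rate = events / total follow-up time,
# with an approximate 95% CI for a Poisson count. Numbers are hypothetical.

events = 24            # hypothetical total toxicity events in one group
person_years = 80.0    # hypothetical total follow-up time

rate = events / person_years
se_log = 1 / math.sqrt(events)          # SE of the log rate for a Poisson count
lo = rate * math.exp(-1.96 * se_log)
hi = rate * math.exp(1.96 * se_log)
print(f"rate {rate:.2f}/person-year (95% CI {lo:.2f}-{hi:.2f})")
```

Comparing such rates across racial groups would still be the crude, unadjusted analysis the notes describe.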

- Retrospective cohort study comparing a new phenobarbital protocol to a benzodiazepine regimen for management of patients with alcohol withdrawal. Is there improvement with the new protocol? Superiority study. Most of the data will be pulled using ICD-10 codes. A challenge with all outcomes is that they might not be captured after discharge; could restrict to outcomes that occur only during the inpatient setting. Estimated sample size of 500-1,000 patients. Anticipated incidence rate is 10-15% in the benzodiazepine group, 0-7% in the phenobarbital group.
- Recommendations:
- Should we use delirium free days or delirium days? Delirium days (free days does not handle death).
- Possible VICTR voucher project for biostatistics support (90 hours). Need to see the possibility of funding additional resources. The part we are concerned about is cost-effectiveness; that is not within the VICTR team's current expertise. Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).
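For orientation only, a per-group sample size for comparing two proportions can be sketched with the usual normal approximation. The 12.5% and 3.5% inputs are just midpoints of the anticipated ranges above, not clinic decisions:

```python
import math

# Hedged sketch: per-group n for a two-proportion comparison
# (normal approximation, two-sided alpha = 0.05, power = 0.80).
# p1 and p2 below are illustrative midpoints of the anticipated ranges.

def n_per_group(p1, p2):
    z_a, z_b = 1.96, 0.8416           # normal quantiles for alpha/2 and power
    pbar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * pbar * (1 - pbar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

print(n_per_group(0.125, 0.035))  # roughly 142 per group under these assumptions
```

That total (~284) sits comfortably inside the estimated 500-1,000 available patients, though the real answer depends on where in the 10-15% and 0-7% ranges the true rates fall.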

- Speech rhythm measures, 3 groups (stuttering recovered, stuttering persistent, non-stuttering), 50 kids total (10 persistent, 20 recovered, 20 non-stuttering). Particularly interested in the comparison between recovered and persistent groups. All talk for 4 minutes; utterance counts are derived from the recording. Duration of utterance highly impacts speech rhythm measures. In general, all patient characteristics are equal across groups. Want to look at the joint relation of duration & the supra measure.
- If looking for repeated measures type analysis, challenge is that number of measurements you have is related to the performance of the patient.
- There could be multiple chunks within a 4-minute period. And there could be short, moderate, or long utterances in chunks.
- Recommendations:
- For future predictions, think about whether you want summaries of utterances or entire 4-minute recording.
- Look at heat map of duration (x) by sbpr (y), estimates relationship between duration/sbpr as predictors of group (persistent vs recovered).
- To account for non-linear duration, use polynomial or restricted cubic spline.
- Could use a model comparison to determine if a polynomial term/restricted cubic spline term is significant.
- To do a patient-level model, the number of utterances should be the same for all rows for a single patient.
- Create summary measure of number of utterances and control for it in the model.

- Think about prospective analysis predicting group.

- Want to create a model, with known inputs, of screening for peripheral artery disease: is it cost-effective? The test is binary based on a threshold (ankle-brachial index). Screening ABI is done rarely. The cost of the test is consistent. One-time test; follow for the rest of life to see if patients can tolerate medicine (statin, ACE inhibitor). Using US life tables 2017 for state transition probabilities. Assume general population utilities.
- Need to nail down the key questions you want answered at the end of the day. In addition to cost-effectiveness, might explore in what populations you see different rates of screening or different levels of implementation; at what point it is cost-effective vs not (what are the drivers and what makes it flip); the robustness of conclusions to the assumptions; and how compliance impacts this. Think about potential interactions. Evaluate the assumptions the model starts with.
- Possible VICTR voucher project for biostatistics support (90 hours). Biostatistics team to see the possibility of funding additional resources. The part we are concerned about is cost-effectiveness; that is not within the VICTR team's current expertise. Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

- Series of pairwise ordinal regression models. Population is children with stutter (~270 records).
- Standardized coefficient for ordinal regression model:
- First step - scale the outcome measures/variables to have units of standard deviation. Then the coefficients are on the log odds scale (exponentiate to report odds ratios). To interpret - a standard odds ratio is the change in odds for a 1-unit change in the predictor; here it is the change in odds per one standard deviation change in the predictor.
- You can also standardize by one IQR (interpretation: if I change the covariate from the 25th to the 75th percentile, what change is expected in the outcome). For standard deviation, the scale function centers and scales the values. For IQR, center at the 25th percentile instead, and scale by dividing by the difference of the 75th minus 25th percentiles (use the quantile function to find the IQR).
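The two standardizations above (SD units and IQR units) amount to simple transformations of the predictor; here is a minimal sketch with made-up data:

```python
import statistics

# Sketch of the two standardizations described above, with made-up data.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# SD scaling: center at the mean, divide by the standard deviation.
mean, sd = statistics.mean(x), statistics.stdev(x)
x_sd = [(v - mean) / sd for v in x]

# IQR scaling: center at the 25th percentile, divide by the IQR.
q1, _, q3 = statistics.quantiles(x, n=4)   # quartiles (default exclusive method)
x_iqr = [(v - q1) / (q3 - q1) for v in x]

print(round(x_sd[0], 3), round(x_iqr[0], 3))
```

A coefficient fitted to `x_sd` is then "per one SD" and a coefficient fitted to `x_iqr` is "per one IQR" of the original variable.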

- Another way to look at relations/effects of predictors on outcome is partial effects plot. plot(Predict(orm.model)).
- P-values: be sure to understand that we either see a detectable difference or the result is inconclusive. With additional data, maybe we would see a difference.
- Model fit to help choose most efficient model and remove "useless" variables:
- First approach - LASSO (package glmnet); doing model building and variable selection on the same data set. Looks at variables simultaneously, creates an importance measure to give an ordering of variables. More established technique.
- Second approach - random forest variable importance.
- If you intend to perform inference, you have to be very careful about the methods you use for variable selection. Most do not support valid inference on the same data set. We think clinical expertise is the best way to choose variables.

- This is a longitudinal study of stroke patients. 360 patients with 4 timepoints. Have 120 with 2+ timepoints. Goal is to describe typical trajectory for recovery. There are missing values. Missing values that are bookended could be imputed. Those that are not bookended have problems with imputation. Some patients are unable to be tested due to severity but have scores at a later time.
- Random intercepts can be problematic in these situations. One option is to model transitions from one timepoint to the next. Robust standard error may be useful in these models.
- https://cran.r-project.org/web/packages/robustlmm/vignettes/rlmer.pdf
- Jonathan Schildcrout could be a good resource.

- Hospital Medicine Resident Elective: we are designing an elective for medical residents to improve their ability to proficiently care for hospitalized patients, supported by use of a self-rated competency-based assessment. Surveys are created and being revised. We would use pre/post testing with scaled surveys.
- We have developed a feasibility study involving the design of six ninety-minute sessions to occur every other month. We want to measure patient compliance and patient engagement. As this project moves forward, it hopes to improve educational exposure and training on team-based decision-making in caring for this population. We hope to see an improvement in the interprofessional team-based care skill set of the interprofessional students. We will monitor the number of patients invited to the sessions, the number of patients who arrived, the level of involvement of individuals, and the proportion of individuals who return for future sessions. We will also track whether individuals have made contact with each other and their level of communication.

- VFF Service works with high utilizers; developing a system for tracking patient-centered outcomes, creating a support group (team-based care), and collecting data using validated performance surveys. Pilot study to see if it is feasible to have people show up (biggest barrier). Outcomes in professional students, some outcomes in patients. Goal 1) a group of patients who share their own information; goal 2) what is the barrier to decreasing utilization from the provider/team perspective.
- Recommendations
- First question is qualitative. Use moderated focus group approach (seek help from qualitative core - contact Chris Lindsell for names).
- Second part: collect data from the group with trained professionals, then have interprofessional students conduct them after they observe. Worried about lack of training for the students. If you are using the same questionnaire for pre and post, any recall bias could be emphasized by that.

- Q1. Power calculation for prediction model - verify what I did. Outcome is remission, based on an ordinal score (PsAID). Rather than dichotomizing into yes/no, could model the ordinal outcome over time. Three models are planned: using baseline only to predict response, to predict trajectory, and using both to predict response. The data for this study are already collected, so we can assess power of the proposed analysis rather than do a true sample size calculation. There are multiple options to assess model performance. Tom: recommend thinking about calibration. Goal is to convey that the complexity of the model can be supported by the data that are available.
- Q2. How can we adjust for treatment changes in trajectory analysis (growth mixture model)?

- Developing predictive models for treatment response in a type of arthritis.
- Recommendations:
- Binary outcome is least powerful. Preference is continuous outcome.
- Aim 2 - explain this is an exploratory aim since you do not know the archetype/number of profiles of the response; will be generating hypothesis from it. Exploratory aim generates hypothesis (ex. based on prelim data analysis, we think there are 4 response archetypes), confirmatory aim has a precise hypotheses you already established and you designed a study to answer that question.

- Looking at the impact of the pandemic on surgical care and outcomes in Ethiopia. Retrospective analysis. 3 exposure groups based on time: phase 0) pre-COVID; phase 1) "lockdown", no elective surgery; phase 2) after "lockdown", elective restrictions lifted. Want to understand case volume, referral patterns, and outcomes (28-day cumulative mortality) in the 3 exposure groups. Primary outcome is 28-day mortality. Secondary outcomes are change in surgical case volume in phases 1 and 2 compared to phase 0, and change in referral pattern.
- Reviewer feedback: adjust all outcomes and associations for confounders; sample size/power justification; Type 1 error and adjustment for multiple comparisons; a before-after design needs segmented regression; plot over time.
- Recommendations
- Plots - For case volume, suggest creating a (profile/longitudinal) plot with cases on the Y axis and time on the X axis. Estimate what case volume is in phase 1 with a confidence interval. For categorical variables, the plot is the same but now Y is the proportion of the category over time. Possibly a stacked bar chart.
- Could use risk scores if they will apply in a low resource setting. Present analysis as we are interested in descriptives and degree of change/association.
- Type I error rate - Could suggest to the reviewer that we are not in a setting concerned with family-wise error rate and multiple comparisons. Could convert to estimation with 95% CIs rather than formal hypothesis testing.
- Segmented regression - Helpful in situations where you have administrative thresholds, policy changes. Allowing a jump in outcome at a specific time point. If you do segmented, would report plot.
- Power/sample size - Recommend not doing this; they are basically asking for post-hoc power. Explain that sample size was determined by all available data, and that information about effect sizes and precision is contained within the confidence intervals provided, not in a power analysis. A large body of research suggests post-hoc power analyses should not be done (provide references).

- There has not been a comprehensive look at misrepresentation of minorities in human research. Will look at 4 journals, 5 time points across 10 years; ~2,000 articles. Coding system to capture information about participants, study design, study region, participant race, ethnicity, sex, gender, etc. This study is setting up the idea of reporting and recruiting diverse samples.
- What are reporting practices for race, ethnicity, sex, and/or gender for research participants? Are participants representative of broader regional and national population demographics? Will compare to US census population.
- Better off reframing as what magnitude of difference exists (estimation): To what degree are minority racial groups under-represented? Absolute percentage point difference.
- Possible problem with the null hypothesis: studies do not represent the racial distribution of people across the US. Might be helpful to do a random sample of publications weighted by size/number of participants in the studies, and do a deep dive to identify the catchment area for particular studies and its racial breakdown.
- Statistical methods discussed: Graphically (radar plot, bar chart); will have confidence interval from percentage point difference to dictate "significance".
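The estimation framing above (an absolute percentage-point difference with a confidence interval) can be sketched as follows; all counts and the census benchmark are hypothetical, and the census proportion is treated as a fixed constant:

```python
import math

# Hedged sketch: absolute percentage-point difference between a study
# sample and a census benchmark, with a 95% Wald CI. Numbers are made up;
# the census proportion is treated as fixed (no sampling error).

n, x = 2000, 220            # hypothetical pooled participants; minority participants
p_census = 0.134            # hypothetical census benchmark proportion

p_study = x / n
diff = p_study - p_census
se = math.sqrt(p_study * (1 - p_study) / n)
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(f"{100*diff:+.1f} points (95% CI {100*lo:+.1f} to {100*hi:+.1f})")
```

A CI for the percentage-point difference that excludes zero would "dictate significance" in the sense described above, while keeping the emphasis on the magnitude of under-representation.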

- NICU ventilator patients. 3 NICUs. 2 yrs of data. We want to validate data coming out of mechanical ventilators to validate data & method of collection. Comparing ventilator data, direct observation (gold standard), EHR data, EPIC data. 7 continuous variables, 1 nominal variable. Have done percent agreement/concordance, correlations. A lot of repeated measures on same patient, every hour. Time is how we match up data but want to compare data values to get at comparison of collection methods. Assume time stamps are accurate.
- How best to analyze the data?
- Analyze continuous measures with Bland-Altman, pairwise with the gold standard. Report limits of agreement between each alternative and the gold standard. There are correlated/clustered data here that Bland-Altman won't account for. To account for the correlation structure, look at Bland-Altman by machine or within each NICU; if comparable, then you can combine them for a second-step analysis. If there are systematic differences, combine in more of a meta-analysis type of way (would want a more sophisticated analysis).
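The basic Bland-Altman computation (bias and limits of agreement) for one device against the gold standard looks like this; the values are made up, and this simple form ignores the repeated-measures correlation discussed above:

```python
import statistics

# Hedged sketch: Bland-Altman bias and 95% limits of agreement for one
# alternative device vs the gold standard. Paired values are hypothetical,
# and repeated measures per patient are ignored here.

gold = [10.0, 12.0, 11.0, 14.0, 9.0, 13.0]
alt  = [10.5, 11.5, 11.5, 14.5, 9.5, 13.0]

diffs = [a - g for a, g in zip(alt, gold)]
bias = statistics.mean(diffs)          # systematic difference
sd = statistics.stdev(diffs)
loa = (bias - 1.96 * sd, bias + 1.96 * sd)   # limits of agreement
print(round(bias, 3), [round(v, 3) for v in loa])
```

Repeating this within each NICU (or machine) and comparing the bias and limits is the first step the notes recommend before pooling.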

- Possible VICTR voucher project for biostatistics support (90 hours)
- Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

- Do need to include the uncertain cases in the calculation of PPV. Report how the algorithm/ICD-9-10 performs vs manual chart review using a confusion matrix, reporting counts. The confusion matrix contains the information needed to calculate PPV, NPV, specificity, and sensitivity. Second step: calculate the best-case and worst-case PPV; give it a range. Be transparent regarding definitions for the manuscript, detailing exactly what you did and how things were calculated.
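The best-case/worst-case PPV range with uncertain chart reviews can be sketched as follows (all counts are hypothetical):

```python
# Hedged sketch: bounding PPV when some algorithm-positive records have an
# uncertain chart-review result. Counts are hypothetical.

tp, fp, uncertain = 80, 10, 10   # algorithm-positive records by review result

worst_ppv = tp / (tp + fp + uncertain)               # uncertain counted as false positives
best_ppv = (tp + uncertain) / (tp + fp + uncertain)  # uncertain counted as true positives
print(f"PPV between {worst_ppv:.2f} and {best_ppv:.2f}")
```

Reporting the interval, alongside the raw counts, keeps the handling of uncertain cases transparent.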

- How well does a score (0-100) predict deterioration? The score is calculated in EPIC every 15 minutes and involves demographics, vital signs, nursing assessments, and lab results. Retrospective; grab the worst score in the past 36 hours. What is the optimal evaluation? How can we evaluate the model outside the ICU? At present, this score is not live to providers.
- Recommendations:
- Discrimination and calibration. Think about prospective approach. Randomize time point at which you evaluate patients with this score and look at next 24 hours for events. How well does DI score discriminate future events, how does it change if you change time window (might be calibrated for a specific time point only). Create ordinal scale of all outcomes by severity. Possible measures: Tau-A, Tau-C, Goodman-Kruskal Gamma, Ordinal C statistic.
- If you want to use all data instead of 1 time point could consider random effects model. Correlation through random effect of patient. Model correlation structure.
- More complicated option: Time dependent cox model, allow score to be covariate that changes over time.
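The ordinal discrimination measures listed above all derive from pair-wise concordance; a minimal sketch of an ordinal C statistic (concordance probability) on made-up data:

```python
from itertools import combinations

# Hedged sketch of an ordinal C statistic: among patient pairs with
# different outcome severities, how often does the higher score accompany
# the worse outcome? Scores and severities below are made up.

scores   = [12, 35, 40, 55, 70, 90]
severity = [0, 0, 1, 1, 2, 2]     # ordinal outcome, higher = worse

conc = disc = ties = 0
for (s1, o1), (s2, o2) in combinations(zip(scores, severity), 2):
    if o1 == o2:
        continue                   # only outcome-discordant pairs count
    if s1 == s2:
        ties += 1
    elif (s1 < s2) == (o1 < o2):
        conc += 1
    else:
        disc += 1

c_stat = (conc + 0.5 * ties) / (conc + disc + ties)
print(c_stat)  # 1.0 here: score ordering perfectly tracks severity
```

Tau-a, tau-c, and Goodman-Kruskal gamma are different normalizations of the same concordant/discordant pair counts.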

- Retrospective cohort. Recurrence is a second event within 6 months of the first; patients can have multiple follow-up events. Antifungal resistance in yeast. Looking at patients who have vaginal candidiasis with recurrent infections. Recurrence is typically not due to resistance to medication but to other infection characteristics; however, some resistance has been shown. We want to see how much resistance we see under different testing conditions. Resistance measured at the beginning of the study (first sample). Resistance is minimum inhibitory concentration, quantitative.
- Q: Have patient characteristics; is there any value in comparing isolates that showed resistance vs those that did not? Does resistance predict risk of recurrence or not? Also have different testing conditions (pH 7 vs pH 4).
- Use quantitative measure as predictor of number of follow-up events. Looking at correlation of that value with number of follow-up events.

- Q: Sample size to answer this question - yes/no?
- Depends on effect size. Might just have to say, these are the samples we can get and because of this, this is our power & amount of difference we can detect.

- VICTR does not require the same level of rigor for sample size, as these are pilot data studies.
- Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/).

- Retrospective, cross-sectional study. ~290 children who stutter. We think that some indices of stuttering might cluster together to yield subtypes of stuttering. Want to clarify research question as we received feedback from last session that subtypes could be misleading term to apply to cluster analysis results. Moved away from subtype terminology, characterized cluster analysis as exploratory, primary research questions (RQ) refined.
- RQ1: to what degree are the different indices of overt stuttering behaviors correlated?
- Do not recommend Pearson correlation due to its assumption of a linear relationship. A Spearman rho correlation matrix would be better, as it captures monotonic, possibly non-linear, relationships between outcomes. Pairwise examination - create ordinal regression models for all pairs and include covariates in a way that allows non-linear effects (include a polynomial or, in R, a restricted cubic spline). Report the overall impact of one outcome predicting the other. This will be the basis for the matrix you create; it will have all measures of association. 6x6 matrix (30 models). Would not worry about p-values; use coefficients and confidence intervals.

- RQ2: Do stuttering behaviors predict or correlate with cognitive-affective manifestations?
- Interactions are hard to detect, need large sample size to detect them. Need to determine if interactions are worth it. Ordinal regression model (kiddycat/tocs2 as outcome, all 6 stuttering variables/3 adjustment variables as covariates). Possible to combine variables from Q1 if variables are extremely correlated. This is answering do all these variables as a whole help predict kiddycat or tocs2.

- Have samples from patients with TB with ManLAM bacteria. Want to establish the prevalence of cap among those with backbone. Not sure how to handle false negatives. The concentration varies by sample (concentration is lower if HIV- vs. HIV+). Greater ability to detect in HIV+ patients; could detect about 60% overall (detect ~60% of HIV-; unknown what proportion of HIV+, but expected to be more than 60%). Cap could possibly not be detected because concentration is low; they want to conclude "not detected" because it is really not there (not just because concentration is low).
- Q: How many clinical samples to say these caps cannot be detected in the samples?
- Find the amount of tolerance you are willing to accept (this comes down to the clinical community: what value is small enough for your clinical community to agree it is close enough to 0). Then reverse-engineer the number of samples needed to estimate the prevalence under the assumption that it is 0. Calculate the expected confidence interval of the prevalence, and increase the sample size to the point that the CI is completely within your threshold.
- FH suggests using the 3/n rule. If one observes no events out of n trials the upper 0.95 confidence limit for the probability of the event is very close to 3/n. To have confidence that the unknown probability < 0.01 one would need to observe 0 events out of 300 trials.
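The 3/n rule can be checked directly: with 0 events in n trials, the exact one-sided upper 95% confidence limit for the event probability is 1 - 0.05^(1/n), which is very close to 3/n:

```python
# Sketch of the 3/n rule: with 0 events out of n trials, the exact upper
# 95% confidence limit for the event probability is 1 - 0.05**(1/n),
# closely approximated by 3/n.

def upper95_zero_events(n):
    return 1 - 0.05 ** (1 / n)   # exact limit when 0 events are observed

for n in (100, 300):
    print(n, round(upper95_zero_events(n), 5), round(3 / n, 5))
```

For n = 300 the exact limit comes out just under 0.01, matching the statement that 0 events out of 300 trials supports a probability below 1%.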

- Q: Any way to compare slopes/rates of change for different pregnancy outcomes over time? A: Options - 1) counts 2) relative proportion. What is the overarching question? Can we predict abortion and surgical pregnancy loss volume from delivery volume? Possibly time series data (empirical/count trends don't tell you about the future as well). May consider adding 2020 data (the question may come up).
- Q: Surgical early pregnancy loss/abortions by trimester - is there any way to compare the slope before and after a specific time point. Hired a fellow trained faculty in 2016. A: Impact of specific event, in general, best to think about events and code them in the data before you look at any trend (ie. running measure that describes characteristics). Correlate running measures with running trends.
- Non-parametric smoothers in analyzing raw data.
- If you calculate proportions, assess how flat the proportions are over time. If flat, proportions are a good way to simplify the data. If proportions are constant, you would be able to just "pull" the proportion forward in time - not really a need for prediction then.
- Time series: for predictions, short term time trend - seasonality, keep it in quarters. For each type of procedure have both long and short term trend. Short term starts over each year. Model both as function of time. Use that to give estimates and project in to future with confidence bands. Have to assume quarters act independently.
- Are these lines impacted because of the population of pregnant women in Davidson county has changed? Try to get total numbers. If efforts are not effective then increase is due to population increase.
- VICTR vouchers tend to contribute to general knowledge, not specifically at Vanderbilt.

- Firearm injury prevention, T32 proposal, very beginning of the proposal. Developed a multimedia training platform (SAFER) to provide counseling tips and tools for pediatric providers nationwide. Using an effectiveness-implementation hybrid type 3 design to evaluate the platform. Survey to collect self-reported data, pre/post/1-month post. 5-point Likert scale. Expect ~20% change for both questions.
- Aim 1 - How effective is the SAFER training platform at improving pediatric provider firearm injury prevention counseling self-efficacy?
- Aim 2 - One month after completing the training, to what extent are providers continuing to include firearm injury prevention counseling during routine annual pediatric exams?
- Q: Sample size needs? alpha 0.05, power 0.8, effect size 0.20
- Recommendations:
- Have delayed start for intervention (almost like stepped-wedge design)
- Create a slider scale (0-100 continuous instead of Likert) for answers regarding self-efficacy (slider in REDCap) - respondents have to touch the slider; make it a required field to force an answer.
- Check with pediatric statisticians, or consider a VICTR voucher for help with analyses.
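For rough planning around the ~20% expected change, a standard two-proportion sample size formula can be sketched. Caveats: the study's pre/post design is paired (a McNemar-style calculation would be more appropriate), and the 50% baseline proportion here is an assumption, so treat this as an approximation only.

```python
from math import ceil, sqrt

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group n for detecting p1 vs p2 with two-sided alpha 0.05 and
    80% power (normal approximation, two independent proportions)."""
    pbar = (p1 + p2) / 2
    num = (z_alpha * sqrt(2 * pbar * (1 - pbar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Hypothetical: 50% endorse high self-efficacy pre-training, 70% expected post.
n = n_per_group(0.50, 0.70)
```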

- Review of stat plan for VICTR voucher. Study is regarding disparities in cataract surgery.
- Q: How much detail is needed in the stat proposal? A: Enough detail so that we can understand, definitions of variables, source of data. Specify statistical model, covariates. This blog may be useful: https://hbiostat.org/post/addvalue

- Predictive model for vision problems in preschoolers. Outcome is binary (result of vision screen). Hope to report odds ratios. Suggest modeling probability as a function of the spatial measure. Could use restricted cubic splines or linear splines.

- Sample code:
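The sample code from the session was not captured in these notes. As a stand-in, here is a generic sketch of a restricted cubic spline basis in truncated-power form (knot locations are hypothetical); the returned columns would enter a logistic model as predictors.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis in truncated-power form: returns
    [x, s_1(x), ..., s_{k-2}(x)]. The restriction makes the fitted
    curve linear beyond the outer knots."""
    k = len(knots)
    tk, tk1 = knots[-1], knots[-2]
    norm = (knots[-1] - knots[0]) ** 2      # keeps terms on x's scale
    cube = lambda u: max(u, 0.0) ** 3       # truncated cubic (u)_+^3
    out = [x]
    for tj in knots[:k - 2]:
        term = (cube(x - tj)
                - cube(x - tk1) * (tk - tj) / (tk - tk1)
                + cube(x - tk) * (tk1 - tj) / (tk - tk1))
        out.append(term / norm)
    return out

knots = [0.0, 1.0, 3.0, 4.0]        # hypothetical knot locations
basis = rcs_basis(2.5, knots)       # [x, s1, s2] for a 4-knot spline
```

In practice knots are placed at fixed quantiles of the predictor rather than chosen by hand.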

- Summary will be the picture.

- Want to understand water quality based on news articles. Use text analysis to break the articles down and analyze large patterns. Most content analysis uses people who read the articles; this is more of a big-data approach. Aims: 1) What social, physical, and political factors influence whether or not an article was published in a county receiving news distribution about an SDWA rule from 2009 to 2018? 2) Among counties receiving water quality-related news, what social, physical, and political factors influence the annual frequency of published articles from 2009 to 2018? Data cleaning is complete; looking for recommendations for analysis. Currently looking at multiple regressions, one for each of the rules. Outcome: county distribution (count or binary) by year.
- Benefit of count data vs. binary: binary is appropriate only if there is no difference between having 1 vs. 6 publications - the only difference is no vs. yes, not how many. Use a proportional odds model for count data. Think about the correlation structure of rows of data - do you need to take clusters/random effects into account? Also keep in mind temporal issues related to newspaper distribution over time (volume of articles published).
- For later: look into shared models; however, it is best to start the way you have done it, as shared models are more complicated.
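The proportional odds idea for counts can be illustrated empirically. With invented article counts for two groups of counties, the model assumes the group difference in cumulative log-odds is roughly the same at every cutoff (one shift parameter rather than one per cutoff):

```python
from math import log

# Hypothetical annual article counts for counties in two comparison groups.
group_a = [0, 0, 1, 1, 2, 3, 0, 1, 4, 2]
group_b = [1, 2, 2, 3, 4, 0, 5, 3, 2, 1]

def cum_logit(y, cut):
    """Empirical log-odds that the count reaches the cutoff (Y >= cut)."""
    p = sum(v >= cut for v in y) / len(y)
    return log(p / (1 - p))

# Proportional odds assumes these differences are roughly constant across cuts.
shifts = [cum_logit(group_b, c) - cum_logit(group_a, c) for c in (1, 2, 3)]
```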

- Temporal impact of hypertension on diabetic macular edema
- Timing? For those who did not develop DME, the last known eye exam is the correct censoring time in a time-to-event analysis. For those who developed DME, the event time is the date of DME dx. There are not really 2 groups.
- Should we exclude those who do not have interval between date of DR and DME (ie. those not in our health system)? Would make it cleaner to apply medical home definition.
- Recommendations:
- Possibly look into a time-varying covariate model: take the first date of DR as the starting point for everyone, let time go forward, and see what happens. Variable number of records per patient (BP measurements), diagnosis of DME as the outcome. Time: days since DR; interact BP with days since DR. Would need to consider requiring a window of time between DR and DME dx (ex. 6 months). Can have historical BP with a constant effect, or non-linear decay - the weight of historical blood pressures decreases as time goes on.
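The non-linear decay idea can be sketched as an exponentially weighted average of past readings, where older measurements fade. The half-life and BP values below are hypothetical choices for illustration:

```python
from math import exp, log

def decayed_bp_summary(readings, current_day, half_life_days=180.0):
    """Weighted average of historical BP where older readings fade
    exponentially (the 180-day half-life is a hypothetical choice)."""
    lam = log(2) / half_life_days
    num = den = 0.0
    for day, bp in readings:
        w = exp(-lam * (current_day - day))   # weight halves every half-life
        num += w * bp
        den += w
    return num / den

# Hypothetical systolic BPs on days 0, 90, and 170 after first DR diagnosis.
summary = decayed_bp_summary([(0, 150), (90, 140), (170, 132)], current_day=180)
```

The decayed summary sits closer to the recent (lower) readings than a plain mean would, which is the intended behavior.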

- Stuttering severity is characterized by frequency or a standardized measurement, but it is more complex than that. Our question: is there a way to look at subtypes of stuttering? Are there certain characteristics that tend to stick together? If we do get subtypes, do they correlate with contributing factors driving the profile? ~140 patients.
- Two-step vs. hierarchical cluster analysis? What this does is solve for centers of clusters; some clusters can be bigger than others. What matters is how close the individual is to the center of the cluster. People misuse cluster analysis because of individuals who are equidistant from 2 cluster means. Non-hierarchical cluster analysis may be preferred to avoid making a determination on hierarchy.
- Are we looking at dimensions or subtypes? Is this a continuum? Do we have an anchor that measures the impact of the characteristics - something to scale against, the most "important" thing? What this could be is not clear for this topic; reasonable people may disagree. Stuttering presentations are diverse, and the team is trying to shift from outcomes to categorizing patients to gain understanding.
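A minimal sketch of the non-hierarchical (k-means-style) approach discussed above, on toy 2-dimensional "profiles" (all data invented). Points are assigned to the nearest center and centers are recomputed as cluster means; the naive first-k initialization here is for illustration only.

```python
from statistics import mean

def kmeans(points, k, iters=20):
    """Minimal k-means sketch (non-hierarchical clustering): assign each
    point to its nearest center, then recompute centers as cluster means.
    Naive initialization (first k points); real use needs a better start."""
    centers = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        centers = [tuple(mean(d) for d in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

# Two well-separated toy "profiles" in 2 dimensions.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, 2)
```

Note the caution from the discussion: points roughly equidistant from two centers get an arbitrary assignment, which is exactly where cluster analysis is misused.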

- Return visit to review analysis updates. This is a biomarker study to predict treatment failure (relapse or death).
- Suggest using log normal model.
- Frank: concern about investigator-created degrees of freedom. Also suggest age be included in the model.

- Wanting to describe legal outcomes for pregnant/post-partum women with opioid use disorder who have participated in a perinatal recovery program (VMARP). Exclude those who experienced pregnancy loss. Include those who delivered between 01/2017 and 12/2020. Data extracted from the medical record.
- Questions:
- 1) our descriptive primary analysis plan (specifically, what would be the best test for a count without categories). Looking to do descriptives stratified by outcome of interest (DCS referral, criminal penalties, housing/employment)
- First, will want to quantify the proportions who 1) should have been referred and were, 2) should not have been referred and were not, 3) should have been referred and were not, 4) should not have been referred and were. Wilcoxon is for comparing two groups on a continuous or ordinal outcome.
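The four proportions above can be tabulated directly. A sketch with invented records (each a hypothetical pair of indicators for whether referral was indicated and whether it happened):

```python
# Hypothetical records: (should_have_been_referred, was_referred) per patient.
records = [(True, True), (True, False), (False, False), (False, False),
           (True, True), (False, True), (True, True), (False, False)]
n = len(records)
props = {
    "referred appropriately":     sum(s and r for s, r in records) / n,
    "not referred appropriately": sum(not s and not r for s, r in records) / n,
    "missed referral":            sum(s and not r for s, r in records) / n,
    "unnecessary referral":       sum(not s and r for s, r in records) / n,
}
```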

- 2) our logistic regression plan
- Possible to concentrate power by ordering the list of outcomes by severity. Look at peers/literature to see if that type of outcome is "usual". If they are all similar in severity, counting seems fine; if not, ordinal is better. Would be best to have clinical consensus from a group about the severity/ordering of each outcome. A proportional odds model for an ordinal outcome will handle count, binary, continuous, and ordinal data and will allow you to adjust for covariates. For an ordinal scale, the more categories the better.

- 3) what to do with missingness for our covariate data
- Administrative vs. informative censoring. The first thing is honest reporting: descriptive statistics that quantify how big the holes in the data are. What type of person tends to have missing data? How does it vary with race, etc.? Treat missingness as an outcome and predict it to try to learn the patterns of missingness. A proportional odds model can be used to predict missingness, as can a binary logistic model.

- Recommend applying for VICTR Award for biostatistics support (90 hours). Application website (https://starbrite.app.vumc.org/) and research proposal template (https://starbrite.app.vumc.org/funding/templatesforms/). Please contact Tom Stewart with questions.

- Chronic graft-versus-host disease is the number one cause of late treatment-related death after stem cell transplant. Aims: 1) track the temporal course of erythema and sclerosis; 2) extent of skin involvement and survival (visit 1 & 2 erythema body surface area, added prognostic value). Data from 9 centers, 2 populations (incident - enrolled within 3 months of diagnosis; prevalent - enrolled 3+ months since diagnosis). Visits every 6 months. ~185 patients.
- How to display data and summarize change over time for erythema?
- Recommend incorporating patient level information (random effect of individual)
- The desire to use a Cox model is a desire to do time-to-event analysis. When you have competing risks it is very hard to interpret a Cox model. Look into state transition models.

- Stem cell transplant (40-70% die within 3 years post-transplant). The Disease Risk Index (DRI) is used to assess risk pre-transplant; there is a lack of predictors/biomarkers that can be measured post-transplant. Could adherent and rolling (A&R) leukocytes be predictive of survival? DRI is a predictor of failure as well.
- Continuous A&R, non-linear model: cubic splines assume no threshold, so you can't use them to show there is no better threshold. The only way thresholds can be valid is if the 2 populations on each side of the threshold are homogeneous. When dichotomizing, odds ratios/survival curves arbitrarily move all over the place as high or low values are added (extremely sample dependent). Keeping items continuous gets rid of that.
- The current literature is interpreting this wrong; you want to contradict it. Choose the model with the lowest AIC from the 3 models. AIC tells you how good the fit is after penalizing for the number of opportunities the model had to not be flat. The general principle is that the likelihood ratio is the gold standard if not pre-specifying.
- Recommend transforming the heavily skewed distribution of A&R (cube root), then fitting 3 models (linear, cubic spline with 3 knots, cubic spline with 4 knots). Using the cube root with cubic splines in essence undoes the cube root in the end.
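The model comparison step can be sketched with the Gaussian least-squares form of AIC. The residual sums of squares below are invented placeholders standing in for the three fits (linear, 3-knot spline, 4-knot spline) of the cube-root-transformed A&R measure:

```python
from math import log

def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant:
    n*log(RSS/n) plus a penalty of 2 per estimated parameter."""
    return n * log(rss / n) + 2 * k

# Hypothetical residual sums of squares and parameter counts for three fits.
n = 100
models = {"linear": (52.0, 2), "rcs 3 knots": (40.0, 3), "rcs 4 knots": (39.5, 4)}
aics = {name: aic_gaussian(rss, n, k) for name, (rss, k) in models.items()}
best = min(aics, key=aics.get)
```

With these invented numbers the 4-knot spline barely improves the fit over the 3-knot spline, so the extra parameter's penalty makes the 3-knot model win.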

- Should we reduce the number of confounders? Should probably reduce the number of categories on some variables (race/ethnicity).
- RR or OR? Multivariable analysis would be much better. Univariate analysis ignores all possible confounding, so it is not too useful. You run the risk of the analysis changing between abstract and manuscript.
- Significance does not really mean anything anymore. Would not present any univariate results, as they are not adjusted. Would only present the test statistic and 95% CI if anything - not the p-value.
- Software for Multivariable analysis - R, Stata, SPSS
- Medications in SD - not simple, can check with bioinformatics
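The RR vs. OR distinction above can be made concrete with a hypothetical 2x2 table (unadjusted, purely to show the two estimands and their Wald confidence intervals; the adjusted multivariable versions come from regression models):

```python
from math import exp, log, sqrt

def or_rr_ci(a, b, c, d):
    """2x2 table: a/b = exposed with/without event, c/d = unexposed
    with/without event. Returns (OR, CI) and (RR, CI), Wald 95% CIs."""
    or_ = (a * d) / (b * c)
    rr = (a / (a + b)) / (c / (c + d))
    se_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    se_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    ci = lambda est, se: (exp(log(est) - 1.96 * se), exp(log(est) + 1.96 * se))
    return (or_, ci(or_, se_or)), (rr, ci(rr, se_rr))

# Hypothetical cohort: 30/100 events among exposed, 15/100 among unexposed.
(odds_ratio, or_ci), (risk_ratio, rr_ci) = or_rr_ci(30, 70, 15, 85)
```

When the event is common, as here, the OR overstates the RR, which is one reason the choice between them matters.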


Topic revision: r1 - 13 Dec 2021, DalePlummer

Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
