Vanderbilt Biostatistics Wiki: ThursdayClinicNotes (22 Aug 2024, JacksonResser)

Clinical and Health Research Clinic


Current Notes (2024)

2024 August 22
- Elena Bagatelas (Gabrielle Rushing), Neuroscience
- Han Su, Nursing
2024 August 15
- Geonte Jackson (Aaron Aday), Cardiovascular Medicine
2024 July 18
- Josephine Jung (Uchenna Anani), NICU
2024 July 11
- Elena Bagatelas (Gabrielle Rushing), Neuroscience
2024 June 20
- Kelsey Gastineau, Pediatric Hospital Medicine
- Alessandra Tomasello (Katherine Cahill), Allergy, Pulmonary and Critical Care
2024 May 30
- Lexi McKeown (Brighton Goodhue), Genetic Counseling
2024 May 23
- Tim Harris (John Trahanas), Cardiac Surgery
- Josephine Jung (Uchenna Anani), NICU
2024 April 25
- Makayla Hall (Gillian Hooker), Genetic Counseling
2024 April 18
- Eesha Singh (Jillian Berkman), Neurology
2024 April 4
- Madisen Cook (Jill Slamon), Genetic Counseling Program
2024 March 21
- Annick Tanguay (Melissa Duff), Hearing and Speech Sciences
2024 March 7
- Antonia Kaczkurkin, Psychology
2024 February 29
- Jacob Franklin (Allison McCoy), Biomedical Informatics
2024 February 22
- Niharika Ravichandran (Soha Patel), OB/GYN
- Eriel Confer (Angela Bonino), Audiology, Hearing & Speech Sciences
2024 February 15
- Mark Rolfsen (Wes Ely), Pulmonary and Critical Care
2024 February 8
- Marissa Khalil (Rondi Kauffmann), Surgery
- Mikaela Bradley (Gillian Hooker), Genetic Counseling
2024 February 1
- Andrew DeFilippis, Cardiology
2024 January 25
- Shayan Rakhit (Amelia Maiga), Surgery
2024 January 18
- Alvin Jeffery, Nursing (12:30 - 1 pm)
2024 January 11
- Serena Fleming (Sarah Stallings), Genetic Counseling
- Alvin Jeffery, Nursing

2024 August 22

Attendees: Frank Harrell, Cass Johnson, Jackson Resser, Elena Bagatelas, Gabrielle Rushing, Han Su

Elena Bagatelas (Gabrielle Rushing), Neuroscience

Follow up on figures for review paper on OCNDS

Prism computed an exact P value (0.0568), which takes into account ties among values. Note that most other programs do not compute exact P values when there are tied values, but would instead report an approximate P value (0.0540).

- P-value threshold is traditional, but problematic. No excuse for not using the exact calculation when it is available

- Wilcoxon is exact in that it accounts for ties

Show proportions rather than counts -- concerned with ratio of grey to black

Don't use asterisks -- show p-value to three decimal places (

In writing: if p value is 0.01 -> evidence for difference in probability of response between the two groups (if binary outcome)

if p value right at 0.05 -> there is mild evidence...

if p value is 0.3 or something big -> there is limited evidence...

Count variable: want to show confidence limits rather than standard errors -- rectangle doesn't add any information (take away bar so confidence limits will show up)

Confidence interval going below zero highlights issue with parametric CIs -- doesn't go below by much so not a big deal (remove ns)

- Same issue with count variable -- bootstrap CI will work better than parametric

Bootstrap samples data multiple times -- CI will behave better (bootstrap "nonparametric" percentile confidence interval one of many variations)
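A minimal sketch of the percentile bootstrap mentioned above, using only the Python standard library. The counts are hypothetical illustration data (not the OCNDS data), and the function name, number of resamples, and seed are assumptions chosen for the example.

```python
import random
import statistics

def bootstrap_percentile_ci(values, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample the data with replacement and record the mean of each resample
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]

# Hypothetical counts, for illustration only
counts = [0, 0, 1, 1, 2, 2, 3, 4, 6, 9]
lower, upper = bootstrap_percentile_ci(counts)
print(f"mean = {statistics.fmean(counts):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because every resampled mean is an average of observed non-negative counts, the lower limit cannot fall below zero, which is the behavior the parametric interval lacked.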

Don't use Fisher's exact test -- not accurate -> use Wilcoxon exact; ordinary Pearson chi-square is better than Fisher

Han Su, Nursing

I’m working on a grant proposal to study the financial burden on older ICU survivors using secondary data. We aim to compare financial burdens between ICU survivors and those hospitalized without ICU care, and identify factors associated with higher burdens. We’ll analyze individual data collected every other year before and after hospitalization until death. Financial burdens will be categorized as no burden, high burden, catastrophic burden, or death. Some covariates, such as rehospitalization during follow-up, will change over time. We’re seeking advice on the best statistical modeling approach for this study.

Case-control study design -- case = ICU admission, control = no ICU admission

Primary outcome: financial burden -- out of pocket expenses < 20% = no financial burden, 20% - 40% = high financial burden, >40% = catastrophic financial burden

Secondary outcome: mortality

Covariates...

Want to model longitudinally and investigate within-person differences

Problem with splitting resilience into quartiles -- outer quartiles are wide, but you count them as the same

- Make sure you analyze resilience continuously

Hypotheses worded asymmetrically -- non ICU hospitalization controls -> hospitalization survivors

Want to control for number of prior hospitalizations and comorbidities (Elixhauser) -- want comorbidity index that is high resolution (not Charlson)

Population restricted to age 65+ participants with Medicare

Follow-up every other year -- measure cumulative new experience from last year

Issue with death: early death means reduced cost

Order and use in state transition model

- If you didn't die what was the cost? Order by non-fatal cost

Data is family-wide but death is one person

In most analyses, death is an absorbing state -- but not here, where family continues to get bills in the mail

Financial burden -- how many dollars spent in the last year relative to family income in the last year

- Putting ratio into categories lowers sample size -- thresholding loses a lot of power

Leave ratio continuous

VICTR studio

VICTR voucher with 90 hours of help -- takes a month or so for statistician to be assigned to our project

2024 August 15

Geonte Jackson (Aaron Aday), Cardiovascular Medicine

Systematic review of the prevalence of reporting of medical therapy in peripheral artery disease clinical trials. We want to know the percentage of trials that report baseline medical therapy at all, and of those trials that do report it, what is the percentage breakdown of the various medications that are taken (i.e., 30% of patients were on aspirin at time of trial).

We have a question regarding the best way to report and display these data. The purpose of this project is to identify the need for future investigators to report medical therapy data even in trials studying devices or interventions. Basically, it may add a new facet to their outcomes when medical therapy is reported in the patients they are studying.

Attendees: Frank Harrell, Geonte Jackson, Aaron Aday, Jackson Resser, Cass Johnson

Investigators are primarily interested in meta-regression. Chapter 8 of Doing Meta-Analysis in R (bookdown.org) covers meta-regression.

35 studies; let's say 10 achieve a goal / try a therapy, then effective sample size is close to 10. Therefore, number of things you can look at to predict will be limited.

Wilson interval: proportions could be calculated during clinic, if there's a small enough number (5-10, perhaps).

Please note potential confounders that may be adjusted for: publication year, publication quality, investigator group.

Investigators plan to return when they get further along in the project; scheduling on Thursdays will be ideal so that Frank / Jackson / Cass are sitting in on clinic.

Select a time · Biostatistics Clinics (youcanbook.me)

2024 July 18

Josephine Jung (Uchenna Anani), NICU

I met with the clinic a few weeks ago about a project titled Neonatology Attitudes and Practices for Neonates with Kidney Failure. We were hoping to meet again to generate the randomization excel spreadsheet to import into redcap and review the survey questions for any framing biases.

Forgo randomization -- send out survey without mortality data

Walked through survey on call

Capture age and years of practice as continuous variables rather than as categorical ranges

NICU level -- capture highest level

Survey will be sent to neonatologists in listserv

- N = 500 neonatologists in undefined number of centers

How much duplicate center-level information will each survey participant provide?

Survey looks a bit long -- survey fatigue is a danger

If partial surveys are recorded -- make sure key info like demographics are first -> do analysis based on those who bailed out, see if it was a random sample

- Participants who respond may be extremely passionate/dispassionate about the topic

Problem with listserv: can't tell how many had opportunity to respond

If response rate < 80% -> cause for worry

Correlational analysis -- focus survey on decision-making and attitudes

What incentives can you use to get people to respond?

- Frank's two-dollar bill survey

Order demographics in survey from most to least important

Next steps when you have data: stat help from pediatrics dept

What types of variables will you be examining the correlation between? Binary, ordinal, etc.

- More levels of ordinal variable -> Spearman is best choice

- Use rawest form of the data -> correlate matter of degree of one variable with matter of degree of another
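A minimal sketch of the Spearman rank correlation suggested above, written with the Python standard library so the handling of ties is visible. The function names and the example ratings are hypothetical; tied values receive the average of the ranks they span, and the coefficient is the Pearson correlation of those ranks.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical data: years of practice and an ordinal attitude rating
years = [2, 5, 5, 9, 14, 20, 31]
rating = [1, 2, 2, 3, 3, 4, 5]
print(f"Spearman rho = {spearman_rho(years, rating):.3f}")
```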

If something does not apply to someone, best to omit

2024 July 11

Elena Bagatelas (Gabrielle Rushing), Neuroscience

I am an intern at the CSNK2A1 Foundation, and we are needing some guidance on analyzing natural history data for publication.

Incidence of disorder (Okur Chung Syndrome) of interest is one in 100,000
Start with dataset of 220 -- exclude those negative for the variant, include those with c-nomenclature (n = 85)
Pooling patients and groups in current analyses (Loop, Non-Loop). Biologically motivated.
Can data be kept closer to its original state? Or are these mutually exclusive categories needed? For example, should these be presented as seizure (Y/N), sleep problems (Y/N), etc. Creating exclusive categories doesn't scale very well -- adding one other attribute, for example, creates many more categories that get hard to handle.
Non-Loop: n = 8. Loop: n = 30. We don't usually recommend doing statistical analysis with small samples such as this; the p-value is only meaningful when small. However, confidence intervals (or confidence limits) may be a great goal for your statistical analysis. This will "honestly" present the uncertainty resulting from small sample sizes.
Let's say you start with a two by two table -- perhaps a confidence interval for the difference in proportions (Loop vs. Non-loop) would be a good place to start.
Dependent Variable: Seizure. Independent Variable: Loop (Y/N). Proportion that have had seizures in loop and non-loop -- find proportions for each group, and a confidence interval for the difference of two proportions.
Proportion w/ sleep problems, loop vs. non-loop, etc.
Presenting as confidence intervals vs. p-values will help immensely. Focus on combination effects as opposed to interactions.
To really emphasize the "Grand Total" number of symptoms may be best for that analysis -- include GI category, neuro-related, etc (focus on broad patterns) and present in the dot plot to avoid breaking the data down too finely. Give mean counts by Loop - Non-Loop. Note that medians are probably not the best choice -- since you're working with counts, mean is the safer choice.
Ideal number of non-loop patients? The more the better, but 30 tends to be a minimum.
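A rough sketch of the confidence interval for a difference in two proportions suggested above for the Loop vs. Non-Loop comparison. It uses Newcombe's hybrid score method, which is one of several possible choices and is an assumption here, not a method named in the clinic. The group sizes (30 and 8) come from the notes; the seizure counts are hypothetical placeholders.

```python
import math
from statistics import NormalDist

def wilson(successes, n, z):
    """Wilson score limits for a single proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def diff_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Newcombe hybrid score interval for p1 - p2 (independent groups)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson(x1, n1, z)
    l2, u2 = wilson(x2, n2, z)
    diff = p1 - p2
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return diff, lower, upper

# Group sizes from the notes; seizure counts are hypothetical
loop_seizure, loop_n = 12, 30
nonloop_seizure, nonloop_n = 2, 8
diff, lower, upper = diff_proportions_ci(loop_seizure, loop_n, nonloop_seizure, nonloop_n)
print(f"difference = {diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With only 8 patients in one group the interval is wide, which is the honest presentation of uncertainty the notes recommend.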

2024 June 20

Kelsey Gastineau, Pediatric Hospital Medicine

The goal of our project is to understand the feasibility and acceptability of firearm safety counseling and utilization of storage devices when offered to families of children admitted to the Behavioral Health service at MCJCHV. I would appreciate assistance in optimizing our statistical analysis plan given this is a non-randomized pilot trial.

How useful and effective are secure storage counseling and secure storage devices?

Firearm owning guardians of youth admitted with mental health needs -- receive 5-10 brief injury prevention educational session

Aim 1: evaluate utilization

Aim 2: assess feasibility and acceptability

VICTR research proposal submitted a year ago

Do folks actually take these devices?

Primary measure: proportion of eligible families who self-report using chosen device at time of 3-month follow-up

Feasibility: compare enrollment rates, recruitment rates, completion rates

Big questions:

- improve rigor of SAP

- confirm appropriate data collection process

Expected sample of 180 guardians

Key word: "report" -- distinction between reported activity and actual activity (actually storing firearm securely?)

- Social desirability bias

Rural vs urban distinction

No gold standard way of asking survey questions -- pictures of firearms not feasible

Recognize biases that inherently lie in self-reported storage activity

Push confidence intervals -- not so much point estimates and p-values

Wilson interval method -- R package

Numerator: those who take a firearm storage device

Denominator: enrolled and had full opportunity to take storage device

Wilson confidence interval when it's a regular proportion
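A minimal Python sketch of the Wilson interval for a single proportion, written out from the standard formula. The function name and the example counts are hypothetical; only the expected sample of 180 guardians comes from the notes.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clip to [0, 1] to guard against floating-point overshoot
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical: 45 of 180 eligible families report using their chosen device
low, high = wilson_interval(45, 180)
print(f"proportion = {45 / 180:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Unlike the simple normal-approximation interval, the Wilson limits stay inside 0 to 1 and remain usable when the observed proportion is 0 or 1.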

concordance between historical and current use -- __ confidence interval for difference in two paired proportions (McNemar's test)

McNemar's: yes's vs no's
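A sketch of the paired comparison described above, assuming each guardian reports secure storage (yes/no) at both the pre and post time points. Only the discordant pairs carry information about change. The interval shown is the simple large-sample (Wald-type) interval, which is an assumption for illustration; the counts are hypothetical.

```python
import math
from statistics import NormalDist

def paired_proportion_diff(b, c, n, level=0.95):
    """Difference in paired proportions (post minus pre) with a Wald-type CI.

    b = pairs that were yes before and no after
    c = pairs that were no before and yes after
    n = total number of pairs (concordant pairs included)
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = (c - b) / n
    se = math.sqrt((b + c) - (c - b) ** 2 / n) / n
    return diff, diff - z * se, diff + z * se

def mcnemar_chi2(b, c):
    """McNemar chi-square statistic (1 df, no continuity correction)."""
    return (b - c) ** 2 / (b + c)

# Hypothetical counts: 100 pairs, 5 stopped and 15 started secure storage
diff, lower, upper = paired_proportion_diff(b=5, c=15, n=100)
print(f"change = {diff:.2f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print(f"McNemar chi-square = {mcnemar_chi2(5, 15):.1f}")
```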

Frank's Hmisc R package binconf

Prospective study -- pre period, post period 3 months later

- Cannot have any drop-outs -> drop-outs don't occur at random

Biased non-response to follow-up is fatal

Alessandra Tomasello (Katherine Cahill), Allergy, Pulmonary and Critical Care

We propose a prospective, randomized, parallel-group, single-site two-period study of ICS/SABA rescue and Physical Activity (PA) in adults (age ≥ 18) with physician-diagnosed asthma. Primary outcome is mean daily steps.
I would like to discuss Statistical Analysis.

Therapy for asthma

mHealth intervention -- receive interactive text messages

Compare step count among participants who receive text messages vs those who don't -- no cross-over component

Sample size of 32 patients (16 in each group) to detect an increase of 2488 steps after intervention assuming a standard deviation of 3241 steps (sd so wide that negative step counts are possible)

* Step count asymmetric -- sd may not be best measure of variation, nor normal distribution the best distribution for step count

* Floor effect

Sample size more effective if step counts measured daily

No established minimally important difference -- dig a little deeper/survey disinterested investigators

Increasing MCID until sample size is feasible -- not a good approach
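A rough sketch of why the assumed difference matters so much: under a simple two-group, two-sided normal approximation (an assumption for illustration), the required sample size grows with the square of sd / difference. The standard deviation of 3241 steps and the difference of 2488 steps are from the notes; the other candidate differences are hypothetical. The result need not match the investigators' figure of 16 per group, which may rest on different design assumptions.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, power=0.80, alpha=0.05):
    """Approximate n per group for comparing two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

sd = 3241  # steps, from the notes
for delta in (1000, 1500, 2000, 2488, 3000):  # candidate differences in steps
    print(f"difference = {delta:>4} steps -> about {n_per_group(delta, sd)} per group")
```

Halving the difference roughly quadruples the sample size, so choosing the difference to fit a feasible sample size effectively lets the budget define what counts as clinically important.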

Frank thinks there is a statistician assigned to allergy/pulmonary and critical care division

If no existing collaboration, VICTR voucher is an option -- disadvantage that you get a new statistician each time and slow to start

Issue with keeping people enrolled -- drop-out rate important issue

SMART (Sequential Multiple Assignment Randomized Trial) design -- used frequently for smoking cessation studies

8000 step goal -- asthma patients often fail to achieve that

Area that is ready for advancement -- VICTR studio, more organized and multidisciplinary clinic

* Could be useful if disciplinary literature is limited -- exposed to new ideas and alternative design methods

2024 May 30

Lexi McKeown (Brighton Goodhue), Genetic Counseling

Utilizing a survey, we will gain insight into the current postnatal depression levels and coping efficacy of this patient population in OB/GYN Clinics at VUMC by employing the Brief-COPE and EPDS scales, respectively. We will further ascertain interest levels and recommended facilitation methods in this survey by utilizing predominately closed questions crafted by the research team. With the knowledge obtained from the patients within this population, we hope this project will provide valuable insight for the implementation of a future support group.

Our main questions surround statistical analysis as well as power calculations. We want to ensure that our survey questions and statistical analysis are adequate prior to starting data collection.

Attendees: Frank Harrell, Alexandra McKeown, Brighton Goodhue, Cass Johnson

Pending IRB approval, but look to recruit via MyHealthatVanderbilt.
Getting an accurate count of the number of women receiving that message is important.
Representativeness of those who respond relative to the overall population (Does age differ? Race?)
Giving thought to this beforehand, and knowing which variables you would like to compare, will help. Non-response to survey is a potential source of bias.
Survey will be given via REDCap. If you would like, you can bring a draft of the REDCap survey to clinic for a second set of eyes.
Information-filled questions: Did you graduate HS (Y/N) vs. the better option, years of education. Just a general example
If asking about feelings, REDCap has an option for slider buttons -- more continuous than Likert, allows for 0-100. Please note that patients would need to touch the slider button, otherwise it will automatically register at 50 and it is difficult to tell if the value is missing or a true value of 50.
Estimation and CI's are probably the best goal for surveys, as opposed to hypothesis testing.
Question from Lexi: Power analysis? This is almost never directly applicable for a survey. What's more typical is to calculate the margin of error in estimating something. Assuming a 95% CI, half the width of that interval is the margin of error. For example, in 37% plus or minus 15%, the MOE is 15%. Please note that estimating a non-continuous variable, like a binary Yes/No response, with a 10% MOE requires 96 individuals; a 5% MOE requires 384. That can be a goal for enrollment, as estimating non-continuous variables is the toughest / worst-case scenario.
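The 96 and 384 figures follow from the usual margin-of-error formula for a proportion, n = z^2 * p * (1 - p) / MOE^2, evaluated at the worst case p = 0.5 with a 95% confidence level. A minimal sketch (the function name is hypothetical):

```python
from statistics import NormalDist

def n_for_margin_of_error(moe, p=0.5, level=0.95):
    """Sample size so that a proportion is estimated to within +/- moe.

    p = 0.5 is the worst case: it maximizes p * (1 - p).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * z * p * (1 - p) / (moe * moe)

for moe in (0.10, 0.05):
    print(f"MOE = {moe:.2f} -> n of about {round(n_for_margin_of_error(moe))}")
```

Halving the margin of error multiplies the required sample size by four.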

2024 May 23

Attendees: Frank Harrell, Cass Johnson, Jackson Resser, investigators

Tim Harris (John Trahanas), Cardiac Surgery

Heart transplant allografts at VUMC are now procured and stored in a new manner as of July 2023. We would like to compare the impact of this new storage method to our historical method on how the hearts function immediately post-operatively as well as short-term outcomes.

New cooler (Traferox) -- perception: longer duration, better preservation of hearts

Treatment for heart failure: heart transplant

Current standard of care: transport heart allograft from donor to recipient on ice -- recommended time limit is 4 hours (due to risk of primary graft dysfunction)

- 2021 study showed lungs transported at 10 degrees Celsius had better outcomes -> same for hearts?

Traferox used starting on 2023-07-23 -- store hearts at 10 degrees

- N = 77 hearts -> 52 after exclusion criteria

Aim: compare prevalence of severe PGD in heart transplants

Study design: propensity match 3:1 control -- transplants from 2/2020 to 7/2023

Primary outcome: evidence of severe PGD

Secondary: markers of intraoperative and postoperative performance

Sub analyses: of all 10 degree hearts with an ischemic time > 4 hours, is there a difference? of all 10 degree hearts where donor heart > 40 years old, is there a difference?

All or nothing -- no phase-in period between device usage

- Phase-in allows for stronger inference. All or nothing allows for hidden time trend to have effect

No close calls with yes/no outcome -- sample size needs to be larger

Higher resolution variables are better (added sensitivity)

Could be a problem if matching algorithm is order-sensitive

Each patient matters -- avoid matching methods that would delete patients

If change procedure such that risk can be tolerated, people will take on more risk and benefit is decreased

- Example: instant brakes on a train, more speed

At same length of cooling time, was there an advantage?

Propensity score only needed when # of items to adjust for is large, but you lose ability to ask if there is a differential effect

Initial analysis using just iced hearts -- assess whether there was any time trend at all in outcome measure

- If so, outcome contaminated

- Go as far back as you can (will help a moderate amount), adjust for procurement time, model shape and slope of time trend

Proportion of DCD: 50%

Next Steps: potential VICTR voucher -- up to 90 hours of help

Change outcome to multi-level ordinal variable

Josephine Jung (Uchenna Anani), NICU

This survey will collect individual demographics, center demographics, and neonatal attitudes regarding prenatal counseling and management of babies with Chronic Kidney Disease. There will be 2 study arms- 1 without any center specific data/statistics regarding outcomes for CKD and the other with this data.

I would like to see how many participants I would need to make the outcome statistically significant.

Attitude survey

Randomization algorithm in REDCap -- tricky with blinding/blocking

Survey population: neonatologists

Expect for respondents to have more optimistic view than data suggest

Expected sample size: hoping for 300-400 respondents

- Restate sample size question: sample to estimate proportion within an acceptable margin of error -- is margin of error small enough that we can say we learned what we need to learn

- Signal could be trivial

Create randomization list longer than you would ever need, make it reproducible (define randomization seed, hide it somewhere)
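A minimal sketch of a reproducible randomization list, assuming two arms and permuted blocks of four. The arm labels, block size, list length, and seed value are all hypothetical choices for illustration, and the layout REDCap expects for import is not shown here.

```python
import random

def randomization_list(n_total, arms=("with_center_data", "without_center_data"),
                       block_size=4, seed=20240523):
    """Permuted-block randomization list; the same seed gives the same list."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # record the seed privately so the list can be rebuilt
    assignments = []
    while len(assignments) < n_total:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within each balanced block
        assignments.extend(block)
    return assignments[:n_total]

# Longer than would ever be needed (the notes hope for 300-400 respondents)
allocation = randomization_list(1000)
print(allocation[:8])
```

Blocking keeps the arms balanced throughout enrollment, so the arms stay close to equal in size even if far fewer people respond than the list allows for.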

Framing/wording of survey questions = important!

- Get as many eyes as possible on it before launch

Schedule future clinic meeting to review survey questions

2024 April 25

Makayla Hall (Gillian Hooker), Genetic Counseling

I am doing a quantitative study analyzing attitudes toward genetic testing among parents of children with inflammatory bowel disease (IBD). I will be sending a survey to parents of children with IBD that are seen in Vanderbilt's Pediatric Gastroenterology, Hepatology, and Nutrition Clinic. My aims are to assess attitudes and beliefs toward genetic testing among parents of children with IBD, assess the impact of family history on attitudes and beliefs toward genetic testing for parent's children with IBD, and assess the impact of previous genetic testing experiences on the attitudes and beliefs of parents of children with IBD. We will be doing a bivariate and multivariable analysis on this data. We need assistance on a power calculation for our study to identify how many people we would need to identify to analyze attitudes toward genetic testing in parents of children with IBD.

Attendance: Makayla Hall, Gillian Hooker, Frank Harrell, Jackson Resser, Cass Johnson

Mixed-methods study -- Parent's Perception of Genetic Testing:

How are people surveyed? What are you expecting to collect?

About 500 - 600 patients in patient population. Tracked in Improve Care Now database. Estimate of 100 patients that will respond and be able to be included in sample.

Subset of very young patients that have all had genetic testing.

Nonresponse rate will be an issue you have to contend with. The reason for non-response is important. Anything you can do to understand non-response, or give some kind of incentive, would be useful. Is there anything you can learn about how typical responders are vs. non-responders?

Making the survey brief is also important for responsiveness.

Online survey -- through REDCap, message sent through My Health at Vanderbilt.

Are any questions asked in manners of degree? Yes, some are Likert scale, some open-response.

REDCap gives the option to use a slider -- that can give better degree of variation, help break ties. These can be a great option to use. Calculating mean of 0-100 can give better info. Although if you can instruct patients to click the scale, even if they agree it would be at 50, that is best -- otherwise, REDCap will just mark it as 50, and it can be difficult to assess if the participant skipped the question vs. truly agreed the answer was at a 50.

Makayla describes a previous study that is very similar, but done on adult patients (very descriptive). Driver of her research questions, as genetic testing is actually done in pediatrics, and there's no research in how primary caregivers of these patients (parents) feel about genetic testing.

Another study describes identifying predictors of positive attitude towards testing, which would be an ideal characteristic for her Master's thesis.

Frank's major point: main concern is not power, but representativeness and how trustworthy it is. Not a hypothesis testing framework, but instead looking at confidence intervals on things like means and proportions.

For multivariable analysis, large sample sizes are required. For example, to estimate one overall proportion for a binary variable with a margin of error of plus or minus 0.1, you need about 96 participants. If you want a margin of error of plus or minus 0.05, you need 384 people. Multivariable is more extreme than this. To predict yes or no here, you need thousands of patients.

In smaller sample sizes like 100, you'll need a very high signal-to-noise ratio if you're doing something complicated. Non-continuous variables have a much lower signal-to-noise ratio. Being able to measure something as continuous or ordinal helps immensely in helping you do more complex analyses with smaller sample sizes.

They are interested in predicting how positively people feel about genetic testing for their multivariable model, and looking at predictors for that attitude. Those REDCap sliders come into play here -- that will help to be able to treat those predictors as continuous where possible.

Another suggestion from Frank -- descriptive analysis for how each patient factor correlates with the degree of feeling positively or negatively, and add CI's to those correlations. May or may not serve your ultimate goal, but could be a different strategy.

CI's help tell the reader the limitations of your sample size, which is great for good research.

Also, you can ask the question -- "If you had many patient characteristics, which ones are "winners" and which are "losers" when it comes to predicting response?"

Adding CI's here is also very helpful. Can help you find if there's a "smoking gun", one dominant characteristic helpful in predicting response. Check out Bootstrapping: Biostatistics for Biomedical Research (hbiostat.org)

Look for "precision" here -- we can get a margin of error (half the width of a CI) for estimating a correlation coefficient, etc. 400 pairs are needed to do this well, but again, the CI will help describe your level of certainty and provide transparency.
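A small sketch of the precision calculation for a correlation coefficient, using the Fisher z-transformation (a standard large-sample approximation, assumed here for illustration). It reports the margin of error when the true correlation is zero, for a few sample sizes including the expected 100 respondents and the 400 pairs mentioned above.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """Approximate CI for a correlation via the Fisher z-transformation."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    center = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(center - z * se), math.tanh(center + z * se)

def margin_of_error_at_zero(n, level=0.95):
    """Half-width of the CI when the observed correlation is 0."""
    return correlation_ci(0.0, n, level)[1]

for n in (50, 100, 400):
    print(f"n = {n:>3}: margin of error of about +/- {margin_of_error_at_zero(n):.2f}")
```

At 400 pairs the margin of error is roughly 0.1, whereas at 100 pairs it is about twice that.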

Or, Frank's RMS course notes / R code examples: Regression Modeling Strategies (hbiostat.org)

Suggestions for R Workflow for Reproducible Data Analysis: R Workflow (hbiostat.org)

2024 April 18

Eesha Singh (Jillian Berkman), Neurology

We would appreciate help in refining our proposal for analyzing the relationship between the management of Moyamoya and social determinants of health

Retrospective study, adult patients with Moyamoya since 2006 -- N = 350

Aim: medical & surgical management

Surrogate markers: cholesterol, worsening stroke

Impact from social determinants, ADI

Prelim analysis

Long list of variables team is collecting

Scope of study: causal or explanatory

- Example: inequities in opportunities for women -- looked like women were receiving fewer opportunities to be engineers, but that was because fewer applied

Have to deal with confounders with causal

Investigators are looking for somewhere in between explanatory and causal

One overall outcome: recurrent stroke after initial presentation

For patient to enter cohort, must have had prior stroke (could be old or recent)

- Date of prior stroke not reliably measured; we do know date of acute presentation at VUMC

Inclusion: VUMC patients since 2006 with Moyamoya

Exclusion: one time visit

Surveillance for finding new occurrences of stroke: routine imaging after surgery; more frequent if symptoms

Time to first recurrent stroke:

- For patients with no documented recurrent stroke, documenting last time you know their status

Patient stops being followed because they're failing -- would be fatal

Cumulative incidence -- don't have to have strokes

- To learn other relative things, will need recurrent strokes

- Rule of thumb: need min of 15 recurrent strokes to study one variable -- with estimated 30 strokes, two variables

- Additionally need to have some in each group (if you want to study sex, need some male and some female)

There are black box methods, but may not be interested as they are not interpretable

- Put hopes that low dimensionality data will have enough info to learn what you want to learn

Increased risk given ADI

- Interpret ADI as unique risk factor for stroke and rule out explanation by something else (adjust for confounder)

- Collect variables you don't want to evaluate on their own, but adjust primary variable for -- propensity score adjustment

- What does having a high ADI go along with?

Put propensity score in model, how much of stroke recurrence explained by actual ADI?

Percentiling makes sense when there is competition

- Grade on curve: beat other students in your class, not absolute knowledge

Investigators will look into ADI literature

- ADI percentile may be best metric given that you can't derive raw values (which would be better)

Biostat support = $5k, 90 hours of help

- Scope of help: publish one paper

2024 April 4

Madisen Cook (Jill Slamon), Genetic Counseling Program

I will be surveying sperm donor recipient parents to see what information they value in selecting their donor. I plan to present the information in a matrix survey for them to rank in importance. I will also utilize a tiered approach to determine if they interacted with genetic counselors at any point in the process and if they found that information helpful.

This study will be mixed-methods as I will also include free response portions to collect contextual data. Our questions are what software would be best to analyze the data after collection, what types of statistical analysis would best support any patterns seen, and how could we calculate power for this study/what sample size should we aim for?

Rather than a 5 - option Likert scale, recommend using a slider scale on REDCap with no numbers shown, but has an internal scale of 0-100. Labels on the slider will be "Not at all important" and "Highest possible importance". Note that the slider will default to 50, make sure to tell participants to move the slider to the desired location even if they wish to have it somewhere in the middle. With this data you can estimate the mean for each question and a confidence interval. At a higher level, you could also use bootstrapping to determine the order of means of the questions, which would allow you to rank the importance of each question.
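One way the bootstrap ranking could be sketched: resample respondents with replacement, recompute the mean slider value for each question, rank the questions, and summarize how much each question's rank varies across resamples. The question names and slider values below are hypothetical, and the sketch assumes every respondent answered every question.

```python
import random
import statistics

def bootstrap_rank_intervals(responses, n_boot=1000, level=0.95, seed=1):
    """Bootstrap intervals for each question's rank (1 = highest mean importance).

    responses: list of dicts mapping question name -> slider value (0-100),
    one dict per respondent.
    """
    rng = random.Random(seed)
    questions = list(responses[0])
    ranks = {q: [] for q in questions}
    for _ in range(n_boot):
        sample = rng.choices(responses, k=len(responses))  # resample respondents
        means = {q: statistics.fmean(r[q] for r in sample) for q in questions}
        ordered = sorted(questions, key=lambda q: means[q], reverse=True)
        for rank, q in enumerate(ordered, start=1):
            ranks[q].append(rank)
    alpha = 1.0 - level
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return {q: (sorted(ranks[q])[lo_idx], sorted(ranks[q])[hi_idx]) for q in questions}

# Hypothetical slider responses from 8 respondents
responses = [
    {"medical_history": 95, "education": 60, "eye_color": 40},
    {"medical_history": 90, "education": 55, "eye_color": 65},
    {"medical_history": 100, "education": 70, "eye_color": 30},
    {"medical_history": 92, "education": 45, "eye_color": 50},
    {"medical_history": 98, "education": 65, "eye_color": 70},
    {"medical_history": 91, "education": 50, "eye_color": 20},
    {"medical_history": 96, "education": 40, "eye_color": 60},
    {"medical_history": 94, "education": 68, "eye_color": 35},
]
intervals = bootstrap_rank_intervals(responses)
for question, (best, worst) in intervals.items():
    print(f"{question}: rank between {best} and {worst}")
```

A question whose rank interval is narrow is a clear "winner" or "loser"; wide, overlapping intervals mean the data cannot separate those questions.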

Consider randomizing the order of questions so that if people don't fill out questions at the end, coverage of all questions is still achieved. Still make sure demographics are asked at the top. Need to make sure the survey can be completed quickly to ensure high participation.

For sample size calculation, would need an estimate of the standard deviation in order to estimate the sample size needed for a particular margin of error for a continuous outcome. Without this SD estimate, you could make a conservative estimate using known sample sizes for binary (yes/no) questions. A sample size of 96 will give a proportion estimate with a margin of error of + or - 0.1 (10%), and a sample size of 96 x 4 (384) will have a margin of error of 0.05. As the number of response options increases we get more resolution, fewer ties within the data, and therefore more power.

Statistical software suggestions: SPSS, Stata

2024 March 21

Annick Tanguay (Melissa Duff), Hearing and Speech Sciences

I am running a study using a new technique here, magnetic resonance elastography (MRE; collected in MRI scanner). To provide a good basis for future studies, we want to replicate a previous study AND we want to add an extension to look at sex-differences (it's basically a correlation between a single MRE measure and performance on a memory test). Because of the cost of MRI ($600/hr), it can be only a small sample size (total 50). Based on conventional power analysis, this is enough. I am requesting VICTR funding. I have been unable to satisfy the statistician and I don't fully understand the issue, so I would like to consult you! They wrote in the rejection letter that the stats section was ill-defined and "PI needs to show the margin of error in estimating r with the planned sample size, under the worst case scenario where the true correlation is zero and then justify the planned sample size after describing that margin of error." There is also concern around confounds (which is addressed). It's essentially a big data issue due to the neuroimaging component, but we have a really simple analysis plan (i.e., a correlation, because that's what the replicated study did).

Attendance: Frank Harrell, Annick Tanguay, Melissa Duff, Cass Johnson

Sex Differences in Hippocampal Viscoelasticity and Relational Memory: A Replication Study

We know relatively little about brain health in women, despite well-established sex-differences in episodic memory. There are also sex differences in the hippocampus.

Magnetic Resonance Elastography shows some promise -- determine how elastic a tissue is in the brain (found to correlate with memory tasks in prior study). The present study by Annick and Melissa hopes to replicate this study and investigate potential sex differences.

Confounders will be controlled for (neurological conditions, regular menstrual cycles in women), questionnaires will be used to gather additional data.

Statistical Analyses: Partial Correlations (controlling for age and education) within each group (male, female) between viscoelasticity and SR Shape, SR Object, etc.

Feedback: Lack of well-defined statistical section.

Frank's Comments:

Statisticians can improve in how hypothesis testing is marketed / taught. Correlations are almost never 0; this is less of an existence hypothesis, and more a matter of degree.

Not "does there exist a difference", but "how big is the difference"?

We want to estimate an effect whether or not you're willing to assume there is an effect -- hypothesis testing shouldn't be used as a method for screening here, particularly with a smaller sample like the current case. A confidence interval should be calculated to help factor in sample size.

"Is there a sex difference that researchers should care about?"

Partial Correlations -- removing the effect of another variable.

What would inform readers of the research is the correlation for men, the correlation for women, and a CI for each, as well as the difference between the two correlations and the uncertainty in that difference.

Resource: Biostatistics for Biomedical Research -- 8 Correlation and Nonparametric Regression (hbiostat.org) (Specifically, Chapter 8, and figure 8.5: "Margin for error in r estimating the correlation, when correlation is 0, 0.25, 0.5, 0.75")

Frank would usually recommend we do the calculation for the "worst case scenario", when your correlation is very small. If, in truth, the correlation is very small and there is a small sample size, the margin of error is 0.4 or greater. Does that result in you knowing more than you did before the study began?
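A rough version of the worst-case margin of error can be sketched with Fisher's z transformation. This is an approximation under stated assumptions, not a reimplementation of the figure in the chapter: the function name is hypothetical, the 25-per-sex split of the 50 participants is assumed, and `n_covariates` reflects the standard adjustment of the standard error for a partial correlation.

```python
import math
from statistics import NormalDist

def corr_margin_of_error(n, true_r=0.0, n_covariates=0, confidence=0.95):
    """Approximate margin of error for a (partial) correlation.

    Uses Fisher's z: atanh(r) is roughly normal with standard error
    1 / sqrt(n - 3 - n_covariates). Returns the larger of the two
    distances from true_r to the confidence limits.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = 1 / math.sqrt(n - 3 - n_covariates)
    center = math.atanh(true_r)
    lower = math.tanh(center - z_crit * se)
    upper = math.tanh(center + z_crit * se)
    return max(true_r - lower, upper - true_r)

# Whole sample vs. an assumed 25 per sex, true correlation 0
print(round(corr_margin_of_error(50), 2))
print(round(corr_margin_of_error(25), 2))
# Same, with two adjustment variables (age, education)
print(round(corr_margin_of_error(25, n_covariates=2), 2))
```

With 25 participants in a group the margin of error is close to 0.4, which matches the worst case described above.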

If you wanted to estimate the difference in correlations, that requires 4 times whatever sample size is needed for one correlation. The nice thing about CIs is that they are very "honest"; if there's little information, a CI is an honest way of representing what you know and what you don't know. This is a better approach than an existence hypothesis in this case.

If between subject variance is small (women appear alike in characteristics), and technical replication is very good, then you could expect precision to be a bit better than what is shown in the graph.

Annick's Question: Being explicit about what we do if we are disappointed -- what the next step would be -- would that be helpful? Frank says yes; confidence intervals will be shown regardless of the calculated correlations (for example).

Melissa's Question: What's the likelihood of a good return on VICTR's investment -- if we calculate CIs, is there a window where things look positive vs. negative? May help conserve resources. Frank points out that they might also look for whether the study will give good data and foundation for future work; that would be factored in. Clarifying what exactly the results will be used for would prove advantageous for you; it's definitely not hopeless. Rejection would happen if, for example, something is measured very crudely (whether a part of the brain is activated vs. the degree to which it was activated, for example). Binary outcomes with a sample of 20 would result in incredibly large margins of error that would result in rejection.

plotCorrPrecision in R -- looking for a difference in correlations is a bit more complicated, and isn't necessarily covered in that chapter. Fisher's Z transformation of r will be helpful for this. Also, if you had solid evidence that your correlation is greater than 0.25, you might be able to use 0.25 as the worst-case scenario instead of 0.
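For the difference between two correlations, one common approach with Fisher's z is sketched below. It assumes the two groups are independent and reports the interval on the z scale; the function name and the example values (r = 0.4 and 0.2, 25 per group) are hypothetical and not from the study.

```python
import math
from statistics import NormalDist

def corr_difference_ci(r1, n1, r2, n2, confidence=0.95):
    """CI for atanh(r1) - atanh(r2), two independent groups.

    The variance of the difference is the sum of the two Fisher-z
    variances, 1/(n1 - 3) + 1/(n2 - 3).
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical example: women r = 0.4, men r = 0.2, 25 per group
lo, hi = corr_difference_ci(0.4, 25, 0.2, 25)
print(round(lo, 2), round(hi, 2))
```

In this made-up example the interval is wide and includes zero, which illustrates why estimating a difference in correlations needs a much larger sample than estimating a single correlation.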

Frank suggests removing the classic power analysis portion of first paragraph of power analysis.

Another comment, on Annick's earlier slide on goal of individualization: this is incredibly difficult to do. To do personalized medicine on a solid foundation, you need deeply rich data.

Annick's interpretation: By having a population measure, we may have a sense of what would work best for them (female vs. male, age, etc).

Melissa's Note: Let's say there are two patients with TBI, that look similar on several characteristics, but outcomes are very different; as a clinician, we have no reliable way to describe individualized outcomes.

Group-Level Interaction Effect: maybe keep in mind that when studies are designed to compare two groups, they barely get enough to estimate clinical impact, let alone adjusting for confounders. The sample size needed to estimate an interaction effect is 4x greater than a simple effect. Under some assumptions, this can be up to 16x greater (to test the effect, get evidence for differential effect existing)
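The 4x figure follows from simple variance arithmetic. A minimal sketch, assuming a continuous outcome, equal variance in every cell, and a balanced design (none of which the notes state explicitly); the function names are illustrative.

```python
import math

def var_simple_effect(n_total, sigma2=1.0):
    """Variance of a difference in means between two groups of n_total / 2."""
    return 2 * sigma2 / (n_total / 2)       # = 4 * sigma2 / n_total

def var_interaction(n_total, sigma2=1.0):
    """Variance of a difference of differences across four cells of n_total / 4."""
    return 4 * sigma2 / (n_total / 4)       # = 16 * sigma2 / n_total

n = 100
print(var_interaction(n) / var_simple_effect(n))                 # variance ratio
print(math.isclose(var_interaction(4 * n), var_simple_effect(n)))  # 4x n restores precision
```

The interaction estimate has four times the variance of the simple effect at the same total sample size, so matching the precision takes four times as many participants.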

2024 March 7

Antonia Kaczkurkin, Psychology

I am writing an R01 proposal examining two continuous variables (distress and fear) and a continuous outcome (brain activation). I predict that as distress symptoms increase, brain activation will decrease and this effect will be stronger in distress than fear. I can run a regression with distress and brain activation and another with fear and brain activation, but I'm not sure how to compare distress and fear without making them into dichotomous groups (which I'm trying to avoid - I want to keep all variables continuous). I'm looking for an approach where I can say that a continuous measure of distress shows significantly lower brain activation than a continuous measure of fear. I am also interested in looking at sex differences.

ABCD data - 11868 youth followed over 10 years

- 9-10 years of age at start, data collected annually

- Right now, have data for first 4-5 years

Variables of interest: sex, gender identity (youth report = skewed, parent report less so), puberty (skewed at baseline as expected, more normal as participants age), distress (depression), fear, brain activation (collected every other year)

Aim 1: look at mechanisms underlying distress & fear

- Predict distress will show deficits in positive valence (blunted reward responsiveness)

- Predict fear will show excess negative valence (greater threat responsiveness)

Investigator plan: Brain = covariates + distress, Brain = covariates + fear, compare coefficients

- Want to avoid group-based analysis

Correlation between distress & fear is high

Could include 18 items in a scale to predict brain using a shrinkage method (like ridge regression)

- Frank likes sparse principal components analysis

- With correlation of 0.9 between distress and fear, variables would be inseparable

Compare big model to submodels, leaving out one covariate at a time

- Frank book chapter, added value and adequacy index (towards end of chapter): https://hbiostat.org/rmsc/mle

- False discovery rate doesn't consider false negative rate (Frank really doesn't like FDR)

- FDR gives people false sense of comfort that ones you've chosen are winners, and ignores possibility your losers are actually winners

- Alternative: convert to Bayesian analysis (simple Bayesian prior distribution for effects)

- Bayesian posterior distribution gives you evidence in all directions

Orthogonality restriction could keep distress & fear from measuring what they need to measure

Aim 2: How distress and fear change with age and pubertal development

- Linear vs exponential vs other

More helpful to think of measurements at dates rather than yearly measurements (since measurements are not taken the same time apart)

- Fixed effect for time, time-dependent covariate in puberty status,

- Handle correlation (optimum power if specified well -- serial/AR1 usually works well) and time-response profile (spline function, e.g. restricted cubic spline)

- Frank book link: https://hbiostat.org/rmsc/long
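To illustrate why actual measurement dates matter for a serial correlation structure, here is a minimal sketch of a continuous-time AR1 correlation matrix, where the correlation between two measurements decays with the time gap between them. The function name, the value of rho, and the visit times are all hypothetical.

```python
def ar1_correlation_matrix(times, rho):
    """Correlation rho ** |t_i - t_j| for every pair of measurement times.

    times are in a common unit (here, years since baseline), so unequal
    gaps between visits give unequal correlations.
    """
    return [[rho ** abs(ti - tj) for tj in times] for ti in times]

# Hypothetical visit times for one participant, in years since baseline
visit_times = [0.0, 0.9, 2.1, 3.0]
for row in ar1_correlation_matrix(visit_times, rho=0.7):
    print([round(value, 2) for value in row])
```

Treating these visits as exactly one year apart would assign the same correlation to the 0.9-year and 1.2-year gaps, which the date-based version avoids.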

Calculate R^2 where model allows variable to be linear/non-linear, compare them

ChatGPT: combined model matrix algebra doesn't work

Specify time-response and get confidence bands

Aim 3: Estimate sex and gender identity differences in distress and fear

- AOV not great option with correlation structure

Chunk test -- example in handout

Dichotomizing gender identity could make it worse -- depending on choice of cut-point

- Want to treat ordinal predictors as ordinal

If sex of person predicts trajectory, trajectory predicts sex

2024 February 29

Jacob Franklin (Allison McCoy), Biomedical Informatics

The project is clinical evaluation of the utility, usability, and impact of a pilot trial using ambient AI documentation vendor solutions. I would like to discuss how to statistically analyze the survey and various clinical electronic health record metrics for each vendor solution and then how best to compare amongst vendors with the different pilots.

Attendance: Jacob Franklin, Allison McCoy, Frank Harrell, Jackson Resser, Cass Johnson

Goal for Clinic:
Scoping; give project explanation, data they will be accumulating, and get analysis recommendations for Jacob's own analysis and what he should seek help for.
Autogeneration of notes for physicians, PA's, etc. Want to evaluate utility and usability of vendors for this; how does it work, is it good, can we make recommendations to executive leadership.

Data:
Four IRB-approved surveys - how many questions can they get pilot physicians to answer. EPIC will document reports and data. Start with 10 physicians, then maybe 20-25, etc.
Between 100-200 metrics per physician, per month in an Excel spreadsheet - they are interested in about 20. "Buckets" of data that are generated - 10 are around notes, specifically. So those are of interest - also, length of physician days.
Month or so delay for the metrics, but REDCap survey results will be immediately available.

Pre-Survey Instrument:
Many are Likert scale, 1-5. There are a few Yes / No and minimal free text.
Two free text potentials - if you can anticipate a few reasons and then provide an "Other" option with a free text option to help narrow down options. You may not be able to draw statistical inference, but you can describe your patient population if needed in the publication.
Post-Survey Instrument:
Average documentation time saved, per patient visited - this is more a subjective number; the objective data / average will be derivable in the dataset. You could make this a slider on REDCap - coded as 0 - 100, but the physician wouldn't see that. Helps break ties in assessment. Note that a physician must actually use the slider, otherwise REDCap will record it as 50 automatically.
"Why did vendor not help you in this way" - you would likely use the binary variable above it in actual analysis.
"How many more patients would you be willing to see per clinic session, in order to keep vendor" - Jacob may pursue slider here - he will see if he can figure out making the slider from 0 - 10 or another reasonable number, if he can't, Jackson and Cass may reach out to others who work in REDCap design.

Analysis:
Per Frank, it's more informative to give the mean than it is the percentages of Likert responses. Confidence intervals would also be helpful. Showing a distribution of responses via charts would also prove helpful for the reader.
Representativeness of respondents is incredibly important. Jacob is optimistic that there will be a high response rate in the pilot, but if missing data occurs, that will be an issue.
You must know who was targeted so that you can determine who responded (if missing data occurs).
Jacob will know who the pilot users are; is there a systematic approach to understanding who didn't respond if it occurs? Seniority, area of specialty, demographics like age and sex, hours worked.
45 questions could be a bit daunting as far as time requirement. Having most important questions first, and demographic information if not available elsewhere, could be useful.
Note that pre and post surveys will be paired, and physicians will fill out both. Keep in mind that pre-post design has limitations, especially if there's a long turnaround time or you end up with individuals who did not respond to post survey.
Casting a larger net with sample size could be helpful to increase power in case you would like to look at differences between demographic groups. Possibly 50 participants of different vendors.

Excel Data:
Several metrics of interest, including Pajama time. In this example, numerator is minutes outside of regular window, denominator is days scheduled. Multiple weeks for one participant; how much variability might there be in the number of weeks with responses per physician? Jacob expects this to be fairly symmetric, maybe 6-8 weeks per physician, but he expects to aggregate this data.
Mean with confidence intervals will be a strong tool. Spaghetti plots showing trajectory for each physician may be useful. One physician would be one line, with each data point being a week; you would then have trajectories for all physicians for a specific metric on a chart. You could see if coloring lines by specialty is helpful for description. Normalization, standardization, time alignment; maybe just play around and see if anything serves the analysis question.
Is time of day important to you? Occasionally could help with interpretation. If you were to align based on time of day, lines may start at different times. A moving average, or an average over physicians, could both be useful to smooth these curves.
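As one way to smooth a physician's weekly trajectory, a centered moving average could look like the sketch below. The function name, the window of 3 weeks, and the pajama-time values are hypothetical and only for illustration.

```python
def moving_average(values, window=3):
    """Centered moving average; windows are truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical weekly pajama-time minutes per scheduled day for one physician
weekly_minutes = [42, 55, 38, 30, 33, 25, 28]
print([round(v, 1) for v in moving_average(weekly_minutes)])
```

The same function could be applied to the week-by-week average over physicians to get a single smoothed curve per metric.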

Note Composition method - Characters per week, per method. If possible, capturing patients transcribed per method could aid in interpretation.

2024 February 22

Attendees: Frank Harrell, Cass Johnson, Jackson Resser, Angela Bonino, Eriel Confer, Soha Patel, Niharika Ravichandran

Niharika Ravichandran (Soha Patel), OB/GYN

Influenza, tetanus toxoid, reduced diphtheria toxoid, and acellular pertussis (Tdap) and COVID vaccines are routinely recommended during pregnancy to prevent adverse maternal and neonatal outcomes. It is well known that pregnant individuals infected with influenza or COVID are at increased risk of severe illness and adverse perinatal outcomes compared to non-pregnant individuals. Prior research has shown that global COVID-19 vaccination prevalence in pregnant women is low. Multiple factors are suggested to be associated with vaccine uptake including age, ethnicity and social living conditions.

The purpose of this study is to conduct a preliminary analysis of vaccine uptake before and after the COVID19 pandemic at our institution and understand the determinants associated with decreased uptake.

Population: Pregnant patients who delivered at Vanderbilt with at least one pre-natal visit

Questions from investigator: sample size

Compare rates pre-pandemic to post-pandemic

Use highest resolution data: address > zip code

- population density, median family income for area

Initial step: understand who is coming into clinic. Relevant to understand change in participant characteristics over time

- trend in median family income over time

- population density over time

- trend in vaccine receipt over time

Look at trends in raw form and adjusted form

- estimate prevalence of vaccine over time, adjusted for covariates

10 years pre-pandemic sounds good, but subject-matter knowledge should guide that decision

CDC - social vulnerability index

Potential exclusions: allergy to vaccine, fetal anomalies

List variables to adjust for, factors that would alter tendency to receive vaccine

Analysis methods: Logistic regression model (probability of uptake by time trend, age, address, etc)

LR: to estimate prevalence in a single group well, need sample size of at least 400

Think about as prospective cohort study

Potential next steps: VICTR voucher, VICTR studio

Potential second part: vaccines administered to new-born after discharge

Eriel Confer (Angela Bonino), Audiology, Hearing & Speech Sciences

We would like to create a cohort of children who received either an ASD diagnosis or a speech/language disorder diagnosis by our department's clinic. We would like to be able to pull some data from EHR (diagnoses, demographics) but know that some of the audiological data will not likely be able to be pulled by your system (it's housed on a 3rd party system that then interfaced with eSTAR.)

Population: children with autism

- Look at population being discharged from clinic vs not

Big study question: how many visits did patients have, what information was used to make decision (four potential pieces of info that could be used)

- Clarification: what information was available for them to use

- Laying out things that were important to capture, make sure you can capture them accurately

Most children will have between 1-3 visits

History variables to understand current context

Potential exclusion criteria: facial cranial, certain ages, language, family/family history

Boys more likely to be diagnosed with autism earlier

- Age against something else

Goals sound more descriptive

- Estimate proportions of sample characteristics with confidence intervals

Make study questions as specific as possible (specific enough that it's possible the data may not be able to answer the question)

Combining EHR data with third party data

- Need to figure out what is extractable from third party

- Ask around dept

Sample size: to estimate prevalence of Yes's to +/- 0.05, need sample of at least 400

- Confidence intervals will be self-limiting
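For estimating proportions with confidence intervals, a Wilson score interval is one reasonable choice (the notes do not name a specific method, so this is an assumption). The sketch below uses hypothetical counts to show that a sample of 400 gives a half-width just under 0.05 when the proportion is near 0.5.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical: 200 "Yes" responses out of 400 children
lo, hi = wilson_interval(200, 400)
print(round(lo, 3), round(hi, 3))

# A small subgroup shows how the interval widens on its own
lo_small, hi_small = wilson_interval(10, 20)
print(round(lo_small, 3), round(hi_small, 3))
```

The second call illustrates the "self-limiting" point: with only 20 children the interval is wide enough that the data plainly cannot support a strong conclusion.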

Get flexible time trend

- create windows of interest in overall smooth trend

- superimpose discontinuity

Interrupted time series analysis

2024 February 15

Mark Rolfsen (Wes Ely), Pulmonary and Critical Care

With CIBS biostatisticians we have created a prediction model for cognitive impairment following the ICU using logistic regression technique. My question is how we can adjust this based on the initial results (e.g. we have 2 outcome variables but might want to reduce to 1 outcome variable) and how could we turn this into a reasonable clinical tool/calculator? In general looking for an open discussion on CPM's to help guide next steps

Attendees: Frank Harrell, Mark Rolfsen, Rameela Raman, Onur Orun, Wes Ely, Jackson Resser, Cass Johnson

40-60% of patients may have cognitive impairment, but having a model to help individuals understand their own risk would be new.

Logistic Regression -- using prespecified baseline and in-hospital characteristics. Outcome variable was either cognitive impairment or functional disabilities. Two models: one three month, one twelve month

541 and 465 patients per model, respectively

Outcome variable occurred in 50% of 3-month patients and 43% of 12-month patients

Calibration curve -- predicted probability of outcome vs. actual probability.

Clinical context -- loved one may be at high risk of impairment, inform clinical conditions or potential support options (Bedside tool towards end of hospital stay).

Questions from Frank:

External validation -- separate study of 300 patients. Same variables were collected. Is the development sample big enough to stand on its own, without validation patients?
One issue here -- inclusion / exclusion criteria could vary patient population. You could explore if there is a more narrow range of important predictor variables.
Drop off from internally-validated to externally-validated was .04 -- R squared measures may also be used. Frank likes 90th percentile of absolute differences, as well.
Half of patients had outcome -- the way you analyzed, we don't take into account whether a patient was very close to being cognitively impaired but wasn't. An overall scale (Frank will provide FDA talk; https://www.fharrell.com/talk/cos/) could be a path forward.
Maximum resolution outcome variable is best. Averaging ranks of two scales, using one scale to predict another so that one scale may be calibrated to another... some additional discussion could be done to help here.
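The "90th percentile of absolute differences" can be read as a calibration summary: the 0.9 quantile of the absolute gap between each predicted probability and the corresponding point on the calibration curve. That reading is an assumption, and the sketch below uses made-up numbers and a hypothetical function name.

```python
from statistics import quantiles

def calibration_error_summary(predicted, calibrated):
    """Mean and 0.9 quantile of |predicted - calibrated| probabilities.

    calibrated is the calibration-curve estimate of the actual
    probability at each predicted value.
    """
    abs_err = sorted(abs(p - c) for p, c in zip(predicted, calibrated))
    mean_err = sum(abs_err) / len(abs_err)
    # quantiles(..., n=10) returns the 9 decile cut points; the last is the 90th percentile
    q90 = quantiles(abs_err, n=10, method="inclusive")[-1]
    return mean_err, q90

# Hypothetical predicted vs. calibrated probabilities for ten patients
predicted = [0.15, 0.22, 0.30, 0.41, 0.48, 0.55, 0.63, 0.70, 0.78, 0.90]
calibrated = [0.13, 0.25, 0.29, 0.37, 0.50, 0.60, 0.61, 0.66, 0.85, 0.82]
mean_err, q90 = calibration_error_summary(predicted, calibrated)
print(round(mean_err, 3), round(q90, 3))
```

A high quantile like this is sensitive to regions where the model is badly off, which an average error can hide.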

Question from Mark: How do we feel about taking these tools and boiling them down to a usable bedside tool?

Support study, end of life decision making; if personalized, reliable survival curve is provided, provisions didn't make as big of a difference as one may think. Misinterpretation of results was common, and patients didn't latch on to risk scale.
Median life expectancy would maybe have been more effective.
The proposed model will give a risk-based estimate, but volume, amount of cognitive challenge, or other unit of measurement may be effective for patient interpretation.
Wes's summary: if you take away from this paper that we enable a clinician or team to tell a patient that they have X% likelihood of new brain dysfunction, and the way we would handle that is support / classes / etc, that would be a win. Calculator may be less important; a distillable statement may be preferable.
Mark's concern: Many people will fall into a category where rehab may be needed -- maybe a very small low-risk and small high-risk population.
Also worth noting that this is a survival-only model. Ordinal longitudinal analysis may be communicated as % of being included in a cognitive level or worse; if you're excluding the people that die after counseling, as in the current analysis, that is likely misleading.
That would align with being prospectively defined. Chance of being in good cognitive function + alive, poor function but alive, dead.
Having one scale would make the biostatistical problems easier to solve. Can estimate median scale for a person.
If the model was to be changed to be just cognitive impairment, but death was included, with an ordinal scale -- patients would have a median. Challenge would then be in communication to patients.
Frank's comments based on similar study; density function with most likely level of disability a year after surgery. Median could be target summary, but 10th and 90th percentile could be provided as well.
- Hui Nian may be able to help.
- 4-8 levels, a stacked bar chart may be effective for discrete / ordinal outcome.
- Adjusting of independent variable; can one or two variables be subbed in and out while remaining methodologically rigorous?
  - Just don't try a lot of variables that you then discard, and then in the subsequent validation fail to repeat those "tries".
  - But yes, adding a few variables at this point (length of stay, for example) is not an issue.
  - If many experts were assembled, and various levels of the two scales was given, and you asked each which one is worse; if you can order those combinations such that they agree, that may be effective. 20 or so experts would be needed, though.
  - Bare minimum would be five ordinal levels (not including death); ten would be ideal.
  - Takeaways: Ordinal scale is best, combination of outcomes or only one, adjustment with independent variables is possible, and communication + interpretation of results will need to be thought about for patients.
  - Multiple observations over time as a longitudinal analysis may be another good option.
    - Ex: If you died after the first time point, that's an absorbing state.
  - Frank may be an author if desired, or perhaps acknowledged.

Frank confirms that this project would be a good fit for a VICTR voucher.

2024 February 8

Attendees: Frank, Cass, Jackson, Marissa Khalil, Mikaela Bradley

Marissa Khalil (Rondi Kauffmann), Surgery

We are conducting a retrospective analysis comparing the sociodemographics and outcomes between average onset of diagnosis and young onset of diagnosis breast cancer in Kenya. We have completed the data collection and review and are looking to start univariate analysis and multivariate analysis. We have 3 tables in place:
Table 1: Patient and Tumor characteristics comparison between young onset and average onset patients
Table 2: Survival and Recurrence rates between the 2 populations
Table 3: odds ratios, Kaplan-Meier curves vs logistic regression etc

Two ways of getting into system:

1) oncologist started database for cancer patients (all cancer patients at the hospital)

2) At every follow-up visit, patient added to database

Some women could have failed to enter the study population because the cancer became severe quickly

Want to guard against ill-defined denominator

- Problem: patients who die before entering population (not a random sample)

- Example: cats falling off buildings; cats that died the moment after the fall were excluded

Paucity of data for breast cancer in this area of Africa

Time-oriented outcome like age of onset prone to bias

Could be difference in types of breast cancer in population

Value in determining pieces in the data that don't matter and then confirming that they don't matter

- Negative controls give you more confidence in positive controls

Data exploration: make a model to predict a missing lab value

First: dig into data, build demographic tables

Multivariable analysis of the differences (logistic regression model) to predict age cohort

- Looking for unique differences

Pre-cursor analyses: degree of missingness could limit types of analyses you could run

- Cluster analysis: understand degree of missingness on the same individual

Regression analysis: using R - https://hbiostat.org/rmsc/software

Also resources available to help get data from REDCap into R

Kaplan-Meier vs logistic regression

- LR better when time is not important

In some cases, not confident whether participant died from breast cancer or another cause

Mikaela Bradley (Gillian Hooker), Genetic Counseling

Neurofibromatosis Type 1 (NF1) is a common genetic condition that affects approximately 1 in 2,500-3,000 individuals. The goal of this study is to investigate if a reported family history of NF1 influences perceived levels of stress and coping styles in adults with NF1. To do this, adults with NF1 completed a survey that includes questions about their diagnosis, their family history, the Perceived Stress Scale 10-Item Version, the Brief Coping Orientation to Problems Experienced Inventory, short response questions, and demographics.

We have completed a lot of our bivariate analyses and are working on a hierarchical multiple linear regression to identify other variables that modify people's experience of stress. During this clinic, I would like to review the analyses that I have run to ensure I am reporting things correctly. Within this, I would like to talk through the blocks that I created to make sure we are teasing out the variables correctly.

Grand question: is there a difference in stress levels and coping styles based on family history?

How do people get into cohort?

- Survey, recruited from three different sources

- Diagnosis of NF1, > 18 years old, US resident who can speak English

- Current age and age of diagnosis available

Stage-wise multi-linear regression in SPSS

- Base model (outcome is stress level):

- M1: demographics

- M2: demographics + NF1 characteristics

"stage-wise" = different models with additional covariates

F Change = overall F for corresponding model

Additional variables adding half as much explained variation (stress level hard to predict)

Spline functions to deal with non-linear relationships

- F statistic for joint influence of age and age^2

Too many covariates to look at each individual -- result = a lot of noise

Can't look at correlation to determine which variables to analyze (double-dipping)

- Wouldn't do any statistical testing (remove p-values), report correlations to two decimal places

Can compare correlations, never p-values

Three coping subscales and have them interacting with family history in stage 3

- F test with six degrees of freedom and R^2

- Do subscales predict stress level for either family history group?

The more chunk tests you use, the more license you have to deal with things without p-value corrections like Bonferroni

Can remove p-values, report correlations as descriptive measures: do better to assume correlations are non-zero

Grouping variables into blocks is a good practice

Adjusted R^2: tells you if added variables are worth the $$
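For nested (stage-wise) models, the test for an added block of variables and the adjusted R^2 can be computed from the R^2 values alone. This is a sketch using the standard textbook formulas; the function names and every number in the example (n = 101, 6 and then 10 predictors, R^2 of 0.20 and 0.30) are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a linear model with p predictors fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def f_for_added_block(r2_reduced, r2_full, n, p_full, q_added):
    """Partial F statistic for q_added variables added to a nested model.

    Degrees of freedom are q_added and n - p_full - 1.
    """
    numerator = (r2_full - r2_reduced) / q_added
    denominator = (1 - r2_full) / (n - p_full - 1)
    return numerator / denominator

# Hypothetical: M1 (demographics, 6 predictors) vs. M2 (adds 4 NF1 characteristics)
n = 101
print(round(adjusted_r2(0.20, n, 6), 3), round(adjusted_r2(0.30, n, 10), 3))
print(round(f_for_added_block(0.20, 0.30, n, p_full=10, q_added=4), 2))
```

Comparing adjusted R^2 between M1 and M2 shows whether the added block improves explained variation after paying for its degrees of freedom.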

Cass suggests using "nested" terminology

2024 February 1

Andrew DeFilippis, Cardiology

I have the privilege of reporting on a pre-specified subgroup analysis of a RCT (MINT Trial, NEJM).

Briefly, MINT randomized participants with an acute MI and anemia to a liberal versus restrictive transfusion strategy. I am reporting out on a stratified analysis by type of MI (Type 1 vs Type 2 MI). If possible, I would very much like to discuss how to address the fact that the size of MI differs between Type 1 and Type 2 MI in this trial (likely confounding the interpretation of MI type on the outcome).

Attendance: Frank Harrell, Andrew DeFilippis, Jackson Resser, Cass Johnson

"Not all heart attacks are the same"

Prespecified subgroup analysis -- differ by whether index enrollment MI was Type 1 or Type II

MINT -- 3,500 patients with heart attacks who were also anemic, randomized to liberal transfusion strategy or restricted. Outcome is 30 day death, MI.

Protocol specifies that index hospitalization includes designation of Type 1 or II MI. Very few unknowns.

Primary result: Whether allcomer MI's did better with liberal or restricted transfusion. Death / MI in Type 1 vs. Type 2, liberal vs. restricted

Troponin measurement: Many different assays, but in a heart attack, troponin value can change by 10,000 fold. Size of MIs were categorized (somewhat arbitrarily) into 5 categories -- <1, 1 to <10, 10 to <100, 100 to <1000, greater than or equal to 1000.

Frank: Wouldn't patients who got more troponins drawn have a better possibility of having the peak value found?

Troponin stays elevated for 2 weeks -- peaks 12-48 hours after MI
Dynamic range is very large
Possibility of secondary analysis -- log ratio to upper limit of normal. Relationship between log ratio and outcome, as well as same relationship for number of components drawn, to assess if there's bias that makes interpretation difficult.
- Help determine if peak troponin should be adjusted for number of draws
Andrew: If Frank were reviewing, would he want to see an analysis where size of MI is held constant? Or would he ask for a second stratification (within Type 1, then by size; within Type 2, then by size)?
- The current display is not that helpful due to heterogeneity between Type 1 and Type 2
- Graph that shows log ratio vs. outcome; if adjusted for log ratio vs. outcome, does type add anything to predicting the outcome? (In Andrew's words, MI by size vs. outcome, and see if that is impacted by size of MI)
  - May be more useful to see if log ratio interacts with treatment
  - May be fit with spline function
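The log ratio discussed above can be sketched as follows. This assumes the five size categories are multiples of the assay's upper limit of normal (ULN), which the notes imply but do not state; the function names and example values are hypothetical.

```python
import math

def log10_ratio_to_uln(peak_troponin, upper_limit_normal):
    """log10 of peak troponin relative to the assay's upper limit of normal."""
    return math.log10(peak_troponin / upper_limit_normal)

def size_category(peak_troponin, upper_limit_normal):
    """Map the ratio to the five categories listed in the notes."""
    ratio = peak_troponin / upper_limit_normal
    labels = ["<1", "1 to <10", "10 to <100", "100 to <1000", ">=1000"]
    cutoffs = [1, 10, 100, 1000]
    for cutoff, label in zip(cutoffs, labels):
        if ratio < cutoff:
            return label
    return labels[-1]

# Hypothetical peak value of 1000 on an assay whose ULN is 10 (same units)
print(log10_ratio_to_uln(1000, 10), size_category(1000, 10))
```

Working with the continuous log ratio keeps the full dynamic range and puts different assays on a common scale, whereas the categories discard information within each tenfold band.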

Type 1 / Type 2 variable is low resolution compared to size variable; Frank thinks size may be more important to show in table compared to Type 1 and 2 because of this. You could do it both ways.

Andrew: Are additional analyses irresponsible? This could be a concern of coauthors.

Perpetuating clinical trials to give minimal information to the reader.
MI\x{2019}s are being treated as \x{201c}equally big\x{201d}. An analysis that looks at relationship between liberal and restricted, and how big of an MI someone got as a second MI, would be encouraged.
- Andrew notes that this is set for a second paper; Frank thinks it may be best used here.

Andrew's position is that it would be best to be liberal with analyses performed, but conservative with interpretations. Other investigators have taken the opposite approach.

Fundamentally flawed design: if we know that people differ at baseline, especially on variables that predict the outcome, results become incredibly difficult to interpret.

Clinical trials do nothing to control for within-group variability. We need to know which variables can be defended as big players, rather than trying to control for every single variable.

Back to Table 6: MI size would be very important to relate to the outcome. Push to do an analysis looking at whether transfusion strategy would impact large vs. small MIs

Figure 2: concerns with confounding, propose analyses for quantifying MI type and treatment strategy. Frank thinks that size variable is likely to be more important, would hesitate to call that confounding.

Relationship between troponin and outcome: log ratio

Is the number of troponins reported related to actual outcome?

Of your ability to predict something, how much of it comes from variable x / y / z? Dot plot in descending order; this would show big prognostic players that can't be learned from what is currently provided.

Regarding Figure 2; age, LVF are not included.

Scatterplot of MI size at study start vs. second MI, with two colors for treatment type, could be good to look at. Then we can include baseline characteristics (hemoglobin)

Andrew suspects that size will be a second paper; Frank thinks best route for improving Figure 2 would be baseline size vs. outcome, stratified by treatment and type (four curves). Could also do it without stratifying by type for larger denominators / greater stability.

These would not be Kaplan-Meier curves; logistic regression models (size on X axis, yes/no at 30 days). Don't assume that log ratio is linear (Frank Harrell and Magnus Olsen, nonparametric regression on Troponin in NEJM, or spline function)
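One way to avoid assuming linearity in the log ratio is a restricted cubic spline basis, which is cubic between knots and linear in the tails. The sketch below builds only the basis columns; fitting the logistic model for the 30-day yes/no outcome is not shown. The function name `rcs_basis` and the example knots are assumptions for illustration, not something taken from the clinic discussion.

```python
def rcs_basis(x, knots):
    """Return [x, nonlinear_1, ..., nonlinear_(k-2)] for k knots.

    The nonlinear terms are truncated cubics combined so that the
    fitted curve is linear below the first knot and above the last.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")
    gap = t[-1] - t[-2]
    if gap <= 0 or t[-1] == t[0]:
        raise ValueError("knots must be distinct")

    def cube_plus(u):
        # (u)+ cubed: zero to the left of the knot
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2  # keeps columns on the scale of x
    basis = [x]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube_plus(x - t[-1]) * (t[-2] - t[j]) / gap)
        basis.append(term / scale)
    return basis

# Hypothetical knots on the log10 ratio scale
example_knots = [0.0, 1.0, 2.0, 3.0]
row = rcs_basis(1.5, example_knots)  # one design-matrix row for log ratio 1.5
```

Each patient's log ratio would contribute one such row; crossing the columns with treatment would allow the treatment effect to vary smoothly with MI size.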

Also: Spearman correlation coefficient between size and LVF
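For reference, Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. A small self-contained sketch follows; the data in the usage line are made up.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    return pearson(average_ranks(x), average_ranks(y))

# Made-up values standing in for MI size (log ratio) and LVF
rho = spearman([0.2, 1.1, 2.5, 3.0], [60, 55, 40, 35])
```

Because only ranks are used, the result is the same whether size is entered as a raw ratio or a log ratio.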

When a firm threshold is present for qualification into the study (hemoglobin), and that same variable is the one being treated to, you may need to verify that there are no boundary artifacts. People at 9.9 hurt by treatment vs. helped, for example.

2024 January 25

Shayan Rakhit (Amelia Maiga), Surgery

This is an already completed analysis; the abstract is posted below. We would like to discuss potential methods to account for unmeasured confounders:

Introduction:
Research in animal studies, retrospective cohorts, and secondary clinical trials analyses suggests that plasma may improve outcomes in traumatic brain injury (TBI). We examined the association between plasma administration and mortality in moderate-severe TBI, hypothesizing plasma is associated with decreased mortality after accounting for confounding, including by indication.

Methods:
Patients greater than 18 years with moderate-severe TBI from the 2017-2020 Trauma Quality Improvement Program (TQIP) dataset were included. Patients with anticoagulant/antiplatelet use, specific comorbidities (bleeding disorders, cirrhosis, chronic renal failure, congestive heart failure, chronic obstructive pulmonary disease), outside hospital transfer, and missing hospital mortality were excluded. Multivariable logistic regression examined the association between plasma volume and hospital mortality, adjusting for sociodemographics, severity of injury/illness, neurologic status, and volume of other blood products, including interaction terms of plasma with shock and need for hemorrhage control procedure, respectively (see Table 1 for details). Sensitivity analysis excluded patients with shock and hemorrhage control.

Results:
Of 4,273,914 patients in TQIP, 63,918 met inclusion. Hospital mortality was 37.0%. 82.8% received no plasma. Other cohort characteristics were mean age: 44.9; mean Injury Severity Score: 28.5; percent female: 24.4%; percent severe TBI: 69.4%; percent in shock: 7.4%, percent needing hemorrhage control procedure: 12.2%. Unadjusted, each categorical increase in plasma volume (from 0 to 0-2 to 2-6 to 6-12 to greater than 12 units) is significantly associated with greater odds of mortality. Confounder adjustment attenuates this effect (Table 1): the odds ratio (95% confidence interval) increasing from 0 to 0-2 units is 1.23 (1.09-1.38); from 0-2 to 2-6 units is 0.96 (0.83-1.11); from 2-6 to 6-12 units is 1.19 (0.96-1.47); and 6-12 to greater than 12 units is 1.68 (1.20-2.34). Similar results are seen in sensitivity analysis. Shock and need for hemorrhage control procedure significantly (p less than 0.001) modify the relationship between plasma and mortality.

Conclusions:
Plasma's effect on mortality in TBI remains unclear. Likely due to residual confounding despite adjustment, plasma is associated with increased mortality in moderate-severe TBI in this retrospective cohort. Interaction term analysis suggests this is confounding by indication, specifically because plasma is usually administered for hemorrhage (which in turn, increases mortality). A prospective randomized study of plasma for nonbleeding patients with TBI would better answer this important clinical question.

Discussion notes:

Association of plasma and mortality in severe TBI

Question: other methods to account for unmeasured confounding? Instrumental variable analysis (generally limited to randomized study), E-value sensitivity analysis
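As a rough illustration of the E-value option, the usual formula for a risk ratio RR above 1 is RR + sqrt(RR × (RR − 1)). For an odds ratio with a common outcome (hospital mortality here was 37.0%), the square root of the odds ratio is commonly used as an approximate risk ratio. This is a sketch under those assumptions; the function names are hypothetical and this is not the analysis that was run.

```python
import math

def e_value_from_rr(rr):
    """E-value for a risk ratio; ratios below 1 are inverted first."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

def e_value_from_or(odds_ratio, common_outcome=True):
    """Approximate RR by sqrt(OR) when the outcome is common,
    otherwise treat the OR as an RR (rare-outcome approximation)."""
    rr = math.sqrt(odds_ratio) if common_outcome else odds_ratio
    return e_value_from_rr(rr)

def e_value_for_ci(lower, upper, common_outcome=True):
    """E-value for the confidence limit closest to the null;
    1.0 when the interval includes 1."""
    if lower <= 1.0 <= upper:
        return 1.0
    limit = lower if lower > 1.0 else upper
    return e_value_from_or(limit, common_outcome)

# Adjusted OR from the abstract: 6-12 vs. >12 units, 1.68 (1.20-2.34)
point = e_value_from_or(1.68)
bound = e_value_for_ci(1.20, 2.34)
```

The result is read as the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both plasma and mortality to fully explain away the estimate.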

Inflection point found between 6-10 units of plasma (categorized exposure at clinically relevant threshold)

- Frank: categorization approach counts all values in a category as the same. Categorizing at inflection point does NOT respect the form of the data

Survival bias (patients that die early don't receive as much plasma)

- Higher resolution data needed to address

We want to adjust for confounding by bleeding in the relationship between plasma and mortality in severe TBI

- Difficult to disentangle bleeding from plasma -- so intimately intertwined (hard to do without randomized design)

- Question that can be answered: investigate relationship between bleeding and plasma, characterize by other variables

- Could analyze quality of clinical practice, variation in how much plasma was given

- Could inform later analysis when you bring in mortality

Frank R package (rms) to perform instrument analysis

High proportion of participants who did not receive plasma at all

- Need to choose knots in spline function. Placing knots difficult when lots of zeroes

- Manual override places knots using non-zeroes
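The knot-placement point can be sketched as follows: compute knot locations from quantiles of the non-zero plasma volumes only, so that the 82.8% of patients with no plasma do not pile several knots at zero. The quantile positions 0.05, 0.35, 0.65, 0.95 are an assumed four-knot choice, and the helper names are hypothetical.

```python
def quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics."""
    if not sorted_vals:
        raise ValueError("no data")
    h = (len(sorted_vals) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])

def knots_from_nonzero(values, probs=(0.05, 0.35, 0.65, 0.95)):
    """Place spline knots at quantiles of the strictly positive values."""
    nonzero = sorted(v for v in values if v > 0)
    if len(nonzero) < len(probs):
        raise ValueError("too few non-zero values to place knots")
    return [quantile(nonzero, p) for p in probs]

# Made-up plasma volumes: 80 patients with none, 20 with 1..20 units
volumes = [0] * 80 + list(range(1, 21))
naive = [quantile(sorted(volumes), p) for p in (0.05, 0.35, 0.65, 0.95)]
knots = knots_from_nonzero(volumes)
```

With the made-up data, the naive quantiles put three of the four knots at zero, while the non-zero version spreads them over the range where plasma was actually given.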

Retrospective data -- feedback loop

2024 January 18

Alvin Jeffery, Nursing (12:30 - 1 pm)

Follow-up from 1/11/24 to meet with Frank

Attendees: Frank Harrell, Alvin Jeffery, Marianna LaNoue, Dagmawi Negesse, Jackson Resser, Cass Johnson

Summary of last week:

Implementation of complex, quantitative risk information
And, how should results of predictive models and other tools be applied?
SPECTACULAR: rapidly & empirically look at design elements
Primary test: four different ways (across 10 nurses, 12 timepoints) to display content
- Measured preference, what action would be taken from that (Contact RRT, Contact MD, Contact Charge, Contact Peer, Increase monitoring, Continue Same)
- Many other things could be modified
- Can we take a factorial design and merge it with a Bayesian adaptive trial to start eliminating design elements that are not preferred / do not lead to the outcome we want?
- Can this be done on a voucher? Is this project too broad for 90 hours of work?

Discussion:

90 hours of work for pre-award work; so, this may not be a good fit for a voucher
Factorial design may be performed where some factors get dropped
- Ensures balance; gets best average power
- Marianna: From modelling perspective, how should this be done iteratively?
  - Possible messy part could be a factorial design where some factors interact (if one thing is red, another doesn't work entirely)
  - If these factors can be thought of as independent, that is best for sample size calculation
  - Alvin: Similar paper w/ 72 possibilities. Bayesian (non-adaptive), so comparable framework has been done, just not in the field of interest
  - Statistical simulation study may be another possibility
  - Response-surface design may be comparable. Breakfast cereal industry is notable here.
    - Polynomial regression estimating optimum combination of factors (optimize response surface)
    - Outcome is average taste test rating
    - By solving for optimum, that's how they decide what to market. Could be comparable.
    - Per Alvin: Different "types" of users (field, facility type, etc.) may have different preferences, which would be great to parse out
    - Thermometer plot may be recommended. "People" plot (X out of 100) may be preferred by patients, but not preferred by Frank / Alvin / previous sample of nurses.
    - Would there be biostatisticians able to work on this?
      - May depend on timing. Frank may discuss with other department members.
      - If design can be nailed down, that may help determine what statistical support is required.
    - Fractional Factorial Design: By not having balance in every possible cell, occasionally of use
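The response-surface idea mentioned in the discussion above can be illustrated with a single factor: fit a quadratic by least squares and solve for the level that maximizes the predicted response. This is a toy sketch with one hypothetical factor and made-up ratings; a real design would involve several factors and their cross-products.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via normal equations."""
    design = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    return solve_linear(xtx, xty)

def optimum_level(b1, b2):
    """Stationary point of the fitted quadratic; a maximum only if b2 < 0."""
    if b2 >= 0:
        raise ValueError("fitted surface has no interior maximum")
    return -b1 / (2 * b2)

# Made-up mean ratings at seven levels of one hypothetical design factor
levels = [0, 1, 2, 3, 4, 5, 6]
ratings = [5 - (x - 3) ** 2 for x in levels]
b0, b1, b2 = fit_quadratic(levels, ratings)
best = optimum_level(b1, b2)
```

The same logic extends to several factors by adding squared and cross-product columns and solving for where all partial derivatives are zero.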

2024 January 11

Serena Fleming (Sarah Stallings), Genetic Counseling

I am working on a master's thesis project to conduct a retrospective chart review for individuals undergoing testing for Huntington Disease at VUMC across two decades to assess whether there are differences between asymptomatic and symptomatic individuals. The question I would like to address is "How do asymptomatic and symptomatic individuals who decide to pursue genetic testing for Huntington Disease differ?" I need assistance with descriptive and comparative statistics.

Clinic attendees: Dandan Liu, Cass Johnson, Jackson Resser

Years: 2001-2022; Huntington = neurodegenerative

Population of interest: Tested for Huntington's disease (by ICD), initially pulled from pathology

- Within this, those symptomatic or asymptomatic

Symp/asymp: motor symptoms at initial visit or at test

Descriptive study with subgroup comparison

Dates: initial visit date, blood draw date, results disclosure date

Criteria for neurologist used to assess symptomatic/asymptomatic

406/415 have complete symp/asymp: need to really think about how to handle missingness for symp/asymp

Age = continuous variable; if normally distributed -> two sample t-test

If non-normal -> non-parametric Wilcoxon rank sum test (preferred because it makes fewer assumptions)

Chi-square test to compare categorical variables

Use test statistic and p-value to assess whether results are significant; report raw differences
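A small sketch of the Wilcoxon rank sum comparison for a continuous variable such as age, using the normal approximation with a tie correction and no continuity correction. The ages are made up and the function names are hypothetical; with small groups an exact test would be preferable.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Return (U statistic for x, z score, two-sided p-value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_value

# Made-up ages at testing for asymptomatic vs. symptomatic individuals
asymptomatic = [28, 31, 35, 40, 42]
symptomatic = [45, 47, 52, 58, 61]
u, z, p = wilcoxon_rank_sum(asymptomatic, symptomatic)
```

U divided by the product of the group sizes estimates the probability that a randomly chosen asymptomatic individual is older than a randomly chosen symptomatic one, which can be reported alongside the raw difference.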

Alvin Jeffery, Nursing

We have received a small foundation grant to build a clinical decision support tool evaluation system that we hope can randomize design elements (like a factorial research design) within an adaptive Bayesian analysis (where we can eliminate design features that we no longer need to evaluate). We have software developers who can build the front-end system, but we are looking for assistance with creating the conceptual analysis framework and helping to write the python (PyMC3) code to conduct the analysis.

Clinic attendees: Bryan Blette, Cass Johnson, Jackson Resser

Prior work: tested four risk formats for same underlying info (latin square randomization -- three scenarios)

SPECTACULAR

Phase 1: build your own adventure (pick which do you want to see)

Phase 2: keep chosen design from P1 on one screen, randomize pieces on the other

Can you merge Bayesian design with factorial framework?

Could embed rules such that a given participant, based on their info, is more likely to be randomized a certain way

6-8 factors, about 1000 total combinations
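To make the size of the design space concrete, a full factorial can be enumerated directly. The factor names and levels below are entirely hypothetical and chosen only so that seven factors give roughly 1000 combinations.

```python
from itertools import product
from math import prod

# Hypothetical design elements and levels (not from the project)
FACTOR_LEVELS = {
    "risk_format": ["number", "bar", "thermometer", "people_plot"],
    "color": ["none", "traffic_light"],
    "trend_shown": [False, True],
    "uncertainty_shown": [False, True],
    "time_horizon": ["6h", "12h", "24h"],
    "placement": ["top", "side", "popup"],
    "recommendation_text": ["none", "brief", "detailed", "detailed_with_link"],
}

def full_factorial(levels):
    """Every combination of one level per factor, as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[name] for name in names))]

combinations = full_factorial(FACTOR_LEVELS)
expected_count = prod(len(v) for v in FACTOR_LEVELS.values())
print(len(combinations), expected_count)
```

Listing the cells this way makes it easy to see how few participants each combination would receive, which is the motivation for dropping factors or using a fraction of the design.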

Proposed: drop conditions after person has completed one hour of data collection

Aim: formalize framework then operationalize Bayesian-adaptive design

VICTR voucher could be a good fit but Bryan thinks 90 hours might not be enough

- voucher might get desired deliverable

Idea for paper: simulations & power calculations

- Would help to assess feasibility and whether you want to drop factors
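A simulation along these lines could start from something like the sketch below. It assumes a binary outcome per display (for example, whether the desired action was chosen), independent Beta(1, 1) priors per arm, and a rule that drops any arm whose posterior probability of being best falls below a threshold. All of these choices, and the names used, are assumptions for illustration rather than the project's design; the PyMC3 model mentioned in the request is not shown.

```python
import random

def prob_best(successes, failures, active, draws, rng):
    """Monte Carlo estimate of P(arm has the highest rate | data)."""
    wins = {arm: 0 for arm in active}
    for _ in range(draws):
        sampled = {arm: rng.betavariate(1 + successes[arm], 1 + failures[arm])
                   for arm in active}
        wins[max(sampled, key=sampled.get)] += 1
    return {arm: wins[arm] / draws for arm in active}

def simulate_trial(true_rates, batches=10, per_arm=20,
                   drop_below=0.02, draws=2000, seed=0):
    """Run one simulated trial; return surviving arms and outcome counts."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    active = list(range(n_arms))
    for _batch in range(batches):
        # Collect one batch of binary responses on every arm still active
        for arm in active:
            for _obs in range(per_arm):
                if rng.random() < true_rates[arm]:
                    successes[arm] += 1
                else:
                    failures[arm] += 1
        # Interim look: drop arms unlikely to be the best
        pb = prob_best(successes, failures, active, draws, rng)
        survivors = [arm for arm in active if pb[arm] >= drop_below]
        active = survivors or [max(active, key=pb.get)]
        if len(active) == 1:
            break
    return active, successes, failures

# Made-up true response rates for three design combinations
active, successes, failures = simulate_trial([0.2, 0.25, 0.9])
```

Repeating `simulate_trial` over many seeds and plausible rate patterns would give the operating characteristics (how often the best arm survives, how many observations are spent on poor arms) needed for the feasibility question.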

Topic revision: r1084 - 22 Aug 2024, JacksonResser


Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.