MondayClinicNotes2014 (15 Jan 2021, DalePlummer)

- Frank's note: Design is confounded with time/fatigue/learning. Also there is little precedent for doing a pre-post study with such little time between pre and post. I think you will need to do a randomized study to attribute any effect to the intervention. Randomize 1/2 of families to get the intervention, 1/2 to get the prevailing treatment, and give survey at the "after" time point for both groups.

- TB clinic in Nashville, 203 cases (information on case only) in year 2013.
- Treatment: completed vs not completed (refused, lost to follow up, etc.)
- Research question: 1. treatment completion rate. 2. the association between patient's characteristics and treatment completion.
- Prepare data set as http://biostat.mc.vanderbilt.edu/wiki/Main/DataTransmissionProcedures
- Is there an association between country of origin and acceptance of treatment (accepted vs. not accepted)?
- Apply logistic regression analysis (dichotomous or binary response variable) and include the variables of interest.
- General rule of thumb: the size of the smaller outcome group divided by 10 helps you assess the power of the regression, i.e. roughly how many variables you can include in the regression model.
- N=48 refused; this will be the limiting sample size in the regression analysis.
- Country of origin is the main factor; will have to think of the best way of grouping it.
- The covariates of interest: age as continuous non-linear; gender, marital status (married vs. non-married), and country of origin.
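The rule of thumb above can be sketched as simple arithmetic. This is only an illustration: it assumes the 48 refusals are the smaller of the two outcome groups among the 203 cases, and the function name and the divisor of 10 as a default are choices made for this sketch.

```python
def max_candidate_parameters(n_event, n_nonevent, per_parameter=10):
    """Rule-of-thumb ceiling on the number of regression parameters
    for a binary outcome: smaller outcome group divided by 10."""
    limiting_n = min(n_event, n_nonevent)
    return limiting_n // per_parameter

total_cases = 203   # cases in 2013, from the notes
refused = 48        # limiting group, from the notes
budget = max_candidate_parameters(refused, total_cases - refused)
print(budget)
```

Note that a non-linear term for age and a multi-level grouping of country of origin each use more than one parameter from this budget.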

- Total number of instruments used per tray (25-100), usually less than 50% is used.
- Will compare unnecessary cost across specialties
- Will apply for VICTR voucher. A standard $2000 is appropriate.

- Patients who underwent liver transplant and had a plastic stent to treat a leak; about 20-30% needed a metal stent later
- Want to predict early whether a patient needs a metal stent or not so the patient does not need to suffer pain
- The current data only give the conditional need for a metal stent given that the patient already had a plastic one
- Suggest doing descriptive statistics and planning a bigger study to develop a prediction model
- Use R for internal validation and calibration using bootstrapping method (rms package)

- Want to know the relationship between Cortisone treatment and bacterial change.
- Each subject will be his own control: cortisone on one arm and no cortisone on the other. Each arm will be tested at two sites, one normal skin and one tape-stripped skin. Observe bacterial change. Therefore, each subject will have 4 tested samples and each sample will be measured twice (total 8 per person)
- Look at treatment effect on normal skin. Suggest amount of $2000.

- There are limitations of pre post design. Many factors will affect the outcome besides the intervention like time.
- Box plot with raw data to explore the distribution
- Can use Wilcoxon signed rank test to compare continuous outcomes before and after
- Consider ANCOVA (Analysis of Covariance) to analyze the post measurement while adjusting for prespecified covariates like previous experience
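As a rough illustration of the Wilcoxon signed rank test mentioned above, the following sketch computes the signed-rank statistic with a normal approximation. The scores are hypothetical, the function name is made up for this example, and the approximation (no tie or continuity correction) is crude for small samples, where exact tables or statistical software would be preferable.

```python
import math
from statistics import NormalDist

def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, z, two-sided p) using the normal approximation.

    Zero differences are dropped; tied absolute differences get average ranks.
    No tie or continuity correction is applied to the variance.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p

# Hypothetical pre/post scores for 8 participants
pre = [10, 12, 9, 14, 11, 13, 8, 15]
post = [13, 14, 10, 19, 15, 19, 15, 23]
w_plus, z, p = wilcoxon_signed_rank(pre, post)
print(w_plus, round(z, 2), round(p, 3))
```

Even a small p-value here only says that post differs from pre; as noted above, it cannot separate the intervention from time or other factors.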

- Retrospective study of post renal transplant patients. Follow those patients for two years to observe a rare event.
- Describe user characteristics. Some pts took the medication for the entire 6 months, some stopped before 6 months for certain reasons, some retook it later. Can consider using a certain amount of time on medication to define a user.
- Binary/categorical variables can be described as frequency and percentage

- Logistic regression model with robust standard error is appropriate.

- Could include baseline RSA if collinearity is not an issue.

- Generalized Linear Regression with Negative Binomial Distribution is good.

- Probably.

- Not a real outlier

- Take into account the correlation within each subject.
- Might have carry-over effects between different periods. Could test for equal carry-over effects.

- Outreach for engineering education
- looking at data from engineering camp for girls (looking for changes in self-efficacy)
- self-efficacy: feeling that you can accomplish something in your life (this scale has been validated)
- some girls have participated for one year, some have participated for about three years

- also interested in differences in pre-post scores for the one year attended by student

- descriptive statistics: consider summary statistics across different categories (i.e. pre and post scores by different school types, year of study, grades)
- Consider repeated measures type analysis (longitudinal data analysis) for assessment of slope over time (year) of self-efficacy variable
- Per question of interest - may need to reformat data to “long style or vertical format” (i.e. have row 1 id=1: 2012 post self-efficacy score, row 2 id=1: 2013 post, row 3 id=1 2014 post self-efficacy score, etc. for each of the girls)
- adjust for age, school type (consider the role of additional potential confounders)
- Consider applying for VICTR funding for assistance in repeated measures type of analysis.
- Need to account for the correlated nature of data and verification of assumptions (such as in Mixed effects modeling or generalized least squares)
- Account for the missing data
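The reshaping to "long" format described above might look like the following sketch. The column names (`post_2012` and so on), the scores, and the helper name `to_long` are hypothetical and chosen only for illustration; a missing score is kept as `None` so that missingness stays visible.

```python
# Hypothetical wide-format records: one row per girl, one column per year.
wide = [
    {"id": 1, "post_2012": 3.2, "post_2013": 3.6, "post_2014": 4.0},
    {"id": 2, "post_2012": 2.9, "post_2013": None, "post_2014": 3.5},
]

def to_long(rows, prefix="post_"):
    """Return one record per (id, year); missing scores are kept as None."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                long_rows.append(
                    {"id": row["id"], "year": int(key[len(prefix):]), "score": value}
                )
    return long_rows

long_data = to_long(wide)
for record in long_data:
    print(record)
```

Each girl now contributes one row per year, which is the layout that mixed effects or generalized least squares software typically expects.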

- Limitation: lack of control group (there is no way to conclude that the program is the only thing that is improving self efficacy)

- pre vs. post score for any given year
- consider doing boxplots for each of the pre and post scores for each year (these can serve as your summary statistics)
- Univariate analysis: Wilcoxon Signed rank test to see if there is a difference between the distributions of pre and post scores (data in horizontal format works)
- cautioned against combining the pre-scores over the three years, and post scores over the three years (year may be a confounder and impact trend of the data)
- pre vs. post study (may see a difference, however no guarantee that improvement is from the program; not a randomized controlled trial)
- Motivated and selected group of girls and may have higher self-efficacy baseline score (pre) - consider comparing self-efficacy scores with those reported in other studies among girls.

- Two primary endpoints; the one with the greatest variability (glucose infusion rate needed to maintain desired blood glucose level) will be the conservative one to plan for
- 10 with type 1 DM, 10 without
- Other covariates: age, insulin required to maintain blood glucose, HbA1c
- Baseline liver glycogen assessment
- Start with a 3 hour fructose infusion to stimulate liver glucose uptake vs. saline infusion (randomized), then insulin infusion, then a 2 hour period during which subjects become hypoglycemic (using clamp)
- Need a good estimate of the standard deviation across patients for infusion rate - use the dog data, taking all relevant time periods and stratifying by liver glycogen to compute 12 SDs; then we can compute a pooled SD by averaging the variances and taking the square root
- Need clinically relevant difference (in mean infusion rates) not to miss: estimate 1 mg/Kg/min
- Language for grant application something like: The power calculation was based on a 2-sample t-test without covariate adjustment for HbA1c, age, etc. The actual statistical test will be ANCOVA adjusting for these factors, which will increase the actual power a bit (increase would be more had the sample size been larger; the sample size chosen has a penalty for estimating the effects of the baseline covariates).
- Last aim: the most general way to assess this is to fit a smooth function of time to the longitudinal (serial) measurements, separately for each of the two groups, and test for differences in the shape of the two curves. A convenient choice is to fit a quadratic function of time to each curve. This increases power over individual time point tests. Suggested statistical method: generalized least squares or mixed effects linear model.
- Suggested contacting Li Wang to tell her that a VICTR voucher is in the works
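The SD pooling and the two-sample power calculation discussed above can be sketched as follows. The stratum SDs are hypothetical placeholders (the real ones would be the 12 SDs from the dog data), the function names are made up for this example, and the power uses a normal approximation rather than the t distribution, so it is slightly optimistic for 10 per group.

```python
import math
from statistics import NormalDist

def pooled_sd(sds):
    """Average the variances, then take the square root."""
    return math.sqrt(sum(s ** 2 for s in sds) / len(sds))

def approx_power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided two-sample comparison of means."""
    norm = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(delta / se - z_crit)

# Hypothetical stratum SDs in mg/kg/min; not taken from any real data.
stratum_sds = [0.6, 0.8, 0.7, 0.9]
sd = pooled_sd(stratum_sds)

# Difference not to miss (1 mg/kg/min) and group size (10) are from the notes.
power = approx_power_two_sample(delta=1.0, sd=sd, n_per_group=10)
print(round(sd, 3), round(power, 2))
```

Averaging variances with equal weights assumes each stratum SD rests on a similar number of observations; otherwise a weighted average by degrees of freedom would be the usual choice.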

- Want to assess the impact of the implementation of an ASP on antibiotics use.
- Monthly antimicrobial (AM) use in days from 2009 to April 2012. Data from many hospitals including Vanderbilt. Want to compare Vanderbilt to ALLCHA.
- ASP intervention started 2012 March at Vanderbilt. Can see less use of AMs after intervention.
- The comparison of pre and post might be biased by other factors like time not just by intervention. Institution effect is hard to assess since all institutions started intervention at different times.
- Also needs to adjust for other factors like date for seasonal effect.
- Linear model of VCG ~ intervention + rcs(time)
- Better to have individual data for all the hospitals that had both pre and post data to assess the intervention effect using a mixed-effects model, or just compare between hospitals using data after the intervention to see whether Vanderbilt does better than others
- Consider computing Vanderbilt's rank among all CHA hospitals

- wants to do a pilot study to get preliminary results for a grant submission.
- requesting data from Southern Community Cohort Study (SCCS). Needs power analysis and statistical plan for the data request.
- applying for VICTR biostats support for funding for this prelim project. Needs estimate.
- about 3200 men enrolled. max follow-up 10 years. about half finished the whole study period.
- Prediction of screening frequency by baseline characteristics. Association between prostate cancer stage and frequency of screening.
- all patient self-reported data, at 5 year and 10 year. (have you had screening within the last year?)
- GEE model of screening frequency (recent screening yes/no at 5 year, 10 year) on age, race, interaction between age and race, ...
- Ordinal logistic regression model of prostate cancer stage/grade on screening frequency (needs to be carefully defined) prior to diagnosis. Need to consider the different follow-up times of the patients.
- Contact Li Wang (li.wang@vanderbilt.edu) for budget estimate.

- How can I determine the required sample size (i.e. number of subjects or raters) for interval estimation of the Kappa statistic for an intraobserver and interobserver study with multiple raters? Our number of subjects is currently 20 (N=20) and our current number of raters is 27 (n=27). Further, we are hoping the given sample size will give at least 80% power at the 0.05 level of significance (two-sided).
- In R: library(kappaSize)

- Email: I am submitting an early career grant for a starter type project due August 1 and needed help with performing and writing up power/sample size calculations.
- Specific Aim #1: identify group of lupus patients of about 1135. Lupus nephritis patients of about 400. Nephritis is severity indicator.
- Specific Aim #2: Determine the association between ED use and meeting standards of quality of care in management of SLE and in the treatment of SLE nephritis, as defined by the Quality Indicator Set for SLE. For aims #2, I would likely be performing Chi squared tests comparing 3 groups (non, occasional, and frequent ER users) for most of those sub-aims.
- Specific Aim #3: Determine the association between ED use and corticosteroid use in SLE and SLE nephritis. For aim #3, I would likely be using multiple linear regression.
- For binary outcomes, use logistic regression with adjustment of other confounders.
- Ratio will be treated as continuous variable and will be analyzed using general linear model.
- Hypothesis: more ED use will be associated with higher steroid dose. Will analyze current steroid dose and number of ED visits in the past 12 months. Steroid dose will be an ordered categorical variable with 4 levels. Can use Chi-square test. Proportional odds model can be used to adjust for other confounders.
- Grant due Aug 1st, need to be done July 21st.

- Survey on quality of life (N=1000). There are 7 GOSE questions about health states (0-100). Can describe the distribution for each GOSE. Predictors include gender, age, and years of education.
- Want to compare between GOSE scores. Multiple comparison issues (21 comparisons).
- Can use mixed-effects model taking into account within-subject correlation.
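The count of 21 comparisons follows from taking all pairs among the 7 GOSE questions. A small sketch of that arithmetic is below; the Bonferroni threshold is an assumption added here only to show the scale of the multiplicity problem and is not something the notes recommend.

```python
from itertools import combinations

n_questions = 7
pairs = list(combinations(range(1, n_questions + 1), 2))
n_comparisons = len(pairs)

# Bonferroni is one simple (conservative) adjustment, shown as an assumption.
bonferroni_alpha = 0.05 / n_comparisons
print(n_comparisons, round(bonferroni_alpha, 4))
```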

- N=36 patients who had CLL transplant with two types (8 vs. 27). Want to compare survival between two groups.
- Time from transplant to death or relapse. Sample size is limited. Mainly descriptive. Want to write manuscript.
- Can apply for voucher of $4000.

- I am fourth year medical student doing a project for dermatology. We are doing a meta-analysis of pediatric vitiligo patients to assess which populations need thyroid studies performed. I have a spreadsheet of the data. I need help analyzing it.
- Research question: the percentage of thyroid abnormalities in pediatric vitiligo patients.
- Only have aggregated data. Could have an overall estimate of percentage. Also could explore the variability between studies.
- Apply for a $2000 Voucher.

- One-year prospective study. Will record the numbers of surgeries in Ethiopia (an African country) and the number of perioperative mortalities.
- Sample size calculation to reach a desirable precision of mortality rate estimate.
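A minimal sketch of a precision-based sample size calculation for a proportion is given below. The anticipated mortality and the target margin are hypothetical planning values, the function name is made up, and the formula is the simple Wald approximation, which can be poor for rare events (a Wilson or exact interval may be preferable).

```python
import math
from statistics import NormalDist

def n_for_margin(p, margin, conf_level=0.95):
    """Approximate n so that a Wald interval for p has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z ** 2 * p * (1 - p) / margin ** 2

# Hypothetical planning values: 2% anticipated mortality,
# margin of error of 0.5 percentage points.
n_needed = math.ceil(n_for_margin(p=0.02, margin=0.005))
print(n_needed)
```

Here the sample size is the number of surgeries recorded, so the result indicates whether one year of data collection is likely to reach the desired precision.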

- we want to find out if the IRLS estimation algorithm is reversible -- e.g., given only the Fisher information matrix and scoring function (and \beta coefficients), can we go back to the original Y or X matrices
- Context is confidentiality with data coming from multiple sites, with each site's data maintained independently, and controlled
- How to do model diagnostics without residuals?
- Does the distributed computing model lead to good statistical modeling practice? E.g.: covariate transformations, Y transformation, normality of residuals [could compute residual vector separately by center and share an ECDF of the residuals]
- How often are practitioners of distributed statistical analysis assuming linearity of covariate effects? Being careful about transforming Y or modeling Y robustly?
- Can't reverse the process to solve for an individual's datum if model is full rank, n > p, no parameter is devoted to only one subject, residual vector is secret
- If a single parameter is devoted to 5 subjects at one site, may possibly be able to solve for a summary statistic for the 5 (e.g., race has 4 levels and one of the levels only applies to 5 subjects at a site)

- May be able to discern that one site has an overall better level of Y than another site
- Not able to get a robust sandwich covariance matrix estimator if residual vector is not provided; sandwich estimation requires U matrix not just U vector
- Even if residuals are available, it may not be possible to work backwards to an individual from a given site because estimates come from a global beta vector over all sites
- We seldom use OLS with health care data; the need for weighted X'X (X'VX) instead of X'X as used in OLS makes the identification problem more difficult in general, because V is a function of the current beta estimate (for all sites combined)
- Worthwhile working out the special case where Y is binary and there is a single X that is binary or polytomous, and there is no special knowledge (e.g., k subjects are of type x and all have the same Y)
- Worth taking another look at data squashing

- Metabolic flux analysis
- Rate of metabolite turnover
- Which metabolic phenotypes are produced in high titre-achieving production processes
- Protein therapeutics; cost of production
- 14 conditions (cell lines); correlations between fluxes (80 reactions- flux, mass spec); looking for up-regulation
- 80 Spearman rank correlations x 14; each correlation 10 observations (clones)
- Two controls; secondary controls
- Independent experimental units: clones, manipulations of cell lines
- See if a unified model would be a better approach than pairwise analysis
- Must be able to precisely estimate a quantity such as a correlation coefficient in order to be reliable in picking "winners" across reactions
- Low precision (low number of independent experimental units) implies low probability of selecting the optimum reaction/condition
- Dimensionality is high enough that an "omics" method may be needed
- Recommend continuing discussion at a Tuesday or Friday clinic
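To give a feel for how imprecise a correlation based on 10 clones is, the sketch below computes an approximate confidence interval with Fisher's z transformation. The observed correlation of 0.5 is hypothetical, the function name is made up, and the formula is the standard one for Pearson correlations, used here only as a rough guide for Spearman correlations.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, conf_level=0.95):
    """Approximate CI for a correlation via Fisher's z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# n = 10 observations per correlation is from the notes; r = 0.5 is hypothetical.
low, high = correlation_ci(r=0.5, n=10)
print(round(low, 2), round(high, 2))
```

An interval that spans from slightly negative to strongly positive shows why ranking 80 reactions on such estimates is unlikely to pick the true "winners" reliably.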

- My project involves survey data of 220 Spanish and Arabic-speaking patients in the Center for Women's Health. I've completed all of the descriptive statistics but need help with the correlations. For example, I know from having surveyed patients myself that those patients who reported speaking "Arabic only" at home were more likely to self-report speaking English "not very well", but I don't know how to express this statistically.
- To test association between two variables A and B,
- If A is a continuous variable and B is a categorical variable, use the Kruskal-Wallis test (or the Wilcoxon rank-sum test when B has two levels)
- If A and B are both categorical variables, use the chi-square test
- If A is an ordinal variable and B is binary, use the chi-square trend test
- If A and B are both continuous variables, use Spearman's correlation coefficient.
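For the example in the question (language spoken at home vs. self-reported English ability, both categorical), a chi-square test on a 2x2 table could be sketched as follows. The counts are hypothetical and only chosen to sum to 220; the function name is made up, and the p-value formula applies to 1 degree of freedom only.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, col in enumerate(col_totals):
            expected = r * col / n
            stat += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2))  # chi-square survival function for 1 df
    return stat, p

# Hypothetical counts. Rows: "Arabic only" at home, all others.
# Columns: English "not very well", English better than that.
table = [[30, 10], [60, 120]]
stat, p = chi_square_2x2(table)
print(round(stat, 2), p)
```

Reporting the two row percentages (here 75% vs. 33% "not very well") alongside the test is usually more informative than the p-value alone.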

- What are major factors of degradation? Pulling apart mechanisms.
- Clinical target: liver tumors/biopsy; visualize needle
- What is the best study design?
- Ask trained readers to assess utility of image
- Discussed hypothesis testing vs estimation study
- One estimand could be the mean absolute number of levels different
- Can relate an ordinal measure to quantitative measures of image quality
- Can estimate # patients needed if have a reliable estimate of the standard deviation of an absolute difference of interest
- May consider progressively ruining an image to see when it becomes uninterpretable
- One goal is to develop a model to predict expert's quality rating from multiple quantitative physics-based measures
- May consider an ordinal response model / multinomial model

- can't arrive before 1pm on Wednesdays, so attending Monday clinic
- "I am going to perform an email survey of surgical residents (approx 5500 in the US) and wanted to know what you think an appropriate response rate would be and the best method to do statistical analysis (rough draft of survey attached). Or should the questions be revised to facilitate a better statistical analyisis?"
- make the variable as continuous as possible using sliding bar

- grant proposal relating to the development of new diagnostic technologies for neglected tropical diseases

- Survey on two cohorts, VA-based cohort and university-based cohort.
- Outcome: global physical and mental health score. Pain is part of global score, and also a barrier to level of reintegration success. Could calculate a global score without pain. Could examine how pain correlates with reintegration and outcome.
- A specific question (meaning of life) appears in two standardized questionnaires. Could include both in the model predicting outcome.

- Cortisol measures 3 per day
- % of increase because times not noted accurately
- Need Bland-Altman plot to check proper transformation: post - pre vs. (post + pre)/2 or log(post) - log(pre) vs. geometric mean of pre and post
- want the transformation that makes the graph flat and random
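The two candidate Bland-Altman displays described above can be sketched by computing the plotted points directly. The cortisol values are hypothetical and constructed so that post is always 1.5 times pre; the function name is made up for this example.

```python
import math

def bland_altman_points(pre, post, log_scale=False):
    """Return (x, y) pairs: average vs. difference, on the raw or log scale."""
    points = []
    for a, b in zip(pre, post):
        if log_scale:
            x = math.sqrt(a * b)            # geometric mean of pre and post
            y = math.log(b) - math.log(a)   # log ratio
        else:
            x = (a + b) / 2
            y = b - a
        points.append((x, y))
    return points

# Hypothetical cortisol values where post is always 1.5 times pre
pre = [2.0, 4.0, 8.0, 16.0]
post = [3.0, 6.0, 12.0, 24.0]
raw = bland_altman_points(pre, post)
logged = bland_altman_points(pre, post, log_scale=True)
print([y for _, y in raw])
print([round(y, 3) for _, y in logged])
```

In this constructed case the raw differences grow with the mean while the log differences are constant, so the log scale would be the one that makes the graph flat.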

- 1/2 of families received a service dog after 3 weeks
- Suggest longitudinal analysis using 3 daily x 15 weeks, allowing for correlation; only one day per week
- Correlation structure based on approximate time of measurements in days + fraction of day
- Model smooth time trend, allowing for separate trend in those randomized to service dog; check for shape change between two groups
- Easiest-to-interpret method generalized least squares with AR1 continuous-time correlation structure
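A continuous-time AR1 correlation structure makes the correlation between two measurements depend only on the time between them. The sketch below builds such a matrix from measurement times in days plus fraction of day; the sampling times, the value of rho, and the function name are assumptions for illustration only.

```python
def car1_correlation(times, rho):
    """Continuous-time AR(1): corr(i, j) = rho ** |t_i - t_j|, with t in days."""
    return [[rho ** abs(ti - tj) for tj in times] for ti in times]

# Hypothetical sampling times: three samples on day 0, then three on day 7
times = [0.30, 0.55, 0.85, 7.30, 7.55, 7.85]
rho = 0.8  # assumed correlation between measurements taken one day apart
corr = car1_correlation(times, rho)
for row in corr:
    print([round(value, 3) for value in row])
```

Samples taken on the same day end up highly correlated, while samples a week apart are only weakly correlated, which matches the "3 daily, one day per week" layout described above.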

- ECMO: what predicts survival to hospital discharge; initiated by cardiac surgeons
- Collecting patients from last 2 years (N=60 so far)
- Discussed margin of error of 0.1 in estimating a single probability with n=96
- Alternate endpoints: LOS, censor on death, i.e. Y=time to successful discharge
- Or: ordinal outcome Y=1, 2, 3, ... longest LOS, dead = longest LOS + 1; effective sample size almost equal to # subjects
- Also have Glasgow coma scale at discharge; could factor into ordinal outcome
- May be possible to use a complex high-information scale to derive a severity of illness-based score that is then used to predict mortality
- Has reduced many variables to one
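The ordinal outcome suggested above (length of stay for survivors, with death ranked worse than the longest stay) could be coded as in the sketch below. The patient records and field names are hypothetical, and taking "longest LOS" to mean the longest stay among survivors is an assumption of this sketch.

```python
def ordinal_outcome(patients):
    """Y = length of stay for survivors; deaths get (longest survivor LOS + 1)."""
    longest = max(p["los_days"] for p in patients if not p["died"])
    return [longest + 1 if p["died"] else p["los_days"] for p in patients]

# Hypothetical patients
patients = [
    {"los_days": 12, "died": False},
    {"los_days": 5, "died": True},
    {"los_days": 30, "died": False},
    {"los_days": 41, "died": True},
]
y = ordinal_outcome(patients)
print(y)
```

Only the ordering of Y matters for an ordinal model, so any value larger than the longest survivor stay would serve equally well for the deaths.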

- What to do with patients who died before ECMO was available?

- CTE - Chronic Traumatic Encephalopathy caused by multiple concussions. Survey is designed to ask questions about awareness of CTE among parents of young athletes (junior high and high school). The plan is to distribute the survey using Vanderbilt connections with local high schools.
- Recommendations:
- Maximize response rate (by giving parents incentives of some sort)
- Ensure that the survey is brief
- Make sure the responses are anonymous
- Use numbers instead of categories
- Simplify the language
- Branch questions
- Incorporate visual analog scale (instead of categories)
- Order questions in a logical way

Topic revision: r1 - 15 Jan 2021, DalePlummer

Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
