Recommendations, Analyses, and Data for Health Services Research, Diagnosis, and Prognosis Clinic
Notes 2017


Claire Lo (medical student)

I have data and preliminary analyses from a repeated-measures longitudinal study assessing the impact of variable exercise intensity and volume on inflammatory markers (hs-CRP, IL-6, epinephrine). I am trying to fit generalized linear models (GLMs) to the data, but I'm not sure where to start (or whether GLMs are the most appropriate models for this data set).


Sarah Diehl, Hearing and speech sciences (doctoral student)

The speech perceptual characteristics of people with dysarthria due to chorea can vary tremendously (Darley, Aronson, & Brown, 1969a). The current study aims to identify distinct clusters of speech perceptual characteristics within a group of 51 speakers with dysarthria resulting from Huntington's disease (HD). All speakers will be within a mild speech severity range.

Raters (4 graduate students so far, with 6 more to be recruited) complete a speech perceptual characteristics checklist for each person with HD. The checklist contains 38 items grouped into 7 dimensions: pitch characteristics (items 1-4), loudness (5-9), vocal quality (10-18), respiration (19-21), prosody (22-31), articulation (32-36), and general impression (37-38). The general impression dimension also includes an estimated percent intelligibility (without formal calculation) and the graduate students' proposed dysarthria type. Each checklist item is rated individually on an ordinal scale from 1 (normal) to 7 (very severe).

The following research questions will be addressed in this study:

  • What are the speech perceptual characteristics consistent with diagnosis of HD and how do they compare to previous literature on hyperkinetic dysarthria?
  • Are there distinct clusters of speech perceptual characteristics within speakers with mild dysarthria due to HD?
  • If distinct clusters of speech characteristics exist within speakers with HD, do the individuals who belong to the same cluster also share other disease- or treatment-related features (e.g., type of medications, number of CAG repeats, and disease duration)?

Our questions for the meeting focus primarily on the cluster analysis; however, we may bring additional questions at that time. We plan to present preliminary results at a conference in mid-November and will bring the full data for the first 4 raters.

  • Do not do statistical tests comparing groups on the 38 items that went into the cluster analysis: the clusters were formed to differ on these items, so such tests are circular. It is reasonable to test variables that did not go into the cluster analysis, such as medication use.
  • To show group differences visually, consider plotting the first two principal components using colors for the different clusters, and/or making a parallel coordinates plot
  • With so few patients relative to the number of items, the clusters are likely to be unstable, although if the plots show large separation between groups, this may be less of a concern
  • Consider trying a few different clustering methods to see whether they all suggest the same four clusters
  • Consider sparse principal components analysis, and either cluster on some or all of the principal components, or use the PCA results to help you decide which variables to cluster on
  • For a manuscript, present intra- and inter-rater reliability
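To check whether different clustering methods "suggest the same four clusters," one option is the adjusted Rand index (ARI), which measures agreement between two partitions of the same speakers regardless of how the clusters are numbered (1 = identical partitions, about 0 = chance-level agreement). Below is a minimal pure-Python sketch; the function name and the label vectors are hypothetical, and in practice the labels would come from whichever methods are tried (e.g., k-means vs. hierarchical clustering). Established implementations include sklearn.metrics.adjusted_rand_score in Python and adjustedRandIndex in the R package mclust.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same speakers.
    1.0 = same partition (cluster numbers may differ); about 0 = chance level."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same speakers")
    n = len(labels_a)
    # pairs of speakers placed together by both methods / by each method
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = same_a * same_b / comb(n, 2)
    maximum = (same_a + same_b) / 2
    if maximum == expected:  # degenerate case, e.g. everyone in one cluster
        return 1.0
    return (same_both - expected) / (maximum - expected)

# Hypothetical cluster labels for 8 speakers from two methods
kmeans_labels = [1, 1, 2, 2, 3, 3, 4, 4]
hclust_labels = [2, 2, 1, 1, 3, 3, 4, 4]
print(adjusted_rand_index(kmeans_labels, hclust_labels))  # relabeled but identical -> 1.0
```

A low ARI between methods would be another sign that the clusters are unstable with 51 speakers and 38 items.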

Supisara Tintara, Nephrology (medical student)

We are studying tissue sodium levels in peritoneal dialysis patients compared with controls without kidney disease. My question is whether sodium levels in dialysis patients differ from those in controls. I would also like to know whether sodium levels differ by age, race, or gender among controls and dialysis patients.

  • Because some of the data have already been published, focus on two comparisons using Wilcoxon rank-sum tests: PD (peritoneal dialysis) vs. HD (hemodialysis) and PD vs. control
  • For comparisons involving race and gender, use descriptive plots rather than statistical tests (because the group sizes are very small)
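A minimal sketch of the Wilcoxon rank-sum test using only the Python standard library, with the large-sample normal approximation and a correction for ties. It is an illustration, not a replacement for a statistics package: with small groups, a package implementation offering an exact p-value (e.g., R's wilcox.test or scipy.stats.mannwhitneyu) is preferable. The function names are hypothetical, and no study data are shown.

```python
from collections import Counter
from math import erfc, sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected,
    no continuity correction). Returns (W, z, p); W is the rank sum of x."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    w = sum(average_ranks(combined)[:n1])
    mean_w = n1 * (n + 1) / 2
    # tie correction term: sum of (t^3 - t) over groups of tied values
    ties = sum(t ** 3 - t for t in Counter(combined).values())
    var_w = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_w == 0:  # every value identical
        return w, 0.0, 1.0
    z = (w - mean_w) / sqrt(var_w)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p
```

It would be run once for PD vs. HD and once for PD vs. control.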


Petrice Cogswell, Radiology (resident)

Analysis of data from a survey polling radiology program directors about their attitudes toward MD-PhD vs. non-PhD residents. I verified that the distribution of respondents (by program size and number of PhD residents) is representative of the polled group and would like further assistance with the statistical analysis.


Petrice Cogswell, Radiology (resident)

Survey of radiology program directors' attitudes toward MD-PhD residents and resident research. Responses were on a Likert scale: for "How do you view PhD residents vs. non-PhD residents?", scores of -2, -1, 0, 1, 2 represent much worse, worse, similar, better, and much better, across multiple areas. Question: is there a statistical test appropriate for this type of data?


  • First step: see how comparable the responders are to the non-responders (or the whole set of programs) in terms of # residents, # MD/PhD residents, NIH funding amount
  • From there we can talk about statistical testing. We will probably want to use a finite-population correction, since the whole population consists of only 63 programs
  • Regardless of comparability, descriptive statistics (10 out of 23... etc.) will still be interesting to report
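The finite-population correction shrinks the standard error when a sizable fraction of the population has responded. A minimal sketch for a descriptive proportion, assuming the respondents behave like a simple random sample of the 63 programs (which is what the comparability check is meant to support); the function name is hypothetical, and the 10-of-23 figures are the illustrative numbers from the note, not results.

```python
from math import sqrt

def proportion_with_fpc(successes, n, population_size, z=1.96):
    """Proportion, its standard error with a finite-population correction,
    and a Wald-type confidence interval clipped to [0, 1]."""
    if not 0 < n <= population_size:
        raise ValueError("need 0 < n <= population_size")
    p = successes / n
    se_infinite = sqrt(p * (1 - p) / n)  # usual SE, infinite population
    fpc = sqrt((population_size - n) / (population_size - 1)) if population_size > 1 else 0.0
    se = se_infinite * fpc  # shrinks to 0 as n approaches the population size
    return p, se, (max(0.0, p - z * se), min(1.0, p + z * se))

# Illustrative numbers only: 10 of 23 responding programs, population of 63
p, se, ci = proportion_with_fpc(10, 23, 63)
print(f"{p:.2f} (SE {se:.3f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```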

Yolanda McDonald, Human & Organizational Development/Peabody College (Faculty)

Project Title: An environmental justice review of drinking water quality in the United States, 2011-2015

Abstract: Despite the need for potable water for human life and EPA regulation of U.S. public water systems, there has not been a comprehensive study to quantify disparities in residential drinking water quality. This research systematically reviews county-level results for community water systems under the National Primary Drinking Water Regulations (2011-2015). This study uses an environmental justice framework to (1) elucidate whether compliance with legally enforceable drinking water quality standards differs by community race/ethnicity, socioeconomic status, and rural-urban classification and (2) determine whether communities with predominantly underrepresented groups are disproportionately burdened with repeat drinking water violations.

Data Sources and Variables:

Dependent Variable: Drinking water violations for arsenic, atrazine, chlorine, coliform (pre-TCR), coliform (TCR), combined uranium, di(2-ethylhexyl) adipate, di(2-ethylhexyl) phthalate, nitrates, nitrate-nitrite, lead and copper rule, radium, TTHM, TCE, haloacetic acids, and trichloroethane were downloaded from the Safe Drinking Water Information System (SDWIS) federal reporting services for the years 2011-2015 (N = 58,018). Of these violations, N = 30,981 were repeat violations. Violations and repeat violations were operationalized as dichotomous variables (0 = no violation, 1 = violation).

Explanatory Variables: Race/ethnicity and socioeconomic status variables were obtained from the U.S. Census American Community Survey 5-year estimates (2011-2015) and were operationalized as continuous variables measured as proportions. The rural-urban classification is based on the USDA's Rural Utilities Service (USDA RU) definition of rural and was operationalized as a dichotomous variable (0 = urban, 1 = rural).

Data structure: The database has one violation per row, and the Public Water System ID (PWSID) identifies the water system. A PWSID may appear more than once in the database; for example, a system could have multiple violations during a study year. The columns are violations, race/ethnicity, SES, and rural-urban classification.

Proposed Data Analysis Strategy: The unit of analysis is the county. Descriptive statistics were run to characterize the data. A correlation matrix was used to measure the magnitude and direction of association between water violations and the explanatory variables. Univariable and multivariable logistic regression analyses were used to determine the relationship between water violations and the explanatory variables. The variance inflation factor (VIF) was used to detect multicollinearity in the multivariable and interaction analyses. To detect confounding, each explanatory variable's unadjusted odds ratio was compared with its adjusted odds ratio to determine whether the odds ratio changed by 10% or more in either direction (Szklo and Nieto 2014). Multivariable logistic regression was then used to adjust for confounding (Pourhoseingholi, Baghestani and Vahedi 2012). The Pearson goodness-of-fit statistic was used to compare the observed values with the expected values. Covariates with a two-tailed likelihood ratio P value < 0.050 and a 95% confidence interval for the odds ratio that did not include 1.00 were considered statistically significant in the univariable and multivariable analyses.
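The 10% change-in-estimate rule for confounding can be written out directly. A minimal sketch, assuming the change is measured relative to the unadjusted (crude) odds ratio; some authors instead divide by the adjusted odds ratio or work on the log-odds scale, so the convention should be stated in the methods. The function names, variable names, and odds ratios are hypothetical.

```python
def percent_change_in_or(crude_or, adjusted_or):
    """Percent change from the unadjusted (crude) to the adjusted odds ratio."""
    return 100.0 * (adjusted_or - crude_or) / crude_or

def flag_confounders(odds_ratios, threshold=10.0):
    """odds_ratios maps variable name -> (crude OR, adjusted OR).
    Returns the names whose OR changed by >= threshold percent in either direction."""
    return [name for name, (crude, adjusted) in odds_ratios.items()
            if abs(percent_change_in_or(crude, adjusted)) >= threshold]

# Hypothetical odds ratios, for illustration only
example = {"percent_rural": (2.00, 1.70), "median_income": (0.80, 0.78)}
print(flag_confounders(example))  # ['percent_rural'] (a 15% change vs. 2.5%)
```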


Do we need to adjust for water systems, given that counties vary in the number of community water systems serving them? If so, which of these options do you recommend? a. Adjusting the variance estimates of the coefficients to account for clustering within counties, i.e., cluster-robust standard errors? b. Weighting counties by the population served by their community water systems? c. Stratifying by community water system size (number of people served): Small Level I ≤ 3,300; Small Level II 3,301-10,000; Medium 10,001-50,000; and Large ≥ 50,001?

Do you recommend that we use the Pearson goodness-of-fit statistic to compare the observed values with the expected values?

Do you recommend post-hoc analysis for logistic regression? If so, are there different post-hoc tests for interaction terms?


  • We are concerned because we don't know how many times each system was tested, and systems that are tested more often have more opportunities to record a violation. If this information cannot be obtained, one possibility is to simulate data to gauge how much testing frequency could affect the results
  • The overall project seems like a good fit for a VICTR voucher or short-term biostatistics support (its scope is too large for clinic). To inquire about short-term biostats support, email Yu Shyr, Chair. Another possibility might be working with a student (email Jeffrey Blume, director of graduate studies).
  • Keep in mind that some systems are absorbed into other systems over time.
  • Longitudinal data analysis won't be feasible without the complete testing data (we would need the non-violations in addition to the violations).
  • Next level (after other issues resolved): geospatial correlation (tricky, though, because of the upstream/downstream issue)


Kelly Schuering, Internal Medicine/Vanderbilt Familiar Faces (medical student), with Ed Vasilevskis (mentor; Department of Medicine, Division of General Internal Medicine and Public Health)

This study is looking at the prevalence of housing instability, risk factors for instability, and utilization of community resources among patients working with the Vanderbilt Familiar Faces program. Our research questions are as follows: Primary: Among patients with high health care utilization working with the Vanderbilt Familiar Faces staff, what is the prevalence of housing instability, potential future housing instability, and secure housing and what factors predict this? Secondary: What community resources are people using to help address housing instability and how would individuals describe their relationship with those resources? What predicts whether patients are connected to resources to assist with finding housing?

Data will be collected through a self-administered REDCap survey on an iPad while patients are in the hospital.

Our analysis plan is as below:

Housing stability (ordinal):
  • Pearson's chi-squared or Fisher's exact test, depending on n in each category
  • Ordinal regression (vs. multinomial?)
We plan to include the following variables in the regression based on literature review and experiences with similar populations: consistent income (binary), employment (binary), current substance abuse (binary), legal history (binary), and current/recent intimate partner violence (binary)

Resource usage (binary):
  • Pearson's chi-squared or Fisher's exact test, depending on n in each category
  • Logistic regression
Based primarily on our experiences, as there is no literature in this area, we plan to include having an outside case manager/social worker (binary), having a regular monthly income (binary), history of drug use (binary), and current housing stability status (categorical) in this logistic regression.

Since our multivariable analysis will not be able to account for every potential confounder, we will also conduct a sensitivity analysis to determine how strong an additional unmeasured confounder would have to be to change the observed relationship.

Finally, we will also do a subanalysis of the pre-identified VFF patients compared with those who were assigned to the VFF team due to risk and bed space. This binary variable could also be included in the multivariable analyses.

I was hoping to get feedback on the above analysis plan and input on how many variables can realistically be included in the regressions if the estimated sample size is 200. I am also hoping to get an estimate for how much time we would need to purchase from biostats in order to get the above analysis completed.


For the binary outcome, the best-case scenario would involve 100 patients per outcome group, in which case it would be reasonable to adjust for 5 covariates in the regression model. Pre-specifying the covariates without looking at the data would preserve the Type I error rate, but with an exploratory analysis like this, that might not be your highest priority. If you are planning to present the analysis as exploratory and don't need to prespecify the model, a good starting place would be to look at plots and descriptive statistics for all variables by outcome group, and then to look at a scatterplot/correlation matrix and also to make a variable-clustering plot to get a sense of whether some variables can be used to "represent" others.

An ordinal logistic regression would be appropriate for the ordinal outcome, but depending on the sizes in the groups, you may need to collapse two of the outcome groups.

Possibilities for longer-term help: VICTR voucher and/or contacting Dr. Yu Shyr, Chair, to see whether short-term help is available.


Christian Okitondo, Psychiatry (Staff)

Topic: Increased tendency for proximal proprioceptive errors in limb bisection for individuals with autism spectrum disorder is not mitigated by tool use.

Previous studies involving tool use tasks have shown that typically developing (TD) individuals commit distal errors in limb bisection after using tools, presumably due to perceptual extension of the peri-hand space. Given that individuals with ASD are less susceptible to visual override of veridical proprioceptive information in other proprioceptive paradigms, we hypothesized that individuals with ASD would not demonstrate these distal errors after tool use.

Questions I would like to address: How do I incorporate repeated measures into my ANOVA? For each subject, I have pre-training and post-training means. How do I explain repeated-measures ANOVA to an audience with no statistical background?

Ricky Shinall, Surgery (Assistant Professor)

I have a dataset of about 350 responses to a quality of life instrument that has not been previously validated. I would like an estimate of the biostatistical effort needed to analyze the data for consistency and validity in order to obtain a VICTR voucher.


Devika Nair, Nephrology (Postdoc)

Attending this clinic is part of a requirement for my Biostatistics I class that I am attending for my MSCI, but I do have a question related to one of my projects.

I'm interested in exploring the coping behaviors of African American patients with advanced, non-dialysis-dependent CKD. Based on what is available in the literature (which is limited), minority patients in general use religious coping to deal with the stresses of chronic illness. African American patients in particular seem to use denial/avoidant coping mechanisms. I believe that these coping mechanisms could in part explain why many of these patients disengage and disappear when the need for dialysis is mentioned. I believe that these behaviors are related more to cultural differences than to socioeconomic status or educational level.

If I am trying to illustrate a causal mechanism for why AA patients with advanced CKD disappear when dialysis is mentioned (independent of their SES/educational status), would the best study design be to compare AA patients of both low and high SES, or to compare AA patients with low SES with patients of other races with low SES?

  • You are welcome to come back to clinic, but as a member of the Nephrology division you are also welcome to work directly with Thomas Stewart
  • Recruit patients across a range of SES levels; you will probably want to limit the study to patients who are either AA or white, because numbers in other groups are likely to be low

Baldeep Pabla, GI (Fellow)

Also attending clinic as part of a requirement for Biostatistics I class for MSCI; particular project involves looking at a predefined set of SNPs in patients with and without GI cancer or metaplasia. Current literature suggests that environmental factors may play a greater role than genetics in the development of these conditions.

  • You are welcome to come back to clinic, but as a member of the Gastroenterology division you may be able to work directly with Chris Slaughter
  • Identifying appropriate controls for this study will be tricky


Shaina Willen, Clinical Fellow (Pediatric Hematology/Oncology)

I am preparing a VICTR proposal to study the impact of biomarkers of lung injury on complications in children and adults with sickle cell disease. I would like some assistance with my statistical analysis plan and with power and sample size calculations.

  • for enrolled patients, look at previous 3 years then follow forward 2 years
  • 250 children & 300 adults seen at clinic
  • plasma & DNA samples
  • outcome is pain and acute chest syndrome
  • 3 genotypes and plasma biomarkers
  • believe 2,2 genotype will have increased pain and chest syndrome (1,1 (26%) 1,2 (55%) 2,2 (19%))
  • look at incidence rate at 1 yr and 2 yr; Poisson model; need to know what difference would be expected between the groups
  • sample size - graph of incidence rate that can be detected vs sample size needed
  • Simplest approach: find confidence interval formula for a Poisson rate; assume lowest true rate and solve for n such that multiplicative margin of error is 1.5 with 0.95 confidence
    • Simplest confidence interval is lambda ± 1.96 * sqrt(lambda / n); once you have an upper limit on lambda, you can solve for n to give an acceptable margin of error for lambda
    • Would be better to get the multiplicative margin of error for the ratio of two Poisson rates (to simplify we may assume the sample size in each group is the lowest of the three genotype group sizes)
    • See
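A minimal sketch of the sample-size calculation described in these notes, under stated assumptions: (1) it uses a log-scale interval, log(rate) ± 1.96/sqrt(expected events), instead of the λ ± 1.96√(λ/n) interval, because a multiplicative margin of error is naturally expressed on the log scale; (2) every patient has the same follow-up time; (3) for the rate ratio, both groups have the same size, as in the simplification of using the smallest genotype group. The target of a 1.5-fold margin with 0.95 confidence comes from the notes; the function names and the rates in the loop are placeholders to be replaced by clinic-based estimates. Because the required n is inversely proportional to the rate, plugging in the lowest plausible rate gives a conservative answer.

```python
from math import ceil, exp, log, sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def events_needed(mult_moe=1.5, z=Z95):
    """Expected events so that exp(z / sqrt(events)) <= mult_moe."""
    return (z / log(mult_moe)) ** 2

def n_for_rate(rate, mult_moe=1.5, years=1.0, z=Z95):
    """Patients needed for one Poisson rate (events per patient-year),
    each patient followed for `years`."""
    return ceil(events_needed(mult_moe, z) / (rate * years))

def n_per_group_for_ratio(rate1, rate2, mult_moe=1.5, years=1.0, z=Z95):
    """Patients per group (equal group sizes) for a ratio of two Poisson rates;
    Var(log ratio) is approximated by 1/(n*rate1*years) + 1/(n*rate2*years)."""
    return ceil(events_needed(mult_moe, z) * (1 / rate1 + 1 / rate2) / years)

def achieved_moe_ratio(n, rate1, rate2, years=1.0, z=Z95):
    """Multiplicative margin of error for the rate ratio with n patients per group."""
    return exp(z * sqrt((1 / rate1 + 1 / rate2) / (n * years)))

# Placeholder rates (events per patient-year); replace with clinic estimates.
for low_rate in (0.2, 0.5, 1.0):
    print(low_rate, n_per_group_for_ratio(low_rate, 1.5 * low_rate))
```

Evaluating n_per_group_for_ratio over a grid of rates gives a table of required sample size versus incidence rate, similar in spirit to the graph mentioned above.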

Alice Hoyt, faculty (Medicine/Allergy)

The aims of this R21 are to determine the preparedness and knowledge of K-12 schools on the topics of asthma and food allergy, then to pilot an asthma telemedicine program.


Ricky Shinall (Surgery)

Palliative care consultation has been shown to reduce utilization in end-of-life care, but this hasn't been rigorously studied in trauma patients. I have access to Vanderbilt's trauma registry, which can be cross-referenced against the palliative care registry to identify patients with palliative care consultation. I'd like to discuss the procedures for creating a propensity-matched comparison of patients with and without palliative care consultation to compare resource utilization between the two groups. I'd also like to get a sense of the complexity of this analysis and the amount of biostatistician effort it would take to complete it.

Propensity score model resource:

Katie Tippey (Anesthesiology)

We did a card-sorting project in which each participant sorted 4 decks of index cards. We recorded the time it took them to sort, along with notes based on asking them to think aloud while they sorted the cards. We created annotated files of photos of their final sorting arrangements and have created a raw data set based on these photos. We think we may want to use factor analysis on this raw data set but are unsure whether it is the appropriate analysis and, if so, how to perform it.

Resources: section 16.1


Nicolas Forget (Emergency Department)

  • Perception of collaboration between doctors and nurses in Guyana
  • Pre- and post-team-building exercise, then 4-6 weeks later
  • 27 participants with 2 dropouts by the end; 15 nurses, 10 doctors at the end
  • Issue of using means vs. proportions for Likert scales; want to look at disagreements of perception before and after training
  • Nurses used more spread of answers than doctors
  • 2 demographic variables, 15 Likert questions; need to combine into a single global scale for graphical individual profiles and for stat analysis
  • Can do a formal analysis of variability of responses within subject, e.g. compute the SD over 15 questions within subject and see if nurses have more variation than doctors
  • Main analysis on mean
    • Form within-person difference from baseline (paired data)
    • Do 2-sample (unpaired) t-test comparing these differences - nurses vs doctors
  • Be sure to graph all measurements (on summary score)
  • Pre-post design often provides an upper limit to an intervention effect
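A minimal pure-Python sketch of the main analysis: each person's global score is the mean of their 15 Likert items, the change from baseline is computed within person (paired), and the changes are compared between nurses and doctors with a two-sample t statistic. Assumptions: the data layout (a list of dicts with hypothetical keys role, pre, and post) and the function names are illustrative, and the unequal-variance (Welch) form of the t-test is a choice made here because nurses' answers were more spread out. For a p-value, scipy.stats.ttest_ind(nurse_changes, doctor_changes, equal_var=False) computes the same Welch test.

```python
from math import sqrt
from statistics import mean, stdev

def summary_score(items):
    """Global scale for one person: the mean of their Likert item responses."""
    return mean(items)

def within_subject_sd(items):
    """How spread out one person's answers are across the questions."""
    return stdev(items)

def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances) and its approximate df."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def compare_changes(participants):
    """participants: list of dicts with hypothetical keys 'role' ('nurse' or
    'doctor'), 'pre' and 'post' (lists of item responses for that person).
    Returns (t, df) for nurses' change minus doctors' change."""
    changes = {"nurse": [], "doctor": []}
    for person in participants:
        # paired step: difference from that person's own baseline score
        change = summary_score(person["post"]) - summary_score(person["pre"])
        changes[person["role"]].append(change)
    return welch_t(changes["nurse"], changes["doctor"])
```

For the variability question, within_subject_sd can be computed for each person and the two groups' SDs compared with the same welch_t function.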


Maureen Saint Georges Chaumet (fellow)

Project description: I am starting a project that compares the cosmetic outcomes of 3 different laceration closure methods in kids: sutures, tape and glue. I will also be looking at several secondary outcomes.

Overall, the study design seems reasonable. We recommended that parents rate the cosmetic outcome of the laceration, in addition to the 3 reference pictures. If parents' responses can be captured via an online REDCap survey, consider using a sliding-scale response.

We recommend 90 hours of biostatistics work (this estimate does not cover writing more than one manuscript).

Samuel Younger (Nurse Practitioner)

Interested in determining sample size and best statistical approach. HLM-SEM vs Path analysis?

Research Abstract

Many organizations are looking to their staff to creatively engage in improving the safety of patients. Further, within the Magnet health care environment, transformational leadership has been promoted as core to achieving patient outcomes and is thus the core focus of this study. The purpose of this research is to examine the role that leaders play in bringing together elements of a safety culture and a climate of innovation that support and enable staff to engage creatively in improving the quality and safety of patient care. There is little empirical evidence in the nursing literature related to patient safety in an innovative climate, and none could be found that studies the leadership behaviors of nursing managers that are conducive to an innovation climate and their impact on patient safety outcomes in a Magnet-designated academic medical center. Therefore, this study seeks to fill that gap in knowledge and expand the leadership and innovation literature to include patient safety within a Magnet work environment.

This research uses a multi-level, cross-sectional, descriptive correlational design aimed at examining the relationship between nurse manager transformational leadership and front-line nurse-rated patient safety score, and at further investigating how, if at all, communication and feedback about error and the innovation climate influence that relationship. The independent variables in this study are transformational and transactional leadership. The dependent variable is the front-line nurse-rated patient safety score. The innovation climate is proposed to be a mediating variable. Feedback and communication are proposed to moderate the relationship between transformational leadership and patient safety score. The variables will be measured through an online survey based on three validated and reliable survey instruments (54 questions): the MLQ-5x short (MLQ-5x), the Team Climate Inventory-short (TCI), and Feedback and Communication About Error and Patient Safety Grade (subscales of the AHRQ Hospital Survey on Patient Safety Culture), all of which are appropriate for collecting data about the perceptions of front-line nurses.

If findings confirm these relationships, then in order to impact outcomes, nursing managers may need to be adept at navigating and promoting the complex nature of innovation through communication and establishing an innovation climate. In this context, leadership facilitates communication and an understanding of the innovation climate, which supports creative solutions to patient outcomes and improved quality, in this case, patient safety. On a practical level, this study will contribute to a greater understanding of how to prepare future nursing leaders for the challenges of a changing healthcare landscape through an understanding of what behaviors are necessary to generate innovative and safe care delivery models.

H1a: There is a significant, positive relationship between nurse managers' transformational leadership as measured by the Multifactor Leadership Questionnaire (MLQ-5X) and nurses' perception of patient safety as measured by patient safety grade (AHRQ HSOPSC).

H1b: There is a significant, positive relationship between nurse managers' transactional leadership as measured by the Multifactor Leadership Questionnaire (MLQ-5X) and nurses' perception of patient safety as measured by patient safety grade (AHRQ HSOPSC).

H1c: There is a significant relationship between nurse managers' transactional leadership as measured by the Multifactor Leadership Questionnaire (MLQ-5X) and nurses' perception of patient safety as measured by patient safety grade (AHRQ HSOPSC), but to a lesser degree than transformational leadership. Included per our discussion on transformational leadership predicting quality above and beyond that of transactional leadership.

H2a: There is a significant relationship between nurse managers' transformational leadership as measured by the Multifactor Leadership Questionnaire (MLQ-5X) and innovation climate as measured by the Team Climate Inventory (TCI-short).

H2b: There is a significant negative relationship between nurse managers' transactional leadership as measured by the Multifactor Leadership Questionnaire (MLQ-5X) and innovation climate as measured by the Team Climate Inventory (TCI-short).

H3: The relationship between nurse manager transformational leadership as measured by the Multifactor Leadership Questionnaire (MLQ-5X) and nurses' perception of patient safety as measured by patient safety grade (AHRQ HSOPSC) will be mediated by innovation climate as measured by the Team Climate Inventory (TCI-short).

H4: The relationship between transformational leadership and patient safety grade will be moderated by feedback and communication about error. In terms of this relationship, transformational leadership will have a stronger, positive relationship with patient safety scores when feedback and communication about error is high.


Ryan Skeens (fellow)

This is a patient activation measure survey conducted with parents/caregivers of NICU patients. The survey will be conducted at NICU enrollment, at NICU discharge, and 30 days after discharge. The hypothesis is that patient activation measure scores will decrease at NICU discharge but increase over time (30 days after discharge). In addition, characteristics such as socioeconomic status that are linked to high patient activation measure scores will be identified.

The measure has been validated and used by the mentor team. This is a fellowship project, and Ryan will apply for an internal grant for the 6-9 month project. CTSA support will also be explored.

  • Sample size is fixed based on fellowship time. Power and sample size should be calculated accordingly.
  • Keep the measure in its continuous form (0-100) instead of dichotomizing it.
  • Consider involving a CTSA statistician early, at the design stage. Given that this involves design, grant writing, data collection, data analysis, and manuscript preparation, about 90 hours of work may be needed.
  • Because prediction is involved (identifying characteristics related to high scores), model validation should be considered.



Danxia Yu, Epidemiology (faculty)

We will examine the associations of diet quality scores (assessed at baseline) with body weight change (from baseline to subsequent visits) in a prospective cohort study. Generalized estimating equation (GEE) models have been used in other studies, but we are not familiar with them. We need statistical input on this model and on the power estimation. We would also like to find a statistician to work with on this project. Thank you.

  • If dropout is not random, either GLS with a serial correlation structure or a linear mixed-effects model would be more appropriate than GEE.
  • Do not collapse the diet variables into quintiles; leave them as continuous variables
  • For the power calculation, it may be possible to ask for conditional approval to have access to a subset of the data to get estimates of the quantities needed for a power calculation.
  • You can do a simplified power calculation with just one wave of data, and argue that the power will be higher when there are more data points per person.
  • Possibly useful R packages: longpower (thank you for bringing this to our attention!), pwr (in particular, the pwr.f2.test function).
  • Simulation could also be a useful approach, but it would also require some background information about the standard deviations of the variables
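A minimal simulation sketch of the single-wave power calculation suggested above: weight change is regressed on a baseline diet score, and power is the fraction of simulated data sets in which the slope's t statistic exceeds the critical value. Everything numeric is an assumption to be replaced with estimates from the literature or a data subset: the slope (beta), the standard deviations, and the use of 1.96 as the critical value (a normal approximation that is slightly liberal for small n). The function names are hypothetical, and covariates and later waves are ignored; per the argument above, ignoring the extra waves should make the result conservative.

```python
import random
from math import sqrt

def slope_t(x, y):
    """t statistic for the slope of an ordinary least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / sqrt(rss / (n - 2) / sxx)

def simulated_power(n, beta, sd_x, sd_resid, n_sims=2000, crit=1.96, seed=2017):
    """Fraction of simulated studies with |t| > crit for the diet-score slope.
    All inputs are assumptions (placeholders for pilot or literature values)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        diet = [rng.gauss(0.0, sd_x) for _ in range(n)]
        weight_change = [beta * d + rng.gauss(0.0, sd_resid) for d in diet]
        if abs(slope_t(diet, weight_change)) > crit:
            rejections += 1
    return rejections / n_sims
```

Calling simulated_power over a range of n with the same placeholder inputs traces out an approximate power curve.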

Joshua Cohn, Urologic Surgery (clinical fellow)

I have two questionnaire-based databases on overactive bladder that I have merged. I would like to use this data to develop a model that predicts bother based on symptoms and comorbidities and prioritizes necessary treatments. I am not sure if cluster analysis is the best way to do this.



Paul Yoder, Special Education (faculty)

I'd like an evaluation of area under the curve (AUC) as a way to quantify the magnitude of the between-treatment-group difference, and its confidence interval, for an RCT with repeated measures of the dependent variable. A reference for an example is Gallop, R. J., Dimidjian, S., Atkins, D. C., & Muggeo, V. (2011). Quantifying treatment effects when flexibly modeling individual change in a nonlinear mixed effects model. J Data Sci, 9, 221-241.

  • Email Hakmook Kang to talk about the possibility of working through the KC biostatistics core to get an estimate of how many children and timepoints you would need to do the flexible-breakpoint approach discussed in the article
  • We also discussed an approach using restricted cubic splines. It's possible that this approach would let you use fewer subjects; it may be useful even though you are expecting a linear relationship
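For the restricted cubic spline approach, time (or another continuous predictor) is expanded into a linear term plus k - 2 nonlinear terms, which then enter the mixed-effects model as ordinary predictors. A minimal sketch of the basis in the form described by Harrell (the parameterization behind R's rms::rcs, with terms divided by the squared distance between the outer knots), assuming the knots are supplied by the user; a common choice is quantiles of the predictor (e.g., the 5th, 35th, 65th, and 95th percentiles for 4 knots). The function name and the week values are hypothetical. Because the first column is the predictor itself, a truly linear trajectory shows up as nonlinear coefficients near zero, which is one reason the approach can still be useful when a linear relationship is expected.

```python
def rcs_basis(x, knots):
    """Restricted (natural) cubic spline basis: [x, X2, ..., X_{k-1}] for k knots.
    The fitted curve is cubic between knots and linear beyond the outer knots."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_pos(u):
        # truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2  # normalization keeps terms on roughly x's scale
    gap = t[-1] - t[-2]
    basis = [float(x)]
    for j in range(k - 2):
        term = (cube_pos(x - t[j])
                - cube_pos(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube_pos(x - t[-1]) * (t[-2] - t[j]) / gap)
        basis.append(term / scale)
    return basis

# Hypothetical knots on a weeks-since-randomization scale
knots = [1, 2, 3, 4]
for week in range(0, 7):
    print(week, [round(v, 3) for v in rcs_basis(week, knots)])
```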

Bryan Hill, OB/GYN (fellow)

This is a follow-up to recommendations from 5/15/2017 regarding a logistic regression model with post-operative complications as the outcome variable and clinical and demographic variables as the independent variables. In summary, the recommendations were:

1) Treating the outcome as an ordinal, rather than binary, variable if there are enough people in the additional groups

2) Look at the cross-tabulation between physician and sling type to see whether it is feasible to include both

3) Leave the continuous variables as is (do not categorize them). May want to consider log-transforming age.

4) Try variable clustering to see which variables may be collinear/redundant

5) Consider combining less important (less interesting) variables into a score

Goal for the session: to discuss results of the model.



No clinic: Memorial Day



Bryan Hill, Fellow, Gynecology

Reporting complications after surgery is important for quality improvement. Two methods of finding complications are 1) administrative data from diagnosis codes and 2) keyword search from a manual chart review. We suspect the administrative method under-reports complications. The primary aim of the study is to determine the sensitivity and specificity of the administrative method compared to the manual reporting method. The secondary aim is to determine which risk factors are associated with having a complication.

We think that a logistic regression model would help address our secondary aim. Our plan is to set the outcome as "complication present (1)" and to use the following variables: ASA class, age, body-mass index, setting (outpatient or inpatient), sling type, attending, whether a concomitant procedure was done, anesthesia time, operation time, smoking history, diabetes, and prior surgery.

Question #1: We need guidance on how many variables we can include in our model. Some variables have high counts, and some have quite low counts.

#2 Some variables may influence each other. For example, sling type is heavily dependent on attending (each attending tends to choose a particular brand or type). How do we adjust our model for that?

#3 It is known that older patients are more likely to experience complications. How do we determine whether age is independently associated with "complication presence" or is just a confounder influencing other variables?

Files we plan to append: data dictionary, STATA file, table of variables with total numbers of responses.

  • In deciding which categories to collapse, look at the sample overall (not by complication status)
  • To increase power, consider treating the outcome as an ordinal, rather than binary, variable if there are enough people in the additional groups
  • Look at the cross-tabulation between physician and sling type to see whether it is feasible to include both
  • Leave the continuous variables as is (do not categorize them). You may want to consider log-transforming age.
  • Try variable clustering to see which variables may be collinear/redundant
  • Consider combining less important (less interesting) variables into a score
  • For binary logistic regression, we generally want to have 10-20 people in the smaller outcome group for every degree of freedom (continuous variable or single category) in the model
  • If you apply for VICTR funding, we recommend the larger time amount if you are interested in a publication or presentation. In your application, you can cite these notes as evidence that you have been to a biostatistics clinic.
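The 10-20 rule of thumb above can be applied to the planned variable list with a short sketch. It counts model degrees of freedom (1 per linearly modeled continuous variable, levels minus 1 per categorical variable) and compares that count with the budget implied by the smaller outcome group. The function names are hypothetical, and so are all numbers in the usage note below.

```python
def max_model_df(n_events, n_nonevents, per_df=(10, 20)):
    """(conservative, liberal) number of model df supported by the rule of thumb.

    The limiting sample size is the smaller outcome group.
    """
    limiting = min(n_events, n_nonevents)
    low, high = per_df
    return limiting // high, limiting // low


def model_df(n_continuous, categorical_levels):
    """Degrees of freedom used: 1 per linear continuous term, (levels - 1) per categorical one."""
    return n_continuous + sum(levels - 1 for levels in categorical_levels)
```

Here is a hypothetical example. With 60 complications among 400 surgeries, `max_model_df(60, 340)` gives `(3, 6)`. Suppose the variables are age, BMI, anesthesia time, and operation time (continuous), plus ASA class (3 levels), setting (2), sling type (3), attending (5), and four yes/no variables. Then `model_df(4, [3, 2, 3, 5, 2, 2, 2, 2])` gives 17. That gap is the motivation for the variable clustering and summary-score suggestions above.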

Mike Temple, Biomedical Informatics, faculty

I am comparing the results of 2 surveys and need help calculating p-values and odds ratios to determine whether the 2 surveys differ significantly. I am using R.

  • Get more information about the survey design (especially number of people surveyed) so that you can compare the response rates in 2012 and 2016. If they are not close to each other, it will be harder to justify comparing the results of the two surveys
  • If possible, get info about demographic makeup of the people surveyed in 2012 and 2016 from the organization's records. If, for example, the mean age of respondents is very different from the known mean age of the people surveyed, you will know that in at least that one aspect, the respondents are not representative of the people surveyed.
  • Chi-squared tests should be fine if the categories are exhaustive (but this is secondary to the nonresponse issue)
  • If possible, get more info about the outcomes and model specifications used for the regressions in Table 3.
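For a single yes/no item, the comparison can be sketched as a 2x2 table of survey year by response. This sketch computes a Pearson chi-squared test and an odds ratio with a Woolf (log-scale) 95% confidence interval. It does not address the nonresponse issue raised above, and it assumes every cell count is positive. The function name is hypothetical. The chi-squared statistic here has no continuity correction, matching R's `chisq.test(..., correct = FALSE)` rather than its default.

```python
import math


def two_by_two(a, b, c, d):
    """Chi-squared test and odds ratio for the table

                     response yes   response no
        survey 2016       a              b
        survey 2012       c              d

    Returns (chi2, p_value, odds_ratio, (ci_low, ci_high)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf SE of log(OR)
    z = 1.959963984540054  # standard normal 97.5th percentile
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    return chi2, p, odds_ratio, ci
```
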


Chirayu Patel, resident physician, radiation oncology

The project is VEEP-C (Visually Enhanced Education for Prostate Cancer), a randomized controlled trial in the radiation oncology department assessing the impact of a visual presentation on prostate cancer treatment decision regret, anxiety, satisfaction, and patient-reported symptoms. The expected accrual was 112 patients, based on 120 prostate cancer patient consultations seen within a 6-month timeframe. Unfortunately, because of a drop in consultations, only ~30 patients have been accrued, and only 1 patient has completed external beam radiation therapy over a 6-month timeframe (others have undergone brachytherapy, surgery, or active surveillance, or are still deciding).

1. The sample size is based on an instrument that only 1 patient has completed. As originally written, the study is not feasible. How should a new outcome and sample size be determined?

2. Role for interim analysis on secondary outcomes?

3. Thoughts on closing the trial due to poor accrual?
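If a new continuous outcome is chosen (for example, a decision-regret or anxiety score compared between arms), the sample size can be sketched with the usual normal-approximation formula for two equal-sized arms. The formula is $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2/\Delta^2$ per arm. Its inverse gives the smallest difference detectable with the patients already accrued. The outcome, effect size, and equal allocation are assumptions. The normal approximation slightly understates the n that a t-test-based calculation would give.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect mean difference `delta` (two-sided test)."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return math.ceil(2 * (za + zb) ** 2 * sd ** 2 / delta ** 2)


def detectable_difference(n, sd, alpha=0.05, power=0.80):
    """Smallest mean difference detectable with `n` patients per arm."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return (za + zb) * sd * math.sqrt(2 / n)
```

Here are two examples. A half-SD difference needs `n_per_arm(0.5, 1)` = 63 patients per arm. If the ~30 accrued patients were split evenly (15 per arm, an assumption), `detectable_difference(15, 1)` is about 1.02 SD at 80% power.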


Cara Singer, PhD Student, Speech and Hearing

  • This project investigates speech-language imbalances in children. We are interested in the best way to measure imbalances using five standardized tests; the simple range, scatter, and standard deviation have been discussed. We are also interested in the best way to analyze whether increased synchrony among the five tests is associated with a decrease in stuttering frequency over two years of development.
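The two candidate imbalance indices can be sketched as within-child dispersion across the five test scores: the range (largest minus smallest) and the standard deviation. This assumes all five tests are reported on a common standard-score metric (for example, mean 100, SD 15). Scores on different scales would need converting to z-scores first. The population SD (dividing by 5) is used here because the index describes only that child's five scores. The function name is hypothetical.

```python
from statistics import pstdev


def imbalance_indices(scores):
    """(range, SD) of one child's standard scores across the standardized tests."""
    return max(scores) - min(scores), pstdev(scores)
```

A change in synchrony could then be summarized per child as the index at year 2 minus the index at year 1. A smaller value means the five tests are more in line with each other.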

Hatun Zengin-Bolatkale, Faculty, Hearing and Speech

The purpose of the present study was to longitudinally assess sympathetic arousal (i.e., a physiological correlate of emotional reactivity) in preschool-age children with persisting stuttering (CWPS), children who recover from stuttering (CWRS), and their normally fluent peers (CWNS) during a stressful picture-naming task. The a priori research questions and hypotheses are as follows:

The first question addressed whether change in skin conductance level (SCL) in response to stress at initial testing, close to the onset of stuttering, is associated with stuttering chronicity (i.e., persistence vs. recovery). We hypothesized that children whose stuttering persists, compared to those who recover and those who do not stutter, would exhibit increased skin conductance reactivity to a stressful picture-naming task at their initial testing (i.e., prior to stuttering resolution for children who recover).

The second question addressed whether change in SCL in response to stress approximately 18 months after initial testing is associated with stuttering chronicity (persistent vs. recovered patterns). We hypothesized that children whose stuttering persists, compared to those who recover and those who do not stutter, would exhibit increased skin conductance reactivity to a stressful picture-naming task 18 months post-initial testing (i.e., after stuttering resolution for children who recover).

The third question addressed whether changes in SCL in response to stress are associated with changes in stuttering frequency. We hypothesized that for children who persist, compared to children who recover and children who do not stutter, increased skin conductance reactivity would be associated with increases in stuttering frequency.

We would like help from the clinic with the analyses of the hypotheses above, especially #3.
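For hypothesis #3, a first descriptive look could compute, within each group, the least-squares slope of change in stuttering frequency on change in SCL reactivity between two visits. This sketch defines reactivity as stress-task SCL minus baseline SCL at the same visit, which is an assumption. The dictionary keys are hypothetical. A formal test would put all children in one regression (or a mixed-effects model when there are more than two visits) with a group-by-reactivity-change interaction. CWNS children may show too little change in frequency for a useful slope.

```python
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def change_slopes(children):
    """Per-group slope of change in stuttering frequency on change in SCL reactivity.

    children: dicts with 'group' plus 'scl_base', 'scl_stress', and 'freq',
    each a (time1, time2) pair (hypothetical layout).
    """
    by_group = {}
    for ch in children:
        react = [s - b for s, b in zip(ch["scl_stress"], ch["scl_base"])]
        xs, ys = by_group.setdefault(ch["group"], ([], []))
        xs.append(react[1] - react[0])          # change in reactivity
        ys.append(ch["freq"][1] - ch["freq"][0])  # change in stuttering frequency
    return {g: slope(xs, ys) for g, (xs, ys) in by_group.items()}
```
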





Sarah Diehl, Hearing and Speech Sciences, PhD student

* Questions for the clinic:

1. After removing the ratings that have a mean score of 2 or below, some of the remaining ratings will be highly correlated. Should we first use something like multidimensional scaling to identify dimensions and then a cluster analysis to see how these dimensions cluster? Or should we put all ratings (potentially 38 if none receive a mean score of 2 or below; realistically perhaps 20 to 25) into a cluster analysis?

2. If we expect at least 2 or 3 clusters, what is a reasonable sample size given the number of items we have on the rating scale?

3. What do we need to put into a proposal that is going to use cluster analysis? What kind of information is critical?

4. Is there another approach that would work better than cluster analysis?
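One common form of cluster analysis for this design is agglomerative (hierarchical) clustering of speakers. Each speaker is represented by their vector of item ratings, and clusters are merged by average Euclidean distance. It is sketched below as an illustration, not as the recommended method. Because all items share the same 1-7 scale, no rescaling is applied. Other linkages (for example, Ward's) or k-means could give different clusters, and the choice should be stated in the proposal. The function name is hypothetical.

```python
import math
from itertools import combinations


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage and Euclidean distance.

    points: equal-length rating vectors, one per speaker.
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # mean pairwise distance between members of clusters a and b
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        ia, ib = min(combinations(range(len(clusters)), 2),
                     key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[ia] = sorted(clusters[ia] + clusters[ib])
        del clusters[ib]  # ib > ia, so index ia is unaffected
    return clusters
```

In practice, the full merge history (the dendrogram) is usually inspected across several values of `n_clusters` before settling on 2 or 3 clusters.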

Gurjeet Birdee, Health Services Research, Faculty.

  • The objective of this study was to measure the energy expenditure (oxygen consumption, O2/kg/min) of adults practicing common yoga movements. Each participant was asked to do movements in a standing, lying, and seated position (body orientation), and each movement was done serially with different variations. Participants were also asked to walk at low and moderate intensities so that the energy expenditure of yoga could be compared with that of a comparable aerobic exercise.

The main questions we would like addressed:

What is the best approach to measure if there was significant variation between individuals for mean energy expenditure by body orientation?

What is the best approach to measure if there was significant variation between individuals for each movement?

When considering if variation exists above, should we take into account resting energy expenditure for each individual?
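Between-individual variation can be sketched with a balanced one-way random-effects ANOVA, shown here as one simple option rather than the clinic's recommendation. Each person contributes the same number of repeated energy-expenditure values (for example, the variations of one movement). The sketch returns the F statistic, compared with an F distribution on (df_between, df_within), plus method-of-moments variance components and the intraclass correlation. A mixed-effects model with a random intercept for person generalizes this to unbalanced data. Adjusting for resting energy expenditure could mean subtracting each person's resting value, or dividing by it (MET-style), before the analysis; which is appropriate is a substantive choice.

```python
from statistics import mean


def one_way_random_effects(data):
    """data: per-person lists, each with the same number m of repeated values.

    Returns (F, df_between, df_within, between_var, within_var, icc).
    """
    n, m = len(data), len(data[0])
    grand = mean(v for person in data for v in person)
    person_means = [mean(p) for p in data]
    ss_between = m * sum((pm - grand) ** 2 for pm in person_means)
    ss_within = sum((v - pm) ** 2 for p, pm in zip(data, person_means) for v in p)
    df_b, df_w = n - 1, n * (m - 1)
    ms_b, ms_w = ss_between / df_b, ss_within / df_w
    between_var = max((ms_b - ms_w) / m, 0.0)  # truncated at zero
    icc = between_var / (between_var + ms_w)
    return ms_b / ms_w, df_b, df_w, between_var, ms_w, icc
```
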


Cara Singer, Hearing and Speech Sciences, PhD student

  • This project investigates differences in skin conductance levels among children who stutter and persist, children who stuttered and recovered, and children who do not stutter. Each child was seen 3-4 times over a two-year period. At each visit, skin conductance levels were measured during a neutral video and speaking task, a positive emotion-inducing video and speaking task, and a negative emotion-inducing video and speaking task. We would like to discuss the best statistical models for our hypotheses.

  • Note that at each timepoint, there are 7 skin conductance measures (a "baseline" and 6 other measures)

  • Recommendations:
    • Keep all possible timepoints from all possible subjects. Do not exclude subjects based on their trajectories or baseline characteristics
    • Use continuous versions of the stuttering outcomes if possible; at a minimum, collapse the outcomes into 5 ordinal categories
    • Use a longitudinal mixed-effects model. Each subject will contribute 1, 2, or 3 rows depending on how many of the timepoints they have. You can model severity as a function of time-1 severity, age, sex, the seven time-1 conductance measures (or a reduction thereof; try a redundancy analysis first), time in days, and squared time in days, with random effects for subject (and possibly time and squared time). We recommend a continuous-time correlation structure, but this might be tricky with the mixed-effects model; generalized least squares might work better.
    • If we can agree on a clear, simple plan, the analysis is not multi-step, and the dataset is clean (tall and thin, with the relevant time-1 variables and a non-identifying subject ID on each row), we may be able to conduct the analysis during a clinic.
    • Starting next month, we will be able to take on longer short-term projects for a charge.
    • The Kennedy Center statistics core may also be able to do this. If you come back to a clinic, please remind us to invite Hakmook.
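The "tall and thin" layout recommended above can be sketched as a reshape from one record per child to one row per follow-up visit. Each row carries the child's time-1 severity and the seven time-1 conductance measures, plus time since visit 1 in days and its square. The input structure and column names are hypothetical assumptions. Age is assumed to be recorded at time 1.

```python
def to_long(subjects, n_measures=7):
    """One row per follow-up visit, with time-1 covariates repeated on each row.

    subjects: dicts with 'id' (non-identifying), 'age', 'sex', and 'visits', a
    time-ordered list of (study_day, severity, conductance_list); visit 1 is time 1.
    """
    rows = []
    for s in subjects:
        t1_day, t1_severity, t1_scl = s["visits"][0]
        for day, severity, _ in s["visits"][1:]:
            days = day - t1_day
            row = {"id": s["id"], "age": s["age"], "sex": s["sex"],
                   "days": days, "days_sq": days ** 2,
                   "severity": severity, "severity_t1": t1_severity}
            row.update({f"scl{k + 1}_t1": t1_scl[k] for k in range(n_measures)})
            rows.append(row)
    return rows
```

A child seen 3 or 4 times contributes 2 or 3 rows, consistent with the recommendation above. The rows can then be written to CSV for the mixed-effects or generalized least squares fit.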


Kristy Broman, Surgery Resident

Method to compare standardized incidence ratios using SEER data
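One standard way to compare two SIRs (observed / expected counts) is to estimate their ratio on the log scale. If the expected counts are treated as fixed and the observed counts as Poisson, Var(log SIR) is approximately 1/observed. The sketch assumes the two SIRs come from non-overlapping populations, such as two periods or two subgroups, so that they are independent. It does not reflect any SEER*Stat-specific procedure, and the function name is hypothetical.

```python
import math


def sir_ratio(obs1, exp1, obs2, exp2, z=1.959963984540054):
    """Compare two independent SIRs. Returns (sir1, sir2, ratio, 95% CI, two-sided p)."""
    sir1, sir2 = obs1 / exp1, obs2 / exp2
    log_ratio = math.log(sir1 / sir2)
    se = math.sqrt(1 / obs1 + 1 / obs2)  # approx. SE of log(SIR1 / SIR2)
    ci = (math.exp(log_ratio - z * se), math.exp(log_ratio + z * se))
    p = math.erfc(abs(log_ratio / se) / math.sqrt(2))  # two-sided normal p-value
    return sir1, sir2, sir1 / sir2, ci, p
```
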


Katie McGinnis (MPH candidate)

(followup from last two weeks)

  • For each overall question category, try a scatterplot of a) the means and b) the standard deviations for each item, with staff values on the x-axis and parent values on the y-axis (or vice versa). Label each point with the question number or a short phrase to identify it.
  • Do variable clustering within the staff items and the parent items, to see which items tend to be answered similarly by the same person (hcavar in stata)
  • Rather than doing several univariate analyses comparing the relationship between the demographic items and each survey item, do a single regression analysis for each survey item, with all the demographic items included in the model at once. Collapse the categorical items into 2 or at most 3 categories, and assign numeric values (e.g., 1-5) to the levels of binned continuous items such as distance, treating those as continuous variables (so each has just one term in the model). Actually, though, drop distance altogether and just use travel time. The overall F-statistic from the regression will tell you whether anything in the model matters. The best approach would be a proportional odds model, but ordinary regression would be next best.
  • It's OK to take the means of means (across items in a particular category) and talk about those, but there aren't enough data points to warrant a statistical test.
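The per-item numbers behind the mean and SD scatterplots can be sketched as follows. For each item answered by both groups, the sketch computes the staff and parent means and sample SDs, with the item label kept for labeling points. It assumes responses are already coded numerically on the same scale and that each item has at least two responses per group. The function name and data layout are hypothetical.

```python
from statistics import mean, stdev


def item_summaries(staff, parent):
    """staff, parent: dicts mapping item label -> list of numeric responses.

    Returns (item, staff_mean, parent_mean, staff_sd, parent_sd) for shared items.
    """
    rows = []
    for item in sorted(set(staff) & set(parent)):
        s, p = staff[item], parent[item]
        rows.append((item, mean(s), mean(p), stdev(s), stdev(p)))
    return rows
```

Plotting column 2 against column 3 gives the means scatterplot, and column 4 against column 5 gives the SD scatterplot.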


Katie McGinnis (MPH candidate)

(followup from last week)

  • Instead of t-tests, use the Wilcoxon rank-sum test (there are only 5 response options)
  • Rather than overlaying the parent and staff histograms, show the parent mean as a dot on the staff histograms
  • Do the "dot-histograms" by hospital because the hospitals are so different, even if tests comparing hospitals are not significant
  • Don't put too much weight on the p-values; this is exploratory research with relatively small sample sizes
  • For the two similar staff questions, run a correlation on the responses to help justify using only one of the questions. Use a Spearman rank correlation.
  • We don't think it would make sense to take the mean of the responses for the parent "how often" questions
  • For any set of questions, it could be interesting to order the means to see which questions had the highest or lowest means, but it wouldn't make sense to do a statistical test comparing the means of the different items.
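The two rank-based procedures recommended above, the Wilcoxon rank-sum test for staff vs. parent responses and the Spearman correlation for the two similar staff questions, can both be sketched with average ranks for ties. Ties are common with only 5 response options. The rank-sum test uses the normal approximation with a tie-corrected variance and no continuity correction. Packaged functions may give slightly different p-values because they can apply a continuity correction or compute exact p-values. The function names are hypothetical.

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney) test. Returns (U for x, two-sided p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))  # tie-corrected
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```
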


Antje Mefferd, Hearing and Speech Sciences

I'm an assistant professor in the Hearing and Speech Science department, and I'm currently preparing a manuscript. I would like someone to look at the analysis I completed to make sure it is correct. I'm a bit unsure about some things (assigning fixed and random effects, reporting degrees of freedom). I have my data in Excel spreadsheets and can share it ahead of time.

The topic is how the range of motion of the tongue and the jaw changes during various speech tasks (typical, loud, slow, and clear speech). These speech tasks are used in speech therapy to help people with brain diseases (such as Parkinson's disease) be better understood. In this data set, I look at just one group of speakers (healthy speakers).

Participants complete 5 repetitions of each task (5 reps x 4 tasks = 20 data points per participant). There are 11 females and 10 males in this study (sex has a significant main effect due to anatomical differences between males and females, but in our field it is typically not statistically controlled for in repeated-measures analyses). There are three measures: tongue movement, jaw movement, and acoustics. For all three, I need to analyze task effects in separate analyses. I also need to look at how well changes in tongue movements predict changes in acoustics, and how well changes in jaw movements predict changes in acoustics, using data from typical to loud speech, typical to clear speech, and typical to slow speech; this time as regressions within females and within males.

In the meeting, I would like to make sure that I ran these analyses correctly and verify that I used the correct degrees of freedom in my write-up.

Recommendations: 1. For the primary analysis, either an ANOVA using each subject's mean or a mixed-effects model with a fixed effect for task and random effects for subject would be fine. 2. For the secondary analysis, it would be best to use the same approach (either one mean data point per person per task, or a mixed-effects model). If using a mixed-effects model for the secondary analysis, be careful with the interpretation of R-squared.
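The "ANOVA using each subject's mean" option can be sketched in two steps. First, average the 5 repetitions per subject per task. Then run a one-way repeated-measures ANOVA on those means. With n subjects and k tasks, the task effect is tested on k - 1 and (k - 1)(n - 1) degrees of freedom, which is 3 and 60 for 21 participants and 4 tasks. The function names and dictionary layout are hypothetical. The sketch applies no sphericity correction (for example, Greenhouse-Geisser), which would reduce the degrees of freedom if used. A mixed-effects model reports denominator df according to its chosen method (for example, Satterthwaite or Kenward-Roger), so those df can differ.

```python
from statistics import mean


def subject_task_means(reps):
    """{subject: {task: [repetition values]}} -> {subject: {task: mean}}."""
    return {s: {t: mean(v) for t, v in tasks.items()} for s, tasks in reps.items()}


def rm_anova(means, tasks):
    """One-way repeated-measures ANOVA on one mean per subject per task.

    Returns (F, df_task, df_error), df_task = k - 1, df_error = (k - 1)(n - 1).
    """
    subjects = list(means)
    n, k = len(subjects), len(tasks)
    grand = mean(means[s][t] for s in subjects for t in tasks)
    task_means = {t: mean(means[s][t] for s in subjects) for t in tasks}
    subj_means = {s: mean(means[s][t] for t in tasks) for s in subjects}
    ss_task = n * sum((task_means[t] - grand) ** 2 for t in tasks)
    ss_subj = k * sum((subj_means[s] - grand) ** 2 for s in subjects)
    ss_total = sum((means[s][t] - grand) ** 2 for s in subjects for t in tasks)
    ss_error = ss_total - ss_task - ss_subj  # task-by-subject residual
    df_task, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_task / df_task) / (ss_error / df_error), df_task, df_error
```
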

Katie McGinnis (MPH candidate)

I have questions about my MPH Thesis project, specifically related to the best options for comparing some of my variables and running a few other statistical tests

Practicum in Kenya; originally a needs assessment, not designed for research. 16-page staff surveys (n = 94) and parent surveys (n = 69) from 2 children's hospitals, plus demographic data. Hoping to compare parent responses to staff responses in some way. Challenges: 1. parents are responding about 1 child, but staff are responding about all children, and 2. for some items, the response scales for parents and staff are slightly or very different. She is comfortable treating the response options as numeric (taking the mean would be meaningful to her). The thesis does not have to contain a formal statistical analysis.

Recommendation for next steps: For survey items where the response scales are the same, continue the exploratory data analysis by plotting histograms for the staff responses, and then marking the mean of the parent responses on the x-axis.


Frances Anderson, MPH Global Health

I am an MPH Global Health track student, and I need some assistance with an ANOVA for my thesis project. My project is an evaluation of Minnesota's TB screening of refugees and immigrants across four counties in the state. The data I am looking at for the ANOVA include mean days to initiation (TB testing) and mean days to disposition. There are some outliers in the data that I need to consider dropping. I am seeking advice on this, on completing the test, and, if ANOVA is not appropriate for this dataset, on finding a new test.
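If ANOVA turns out to be unsuitable, one commonly used alternative is the Kruskal-Wallis test. It compares the four counties on ranks rather than raw days, so a few extreme values have limited influence and need not be dropped. This is a swapped-in technique, shown only as an option. The sketch applies the tie correction and uses the chi-squared approximation with k - 1 degrees of freedom. It assumes individual-level day counts are available per county and that not all values are identical. The function names are hypothetical.

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-squared probability via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, half = df / 2, x / 2
    term = total = 1 / a
    n = 1
    while term > total * 1e-15:
        term *= half / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H for k independent groups. Returns (H, df, p)."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = midranks(values)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)
```

Here is an example call: `kruskal_wallis(days_county1, days_county2, days_county3, days_county4)`, where each argument is a hypothetical list of days to initiation.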


Joshua Cockroft, MD student

We are looking to design and validate a new psychometric scale that measures a patient/client's trust in new providers. Though psychometric scales currently exist that measure trust in healthcare systems, trust in existing personal providers, and global trust, there is currently no scale described in the literature that specifically measures trust in new providers. The hope is that such a scale would be of use in many underserved populations, particularly those populations with histories of either substance use disorder or severe mental illness, who are not regularly active participants within the healthcare system. We would hope to be able to use such a survey to measure the effect of this specific type of trust on outcomes such as healthcare service utilization. Like other healthcare trust-related scales, this scale would likely be a Likert scale with questions that span multiple domains of trust (e.g., competence, dependability). As there is no current gold standard for this type of measurement, advice on important considerations for internal validation would be greatly appreciated. We may consider validating this scale in multiple sub-populations if possible. Conceptualization of this scale will be derived from the literature and our own qualitative research.
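One routinely reported piece of internal validation for a Likert scale is internal consistency. It is often summarized by Cronbach's alpha, computed within each domain (for example, the competence items). The formula is alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score). The sketch assumes complete responses, with items already coded numerically and reverse-worded items recoded. Other internal-validation steps, such as item-total correlations or factor analysis to check the domain structure, are not shown. The function name is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """responses: one list of item scores per respondent (same items, no missing)."""
    k = len(responses[0])
    item_var = sum(pvariance(col) for col in zip(*responses))
    total_var = pvariance([sum(r) for r in responses])
    # population variances in both places; the n/(n-1) factor cancels in the ratio
    return k / (k - 1) * (1 - item_var / total_var)
```
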
Topic revision: r1 - 15 Jan 2021, DalePlummer

This site is powered by Foswiki. Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.