Statistical Thinking in Biomedical Research

Discussion Board


Statistical Graphics

DavidAirey 08 Nov 2004: During Frank's entertaining lecture on the use of graphics, he said he would like to see confidence intervals plotted more often, not only for mean estimates but also, when the interest is in the difference between means, for that difference itself. I'm sifting through the notes and graphics for the example. Has anyone seen it?

FrankHarrell: See p. 14 of http://biostat.mc.vanderbilt.edu/twiki/pub/Main/ClinStat/ci2.biostat1.pdf

DavidAirey 11 Nov 2004: In addition to Stata (www.stata.com) (and fumbling around with R 2.0), I use a fun statistics package for exploratory graphics called Data Desk (www.datadesk.com), sadly now in a developmental coma. My confusion with the above stemmed from my use of boxplots in this package. I usually overlay a 95% confidence interval on the box. This CI is actually for the median and is constructed to allow inference at the 0.05 level across the boxes graphed: median ± 1.58 × (high hinge - low hinge)/sqrt(n), derived in Chapter 3 of Velleman and Hoaglin (1981).
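
For readers who want to see the arithmetic, here is a minimal sketch of the interval quoted above, median ± 1.58 × (high hinge - low hinge)/sqrt(n). It assumes Tukey-style hinges (medians of the lower and upper halves of the sorted data, with the overall median included in both halves when n is odd). The function names are illustrative and are not taken from Data Desk.

```python
import math
import statistics


def hinges(values):
    """Return (low hinge, median, high hinge).

    Assumption: hinges are the medians of the lower and upper halves of the
    sorted data, sharing the middle value when n is odd.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    lower = xs[:half]
    upper = xs[n - half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


def median_notch_interval(values):
    """Interval for the median: median +/- 1.58 * hinge spread / sqrt(n)."""
    xs = list(values)
    low, med, high = hinges(xs)
    half_width = 1.58 * (high - low) / math.sqrt(len(xs))
    return med - half_width, med + half_width


# Hypothetical data, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(median_notch_interval(sample))
```

Other software may define the hinges or quartiles slightly differently, so the interval can differ a little from package to package.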

FrankHarrell: There is a way to make approximately correct conclusions about significant differences based on overlap of two intervals, when those intervals have roughly 70% coverage. But it is always best to display an interval for the actual difference.
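
As a rough illustration of displaying an interval for the actual difference, the sketch below computes a confidence interval for each mean and one for the difference between two independent means. It assumes a large-sample normal approximation with unpooled variances; for small samples a t quantile would normally replace the normal quantile. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def mean_ci(x, level=0.95):
    """Normal-approximation confidence interval for a single mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(x)
    se = math.sqrt(variance(x) / len(x))
    return m - z * se, m + z * se


def diff_ci(x, y, level=0.95):
    """Normal-approximation interval for mean(x) - mean(y), independent samples."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    d = mean(x) - mean(y)
    # Standard error of a difference of independent means (unpooled variances).
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return d - z * se, d + z * se


# Hypothetical data, for illustration only.
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
print("mean A:", mean_ci(a))
print("mean B:", mean_ci(b))
print("A - B :", diff_ci(a, b))
```

The interval for the difference answers the comparison question directly, so the reader does not have to judge it from how much the two separate intervals overlap.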

Statistical Software and Excel

Lynne Hinger (lynne.hinger@Vanderbilt.Edu) 10 Nov 2004: In the ExcelProblems handout it states:
 On allstat a few years ago, a weird example was mentioned. Enter a 
 column of zeros and set a cell to contain their sum. Now change one cell 
 to O (upper case O). Now change another cell to l (lower case l). What 
 happens? What would be reasonable behaviour for statistical 
 purposes? Do you want to trust this software?
I tried this in Excel and wasn't really sure what the problem was: the cell still just showed the sum. He doesn't really explain the "reasonable behavior" he is looking for. The result made sense to me (and that worries me!).

FrankHarrell: I haven't tried that myself. I take it that the non-numerics were not properly excluded from the sum or, more likely, that a sum of the non-missing values was computed instead of giving an answer of NA that would alert the analyst to bad data in the column. There may also be a problem when counting the number of non-missing entries for the purpose of getting a mean.
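
The two behaviours being contrasted can be sketched as follows. This is not a reproduction of what Excel actually does. It only illustrates, with hypothetical function names, the difference between a sum that silently skips cells it cannot read as numbers and a sum that returns a missing value (NaN here, standing in for NA) whenever the column contains such a cell.

```python
import math


def parse_cell(cell):
    """Return the cell as a float, or None if it is not a valid number."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return None


def lenient_sum(cells):
    """Silently skip non-numeric cells (the behaviour the post warns about)."""
    return sum(v for v in map(parse_cell, cells) if v is not None)


def strict_sum(cells):
    """Return NaN if any cell is non-numeric, so that bad data is visible."""
    values = [parse_cell(c) for c in cells]
    if any(v is None for v in values):
        return math.nan
    return sum(values)


# A column of zeros with one cell mistyped as "O" and another as "l".
column = ["0", "0", "O", "l", "0"]
print(lenient_sum(column))  # the typos go unnoticed
print(strict_sum(column))   # NaN flags the problem
```

With the lenient version the two mistyped cells also drop out of any count of non-missing entries, so a mean computed from that count would be based on three values rather than five.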

Electrophysiological Data Analysis

OctavioRuiz - 11 Nov 2004
We record action potentials ("spikes") from awake animals.  Each spike signals the firing of a neuron.
The firing is governed by complicated, non-stationary, non-linear dynamic (and probably deterministic) 
processes within the neuron that we cannot measure. The spike rate changes during certain experimental 
manipulations, and the change exhibits variability.

First problem (difficult): Estimate physiologically relevant parameters of the underlying process within 
the cell from the spike data.

Second problem (easier): Assess the effect of the experimental manipulations on the “activity” of the 
cell during different conditions, as measured in terms of the cell spike rate.  The conditions are:
  A:  No stimulus --> Basal activity
      Stimulus A  --> Response to A
  B:  No stimulus --> Basal activity
      Stimulus B  --> Response to B
Each condition is repeated several times in a block design (e.g. A: 20 trials;  B: 20 trials;  A: another 
20 trials or so, B: ...).  The basal activity and the response to the stimuli may change over time, 
sometimes monotonically, sometimes exhibiting a convex shape.  The experiment is repeated for different 
cells from the same animal (never the same cell), from two or three animals.

The problem will then be to (1) find the relative effect of condition A and condition B on the population 
of cells, regardless of non-stationarities and of intertrial, cell-to-cell, and animal-to-animal 
variability, and (2) assess whether the sample contains “atypical” response cells, i.e. cells for which
Resp(A) is significantly larger / smaller than Resp(B), with a given level of confidence.

Any suggestions?

DavidAirey 12 Nov 2004: You bring to the table a fundamental analysis paradigm in neurophysiology that I had a running argument about with two editors in chief of the Journal of Neurophysiology. I wanted them to address what counts as appropriate and inappropriate analysis of subsampling from intact organisms exposed to multiple treatment conditions, or at least to publish guidelines. They have only just instituted some guidelines for the Journal, but ignored the questions raised by you and me. The problem is that subsampling organisms in the way described results in two sources of systematic error that lead to data clustering (TIME and ANIMAL), and failure to deal with this complex error structure will affect your ability to make correct inferences. What really needs to be done in this field is to estimate the intraclass correlation (ICC) for in vivo neurophysiological data, to assess whether the animal correlations are weak enough to warrant making inferences about the population of neurons from the population of animals while ignoring variation at the level of the animal or having only a very sparse (N = 3) animal level in the data set. My personal opinion is that there is something fundamental that has gone unaddressed in the field of non-human primate neurophysiology work. An early opinion of mine, which has changed a little since it was posted, can be read at http://homepage.mac.com/david.airey/vita/pooling.htm. I am trying to rework this piece for the Journal of Neuroscience Methods and would be happy to collaborate. I personally think poking the same exact 10 neurons (they are numbered, you know) in the worm C. elegans from different genetic strains would be informative, in that a good ICC could be estimated. I invite the Department of Biostatistics to weigh in on this question with their expertise. The field of neurophysiology is very sophisticated for within-animal analysis.
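
To make the intraclass correlation concrete, here is a minimal sketch of one common estimator, the one-way random-effects ICC computed from ANOVA mean squares. It assumes a balanced layout (the same number of cells recorded from every animal) and a single summary value per cell. The function name and the numbers are hypothetical, and a real analysis of the design described above would also need to handle time trends and unbalanced data, for example with a mixed-effects model.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC for balanced groups.

    groups: list of equal-length lists, one list of cell-level values per animal.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), with k values per group.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 balanced groups of size >= 2")
    grand = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    # Between-animal and within-animal mean squares.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical response values for 3 animals with 3 cells each.
animals = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(icc_oneway(animals))
```

A value near 1 means cells from the same animal resemble each other much more than cells from different animals, in which case treating cells as independent observations would overstate the effective sample size. With only two or three animals the estimate itself will be very imprecise.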

FrankHarrell: Yu-Chieh Yang, Anna Liu, and Yuedong Wang have done some nice work on pulsatile hormone release that may possibly relate. See http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=statistics+pulsatile&btnG=Search and their R software. Dan Keenan at the University of Virginia has also done some nice work, with Johannes Veldhuis.

DavidAirey 14 Nov 2004: It seems the thing to do is to set up a formal interaction with a statistician from Biostatistics. I'm sure you will eventually get answers that satisfy your requirements. My own opinions (above) are not informed by expertise in your field, but are motivated by the need to critically evaluate published results. I just looked up the Keenan and Veldhuis work on Entrez. They typically have many more patients than a non-human primate electrophysiology experiment can hope to have. I've also skimmed Wang's (2004) paper, in which they discuss data from 72 patients.
Topic revision: r12 - 14 Nov 2004, DavidAirey
 
