Statistical Thinking in Biomedical Research

Discussion Board


Statistical Graphics

DavidAirey 08 Nov 2004: During Frank's entertaining lecture on the use of graphics, he said he would like to see more use of confidence intervals, and not only for mean estimates: when the interest is in the difference between means, the confidence interval for that difference should be plotted too. I'm sifting through the notes and graphics for the example. Has anyone seen it?

FrankHarrell: See p. 14 of http://biostat.mc.vanderbilt.edu/twiki/pub/Main/ClinStat/ci2.biostat1.pdf

DavidAirey 11 Nov 2004: In addition to Stata (www.stata.com) (and fumbling around with R 2.0), I use a fun statistics package for exploratory graphics called Data Desk (www.datadesk.com), sadly now in a developmental coma. My confusion with the above stemmed from my use of boxplots in this package. I usually overlay a 95% confidence interval on the box. This CI is actually for the median, and it is constructed to allow inference at the 0.05 level across the boxes graphed: median ± 1.58 × (high hinge - low hinge) / sqrt(n), derived in Chapter 3 of Velleman and Hoaglin (1981).

FrankHarrell: There is a way to make approximately correct conclusions about significant differences based on overlap of two intervals, when those intervals have roughly 70% coverage. But it is always best to display an interval for the actual difference.
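
The two kinds of interval discussed above can be sketched in Python (chosen here only for illustration; the thread itself mentions Stata, R, and Data Desk). The function names `hinges`, `median_notch`, and `diff_means_ci` are made up for this sketch. The hinges are assumed to follow Tukey's definition (the median of each half, sharing the middle value when n is odd), and the interval for the difference uses a large-sample normal quantile rather than a t quantile, which is too optimistic for small samples.

```python
import math
import statistics


def hinges(values):
    """Return (low hinge, high hinge): the medians of the lower and upper halves.

    When n is odd, the overall median is included in both halves (assumed
    here to match Tukey's definition).
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return statistics.median(xs[:half]), statistics.median(xs[n - half:])


def median_notch(values):
    """Interval median +/- 1.58 * (high hinge - low hinge) / sqrt(n)."""
    low, high = hinges(values)
    half_width = 1.58 * (high - low) / math.sqrt(len(values))
    med = statistics.median(values)
    return med - half_width, med + half_width


def diff_means_ci(a, b, level=0.95):
    """Approximate CI for mean(a) - mean(b), allowing unequal variances.

    Assumption: a normal quantile is used, so this is only reasonable for
    fairly large samples.
    """
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    group_a = [4.1, 5.0, 5.6, 6.2, 6.8, 7.4, 8.0]
    group_b = [5.2, 6.1, 6.9, 7.5, 8.3, 9.0, 9.6]
    print("notch A:", median_notch(group_a))
    print("notch B:", median_notch(group_b))
    print("CI for mean(A) - mean(B):", diff_means_ci(group_a, group_b))
```

Plotting the output of `diff_means_ci` shows directly whether zero is a plausible difference, which the overlap of two notches can only approximate.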

Statistical Software and Excel

Lynne Hinger (lynne.hinger@Vanderbilt.Edu) 10 Nov 2004: In the ExcelProblems handout it states:
 On allstat a few years ago, a weird example was mentioned. Enter a 
 column of zeros and set a cell to contain their sum. Now change one cell 
 to O (upper case O). Now change another cell to l (lower case l). What 
 happens? What would be reasonable behaviour for statistical purposes? 
 Do you want to trust this software?
I tried this in Excel and wasn't really sure what the problem was: the result was just the sum. The handout doesn't really explain what "reasonable behavior" it is looking for. The result made sense to me (and that worries me!).

FrankHarrell: I haven't tried that myself. I take it that the non-numerics were not properly excluded from the sum or, more likely, that a sum of the non-missing values was computed instead of giving an answer of NA that would alert the analyst to bad data in the column. There may also be a problem when counting the number of non-missing entries for the purpose of getting a mean.
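
The difference between the two behaviours Frank describes can be sketched in a few lines of Python. This is an illustration only, not a description of how any spreadsheet is implemented: `lenient_sum`, `strict_sum`, and `count_numeric` are hypothetical names, and the "lenient" rule (silently skipping text cells) is the behaviour assumed in the reply above.

```python
import math


def is_numeric(cell):
    """True for ints and floats; text such as 'O' or 'l' is not numeric."""
    return isinstance(cell, (int, float)) and not isinstance(cell, bool)


def lenient_sum(column):
    """Sum the numeric cells and silently skip everything else."""
    return sum(c for c in column if is_numeric(c))


def strict_sum(column):
    """Return NaN (standing in for NA) if any cell is not numeric."""
    if not all(is_numeric(c) for c in column):
        return math.nan
    return sum(column)


def count_numeric(column):
    """Number of cells that would enter a lenient mean."""
    return sum(1 for c in column if is_numeric(c))


if __name__ == "__main__":
    # A column of zeros with two mistyped cells: upper-case O and lower-case l.
    column = [0, 0, "O", 0, "l", 0]
    print("lenient sum:", lenient_sum(column))
    print("strict sum:", strict_sum(column))
    print("cells used:", count_numeric(column), "of", len(column))
```

With the mistyped column the lenient sum is 0, exactly what a clean column of zeros would give, so nothing alerts the analyst. The strict version returns NaN, and the count shows that only 4 of the 6 cells would enter a mean.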

Electrophysiological Data Analysis

OctavioRuiz - 11 Nov 2004
We record action potentials ("spikes") from awake animals.  Each spike signals the firing of a neuron.
The firing obeys complicated, non-stationary, non-linear, probably deterministic dynamic processes 
within the neuron that we cannot measure. The spike rate changes during certain experimental 
manipulations, and the change exhibits variability.

First problem (difficult): Estimate physiologically relevant parameters of the underlying process within 
the cell from the spike data.

Second problem (easier): Assess the effect of the experimental manipulations on the “activity” of the 
cell during different conditions, as measured in terms of the cell spike rate.  The conditions are:
  A:  No stimulus --> Basal activity
      Stimulus A  --> Response to A
  B:  No stimulus --> Basal activity
      Stimulus B  --> Response to B
Each condition is repeated several times in a block design (e.g. A: 20 trials;  B: 20 trials;  A: another 
20 trials or so, B: ...).  The basal activity and the response to the stimuli may change over time, 
sometimes monotonically, sometimes exhibiting a convex shape.  The experiment is repeated for different 
cells from the same animal (never the same cell), from two or three animals.

The problem is then to (1) find the relative effect of condition A and condition B on the population 
of cells, regardless of non-stationarities and of intertrial, cell-to-cell, and animal-to-animal 
variability, and (2) assess whether the sample contains “atypical” response cells, i.e. cells for which
Resp(A) is significantly larger / smaller than Resp(B), with a given level of confidence.

Any suggestions?

DavidAirey 12 Nov 2004: You bring to the table a fundamental analysis paradigm in neurophysiology. I had a running argument about it with two editors in chief of the Journal of Neurophysiology, trying to get them to address appropriate and inappropriate analysis of subsampling from intact organisms exposed to multiple treatment conditions, or at least to publish guidelines. They have only just instituted some guidelines for the Journal, but ignored the questions raised by you and me. The problem is that subsampling organisms in the way described results in two sources of systematic error that lead to data clustering (TIME and ANIMAL), and failure to deal with this complex error structure will compromise your ability to make correct inferences. What really needs to be done in this field is to estimate the intraclass correlation (ICC) for in vivo neurophysiological data, to assess whether the animal correlations are weak enough to warrant making inferences on the population of neurons from the population of animals while ignoring variation at the level of the animal, or while having a very sparse (N = 3) animal level in the data set. My personal opinion is that something fundamental has gone unaddressed in the field of non-human primate neurophysiology. An early opinion of mine, which has changed a little since it was posted, can be read at http://homepage.mac.com/david.airey/vita/pooling.htm. I am trying to rework this piece for the Journal of Neuroscience Methods and would be happy to collaborate. I personally think that poking the same exact 10 neurons (they are numbered, you know) in the worm C. elegans from different genetic strains would be informative, in that a good ICC could be estimated. I invite the Department of Biostatistics to weigh in on this question with their expertise. The field of neurophysiology is very sophisticated for within-animal analysis.
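
One way to put a number on the animal-level clustering described above is the one-way random-effects ANOVA estimator of the intraclass correlation. The sketch below is only an illustration, written in Python: the function name `icc_oneway` and the firing rates are invented, the design is assumed to be balanced (the same number of cells per animal), and time effects within a recording session are ignored entirely.

```python
import statistics


def icc_oneway(groups):
    """ICC(1) estimate from a balanced one-way random-effects ANOVA.

    groups: list of equal-length lists, one list of per-cell values per animal.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW), which can be negative.
    """
    n = len(groups)        # number of animals
    k = len(groups[0])     # cells per animal
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 animals, each with the same number (>= 2) of cells")

    group_means = [statistics.mean(g) for g in groups]
    grand_mean = statistics.mean(group_means)  # equals the overall mean when balanced

    # Between-animal and within-animal mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    msw = sum((x - m) ** 2
              for g, m in zip(groups, group_means)
              for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    # Invented per-cell response rates (spikes/s): 3 animals, 4 cells each.
    rates = [
        [12.0, 14.5, 13.1, 15.2],
        [20.3, 18.9, 21.7, 19.4],
        [9.8, 11.2, 10.5, 12.9],
    ]
    print("estimated ICC:", icc_oneway(rates))
```

An estimate near 0 would suggest that cells from the same animal are no more alike than cells from different animals. An estimate near 1 would suggest that the effective sample size is closer to the number of animals than to the number of cells. With only two or three animals the estimate itself is very imprecise, which is part of the concern raised above.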

FrankHarrell: Yu-Chieh Yang, Anna Liu, and Yuedong Wang have done some nice work on pulsatile hormone release that may possibly relate. See http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=statistics+pulsatile&btnG=Search and their R software. Dan Keenan at the University of Virginia has also done some nice work, with Johannes Veldhuis.

DavidAirey 14 Nov 2004: It seems the thing to do is to set up a formal interaction with a statistician from Biostatistics. I'm sure you will eventually get answers that satisfy your requirements. My own opinions (above) are not informed by expertise in your field, but are motivated by the need to critically evaluate published results. I just looked up the Keenan and Veldhuis work on Entrez. They typically have many more patients than a non-human primate electrophysiology experiment can hope to have. I've also skimmed Wang's (2004) paper, which discusses data from 72 patients. Here are some other interesting articles, and another typical paper in the Journal of Neurophysiology with 2 monkeys.

1: Neuroimage. 2004 Apr;21(4):1732-47. 

Multilevel linear modelling for FMRI group analysis using Bayesian inference.

Woolrich MW, Behrens TE, Beckmann CF, Jenkinson M, Smith SM.

Oxford Centre for Functional Magnetic Resonance Imaging of the Brain, John
Radcliffe Hospital, University of Oxford, Oxford OX3 9DU, UK.
woolrich@fmrib.ox.ac.uk

Functional magnetic resonance imaging studies often involve the acquisition of
data from multiple sessions and/or multiple subjects. A hierarchical approach
can be taken to modelling such data with a general linear model (GLM) at each
level of the hierarchy introducing different random effects variance components.
Inferring on these models is nontrivial with frequentist solutions being
unavailable. A solution is to use a Bayesian framework. One important ingredient
in this is the choice of prior on the variance components and top-level
regression parameters. Due to the typically small numbers of sessions or
subjects in neuroimaging, the choice of prior is critical. To alleviate this
problem, we introduce to neuroimage modelling the approach of reference priors,
which drives the choice of prior such that it is noninformative in an
information-theoretic sense. We propose two inference techniques at the top
level for multilevel hierarchies (a fast approach and a slower more accurate
approach). We also demonstrate that we can infer on the top level of multilevel
hierarchies by inferring on the levels of the hierarchy separately and passing
summary statistics of a noncentral multivariate t distribution between them.

PMID: 15050594 [PubMed - indexed for MEDLINE]



2: Neuroimage. 2004 Apr;21(4):1639-51. 

Variation of BOLD hemodynamic responses across subjects and brain regions and
their effects on statistical analyses.

Handwerker DA, Ollinger JM, D'Esposito M.

Henry H. Wheeler Jr. Brain Imaging Center, Helen Wills Neuroscience Institute
and Department of Psychology, University of California, Berkeley, CA 94720, USA.
werker@socrates.berkeley.edu

Estimates of hemodynamic response functions (HRF) are often integral parts of
event-related fMRI analyses. Although HRFs vary across individuals and brain
regions, few studies have investigated how variations affect the results of
statistical analyses using the general linear model (GLM). In this study, we
empirically estimated HRFs from primary motor and visual cortices and frontal
and supplementary eye fields (SEF) in 20 subjects. We observed more variability
across subjects than regions and correlated variation of time-to-peak values
across several pairs of regions. Simulations examined the effects of observed
variability on statistical results and ways different experimental designs and
statistical models can limit these effects. Widely spaced and rapid
event-related experimental designs with two sampling rates were tested.
Statistical models compared an empirically derived HRF to a canonical HRF and
included the first derivative of the HRF in the GLM. Small differences between
the estimated and true HRFs did not cause false negatives, but larger
differences within an observed range of variation, such as a 2.5-s time-to-onset
misestimate, led to false negatives. Although small errors minimally affected
detection of activity, time-to-onset misestimates as small as 1 s influenced
model parameter estimation and therefore random effects analyses across
subjects. Experiment and analysis design methods such as decreasing the sampling
rate or including the HRF's temporal derivative in the GLM improved results, but
did not eliminate errors caused by HRF misestimates. These results highlight the
benefits of determining the best possible HRF estimate and potential negative
consequences of assuming HRF consistency across subjects or brain regions.

PMID: 15050587 [PubMed - indexed for MEDLINE]



3: J Neurophysiol. 2004 Jan;91(1):286-300. Epub 2003 Oct 01. 

Activity of neurons in cortical area MT during a memory for motion task.

Bisley JW, Zaksas D, Droll JA, Pasternak T.

Department of Neurobiology and Anatomy and Center for Visual Science, University
of Rochester, Rochester, New York 14642, USA.

We recorded the activity of middle temporal (MT) neurons in 2 monkeys while they
compared the directions of motion in 2 sequentially presented random-dot
stimuli, sample and test, and reported them as the same or different by pressing
one of 2 buttons. We found that MT neurons were active not only in response to
the sample and test stimuli but also during the 1,500-ms delay separating them.
Most neurons showed a characteristic pattern of activity consisting of a small
burst of firing early in the delay, followed by a period of suppression and a
subsequent increase in firing rate immediately preceding the presentation of the
test stimulus. In a third of the neurons, the activity early in the delay not
only reflected the direction of the sample stimulus, but was also related to the
range of local directions it contained. During the middle of the delay the
majority of neurons were suppressed, consistent with a gating mechanism that
could be used to ignore task-irrelevant stimuli. Late in the delay, most neurons
showed an increase in response, probably in anticipation of the upcoming test.
Throughout most of the delay there was a directional signal in the population of
MT neurons, manifested by higher firing rates following the sample moving in the
antipreferred direction. Whereas some of these effects may be related to sensory
adaptation, others are more likely to represent a more active task-related
process. These results support the hypothesis that MT neurons actively
participate in the successful execution of all aspects of the task requiring
processing and remembering visual motion.

PMID: 14523065 [PubMed - indexed for MEDLINE] 
Topic revision: r13 - 15 Nov 2004, DavidAirey
 
