On allstat a few years ago, a weird example was mentioned. Enter a column of zeros and set a cell to contain their sum. Now change one cell to O (upper case O). Now change another cell to l (lower case l). What happens? What would be reasonable behaviour */for statistical purposes/*? Do you want to trust this software?

I tried this in Excel and wasn't really sure what the problem was: the cell still just showed the sum. He doesn't really explain the "reasonable behavior" he is looking for. It made sense to me (and that worries me!).

FrankHarrell: I haven't tried that myself. I take it that the non-numerics were not properly excluded from the sum or, more likely, that a sum of the non-missing values was computed instead of giving an answer of NA that would alert the analyst to bad data in the column. There may also be a problem when counting the number of non-missing entries for the purposes of getting a mean.
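To make the two behaviours concrete, here is a minimal Python sketch. It is not how any spreadsheet is actually implemented; the function names `lenient_sum`, `strict_sum`, and `lenient_mean` are hypothetical, and the "lenient" versions only imitate the behaviour guessed at above (text cells silently skipped).

```python
import math


def _is_number(cell):
    # Treat ints and floats as numeric; exclude bool, which is a subclass of int.
    return isinstance(cell, (int, float)) and not isinstance(cell, bool)


def lenient_sum(cells):
    """Assumed spreadsheet-like behaviour: silently skip non-numeric cells."""
    return sum(c for c in cells if _is_number(c))


def strict_sum(cells):
    """Statistically cautious behaviour: return NaN if any cell is non-numeric."""
    if not all(_is_number(c) for c in cells):
        return math.nan
    return sum(cells)


def lenient_mean(cells):
    """Mean over numeric cells only; the denominator shrinks without warning."""
    numeric = [c for c in cells if _is_number(c)]
    return sum(numeric) / len(numeric) if numeric else math.nan


# A column of zeros in which two cells were typed as the letters O and l.
column = [0, 0, "O", 0, "l", 0]

print(lenient_sum(column))   # the total is unchanged, so nothing looks wrong
print(strict_sum(column))    # NaN flags the bad entries
print(lenient_mean(column))  # computed from 4 cells, not 6
```

The lenient total is the same as before the letters were typed, which is exactly why the mistake goes unnoticed; the strict version forces the analyst to look at the column.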
We record action potentials ("spikes") from awake animals. Each spike signals the firing of a neuron. The firing obeys complicated, non-stationary, non-linear, dynamic and probably deterministic processes within the neuron that we cannot measure. The rate of spikes changes during certain experimental manipulations, and the change exhibits variability.

First problem (difficult): Estimate physiologically relevant parameters of the underlying process within the cell from the spike data.

Second problem (easier): Assess the effect of the experimental manipulations on the “activity” of the cell during different conditions, as measured in terms of the cell spike rate. The conditions are:

A: No stimulus --> Basal activity; Stimulus A --> Response to A
B: No stimulus --> Basal activity; Stimulus B --> Response to B

Each condition is repeated several times in a block design (e.g. A: 20 trials; B: 20 trials; A: another 20 trials or so; B: ...). The basal activity and the response to the stimuli may change over time, sometimes monotonically, sometimes exhibiting a convex shape. The experiment is repeated for different cells from the same animal (never the same cell), in two or three animals. The problem will then be (1) to find the relative effect of condition A and condition B on the population of cells, regardless of non-stationarities and of intertrial, cell-to-cell, and animal-to-animal variability, and (2) to assess whether the sample contains “atypical” response cells, i.e. cells for which Resp(A) is significantly larger or smaller than Resp(B), with a given level of confidence. Any suggestions?

DavidAirey 12 Nov 2004: You bring to the table a fundamental analysis paradigm in neurophysiology that I had a running argument about with two editors-in-chief of the Journal of Neurophysiology. I wanted them to defend what counts as appropriate and inappropriate analysis of subsampling from intact organisms exposed to multiple treatment conditions, or at least to publish guidelines.
They have only just instituted some guidelines for the Journal, but ignored the questions raised by you and me. The problem is that subsampling organisms in the way described results in two sources of systematic error that lead to data clustering (TIME and ANIMAL), so that failure to deal with this complex error structure will compromise your ability to make correct inferences. What really needs to be done in this field is to assess the intraclass correlation (ICC) for in vivo neurophysiological data, to determine whether the animal correlations are weak enough to warrant making inferences about the population of neurons from the population of animals while ignoring variation at the level of the animal, or while having a very sparse (N = 3) animal level in the data set. My personal opinion is that there is something fundamental that has gone unaddressed in the field of non-human primate neurophysiology work. An early opinion of mine, which has changed a little since it was posted, can be read at http://homepage.mac.com/david.airey/vita/pooling.htm. I am trying to rework this piece for the Journal of Neuroscience Methods and would be happy to collaborate. I personally think poking the same exact 10 neurons (they are numbered, you know) in the worm C. elegans from different genetic strains would be informative, in that a good ICC could be estimated. I invite the Department of Biostatistics to weigh in on this question with their expertise. The field of neurophysiology is very sophisticated for within-animal analysis.

FrankHarrell: Yu-Chieh Yang, Anna Liu, and Yuedong Wang have done some nice work on pulsatile hormone release that may possibly relate. See http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=statistics+pulsatile&btnG=Search and their R software. Dan Keenan at the University of Virginia has also done some nice work, with Johannes Veldhuis.

DavidAirey 14 Nov 2004: It seems the thing to do is to set up a formal interaction with a statistician from Biostatistics.
I'm sure you will eventually get answers that satisfy your requirements. My own opinions (above) are not informed by expertise in your field, but are motivated by the need to critically evaluate published results. I just looked up the Keenan and Veldhuis work on Entrez. They typically have many more patients than a non-human primate electrophysiology experiment can hope to have. I've also skimmed Wang's paper (2004), in which they discuss data from 72 patients.
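As a rough illustration of the intraclass correlation discussed above, the following Python sketch computes a one-way random-effects ICC from the usual ANOVA mean squares, with cells grouped by animal. It is only a sketch under stated assumptions: the function name `icc_oneway` is hypothetical, the numbers are made up purely to show the mechanics, and it ignores the TIME clustering and non-stationarity raised in the thread, which a mixed-effects model fitted with a statistician would need to handle.

```python
def icc_oneway(groups):
    """One-way random-effects ICC from ANOVA mean squares.

    groups: dict mapping an animal id to a list of per-cell responses
    (for example, one summary response value per recorded cell).
    """
    sizes = [len(values) for values in groups.values()]
    n_groups = len(groups)
    n_total = sum(sizes)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least 2 animals and more cells than animals")

    grand_mean = sum(sum(values) for values in groups.values()) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in values)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Effective group size; equals the common size when the design is balanced.
    k0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)

    denominator = ms_between + (k0 - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variation in the data; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Made-up responses for three animals, three cells each (illustration only).
toy = {
    "animal1": [1.0, 2.0, 3.0],
    "animal2": [11.0, 12.0, 13.0],
    "animal3": [21.0, 22.0, 23.0],
}
icc = icc_oneway(toy)
print(round(icc, 3))
```

In the toy data the animals differ far more than the cells within an animal, so the ICC is close to 1 and pooling cells across animals as if they were independent would overstate the effective sample size. With only two or three animals the estimate itself will be very imprecise, which is part of the concern raised above.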