Conversion of SAS Code to Compute Multiple Stratified Statistics on More than One Analysis Variable

Greg Adkison gadkison@email.wcu.edu writes:

Say for example that I have a dataset named "dat1" that includes five variables: wshed, site, species, bda, and sla. I can calculate with the following SAS code the mean, CV, se, and number of observations of "bda" and "sla" for each combination of "wshed," "species," and "site," restricting the species considered to only three of several species in dat1 (b, c, and p). Moreover, I can output these calculations and grouping variables to a dataset named "dat2" that will reside in RAM and include the variables wshed, site, species, mBdA, msla, cBda, sBdA, ssla, nBda, and nsla.
proc sort data=dat1;
  by wshed site species;
proc means data=dat1 noprint mean cv stderr n;
  by wshed site species;
  where species in ('b', 'c', 'p');
  var BdA sla;
  output out=dat2
    mean=mBdA msla
    cv=cBdA csla
    stderr=sBdA ssla
    n=nBdA nsla;

The following handles any number of analysis variables, with automatic naming of all statistics computed from them. Note: as of 15Oct06 you will have to get a new version of summarize to fix a bug with stat.name=NULL. To do so do the following. Note that getLatestSource will be in the next version of Hmisc.
source('http://biostat.mc.vanderbilt.edu/tmp/getLatestSource.s')
getLatestSource(c('summary.formula','mApply'))

# Generate some data
set.seed(1)
dat1 <- expand.grid(wshed=1:2, site=c('A','B'),
                    species=c('a','b','c','p'),
                    reps=1:10)
n <- nrow(dat1)
dat1 <- transform(dat1, 
                  BdA = rnorm(n, 100, 20),
                  sla = c(rnorm(n-1, 200, 30), NA))
# Can use upData function in Hmisc in place of transform

# Summarization function, per stratum
g <- function(y) {
  s <- apply(y, 2,
             function(z) {
               z <- z[!is.na(z)]
               n <- length(z)
               if(n==0) c(NA,NA,NA,0) else
               if(n==1) c(z, NA,NA,1) else {
                 m <- mean(z)
                 s <- sd(z)
                 c(Mean=m, CV=s/m, SE=s/sqrt(n), N=n)
               }
             })
  w <- as.vector(s)
  names(w) <-  as.vector(outer(rownames(s), colnames(s), paste, sep=''))
  w
}
library(Hmisc)
dat2 <-  with(dat1,
              summarize(cbind(BdA, sla),
                        llist(wshed, site, species),
                        g,
                        subset=species %in% c('b','c','p'),
                        stat.name=NULL))
              
options(digits=3)
dat2  # is a data frame

   wshed site species MeanBdA CVBdA SEBdA NBdA Meansla  CVsla SEsla Nsla
1      1    A       b   100.5 0.133  4.23   10     195 0.1813 11.20   10
2      1    A       c    99.7 0.101  3.17   10     206 0.1024  6.68   10
3      1    A       p   101.4 0.239  7.65   10     188 0.1580  9.39   10
4      1    B       b   109.9 0.118  4.09   10     203 0.1433  9.21   10
5      1    B       c    98.4 0.193  6.01   10     221 0.1250  8.72   10
6      1    B       p   102.9 0.216  7.03   10     203 0.1446  9.29   10
7      2    A       b    95.8 0.241  7.31   10     195 0.2011 12.40   10
8      2    A       c    98.7 0.194  6.04   10     207 0.1274  8.33   10
9      2    A       p   102.2 0.217  7.01   10     191 0.1709 10.31   10
10     2    B       b    97.8 0.235  7.27   10     191 0.2079 12.58   10
11     2    B       c   100.9 0.164  5.24   10     194 0.0987  6.07   10
12     2    B       p   103.0 0.144  4.69   10     209 0.0769  5.35    9

# Another approach, but does casewise deletion of NAs and computes all
# possible marginal summaries
dat2 <- summary(cbind(BdA, sla) ~ wshed + site + species,
                data=dat1, fun=g, method='cross')
print.data.frame(dat2)  # after Sept 2004 just say dat2
# dat2 is similar to but not really a data frame, with a matrix of statistics S
# Remove method='cross' to get all one-way summaries

-- FrankHarrell - 17-18 Jul 2004
Topic revision: r3 - 15 Oct 2006, FrankHarrell
 

This site is powered by FoswikiCopyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback