Rafe's Dataset Reshape Problem
Rafe posed the following problem. The original data set had three measurements for each 'specimen' (spec). The goal was was to turn a wider data set into one that was three times as tall. There were two rows for each specimen, one for the left hand and one for the right hand. At the end, there is a variable for each hand and three readings for each of those variables. So the result is three observations for each specimen, since specimens were nested within gender. The SAS code follows.
proc sort data=measurements out=measurements; by gender spec hand;
run;
data measurements;
set measurements(drop=ra_ fcr_ pl_ mn_ fcu_ unvb_);
obnumber = _n_;
percentra = ra/width;
percentfcr = fcr/width;
percentpl = pl/width;
percentmn = mn/width;
percentfcu = fcu/width;
percentunvb = unvb/width;
run;
proc print data=measurements u; run;
proc format;
value $loc 'percentra'='r'
'percentmn'='m'
'percentunvb'='u';
run;
proc transpose data=measurements
out=talldata(rename=(col1=hand1 col2=hand2)) name=location;
by gender spec;
var percentra percentmn percentunvb;
run;
proc print u data=talldata; run;
proc plot data=talldata;
format location $loc.;
plot hand1*hand2=location; run;
There are at least four solutions in
R
: customized programming, the builtin
reshape
function, the
reShape
function in the
Hmisc
package, and the
reshape
package. The following solution is based on the last approach. A simulated dataset is used that is similar in design to Rafe's but not exactly the same.
library(Hmisc)
library(reshape)
set.seed(1)
d <- expand.grid(gender=c('f','m'), spec=c('a','b','c'), hand=c('L','R'),
loc=LETTERS[1:5])
n <- nrow(d)
d <- transform(d, p1=runif(n), p2=runif(n), p3=runif(n))
options(digits=2)
head(d)
gender spec hand loc p1 p2 p3
1 f a L A 0.27 0.91 0.99
2 m a L A 0.37 0.29 0.50
3 f b L A 0.57 0.46 0.48
4 m b L A 0.91 0.33 0.17
5 f c L A 0.20 0.65 0.75
6 m c L A 0.90 0.26 0.45
m <- melt(d, measure.var=c('p1','p2','p3'), preserve.na=FALSE) # no NAs here but in general ...
head(m)
gender spec hand loc variable value
1 f a L A p1 0.27
2 m a L A p1 0.37
3 f b L A p1 0.57
4 m b L A p1 0.91
5 f c L A p1 0.20
6 m c L A p1 0.90
head(cast(m, ... ~ hand + variable))
head(cast(m, ... ~ variable + hand))
gender spec loc L_p1 L_p2 L_p3 R_p1 R_p2 R_p3
1 f a A 0.27 0.91 0.992 0.94 0.479 0.51
2 f a B 0.69 0.35 0.036 0.38 0.777 0.99
3 f a C 0.27 0.76 0.729 0.48 0.240 0.61
4 f a D 0.79 0.46 0.530 0.78 0.270 0.45
5 f a E 0.73 0.92 0.724 0.07 0.148 0.38
6 f b A 0.57 0.46 0.484 0.63 0.084 0.23
gender spec loc p1_L p1_R p2_L p2_R p3_L p3_R
1 f a A 0.27 0.94 0.91 0.479 0.992 0.51
2 f a B 0.69 0.38 0.35 0.777 0.036 0.99
3 f a C 0.27 0.48 0.76 0.240 0.729 0.61
4 f a D 0.79 0.78 0.46 0.270 0.530 0.45
5 f a E 0.73 0.07 0.92 0.148 0.724 0.38
6 f b A 0.57 0.63 0.46 0.084 0.484 0.23
For more information about
reshape
see
http://had.co.nz/reshape .