The generic datadensity() function
The datadensity() function is a generic function used to show data densities in more complex situations. There are two class-specific methods of the generic datadensity() function:
- The Hmisc package's datadensity.data.frame() function, which displays the variables in a data frame.
- The Design package's datadensity.plot.Design() function, which can be used in conjunction with the Design package's plot.Design() function. The plot.Design() function is used to plot the results of a regression model fit with one of the Design package's regression functions (e.g., ols(), lrm(), and cph()).
library(Hmisc)
library(Design)
methods(datadensity)
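As a rough illustration of how a generic function selects a class-specific method, here is a minimal sketch of S3 dispatch in base R. The generic density_strip() and its methods are hypothetical names invented for this example; they are not part of Hmisc or Design.

```r
# Hypothetical generic, named so that it does not clash with datadensity()
density_strip <- function(object, ...) UseMethod("density_strip")

# Method selected when the object's class is "data.frame"
density_strip.data.frame <- function(object, ...) {
  paste("data frame with", ncol(object), "variables")
}

# Fallback used for any class that has no method of its own
density_strip.default <- function(object, ...) {
  paste("no specific method for class", class(object)[1])
}

df_result  <- density_strip(data.frame(a = 1:3, b = letters[1:3]))
num_result <- density_strip(1:3)
print(df_result)   # "data frame with 2 variables"
print(num_result)  # "no specific method for class integer"
```

Calling methods(density_strip) on this toy generic would list both methods, in the same way that methods(datadensity) lists the methods available once Hmisc and Design are loaded.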
To illustrate the differences between the two methods, let's use the samplefile.txt data file.
x <- read.table("samplefile.txt", header=TRUE)
x <- upData(x,
    labels=c(age="Age", race="Race", sex="Sex",
        weight="Weight", visits="No. of Visits",
        tx="Treatment"),
    levels=list(sex=c("Female", "Male"),
        race=c("Black", "Caucasian", "Other"),
        tx=c("Drug X", "Placebo")),
    units=c(age="years", weight="lbs."))
contents(x)
Let's first illustrate the Hmisc package's datadensity.data.frame() method. As mentioned, this method displays the variables of a data frame. More specifically, rug plots are used to display continuous variables and, by default, bar plots are used to display frequencies of categorical, character, or discrete numeric variables.
- By default, the datadensity.data.frame() function will construct one axis (i.e., one strip) per variable in the data frame.
- Variable names appear to the left of the axes, and the number of missing values (if greater than zero) appears to the right of the axes.
- For categorical or character variables, only the first few characters from each level are used when the total length of the value labels exceeds 200 characters.
- An optional group= variable can be used for stratification, where the different strata are depicted using different colors.
- If the q= argument is specified, the desired quantiles (over all groups) are displayed with solid triangles below each axis.
Here are some specific examples.
datadensity(x)
datadensity(x, which = "continuous")
datadensity(x, group = x$race)
datadensity(x, group = x$tx)
datadensity(x, ranges = list(age = c(5, 100)))
datadensity(x, q = c(0.25, 0.5, 0.75)) # tiny triangles
datadensity(x,
labels = as.character(contents(x)$contents[,"Labels"]))
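To make the idea of a single strip more concrete, here is a base-graphics sketch of one rug plot with quantile markers and a missing-value count. It only approximates the layout described above and is not how datadensity.data.frame() is implemented; the simulated age vector and the marker placement are assumptions made for illustration.

```r
set.seed(1)                                       # reproducible toy data
age <- c(rnorm(50, mean = 50, sd = 12), NA, NA)   # hypothetical variable with 2 NAs

n_missing <- sum(is.na(age))                      # count shown to the right of the strip
qs <- quantile(age, c(0.25, 0.5, 0.75), na.rm = TRUE)

plot.new()
plot.window(xlim = range(age, na.rm = TRUE), ylim = c(0, 1))
axis(1)                                           # the strip's axis
rug(age[!is.na(age)], ticksize = 0.1)             # one tick per non-missing observation
points(qs, rep(0.05, length(qs)), pch = 17)       # solid triangles at the quantiles
mtext("age", side = 2, las = 1)                   # variable name on the left
if (n_missing > 0) mtext(paste("NA =", n_missing), side = 4, las = 1)
```

In this sketch the triangles sit just above the axis line to keep the code short, whereas datadensity() draws them below each axis.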
Now let's illustrate the Design package's datadensity.plot.Design() method. For this, let's first add some variables to our x data frame in order to fit a logistic regression model using the Design package's lrm() function.
n <- nrow(x)  # number of observations, needed by runif(n) below
x <- upData(x,
    # Specify population model for log odds that Y=1
    L = .4*(sex=='Male') + .045*(age-50) +
        (log(weight - 10)-5.2)*(-2*(sex=='Female') +
        2*(sex=='Male')),  # coefficient 2 is an assumption (as in the lrm() help example)
    # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
    y = ifelse(runif(n) < plogis(L), 1, 0))
ddist <- datadist(x) ; options(datadist='ddist')
mfit <- lrm(y ~ visits + sex * (age + rcs(weight,4)),
data = x, x=TRUE, y=TRUE)
anova(mfit)
z <- plot(mfit, age=NA)
with(x, datadensity(z, age))
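The simulation step used to create y can be hard to read inside upData(), so here is a self-contained sketch of the same idea in base R: draw a binary outcome whose probability is the inverse logit of a linear predictor. The sample size, the age covariate, and the single coefficient are hypothetical and much simpler than the model above.

```r
set.seed(42)                       # reproducible draws
n   <- 1000                        # hypothetical sample size
age <- runif(n, 20, 80)            # hypothetical covariate
L   <- 0.045 * (age - 50)          # log odds that y = 1
p   <- plogis(L)                   # inverse logit, i.e. 1 / (1 + exp(-L))
y   <- ifelse(runif(n) < p, 1, 0)  # y = 1 with probability p

# plogis() agrees with the formula in the comment above
print(all.equal(p, 1 / (1 + exp(-L))))
# The observed proportion of ones should be close to the average probability
print(c(observed = mean(y), expected = mean(p)))
```

Comparing a uniform draw with p is one common way to generate Bernoulli outcomes; rbinom(n, 1, p) would be an equivalent alternative.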