
The generic datadensity() function

The datadensity() function is a generic function used to show data densities in more complex situations. It has two class-specific methods:

The Hmisc package's datadensity.data.frame() function, which displays the variables in a data frame.
The Design package's datadensity.plot.Design() function, which can be used in conjunction with the Design package's plot.Design() function. The Design package's plot.Design() function is used to plot the results of a regression model fit with one of the Design package's regression functions (e.g., ols(), lrm(), and cph()).

library(Hmisc)
library(Design)

methods(datadensity)
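
The idea of one generic function with class-specific methods can be sketched with a toy S3 generic in base R. The names show_density and toy_fit below are hypothetical and are not part of Hmisc or Design; the sketch only shows how R picks a method from the class of the first argument, which is the mechanism datadensity() relies on.

```r
# Toy S3 generic; 'show_density' and the class 'toy_fit' are hypothetical names
show_density <- function(object, ...) UseMethod("show_density")

# Method used when the argument is a data frame
show_density.data.frame <- function(object, ...) {
  paste("data frame with", ncol(object), "variables")
}

# Method used when the argument has class 'toy_fit'
show_density.toy_fit <- function(object, ...) {
  paste("fit object with", length(object$coef), "coefficients")
}

df  <- data.frame(age = c(34, 51, 47), sex = c("F", "M", "F"))
fit <- structure(list(coef = c(0.4, 0.045)), class = "toy_fit")

# The same call dispatches to a different method for each class
res_df  <- show_density(df)
res_fit <- show_density(fit)
print(res_df)
print(res_fit)
```

In the same way, datadensity(x) on a data frame reaches datadensity.data.frame(), while a call on the object returned by plot.Design() reaches datadensity.plot.Design().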

To illustrate the differences between the two methods, let's use the samplefile.txt data file.

samplefile.txt

x <- read.table("samplefile.txt", header=TRUE)

x <- upData(x,
  labels=c(age="Age", race="Race", sex="Sex", weight="Weight",
           visits="No. of Visits", tx="Treatment"),
  levels=list(sex=c("Female", "Male"),
              race=c("Black", "Caucasian", "Other"),
              tx=c("Drug X", "Placebo")),
  units=c(age="years", weight="lbs."))
contents(x)

Let's first illustrate the Hmisc package's datadensity.data.frame() method. As mentioned, this method displays the variables of a data frame. More specifically, rug plots are used to display continuous variables and, by default, bar plots are used to display the frequencies of categorical, character, or discrete numeric variables.

By default, the datadensity.data.frame() function will construct one axis (i.e., one strip) per variable in the data frame.
Variable names appear to the left of the axes, and the number of missing values (if greater than zero) appears to the right of the axes.
For categorical or character variables, only the first few characters of each level are used when the total length of the value labels exceeds 200 characters.
An optional group= variable can be used for stratification, where the different strata are depicted using different colors.
If the q= argument is specified, the desired quantiles (over all groups) are displayed with solid triangles below each axis.
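
The quantities behind two of these annotations can be reproduced in base R. The sketch below uses a small made-up data frame (toy, not samplefile.txt) and computes the number of missing values per variable and the quartiles of a continuous variable pooled over all groups. It illustrates the quantities only; it is not how Hmisc computes or draws them.

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  age = c(23, 35, NA, 41, 58, 62, 47, 30),
  tx  = factor(c("Drug X", "Placebo", "Drug X", NA,
                 "Placebo", "Drug X", "Placebo", "Drug X"))
)

# Count of missing values for each variable
n_missing <- sapply(toy, function(v) sum(is.na(v)))

# Quartiles of age over all rows, ignoring the treatment group
age_q <- quantile(toy$age, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

print(n_missing)
print(age_q)
```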

Here are some specific examples.

datadensity(x)
datadensity(x, which = "continuous")
datadensity(x, group = x$race)
datadensity(x, group = x$tx)
datadensity(x, ranges = list(age = c(5, 100)))
datadensity(x, q = c(0.25, 0.5, 0.75)) # tiny triangles
datadensity(x, labels = as.character(contents(x)$contents[,"Labels"]))

Now let's illustrate the Design package's datadensity.plot.Design() method. For this, let's first add some variables to our x data frame in order to fit a logistic regression model using the Design package's lrm() function.

x <- upData(x,
  # Specify population model for log odds that Y=1
  L = .4*(sex=='Male') + .045*(age-50) +
      (log(weight - 10)-5.2)*(-2*(sex=='Female') + 1*(sex=='Male')),
  # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)],
  # using one uniform draw per row of the data frame
  y = ifelse(runif(length(L)) < plogis(L), 1, 0))

ddist <- datadist(x)
options(datadist='ddist')
mfit <- lrm(y ~ visits + sex * (age + rcs(weight,4)), data = x, x=TRUE, y=TRUE)
anova(mfit)
z <- plot(mfit, age=NA)
with(x, datadensity(z, age))

Topic revision: r2 - 14 Nov 2006, TheresaScott
