

---+ The generic datadensity() function

The =datadensity()= function is a generic function used to show data densities in more complex situations.  The generic has two class-specific methods:
   1 The =Hmisc= package's =datadensity.data.frame()= function, which displays the variables in a data frame.
   2 The =Design= package's =datadensity.plot.Design()= function, which is used in conjunction with the =Design= package's =plot.Design()= function.  =plot.Design()= plots the results of a regression model fit with one of the =Design= package's regression functions (e.g., =ols()=, =lrm()=, and =cph()=).
<highlight>
library(Hmisc)
library(Design)

methods(datadensity)
</highlight>
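
For readers new to generic functions, here is a minimal base R sketch of how a generic picks a class-specific method.  The generic =show_density()= and its methods are hypothetical names made up for this illustration; they are not part of =Hmisc= or =Design=.
<highlight>
# Hypothetical generic (not from Hmisc or Design): UseMethod() dispatches
# on the class of the first argument
show_density <- function(object, ...) UseMethod("show_density")

# Method used when the first argument is a data frame
show_density.data.frame <- function(object, ...) {
   paste("data frame with", ncol(object), "variables")
}

# Fallback used when no class-specific method exists
show_density.default <- function(object, ...) {
   paste("no specific method for class", class(object)[1])
}

# Toy data, for illustration only
df <- data.frame(age = c(34, 51, 47), sex = c("F", "M", "F"))

show_density(df)       # dispatches to show_density.data.frame()
show_density(1:3)      # dispatches to show_density.default()
methods(show_density)  # lists the methods defined above
</highlight>
In the same way, =datadensity(x)= reaches =datadensity.data.frame()= when =x= is a data frame.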

To illustrate the differences between the two methods, let's use the =samplefile.txt= data file.
   * [[%ATTACHURL%/samplefile.txt][samplefile.txt]]
<highlight>
x<-read.table("samplefile.txt", header=TRUE)

x<-upData(x,
   labels=c(age="Age", race="Race", sex="Sex",
      weight="Weight", visits="No. of Visits",
      tx="Treatment"),
   levels=list(sex=c("Female", "Male"),
      race=c("Black", "Caucasian", "Other"),
      tx=c("Drug X", "Placebo")),
   units=c(age="years", weight="lbs."))
contents(x)
</highlight>

Let's first illustrate the =Hmisc= package's =datadensity.data.frame()= method.  As mentioned, this method displays the variables of a data frame.  More specifically, rug plots are used to display continuous variables and, by default, bar plots are used to display frequencies of categorical, character, or discrete numeric variables.  
   * By default, the =datadensity.data.frame()= function will construct one axis (i.e., one strip) per variable in the data frame. 
   * Variable names appear to the left of the axes, and the number of missing values (if greater than zero) appears to the right of each axis.
   * For categorical or character variables, only the first few characters from each level are used when the total length of the value labels exceeds 200.  
   * An optional =group== variable can be used for stratification, where the different strata are depicted using different colors. 
   * If the =q== argument is specified, the desired quantiles (over all groups) are displayed with solid triangles below each axis.
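
To make the bullets above concrete, here is a rough base R sketch of the information a single strip conveys: the tick positions and quantiles for a continuous variable, the level frequencies for a categorical one, and the number of missing values.  The helper =strip_summary()= and the toy data frame =d= are assumptions made up for this illustration; this is not how =datadensity.data.frame()= is implemented, and the sketch ignores the special handling of discrete numeric variables.
<highlight>
# Toy data; names and values are illustrative only
d <- data.frame(
   age = c(23, 35, 41, NA, 58, 62, 47, 30),
   tx  = factor(c("Drug X", "Placebo", "Drug X", "Drug X",
                  "Placebo", NA, "Placebo", "Drug X"))
)

# Hypothetical helper: summarize what one strip would display
strip_summary <- function(v, q = c(0.25, 0.5, 0.75)) {
   n_missing <- sum(is.na(v))           # shown to the right of the axis
   if (is.numeric(v)) {
      list(type      = "continuous",
           n_missing = n_missing,
           ticks     = sort(v[!is.na(v)]),   # positions of the rug ticks
           quantiles = quantile(v, probs = q, na.rm = TRUE))
   } else {
      list(type      = "categorical",
           n_missing = n_missing,
           freq      = table(v))             # bar heights, one per level
   }
}

s <- lapply(d, strip_summary)   # one summary per variable (one per strip)
s$age$quantiles
s$tx$freq
</highlight>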

Here are some specific examples.
<highlight>
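datadensity(x)                                  # one strip per variable
datadensity(x, which = "continuous")            # continuous variables only
datadensity(x, group = x$race)                  # stratify by race
datadensity(x, group = x$tx)                    # stratify by treatment
datadensity(x, ranges = list(age = c(5, 100)))  # set the axis range for age
datadensity(x, q = c(0.25, 0.5, 0.75))          # quartiles (tiny triangles)
# Use the variable labels instead of the variable names
datadensity(x, 
   labels = as.character(contents(x)$contents[,"Labels"]))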
</highlight>

Now let's illustrate the =Design= package's =datadensity.plot.Design()= method.  For this, let's first add some variables to our =x= data frame in order to fit a logistic regression model using the =Design= package's =lrm()= function.
<highlight>
x<-upData(x,
   # Specify population model for log odds that Y=1
   L = .4*(sex=='Male') + .045*(age-50) +
      (log(weight - 10)-5.2)*(-2*(sex=='Female') + 
      2*(sex=='Male')),
   # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
   # (one uniform draw per row of the data frame)
   y  = ifelse(runif(length(age)) < plogis(L), 1, 0))
ddist <- datadist(x) ; options(datadist='ddist')
mfit <- lrm(y ~ visits + sex * (age + rcs(weight,4)),
   data = x, x=TRUE, y=TRUE)
anova(mfit)
z <- plot(mfit, age=NA) 
with(x, datadensity(z, age))
</highlight>
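
The simulation step in the example above can be checked on its own.  Here is a small base R sketch, assuming a made-up sample size, seed, and single covariate (none of which come from =samplefile.txt=): drawing =y= as =runif(n) < plogis(L)= gives =Prob(y=1) = 1/[1+exp(-L)]=, so the observed proportion of ones should be close to the average of those probabilities.
<highlight>
set.seed(1)                  # arbitrary seed, for reproducibility only
n   <- 10000                 # hypothetical sample size
age <- runif(n, 20, 80)      # hypothetical covariate
L   <- 0.045 * (age - 50)    # log odds that y = 1
p   <- plogis(L)             # inverse logit of L
y   <- ifelse(runif(n) < p, 1, 0)

# plogis(L) agrees with the formula in the comment above
all.equal(p, 1 / (1 + exp(-L)))

# Observed proportion of ones versus the average model probability
c(observed = mean(y), expected = mean(p))
</highlight>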

Topic revision: r2 - 14 Nov 2006, TheresaScott
