PlottingModelResults < Main < Vanderbilt Biostatistics Wiki

You are here: Vanderbilt Biostatistics Wiki>Main Web>Seminars>RClinic>PlottingModelResults (14 Nov 2006, TheresaScott) (raw view)EditAttach
---+ Plotting linear model results

Often times, you would like to generate graphics based on a model you fit in R.  These plots may include a number of diagnostic plots, or just plotting the fitted line with prediction and confidence bands.

This solution is a nearly exact excerpt from specific sections in the following book:
   * _Introductory Statistics with R_ by Peter Dalgaard (Springer, 2002).

Specifically, I have excerpted paragraphs and have used and modified the R code from _Chapter 5: Regression and Correlation_ Section 1, 2, and 3 and _Chapter 10: Linear Models_ Section 8.

All of the data sets used in Peter Dalgaard's book can be found in the =ISwR= add-on package, which is available for download/installation on the _CRAN_ website (http://cran.r-project.org/).

For this solution, we will be using the =thuesen= built-in data set from the =ISwR= add-on package.  According to the =ISwR= description, the =thuesen= data set "contains ventricular shortening velocity and blood glucose for type 1 diabetic patients."  =thuesen= contains two columns:
   * =blood.glucose= - fasting blood glucose (mmol/l) (numeric)
   * =short.velocity= - mean circumferential shortening velocity (%/s) (numeric)

In case some of you have problems installing the =ISwR= add-on package, I decided to generate the =thuesen= data set "from scratch" and proceed from there.

<highlight>
thuesen<-data.frame(
   blood.glucose=c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3,
      11.1, 7.5, 12.2, 6.7, 5.2, 19.0, 15.1, 6.7, 8.6, 4.2, 10.3,
      12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
   short.velocity=c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09,
      1.18, 1.22, 1.25, 1.19, 1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19,
      1.05, 1.32, 1.03, 1.12, 1.70))
</highlight>

For those of you who are able to properly install the =ISwR= package, you can load the library and read in the =thuesen= data set as follows:
<highlight>
library(ISwR)
data(thuesen)
</highlight>

Let's fit a simple linear regression model using the =thuesen= data.  For convenience, let's assign the value(s) returned by the =lm= function under the name =lm.velo=:
<highlight>
lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen)
lm.velo
</highlight>

As Peter Daalgard states on page 97:
   "The result of =lm= is a _model object_. This is a distinctive concept of the S language    (of which R is a dialect).  Where other statistical systems focus on generating printed output that can be controlled by setting options, you get instead the result of a model
fit encapsulated in an object from which the desired quantities can be obtained using _extractor functions_.  An =lm= object does in fact contain much more information than you see when it is printed."

A basic extractor function is the =summary()= function:
<highlight>
summary(lm.velo)
</highlight>

Let's first %BLUE% *plot just the points and the fitted line of the model fit* %ENDCOLOR%:
<highlight>
with(thuesen, plot(blood.glucose, short.velocity,
   xlab="Blood Glucose", ylab="Short Velocity",
   main="Plot of Thuesen Data"))
abline(lm.velo)
</highlight>

Now let's consider *residuals* and *fitted values*.  Two further extractor functions are the =fitted()= and =resid()= functions.

As Peter Daalgard states on page 100 - 101:
   * "The fitted function returns fitted values - the y-values that you would expect for the given x-values according to the best fitting straight line", which was found by =lm=. 
   * "The residuals shown by the =resid= function is the difference between this and the observed *short.velocity*".
<highlight>
fitted(lm.velo)
resid(lm.velo)
</highlight>

Notice, observation 16 has a missing *short.velocity*, so no fitted value or residual is calculated.  For ease of plotting, we will want to have calculated fitted values and residuals for all observations.  A easy way to handle missing values is by using the =na.exclude= method for missing value (i.e. =NA=) handling.  We do this using the =options= function:
<highlight>
options(na.action="na.exclude")
</highlight>

This overrides the default option of =na.action = "na.omit"=.  =na.omit= removes any incomplete cases from the data set before calculating the fitted values and residuals.  On the other hand, =na.exclude= inset =NA=s for cases omitted by =na.omit= in order to pad the residuals and predictions to the correct length.

In the future, you can determine the default of a particular option by using the =getOption()= function in a similar fashion to this:
<highlight>
# To see default na.action option:
getOption("na.action")
# [1] "na.omit"
</highlight>

Peter Dalgaard discusses more ways to handle missing values on page 101.

We now have to refit =lm.velo= after changing the =na.action= option, and if we look at the fitted values, we see that observation 16 has a missing fitted value (=NA=):
<highlight>
lm.velo<-lm(short.velocity ~ blood.glucose, data=thuesen)
fitted(lm.velo)
</highlight>

Now let's create a %BLUE% *plot where residuals are displayed by connecting observations to corresponding points on the fitted line* %ENDCOLOR%:
<highlight>
with(thuesen, plot(blood.glucose, short.velocity,
   xlab="Blood Glucose", ylab="Short Velocity",
   main="Plot of Thuesen Data"))
abline(lm.velo)
with(thuesen, segments(x0=blood.glucose, 
   y0=fitted(lm.velo),
   x1=blood.glucose, 
   y1=short.velocity))
</highlight>

We can also generate a simple %BLUE% *plot of residuals versus fitted values* %ENDCOLOR%:
<highlight>
plot(fitted(lm.velo), resid(lm.velo),
   xlab="Fitted Values from lm.velo",
   ylab="Residuals from lm.velo",
   main="Residuals versus Fitted values from lm.velo")
</highlight>

We can also get an indication of whether the residuals might have come from a normal distribution by checking for a straight line on a %BLUE% *Q-Q plot* %ENDCOLOR%:
<highlight>
qqnorm(resid(lm.velo))
qqline(resid(lm.velo))
</highlight>

Illustration and explanation of further *regression diagnostics* is given in section 10.8.  A basic set of %BLUE% *four regression diagnostic plots* %ENDCOLOR% is available via the plot method for the =lm= objects:
   1 residual versus fitted values
   2 Q-Q normal distribution plot of standardized residuals
   3 plot of the square root of the absolute value of the standardized residuals 
   4 plot of "Cook's distance", which is a measure of the influence of each observation on the regression coefficients
<highlight>
par(mfro=c(2,2))
plot(lm.velo)
# To return to the default of one graph:
par(mfrow=c(1,1))
</highlight>

The last thing I am going to demonstrate how to %BLUE% *plot the fitted line of your model including the prediction and confidence bands* %ENDCOLOR%.  There is a complete explanation of the procress to produce this plot in section 5.3.

As Peter Dalgaard explains on page 106:
   "First we create a new data frame in which the *blood.glucose* variable contains the values at which we want predictions to be made":
<highlight>
pred.frame<-data.frame(blood.glucose=4:20)
</highlight>

Next we generate two vectors which "contain the result of =predict= for the new data in =pred.frame=", specifically (1) the prediction limits, and (2) the confidence limits:
<highlight>
pp<-predict(lm.velo, int="p", newdata=pred.frame)
pc<-predict(lm.velo, int="c", newdata=pred.frame)
</highlight>

"Now for the plotting: first we create a standard scatterplot, except we ensure that it has enough room for the prediction limits", by incorporating the =ylim= argument.
<highlight>
with(thuesen, plot(blood.glucose, short.velocity,
   ylim=range(short.velocity, pp, na.rm=T),
   xlab="Blood glucose", ylab="Short Velocity",
   main="Plot with Confidence and Prediction Bands"))
</highlight>

"Finally the curves are added, using as c-values the *blood.glucose* used for the prediction, and setting the line types and colours to more sensible values."
<highlight>
matlines(pred.frame$blood.glucose, pc, lty=c(1,2,2), col="black")
matlines(pred.frame$blood.glucose, pp, lty=c(1,3,3), col="black")
</highlight>

We now probably want to set the =na.action= option back to ="na.omit"= instead of ="na.exclude"=":
<highlight>
options(na.action="na.omit")
</highlight>

---++ Acknowledgements:

I would like to thank Richard Urbano for posing this problem.
Topic revision: r5 - 14 Nov 2006, TheresaScott
Main
Department Home Page
Biostatistics Graduate Program
Vanderbilt University Medical Center
Biostatistics Webs
- Archive
- Main
- Sandbox
- System
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback