You are here:
Vanderbilt Biostatistics Wiki
>
Main Web
>
Seminars
>
RClinic
>
PlottingModelResults
(14 Nov 2006,
TheresaScott
)
(raw view)
E
dit
A
ttach
---+ Plotting linear model results Often times, you would like to generate graphics based on a model you fit in R. These plots may include a number of diagnostic plots, or just plotting the fitted line with prediction and confidence bands. This solution is a nearly exact excerpt from specific sections in the following book: * _Introductory Statistics with R_ by Peter Dalgaard (Springer, 2002). Specifically, I have excerpted paragraphs and have used and modified the R code from _Chapter 5: Regression and Correlation_ Section 1, 2, and 3 and _Chapter 10: Linear Models_ Section 8. All of the data sets used in Peter Dalgaard's book can be found in the =ISwR= add-on package, which is available for download/installation on the _CRAN_ website (http://cran.r-project.org/). For this solution, we will be using the =thuesen= built-in data set from the =ISwR= add-on package. According to the =ISwR= description, the =thuesen= data set "contains ventricular shortening velocity and blood glucose for type 1 diabetic patients." =thuesen= contains two columns: * =blood.glucose= - fasting blood glucose (mmol/l) (numeric) * =short.velocity= - mean circumferential shortening velocity (%/s) (numeric) In case some of you have problems installing the =ISwR= add-on package, I decided to generate the =thuesen= data set "from scratch" and proceed from there. <highlight> thuesen<-data.frame( blood.glucose=c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3, 11.1, 7.5, 12.2, 6.7, 5.2, 19.0, 15.1, 6.7, 8.6, 4.2, 10.3, 12.5, 16.1, 13.3, 4.9, 8.8, 9.5), short.velocity=c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09, 1.18, 1.22, 1.25, 1.19, 1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19, 1.05, 1.32, 1.03, 1.12, 1.70)) </highlight> For those of you who are able to properly install the =ISwR= package, you can load the library and read in the =thuesen= data set as follows: <highlight> library(ISwR) data(thuesen) </highlight> Let's fit a simple linear regression model using the =thuesen= data. For convenience, let's assign the value(s) returned by the =lm= function under the name =lm.velo=: <highlight> lm.velo <- lm(short.velocity ~ blood.glucose, data = thuesen) lm.velo </highlight> As Peter Daalgard states on page 97: "The result of =lm= is a _model object_. This is a distinctive concept of the S language (of which R is a dialect). Where other statistical systems focus on generating printed output that can be controlled by setting options, you get instead the result of a model fit encapsulated in an object from which the desired quantities can be obtained using _extractor functions_. An =lm= object does in fact contain much more information than you see when it is printed." A basic extractor function is the =summary()= function: <highlight> summary(lm.velo) </highlight> Let's first %BLUE% *plot just the points and the fitted line of the model fit* %ENDCOLOR%: <highlight> with(thuesen, plot(blood.glucose, short.velocity, xlab="Blood Glucose", ylab="Short Velocity", main="Plot of Thuesen Data")) abline(lm.velo) </highlight> Now let's consider *residuals* and *fitted values*. Two further extractor functions are the =fitted()= and =resid()= functions. As Peter Daalgard states on page 100 - 101: * "The fitted function returns fitted values - the y-values that you would expect for the given x-values according to the best fitting straight line", which was found by =lm=. * "The residuals shown by the =resid= function is the difference between this and the observed *short.velocity*". <highlight> fitted(lm.velo) resid(lm.velo) </highlight> Notice, observation 16 has a missing *short.velocity*, so no fitted value or residual is calculated. For ease of plotting, we will want to have calculated fitted values and residuals for all observations. A easy way to handle missing values is by using the =na.exclude= method for missing value (i.e. =NA=) handling. We do this using the =options= function: <highlight> options(na.action="na.exclude") </highlight> This overrides the default option of =na.action = "na.omit"=. =na.omit= removes any incomplete cases from the data set before calculating the fitted values and residuals. On the other hand, =na.exclude= inset =NA=s for cases omitted by =na.omit= in order to pad the residuals and predictions to the correct length. In the future, you can determine the default of a particular option by using the =getOption()= function in a similar fashion to this: <highlight> # To see default na.action option: getOption("na.action") # [1] "na.omit" </highlight> Peter Dalgaard discusses more ways to handle missing values on page 101. We now have to refit =lm.velo= after changing the =na.action= option, and if we look at the fitted values, we see that observation 16 has a missing fitted value (=NA=): <highlight> lm.velo<-lm(short.velocity ~ blood.glucose, data=thuesen) fitted(lm.velo) </highlight> Now let's create a %BLUE% *plot where residuals are displayed by connecting observations to corresponding points on the fitted line* %ENDCOLOR%: <highlight> with(thuesen, plot(blood.glucose, short.velocity, xlab="Blood Glucose", ylab="Short Velocity", main="Plot of Thuesen Data")) abline(lm.velo) with(thuesen, segments(x0=blood.glucose, y0=fitted(lm.velo), x1=blood.glucose, y1=short.velocity)) </highlight> We can also generate a simple %BLUE% *plot of residuals versus fitted values* %ENDCOLOR%: <highlight> plot(fitted(lm.velo), resid(lm.velo), xlab="Fitted Values from lm.velo", ylab="Residuals from lm.velo", main="Residuals versus Fitted values from lm.velo") </highlight> We can also get an indication of whether the residuals might have come from a normal distribution by checking for a straight line on a %BLUE% *Q-Q plot* %ENDCOLOR%: <highlight> qqnorm(resid(lm.velo)) qqline(resid(lm.velo)) </highlight> Illustration and explanation of further *regression diagnostics* is given in section 10.8. A basic set of %BLUE% *four regression diagnostic plots* %ENDCOLOR% is available via the plot method for the =lm= objects: 1 residual versus fitted values 2 Q-Q normal distribution plot of standardized residuals 3 plot of the square root of the absolute value of the standardized residuals 4 plot of "Cook's distance", which is a measure of the influence of each observation on the regression coefficients <highlight> par(mfro=c(2,2)) plot(lm.velo) # To return to the default of one graph: par(mfrow=c(1,1)) </highlight> The last thing I am going to demonstrate how to %BLUE% *plot the fitted line of your model including the prediction and confidence bands* %ENDCOLOR%. There is a complete explanation of the procress to produce this plot in section 5.3. As Peter Dalgaard explains on page 106: "First we create a new data frame in which the *blood.glucose* variable contains the values at which we want predictions to be made": <highlight> pred.frame<-data.frame(blood.glucose=4:20) </highlight> Next we generate two vectors which "contain the result of =predict= for the new data in =pred.frame=", specifically (1) the prediction limits, and (2) the confidence limits: <highlight> pp<-predict(lm.velo, int="p", newdata=pred.frame) pc<-predict(lm.velo, int="c", newdata=pred.frame) </highlight> "Now for the plotting: first we create a standard scatterplot, except we ensure that it has enough room for the prediction limits", by incorporating the =ylim= argument. <highlight> with(thuesen, plot(blood.glucose, short.velocity, ylim=range(short.velocity, pp, na.rm=T), xlab="Blood glucose", ylab="Short Velocity", main="Plot with Confidence and Prediction Bands")) </highlight> "Finally the curves are added, using as c-values the *blood.glucose* used for the prediction, and setting the line types and colours to more sensible values." <highlight> matlines(pred.frame$blood.glucose, pc, lty=c(1,2,2), col="black") matlines(pred.frame$blood.glucose, pp, lty=c(1,3,3), col="black") </highlight> We now probably want to set the =na.action= option back to ="na.omit"= instead of ="na.exclude"=": <highlight> options(na.action="na.omit") </highlight> ---++ Acknowledgements: I would like to thank Richard Urbano for posing this problem.
E
dit
|
A
ttach
|
P
rint version
|
H
istory
: r5
<
r4
<
r3
<
r2
|
B
acklinks
|
V
iew topic
|
Edit
w
iki text
|
M
ore topic actions
Topic revision: r5 - 14 Nov 2006,
TheresaScott
Main
Department Home Page
Biostatistics Graduate Program
Vanderbilt University Medical Center
Main Web
Main Web Home
Search
Recent Changes
Changes
Topic list
Biostatistics Webs
Archive
Main
Sandbox
System
Register
|
Log In
Copyright © 2013-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki?
Send feedback