Data Visualization
========================================================

### with [Meridith Blevins](http://biostat.mc.vanderbilt.edu/MeridithBlevins)
### for Vanderbilt MSCI program
### 22 September 2016

This is an R Markdown document. Markdown is a simple formatting syntax for authoring web pages (click the **Help** toolbar button for more details on using R Markdown).

When you click the **Knit HTML** button in RStudio, a web page is generated that includes both the content and the output of any R code chunks embedded in the document. You can embed an R code chunk like this:

```{r echo=TRUE, results='hide', message=FALSE, warning=FALSE}
## Install packages first using the menu options or by running:
## install.packages(c("Hmisc", "UsingR"))
library(Hmisc)
library(UsingR)
library(lattice)
```

Load the `babies` dataset, which contains a collection of variables taken for each new mother in a Child and Health Development Study.

```{r}
data(babies)
names(babies)
```

You can also embed plots, for example:

```{r fig.width=7, fig.height=6}
# First recode 999 as NA: it is the missing-data code and should not be plotted
babies$wt[babies$wt==999] <- NA
babies$gestation[babies$gestation==999] <- NA
# The subset also drops rows where the mother's (wt1) or father's (dwt) weight
# carries the 999 missing code
plot(wt ~ gestation, data=babies, subset=wt1 < 800 & dwt < 800)
```

Graphical Display Options
===================================

Scale
--------------------------

```{r fig.width=7, fig.height=10}
# 3 figures arranged in 3 rows and 1 column
par(mfrow=c(3,1))
hist(babies$gestation,breaks=10,main="")
hist(babies$gestation,breaks=50,main="")
hist(babies$gestation,breaks=200,main="")
```

Graphical Choices
===================================

```{r fig.width=7, fig.height=6}
pie(table(babies$smoke))
```

```{r fig.width=7, fig.height=6}
barchart(table(babies$smoke))
```

Graphical Overlay
===================================

```{r fig.width=7, fig.height=6}
# Add boxplots to a scatterplot
par(fig=c(0,0.8,0,0.8))
plot(babies$gestation, babies$wt, xlab="Gestational Age (days)", ylab="Weight (ounces)")
# lowess() does not drop missing values itself, so smooth the complete cases only
ok <- complete.cases(babies$gestation, babies$wt)
lines(lowess(babies$gestation[ok], babies$wt[ok], delta=2), col = 2)
abline(lm(babies$wt~babies$gestation), col = 3)
# Horizontal boxplot of gestation above the scatterplot
par(fig=c(0,0.8,0.55,1), new=TRUE)
boxplot(babies$gestation, horizontal=TRUE, axes=FALSE)
# Vertical boxplot of weight to the right of the scatterplot
par(fig=c(0.65,1,0,0.8),new=TRUE)
boxplot(babies$wt, axes=FALSE)
```

Fourfold Plots and Simpson's Paradox
======================================

A fourfold plot draws four quarter-circle slices, one per quadrant, with the area of each slice corresponding to the number of people in that cell of a 2x2 table. The data here are aggregate counts of applicants to graduate school at Berkeley for the six largest departments in 1973, classified by admission and sex. Fewer females were admitted than males: the odds of admission for males were about 1.84 times the odds for females.

```{r fig.width=7, fig.height=6}
## THIS EXAMPLE CAME FROM http://www.math.yorku.ca/SCS/friendly.html#4fold
# Collapse over departments to get a 2x2 table of sex by admission
x1 <- apply(UCBAdmissions,c(2,1),sum)
fourfoldplot(x1)
```

Broken down by department, the picture changes: Department A admitted females at a higher rate than males, and the other departments show little or no difference between the sexes. The conclusion might then change: did females apply more often to departments with higher rejection rates?

```{r fig.width=7, fig.height=6}
data(UCBAdmissions)
# Reorder dimensions to sex x admission x department
x <- aperm(UCBAdmissions,c(2,1,3))
dimnames(x)[[2]] <- c("Yes","No")
names(dimnames(x)) <- c("Sex","Admit?","Department")
fourfoldplot(x)
```

Mosaic Plots and Simpson's Paradox
======================================

Using the same example, we create mosaic plots:

```{r fig.width=7, fig.height=6}
mosaicplot(x1,main="Student admissions at UC Berkeley")
```

```{r fig.width=10, fig.height=6}
## oma = A vector of the form c(bottom, left, top, right) giving the size of the outer margins in lines of text.
## mar = A numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot. The default is c(5, 4, 4, 2) + 0.1.
par(mfrow=c(2,3),oma=c(0,0,2,0),mar=c(2,2,2,0))
# One mosaic plot per department, A through F
for(i in 1:6){
  mosaicplot(UCBAdmissions[,,i],main=paste("Department",LETTERS[i]))
}
```
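
The odds ratios behind the fourfold and mosaic plots above can also be checked numerically. The chunk below is a minimal sketch that uses only the built-in `UCBAdmissions` table; the helper `odds_ratio` and the object names `agg`, `or_all`, `or_dept`, `admit_rate`, and `app_share` are illustrative and not part of the original analysis.

```{r}
# Hypothetical helper (an assumption, not from the original document):
# odds ratio of male versus female admission for a 2x2 table laid out as
# Admit (rows: Admitted, Rejected) by Gender (columns: Male, Female).
odds_ratio <- function(tab) {
  (tab["Admitted", "Male"] * tab["Rejected", "Female"]) /
    (tab["Rejected", "Male"] * tab["Admitted", "Female"])
}

# Aggregate table: sum over departments (the third dimension)
agg <- apply(UCBAdmissions, c(1, 2), sum)
or_all <- odds_ratio(agg)

# One odds ratio per department (apply passes each 2x2 slice with its dimnames)
or_dept <- apply(UCBAdmissions, 3, odds_ratio)

# Overall admission rate in each department, both sexes combined
admit_rate <- apply(UCBAdmissions, 3,
                    function(tab) sum(tab["Admitted", ]) / sum(tab))

# Share of each sex's applications that went to each department
app_share <- prop.table(apply(UCBAdmissions, c(2, 3), sum), margin = 1)

round(or_all, 2)
round(or_dept, 2)
round(admit_rate, 2)
round(app_share, 2)
```

If the aggregate ratio is well above 1 while the per-department ratios sit near or below 1, that is the reversal Simpson's paradox describes. Comparing `app_share` with `admit_rate` then shows where each sex applied and how selective those departments were.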