---
title: "R Exercises"
author: "Cole Beck"
output:
  pdf_document:
    number_sections: no
  html_document:
    number_sections: no
---

```{r setup,echo=FALSE}
require(Hmisc)
knitrSet(lang='markdown')
```

# Manipulating Vectors

1. Modify the following character vector to keep only street names, then sort and remove duplicates.

```{r}
x <- c("120 Main St", "231 Walnut Grove", "374 Central Pk", "402 Providence Ln", "555 Central Pk")
```

2. How could you sum all of the numbers between 1 and 1,000 that are evenly divisible by 3 or 5? What about numbers between 1 and 100,000 divisible by 4, 7, or 13?

```{r}
# sum [1,10] divisible by 3 or 5
3 + 5 + 6 + 9 + 10
```

# Writing Functions

Celsius to Fahrenheit: $f(x) = (x*9/5) + 32$

Celsius to Kelvin: $f(x) = x + 273.15$

1. Write a temperature conversion function. It should take a vector of temperatures, the `from` type, and the `to` type.

```{r, eval=FALSE}
# test temp function with this data
set.seed(20)
x <- round(rnorm(30, 10, 10))
xf <- temp(x, from='C', to='F')
xk <- temp(x, from='C', to='K')
# converting the Fahrenheit values to Kelvin should match the direct conversion
all.equal(temp(xf, from='F', to='K'), xk)
```

# Manipulating Data Frames

1. Read in the CSV file

```{r, echo = FALSE}
"https://github.com/fonnesbeck/Bios6301/raw/master/datasets/haart.csv"
```

2. Describe the data set

```{r}
```

3. Create a categorical variable `gender` from the `male` column

```{r}
```

4. Convert `init.date` and `last.visit` into Date variables

```{r}
```

5. Create the column `daysbetween` by calculating the number of days between `init.date` and `last.visit`

```{r}
```

6. Subset the data where `age` is greater than 40 and `death` is zero. Only keep the following columns: `gender`, `age`, `cd4baseline`, `weight`, `daysbetween`

```{r}
```

7. Reorder the data by `age`

```{r}
```

# Models

```{r}
gender <- c('M','M','F','M','F','F','M','F','M')
age <- c(34, 64, 38, 63, 40, 73, 27, 51, 47)
smoker <- c('no','yes','no','no','yes','no','no','no','yes')
exercise <- factor(c('moderate','frequent','some','some','moderate','none',
                     'none','moderate','moderate'),
                   levels=c('none','some','moderate','frequent'), ordered=TRUE
)
los <- c(4,8,1,10,6,3,9,4,8)
x <- data.frame(gender, age, smoker, exercise, los)
```

1. Create a linear model using `x`, estimating the association between `los` and all remaining variables

```{r}
```

2. Create a new model, this time predicting `los` by `gender`; show the model summary

```{r}
```

3. What is the estimate for the intercept? What is the estimate for gender?

```{r}
```

4. Recalculate the standard errors by taking the square root of the diagonal of the variance-covariance matrix of the summary of the linear model

```{r}
```

5. Predict `los` with the following new data set

```{r}
newdat <- data.frame(gender=c('F','M','F'))
```

6. Sum the squares of the residuals of the model. Compare this to the result of passing the model to the `deviance` function.

```{r}
```

7. Create a subset of `x` by taking all records where `gender` is 'M' and assigning it to the variable `men`. Do the same for 'F' and the variable `women`.

```{r}
```

8. Call the `t.test` function, where the first argument is `los` for women and the second argument is `los` for men. Add the argument `var.equal` and set it to `TRUE`. Does this match the p-value computed in the model summary?

```{r}
```

# Generating Plots

Given the `vlbw` data set, use `ggplot2` and `qplot` to create several plots.

```{r}
require(ggplot2)
getHdata(vlbw)
```

1. Scatterplot of `gest` vs. `bwt`

```{r}
```

2. Scatterplot of `gest` vs. `bwt`, add color and shape using variable `sex`

```{r}
```

3. Boxplot of `bwt` by `sex`

```{r}
```

4. Scatterplot of `gest` vs. `bwt`, facet by `race`

```{r}
```

5. Scatterplot of `gest` vs. `bwt`, add regression line

```{r}
```
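
As a hint for the plotting exercises above, the layers they ask for can be sketched on a different data set. The example below is only an illustration: it assumes the built-in `mtcars` data as a stand-in for `vlbw`, and the names `cars` and `p` are hypothetical. It uses `ggplot()` rather than `qplot()`, so treat it as one possible approach and not the intended solution.

```{r}
library(ggplot2)

# mtcars is a stand-in data set (an assumption, not part of the exercises);
# am is recoded as a factor so it can drive color and shape
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = am, shape = am)) +                  # color and shape by a factor
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # least-squares regression line
  facet_wrap(~ cyl)                                          # one panel per cylinder count
p

# boxplot of a numeric variable grouped by a factor
ggplot(cars, aes(x = am, y = mpg)) + geom_boxplot()
```

Each `+` adds one layer, so the same pattern should carry over to `vlbw` by swapping in `gest`, `bwt`, `sex`, and `race` and keeping only the layers a given exercise asks for.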