---
title: "R Exercises"
author: "Cole Beck"
output:
  pdf_document:
    number_sections: no
  html_document:
    number_sections: no
---

```{r setup,echo=FALSE}
require(Hmisc)
knitrSet(lang='markdown')
```

# Manipulating Vectors

1. Modify the following character vector to keep only street names, then sort and remove duplicates.

```{r}
x <- c("120 Main St", "231 Walnut Grove", "374 Central Pk",
       "402 Providence Ln", "555 Central Pk")
```
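One possible approach, assuming every address starts with a house number followed by a single space: strip that prefix with a regular expression, then apply `unique()` and `sort()`. The name `streets` is only an illustrative choice.

```{r}
x <- c("120 Main St", "231 Walnut Grove", "374 Central Pk",
       "402 Providence Ln", "555 Central Pk")
# Remove the leading house number (one or more digits) and the space after it
streets <- sub("^[0-9]+ ", "", x)
# unique() drops the repeated "Central Pk"; sort() orders the names alphabetically
sort(unique(streets))
# "Central Pk"    "Main St"       "Providence Ln" "Walnut Grove"
```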

2. How could you sum all of the numbers between 1 and 1,000 that are evenly divisible by 3 or 5?  What about numbers between 1 and 100,000 divisible by 4, 7, or 13?

```{r}
# sum [1,10] divisible by 3 or 5
3 + 5 + 6 + 9 + 10
```
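A vectorized sketch: build the sequence, flag each value whose remainder (`%%`) is zero for at least one divisor, and sum the flagged values. `sum_divisible` is a hypothetical helper name, and the range is assumed to include both endpoints, matching the `[1,10]` check.

```{r}
# Hypothetical helper: sum of the integers in [1, n] divisible by any of `divisors`
sum_divisible <- function(n, divisors) {
  x <- seq_len(n)
  # n-by-k matrix of remainders; a row containing a zero is divisible by some divisor
  hits <- rowSums(outer(x, divisors, "%%") == 0) > 0
  # as.numeric() guards against integer overflow for large n
  sum(as.numeric(x[hits]))
}
sum_divisible(10, c(3, 5))          # 33, matching the hand calculation
sum_divisible(1000, c(3, 5))        # 234168
sum_divisible(100000, c(4, 7, 13))
```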

# Writing Functions

Celsius to Fahrenheit: $f(x) = \frac{9}{5}x + 32$

Celsius to Kelvin: $f(x) = x + 273.15$

1. Write a temperature conversion function. It should take a vector of temperatures, the `from` scale, and the `to` scale, where each scale is `'C'`, `'F'`, or `'K'`.

```{r, eval=FALSE}
# test temp function with this data
set.seed(20)
x <- round(rnorm(30, 10, 10))
xf <- temp(x, from='C', to='F')
xk <- temp(x, from='C', to='K')
all.equal(temp(xf, from='F', to='K'), xk)
```
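A minimal sketch of one possible design: convert the input to Celsius first, then from Celsius to the requested scale, so each scale needs only two formulas. It assumes the scale codes `'C'`, `'F'`, and `'K'` used in the test chunk; `match.arg()` rejects any other value.

```{r}
temp <- function(x, from = c("C", "F", "K"), to = c("C", "F", "K")) {
  # Validate the scale codes (assumed to be "C", "F", or "K")
  from <- match.arg(from)
  to <- match.arg(to)
  # Step 1: express the input in Celsius
  celsius <- switch(from,
                    C = x,
                    F = (x - 32) * 5 / 9,
                    K = x - 273.15)
  # Step 2: convert from Celsius to the target scale
  switch(to,
         C = celsius,
         F = celsius * 9 / 5 + 32,
         K = celsius + 273.15)
}
temp(c(0, 100), from = "C", to = "F")  # 32 212
```

Routing everything through Celsius means adding another scale only requires one new branch in each `switch()`.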

# Manipulating Data Frames

1. Read in the CSV file at the following URL

```{r, echo = FALSE}
"https://github.com/fonnesbeck/Bios6301/raw/master/datasets/haart.csv"
```

2. Describe the data set

```{r}
```

3. Create a categorical (factor) variable `gender` from the `male` column

```{r}
```
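A sketch using a small hypothetical stand-in, `toy`, instead of the real file so that it runs on its own. It assumes `male` is coded 1 for male and 0 for female; check the actual coding (for example with `table()`) before reusing it.

```{r}
# Hypothetical stand-in for the haart data; assumes male = 1 (male), 0 (female)
toy <- data.frame(male = c(1, 0, 1, 0))
# factor() maps each code to a readable label; `levels` fixes the category order
toy$gender <- factor(toy$male, levels = c(0, 1), labels = c("female", "male"))
table(toy$gender)
```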

4. Convert `init.date` and `last.visit` into Date variables

```{r}
```

5. Create the column `daysbetween` holding the number of days between `init.date` and `last.visit`

```{r}
```
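A sketch for steps 4 and 5 on hypothetical rows. The format string `"%m/%d/%y"` (month/day/two-digit year) is an assumption; inspect the raw `init.date` and `last.visit` strings and adjust `format` to match them.

```{r}
# Hypothetical stand-in rows; the real columns may use a different date format
toy <- data.frame(init.date  = c("01/15/03", "03/01/04"),
                  last.visit = c("02/14/03", "03/01/05"))
# Step 4: parse the strings into Date objects (format assumed to be month/day/yy)
toy$init.date  <- as.Date(toy$init.date,  format = "%m/%d/%y")
toy$last.visit <- as.Date(toy$last.visit, format = "%m/%d/%y")
# Step 5: difference in days, stored as a plain number
toy$daysbetween <- as.numeric(difftime(toy$last.visit, toy$init.date, units = "days"))
toy$daysbetween  # 30 365
```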

6. Subset the data to rows where `age` is greater than 40 and `death` is zero. Keep only these columns: `gender`, `age`, `cd4baseline`, `weight`, `daysbetween`.

```{r}
```

7. Reorder the data by `age`

```{r}
```
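A sketch for steps 6 and 7, again on hypothetical rows that reuse the exercise's column names. `subset()` drops rows where the condition evaluates to `NA` (plain `[` indexing would return them as rows of `NA`), and `order()` then sorts by `age`.

```{r}
# Hypothetical stand-in rows using the exercise's column names
toy <- data.frame(gender      = c("male", "female", "male", "female"),
                  age         = c(61, 38, 52, 45),
                  death       = c(0, 0, 1, 0),
                  cd4baseline = c(150, 340, 95, 210),
                  weight      = c(61, 58, 66, 70),
                  daysbetween = c(800, 120, 35, 400))
keep <- c("gender", "age", "cd4baseline", "weight", "daysbetween")
# Step 6: rows with age > 40 and death == 0, restricted to the listed columns
older <- subset(toy, age > 40 & death == 0, select = keep)
# Step 7: reorder the rows by increasing age
older <- older[order(older$age), ]
older
```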

# Models

```{r}
gender <- c('M','M','F','M','F','F','M','F','M')
age <- c(34, 64, 38, 63, 40, 73, 27, 51, 47)
smoker <- c('no','yes','no','no','yes','no','no','no','yes')
exercise <- factor(c('moderate','frequent','some','some','moderate','none',
                     'none','moderate','moderate'),
                   levels=c('none','some','moderate','frequent'), ordered=TRUE)
los <- c(4,8,1,10,6,3,9,4,8)
x <- data.frame(gender, age, smoker, exercise, los)
```

1. Create a linear model using `x`, estimating the association between `los` and all remaining variables

```{r}
```
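One way to fit it: the `.` on the right-hand side of the formula stands for every column of `x` other than the response. The data are re-created so the sketch runs by itself. Because `exercise` is an ordered factor, R's default contrasts code it with orthogonal polynomial terms (`exercise.L`, `exercise.Q`, `exercise.C`) rather than one dummy per level, and with nine rows and seven coefficients only two residual degrees of freedom remain.

```{r}
# Same data as the chunk at the top of this section
x <- data.frame(
  gender   = c('M','M','F','M','F','F','M','F','M'),
  age      = c(34, 64, 38, 63, 40, 73, 27, 51, 47),
  smoker   = c('no','yes','no','no','yes','no','no','no','yes'),
  exercise = factor(c('moderate','frequent','some','some','moderate','none',
                      'none','moderate','moderate'),
                    levels = c('none','some','moderate','frequent'), ordered = TRUE),
  los      = c(4, 8, 1, 10, 6, 3, 9, 4, 8)
)
# los ~ . : regress los on all remaining columns of x
fit_all <- lm(los ~ ., data = x)
summary(fit_all)
```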

2. Create a new model, this time predicting `los` by `gender`; show the model summary

```{r}
```

3. What is the estimate for the intercept?  What is the estimate for gender?

```{r}
```
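A sketch for steps 2 and 3. Only `gender` and `los` are needed, so a small data frame `dat` (a name chosen here for illustration) holds just those two columns of `x`. With a single two-level predictor, the intercept is the mean `los` of the reference level (`F`, first alphabetically) and the `genderM` coefficient is the difference in means, men minus women.

```{r}
dat <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                  los    = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
fit_gender <- lm(los ~ gender, data = dat)
summary(fit_gender)
# Extract the two estimates by name
coef(fit_gender)["(Intercept)"]  # 3.5, the mean los for women
coef(fit_gender)["genderM"]      # 4.3, the mean for men (7.8) minus the mean for women
```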

4. Recalculate the standard errors by taking the square root of the diagonal of the variance-covariance matrix from the linear model's summary

```{r}
```
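A sketch, reusing the `los ~ gender` model: `summary()` of an `lm` fit stores the residual standard error (`sigma`) and the unscaled matrix $(X^\top X)^{-1}$ (`cov.unscaled`). Their product $\hat\sigma^2 (X^\top X)^{-1}$ is the estimated variance-covariance matrix, which `vcov()` also returns directly.

```{r}
dat <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                  los    = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
fit_gender <- lm(los ~ gender, data = dat)
s <- summary(fit_gender)
# Variance-covariance matrix: sigma^2 * (X'X)^{-1}
V <- s$sigma^2 * s$cov.unscaled
se_manual <- sqrt(diag(V))
se_manual
# Compare with the "Std. Error" column of the summary table and with vcov()
all.equal(se_manual, s$coefficients[, "Std. Error"])
all.equal(V, vcov(fit_gender))
```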

5. Predict `los` with the following new data set

```{r}
newdat <- data.frame(gender=c('F','M','F'))
```
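A sketch using `predict()` with the `newdata` argument. The `gender` column of `newdat` must use the same labels as the data the model was fitted on. Because the model's only predictor is `gender`, each prediction is simply that group's mean `los`.

```{r}
dat <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                  los    = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
fit_gender <- lm(los ~ gender, data = dat)
newdat <- data.frame(gender = c('F','M','F'))
pred <- predict(fit_gender, newdata = newdat)
pred  # 3.5 7.8 3.5
```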

6. Sum the squared residuals of the model. Compare the result to passing the model to the `deviance` function.

```{r}
```
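For a linear model fitted by ordinary least squares, `deviance()` returns the residual sum of squares, so the two numbers in this sketch should agree up to floating-point error.

```{r}
dat <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                  los    = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
fit_gender <- lm(los ~ gender, data = dat)
# Residual sum of squares computed by hand
rss <- sum(residuals(fit_gender)^2)
rss                     # 33.8
deviance(fit_gender)    # same value
all.equal(rss, deviance(fit_gender))
```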

7. Create a subset of `x` containing all records where `gender` is 'M' and assign it to the variable `men`. Do the same for women, assigning the result to the variable `women`.

```{r}
```

8. Call the `t.test` function with `los` for women as the first argument and `los` for men as the second, and add the argument `var.equal = TRUE`. Does the p-value match the one reported for `gender` in the model summary?

```{r}
```
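A sketch for steps 7 and 8, with `dat` standing in for the `gender` and `los` columns of `x`. A two-sample t-test with pooled variance (`var.equal = TRUE`) tests the same hypothesis as the `genderM` coefficient in `los ~ gender`, on the same 7 degrees of freedom, so the p-values should match. The t statistics have opposite signs because `t.test()` computes women minus men here, while the coefficient is men minus women.

```{r}
dat <- data.frame(gender = c('M','M','F','M','F','F','M','F','M'),
                  los    = c(4, 8, 1, 10, 6, 3, 9, 4, 8))
# Step 7: split the records by gender
men   <- subset(dat, gender == 'M')
women <- subset(dat, gender == 'F')
# Step 8: pooled-variance two-sample t-test, women first
tt <- t.test(women$los, men$los, var.equal = TRUE)
# p-value for genderM from the regression summary
fit_gender <- lm(los ~ gender, data = dat)
p_lm <- summary(fit_gender)$coefficients["genderM", "Pr(>|t|)"]
c(t_test = tt$p.value, lm = p_lm)
```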

# Generating Plots

Given the `vlbw` data set, use `ggplot2` (via `qplot` or `ggplot`) to create the following plots.

```{r}
require(ggplot2)
getHdata(vlbw)
```

1. Scatterplot of `gest` VS `bwt`

```{r}
```

2. Scatterplot of `gest` VS `bwt`; map the variable `sex` to both color and shape

```{r}
```
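A sketch for plots 1 and 2. Because `vlbw` is downloaded by `getHdata()`, the sketch simulates a hypothetical stand-in, `toy`, with the same column names and made-up values; substitute `vlbw` when doing the exercise. It uses `ggplot()` rather than `qplot()`, which is deprecated in ggplot2 3.4.0 and later; the `qplot()` equivalents appear in the comments. Here `gest` is placed on the x-axis and `bwt` on the y-axis; swap the aesthetics if you read "`gest` VS `bwt`" the other way round.

```{r}
library(ggplot2)
# Simulated, hypothetical stand-in for vlbw (same column names, made-up values)
set.seed(1)
toy <- data.frame(gest = round(runif(40, 24, 34)),
                  sex  = sample(c("female", "male"), 40, replace = TRUE))
toy$bwt <- round(-1500 + 80 * toy$gest + rnorm(40, sd = 150))
# 1. Scatterplot; qplot(gest, bwt, data = toy) is the older shorthand
p1 <- ggplot(toy, aes(x = gest, y = bwt)) + geom_point()
# 2. Map sex to both color and shape; qplot(gest, bwt, data = toy, color = sex, shape = sex)
p2 <- ggplot(toy, aes(x = gest, y = bwt, color = sex, shape = sex)) + geom_point()
p1
p2
```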

3. Boxplot of `bwt` by `sex`

```{r}
```

4. Scatterplot of `gest` VS `bwt`, facet by `race`

```{r}
```

5. Scatterplot of `gest` VS `bwt`, add regression line

```{r}
```
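A sketch for plots 3 to 5 on the same kind of simulated, hypothetical stand-in for `vlbw` (the `race` labels `A`, `B`, `C` are placeholders, not the real categories). `facet_wrap(~ race)` draws one panel per race, and `geom_smooth(method = "lm")` adds a least-squares line with a confidence band; passing `formula = y ~ x` explicitly avoids the message ggplot2 prints about the default formula.

```{r}
library(ggplot2)
# Simulated, hypothetical stand-in for vlbw (same column names, made-up values)
set.seed(2)
n <- 60
toy <- data.frame(gest = round(runif(n, 24, 34)),
                  sex  = sample(c("female", "male"), n, replace = TRUE),
                  race = sample(c("A", "B", "C"), n, replace = TRUE))
toy$bwt <- round(-1500 + 80 * toy$gest + rnorm(n, sd = 150))
# 3. Boxplot of bwt by sex; qplot(sex, bwt, data = toy, geom = "boxplot")
p3 <- ggplot(toy, aes(x = sex, y = bwt)) + geom_boxplot()
# 4. Scatterplot with one panel per race
p4 <- ggplot(toy, aes(x = gest, y = bwt)) + geom_point() + facet_wrap(~ race)
# 5. Scatterplot with a fitted least-squares line and its confidence band
p5 <- ggplot(toy, aes(x = gest, y = bwt)) + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
p3
p4
p5
```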