IGP 304 Analysis Project

Purpose

We are requiring an extra data analysis project for students who received an incomplete in the course, to give them an opportunity to raise their grades. For this assignment, you will pose a research question, identify a suitable dataset with which to examine it, develop hypotheses, plan appropriate analyses, carry them out, and report the results in a paper. We will hold two interim group meetings at which you can briefly present your progress and ask questions.

Project Description

Research question and data set

Your research question can relate to any topic you choose. In choosing your research question and variables, make sure they are not the same as, or slight variations of, exercises already presented in the course. Once you have thought of a few research questions that interest you, find an appropriate dataset for analysis. Ideally, the dataset would come from or relate to your own research; if no such data are available, you may find a dataset on the internet. See the datasets page on the Biostatistics twiki for some ideas. The dataset must be small enough and/or formatted appropriately for you to analyze it, but also large enough, with enough different variables, to demonstrate your knowledge. The dataset must not already have been analyzed in detail by others.

Your research question and hypotheses should concern the relationship among three or more variables. If you choose a study with an observational research design, you should investigate the nature of the association between two variables: whether they appear to be causally linked, spuriously related (because of confounding factors), linked by an intervening variable, or related to different degrees depending on the value of other variables (an interactive or conditional relationship). This means that you should include at least three variables in your analysis, either as part of a contingency table or logistic regression analysis (for categorical variables) or a series of correlation or linear regression analyses (for interval scale variables). If you find a dataset from an experimental study, it will need to include additional variables (i.e., more than just a treatment group indicator and an outcome) to be suitable for this assignment.
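As a minimal sketch of the contingency-table approach with three variables, the Python code below compares the crude odds ratio between a binary exposure and a binary outcome with the odds ratios within each level of a binary third variable, and pools the strata with a Mantel-Haenszel estimate. The counts are hypothetical, chosen only to illustrate confounding, and the names (Table2x2, collapse, mantel_haenszel_or) are our own; in practice you would tabulate your own data, most likely in R.

```python
from dataclasses import dataclass


@dataclass
class Table2x2:
    """One 2x2 table: rows = exposure (yes/no), columns = outcome (yes/no)."""
    a: int  # exposed, outcome present
    b: int  # exposed, outcome absent
    c: int  # unexposed, outcome present
    d: int  # unexposed, outcome absent

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def odds_ratio(self) -> float:
        # Assumes no zero cells; otherwise a continuity correction would be needed.
        return (self.a * self.d) / (self.b * self.c)


def collapse(tables):
    """Add the stratum tables together, ignoring the third variable (the crude table)."""
    return Table2x2(
        a=sum(t.a for t in tables),
        b=sum(t.b for t in tables),
        c=sum(t.c for t in tables),
        d=sum(t.d for t in tables),
    )


def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio pooled across strata of the third variable."""
    numerator = sum(t.a * t.d / t.n for t in tables)
    denominator = sum(t.b * t.c / t.n for t in tables)
    return numerator / denominator


# Hypothetical counts, stratified by a binary third variable (illustration only).
strata = {
    "third variable = 0": Table2x2(a=10, b=90, c=20, d=180),
    "third variable = 1": Table2x2(a=80, b=20, c=40, d=10),
}

crude = collapse(list(strata.values()))
print(f"crude OR: {crude.odds_ratio():.2f}")
for label, table in strata.items():
    print(f"{label}: OR = {table.odds_ratio():.2f}")
print(f"Mantel-Haenszel OR: {mantel_haenszel_or(list(strata.values())):.2f}")
```

Here the crude odds ratio (about 2.59) disappears within both strata (1.00 in each), the pattern expected when the third variable is a confounder or an intervening variable; which interpretation applies depends on the study design and subject-matter reasoning. Stratum-specific odds ratios that differ markedly from one another would instead suggest an interactive (conditional) relationship, in which case a single pooled estimate is not appropriate.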

When you believe you have found an appropriate dataset, email us the dataset and a data dictionary (a one-sentence description of each variable) so that we can verify that it is suitable for the project. Make sure that your dataset follows the 10 Data Entry Commandments.
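A quick consistency check before emailing the files can catch variables that are missing from the data dictionary, or dictionary entries that no longer match the data. The sketch below assumes the dataset is a CSV file with variable names in the first row and that the dictionary is kept as a name-to-description mapping; the function name and the example variables are hypothetical.

```python
import csv
import io


def check_dictionary(csv_text, dictionary):
    """Compare a CSV header with a {variable name: description} data dictionary."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = [name.strip() for name in header]
    return {
        # columns in the data with no (or an empty) description
        "undocumented": [c for c in columns if not dictionary.get(c, "").strip()],
        # dictionary entries that do not match any column in the data
        "not_in_data": [name for name in dictionary if name not in columns],
    }


# Hypothetical dataset and dictionary, for illustration only.
data = "id,age,sex,sbp\n1,34,F,128\n2,41,M,135\n"
dictionary = {
    "id": "Participant identifier",
    "age": "Age in years at enrollment",
    "sex": "Sex (F or M)",
    "bmi": "Body mass index (kg/m^2)",
}
print(check_dictionary(data, dictionary))
# {'undocumented': ['sbp'], 'not_in_data': ['bmi']}
```

For a real file, the text could be read with open(path, newline="").read() before calling the function.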

Project plan

At the first group meeting you will present a proposal or plan for your study. It should:

  • describe your research question(s)
  • identify the dataset you will use, and
  • describe your hypothesis(es) and the variables you intend to examine in evaluating your hypothesis(es)

Please arrange to meet with us or send us an email to discuss ideas if you are unsure about your plan.

Analysis

Your analysis should be tailored to your research question, so the exact analysis instructions will vary from project to project. In general:
  • Your analysis should begin with a thorough description of each variable separately, including appropriate measures of central tendency and dispersion and a characterization of the distribution's shape (verbally and graphically). Draw histograms, boxplots, dot plots, etc. using a statistical package such as R.
  • Your bivariate analyses should follow, complete with appropriate crosstabulations, percentages, and measures of association. If you are analyzing interval scale variables, produce scatter plots that place the variables on the appropriate axes (typically the explanatory variable on the x axis and the outcome on the y axis).
  • Pay attention to how measurements are coded, transformed, and used in the analysis.
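The univariate and bivariate steps above can be sketched in a few lines. The example below uses only the Python standard library so that it runs anywhere; the function names and the small data vectors are hypothetical, and in R the same summaries are commonly produced with functions such as summary(), quantile(), table(), prop.table(), and chisq.test(). Cramér's V is shown as one possible measure of association for a crosstabulation; other measures (e.g., odds ratios for 2x2 tables) may suit your question better.

```python
import math
import statistics
from collections import Counter


def describe(values):
    """Central tendency and dispersion for one interval-scale variable."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }


def crosstab_row_percent(rows, cols):
    """Cross-tabulate two categorical variables: {row level: {col level: (count, row %)}}."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = {}
    for r in row_levels:
        row_total = sum(counts[(r, c)] for c in col_levels)
        table[r] = {c: (counts[(r, c)], 100 * counts[(r, c)] / row_total) for c in col_levels}
    return table


def chi_square_and_cramers_v(rows, cols):
    """Pearson chi-square statistic and Cramer's V for the crosstabulation of rows by cols."""
    n = len(rows)
    counts = Counter(zip(rows, cols))
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    chi2 = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (counts[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))  # needs at least 2 levels in each variable
    return chi2, math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical example data, for illustration only.
age = [34, 41, 29, 52, 47, 38, 60, 45]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
smoker = ["yes", "no", "no", "no", "yes", "yes", "yes", "no"]

print(describe(age))
print(crosstab_row_percent(sex, smoker))
print(chi_square_and_cramers_v(sex, smoker))
```

With only eight hypothetical observations the chi-square approximation is not reliable; the example shows the mechanics, not an analysis you should report. The numeric summaries do not replace the histograms and boxplots needed to describe each distribution's shape.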

Final paper

The final paper should be formatted like a research article suitable for publication in a scientific journal and include the following sections:
  1. Abstract
    • A summary of the paper in 250 words or fewer
  2. Introduction
    • Background information on the research question you are attempting to answer
  3. Methods
    • Focus on statistical methods
  4. Results
    • Written description of the results
    • Include appropriate tables and figures
  5. Conclusions
    • Any conclusions drawn from the results
  6. References
Because this is a statistics course, we will focus more on the Methods and Results sections; the Introduction and Conclusions sections should contain enough background information on your research question to put the results in the proper scientific context. Graphics should be high resolution.

Due dates

  • Collect or select an appropriate dataset: Monday, 6/9
    • Before starting your analysis, email your dataset and data dictionary so we can verify it will be suitable for this project
  • First group meeting: Monday, 6/23, 2:30-4:30, D2221 Medical Center North
    • Present project plan
    • Ask questions
    • Held in Biostat conference room
  • Second group meeting: Monday, 7/21, 2:30-4:30, D2221 Medical Center North
    • Present analysis (plots, tables)
    • Ask questions
    • Held in Biostat conference room
  • Project due by email: Friday, 8/15, end of the day
    • Write up the analysis as a final paper
    • Suggested length: 10-15 pages (double spaced)
Topic revision: r3 - 01 Jun 2008, FrankHarrell
 
