WfccmDataSubmissionGuidelines < Main < Vanderbilt Biostatistics Wiki


The following are the guidelines for submitting data for analysis. They are intended to make the process smoother for both sides: following them helps us work more efficiently, gets your results to you sooner, and improves data quality and quality control.

Minor changes can be mailed to the main analyst. Major changes must be made by resubmitting the data, and the resubmission must follow these guidelines.

Data - Data gathered from patients.
- Each column of data must have a unique id.
- Each row must have a unique integer id.
- Each row must have a name/other identifier.
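The three requirements above can be checked before submission. The following is a minimal sketch, not the analysts' own tool. It assumes a hypothetical layout in which the first header entry is the row id column, the second is the row name column, and the remaining entries are data columns; the function name validate_data is also made up for illustration.

```python
def validate_data(header, rows):
    """Return a list of problems; an empty list means all checks passed.

    Assumed (hypothetical) layout: header[0] is the row id column,
    header[1] is the row name column, the rest are data columns.
    All cells are expected to be strings, as read from a text file.
    """
    problems = []

    # Each column of data must have a unique id.
    seen_columns = set()
    for column_id in header:
        if column_id in seen_columns:
            problems.append(f"duplicate column id: {column_id}")
        seen_columns.add(column_id)

    seen_row_ids = set()
    for line_no, row in enumerate(rows, start=1):
        row_id, name = row[0], row[1]

        # Each row must have a unique integer id.
        try:
            numeric_id = int(row_id)
        except (TypeError, ValueError):
            problems.append(f"row {line_no}: id {row_id!r} is not an integer")
        else:
            if numeric_id in seen_row_ids:
                problems.append(f"row {line_no}: duplicate row id {numeric_id}")
            seen_row_ids.add(numeric_id)

        # Each row must have a name/other identifier.
        if not name or not name.strip():
            problems.append(f"row {line_no}: missing name")

    return problems
```

An empty result means the file satisfies the three rules under these layout assumptions; it does not check the data values themselves.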

If any patients have multiple samples and the samples have to be combined, additional information must be provided: how the columns are to be combined, and column information for the data as it will appear after the combination.

Data_info - Information that describes the data's columns.
- Must have column_id (unique reference to data).
- If the data will be combined, there must be a row for each combination. Data columns that share a duplicate value will be combined, and that duplicate value becomes the column name for the combined data.
- Any additional information needed for analysis groupings.
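The duplicate-value rule can be illustrated with a short sketch. The guidelines require the submitter to state how columns combine, so the averaging used here is only an assumed example, and the names combine_columns, data and groups are hypothetical.

```python
from statistics import mean


def combine_columns(data, groups, combine=mean):
    """Combine data columns that share the same value in data_info.

    data:    dict mapping column_id -> list of values (None = missing)
    groups:  dict mapping column_id -> combination value from data_info
    combine: how to merge values; mean is an assumption, the submitter
             must say which method actually applies.
    """
    # Collect the columns that carry the same (duplicate) value.
    grouped = {}
    for column_id, combined_name in groups.items():
        grouped.setdefault(combined_name, []).append(data[column_id])

    # The duplicate value becomes the column name of the combined data.
    combined = {}
    for combined_name, columns in grouped.items():
        merged = []
        for values in zip(*columns):
            present = [v for v in values if v is not None]
            merged.append(combine(present) if present else None)
        combined[combined_name] = merged
    return combined
```

For instance, if columns s1 and s2 both carry the value pat1, the result has a single column named pat1; a column whose value is not duplicated passes through under that value as its new name.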

CombinedData_info - Information that describes the combined data's columns.
- Must have column_id (unique reference to combined data set).
- Any additional information needed for analysis groupings.

Limit additional data to information necessary/helpful for the analysis.
Each combination needs to have an information file.
Missing/null values may be indicated by a period or an empty value.
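As a small sketch of the missing-value convention, a reader of the submitted file might treat a lone period or an empty cell as missing and parse everything else as a number. The function name parse_value is hypothetical, and the assumption that all other cells are numeric is ours.

```python
def parse_value(text):
    """Return None for a missing cell ('.' or empty), else the cell as a float."""
    text = text.strip()
    if text in ("", "."):
        return None
    # Assumption: every non-missing data cell is numeric.
    return float(text)
```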

Please provide the following information:
- Any transformations (log 10, log 2) that have been done to the data.

Example 1:
- All patients have one sample location. No column combination is necessary.
- The column cancerType has been included because the PI wants to determine proteins that differentiate between small vs large, normal vs small, normal vs large and normal vs large & small. Age and sex will also be used to determine groupings.

Example 2:
- Some patients have multiple samples. The PI wants to look at the data by patient only. Therefore, the only information provided about the original data is how to combine it.
- The column cancerType has been included because the PI wants to determine proteins that differentiate between small vs large, normal vs small, normal vs large and normal vs large & small. Age and sex will also be used to determine groupings.

Example 3:
- Some patients have multiple sample locations, and some of the sample locations have multiple samples. The PI wants to look at both the spots and the patients. Therefore, data_info is provided with the method to combine the data by spot and by patient. Spot_info is provided to create the groupings for the spot analysis. Patient_info is provided to create the groupings for the patient analysis.
- Patient 3 is a cancer patient who had three samples taken from two locations. One location was cancer tissue and the other location was normal tissue. The cancer location had two samples taken.
- The columns tissueType and patCancer have been included because the PI wants to determine proteins that differentiate between cancer tissue vs normal tissue, small vs large, normal vs small, normal vs large and normal vs large & small.
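To make the Patient 3 case concrete, the sketch below shows how data_info rows for that patient might encode both combination levels. The column ids and the spot and patient labels are invented for illustration and are not taken from the actual example files.

```python
from collections import defaultdict

# Hypothetical data_info rows for Patient 3: three samples from two
# locations, with two samples taken at the cancer location.
data_info = [
    {"column_id": "sample_3a", "spot": "pat3_cancer", "patient": "pat3"},
    {"column_id": "sample_3b", "spot": "pat3_cancer", "patient": "pat3"},
    {"column_id": "sample_3c", "spot": "pat3_normal", "patient": "pat3"},
]


def group_columns(info_rows, level):
    """Map each value at the given level ('spot' or 'patient') to its column ids."""
    groups = defaultdict(list)
    for row in info_rows:
        groups[row[level]].append(row["column_id"])
    return dict(groups)


by_spot = group_columns(data_info, "spot")        # input to the spot analysis
by_patient = group_columns(data_info, "patient")  # input to the patient analysis
```

Under these assumed labels, combining by spot yields two columns for Patient 3 (one cancer, one normal), while combining by patient yields a single column built from all three samples.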

-- JeremyRoberts - 08 Apr 2004
-- NimishGautam - 30 Aug 2004

Topic revision: r8 - 31 Aug 2004, JeremyRoberts
