DataGrouping

Public Methods

  • DONE DataGrouping(DataGrouping) - Copy constructor.
  • DONE Add(string, string) - Adds a Column by name and sets the group descriptor; the group id defaults to 0.
  • DONE Add(string, string, int) - Adds a Column by name and sets the group descriptor and the group id.
  • DONE GetName(int) - Returns the Column's name by index.
  • DONE GetGroup(int) - Returns the Column's group by index.
  • DONE GetGroup(string) - Returns the Column's group by name.
  • DONE SetGroup(int, int) - Sets the Column's group by index.
  • DONE SetGroup(string, int) - Sets the Column's group by name.
  • DONE GetDescriptor(string) - Returns the Column's descriptor by name.
  • DONE GetDescriptor(int) - Returns the Column's descriptor by index.
  • DONE GetIndex(string) - Returns the index of the Column that matches the name.
  • DONE Remove(int) - Removes a Column from the object by index.
  • DONE GetUniqueDescriptors() - Returns a collection of unique group descriptors.
  • DONE GetUniqueGroups() - Returns a collection of unique group ids.
  • DONE GetColumnsForDescriptor(string) - Returns an ArrayList of all the Column names in that group.
  • DONE GenerateKGroups(int) - Generates the groupings to use for cross-validation. Returns an ArrayList of ArrayLists of column names to exclude for each run.
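
The page does not say how GenerateKGroups assigns columns to runs. The following is a minimal Python sketch of one plausible approach, not the actual implementation: it assumes columns are dealt round-robin within each group id so that every run excludes a similar mix of groups, and that columns in group 0 are never excluded. The function name, the (name, group) tuple input, and the sample column names are all hypothetical.

```python
from collections import defaultdict


def generate_k_groups(columns, k):
    """Return k lists of column names; list i is excluded on run i.

    columns is a list of (name, group_id) pairs.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Bucket column names by group id.
    by_group = defaultdict(list)
    for name, group in columns:
        if group != 0:  # assumption: the ignore group takes no part in any run
            by_group[group].append(name)

    # Deal each group's columns round-robin across the k runs.
    folds = [[] for _ in range(k)]
    for group in sorted(by_group):
        for i, name in enumerate(by_group[group]):
            folds[i % k].append(name)
    return folds


# Hypothetical sample data.
columns = [("pat1", 2), ("pat2", 2), ("pat3", 1), ("pat4", 1), ("pat5", 0)]
folds = generate_k_groups(columns, 2)
```

With this input each of the two runs excludes one column from group 1 and one from group 2, and pat5 appears in neither list.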

Properties

  • DONE NumCols - Gets the number of Columns.
  • DONE NumGroupIds - Gets the number of groups.
  • DONE NumGroupDescriptors - Gets the number of descriptors.

Static Methods

  • DONE MultiLoad(string, DataSet, InformationSet) - Loads one or more DataGroupings from a file and returns them in an ArrayList. There is some limited logic available.

Private Methods

  • DONE RebuildColumnIndices - Rebuilds the column name to index map.
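
A rebuild is presumably needed because removing a column shifts the index of every column after it. A minimal Python sketch of that idea follows, using a plain dict where the page uses a SortedList; the function name and sample data are illustrative only.

```python
def rebuild_column_indices(column_names):
    """Map each column name to its current position in the list."""
    return {name: index for index, name in enumerate(column_names)}


column_names = ["pat1", "pat2", "pat3"]
column_indices = rebuild_column_indices(column_names)

# Removing the first column shifts the remaining columns down by one,
# so the old map is stale and must be rebuilt.
del column_names[0]
column_indices = rebuild_column_indices(column_names)
```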

Private Data Members

  • SortedList columnIndices - A list of column names mapped to column indices.
  • ArrayList columns - A collection of Columns.
  • class Column
    • int group - The group id.
    • string name - The column name.
    • string descriptor - The grouping descriptor.
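
The members above could be modelled roughly as follows. This is a hedged Python sketch, not the original code: a dict stands in for the SortedList, a list for the ArrayList, and only a few of the public methods are shown. Duplicate column names and error handling are left out.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str        # column identifier taken from the DataSet
    descriptor: str  # text label of the group
    group: int = 0   # group id; 0 is the null/ignore group


class DataGrouping:
    def __init__(self):
        self.columns = []         # ordered collection of Column objects
        self.column_indices = {}  # column name -> position in self.columns

    def add(self, name, descriptor, group=0):
        self.column_indices[name] = len(self.columns)
        self.columns.append(Column(name, descriptor, group))

    def get_index(self, name):
        return self.column_indices[name]

    def get_group(self, name):
        return self.columns[self.get_index(name)].group

    def get_unique_descriptors(self):
        return sorted({column.descriptor for column in self.columns})

    @property
    def num_cols(self):
        return len(self.columns)


# Hypothetical sample data.
grouping = DataGrouping()
grouping.add("pat1", "cancer", 2)
grouping.add("pat2", "normal", 1)
grouping.add("pat3", "cancer", 2)
```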

Exceptions

  • DONE Accessing a column that does not exist.
  • DONE Setting a group that is negative.
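
The two error conditions might be checked as in the Python sketch below. The exception types (KeyError, ValueError) and the dict-based storage are assumptions chosen for illustration; the page does not name the exception classes the real object throws.

```python
def get_group(groups, name):
    """Return the group id for a column, failing loudly if it is unknown."""
    if name not in groups:
        raise KeyError(f"no column named {name!r}")
    return groups[name]


def set_group(groups, name, group):
    """Set the group id for an existing column; negative ids are rejected."""
    if name not in groups:
        raise KeyError(f"no column named {name!r}")
    if group < 0:
        raise ValueError("group id must not be negative")
    groups[name] = group


# Hypothetical sample data: column name -> group id.
groups = {"pat1": 2, "pat2": 1}
set_group(groups, "pat2", 0)  # 0 is allowed: it is the null/ignore group
```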

Remarks

  • A sorted list is used for two reasons: (1) the information can be accessed by the group id, and (2) it leaves room to access the groupings by order in the future, if needed.
  • Group descriptors will usually be similar. They are the text representation of the group.
    • Example 1: (Normal (1) vs Cancer (2))
      • Cancer
      • Normal
    • Example 2: (Normal (1) vs Squamous and Adeno (2))
      • Squamous
      • Adeno
      • Normal
    • Example 3: (Squamous (1) vs Adeno (2); Normal (0-ignore group))
      • Squamous
      • Adeno
      • Normal
  • Each entry in this object holds three items: the group, the column name, and the group descriptor. The group is a non-negative integer, with 0 being the null/ignore group. The column name is the column identifier that comes from the DataSet. The group descriptor is the group information that comes from the data info.
    • Example
      • id: pat1
      • descriptor: cancer
      • group: 2
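
To make the relationship between descriptors and group ids concrete, here is a small Python sketch that encodes the three comparisons listed above as lookup tables and applies them to a few records. The helper name, the record layout, and every column id other than pat1 are hypothetical. Sending an unlisted descriptor to group 0 is an assumption, not documented behaviour.

```python
# One descriptor -> group id table per comparison from the remarks.
NORMAL_VS_CANCER = {"Cancer": 2, "Normal": 1}
NORMAL_VS_SQUAMOUS_AND_ADENO = {"Squamous": 2, "Adeno": 2, "Normal": 1}
SQUAMOUS_VS_ADENO = {"Squamous": 1, "Adeno": 2, "Normal": 0}


def assign_groups(records, mapping):
    """Attach a group id to each (id, descriptor) record."""
    return [
        # assumption: descriptors missing from the table fall into group 0
        {"id": column_id, "descriptor": descriptor,
         "group": mapping.get(descriptor, 0)}
        for column_id, descriptor in records
    ]


first = assign_groups([("pat1", "Cancer"), ("pat2", "Normal")],
                      NORMAL_VS_CANCER)

lung = [("pat1", "Squamous"), ("pat2", "Adeno"), ("pat3", "Normal")]
pooled = assign_groups(lung, NORMAL_VS_SQUAMOUS_AND_ADENO)
split = assign_groups(lung, SQUAMOUS_VS_ADENO)
```

The same descriptors yield different group ids depending on the comparison: Normal is group 1 when it is one side of the comparison and group 0 when it is ignored.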

Topic revision: r20 - 07 Apr 2005, WillGray
 
