Large DataSet SubClass Pros and Cons


To support large datasets, we might need to make the read-only MemoryMapDataSet a subclass of DataSet.


  • Polymorphism
    • DataSet ds = new MemoryMapDataSet("file.txt");


  • Inheritance
    • Methods that modify DataSet will need to be disabled?
    • Keep some of the same interface.
    • Will need to change parts of the interface.
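
The "disable the modifying methods" option could look roughly like the sketch below. It is only an illustration: the classes are simplified stand-ins, the constructors take dimensions rather than a file name, and the `double` element type and the `virtual` modifier on SetDataPoint are assumptions not confirmed by these notes.

```csharp
using System;

// Hypothetical, simplified stand-in for the real DataSet class.
public class DataSet
{
    protected double[,] data;

    public DataSet(int rows, int cols)
    {
        data = new double[rows, cols];
    }

    public virtual double GetDataPoint(int row, int col)
    {
        return data[row, col];
    }

    // Assumed to be virtual so that a subclass can disable it.
    public virtual void SetDataPoint(int row, int col, double value)
    {
        data[row, col] = value;
    }
}

// Read-only subclass: the inherited mutator is overridden to fail at run time.
public class MemoryMapDataSet : DataSet
{
    public MemoryMapDataSet(int rows, int cols) : base(rows, cols)
    {
    }

    public override void SetDataPoint(int row, int col, double value)
    {
        throw new NotSupportedException("MemoryMapDataSet is read-only.");
    }
}
```

One drawback this makes visible: a caller holding a DataSet reference only discovers at run time that SetDataPoint is unavailable, and the subclass still inherits the in-memory storage of its base class.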


Interface instead of SubClass -- IDataSet

    • List common methods between DataSet and MemoryMapDataSet
      • GetDataPoint(int, int)
      • NumCols
      • NumRows
      • Name
      • GetColumnName(int)
      • GetRowName(int)
      • GetRowId(int)
      • GetColumnIndexFromName(string)
      • GetRowIndexFromId(int)
      • Log ?
        • DataSet.Log(int) -> Log(string, int)
        • MemoryMapDataSet.Log(string, int)
      • Average ?
        • DataSet.Average(DataGrouping [, bool]) -> add Average(string, DataGrouping [, bool])
        • MemoryMapDataSet.Average(string, DataGrouping [, bool])
    • MemoryMap implementation of DataSet
      • FileStream
      • StreamReader
      • Buffer
      • Other DataSet members for column names and row ids/names, etc.
    • Other TODO:
      • Read-ahead optimization to reduce FileStream.Seek() usage
      • Vary buffer size based on DataSet size (number of columns)
    • Explicit Interface Implementation (see MSDN tutorial)
      • Methods that can only be called with an interface reference to the object.
    • IDataSet ds = new DataSet("file.txt");
    • IDataSet ds = new MemoryMapDataSet("file.txt");
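
A minimal sketch of the interface approach follows, covering only three of the common members listed above. The `double` return type, the array-based constructors (used here in place of the file-name constructors), and the choice of GetDataPoint as the explicitly implemented member are assumptions made for illustration; the `Demo` class is hypothetical.

```csharp
using System;

// Subset of the common members listed above.
public interface IDataSet
{
    int NumRows { get; }
    int NumCols { get; }
    double GetDataPoint(int row, int col);
}

// In-memory implementation; keeps its mutator outside the interface.
public class DataSet : IDataSet
{
    private double[,] data;

    public DataSet(double[,] values) { data = values; }

    public int NumRows { get { return data.GetLength(0); } }
    public int NumCols { get { return data.GetLength(1); } }

    public double GetDataPoint(int row, int col) { return data[row, col]; }

    public void SetDataPoint(int row, int col, double value) { data[row, col] = value; }
}

// Read-only implementation; the array stands in for file-backed storage.
public class MemoryMapDataSet : IDataSet
{
    private double[,] backing;

    public MemoryMapDataSet(double[,] values) { backing = values; }

    public int NumRows { get { return backing.GetLength(0); } }
    public int NumCols { get { return backing.GetLength(1); } }

    // Explicit interface implementation: this member is reachable only
    // through an IDataSet reference, not a MemoryMapDataSet reference.
    double IDataSet.GetDataPoint(int row, int col) { return backing[row, col]; }
}

public static class Demo
{
    // Works with either implementation because it depends only on IDataSet.
    public static double Sum(IDataSet ds)
    {
        double total = 0.0;
        for (int r = 0; r < ds.NumRows; r++)
            for (int c = 0; c < ds.NumCols; c++)
                total += ds.GetDataPoint(r, c);
        return total;
    }
}
```

With this shape, MemoryMapDataSet has no SetDataPoint at all, so an attempt to modify it fails at compile time instead of at run time.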


The Distance module uses DataSet.SetDataPoint to modify data points using the feature weights from the STD function, so Distance will continue to work with DataSet rather than IDataSet. To bridge the two, IDataSet.Filter will return a DataSet. This means that after filtering, the result must be small enough to be held in memory.
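
The filter-then-modify flow might be sketched as follows. The real signature of Filter is not given in these notes, so `FilterRows` and `ApplyWeights` are hypothetical helpers, the types are simplified stand-ins, and multiplying each column by its weight is an assumed interpretation of how the weights are applied.

```csharp
using System;

public interface IDataSet
{
    int NumRows { get; }
    int NumCols { get; }
    double GetDataPoint(int row, int col);
}

// Simplified in-memory DataSet with the mutator that Distance relies on.
public class DataSet : IDataSet
{
    private double[,] data;

    public DataSet(int rows, int cols) { data = new double[rows, cols]; }

    public int NumRows { get { return data.GetLength(0); } }
    public int NumCols { get { return data.GetLength(1); } }

    public double GetDataPoint(int row, int col) { return data[row, col]; }

    public void SetDataPoint(int row, int col, double value) { data[row, col] = value; }
}

public static class FilterSketch
{
    // Hypothetical stand-in for IDataSet.Filter: copies the selected rows of
    // any IDataSet into a new in-memory DataSet.
    public static DataSet FilterRows(IDataSet source, int[] rowsToKeep)
    {
        DataSet result = new DataSet(rowsToKeep.Length, source.NumCols);
        for (int i = 0; i < rowsToKeep.Length; i++)
            for (int c = 0; c < source.NumCols; c++)
                result.SetDataPoint(i, c, source.GetDataPoint(rowsToKeep[i], c));
        return result;
    }

    // Assumed weighting step: scales every value in column c by weights[c].
    public static void ApplyWeights(DataSet ds, double[] weights)
    {
        for (int r = 0; r < ds.NumRows; r++)
            for (int c = 0; c < ds.NumCols; c++)
                ds.SetDataPoint(r, c, ds.GetDataPoint(r, c) * weights[c]);
    }
}
```

Only the filtered copy is modified, so the large read-only source is never written to.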

Topic revision: r7 - 26 Apr 2005, WillGray
