GWAsimulator: A rapid whole genome simulation program

GWAsimulator is a C++ program that simulates genotype data for the SNP chips used in genome-wide association (GWA) studies. It implements a rapid moving-window algorithm (Durrant et al. 2004. AJHG 75:35-43) to simulate whole genome case-control or population samples. It can also simulate specific regions if desired. For case-control data, the program retrospectively samples cases and controls according to a user-specified multi-locus disease model. The program requires phased data as input, and the simulated data will have LD patterns similar to those of the input data.
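
To give a feel for the moving-window idea, here is a minimal sketch in C++. It is not taken from GWAsimulator.cpp; the type `Haplotype`, the function `simulateHaplotype`, and the 0/1 allele coding are hypothetical names and choices made for illustration. It assumes the phased input is a non-empty set of equal-length haplotypes. The first window of alleles is copied from a randomly chosen input haplotype, and each later allele is copied from an input haplotype drawn at random among those that agree with the simulated alleles at the preceding (window - 1) SNPs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One phased haplotype: alleles coded 0/1, one entry per SNP (assumed coding).
using Haplotype = std::vector<int>;

// Hypothetical helper: simulate one haplotype with a moving window of
// `window` SNPs from a set of phased input haplotypes.
Haplotype simulateHaplotype(const std::vector<Haplotype>& phased,
                            std::size_t window, std::mt19937& rng) {
    assert(!phased.empty() && !phased.front().empty());
    const std::size_t nSnps = phased.front().size();
    // Length of the starting segment, and number of preceding SNPs
    // that each later allele is conditioned on.
    const std::size_t seedLen =
        std::min(std::max<std::size_t>(window, 1), nSnps);
    const std::ptrdiff_t context = static_cast<std::ptrdiff_t>(seedLen) - 1;

    // Copy the first window from a randomly chosen input haplotype.
    std::uniform_int_distribution<std::size_t> pickAny(0, phased.size() - 1);
    const Haplotype& start = phased[pickAny(rng)];
    Haplotype sim(start.begin(),
                  start.begin() + static_cast<std::ptrdiff_t>(seedLen));

    std::vector<std::size_t> matches;
    for (std::size_t j = seedLen; j < nSnps; ++j) {
        // Collect input haplotypes that agree with the simulated alleles
        // at the `context` SNPs immediately before SNP j.
        matches.clear();
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(j) - context;
        for (std::size_t h = 0; h < phased.size(); ++h) {
            if (std::equal(sim.end() - context, sim.end(),
                           phased[h].begin() + first)) {
                matches.push_back(h);
            }
        }
        // At least one haplotype always matches: the one that supplied
        // the previous allele (or the starting haplotype).
        std::uniform_int_distribution<std::size_t> pick(0, matches.size() - 1);
        sim.push_back(phased[matches[pick(rng)]][j]);
    }
    return sim;
}
```

Calling `simulateHaplotype(phased, 5, rng)` twice and pairing the two results would give one simulated diploid genotype vector under these assumptions. Because every run of `window` consecutive simulated alleles is copied from some input haplotype, short-range LD in the input carries over to the output, which is the property the description above relies on.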

The program can use HapMap phased data as input and has the flexibility to simulate genotypes for different populations and different SNP chips. As large-scale GWA data sets become available, they can be used as input in place of the HapMap data, provided that phase information has been generated. Because of their much larger sample sizes, such data may represent the population under study better and give more accurate LD information than the HapMap.

The current version is 1.1. See the manual for instructions and a detailed description of the program.

Paper: Li C, Li M (2007) GWAsimulator: A rapid whole genome simulation program. Bioinformatics (in press). link or Preprint.

Linux package: GWAsimulator_v1.1_linux.tar.gz

Windows package: GWAsimulator_v1.1_windows.zip (The executable file is standalone. However, output file compression relies on an external program, gzip, that may not be available on many Windows systems. See the file README.txt for details.)

Mac OS X package: GWAsimulator_v1.1_mac.tar.gz (The executable file is a universal program for x86/ppc/ppc64.)

Other OS: Download any package and recompile.

Each package includes:
  1. Program source code (GWAsimulator.cpp)
  2. Manual (GWAsimulator_v1.1.pdf)
  3. Compilation shortcut program (build)
  4. Example control file (control.dat)
  5. Example data analysis program (dataanalysis.cpp)
  6. A subdirectory of example input phased data files
  7. A pre-compiled executable file
  8. Version history
The Windows package has a few more files related to gzip.

Note: Although the executable file may be ready to use, you can always recompile the program. Recompiling can often take advantage of the latest compiler technology and optimize the program for your local hardware and software configuration. Depending on the version of g++, the program may need a single-line modification to compile. See the manual for details.

Supplementary materials: (1) Supplementary Figure 1: Comparison with the HapMap data using LDU maps. (2) Result comparison with the program hapgen.

Input data: HapMap CEU phased data for Illumina HumanHap300 and HumanHap550. The program requires that the disease loci are known and included in the input phased data. If you want to specify disease loci that are not on the chip, you need to generate the input files yourself. See the manual for details.
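
One way to see why the disease loci must be present in the input is to look at what retrospective sampling involves: genotypes at a disease locus are drawn conditional on disease status, which requires the allele frequency at that locus. The sketch below works this out with Bayes' rule for a single disease locus. It is an illustration only, not the program's code; the function `retrospectiveDistributions`, its parameters (risk allele frequency, prevalence, and genotype relative risks), and the Hardy-Weinberg assumption are all choices made for this example, and the program itself supports multi-locus models as described in the manual.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Genotype distributions at one disease locus, indexed by the number of
// risk alleles carried (0, 1, or 2).
struct GenotypeDistributions {
    std::array<double, 3> cases;     // P(genotype | case)
    std::array<double, 3> controls;  // P(genotype | control)
};

// Hypothetical helper. Assumes Hardy-Weinberg proportions in the population
// and relative risks rrHet and rrHom measured against the genotype with no
// risk allele.
GenotypeDistributions retrospectiveDistributions(double riskAlleleFreq,
                                                 double prevalence,
                                                 double rrHet, double rrHom) {
    assert(riskAlleleFreq > 0.0 && riskAlleleFreq < 1.0);
    assert(prevalence > 0.0 && prevalence < 1.0);
    const double p = riskAlleleFreq;
    const std::array<double, 3> prior = {(1 - p) * (1 - p), 2 * p * (1 - p), p * p};
    const std::array<double, 3> rr = {1.0, rrHet, rrHom};

    // Baseline penetrance f0 solves: prevalence = sum_g f0 * rr[g] * prior[g].
    double meanRr = 0.0;
    for (std::size_t g = 0; g < 3; ++g) meanRr += rr[g] * prior[g];
    const double baseline = prevalence / meanRr;

    GenotypeDistributions out{};
    for (std::size_t g = 0; g < 3; ++g) {
        const double penetrance = baseline * rr[g];  // P(case | genotype)
        assert(penetrance <= 1.0);
        // Bayes' rule, conditioning on disease status.
        out.cases[g] = penetrance * prior[g] / prevalence;
        out.controls[g] = (1.0 - penetrance) * prior[g] / (1.0 - prevalence);
    }
    return out;
}
```

Under these assumptions, a case's genotype at the disease locus would be drawn from `cases` and a control's from `controls`; the rest of each haplotype could then be simulated conditional on the allele at that locus.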

Please send your comments and suggestions to chun.li@vanderbilt.edu.

Topic revision: r22 - 19 Nov 2007, ChunLi
 
