
How to Run Parallel Programs

What is parallel programming?

Parallel programming is a programming technique in which many instructions are carried out simultaneously. It operates on the principle that large problems can almost always be divided into smaller ones, which can then be solved concurrently.

A task is usually considered computationally intensive if it takes a long time for a computer to finish. Parallel programs are usually run on a computer cluster.

What is embarrassingly parallel programming?

Embarrassingly parallel programming is a simple parallel programming method. It still divides a large task into smaller tasks and sends these small tasks to nodes for processing. In contrast to general parallel programming, in which nodes can perform different types of tasks, nodes running an embarrassingly parallel program all run exactly the same program. Therefore, not every job can be processed this way: a job must be "dividable" to run as an embarrassingly parallel program.

A task is considered dividable if each iteration of the task is independent, which means that iteration n does NOT depend on the result of iteration n-1. Typical uses of embarrassingly parallel programming are simulations and large-dataset processing.
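
For example, each iteration of the following R sketch (illustrative code only, not part of ACCRE or EPP) draws its own random sample, so its 1000 replicates could be split evenly across 10 nodes and the results simply combined afterwards:
# Every replicate is independent: iteration n never uses the
# result of iteration n-1, so the work is "dividable".
results <- numeric(1000)
for (n in 1:1000) {
  x <- rnorm(100)          # a fresh, independent sample for replicate n
  results[n] <- mean(x)    # summary statistic for replicate n
}
write.table(results, file = "sim_results.txt",
            row.names = FALSE, col.names = FALSE)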

What is a computer cluster?

A computer cluster is a group of computers that are connected to each other through fast local area networks and work together so closely that in many respects they can be viewed as a single computer. Clusters are usually deployed to improve performance. A computer cluster is usually much more cost-effective than a single computer of comparable computational power.

What is ACCRE?

ACCRE stands for Advanced Computing Center for Research & Education. ACCRE currently manages a Beowulf cluster computer system. A Beowulf is a design for high-performance parallel computing clusters built on inexpensive personal computer hardware. The ACCRE cluster is composed of over 1500 processors spanning four generations of hardware: Intel Xeon dual-processor nodes, AMD Opteron x86 dual-processor nodes, IBM PowerPC dual-processor blades, and dual-core, dual-processor AMD Opteron nodes.

What can ACCRE do?

ACCRE has enormous computing power. If you have a computationally intensive job, e.g., a simulation or large-dataset processing, using ACCRE can dramatically shorten your waiting time.

How to use the ACCRE system?

  • Request an Account. On ACCRE's registration website, choose the "option 1" link. Fill out the "Identification Information" form, choose "biostatistics group" as your group, and enter "Frank Harrell" or "Yu Shyr" as your "account approving P.I."

  • Attend classes. All new users are required to attend the ACCRE Cluster Computing Classes within the first two months of opening their accounts:

  1. Introduction to Unix/Linux -- Optional
  2. Introduction to the Compute Cluster -- Mandatory (within one month)
  3. Job Scheduler Details -- Mandatory (within two months)
  4. Compiling Programs -- Optional

  • Log in. Linux users: use ssh to log in:
            ssh user_name@vmplogin.accre.vanderbilt.edu
or use fish under Konqueror or Dolphin:
            fish://user_name@vmplogin.accre.vanderbilt.edu
Windows users: use the SSH Secure File Transfer Client to access the ACCRE system. The program is a free download for the Vanderbilt community with a valid VUnetID and password.

How to run jobs on ACCRE?

To submit a job on ACCRE, one must create a PBS submission script and then submit it with the qsub command:
            qsub mypbs.pbs

A typical .pbs file looks like this:
#!/bin/sh
# Beginning of PBS batch script.
#PBS -M my.address@vanderbilt.edu
# Status/Progress EMails sent to "my.address@vanderbilt.edu"
#PBS -m bae
# Email generated at b)eginning, a)bort, and e)nd of jobs
#PBS -l nodes=4:ppn=2:x86
# Nodes required (#nodes:#processors per node:CPU type)
#PBS -l mem=2000mb
# Total job memory required (specify how many megabytes)
#PBS -l pmem=250mb
# Memory required per processor (specify how many megabytes)
#PBS -l walltime=00:05:00
# You must specify Wall Clock time (hh:mm:ss) [Maximum allowed 30 days = 720:00:00]
#PBS -o myjob.output
# Send job stdout to file "myjob.output"
#PBS -j oe
# Send (join) both stderr and stdout to "myjob.output"
echo "This is my first job submitted on the ACCRE cluster."
# Replace the above echo command with your executable program
# End of PBS batch script.

What is the EPP program?

EPP (Embarrassing Parallel Program) is a program that helps users run embarrassingly parallel programs on the ACCRE system. Without EPP, for each job a user wants to run, he/she must prepare and submit a separate .pbs file, which becomes a real burden when the user wants to use many nodes. For example, suppose you have a dataset of 20000 entries to process. If it takes 10 hours to process on your workstation, it will need only about one hour when you use 10 nodes of the ACCRE system. To do that manually, these are the steps you would need to take:
  • Separate your original dataset into 10 smaller data files, each containing 1/10 of the original data entries.
  • Prepare 10 .pbs files.
  • Modify your R code to work on the smaller files.
  • Submit the 10 .pbs files.
  • Manually combine the 10 result files into one file.

The EPP program does all of the above steps for you. The only thing you need to do is prepare a .epp meta-data file.
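
To give a sense of the manual work that EPP automates, the first step above might look like the following R sketch (the file names mydata.txt and chunk_1.txt, ..., chunk_10.txt are hypothetical, not EPP conventions):
# Split the original dataset into 10 contiguous chunk files,
# one per node, each holding roughly 1/10 of the rows.
dat <- read.table("mydata.txt", header = TRUE)
n.chunks <- 10
idx <- cut(seq_len(nrow(dat)), breaks = n.chunks, labels = FALSE)
for (i in 1:n.chunks) {
  write.table(dat[idx == i, ],
              file = paste("chunk_", i, ".txt", sep = ""),
              row.names = FALSE)
}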

What is a .epp file?

The .epp file is the meta-data file for the EPP program. The user uses this file to tell the system how to run the job. The following is a template of a .epp file:

##################################################
#How to set up these arguments?
#       This metadata file is essential to use EPP program.
#This file defines some important arguments for the EPP 
#program to submit parallel processing jobs.  
#
#       All the EPP program arguments start with "#epp"
#and comments are prefixed with "#".
#
#Mandatory arguments:
#"#epp NODE":       Number of CPUs you want to have to run you job. 
#               This will determine how many parallel jobs you
#               want to run.  A valid entry must be any number >=1
#
#"#epp WALLTIME":   The system must know how long you expect your jobs
#               to run.  This is an estimation.  The valid format for this entry is:
#               hh:mm:ss
#
#"#epp CPUTIME":   The system must know how much CPU time you need for
#               your jobs.  This is an estimation.  The valid format is:
#               hh:mm:ss.  Rule of thumb: 
#               CPUTIME = NODE * WALLTIME
#
#"#epp PROG_TYPE":  The processing program you want to use to handle 
#               your jobs.  Currently R is the only program available. 
#               The entry is R.  In the future, other programs will
#               be developed on demand.
#
#"#epp EXECPROG_x": Your program(s) to run the parallel jobs.  
#               x indicates execution order if you have multiple
#               programs to run.  These programs will run in numeric 
#               order of x.  A FULL PATH IS REQUIRED FOR THIS ENTRY.
#               Example of some legal entries:
#               #epp EXECPROG_1 = "/home/liuz/myR/runr1.R"
#               #epp EXECPROG_2 = "/home/liuz/myR/runr2.R"
#                 :
#
#"#epp OUTPUT_x":   You output result files whose file name defined 
#               within the EXECPROG_x file.  PLEASE DO NOT ENTER FULL 
#               PATH IN BOTH YOUR EXE.  PROGRAM AND HERE!!!
#                   If you have multiple out put files, name them in 
#               number order as x.
#               Example of some legal entries:
#               #epp OUTPUT_1 = "out1.txt"
#               #epp OUTPUT_2 = "out2.txt"
#                  :
#
#Optional arguments:
#"#epp DATA????_x": These arguments define the properties of your source 
#               data, if you have any.  x indicates multiple source data 
#               files.
#    #epp DATA_x:         Describe the source data file name.  
#                         A FULL PATH IS NEEDED.
#    #epp DATATYPE_x:     Type of processing.  
#                         Legal entries: parallel/paral/p or 
#                         sequential/seq/s
#    #epp DATAHEAD_x:     The number of header lines in the file that 
#                         need to be skipped
#    #epp DELAY:          For simulations, if random numbers are needed, this delay changes
#                         the seed used to generate the random numbers when the system time
#                         is used as the seed
#An example of setting up a .epp file:
#Task description:      
#       I have two datasets, mydat.dat and mydat2.dat, that need to be processed.  
#mydat.dat should be processed in every job request while mydat2.dat 
#should be divided into N pieces and each job should only run a piece 
#of it. 
#       To process these data files I need to run two R scripts: myr1.R 
#and myr2.R, in order.  
#       Within the myr1.R and myr2.R scripts, I specified that the results 
#will be stored in two output files: out1.txt and out2.txt.
#       I want to use 2 CPUs to process my job and I estimate that 21 hours 
#44 minutes and 55 seconds should be enough for the processing to finish.  
#Therefore I setup the following metadata file:
#
#       #epp NODE = 2               #2 CPUs are needed
#       #epp WALLTIME = 21:44:55    #estimated time needed for each 
#                                    job is 21h44m55s
#       #epp OUTPUT_1 = "out1.txt"  #output file names.  since there is more 
#                                    than 1, specify x
#       #epp OUTPUT_2 = "out2.txt"
#       
#       Describe mydat.dat.  A FULL PATH IS NEEDED.
#       This data is going to be used by every node, so the type is "paral".
#       This data has 3 header lines that need to be skipped.
#       #epp DATA_1 = "/home/liuz/embarrass/mydat.dat"
#       #epp DATATYPE_1 = paral
#       #epp DATAHEAD_1 = 3
#
#       Describe mydat2.dat.  A FULL PATH IS NEEDED.
#       This data is going to be divided and every job runs a part of it, 
#       so the type is "seq".
#       This data has 0 header lines that need to be skipped, so = 0.
#       #epp DATA_2 = "/home/liuz/embarrass/mydat2.dat"
#       #epp DATATYPE_2 = seq
#       #epp DATAHEAD_2 = 0
#
#       My R script programs are as follows.
#       FULL PATH IS NEEDED
#       #epp EXECPROG_1 = "/home/liuz/embarrass/myr1.R"
#       #epp EXECPROG_2 = "/home/liuz/embarrass/myr2.R"
#
#       I am going to use R to process my data.
#       #epp PROG_TYPE = R
##################################################

#JOB CONTROL ARGUMENTS
#epp NODE = 4            
#epp WALLTIME = 22:44:55
#epp CPUTIME = 90:00:00
#epp PROG_TYPE = R

#Set up program-related arguments.
#Each argument in this group can have more than one entry.
#epp DELAY = 1
#epp OUTPUT_1 = "dataout.txt"

#epp DATA_1 = "/home/liuz/embarrass/demo/dataSrc.txt"
#epp DATATYPE_1 = paral
#epp DATAHEAD_1 = 3

#epp EXECPROG_1 = "/home/liuz/embarrass/demo/meanAndSum.R"
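
As an illustration, a meanAndSum.R script matching the demo .epp file above might look like the following sketch. It assumes (this is not documented here) that each job finds its portion of the source data as dataSrc.txt in its working directory, and that the output file is written without a full path, as the OUTPUT_x rule requires:
# Hypothetical meanAndSum.R for the demo .epp file above.
# Assumption: each job's share of the data is available locally
# as "dataSrc.txt"; skip the 3 header lines (DATAHEAD_1 = 3).
dat <- read.table("dataSrc.txt", skip = 3)
res <- c(mean = mean(dat[, 1]), sum = sum(dat[, 1]))
# Output file name has no full path, matching OUTPUT_1 = "dataout.txt".
write.table(t(res), file = "dataout.txt", row.names = FALSE, quote = FALSE)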

EPP program commands

   * epp myEppFile.epp         #submit a job
   * epp myEppFile.epp -s      #status check of submitted jobs
   * epp myEppFile.epp -a      #cancel submitted jobs
   * epp myEppFile.epp -c      #clean up after job is finished

Important things about ACCRE and the EPP program

  • Bash shell or C shell. When you log in to ACCRE, the default shell is the C shell. To use the EPP program, it is suggested to use the bash shell instead. A user can use the "chsh" command to permanently change his/her default shell setting, or enter the command "bash" every time he/she logs in.

  • Simulation problem. When a user runs a simulation using the EPP program, it is possible that the results from two or more nodes will be identical. This happens because the system starts the jobs at the same time, and since most random number generators use the system time as the seed, the nodes will have the same seeds and therefore produce the same results. There are two ways to solve this problem. The first is for the user to provide a seed file for the program to read; the second is to specify "DELAY" in the .epp file.
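
As an illustration of the first solution, each R script could read its own seed from a small text file prepared by the user before it generates any random numbers (the file name myseed.txt and its one-number-per-file format are hypothetical, not an EPP convention):
# Read a per-job seed from a user-supplied file so that each node's
# random number stream differs even when all jobs start at the same time.
seed <- scan("myseed.txt", what = integer(), n = 1, quiet = TRUE)
set.seed(seed)
x <- rnorm(1000)   # simulation draws now differ across nodes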

ACCRE Annual Disclosure
