High Performance Computer Cluster

What is ACCRE?

ACCRE stands for Advanced Computing Center for Research & Education. ACCRE currently manages a Beowulf cluster computer system. Beowulf is a design for high-performance parallel computing clusters built from inexpensive personal computer hardware. The ACCRE cluster is composed of over 1500 processors spanning four generations of hardware: Intel Xeon dual-processor nodes, AMD Opteron x86 dual-processor nodes, IBM PowerPC dual-processor blades, and dual-core, dual-processor AMD Opteron nodes.

What is a computer cluster?

A computer cluster is a group of computers that are connected to each other through fast local area networks and work together so closely that in many respects they can be viewed as a single computer. Clusters are usually deployed to improve performance. A computer cluster is usually much more cost-effective than a single computer of comparable computational power.

What can ACCRE do?

ACCRE has enormous computing power. If you have a computationally intensive job, e.g., a simulation or large dataset processing, using ACCRE will dramatically shorten your processing time.

How do I use the ACCRE system?

See the ACCRE Getting Started pages for information on using the cluster.

  • Request an Account. On ACCRE's registration website, choose the "option 1" link. Fill out the "Identification Information" form, choose "biostatistics group" as your group, and enter "Frank Harrell" or "Yu Shyr" as your "account approving P.I".

  • Attend classes. All new users are now required to attend the ACCRE Cluster Computing Classes within the first two months of opening their accounts.

What is parallel programming?

Parallel programming is a programming technique in which many instructions are carried out simultaneously. It operates on the principle that large problems can often be divided into smaller ones, which can then be solved concurrently.

A task is usually considered computationally intensive if it takes a long time for a single computer to finish. Parallel programs are usually run on a computer cluster.

Allocations and quotas for Biostatistics

Hostname for login is vmplogin.accre.vanderbilt.edu

SLURM documentation

ACCRE Cluster Reports

Two commands to show limits and allocations:
$ showLimits -g biostat_account

$ sacctmgr show account name=biostat_faculty,biostat_gpu,biostat_it,biostat_other,h_biostat_kang,h_biostat_student,h_biostat_student_mic,biostat_account WithAssoc format=account%23,user,fairshare,grpcpus,maxcpus,GrpMem,GrpWall,GrpCPUMins

As of 10/3/2018, our group account's fairshare (Share) is 152 CPU cores, and it is allowed to burst to a maximum of 988 CPU cores and 9880 GB of RAM (across all currently running jobs within the group). Other limits (MaxCPUs, GrpWall, GrpCPUMins) are not set for this account, meaning the account is not limited by the maximum number of CPU cores in a single job (MaxCPUs), the wall time aggregated across all currently running jobs within the group (GrpWall), or the total amount of CPU time consumed by the group over time (GrpCPUMins).

Report on Fairshare usage:
$ sreport cluster AccountUtilizationByUser start=2017-07-01 Accounts=biostat_account Tree -t hourper
This report shows CPU usage (in hours and as a percentage) within biostat_account since July 1, 2017, the start date given in the command. TRES stands for trackable resource.

Disk space:

These two commands return the same information (the latter with greater detail):
$ accre_storage
$ mmlsquota --block-size auto
gpfs20, gpfs21, and gpfs22 refer to the quotas and usage for your /data, /scratch, and /home directories, respectively.

See also Checking Fairshare Usage

What is "embarrassingly parallel"?

Embarrassingly parallel programming is a simple form of parallel programming. It still divides a large task into smaller tasks and sends them to nodes for processing. In contrast to general parallel programming, in which nodes can perform different types of tasks, the nodes running an embarrassingly parallel program all run exactly the same program. Therefore, not every job can be processed this way: a job must be "dividable" to run as an embarrassingly parallel job.

A task is considered dividable if each iteration of the task is independent, which means that iteration n does NOT depend on the result of iteration n-1. Typical uses of embarrassingly parallel programming are simulations and large dataset processing.
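
As an illustration of independent iterations, here is a minimal shell sketch (not an ACCRE-specific recipe). The squaring step is a hypothetical stand-in for real work such as one simulation replicate, and the counts (8 iterations, 4 at a time) are arbitrary. It assumes an xargs that supports -P, which is a GNU/BSD extension rather than POSIX.

```shell
# Minimal sketch: 8 independent "iterations", at most 4 running at once.
# Each iteration uses only its own index i, never another iteration's result.
# Completion order is not guaranteed, so sort restores index order at the end.
results=$(
  seq 1 8 |
    xargs -P 4 -I{} sh -c 'i=$1; echo "$i $((i * i))"' _ {} |
    sort -n
)
echo "$results"
```

Because no iteration reads another's output, the work can be split across any number of workers without changing the results. On a cluster, the same pattern is usually expressed through the scheduler rather than xargs.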

How do I use makeCluster() / the doSNOW package in my R programs on ACCRE?

Note on terminology: a node on ACCRE is not the same thing as a node in makeCluster()!

If you ask for a complete ACCRE node in your .pbs script (as of 5 Nov 2014, this would be nodes=1:ppn=8 or nodes=1:ppn=12, because all nodes on ACCRE have either 8 or 12 processors), you can use up to two makeCluster "nodes" per ACCRE processor (so as of 5 Nov 2014, this would be makeCluster(16) or makeCluster(24) depending on whether you specified ppn=8 or ppn=12). If you ask for anything less than a complete ACCRE node in your .pbs script (e.g. nodes=1:ppn=3), the number in your call to makeCluster() should be the same as the number of ACCRE processors you requested (here, makeCluster(3)) to prevent your job from "stealing" processors from other jobs on your ACCRE node. (Information given to Laurie Samuels by Charles Johnson, 5 Nov 2014.)
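
The sizing rule above can be written out as a small shell helper, shown here only as a sketch. The function name makecluster_size and its two arguments are hypothetical, and the node sizes (8 or 12 processors) are the 5 Nov 2014 figures quoted above, so check them against current ACCRE hardware before relying on them.

```shell
# Hypothetical helper: print the number to pass to makeCluster().
#   $1 = processors requested in the job script (ppn)
#   $2 = processors on a complete ACCRE node (8 or 12 as of 5 Nov 2014)
makecluster_size() {
  ppn=$1
  node_procs=$2
  if [ "$ppn" -ge "$node_procs" ]; then
    echo $((2 * ppn))   # complete node: up to two workers per processor
  else
    echo "$ppn"         # partial node: one worker per requested processor
  fi
}

makecluster_size 8 8    # prints 16
makecluster_size 12 12  # prints 24
makecluster_size 3 8    # prints 3
```

The printed value is what would go inside makeCluster() in the R program, for example makeCluster(3) for a nodes=1:ppn=3 request.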
Topic revision: r16 - 03 Oct 2018, DalePlummer
