1 Introduction
We expect that the Swarm cluster will have many competing priorities for its use; this policy document is intended to minimize conflicts. Many of the policies will be automatic and enforced by software, which we hope will minimize the load on people. Requests should be sent to system@cs.umass.edu.
2 Accounts
• Each research laboratory will be a group and will be assigned a Group ID (GID). Resources will be allocated by group.
• New users/groups will be added on request (contact system@cs.umass.edu). The request should come from, or be approved by, a faculty member.
• The procedure for non-CS individuals (who are part of the research laboratories of senior researchers on the grant) may be more complicated; we will work this out as we proceed.
• Home Directories: Each user will have a 10 GB home directory, enforced by quotas and backed up. Users are encouraged to store code and other non-data files in their home directories (a sketch of how to check usage follows this list).
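As a rough guide, home-directory usage against the 10 GB quota can usually be checked with standard Linux tools; the exact output depends on how quotas are configured on Swarm, so treat this as a sketch:

    quota -s        # current usage and limits in human-readable units
    du -sh ~        # total size of your home directory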
3 Jobs
Jobs will be run in batch mode using partitions (queues). There are 2 partitions:
• defq - Short Jobs: Each job (a process on a core) may run for up to 12 hours; jobs that run longer will be killed automatically. Note that a user can submit many jobs, each of up to 12 hours duration, so work can be subdivided across many jobs (see the example batch script after this list).
• longq - Long Jobs: Long jobs are limited to 21 days.
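Assuming jobs are submitted through Slurm's standard sbatch interface (the script name, program, and resource requests below are illustrative, not prescribed by this policy), a minimal batch script for the short partition might look like:

    #!/bin/bash
    #SBATCH --partition=defq        # or longq for jobs longer than 12 hours
    #SBATCH --time=12:00:00         # requested wall-clock limit; must fit within the partition limit
    #SBATCH --ntasks=1              # one process on one core
    #SBATCH --output=myjob-%j.out   # %j expands to the job ID
    ./my_analysis                   # placeholder for your own program

Submit with "sbatch myjob.sh" and check the queue with "squeue -u $USER".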
4 Priorities
The order in which submitted jobs are assigned to nodes is based on a fair-share priority system. Each faculty member's group will be allocated a fair-share value, i.e., a portion of cluster CPU time, and each user within that group is allocated a subset fair-share value. The fair-share factor serves to prioritize queued jobs such that jobs charging accounts that are under-serviced are scheduled first, while jobs charging accounts that are over-serviced are scheduled when the machine would otherwise go idle.
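Assuming the fair-share mechanism is Slurm's, users can inspect their group's and their own fair-share values with the sshare utility; the exact columns shown depend on the site configuration:

    sshare          # fair-share tree for your own association(s)
    sshare -l       # long listing, including effective usage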
5 Disk Space
Each research group will be allocated a quota (usually 2 TB) on the work1 disk. While this space will not be backed up often, it will not be overwritten. Temporary scratch space on the order of 50 TB is also available in /mnt/nfs/scratch1; this space will be overwritten periodically.
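Ordinary filesystem tools should be enough to see how much work1 and scratch space is in use; the work1 path below is a placeholder, since this policy does not specify the mount point:

    df -h /mnt/nfs/scratch1             # free space on the shared scratch filesystem
    du -sh /path/to/your/work1/area     # usage of your group's work1 directory (illustrative path)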
Backup: We recommend that users back up their work in the temporary scratch space, as well as any other data they need. Since transfers between LGRC and the CS building are limited to 1 Gbps for all clusters, we ask that large transfers be done by taking a USB disk to the cluster site in LGRC (please arrange with system@cs.umass.edu in advance).
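For data sets small enough to move over the 1 Gbps link, an incremental copy with rsync is one reasonable option; the hostname and destination path below are placeholders, not actual Swarm or CS machine names:

    rsync -avh --progress /mnt/nfs/scratch1/$USER/results/ \
        yourhost.cs.umass.edu:/path/to/backup/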