Resource Limit Update
SCRP has seen tremendous growth since it came online in summer 2020—we started with 84 CPU cores, 576GB RAM and 12 consumer-grade GPUs, while as of January 2023 we have close to 600 CPU cores, 4.3TB RAM and a growing number of datacenter GPUs. To keep up with the increased computing power and address user needs, we will be implementing new resource limits over the next month.
Already Implemented
- The previous 32 CPU/user limit on the default scrp partition has been increased to 128. In its place is a limit of 1 node/job and 32 CPU/job. This change allows you to send jobs to the default partition even if you have a large job running on another partition.
- The large partition now includes one RTX 3090 node. This change allows you to request a full RTX 3090 node, with all 64 CPU cores and four RTX 3090 GPUs.
- A maximum of two GPU/user and 32 CPU/user on the a100 partition.
- When there are multiple jobs in a queue, the following priority rules are observed:
- The time a job has been on the queue is the main determinant.
- Faculty, staff and research postgraduate students have a two-day priority over other user types.
- A small priority bonus that scales inversely with usage.
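As a concrete example, a job script requesting a full RTX 3090 node on the large partition might look like the sketch below. The GRES name (`gpu`) and the program being run are placeholders—check the cluster's own documentation for the exact names used on SCRP:

```shell
#!/bin/bash
#SBATCH --partition=large     # partition that now includes one RTX 3090 node
#SBATCH --nodes=1             # request the whole node
#SBATCH --cpus-per-task=64    # all 64 CPU cores on that node
#SBATCH --gres=gpu:4          # all four RTX 3090 GPUs (GRES name is an assumption)

srun python train.py          # placeholder for your own program
```

Requests that fit within the per-user GPU cap (four on large) will queue normally; anything above it will be rejected or held.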
To be Implemented
- Later in the month, we will restrict the default partition to jobs no longer than 24 hours. Once this new restriction comes into effect, you should send your long-running jobs to either the large partition or the a100 partition.
- We will add two A100 GPUs to the default partition to allow more users to access them and facilitate faster turnaround.
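Once the 24-hour cap is in place, a default-partition job script would need an explicit time limit under that cap, as in this sketch (the program name is a placeholder):

```shell
#!/bin/bash
#SBATCH --partition=scrp      # default partition: jobs capped at 24 hours
#SBATCH --time=23:59:00       # must stay under the 24-hour limit
#SBATCH --cpus-per-task=32    # at most 32 CPU/job on this partition

srun python analysis.py       # placeholder for your own program
```

A job that needs more than 24 hours should instead set `--partition=large` (or `--partition=a100` for GPU work subject to the two-GPU/user cap).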
Below are the updated resource limits for each user category:
User Type | Logical CPUs | GPUs | Job Duration |
---|---|---|---|
Faculty and staff | 128 | 8 | 5/30 days^a |
Research postgraduate student | 128 | 4 | 5/30 days^a |
Taught postgraduate student and senior undergraduate student | 16 | 1 | 5 days |
Undergraduate student | 4 | 1 | 1 day |
New partition settings:
Partition | Nodes | Resources per Node | Limits |
---|---|---|---|
scrp* | scrp-node-[1-6,8,9] | CPU: 16-64 cores; RAM: 64-512GB; GPU: 0-4 | Max. 1 node/job; Max. 32 CPU/job |
large | scrp-node-[1-7] | CPU: 16-64 cores; RAM: 64GB-1TB | Max. 4 GPU/user |
a100 | scrp-node-10 | CPU: 64 cores; RAM: 512GB; GPU: A100 x 3 | Max. 32 CPU/job; Max. 2 GPU/user |