JET Research Cluster for Staff
Getting Started
Welcome to the HPC @IMG @UNIVIE. Please follow these steps to become a productive member of our department and make good use of the computing resources. Efficiency is key.
Steps:
- Getting Started
- Connect to Jet
- Load environment (libraries, compilers, interpreter, tools)
- Check out code, program, compile, test
- Submit jobs to the compute nodes using Slurm
System Information
Last Update: 16.07.2024
Node Setup:
- 2x Login Nodes (jet01, jet02)
- 7x Compute Nodes INTEL (jet03-jet09)
- 10x Compute Nodes AMD (jet10-jet19)
- 5x Storage Nodes
Example INTEL Node
Type | Detail |
---|---|
Product | ThinkSystem SR630 |
Processor | Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz |
Cores | 2 CPU, 20 physical cores per CPU, total 80 logical CPU units |
CPU Time | 350 kh |
Memory | 755 GB Total |
Memory/Core | 18.9 GB |
Network | 100 Gbit/s (Infiniband) |
Example AMD Node
Type | Detail |
---|---|
Product | ThinkSystem SR635 V3 |
Processor | AMD EPYC 9454P 48-Core Processor |
Cores | 1 CPU, 48 physical cores per CPU, total 96 logical CPU units |
CPU Time | 420 kh |
Memory | 1132 GB Total |
Memory/Core | 23.5 GB |
Network | 200 Gbit/s (Infiniband) |
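If you want to check the hardware of the node you have landed on, standard Linux tools report the same figures as the tables above (a quick sketch; the output differs between the INTEL and AMD nodes):

```bash
# CPU model, sockets, cores per socket and logical CPUs of the current node
lscpu | grep -E 'Model name|Socket|Core|Thread|^CPU\(s\)'

# Total and available memory on the node
free -h
```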
Storage
All nodes are connected to a global file system (GPFS) with about 3.5 PB (~3500 TB) of storage. There is no need to copy files to the compute nodes; your HOME and SCRATCH directories are available under the same paths as on the login nodes.
Paths:
/jetfs/home/[username]
/jetfs/scratch/[username]
/jetfs/shared-data
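To get a quick overview of how much space you are using under these paths, the usual file-system tools work on the GPFS mount as well (a small sketch; your username is substituted via $USER):

```bash
# Capacity and usage of the file system behind your home and scratch directories
df -h /jetfs/home/$USER /jetfs/scratch/$USER

# Total size of everything in your scratch directory
du -sh /jetfs/scratch/$USER
```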
Software
The typical installation of an Intel cluster has the INTEL compiler suite (intel-parallel-studio) and the open-source GNU compilers installed. Based on these two different compilers (intel, gnu), there are usually two versions of each scientific software package.
Major Libraries:
- OpenMPI (3.1.6, 4.0.5)
- HDF5
- NetCDF (C, Fortran)
- ECCODES from ECMWF
- Math libraries, e.g. intel-mkl, lapack, scalapack
- Interpreters: Python, Julia
- Tools: cdo, ncl, nco, ncview
These software libraries are usually handled by environment modules. Need another library? Send a mail to IT.
Currently installed modules
(Bash listing of the currently installed modules; not reproduced here.)
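A minimal sketch of how the environment modules are typically used (assuming the standard `module` command is available, as the section above implies; module names and versions are illustrative and must be taken from the actual listing):

```bash
# Show all modules that can be loaded on the cluster
module avail

# Load a compiler/MPI/netCDF combination (names are examples,
# take the exact names from the module avail output)
module load openmpi/4.0.5
module load netcdf-fortran

# Inspect and reset the currently loaded environment
module list
module purge
```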
Jupyterhub
The Jet Cluster serves a JupyterHub with a JupyterLab that launches on the JET cluster compute nodes and allows users to work directly on the cluster as well as to submit jobs.
Steps:
- Open https://jupyter.wolke.img.univie.ac.at from within the VPN or the UNI network.
- Login with your Jet Credentials
- Choose a job
- The JupyterLab will be launched and will be available to you until you log out or the walltime is exceeded (depends on the job you launch).
Please use the resources responsibly. We trust that you apply a fair-share policy and collaborate with your colleagues.
There are several kernels available as modules; how to use other kernels can be found here.
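As a rough sketch of one common approach for adding your own kernel (assuming you have a Python environment of your own with ipykernel installed; the environment name is just an example):

```bash
# Inside your own Python environment (virtualenv or conda), install ipykernel
pip install ipykernel

# Register the environment as a user-level kernel; "myenv" is an example name
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"
```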
User Quotas and Restrictions
Currently there are no restrictions on the duration or the resources you can request. On JET the nodes can be shared between jobs, whereas on VSC nodes are job-exclusive. Please follow these rules of collaboration:
Jobs:
- Number of CPUs, keyword: ntasks, e.g. 1 node == 2x20 physical cores
- Memory, keyword: mem, e.g. each node has up to 754 GB
- Runtime, keyword: time, e.g. try to split jobs into shorter pieces
Consider the following example:
You can relatively easily use 1 node for more than 3 days with your jobs running, but do not use all nodes and block them for all other users for 3 days. If you need multiple nodes, split the jobs into shorter runtimes. In general it is better to have more, smaller jobs that are processed in a chain. Also try not to request resources that you do not use.
Have a look at the resources used by your jobs using the /usr/bin/time command, or look here.
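For example (a minimal sketch; ./my_program is a placeholder for your own executable):

```bash
# -v (verbose) reports wall time, CPU time and the maximum resident set size
/usr/bin/time -v ./my_program
```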
Sample Job
(Slurm example on JET; the original script is not reproduced here.)
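As a substitute, here is a minimal sketch of what a Slurm batch script on JET could look like (resource values and module names are illustrative and need to be adapted; see the keywords ntasks, mem and time above):

```bash
#!/bin/bash
#SBATCH --job-name=example          # name shown in the queue
#SBATCH --ntasks=40                 # number of CPUs (keyword ntasks)
#SBATCH --mem=50G                   # memory for the whole job (keyword mem)
#SBATCH --time=12:00:00             # walltime limit (keyword time)
#SBATCH --output=%x.%j.out          # output file: jobname.jobid.out

# Load the software environment the program needs (module names are examples)
module purge
module load openmpi/4.0.5

# Run the program on the requested number of tasks
srun ./my_model
```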
Storage limitations apply mainly to the HOME directory (default: 100 GB), but there are some general restrictions as well.
Login nodes
On the login nodes (jet01/jet02) processes can run without any queue. However, please make sure that other users are not affected too much when these nodes are used for processing.
On jet02 the JupyterHub is running, and on jet01 a VNC server can be launched for GUI applications.
For how to use a VNC server, go to VNC.
Network drives
Transferring files between SRV and JET is not necessary. The file system is mounted on the JET nodes jet01/jet02 and vice versa. These mounted drives transfer the data via the network, so latencies might be higher.
(Listing of the file systems mounted on the login nodes; not reproduced here.)
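A quick sketch of how to inspect which file systems are mounted on a node (standard Linux tools; the actual mount points and file-system types on JET may differ):

```bash
# Show network file systems (e.g. NFS) mounted on this node
findmnt -t nfs,nfs4

# Show usage of all mounted file systems, including their type
df -hT
```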
Slurm
The job manager is called Slurm and is used on numerous other HPC systems in the EU. There is plenty of online documentation to consult for guidance. Please have a look at the VSC tutorials or training courses.
There is some more information about how to use Slurm (the most common commands are sketched after the list below):
- Summary
- a more advanced Slurm Tutorial on Gitlab (🔒 staff only)
- VSC Slurm introduction
- VSC SLURM presentation
- Slurm Quick Start Guide - Manual Page
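As a quick reference, the most common Slurm commands (the job ID and script name are placeholders):

```bash
sbatch job.sh              # submit a batch script to the queue
squeue -u $USER            # list your pending and running jobs
scancel 123456             # cancel a job by its job ID
sinfo                      # show partitions and node states
scontrol show job 123456   # detailed information about a single job
```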
Job efficiency reports
Since 2024 there is a new feature that allows you to check how well your jobs ran and get information on the efficiency of the resources used. The report is available once the job has finished.
(Example job efficiency report; not reproduced here.)
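As a sketch, similar information can be queried from Slurm's own accounting tools (the job ID is a placeholder; the exact command that generates the JET efficiency report may differ):

```bash
# Accounting information for a finished job (123456 is a placeholder job ID)
sacct -j 123456 --format=JobID,JobName,Elapsed,TotalCPU,MaxRSS,State

# If the seff contrib tool is installed, it summarises CPU and memory efficiency
seff 123456
```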
Created: January 26, 2023