
J E T

Research Cluster for Staff 🚀 🔒

Getting Started

Welcome to the HPC @IMG @UNIVIE. Please follow these steps to become a productive member of our department and make good use of the computing resources. Efficiency is key.

Steps:

  1. Getting Started
  2. Connect to Jet
  3. Load environment (libraries, compilers, interpreter, tools)
  4. Checkout Code, Program, Compile, Test
  5. Submit to compute nodes using slurm

System Information

Last Update: 16.07.2024

Node Setup:

  • 2x Login Nodes (jet01, jet02)
  • 7x Compute Nodes INTEL (jet03-09)
  • 10x Compute Nodes AMD (jet10-jet19)
  • 5x Storage Nodes

GPFS

Example INTEL Node

Type          Detail
Product       ThinkSystem SR630
Processor     Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Cores         2 CPUs, 20 physical cores per CPU, 80 logical CPUs in total
CPU Time      350 kh
Memory        755 GB total
Memory/Core   18.9 GB
Network       100 Gbit/s (Infiniband)

Example AMD Node

Type          Detail
Product       ThinkSystem SR635 V3
Processor     AMD EPYC 9454P 48-Core Processor
Cores         1 CPU, 48 physical cores per CPU, 96 logical CPUs in total
CPU Time      420 kh
Memory        1132 GB total
Memory/Core   23.5 GB
Network       200 Gbit/s (Infiniband)
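
The CPU Time and Memory/Core rows above appear to follow from the core counts: the values match the number of physical cores multiplied by the hours in a year, and the total memory divided by the number of physical cores. This reading is an assumption inferred from the numbers; the tables do not state it. A minimal sketch of the arithmetic, with hypothetical helper names:

```bash
#!/bin/bash
# Hypothetical helpers. Reading "CPU Time" as core-hours per year is an
# assumption inferred from the table values, not a documented definition.

# core_hours_per_year CORES -> physical cores x 365 days x 24 hours
core_hours_per_year() {
    echo $(( $1 * 365 * 24 ))
}

# mem_per_core_gb TOTAL_GB CORES -> memory per physical core, one decimal
mem_per_core_gb() {
    awk -v total="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", total / cores }'
}

core_hours_per_year 40    # INTEL node, 2 x 20 cores: 350400 (about 350 kh)
core_hours_per_year 48    # AMD node, 1 x 48 cores:   420480 (about 420 kh)
mem_per_core_gb 755 40    # INTEL node: 18.9
```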

Storage

All nodes are connected to a global file system (GPFS) with about 3.5 PB (~3500 TB) of storage. There is no need to copy files to the compute nodes: your HOME and SCRATCH directories are available there under the same paths as on the login nodes.

Paths:

  • /jetfs/home/[username]
  • /jetfs/scratch/[username]
  • /jetfs/shared-data
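
Because the paths are identical on login and compute nodes, a script can refer to them without knowing where it runs. A small sketch, assuming your cluster username equals `$USER`; the helper name `jet_paths` is made up for illustration:

```bash
#!/bin/bash
# jet_paths [USERNAME] -> prints the three paths listed above, one per line.
# Hypothetical helper; defaults to the current $USER (an assumption).
jet_paths() {
    local user="${1:-$USER}"
    printf '%s\n' "/jetfs/home/${user}" "/jetfs/scratch/${user}" "/jetfs/shared-data"
}

# Report which of the directories are visible on the current machine.
jet_paths | while read -r dir; do
    if [ -d "$dir" ]; then
        echo "available: $dir"
    else
        echo "not found: $dir"
    fi
done
```

Run on a login node and inside a job, the output should be the same if the file system is mounted as described.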

Software

The typical installation of an Intel cluster has the INTEL Compiler suite (intel-parallel-studio) and the open source GNU Compilers installed. Based on these two compilers (intel, gnu), there are usually two versions of each scientific software package.

Major Libraries:

  • OpenMPI (3.1.6, 4.0.5)
  • HDF5
  • NetCDF (C, Fortran)
  • ECCODES from ECMWF
  • Math libraries, e.g. intel-mkl, lapack, scalapack
  • Interpreters: Python, Julia
  • Tools: cdo, ncl, nco, ncview

These software libraries are usually handled by environment modules. Need another library? Send a mail to IT.

Currently installed modules

Bash
$ module av
--------- /jetfs/spack/share/spack/modules/linux-rhel8-skylake_avx512 ----------
anaconda3/2020.11-gcc-8.5.0-gf52svn                         
anaconda3/2021.05-gcc-8.5.0-gefwhbz                         
cdo/1.9.10-gcc-8.5.0-y4q2l2h                                
cdo/2.0.1-gcc-8.5.0-xgalz67                                 
eccodes/2.18.0-intel-20.0.2-6tadpgr                         
eccodes/2.19.1-gcc-8.5.0-74y7rih                            
eccodes/2.19.1-gcc-8.5.0-MPI3.1.6-q3prgpi                   
eccodes/2.21.0-gcc-8.5.0-lq54nls                            
eccodes/2.21.0-gcc-8.5.0-MPI3.1.6-uu4b62w                   
eccodes/2.21.0-intel-2021.4.0-cscplox                       
eccodes/2.21.0-intel-2021.4.0-xnc5g2f                       
gcc/8.5.0-gcc-8.5rhel8-7ka2e42                              
gcc/9.1.0-gcc-8.5rhel8-hmyhbce                              
geos/3.8.1-gcc-8.5.0-bymxoyq                                
geos/3.9.1-gcc-8.5.0-smhcud5                                
geos/3.9.1-intel-2021.4.0-wdqirxs                           
hdf5/1.10.7-gcc-8.5.0-MPI3.1.6-zia454a                      
hdf5/1.10.7-gcc-8.5.0-t247okg                               
hdf5/1.10.7-intel-2021.4.0-l6tbvga                          
hdf5/1.10.7-intel-2021.4.0-n7frjgz                          
hdf5/1.12.0-intel-20.0.2-ezeotzr                            
intel-mkl/2020.3.279-intel-20.0.2-m7bxged                   
intel-mkl/2020.4.304-intel-2021.4.0-mcf5ggn                 
intel-oneapi-compilers/2021.4.0-gcc-9.1.0-x5kx6di           
intel-oneapi-mkl/2021.4.0-intel-2021.4.0-d2aqurq            
intel-oneapi-mpi/2021.4.0-intel-2021.4.0-eoone6i            
intel-parallel-studio/composer.2020.2-intel-20.0.2-zuot22y  
libemos/4.5.9-gcc-8.5.0-MPI3.1.6-kcv3tlk                    
libemos/4.5.9-gcc-8.5.0-vgk5xbg                             
libemos/4.5.9-intel-2021.4.0-2q2qpc3                        
miniconda2/4.7.12.1-gcc-8.5.0-hkx7ovs                       
miniconda3/4.10.3-gcc-8.5.0-eyq4jvx                         
nco/4.9.3-intel-20.0.2-dhlqiyo                              
nco/5.0.1-gcc-8.5.0-oxngdn5                                 
ncview/2.1.8-gcc-8.5.0-c7tcblp                              
ncview/2.1.8-intel-20.0.2-3taqdda                           
netcdf-c/4.6.3-gcc-8.5.0-MPI3.1.6-2ggkkoh                   
netcdf-c/4.6.3-intel-2021.4.0-eaqh45b                       
netcdf-c/4.7.4-gcc-8.5.0-o7ahi5o                            
netcdf-c/4.7.4-intel-20.0.2-337uqtc                         
netcdf-c/4.7.4-intel-2021.4.0-vvk6sk5                       
netcdf-fortran/4.5.2-gcc-8.5.0-MPI3.1.6-needvux             
netcdf-fortran/4.5.2-intel-2021.4.0-6avm4dp                 
netcdf-fortran/4.5.3-gcc-8.5.0-3bqsedn                      
netcdf-fortran/4.5.3-intel-20.0.2-irdm5gq                   
netcdf-fortran/4.5.3-intel-2021.4.0-pii33is                 
netlib-lapack/3.9.1-gcc-8.5.0-ipqdnxj                       
netlib-scalapack/2.1.0-gcc-8.5.0-bukelua                    
netlib-scalapack/2.1.0-gcc-8.5.0-MPI3.1.6-rllmmt4           
openblas/0.3.18-gcc-8.5.0-zv6qss4                           
openmpi/3.1.6-gcc-8.5.0-ie6e7fw                             
openmpi/3.1.6-intel-20.0.2-ubasrpk                          
openmpi/4.0.5-gcc-8.5.0-ryfwodt                             
openmpi/4.0.5-intel-20.0.2-4wfaaz4                          
parallel-netcdf/1.12.1-intel-20.0.2-sgz3yqs                 
parallel-netcdf/1.12.2-gcc-8.5.0-MPI3.1.6-y4btiof           
parallel-netcdf/1.12.2-gcc-8.5.0-zwftkwr                    
parallel-netcdf/1.12.2-intel-2021.4.0-bykumdv               
perl/5.32.0-intel-20.0.2-2d23x7l                            
proj/7.1.0-gcc-8.5.0-k3kp5sb                                
proj/7.1.0-intel-2021.4.0-bub3jtf                           
proj/8.1.0-gcc-8.5.0-4ydzmxc                                
proj/8.1.0-intel-2021.4.0-omzgfdy                           
zlib/1.2.11-intel-20.0.2-3h374ov                            

------------- /jetfs/spack/share/spack/modules/linux-rhel8-haswell -------------
intel-parallel-studio/composer.2017.7-intel-17.0.7-disfj2g  

---------------------------- /jetfs/manual/modules -----------------------------
enstools/v2020.11  enstools/v2021.11  teleport/10.1.4  

--------- /opt/spack-jet01/share/spack/lmod/linux-rhel8-skylake_avx512 ---------
anaconda3/2020.11-gcc-8.3.1-bqubbbt  

For details on how to use environment modules, go to Using Environment Modules.
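
The module names in the listing appear to follow the pattern `name/version-compiler-compilerversion-hash`, with an optional MPI tag before the hash. That pattern is read off the listing and is not documented here. A sketch that splits a name into its parts and then loads it; `parse_module` is a hypothetical helper, while `module load` and `module list` are the standard environment-modules commands:

```bash
#!/bin/bash
# parse_module FULLNAME -> prints name, version and hash of a module name.
# Hypothetical helper; the naming pattern is an assumption (see above).
parse_module() {
    local full="$1"
    local name="${full%%/*}"      # text before the first slash
    local rest="${full#*/}"       # everything after the first slash
    local version="${rest%%-*}"   # up to the first dash
    local hash="${rest##*-}"      # after the last dash
    echo "name=${name} version=${version} hash=${hash}"
}

mod="netcdf-c/4.7.4-gcc-8.5.0-o7ahi5o"
parse_module "$mod"

# Load the module only where the module command is available (on the cluster).
if command -v module >/dev/null 2>&1; then
    module load "$mod"
    module list
fi
```

Loading by full name, including the hash, avoids ambiguity when several builds of the same version exist (for example with and without MPI).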

Jupyterhub

The Jet Cluster serves a jupyterhub with a jupyterlab that launches on the JET cluster compute nodes and allows users to work directly on the cluster as well as submit jobs.

Steps:

  • Open https://jupyter.wolke.img.univie.ac.at from within the VPN or UNI-Network.
  • Log in with your Jet credentials.
  • Choose a job.
  • The jupyterlab will be launched and will be available to you until you log out or the walltime is exceeded (depends on the job you launch).

Please use the resources responsibly. We trust that you apply a fair-share policy and collaborate with your colleagues.

There are several kernels available as modules and how to use other kernels can be found here:

User Quotas and Restrictions

Currently there are no restrictions on the duration or the resources you can request. On JET the nodes can be shared between jobs, whereas on VSC nodes are exclusive to a single job. Please follow these rules of collaboration:

Jobs:

  • Number of CPUs, keyword: ntasks, e.g. 1 node == 2x20 physical cores
  • Memory, keyword: mem, e.g. each node has up to 754 GB
  • Runtime, keyword: time, e.g. try to split jobs into pieces.

Consider the following example:

You can easily use 1 node for more than 3 days with your jobs running, but do not use all nodes and block them for all other users for 3 days. If you need multiple nodes, split the jobs into shorter runtimes. In general it is better to have many smaller jobs that are processed in a chain. Also try not to request more resources than you actually use.
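
One way to process smaller jobs in a chain is to let each job start only after the previous one has finished successfully. The sketch below uses the standard Slurm options `sbatch --parsable` (prints the job ID) and `--dependency=afterok:<jobid>`. The function name `submit_chain` and the script names `part1.sh`, `part2.sh`, `part3.sh` are hypothetical.

```bash
#!/bin/bash
# submit_chain SCRIPT... -> submits the scripts so that each one waits
# for the previous one to complete successfully (hypothetical helper).
submit_chain() {
    local prev="" jobid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jobid=$(sbatch --parsable "$script") || return 1
        else
            jobid=$(sbatch --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        jobid="${jobid%%;*}"   # --parsable may print "jobid;cluster"
        echo "submitted $script as job $jobid"
        prev="$jobid"
    done
}

# Only submit when Slurm is available and the (hypothetical) scripts exist.
if command -v sbatch >/dev/null 2>&1 && [ -f part1.sh ]; then
    submit_chain part1.sh part2.sh part3.sh
fi
```

With `afterok`, a later part is not started if an earlier part fails, so a broken chain does not keep occupying nodes.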

Have a look at resources used in your jobs using the /usr/bin/time command or look here.
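
As a sketch of how the `/usr/bin/time` report can be used to size a memory request: GNU time prints a line `Maximum resident set size (kbytes): N` when called with `-v`, and `-o` writes the report to a file. The helper name `peak_rss_mb` is hypothetical, and the example assumes `/usr/bin/time` is GNU time.

```bash
#!/bin/bash
# peak_rss_mb FILE -> peak memory in MB, read from a GNU "time -v" report.
# Hypothetical helper; assumes the report format of GNU time.
peak_rss_mb() {
    awk -F': ' '/Maximum resident set size/ { printf "%.1f\n", $2 / 1024 }' "$1"
}

if [ -x /usr/bin/time ]; then
    report=$(mktemp)
    # -o writes the report to a file instead of stderr; "sleep 1" stands in
    # for your own program.
    /usr/bin/time -v -o "$report" sleep 1
    echo "peak memory: $(peak_rss_mb "$report") MB"
    rm -f "$report"
fi
```

The reported peak, plus some headroom, is a reasonable starting point for the mem keyword of the next run.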

Sample Job

Slurm example on JET
#!/bin/bash
# SLURM specific commands
#SBATCH --job-name=test-run
#SBATCH --output=test-run.log
#SBATCH --ntasks=1
#SBATCH --mem=1MB
#SBATCH --time=05:00
#SBATCH --mail-type=BEGIN    # first have to state the type of event to occur 
#SBATCH --mail-user=<email@address.at>   # and then your email address

# Your Code below here
module load miniconda3
# Execute the miniconda Python
# use /usr/bin/time -v [program]
# gives statistics on the resources the program uses
# nice for testing
/usr/bin/time -v python3 --version

Storage limits apply mainly to the HOME directory (default: 100 GB), but there are some general restrictions as well.

Login nodes

On the Login Nodes (jet01/jet02) processes can run without any queue. However, please make sure that other users are not affected too much when these nodes are used for processing.

The jupyterhub runs on jet02, and on jet01 a VNC server can be launched for GUI applications.

For how to use a VNC server, go to VNC.

Network drives

Transfer of files between SRV and JET is not necessary. The SRV file systems are mounted on the JET login nodes (jet01/jet02) and vice versa. Data on these mounted drives is transferred over the network, so latencies might be higher.

Mounted file systems
$ df -h 

131.130.157.5:/mnt/users            319T  300T   20T  95% /mnt/users
131.130.157.5:/mnt/scratch          400T  321T   80T  81% /mnt/scratch
131.130.157.5:/mnt/users/staff      319T  300T   20T  95% /srvfs/home
131.130.157.5:/mnt/users/scratch    319T  300T   20T  95% /srvfs/tmp
131.130.157.5:/mnt/users/data       319T  300T   20T  95% /srvfs/data
131.130.157.5:/mnt/scratch/scratch  400T  321T   80T  81% /srvfs/scratch
131.130.157.5:/mnt/scratch/shared   400T  321T   80T  81% /srvfs/shared
131.130.157.5:/mnt/scratch/webdata  400T  321T   80T  81% /srvfs/webdata
remjetfs                            3.6P  1.6P  2.0P  44% /jetfs

Slurm

The job manager is called slurm and is available on numerous other HPC systems in the EU. There is plenty of online documentation that can be consulted for guidance. Please have a look at the VSC tutorials or training courses.

There is some more information about how to use slurm:

Job efficiency reports

Since 2024 there is a new feature that lets you check how well your jobs ran and how efficiently they used the requested resources. The report is available once the job has finished.

Job efficiency report
# get a job's efficiency report
seff [jobid]
# example showing only 3% memory and 41% cpu efficiency!
seff 2614735
Job ID: 2614735
Cluster: cluster
User/Group: /vscusers
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 30
CPU Utilized: 01:00:33
CPU Efficiency: 41.05% of 02:27:30 core-walltime
Job Wall-clock time: 00:04:55
Memory Utilized: 596.54 MB
Memory Efficiency: 2.91% of 20.00 GB
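
The percentages in the report can be reproduced from the other lines: CPU efficiency is the CPU time used divided by the core-walltime (cores times wall-clock time), and memory efficiency is the memory used divided by the memory requested. A minimal sketch with hypothetical helper names; it only handles times in HH:MM:SS form and assumes 1 GB = 1024 MB.

```bash
#!/bin/bash
# hms_to_seconds HH:MM:SS -> seconds (10# avoids octal parsing of e.g. "08")
hms_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# cpu_efficiency CPU_UTILIZED WALLTIME CORES -> percent, two decimals
cpu_efficiency() {
    local used wall
    used=$(hms_to_seconds "$1")
    wall=$(hms_to_seconds "$2")
    awk -v u="$used" -v w="$wall" -v c="$3" \
        'BEGIN { printf "%.2f\n", 100 * u / (w * c) }'
}

# memory_efficiency USED_MB REQUESTED_GB -> percent, two decimals
memory_efficiency() {
    awk -v u="$1" -v r="$2" 'BEGIN { printf "%.2f\n", 100 * u / (r * 1024) }'
}

# Values taken from the report of job 2614735 above.
cpu_efficiency 01:00:33 00:04:55 30    # 41.05
memory_efficiency 596.54 20            # 2.91
```

In this example, requesting fewer cores and far less memory would have left more of the node to other users.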

Last update: December 6, 2024
Created: January 26, 2023