We have the privilege to be part of the VSC and have private nodes at VSC-5 (since 2022), VSC-4 (since 2020), and VSC-3 (since 2014), which was retired in 2022.
Access is primarily via SSH:
ssh to VSC

```bash
ssh user@vsc5.vsc.ac.at
ssh user@vsc4.vsc.ac.at
```
Please follow the connection instructions on the wiki; they are similar for all other servers (e.g. SRVX1).
The VSC is only reachable from within UNINET (e.g. via VPN). Authentication requires a mobile phone.
We have private nodes at our disposal; in order to use them, you need to specify the correct account in the jobs you submit to the queueing system (SLURM). The correct information will be given to you in the registration email.
IMGW customizations in the shell
If you want, you can use some shared shell scripts that provide information about the VSC system.
Load IMGW environment settings
```bash
# run the install script, which just appends to your PATH variable
/gpfs/data/fs71386/imgw/install_imgw.sh
```
The following commands are then available:
imgw-quota shows the current quota on VSC for both HOME and DATA
imgw-container singularity/apptainer container run script, see below
imgw-transfersh Transfer.sh service on wolke, to easily share small files
imgw-cpuinfo Show CPU information
A shared folder is available at /gpfs/data/fs71386/imgw/shared; add data there that needs to be used by multiple people, and please remove it again as soon as possible. Thanks.
Node Information VSC-5
There are usually two sockets per node, i.e. 2 CPUs per node.
VSC-5 Compute Node

```
CPU model: AMD EPYC 7713 64-Core Processor
2 CPU, 64 physical cores per CPU, total 256 logical CPU units
512 GB Memory
```
We have access to 11 private nodes of that kind, as well as 1 GPU node with Nvidia A100 accelerators. Find the partition information with:
VSC-5 Quality of Service

```
$ sqos
qos name         type  total res  used res  free res  walltime     priority  total n*  used n*  free n*
========================================================================================================
p71386_0512      cpu        2816      2816         0  10-00:00:00    100000        11       11        0
p71386_a100dual  gpu           2         0         2  10-00:00:00    100000         1        0        1
* node values do not always align with resource values since nodes can be partially allocated
```
Storage on VSC-5
The HOME and DATA partitions are the same as on VSC-4.
Since fall 2023, after a major update, JET and VSC-5 are connected: your files on JET are now accessible from VSC-5, e.g.:
JET and VSC-5

```
# a directory on JET
/jetfs/home/[username]
# can be found on VSC-5 at
/gpfs/jetfs/home/[username]
```
JETFS on VSC
JETFS can only be accessed from VSC-5, not the other way around.
You can also write directly to these directories, although the performance is higher on VSC-5 storage. This does not work on VSC-4.
Node Information VSC-4
VSC-4 Compute Node

```
CPU model: Intel(R) Xeon(R) Platinum 8174 CPU @ 3.10GHz
2 CPU, 24 physical cores per CPU, total 96 logical CPU units
378 GB Memory
```
We have access to 5 private nodes of that kind. We also have access to the JupyterHub on VSC. Check with:
VSC-4 Quality of Service

```
$ sqos
qos name              type  total res  used res  free res  walltime     priority  total n*  used n*  free n*
=============================================================================================================
p71386_0384           cpu         480       288       192  10-00:00:00    100000         5        3        2
skylake_0096_jupyter  cpu         288        12       276   3-00:00:00      1000         3        1        2
* node values do not always align with resource values since nodes can be partially allocated
```
Storage on VSC-4
All quotas are shared between all IMGW/Project users:
$HOME (up to 100 GB, all home directories)
$DATA (up to 10 TB, backed up)
$BINFL (up to 1 TB, fast scratch; will be retired)
$BINFS (up to 2 GB, fast SSD; will be retired)
$TMPDIR (50% of main memory, deleted after the job finishes)
/local (compute nodes, 480 GB SSD, deleted after the job finishes)
Check the quotas by running the following commands yourself (insert your PROJECTID), or use the imgw-quota command from the IMGW shell extensions.
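If you prefer to query the filesystem directly: VSC storage runs on GPFS, where quotas can be listed with mmlsquota. A minimal sketch, assuming the filesets follow the home_fs&lt;PROJECTID&gt; / data_fs&lt;PROJECTID&gt; naming seen in the paths above:

```bash
# query the GPFS quotas directly
# (fileset names are an assumption; replace 71386 with your PROJECTID)
mmlsquota --block-size auto -j home_fs71386 home
mmlsquota --block-size auto -j data_fs71386 data
```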
We have access to the Earth Observation Data Center (EODC), where one can find primarily the following data sets:
Sentinel-1, 2, 3
Wegener Center GPS RO
These datasets can be found directly under /eodc/products/.
We have been given a private data storage location (/eodc/private/uniwien), where we can store up to 22 TB on VSC-4. However, that might change in the future.
We have to use the following keywords to make sure that the correct partitions are used:
--partition=mem_xxxx (per email)
--qos=xxxxxx (see below)
--account=xxxxxx (see below)
The core hours will be charged to the specified account. If not specified, the default account will be used.
Put the following in the job file (example for VSC-5 nodes):
VSC slurm example job

```bash
#!/bin/bash
#
#SBATCH -J TEST_JOB
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1
#SBATCH --mail-type=BEGIN                # first have to state the type of event to occur
#SBATCH --mail-user=<email@address.at>   # and then your email address
#SBATCH --partition=zen3_0512
#SBATCH --qos=p71386_0512
#SBATCH --account=p71386
#SBATCH --time=<time>

# when srun is used, you need to set (different from Jet):
srun -l -N2 -n32 a.out
# or
mpirun -np 32 a.out
```
-J job name
-N number of nodes requested (16 cores per node available)
-n, --ntasks= specifies the number of tasks to run
--ntasks-per-node number of processes run in parallel on a single node
--ntasks-per-core number of tasks a single core should work on
srun is an alternative command to mpirun. It provides direct access to SLURM inherent variables and settings.
-l adds task-specific labels to the beginning of all output lines.
--mail-type sends an email at specific events. The SLURM documentation lists the following valid mail-type values: "BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL and REQUEUE), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of time limit). Multiple type values may be specified in a comma separated list." (cited from the SLURM documentation)
--mail-user sends an email to this address
slurm basic commands

```bash
sbatch check.slrm     # to submit the job
squeue -u `whoami`    # to check the status of your own jobs
scancel JOBID         # for premature removal, where JOBID is obtained from the previous command
```
Example of multiple simulations inside one job
Sample job for running multiple MPI jobs on a VSC-4 node.
Note: mem_per_task should be set such that
mem_per_task * mytasks < mem_per_node - 2 GB
The roughly 2 GB reduction in available memory accounts for the operating system held in memory. For a standard node with 96 GB of memory this would be, e.g.:
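The job sketch below illustrates this pattern; the executable my_model, its inputs, and the 24-task split are placeholders, chosen so that 24 * 3900 MB stays below 96 GB - 2 GB:

```bash
#!/bin/bash
#SBATCH -J MULTI_SIM
#SBATCH -N 1
#SBATCH --ntasks-per-node=24
#SBATCH --partition=mem_0384
#SBATCH --qos=p71386_0384
#SBATCH --account=p71386

# 24 tasks sharing one 96 GB node: mem_per_task = (96 - 2) / 24, i.e. roughly 3900 MB
for i in $(seq 1 24); do
    # --exclusive gives each job step its own cores; --mem caps the step's memory
    # ./my_model and input_${i} are placeholders for your executable and inputs
    srun --exclusive -N 1 -n 1 --mem=3900M ./my_model input_${i} &
done
wait   # keep the job alive until all background steps have finished
```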
Modules

```bash
module avail          # lists the available applications, compilers, parallel environments, and libraries
module list           # shows the currently loaded packages of your session
module unload <xyz>   # unload a particular package <xyz> from your session
module load <xyz>     # load a particular package <xyz> into your session
```
Loading a compiler module, e.g. module load intel, will load the Intel compiler suite and add variables to your environment.
Please do not forget to add the module load statements to your jobs.
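For example, a minimal job skeleton with a module load could look like this (the module name intel and the executable are placeholders; pick the exact module name from module avail):

```bash
#!/bin/bash
#SBATCH -J MODULE_TEST
#SBATCH -N 1

# load the required software environment before running anything
module load intel   # placeholder; use the exact name from `module avail`
./my_program        # placeholder executable
```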
It is possible to install user site packages into your .local/lib/python3.* directory:
installing python packages in your HOME

```bash
# installing a user site package
pip install --user [package]
```
Please remember that all HOME and DATA quotas are shared; installing a lot of packages creates a lot of files!
Python importing user site packages

```python
import sys, site
sys.path.append(site.getusersitepackages())  # this adds the correct path
```
Then you will be able to load all packages that are located in the user site.
Containers
We can use complex software that is packaged in singularity containers (doc) and can be executed on VSC-4. Please consider using one of the following containers:
py3centos7anaconda3-2020-07-dev
located in the $DATA directory of IMGW: /gpfs/data/fs71386/imgw
How to use?
Currently there is only one container with a run script.
Bash

```bash
# The directory of the containers, and the general run script usage
/gpfs/data/fs71386/imgw/run.sh [arguments]
# executing the python inside
/gpfs/data/fs71386/imgw/run.sh python
# or ipython
/gpfs/data/fs71386/imgw/run.sh ipython
# with other arguments
/gpfs/data/fs71386/imgw/run.sh python analysis.py
```
Understanding the container
In principle, a run script needs to do only 3 things:
load the module singularity
set SINGULARITY_BIND environment variable
execute the container with your arguments
It is necessary to set SINGULARITY_BIND because the $HOME and $DATA (or $BINFS) paths are not standard Linux paths; the Linux inside the container does not know about them, so files on them cannot be accessed from within the container. If you have problems accessing other paths in the future, adding them to SINGULARITY_BIND might fix the issue.
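A minimal sketch of such a run script, with the three steps as comments (the bind paths are assumptions derived from the storage locations above):

```bash
#!/bin/bash
# 1. load the singularity module
module load singularity
# 2. bind the non-standard GPFS paths so that e.g. $HOME and $DATA are visible inside
export SINGULARITY_BIND="/gpfs/home,/gpfs/data"   # assumed paths; adjust if needed
# 3. execute the container, passing along all arguments given to this script
/gpfs/data/fs71386/imgw/py3centos7anaconda3-2020-07-dev.sif "$@"
```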
In principle, one can also execute the container directly like this:
Bash

```bash
# check if the module is loaded
$ module load singularity
# just run the container, initiating the built-in runscript (running ipython):
$ /gpfs/data/fs71386/imgw/py3centos7anaconda3-2020-07-dev.sif
Python 3.8.3 (default, Jul  2 2020, 16:21:59)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
In [2]: %env DATA
Out[2]: '/gpfs/data/fs71386/USER'
In [3]: ls /gpfs/data/fs71386/USER
ls: cannot access /gpfs/data/fs71386/USER: No such file or directory
# Please note that the path is not available, because we did not set SINGULARITY_BIND
```
This shows you some information about the container, e.g. that CentOS 7 is installed, with Python 3.8 and glibc 2.17.
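If you want to query such details yourself, singularity's inspect subcommand prints the image's labels and metadata:

```bash
# show the labels/metadata of the container image
$ singularity inspect /gpfs/data/fs71386/imgw/py3centos7anaconda3-2020-07-dev.sif
```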
You can also check the applications inside:
Execute commands inside a container
```bash
# List all executables inside the container
$ py3centos7anaconda3-2020-07-dev.sif ls /opt/view/bin
# or use conda to show the environment
$ py3centos7anaconda3-2020-07-dev.sif conda info
# for the package list
$ py3centos7anaconda3-2020-07-dev.sif conda list
```
Currently (June 2021) there is no development queue on VSC-4, and the support team suggested the following:
Debugging on VSC-4

```bash
# Request resources from slurm (-N 1, a full node)
$ salloc -N 1 -p mem_0384 --qos p71386_0384 --no-shell
# Once the node is assigned / the job is running, check with
$ squeue -u $USER
# connect to the node with ssh
$ ssh [Node]
# test and debug the model there
```
Otherwise, you can use one of the *_devel queues/partitions and submit short test jobs to check your setup.
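To see which partitions (including the *_devel ones) exist and what their time limits are, the standard SLURM sinfo command can help:

```bash
# list all partitions with their time limits and node counts
sinfo -o "%P %l %D"
```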