Company
University of Lausanne (Switzerland)
Full Time
Applications have closed
Linux HPC systems administrator

Expected start date: ASAP / to be agreed

Contract length: Unlimited (CDI)

Activity rate: 100 %

Workplace: University of Lausanne, Dorigny

Context:

This position is assigned to the central IT service of UNIL, and more specifically to the DCSR (Division de Calcul et Soutien à la Recherche), a team of 17 people dedicated to providing UNIL researchers with high-end infrastructure (HPC clusters, storage, virtual machines) and advanced consulting services in fields such as scientific computing (HPC, AI, image processing) and modern web and database technologies.

As a member of the DCSR, you will be mainly in charge of our HPC infrastructure, which currently consists of:

  • one cluster dedicated to normal data processing, with about 100 nodes
  • another cluster dedicated to sensitive data processing, with 16 nodes

On the hardware side, both clusters use Spectrum Scale storage, AMD EPYC processors, NVIDIA A100 GPUs, and InfiniBand HDR and 100 Gbps Ethernet networks. On the software side, the clusters run RHEL and are deployed using xCAT, with Ansible for post-configuration.

Since part of our activities involves sensitive data, the design and management of our infrastructure incorporate state-of-the-art security practices.

Your activities:

  • Contribute to the configuration and maintenance of the HPC infrastructures and adapt them to the current needs of the researchers. Use a DevOps and infrastructure-as-code approach to ensure maintainability.
  • Deploy new scientific computing frameworks offered as a service (such as Open OnDemand or JupyterHub).
  • Interact with other teams of the IT service regarding the network interconnection between the DCSR infrastructures and the global UNIL infrastructure.
  • Contribute to the general Linux operations of the service (storage management, HPC software stack, IT security, virtual infrastructure).
  • Present and explain the DCSR HPC resources to different groups of users with different base skills (courses are organized by the DCSR several times per year).

Your profile:

  • Master’s degree in a relevant field such as computer science or computer information systems, or an equivalent combination of education, training, and experience.
  • Mastery of scripting in Bash and another language such as Python, Ruby or Perl
  • Mastery of a version control system such as Git
  • Good understanding of the Linux TCP/IP networking stack
  • Experience with configuration management and system provisioning methods and tools (Ansible, Puppet, Kickstart, xCAT, …)
  • Experience with virtualization and container technologies (KVM, Docker, Singularity) would be an additional asset.
  • Good understanding of Linux storage technologies (LVM, RaidManager, ZFS, NFS, NAS)

The following skills would be an advantage:

  • Experience in managing cluster filesystems like Spectrum Scale, Lustre, BeeGFS or Ceph
  • Experience in managing distributed resource managers like LSF or Slurm
  • Experience with secure infrastructures (system and network)

As well as:

  • 5 years' experience managing large-scale Linux infrastructures

The following hands-on experience would be an advantage:

  • 2 years' experience managing cluster filesystems
  • 2 years' experience managing HPC schedulers

What’s Different About Us?

  • Big impact: you will directly contribute to the results of many researchers
  • A dynamic, inter-disciplinary, international, and caring working environment
  • An opportunity to get your hands dirty with new technologies as they emerge
  • Great colleagues who are eager to share their knowledge and their hobbies
  • A beautiful view of the lake at lunchtime
  • Access to a wide range of sports facilities and clubs