Company
EPFL - Blue Brain Project (BBP)
Full Time
Applications have closed
Site Reliability Engineer (W/M)

EPFL, the Swiss Federal Institute of Technology in Lausanne, is one of the most dynamic university campuses in Europe and ranks among the top 20 universities worldwide. EPFL employs more than 6,000 people supporting the three main missions of the institution: education, research and innovation. The EPFL campus offers an exceptional working environment at the heart of a community of more than 16,000 people, including over 12,000 students and 4,000 researchers from more than 120 different countries.

Your mission:

The aim of the EPFL Blue Brain Project (BBP), a Swiss brain research initiative founded and directed by Professor Henry Markram, is to establish simulation neuroscience as a complementary approach to understanding the brain, alongside experimental, theoretical and clinical neuroscience, by building the world’s first biologically detailed digital reconstructions and simulations of the mouse brain.
We are now looking for a Site Reliability Engineer (SRE) to join our Core Services section, which delivers mission-critical IT systems to BBP’s scientists. These services include a 1200+ node HPC cluster, on-premises OpenStack and Kubernetes (K8s) platforms, a multi-petabyte parallel file system (Spectrum Scale), a multi-petabyte NAS (NetApp) and numerous software platforms used, for example, for scientific software development, configuration management, CI/CD and infrastructure as code.

Main duties and responsibilities include:

This SRE position would give you opportunities to challenge yourself by:

  • Ensuring successful periodic upgrades and new product launches on our IT infrastructure
  • Contributing to IT security, e.g. by establishing industry best practices with regard to periodic patching and other proactive IT security measures
  • Improving IT service reliability by implementing SRE best practices for availability, performance, emergency response and capacity planning
  • Developing monitoring, logging and metrics tools to embrace and minimize risks
  • Automating IT processes using modern software engineering practices, in order to eliminate toil, technical debt and manual work

Your profile:

We expect you to have hands-on experience in the following areas:

  • Linux (e.g. Red Hat, Ubuntu) in production server environments
  • Virtualized and containerized infrastructure
  • Network concepts (e.g. IP routing, DNS, VLANs)
  • Configuration & provisioning tools (e.g. Puppet)
  • Programming and scripting languages (e.g. Python, bash)
  • Source code management and CI/CD tools (e.g. GitLab)

Experience in the following areas would be an advantage:

  • Operating physical server hardware and data centre infrastructure
  • Operating large-scale storage systems (e.g. NetApp, Spectrum Scale) and filesystems
  • Operating cloud or container platforms (e.g. OpenStack, Kubernetes)
  • Operating data centre networks built on Ethernet or InfiniBand
  • Operating HPC systems and software (e.g. Slurm, cluster managers)
  • Implementing & monitoring secure IT infrastructure
  • Stakeholder relationships, team leadership

Our desired candidate would have:

  • A Bachelor's or Master's degree in Computer Science, or a similar degree or equivalent working experience
  • Detail-oriented, careful and professional working practices and attitude
  • An understanding of TCO, compliance and IT governance factors
  • Experience managing and completing IT projects
  • An interest in working in a collaborative and multicultural environment
  • A proven ability to work both independently and in team-based environments
  • Fluency in English (written and spoken)

We offer:

  • The opportunity to work with a world-recognized leader in simulation-based neuroscience research, using state-of-the-art HPC infrastructure
  • A dynamic, multidisciplinary, international and collaborative working environment committed to benefitting the global community
  • A modern working environment, based at the Biotech Campus in Geneva Sécheron

Please provide your CV and a cover letter (in English), both in PDF format.

Start date: As soon as possible

Term of employment: Fixed-term (CDD)

Work rate: 100%

Duration: 1 year, renewable

Remark: Only candidates who applied through the EPFL website or our partner Jobup’s website will be considered. Files sent by agencies without a mandate will not be taken into account.