HPC Systems Administration Specialist

Our Argonne Leadership Computing Facility (ALCF) division is seeking an experienced HPC System Administrator to join its team.

In this role you can expect to:

Participate in the design, development, and advancement of the ALCF high-performance computing (HPC) environment
Assist with implementation, management, and administration of experimental and production networks
Contribute ideas to the design and development of new systems and network tools
Focus primarily on continuous integration (CI) technologies compatible with the ALCF environment

Argonne is a multidisciplinary science and engineering research center, where “dream teams” of world-class researchers work alongside experts from industry, academia, and other government laboratories to address vital national challenges in clean energy, environment, technology, and national security.

Position Requirements

Required skills and qualifications:

Understanding of the software development process and best practices with a focus on Continuous Integration
Knowledge of GitLab CI/CD, GitLab Server, and GitLab Runner management
Knowledge and expertise in Linux systems administration
Expertise in installation, management, and use of software such as compilers, resource managers, and web applications
Knowledge of and expertise in networking technologies such as TCP/IP and related protocols, and in networked file systems such as NFS
Knowledge of systems administration tools and languages such as Ruby, Python, C, and shell scripting
Expertise in configuration management technologies, particularly SaltStack
Problem-solving skills
Ability to work effectively as a member of a team
Flexibility in handling assignments and working on several projects simultaneously
Knowledge and understanding of how to safely operate within a datacenter, including tasks such as mounting and unmounting server hardware

Preferred skills and qualifications:

Knowledge of parallel and distributed file systems such as Lustre and GPFS, and their associated hardware
Understanding of MPI and its implementations
Knowledge of HPC networking technologies such as InfiniBand and Slingshot
Ability to gather site requirements and represent them to design and development teams to find appropriate solutions across multiple sites
Understanding of cloud authentication, identity provider (IdP) integration, and related protocols such as OAuth and SAML
Ability to collaborate with a distributed team using communication tools such as email, Slack, Confluence, and video conferencing
Ability to travel for an expected 4-6 events per year
Ability to communicate and present ideas, status, design concepts, technical and user documentation, and decision analysis

This position can be hired at one of two levels, and the requirements for each are as follows:

PT2: Bachelor’s degree + 2 years of experience, or equivalent
PT3: Bachelor’s degree + 4 years of experience, or a Master’s degree + 2 years of experience, or equivalent