The Technology Integration Group at Oak Ridge National Laboratory is seeking a highly motivated, experienced HPC Systems Engineer to develop and deploy capabilities for next-generation leadership computing systems.
This position is part of the Advanced Technologies Section within the National Center for Computational Sciences (NCCS) Division. The Advanced Technologies Section offers scientific, technical, operational, and thought leadership by developing, hardening, and deploying solutions for compute- and data-intensive computing environments.
The NCCS provides state-of-the-art computational and data science infrastructure, coupled with dedicated technical and scientific professionals, to accelerate scientific discovery and engineering advances across a broad range of disciplines. NCCS hosts the Oak Ridge Leadership Computing Facility, one of DOE’s National User Facilities, and home to the Frontier supercomputer.
This position is responsible for planning and managing major technical projects or functions, including problem analysis, solution design, and the implementation, validation, and deployment of new capabilities for existing and future NCCS programs and projects.
- Plan and manage major technical projects or functions
- Plan and lead the implementation, validation, and deployment of successful solutions for complex technical problems
- Collaborate with a diverse team of system architecture researchers and developers within NCCS and across DOE Labs, partner universities, and the community
- Support researchers in systems architecture, file and storage systems, and edge computing research fields
- M.S. in computer science or a related field and 10+ years of experience (or equivalent), or a B.S. in one of these fields and 12+ years of experience.
- Experience with low-level and/or Linux kernel systems programming
- 5+ years of software development experience in C and Python.
- Low-level systems programming experience in C/C++
- Experience with four or more of the following:
  - Linux kernel programming
  - Kernel bypass programming (e.g., direct access of hardware such as HPC NICs)
  - Debugging and profiling
  - Communication libraries (e.g., BSD sockets, libfabric, UCX, Portals)
  - Concurrent programming via multi-threading
  - Concurrent programming via event-driven interfaces
  - Distributed programming (e.g., MPI, PGAS/SHMEM)
  - Accelerator offload programming (e.g., GPU)
- Experience writing system software in a Linux POSIX programming environment.
- Advanced understanding of high-performance networking or distributed computing concepts.
- Proficient analytical and problem-solving skills to contribute creative solutions to problems.
- Proficient verbal and written communication skills.
- Proficient interpersonal skills necessary to work effectively with system administrators, system programmers, and external vendors
- Ability to represent the NCCS and ORNL in public forums such as open source projects and technical conferences
ORNL Ethics and Conduct:
As a member of the ORNL scientific community, you will be expected to commit to ORNL’s Research Code of Conduct. Our full code of conduct and a statement by the Lab Director’s office can be found here: https://www.ornl.gov/content/research-integrity
This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.
We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) files up to 5 MB in size. Resumes from third-party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.
If you have trouble applying for a position, please email ORNLRecruiting@ornl.gov.
ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.