Company
Lawrence Berkeley National Laboratory
Full Time
Closes: 19 July 2020
HPC Systems Cloud Engineer

HPC Systems Cloud Engineer – 90213

Organization: NE-NERSC

We are looking for system developers and engineers to join our team at the National Energy Research Scientific Computing Center (NERSC) to help architect, deploy, configure, maintain, and operate large-scale, leading-edge, highly complex High Performance Computing (HPC) systems that provide computational resources for scientists globally, including for COVID-19 research. You will join a team that supports some of the largest HPC systems in the world. In this exciting role, you’ll regularly diagnose and resolve challenging problems in the configuration, tuning, and management of both compute and storage platforms to drive scientific research and progress. You will collaborate with a community of teams at NERSC, other laboratories, and vendors to develop innovative solutions that enable science and improve the state of HPC practice on an international stage. Using your unique set of knowledge, skills, and experience, you will help research, evaluate, and develop new technologies to support NERSC’s mission of accelerating scientific discovery through high performance computing and data analysis into the exascale era.

NERSC is distinguished by its success in creating an environment that makes computational resources effective for scientific research. Currently, NERSC is on the cusp of deploying Perlmutter, a new GPU-accelerated pre-exascale system that will integrate new networking, CPU, and GPU technologies, and blend them with a cloud-enabled software stack. As a developer/engineer in the Computational Systems Group you will enjoy a great opportunity to help understand and shape how cloud technologies are deployed within supercomputers as we prepare to deploy the Perlmutter system in late 2020.

The Computational Systems Group at NERSC ensures our systems are reliable, secure, and provide a state-of-the-art scientific development environment with the tools needed by our diverse community of users. NERSC provides world-class supercomputing, high performance, scalable data systems and services to 7,000+ users worldwide who conduct ground-breaking research. We support the U.S. Department of Energy (DOE) Office of Science mission to deliver scientific discoveries that transform our understanding of nature, and advance the energy, economic, and national security of the United States.

What You Will Do:

• Lead or collaborate on systems programming projects to maintain and enhance system functionality in areas such as large systems monitoring, systems and workload management, and file systems and I/O subsystems.

• Participate in a team-oriented agile development and management process for HPC systems.

• Develop and/or use tools to implement task automation on computational systems.

• Work independently and as part of the Computational Systems Group to diagnose and fix system problems, help analyze system issues, and develop and implement workarounds and/or patches for software bugs.

• Install, test, maintain and manage the NERSC computational systems.

• Assist with technology evaluation of systems and system architecture to provide input for HPC system procurements and DOE technology roadmaps out past the next decade.

• Work with vendors to prioritize, develop and enhance their technologies in order to better meet the needs of our users.

• Be part of a team that provides 24×7 systems support.

In addition to the above, the Sr. HPC Systems Cloud Engineer will:

• Provide leadership and technical guidance to group members, and members of other groups at NERSC.

• Recommend and lead implementation and deployment efforts for system improvements that enhance reliability, stability, usability, performance and security.

• Identify and evaluate emerging HPC technologies and explore new features that would create new capabilities and enhance system performance and usability.

• Participate in working/user/advocacy groups and represent NERSC and its interests to the broader HPC community.

What is Required:

• B.S. in Computer Science, Computational Science or equivalent experience and/or 8 years of UNIX or Linux experience.

• 2 years of experience with systems programming or management of large-scale Linux-based systems in a high-performance computing, cloud computing, or hyper-scaler environment.

• Experience with developing and/or operating micro-services with Docker and Kubernetes.

• Experience with C and shell/Perl/Python systems programming as well as with processor, interconnect, and storage technologies for High Performance Computing systems.

• Experience with installation, configuration, monitoring, and tuning of workload management systems such as Slurm, PBS Pro, or Grid Engine.

• Familiarity with UNIX/Linux internals.

• Strong technical and collaboration skills to create and deploy innovative ways of allowing our diverse user base to effectively utilize the unique resources that NERSC provides.

In addition to the above, the Sr. HPC Systems Cloud Engineer requires:

• 12 years of UNIX or Linux experience and 6 years of experience with the management of large-scale UNIX-based systems in an HPC or WSC environment.

• Excellent systems programming skills and strong knowledge of UNIX/Linux internals.

• Demonstrated ability to successfully lead complex projects.

Desired Qualifications:

• Experience in one or more of the following:

– Power/energy efficiency

– Performance variability

– Workload managers

– Specialized networking (InfiniBand, high-speed networks)

– Lustre or other parallel file systems

Notes:

• This is a full-time career appointment, exempt (monthly paid) from overtime pay.

• This position will be hired at a level commensurate with the business needs, skills, knowledge, and abilities of the successful candidate.

• This position may be subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.

• This position requires access to export-controlled and security-sensitive information. Therefore, the selected candidate for this position must have U.S. citizenship or U.S. permanent residency.

• Work will be primarily performed at Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA.

How To Apply

Apply directly online at http://50.73.55.13/counter.php?id=181334 and follow the online instructions to complete the application process.

Learn About Us:

Working at Berkeley Lab has many rewards, including a competitive compensation program, excellent health and welfare programs, a retirement program that is second to none, and outstanding development opportunities. For information about the many rewards offered at Berkeley Lab, see https://hr.lbl.gov/.

Berkeley Lab (LBNL, http://www.lbl.gov/) addresses the world’s most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab’s scientific expertise has been recognized with 13 Nobel prizes. The University of California manages Berkeley Lab for the U.S. Department of Energy’s Office of Science.

Equal Employment Opportunity: Berkeley Lab is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, or protected veteran status. Berkeley Lab is in compliance with the Pay Transparency Nondiscrimination Provision under 41 CFR 60-1.4 (https://www.dol.gov/ofccp/PayTransparencyNondiscrimination.html). The poster “Equal Employment Opportunity is the Law” is available at https://www.dol.gov/ofccp/regs/compliance/posters/ofccpost.htm.