Requisition Id 8715
At the Oak Ridge National Laboratory (ORNL), we solve the most challenging problems of our time. Working side-by-side with the world’s best scientists and technology leaders, we address challenges on a global scale—from energy storage to the origins of the universe to life-saving drug discovery. Unique among the Department of Energy’s 17 national labs, ORNL has expertise across multiple disciplines ranging from fundamental discovery to deployed technologies. This depth and breadth of science provides candidates and colleagues a broad spectrum of growth opportunities and the ability to join a mission to positively impact the US economy, public health, and national security.
The National Center for Computational Sciences Division (NCCS) at ORNL has a deep legacy in High Performance Computing (HPC): it operates leadership-class systems and deployed the world’s first exascale system (Frontier) and the largest parallel file system. These systems generate enormous volumes of data which, when properly analyzed, can yield new insights into systems operating at extreme scale. The NCCS is seeking a motivated and results-oriented individual who excels at solving challenging problems. The successful candidate(s) will work with a team and will possess the technical skills needed to take on existing and new challenges, particularly in large-scale data analytics related to HPC, including privacy-preserving federated learning with leadership HPC systems and ORNL’s Interconnected Science Ecosystem (INTERSECT) Initiative, which interconnects HPC, edge computing, data analysis, and experimental instruments to enable autonomous “self-driving” scientific experiments.
Prospective candidates will be part of a data intelligence initiative in the organization. In NCCS, machine data from our systems is an important asset for achieving efficiency in HPC system design and operations. Such data facilitates data-driven decision making at many stages in the lifecycle of modern supercomputers. In collaboration with NCCS staff, the prospective candidate is expected, as a data scientist, to apply their expertise in data analytics, including machine learning and artificial intelligence, to tackle various issues in HPC system design and operations. The candidate is also expected to design, develop, and maintain efficient end-to-end data pipelines and components for large-scale machine data analytics to sustain such activities. Major areas of activity include, but are not limited to:
Profiling and analysis of data-intensive machine learning application workloads at extreme scale
Automated control parameter recommendations for HPC facility operations such as cooling, data placement, and resource scheduling
Analysis and prediction of failure events in the many hardware and software components deployed at scale
Development of efficient algorithms and practical techniques to improve data efficiency on both existing and future systems
MS degree and 7+ years of relevant experience or Ph.D. and 2+ years of relevant experience. Degree areas should be in Computer Science, Statistics or closely related fields. An equivalent combination of education and experience will be considered.
Practical data engineering skills such as data mapping and transformation, data modeling, feature engineering, normalization, and performance metrics and evaluation, using various tools and languages.
Ability to code, debug and maintain necessary software components in the process.
Basic understanding of (1) statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.); (2) machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world applications.
This position requires access to technology that is subject to export control requirements. Successful candidates must be qualified for such access without an export control license.
Ability to independently formulate problems and to design and execute experiments end-to-end, from hypothesis to analysis.
Ability to work in a highly collaborative environment. Excellent communication skills, including verbal, presentation, and writing skills for effective interaction with technical peers.
Code development experience with several languages: C/C++, Go, Python.
Knowledge and experience in statistical and data mining techniques.
Experience with deep learning frameworks such as TensorFlow or PyTorch.
Experience in designing and maintaining automated machine learning data pipelines and workflows using systems such as Airflow, TensorFlow Extended (TFX), and Kubeflow.
Experience in developing software components for real-time data processing, REST APIs, or data visualization.
Experience in developing and deploying applications on Kubernetes.
Experience with distributed data/computing systems such as Spark, Kafka, and Elasticsearch.
Experience visualizing and presenting data for stakeholders using tools such as D3, Matplotlib, and Bokeh.
Benefits at ORNL:
ORNL offers competitive pay and benefits programs to attract and retain talented people. The laboratory offers many employee benefits, including medical and retirement plans and flexible work hours, to help you and your family live happy and healthy. Employee amenities such as on-site fitness, banking, and cafeteria facilities are also provided for convenience. Other benefits include: Prescription Drug Plan, Dental Plan, Vision Plan, 401(k) Retirement Plan, Contributory Pension Plan, Life Insurance, Disability Benefits, Generous Vacation and Holidays, Parental Leave, Legal Insurance with Identity Theft Protection, Employee Assistance Plan, Flexible Spending Accounts, Health Savings Accounts, Wellness Programs, Educational Assistance, Relocation Assistance, and Employee Discounts.
This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.
We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) up to 5MB in size. Resumes from third party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.
If you have trouble applying for a position, please email ORNLRecruiting@ornl.gov.
ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.