Full Time
Applications have closed
Computational Scientist (ML and HPC) for Destination Earth (ref. VN24-31)

We are searching for a highly motivated Computational Scientist in Machine Learning (ML) and High Performance Computing (HPC) to join the HPC Applications team at ECMWF. In this role, you will have a particular focus on benchmarking and optimising ML models within the Destination Earth (DestinE) initiative so that they run efficiently across a variety of pre-Exascale EuroHPC systems such as LUMI, Leonardo and MareNostrum 5, as well as ECMWF’s own in-house HPC system. Candidates are encouraged to apply even if they only have experience in one of the two areas (e.g., HPC or ML) and are willing to learn the other.

At ECMWF, you will find a passionate community, collectively aiming to build world-leading global Earth system models for numerical weather prediction and climate simulations. ECMWF was the first operational weather centre to publish results of its own global machine-learned weather model, the Artificial Intelligence Forecasting System (AIFS), and the latest predictions are continuously updated on the web. Within DestinE, ECMWF will develop and deploy workflows for training and running machine-learned Earth system components of a European foundation model based on existing DestinE traditional simulation and modelling results.

The successful applicant’s work with the existing HPC Applications, DestinE and ML teams will not only contribute to improving the computational performance and scalability of ML models on the world’s largest supercomputers but also enable ECMWF to bring innovative ML approaches into the DestinE workflow that will also be considered for use in our operational weather predictions. This effort supports ECMWF’s strategy of producing cutting-edge science and world-leading weather predictions and monitoring of the Earth system.

The HPC Applications team is responsible for making sure that applications such as the ECMWF IFS weather model and, from now on, the AIFS machine-learned model run efficiently on internal and external HPC systems and are able to scale across the world’s largest supercomputers. The team is also responsible for developing benchmarks for procuring ECMWF’s in-house HPC systems used for operational weather forecasting. To that end, this post will play a pivotal role in supporting the development of AI/ML benchmarks for the next procurement cycle.

About ECMWF
The European Centre for Medium-Range Weather Forecasts (ECMWF) is a world leader in weather and environmental forecasting. As an international organisation, we serve our members and the wider community with global weather predictions and data that are critical for understanding and solving the climate crisis. We function as a 24/7 research and operational centre with a focus on medium and long-range predictions, holding one of the largest meteorological data archives in the world. The success of our activities builds on the talent of our scientists and experts, strong partnerships with 35 Member and Co-operating States and the international community, some of the most powerful supercomputers in the world, and the use of innovative technologies and ML across our operations.

ECMWF has also developed a strong partnership with the European Union and has been entrusted with the implementation and operation of the Climate Change and Atmosphere Monitoring Services of the EU Copernicus Programme. We also contribute to the Copernicus Emergency Management Service. Other areas of work include High Performance Computing and the development of digital tools that enable ECMWF to extend provision of data and products covering weather, climate, air quality, fire and flood prediction and monitoring.

See www.ecmwf.int for more information about what we do.

The Destination Earth (DestinE) initiative
ECMWF is one of the three entities entrusted to implement the DestinE initiative of the European Commission, alongside ESA and EUMETSAT as partners. DestinE aims to deploy several highly accurate thematic digital replicas of the Earth, called Digital Twins. The Digital Twins will help monitor and predict environmental change and human impact, in order to develop and test scenarios that support sustainable development and corresponding European policies for the Green Deal. ECMWF is responsible for the delivery of these Digital Twins and of the Digital Twin Engine, the software infrastructure needed to power them on some of Europe’s largest supercomputers, those of the European HPC Joint Undertaking (EuroHPC). The second phase of DestinE covers the period June 2024 – May 2026, and future phases are foreseen (subject to funding). Phase 2 will focus on early operations with consolidation, maintenance, and continuous evolution of the DestinE system components developed in the first phase. There will also be an enhanced focus on ML activities, including the deployment of workflows of components of an ML model for the Earth system, optimisation of the Digital Twin Engine to enable efficient model training and simulations, and other activities.

For more information on DestinE, see https://ec.europa.eu/digital-single-market/en/destination-earth-destine and https://www.ecmwf.int/en/about/what-we-do/environmental-services/destination-earth

In this role you will:

  • Contribute to the development of DestinE ML training and inference benchmark workflows for use in benchmarking ML models on pre-Exascale supercomputers
  • Benchmark ML models on a wide range of EuroHPC systems as part of DestinE
  • Diagnose performance and scalability bottlenecks of DestinE ML-based workflows and propose solutions to address them
  • Port and benchmark DestinE ML workflows on future bleeding-edge HPC/ML architectures (e.g., GPUs, TPUs)
  • Support and contribute towards the deployment of ML workflows on EuroHPC systems
  • Collaborate with colleagues across ECMWF in the ongoing development of DestinE ML data flows required for training and running ML models

What we’re looking for:

  • Excellent analytical and problem-solving skills with a proactive and constructive approach
  • Excellent interpersonal and communication skills, and the ability to work effectively with specialists across diverse teams
  • Dedication, passion, and enthusiasm to succeed both individually and across teams
  • Flexibility, with the ability to adapt to changing priorities
  • Strong organisational skills, with the capacity to work on a diverse range of tasks to tight deadlines
  • Ability to work and communicate effectively in English

Your education, skills and experience:

  • A university degree (EQF Level 6) in a physical, computational, or mathematical science, or related subject, or equivalent experience
  • Strong “close to the metal” GPU programming skills in any of the following programming models: CUDA, HIP, SYCL, OpenCL
  • Strong HPC programming skills (e.g., MPI, OpenMP, OpenACC)
  • Proven in-depth knowledge of HPC architectures and technologies
  • Experience benchmarking and troubleshooting performance issues of applications on high-performance computing systems
  • Proficiency in object-oriented coding in Python and experience of shell scripting in Unix or Linux environments are required
  • Familiarity with ML frameworks such as PyTorch would be highly advantageous

Other information
Grade remuneration: The successful candidate will be recruited at the A2 grade, according to the scales of the Co-ordinated Organisations. The position is assigned to the employment category STF-PS as defined in the ECMWF Staff Regulations. Full details of salary scales and allowances are available on the ECMWF website at www.ecmwf.int/en/about/jobs.

Starting date: As soon as possible

Length of contract: The contract duration is expected to be two years. The DestinE initiative as per the Contribution Agreement is divided into phases, the second of which will last approximately two years from June 2024 to June 2026. There may be the possibility of further contract extensions in the future depending on requirements and funding availability.

Location: This position will be located at one of ECMWF’s duty stations, either in Reading, UK, or in Bonn, Germany. Candidates are expected to relocate to the duty station. As a multi-site organisation, ECMWF has adopted a hybrid organisation model which allows flexibility for staff to mix office working and teleworking, including working away from the duty station (within the area of our Member States and Co-operating States) for up to 10 working days per month.

Interviews by videoconference (MS Teams) are expected to take place approximately a month after the closing date.

Who can apply
Applicants are invited to complete the online application form by clicking on the apply button below.

At ECMWF, we consider an inclusive environment as key for our success. We are dedicated to ensuring a workplace that embraces diversity and provides equal opportunities for all, without distinction as to race, gender, age, marital status, social status, disability, sexual orientation, religion, personality, ethnicity and culture. We value the benefits derived from a diverse workforce and are committed to having staff that reflect the diversity of the countries that are part of our community, in an environment that nurtures equality and inclusion.

Applications are invited from nationals of ECMWF Member States and Co-operating States, as well as from all EU Member States. In these exceptional times, we also welcome applications from Ukrainian nationals for this vacancy. Applications from nationals of other countries may be considered in exceptional cases.

ECMWF Member States and Co-operating States are: Austria, Belgium, Bulgaria, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Latvia, Lithuania, Luxembourg, Montenegro, Morocco, the Netherlands, Norway, North Macedonia, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Türkiye and the United Kingdom.