Company
Barcelona Supercomputing Center - Centro Nacional de Supercomputación
Full Time
Closes: 16 March 2023
Applications have closed
Machine Learning Operations Engineer (RE1)

Context And Mission

The Language Technologies Unit at BSC has consolidated experience in several NLP areas, such as massive language model building, biomedical text mining, machine translation and unsupervised learning for under-resourced languages and domains. It has been entrusted by the Spanish and Catalan governments with the mission to develop fundamental open-source resources and technologies for Spanish and Catalan. In connection with this, the LT Unit is currently in charge of two flagship projects at the national and regional level: the Spanish National Language Technology Plan, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, aimed at developing AI resources for Catalan, funded by the Catalan Digitalisation Department. In addition, the Unit participates in various EU-funded international projects.

The LT Unit at BSC is looking for an MLOps engineer with experience in automating the development, deployment and monitoring of ML models. The MLOps engineer will develop, maintain and optimize ML pipelines to ensure the high performance and reliability of ML models and their effective deployment to production. The successful candidate will work in a highly sophisticated HPC environment and will have access to state-of-the-art systems and computational infrastructures.

Key Duties

  • Design and implement ML model continuous integration, continuous deployment and continuous monitoring processes for production.
  • Develop and maintain CI/CD pipelines for ML models.
  • Configure and automate ML model deployment, monitoring and scaling in production.
  • Implement monitoring and logging of ML models.
  • Identify areas of improvement in the ML model lifecycle.
  • Collaborate with data engineers and ML engineers to select the best ML model architectures and hyperparameters.
  • Ensure the accessibility of the open software and the replicability of its deployment via container-based architectures.
  • Maintain the code infrastructure as open source repositories.

Requirements

Education

  • Bachelor’s degree in Computer Science, Information Technology or related field.

Essential Knowledge and Professional Experience

  • In-depth knowledge of Linux/Unix systems.
  • Demonstrable experience automating tasks with Python, bash or other scripting tools.
  • Experience with Kubernetes, Docker or comparable containerization technologies.
  • Experience with cloud technologies such as AWS, Azure, GCP or OpenStack.
  • Solid knowledge of monitoring and logging tools such as ELK, Prometheus, Grafana, etc.
  • Knowledge of automation/DevOps tools such as GitHub Actions, GitLab, Jenkins, Ansible, etc.
  • Knowledge of provisioning methods via tools such as Terraform and Helm.

Competences

  • Ability to work independently and in a team to complete tasks on schedule.
  • Ability to work under set deadlines.

Conditions

  • The position will be located at BSC within the Life Sciences Department.
  • We offer a full-time contract, a good working environment, a highly stimulating environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets, private health insurance and full support with relocation procedures.
  • Duration: Open-ended contract due to technical and scientific activities linked to the project and budget duration
  • Salary: we offer a competitive salary commensurate with the qualifications and experience of the candidate and according to the cost of living in Barcelona
  • Starting date: as soon as possible