Prolific

Senior MLOps Engineer

Department
Engineering
Job Type / Location
Remote
Experience Required
5+ years

About the Role

As a Senior MLOps Engineer, you will be the backbone of our AI production lifecycle. You will bridge the gap between research and real-world application, ensuring our Data Scientists, AI Researchers, Product teams, and others in the company have the high-performance infrastructure, automated pipelines, and deployment strategies needed to ship state-of-the-art models and agents at scale.

What You’ll Be Doing

Infrastructure & Platform Engineering

  • Infrastructure as Code (IaC): Design and maintain scalable cloud environments (GCP/AWS) using Terraform.
  • Resource Provisioning: Manage GPU/TPU resource allocation for training, fine-tuning, and interactive notebooks.
  • Custom Tooling: Build internal services and CLI tools to streamline the developer experience for the AI team.

ML & LLM Orchestration & Observability

  • Automated Pipelines: Design CI/CD and training pipelines using tools such as GitHub Actions, MLflow, and Vertex AI Pipelines. Ensure high-quality training data (e.g., by introducing a feature store).
  • Deployment Methodology: Develop reusable patterns for model serving. Manage service deployments to Kubernetes.
  • Vector Infrastructure: Manage and optimize vector databases and embedding pipelines for RAG-based systems.
  • Observability and Reliability: Monitor model drift and resource utilisation, and implement LLM and agent tracing.

Performance & Optimization

  • Inference Optimization: Implement techniques to reduce latency and increase throughput (quantisation, distillation, etc.).
  • Cold Start Mitigation: Solve scaling bottlenecks for serverless or containerized model deployments.
  • Cost Management: Optimize GPU utilization and cloud spend without compromising performance.

AI Enablement

  • Support AI Agent Deployment: Define and create tooling and service templates around agent deployment (tool libraries, tracing, default agent frameworks, skills, etc.).
  • Enablement for Non-Technical Agent Users: Help create workflows and guidance on no-code/low-code agent platforms (n8n, LangSmith, or similar). Create tooling and policies to enable safe usage of local agents such as Claude Code.

Who We’re Looking For

  • 5+ years of experience with cloud infrastructure and infrastructure as code.
  • Previous experience with the ML and LLM lifecycle: training, hosting, optimisation, and observability.
  • Experience working closely with researchers and data scientists, taking experiments from worksheets into production.
  • Strong grasp of ML fundamentals and the modern GenAI stack.
