Character.AI

Machine Learning Infrastructure Engineer

Department
Engineering
Job Type / Location
Redwood City
Experience Required
4+ years

About the role

We’re looking for seasoned ML infrastructure engineers with experience designing, building, and maintaining training and serving infrastructure for ML research.

Responsibilities

  • Provide infrastructure support to our ML research and product
  • Build tooling to diagnose cluster issues and hardware failures
  • Monitor deployments, manage experiments, and generally support our research
  • Maximize GPU allocation and utilization for both serving and training

Requirements

  • 4+ years of experience supporting infrastructure in an ML environment
  • Experience in developing tools used to diagnose ML infrastructure problems and failures
  • Experience with cloud platforms (e.g., Compute Engine, Kubernetes, Cloud Storage)
  • Experience working with GPUs

Nice to have

  • Experience with large GPU clusters and high-performance computing/networking
  • Experience supporting large language model training
  • Experience with ML frameworks such as PyTorch, TensorFlow, or JAX
  • Experience with GPU kernel development

Think you'll be a good fit?