Scale AI

Machine Learning Research Engineer, GenAI Applied ML

Department
Engineering
Job Type / Location
San Francisco
Experience Required
3+ years
Posted On

About This Role

Lead applied ML engineering on Scale's Applied ML team, powering the data infrastructure behind leading agentic LLMs (ChatGPT, Gemini, Llama). You will build scalable multi-agent systems that validate agentic reasoning and behavior and scale human expertise. You will also drive research into why agents that score well on benchmarks still fail in real-world use, and ship production fixes for those failures.

Ideal for exceptional engineers with deep research rigor and a relentless focus on practical, high-impact systems. You will iterate rapidly with data, leverage AI tools to accelerate development, and collaborate tightly across engineering, product, and research.

If you excel at turning frontier agent research into reliable deployed systems, we want to hear from you.

You will:

  • Build and deploy multi-agent systems for agentic reasoning validation
  • Develop pipelines to detect errors and scale human judgment
  • Combine classical ML, LLMs, and multi-agent techniques for reliability
  • Lead research into agent failure modes and ship fixes
  • Use AI tools to speed prototyping and iteration
  • Build data-driven evaluations and deploy rapid improvements
  • Integrate systems into Scale's platform

Ideally You’ll Have:

  • PhD or MSc in Computer Science, Mathematics, Statistics, or related field
  • 3+ years shipping scaled production ML systems
  • Demonstrated real-world impact
  • Mastery of PyTorch, TensorFlow, JAX, or scikit-learn
  • Deep expertise in agentic LLMs and multi-agent systems
  • Strong software engineering skills and microservices experience (AWS/GCP)
  • Track record of rapid, data-driven iteration
  • Proficiency using AI tools to accelerate work
  • Strong research depth with practical bias
  • Excellent cross-functional communication

Nice to Have:

  • Experience prototyping agent evaluation/reliability systems
  • Human-in-the-loop or annotation pipeline work
  • Open-source contributions in agents, evaluation, or alignment
  • Publications on agent reliability (NeurIPS, ICML, ICLR)
