Thinking Machines Lab

Research Engineer, Infrastructure, RL Systems

Department
Engineering
Job Type / Location
San Francisco
Experience Required
3+ years

About the Role

We’re looking for an infrastructure research engineer to design and build the core systems that enable scalable, efficient training of large models through reinforcement learning.

This role sits at the intersection of research and large-scale systems engineering. We're looking for a builder who understands both the algorithms behind RL and the realities of distributed training and inference at scale. You'll wear many hats, from optimizing rollout and reward pipelines to improving reliability, observability, and orchestration, and you'll collaborate closely with researchers and infrastructure teams to make reinforcement learning stable, fast, and production-ready.

What You’ll Do

  • Design, build, and optimize the infrastructure that powers large-scale reinforcement learning and post-training workloads.
  • Improve the reliability and scalability of RL training pipelines and distributed RL workloads, and increase training throughput.
  • Develop shared monitoring and observability tools to ensure high uptime, debuggability, and reproducibility for RL systems.
  • Collaborate with researchers to translate algorithmic ideas into production-grade training pipelines.
  • Build evaluation and benchmarking infrastructure that measures model progress on helpfulness, safety, and factuality.
  • Publish and share learnings through internal documentation, open-source libraries, or technical reports that advance the field of scalable AI infrastructure.

Skills and Qualifications

Minimum qualifications:

  • Bachelor’s degree or equivalent experience in computer science, electrical engineering, statistics, machine learning, physics, robotics, or similar.
  • Strong engineering skills, including the ability to write performant, maintainable code and to debug complex codebases.
  • Understanding of deep learning frameworks (e.g., PyTorch, JAX) and their underlying system architectures.
  • Thrive in a highly collaborative environment involving many different cross-functional partners and subject matter experts.
  • A bias for action, with the initiative to work across stacks and teams wherever you spot an opportunity to help something ship.

Preferred qualifications — we encourage you to apply if you meet some but not all of these:

  • Experience training or supporting large-scale language models with tens of billions of parameters or more.
  • Experience working with reinforcement learning workloads (e.g., PPO, DPO, RLHF, or reward modeling).
  • Background in high-performance or reliability engineering, such as distributed training frameworks and cluster orchestration (Kubernetes, Slurm).
  • Familiarity with monitoring and observability tools (Prometheus, Grafana, OpenTelemetry).
  • Contributions to large-scale ML research or infrastructure, open-source frameworks, or internal performance optimization efforts.
