About the Role
We’re hiring a Senior Engineering Manager to lead our Reinforcement Learning Environments (RLE) team, the group building the interactive sandboxes where frontier models learn to complete real work.
Our environments simulate end-to-end workflows across domains like software engineering, finance, and legal research, with realistic tools, constraints, and feedback loops. The platform generates high-signal interaction data that researchers use to train and evaluate models for task completion, quality, and robustness.
This is a high-leverage role: the systems you lead directly shape what models can learn, how quickly new domains can launch, and how much researchers trust the signal. You’ll lead a team of ~7 engineers today and are expected to add leadership capacity (including managing an EM) as we scale.
Location: San Francisco, CA. This is an in-office role, 5 days/week (no remote/hybrid).
What You’ll Do
- Lead, hire, and develop a high-performing team building RL environments and the platform behind them
- Own the RLE roadmap and execution in close partnership with Research, Product, and Operations
- Drive architecture for scalable, reliable, extensible environment systems and data generation pipelines
- Build modular, plug-and-play domains that integrate cleanly with training and evaluation loops
- Raise the bar on reliability, observability, performance, and data quality
- Create a culture of ownership, speed, and strong engineering fundamentals in an ambiguity-heavy setting
What We’re Looking For
- Engineering leader + builder: 3+ years managing teams, plus 5+ years hands-on engineering experience
- Strong people leadership: experience leading senior engineers; managing an EM (or equivalent scope) is a plus
- Execution in ambiguity: proven ability to align cross-functionally and deliver in fast-moving, unclear problem spaces
- Systems + product mindset: strong platform/distributed systems background, and the ability to turn research/ops needs into a clear roadmap, ship iteratively, and measure outcomes
Nice to Have
- Experience with RL training infrastructure, simulation systems, or evaluation platforms
- Human-in-the-loop systems (annotation, rubric tooling, QA pipelines, workflow platforms)
- Experience in operations-heavy, tech-enabled environments
- Familiarity with AWS/GCP, APIs, Docker, and modern stacks (TypeScript/Node, React)
- Experience building systems used by applied ML or AI research teams