About the Role
We are looking for Research Scientists and Research Engineers to help us build the data engine for frontier AI. In this role, you will bridge the gap between frontier research and production engineering. You will develop the high-fidelity environments and evaluation stacks that the world’s leading AI labs rely on to stress-test their most advanced agents. Your work will involve iterating on novel RL approaches and translating them into robust, scalable infrastructure that moves the needle on real-world model metrics.
Responsibilities
- Build Agentic Environments: Design and implement the next generation of "SimLabs": ultra-realistic, long-horizon simulation environments where agents learn to navigate ambiguity and maintain context.
- Programmatic Verification: Develop rigorous, policy-aware judges and evaluations that measure genuine capability and safety beyond simple benchmarks.
- Close the Loop: Design and execute high-quality post-training runs (CPT, SFT, RL) to deliver frontier performance on open-source models using curated, high-signal data.
- Rapid Iteration: Debug and iterate across the full ML stack, from infrastructure to model behavior, ensuring our tools remain "command-line first" and developer-friendly.
- Collaborate: Work daily with the founders and research staff to shape the roadmap and push the state of the art in AI reliability.
About You
We are looking for individuals who demonstrate a rare combination of technical depth, research intuition, and high agency.
- Technical Foundation: A Bachelor’s, Master’s, or PhD in a technical field (CS, Math, Physics, etc.), or a demonstrated "proof of work" through significant open-source contributions or industry experience.
- Engineering Rigor: A strong foundation in software engineering with the ability to build robust, scalable infrastructure. You should be comfortable in a Python-friendly, CLI-first development environment.
- ML Fluency: A principled understanding of foundation models, including how they are constructed, evaluated, and optimized.
- Empirical Mindset: Experience conducting research or technical experiments with a focus on reproducibility and data-driven results.
What will make you stand out
- Research Taste: You have a strong intuition for identifying what matters in complex problem spaces. You can balance deep research exploration with the pragmatism needed to ship a product.
- Impact-Driven Agency: You care about outcomes, not just activity. You don't wait for a ticket; you identify gaps in the system, build the solution, and ensure it moves real-world metrics for frontier AI labs.
- Domain Expertise: Prior experience with Reinforcement Learning (RLHF/RLAIF), simulation systems, or building long-horizon agentic environments.
- Proven Track Record: A history of contributing to influential ML research (e.g., publications at NeurIPS, ICLR, ICML) or maintaining high-impact open-source projects.
- Post-Training Experience: Experience fine-tuning or evaluating large-scale models to deliver "frontier performance" on open-source benchmarks.