About Us

Chakra Labs’ mission is to encode human taste into intelligence. We build high-fidelity trajectories, environments, and evaluation systems for frontier AI research, working with several of the top labs.

Our work sits at the intersection of post-training, agent environments, data quality, and research infrastructure. We care about building systems that make models better in ways that are measurable, useful, and hard to fake.

What You’d Work On

Post-training loops. You’d help design and run model improvement workflows across supervised fine-tuning, preference optimization, and reinforcement learning approaches like GRPO. The work is not just launching training jobs; it’s figuring out what signal matters, how to collect it, and whether the model actually improved.
Environment and task design. We build environments that feel real and scenarios that push agents past static benchmark behavior. You’d design tasks, tools, validators, reward signals, and evaluation harnesses that test meaningful capabilities instead of whatever is easiest to measure.
High-fidelity trajectories. You’d create, inspect, and improve the data that teaches models how to behave. That means caring about taste, correctness, edge cases, and whether a trajectory would actually help a frontier model learn.
Reward and evaluation systems. You’d work on reward functions, rubrics, validators, and analysis tools that turn messy model behavior into useful training signal. You should be interested in where evals lie, where rewards get hacked, and how to make measurements more robust.
Training and research infrastructure. You’d run experiments across distributed GPU clusters, work with PyTorch and FSDP, and build the infrastructure needed to support model training, evaluation, and data generation at scale.
Customer research problems. You’d work with frontier AI labs to translate ambiguous research goals into concrete environments, datasets, experiments, and deliverables.

About You

Machine learning fundamentals. You have master's- or PhD-level knowledge of machine learning fundamentals. You understand linear algebra, optimization, and stochastic gradient descent, and you can reason from first principles when a model or training run behaves unexpectedly.
Hands-on post-training experience. You have worked with or deeply understand LLM fine-tuning, preference data, reward modeling, or reinforcement learning for language models. You know the difference between reproducing a recipe and understanding why it works.
Environment-building instinct. You are excited by agent environments, multi-turn tool use, sandboxed tasks, eval harnesses, and the question of how to test capabilities that do not fit neatly into a benchmark.
Python and PyTorch fluency. You are strong in Python and comfortable building with PyTorch, FastAPI, and modern ML infrastructure. You can work at a high level, but you are also comfortable dropping into lower-level primitives when the abstraction leaks.
Strong engineering judgment. You can move between research ambiguity and production constraints. You know when to iterate quickly, when to be rigorous, and when a result is too fragile to trust.
Experience. No hard rule. We imagine roughly 3-5 years, but more or less experience works if the expectations above resonate with you.

What Makes This Different

The work is concrete. You are not just “touching the latest AI stack.” You are building the environments, trajectories, evals, rewards, and training loops that frontier labs use to improve models.
It is research-facing, but production-minded. Our customers are AI researchers and labs pushing the edge of what agents can do. The systems you build need to support real experiments, real users, and real deadlines.
Ownership, not theater. You own whole problems, not isolated tickets. One week you might be designing a new environment; the next you might be debugging a training run, improving a reward function, or scaling an eval pipeline.
Ambiguity is part of the job. There is no fixed playbook for this work. Data, post-training, and agent evaluation are changing quickly. If you need every problem to be fully specified before you begin, this role will be challenging.
The team. You’ll work with a small, experienced team with backgrounds from companies like Stripe, Snap, AWS, and Microsoft, and with people who have spent years shipping high-impact technical products.

Member of Technical Staff, Post-Training
