Prime Intellect

Research Engineer - RL Infrastructure

Department: Research
Job Type / Location: San Francisco
Experience Required: 3+ years

About Prime Intellect

Prime Intellect is building the open superintelligence stack: from frontier agentic models to the infrastructure that enables anyone to train, adapt, and deploy them. We unify globally distributed compute into a single control plane and pair it with the full reinforcement learning post-training stack: environments, secure sandboxes, verifiable evaluations, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end RL at frontier scale, adapting models to real tools, workflows, and deployment environments.

We are looking for a Research Engineer to work on the systems layer behind large-scale RL training. This role is for someone who enjoys going deep on performance: optimizing kernels, improving memory and communication efficiency, scaling distributed workloads, and pushing the throughput and reliability of training systems closer to hardware limits. If you care about making large-scale model training faster, cheaper, and more robust, we’d love to talk.

What You’ll Work On

  • Build and optimize the systems infrastructure behind large-scale RL and distributed training workloads.
  • Improve end-to-end training efficiency across compute, memory, networking, and scheduling layers.
  • Design and implement low-level performance optimizations, including kernels, communication paths, and runtime improvements.
  • Work on distributed training systems spanning data-, tensor-, and pipeline-parallel workloads.
  • Help shape the architecture of our RL training stack, including async rollout and post-training systems.
  • Contribute to open-source libraries and internal infrastructure used for frontier-scale model training.
  • Collaborate closely with researchers and infrastructure engineers to translate bottlenecks into concrete systems improvements.
  • Stay at the frontier of training systems, inference systems, compiler/runtime tooling, and hardware-aware optimization techniques.

You May Be a Fit If You Have

  • Strong systems engineering experience in AI/ML infrastructure, especially around large-scale model training or inference.
  • Deep familiarity with PyTorch and distributed training and inference tooling such as PyTorch Distributed, DeepSpeed, FSDP, Megatron, vLLM, Ray, or related frameworks.
  • Experience optimizing training performance across kernels, memory movement, communication overhead, or parallelization strategy.
  • Hands-on experience with large-scale training techniques including data parallelism, tensor parallelism, and pipeline parallelism.
  • Strong understanding of GPU architecture, profiling, and performance debugging.
  • Ability to identify bottlenecks across the stack and drive improvements from first principles.
  • Comfort working in a fast-moving environment with ambiguous problems and high ownership.

Especially Exciting

  • Experience writing or optimizing CUDA / Triton kernels.
  • Experience with compiler or runtime optimization for ML systems.
  • Experience working on RL training infrastructure, rollout systems, or asynchronous training pipelines.
  • Experience with multi-node GPU clusters and high-performance networking.
  • Contributions to open-source ML systems or infrastructure projects.
  • Interest in publishing technical work or sharing insights through engineering blogs and technical writing.
