As a Software Engineer on our RL Training Infrastructure team, you will design, build, and maintain the infrastructure behind our reinforcement learning training workloads. The work includes building and operating large-scale distributed systems, writing tools and scripts that automate our workflows, and working with other teams so that this infrastructure integrates cleanly with our broader infrastructure.

Key Responsibilities:

Design and develop scalable and efficient infrastructure to support RL training workloads
Develop tools and scripts to automate and streamline workflows
Collaborate with other teams to ensure seamless integration with the broader infrastructure
Develop and maintain high-quality, well-documented code
Troubleshoot and resolve infrastructure-related issues

Requirements:

3+ years of experience in software engineering, with a focus on distributed systems and infrastructure
Strong understanding of machine learning and reinforcement learning concepts
Experience with Python, Node.js, and AWS
Excellent problem-solving and communication skills

Software Engineer, RL Training Infra
