About the Role
Architect the core platform that powers synthetic data generation, agentic workflows, RL environments, and scalable LLM operations, and design and evolve the APIs, compute services, and orchestration layers behind internal and customer-facing applications. This role requires a hybrid schedule, with three days per week in our Redwood City HQ or the SF office.
Responsibilities
- Architect the core platform that powers synthetic data generation, agentic workflows, RL environments, and scalable LLM operations.
- Design and evolve the APIs, compute services, and orchestration layers that empower internal and customer-facing applications.
- Generate and refine synthetic data at massive scale.
- Create and evaluate agentic systems that reason, act, and improve over time.
- Provision RL training environments and simulation frameworks for next-generation AI agents.
- Deploy robust benchmarks, datasets, and automated labeling pipelines to accelerate model development.
- Run LLMs and multi-agent systems efficiently, reliably, and cost-effectively across cloud and hybrid environments.
Requirements
- 5 years of experience building customer-facing, cloud-native software systems in at least one of two areas: AI/ML pipelines, LLM systems, or agentic workflows; or MLOps/model platform infrastructure, data pipelines, workflow orchestration, training/inference infrastructure, or production ML systems.
- Experience with distributed computing, large-scale data systems, or orchestration frameworks.
- Experience at high-growth technology startups.
- Experience building software products for large enterprise customers.
- Expertise in Python and cloud platforms (AWS, GCP, or Azure).
- Strong understanding of production web-scale systems: monitoring, telemetry, reliability, performance, debugging, and triage.