About the Role
As a Senior Engineering Manager, you will lead the team that owns both the product experience and the foundational infrastructure of Model Serving. You will shape customer-facing capabilities while designing for scalability, extensibility, and performance across both CPU and GPU inference, and you will collaborate closely with the platform, product, infrastructure, and research organizations.
The impact you will have:
- Lead, mentor, and grow a high-performing engineering team responsible for both the customer-facing Model Serving product and its foundational infrastructure — covering runtime, APIs, scaling, reliability, and integrations.
- Define and own the product and technical roadmap for Model Serving, balancing customer experience, functionality, and foundational investments across deployment, inference, monitoring, and scaling.
- Collaborate closely with product, research, platform, and infrastructure teams to drive end-to-end delivery — from ideation and prioritization to launch and operation.
- Ensure Model Serving meets stringent SLAs, SLOs, and performance and reliability goals, continuously improving operational efficiency and customer experience.
- Drive architectural decisions and product design around latency, throughput, autoscaling, GPU/CPU placement, and cost optimization.
- Advocate for customer needs through direct engagement, ensuring engineering decisions translate into clear product impact.
- Promote best practices in code quality, testing, observability, and operational readiness.
- Foster a culture of excellence, inclusion, and continuous improvement across the team.
- Partner with recruiting to attract, hire, and develop top-tier engineering talent.
What we look for:
- 5+ years of experience in technical leadership or management.
- Proven track record building and operating large-scale distributed systems, preferably real-time or low-latency APIs.
- Deep understanding of real-time serving systems.
- Experience driving architectural design and operational excellence for production systems with measurable SLAs and SLOs.
- Familiarity with CPU/GPU performance optimization, concurrency, caching, and scalability concepts.
- Excellent collaboration and communication skills across engineering, product, and research organizations.
- Ability to lead teams through ambiguity and deliver complex, cross-functional projects.
- BS in Computer Science (Master's or PhD preferred).