Undisclosed

Principal Engineer, AI Infrastructure

Department
Engineering
Job Type / Location
Redwood City
Experience Required
10+ years

About the Role

We are seeking a highly experienced Principal Engineer, AI Infrastructure, with deep technical expertise to design, build, and scale our GPU-accelerated AI infrastructure. You will be instrumental in enabling real-time inference for AI models across modalities and will drive the full model lifecycle, from deployment to optimization.

Responsibilities

  • Design and operate GPU infrastructure for model hosting, including provisioning, scheduling, and cost optimization across cloud and on-premise environments.
  • Build and scale model serving systems using vLLM, TensorRT-LLM, Triton, or equivalent, supporting real-time inference with strong latency and availability guarantees.
  • Implement multi-model routing to serve multiple models across modalities (text, voice, code, vision) on shared infrastructure.
  • Own the model lifecycle end to end: download, deploy, serve, monitor, swap, and scale.
  • Drive inference optimization, including quantization strategies (AWQ, GPTQ), batching, caching, and cold-start reduction.
  • Build self-service infrastructure platforms where teams provision compute, storage, and model endpoints through APIs and control planes.
  • Implement infrastructure-as-code at scale using Terraform, Pulumi, or CDK.
  • Build observability and reliability for inference systems: SLIs/SLOs, GPU utilization monitoring, latency tracking, automated capacity planning, and alerting.
  • Define platform standards and governance, including multi-tenant isolation, cost attribution, and resource quotas.
  • Lead architectural design and influence engineering direction across the AI infrastructure stack.
