Key Responsibilities
- Design and maintain GPU-accelerated infrastructure platforms
- Optimize Kubernetes clusters for GPU workloads
- Develop automation for GPU resource provisioning
- Monitor and troubleshoot infrastructure performance
- Collaborate with ML teams to optimize GPU utilization
- Implement security best practices for multi-tenant GPU environments

Requirements
- 3+ years of experience in infrastructure engineering
- Expertise in Kubernetes and container orchestration
- Experience with GPU computing and drivers
- Strong understanding of cloud infrastructure (AWS/GCP/Azure)
- Familiarity with CI/CD pipelines and infrastructure as code