Key Responsibilities
- Own and operate the Kubernetes platform, including cluster health, workload deployments, scaling, and incident response
- Design, implement, and maintain infrastructure automation using Ansible, Terraform, and GitOps workflows (ArgoCD/Flux)
- Lead migration projects that move on-premises workloads to cloud-native platform services
- Build and maintain CI/CD pipelines for infrastructure and application delivery using CircleCI or GitHub Actions
- Drive observability improvements with Datadog, Splunk, and ELK, including dashboards, alert tuning, and SLO/SLA definition
- Participate in on-call rotations for P1/P2 incidents and scheduled maintenance windows
Requirements
- 4–7 years of experience in platform engineering, SRE, or infrastructure engineering
- Strong hands-on experience with Kubernetes (cluster operations, Helm, workload troubleshooting) and Linux systems administration (Ubuntu)
- Proficiency with infrastructure-as-code tooling (Ansible, Terraform) and GitOps workflows
- Experience operating VMware vSphere and CI/CD pipelines at scale
- Demonstrated ability to self-direct complex projects with minimal oversight