Key Responsibilities
- Design and optimize cloud infrastructure for AI workloads
- Implement cost-effective and scalable solutions on AWS
- Manage Kubernetes clusters and container orchestration
- Automate infrastructure provisioning and configuration management
- Monitor system health and performance with observability tools
- Ensure high availability and maintain disaster recovery strategies

Requirements
- 3+ years of experience in cloud infrastructure or systems engineering
- Proficiency in AWS, Terraform, and Kubernetes
- Strong Linux administration and networking skills
- Experience with monitoring and logging systems (e.g., Prometheus, Grafana)
- Knowledge of security best practices and compliance requirements