Key Responsibilities
- Ensure high availability and performance of production systems
- Design and implement monitoring and alerting systems
- Automate infrastructure provisioning and scaling
- Troubleshoot and resolve critical system issues
- Collaborate with development teams to improve reliability
- Optimize cloud costs and resource utilization
Requirements
- 5+ years of experience in SRE or DevOps roles
- Hands-on experience with Kubernetes and containerization
- Proficiency in scripting (Python/Bash)
- Strong understanding of distributed systems
- Experience with cloud platforms (AWS/GCP/Azure)