Key Responsibilities
- Design and maintain scalable data infrastructure to ensure high availability and performance
- Implement and optimize CI/CD pipelines for data processing systems
- Monitor and troubleshoot production systems using observability tools
- Automate infrastructure provisioning and configuration management
- Collaborate with cross-functional teams to improve system reliability
- Develop and maintain data pipelines for real-time and batch processing
Requirements
- 5+ years of experience in site reliability engineering or data engineering
- Proficiency in Python and in scripting for automation
- Experience with containerization and orchestration (Docker, Kubernetes)
- Knowledge of infrastructure as code (Terraform, Ansible)
- Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK Stack)