Key Responsibilities
- Design and optimize ML systems for scalability and performance
- Develop infrastructure for training and deploying ML models
- Implement distributed computing solutions for large-scale ML workloads
- Collaborate with ML engineers to integrate system components
- Ensure reliability and efficiency of ML pipelines
- Monitor and troubleshoot system-level issues in production
Requirements
- 5+ years of experience in ML systems and distributed computing
- Proficiency in Python and a solid grasp of systems design principles
- Experience with Kubernetes, Docker, and cloud infrastructure
- Strong understanding of distributed systems and scalability
- Ability to work in a fast-paced, technical environment