Key Responsibilities
- Design and optimize ML infrastructure for scalable training and inference pipelines
- Develop tools and frameworks to streamline model deployment and monitoring
- Collaborate with cross-functional teams to integrate ML systems with production environments
- Implement CI/CD pipelines for ML models and infrastructure components
- Optimize GPU utilization and reduce inference latency for high-performance applications
- Ensure security and compliance in ML infrastructure deployments
Requirements
- 5+ years of experience in ML infrastructure or a related field
- Proficiency in Python and ML frameworks (PyTorch/TensorFlow)
- Experience with Kubernetes, Docker, and cloud platforms (AWS/GCP)
- Strong understanding of distributed computing and GPU acceleration
- Familiarity with MLOps tools and best practices