As a Member of Technical Staff, Training Infra Engineer at Cohere, you will be responsible for designing and implementing scalable training infrastructure for machine learning models. This role requires a strong technical background in software engineering, with expertise in Python, Node.js, and AWS. You will work closely with the training team to ensure seamless integration of infrastructure with model development and deployment.
Key Responsibilities:
- Design and implement scalable training infrastructure for machine learning models.
- Collaborate with the training team to ensure seamless integration of infrastructure with model development and deployment.
- Develop and maintain high-quality, efficient, and scalable software components.
- Work closely with cross-functional teams to identify and prioritize infrastructure needs.
- Contribute to the development of technical documentation and knowledge sharing.
Requirements:
- 5+ years of experience in software engineering, with a focus on infrastructure and machine learning.
- Strong expertise in Python, Node.js, and AWS.
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Excellent problem-solving skills and the ability to work in a fast-paced environment.
- Strong communication and collaboration skills.