Key Responsibilities
- Develop and optimize training pipelines for large language models (LLMs) and vision-language models (VLMs)
- Implement efficient data loading and preprocessing for model training
- Design and maintain distributed training infrastructure
- Collaborate with researchers to implement novel training techniques
- Monitor and optimize model performance and training efficiency
- Document training methodologies and best practices

Requirements
- 3+ years of experience in machine learning or a related field
- Hands-on experience with LLM/VLM training pipelines
- Proficiency in Python and deep learning frameworks
- Experience with distributed computing and GPU acceleration
- Strong problem-solving and debugging skills