Key Responsibilities
- Train and fine-tune large language and vision models for production use
- Optimize training pipelines for efficiency and scalability
- Implement distributed training strategies across GPU clusters
- Collaborate with research teams to improve model architectures
- Monitor training performance and resource utilization
- Ensure training experiments are reproducible and tracked under version control
Requirements
- 3+ years of experience training and optimizing AI/ML models
- Expertise in PyTorch and large-scale model training
- Experience with GPU computing and distributed systems
- Strong background in deep learning and neural networks
- Familiarity with data preprocessing and augmentation techniques