Key Responsibilities
- Develop and refine model training, evaluation, and deployment pipelines for benchmarking advanced AI systems.
- Collaborate with research and engineering teams to execute MLE-bench-style evaluation tasks across diverse AI models.
- Debug, refactor, and optimize production-grade ML systems to ensure correctness, efficiency, and scalability.
- Curate datasets, features, and metrics critical for ML benchmarking and validation processes.
- Identify failure modes and edge cases in model behavior through rigorous testing and evaluation.
- Write clean, well-documented Python code adhering to best practices for reproducibility and maintainability.
Requirements
- At least 3 years of experience as a Machine Learning Engineer or as a Software Engineer with an ML focus.
- Proficiency in Python and hands-on experience with ML workflows in production environments.
- Strong understanding of machine learning fundamentals, including supervised/unsupervised learning and evaluation metrics.
- Experience with ML frameworks such as PyTorch, TensorFlow, or JAX.
- Ability to navigate and modify complex, real-world ML codebases effectively.