About the Role
Together AI is seeking a Machine Learning Engineer to join our Inference Engine team, focusing on optimizing and enhancing the performance of our AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively at scale. If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want to hear from you. This position offers the chance to collaborate closely with AI researchers and engineers to create cutting-edge AI solutions. Join us in shaping the future at Together AI!
Responsibilities
- Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
- Develop and optimize runtime inference services for large-scale AI applications.
- Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
- Conduct design and code reviews to ensure high standards of quality.
- Create services, tools, and developer documentation to support the inference engine.
- Implement robust and fault-tolerant systems for data ingestion and processing.
Requirements
- 3+ years of experience writing high-performance, well-tested, production-quality code.
- Proficiency with Python and PyTorch.
- Demonstrated experience in building high-performance libraries and tooling.
- Excellent understanding of low-level operating system concepts including multi-threading, memory management, networking, storage, performance, and scale.
- Preferred: Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, Optimum.
- Preferred: Knowledge of AI inference techniques such as speculative decoding.
- Preferred: Knowledge of CUDA/Triton programming.
- Nice to have: Knowledge of Rust, Cython, and compilers.