About Us
AI71 is an applied research team dedicated to creating helpful and responsible AI agents for knowledge workers. Working closely with our industry partners, our cross-functional teams of AI experts build products grounded in the cutting-edge research of our colleagues from the Technology Innovation Institute (TII).
About the Role
As a Senior LLM Engineer, you will be responsible for the end-to-end development, optimization, and deployment of large language models. You'll work on challenging problems at the intersection of deep learning, natural language processing, and distributed computing.
What You'll Do
- Analyze large and complex datasets to extract meaningful insights and inform data-driven decision-making.
- Develop, train, and deploy predictive models to enhance the capabilities of our AI solutions.
- Collaborate with cross-functional teams to understand business objectives and translate them into actionable data science tasks.
- Design and implement advanced LLM architectures, including transformer-based models and their variants.
- Develop novel attention mechanisms and positional encoding schemes.
- Experiment with model scaling techniques and efficient architectures (e.g., MoE, sparse transformers).
- Continuously evaluate and improve existing models based on real-world performance and evolving business needs.
- Implement and optimize distributed training pipelines for large-scale models.
- Develop strategies for efficient fine-tuning, including parameter-efficient techniques (e.g., LoRA, prefix tuning).
- Apply advanced optimization techniques such as mixed-precision training and gradient accumulation.
- Optimize models for inference, including quantization and pruning techniques.
- Implement efficient serving solutions for real-time inference.
- Develop strategies for model compression and knowledge distillation.
- Develop task-specific algorithms for applications such as text classification, named entity recognition, and question answering.
- Work with MLOps teams to design and maintain training and serving infrastructure.
What You'll Bring
- 5+ years of experience in deep learning and NLP, with a focus on large language models.
- Master's or Ph.D. in Data Science, Statistics, Computer Science, or a related field.
- Expert-level proficiency in Python and at least one deep learning framework (PyTorch, TensorFlow, or JAX).
- Strong understanding of transformer architectures, attention mechanisms, and recent advancements in LLMs.
- Experience with distributed training frameworks (e.g., DeepSpeed, Megatron-LM).
- Proficiency in optimizing model performance using techniques like mixed-precision training, gradient checkpointing, and model parallelism.
- Understanding of core NLP techniques such as tokenization, parsing, and semantic analysis.
- Experience with sequence-to-sequence models and self-supervised learning techniques.
- Experience with both SQL and NoSQL databases for managing training data and model artifacts.