Key Responsibilities
- Design and implement production-grade AI Assistants with tool-augmented agents, multi-step planners, and self-reflective reasoning systems
- Develop memory-enabled assistants with short-term and long-term memory management, including persistent vector-based memory and context compression strategies
- Architect Retrieval-Augmented Generation (RAG) pipelines using vector stores (ChromaDB, FAISS, Weaviate, Milvus) and hybrid search techniques
- Build multi-agent workflows with goal-driven coordination, task decomposition, and agent-to-agent communication using frameworks like LangGraph or CrewAI
- Optimize small LLMs (e.g., Phi-3, Mistral, Llama 3) via parameter-efficient fine-tuning (LoRA, QLoRA) and low-latency inference strategies
- Implement guardrails, grounded retrieval, and tool validation to reduce hallucinations in enterprise settings
Requirements
- 3–6 years of experience in AI/ML/NLP, including hands-on experience building production-grade AI Assistants
- Deep expertise in LLM architectures, tool-calling agents, memory management, and context engineering
- Proficiency in Python and in libraries and frameworks such as Hugging Face Transformers, LangChain, LangGraph, or LlamaIndex
- Experience working with vector databases and deploying AI systems using Docker and Kubernetes
- Ability to design reliable, traceable multi-agent systems and optimize LLM performance