Key Responsibilities
- Build and optimize the LLM abstraction layer using LiteLLM or AWS Bedrock, enabling models to be swapped without application code changes
- Design and implement high-performance RAG pipelines including ingestion, chunking, embedding, indexing, and retrieval
- Develop reusable prompt libraries and structured output parsing with schema validation for LLM responses
- Optimize context window usage and token budgeting for efficient LLM interactions
- Implement hybrid retrieval strategies (dense + sparse/BM25) and reranking for improved accuracy
- Run prompt and retrieval evaluation experiments using frameworks like Ragas, DeepEval, or Langfuse
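The chunking step in the ingestion pipeline above can be sketched as a minimal fixed-size baseline with overlap (the function name, chunk size, and overlap defaults here are illustrative, not part of the role description):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    A common baseline for RAG ingestion: overlap preserves context
    across chunk boundaries so retrieval doesn't miss sentences
    that straddle a split point.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Step forward by (chunk_size - overlap) so consecutive chunks share
    # `overlap` characters; drop any empty trailing slice.
    return [text[i:i + chunk_size] for i in range(0, len(text), step)
            if text[i:i + chunk_size]]

chunks = chunk_text("a" * 500)  # 4 chunks: 200, 200, 200, 50 chars
```

Production pipelines typically chunk on token or sentence boundaries rather than raw characters, but the sliding-window structure is the same.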
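The hybrid retrieval bullet above (dense + sparse/BM25) is often combined with a fusion step; one standard technique is reciprocal rank fusion (RRF). A minimal sketch, assuming each retriever returns a best-first list of document IDs (the IDs and lists below are illustrative):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked result lists into one via RRF.

    Each document scores 1 / (k + rank) summed across the lists it
    appears in; k=60 is the constant from the original RRF paper and
    dampens the dominance of any single retriever's top hit.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Best fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]    # e.g. vector-similarity results
sparse = ["doc_b", "doc_d", "doc_a"]   # e.g. BM25 results
fused = reciprocal_rank_fusion([dense, sparse])
```

RRF needs no score normalization across retrievers, which is why it pairs well with mixing dense cosine scores and BM25 scores; a learned reranker can then reorder the fused top-k.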
Requirements
- Strong knowledge of modern LLMs including Claude, GPT-4, Llama, and Mistral
- Hands-on experience with LiteLLM, AWS Bedrock, or Azure OpenAI
- Production experience implementing RAG systems using LlamaIndex or LangChain
- Experience with vector databases such as OpenSearch, Qdrant, Weaviate, or Milvus
- Strong Python programming skills and familiarity with embedding models