InOpTra Digital

Senior AI Engineer – LLMs, RAG, and Vector Systems

Department
Engineering
Job Type / Location
Bengaluru
Experience Required
5+ years

Role Summary

The Senior AI Engineer will lead the design and development of advanced Generative AI systems, including embedding pipelines, vector database architectures, retrieval-augmented generation (RAG) frameworks, model evaluation pipelines, and enterprise-grade LLM integrations. The role requires deep expertise in transformer architectures, LLM fine-tuning and optimization, and GPU-accelerated AI workloads built with PyTorch, TensorFlow, and CUDA. The engineer will collaborate with cross-functional teams to build scalable, secure, and high-performance AI platforms.

Key Responsibilities

LLM & RAG Architecture

  • Design, build, and optimize end-to-end RAG systems including retrievers, rankers, context assembly, and generative components.
  • Develop and fine-tune LLMs (open-source and proprietary) for domain-specific use cases.
  • Implement prompt engineering, prompt orchestration, and guardrails for enterprise applications.
  • Create and optimize embedding generation workflows using transformer-based models.

Vector Database & Retrieval Systems

  • Architect high-performance vector search solutions using vector databases and libraries (e.g., FAISS, Pinecone, Weaviate, Milvus, pgvector).
  • Implement indexing strategies, ANN algorithms, sharding, and scaling approaches for large embedding stores.
  • Ensure latency optimization, relevance tuning, and reliability of retrieval pipelines.

Evaluation & Monitoring Pipelines

  • Build automated evaluation frameworks for RAG/LLM pipelines using metrics such as faithfulness, relevance, hallucination rate, and latency.
  • Operationalize model monitoring, drift detection, feedback loops, and continuous improvement workflows.
  • Integrate human-in-the-loop (HITL) evaluation mechanisms for production AI systems.

ML Engineering & Orchestration

  • Develop scalable embedding and model-serving pipelines using Airflow, Kubeflow, Ray, or similar orchestration frameworks.
  • Optimize model performance on GPUs using CUDA kernels, mixed-precision training, and distributed training techniques.
  • Implement CI/CD for ML pipelines, model versioning, and reproducibility using MLOps practices.

Integration & Platform Engineering

  • Build APIs, microservices, and inference endpoints to integrate LLM capabilities into enterprise applications.
  • Collaborate with data engineering teams to integrate AI services with data lakes, warehouses, and unstructured content repositories.
  • Ensure security, compliance, observability, and uptime for all AI services.

Required Skills & Qualifications

  • 5–10 years of hands-on experience in AI/ML engineering.
  • At least 3 full-cycle AI/LLM projects delivered in enterprise or production environments.
  • Deep understanding of transformer architectures, LLM internals, fine-tuning strategies, and RAG frameworks.
  • Strong proficiency in Python, PyTorch, TensorFlow, and GPU-accelerated development using CUDA.
  • Experience with vector search technologies (FAISS, Pinecone, Weaviate, Milvus, etc.).
  • Expertise in building embedding pipelines, evaluation systems, and scalable ML workflows.
  • Strong understanding of distributed systems, containerization (Docker), Kubernetes, and API development.
  • Knowledge of data security, privacy, and governance considerations for enterprise AI.
  • Bachelor’s or Master’s degree in Computer Science, AI/ML, Data Science, or a related technical field.

Preferred Qualifications

  • Experience with commercial and open-weight LLM ecosystems (OpenAI, Anthropic, Meta Llama, Mistral, etc.).
  • Familiarity with GPU inference serving and distributed training tooling (NVIDIA Triton Inference Server, DeepSpeed, Hugging Face Accelerate).
  • Prior work in information retrieval, NLP pipelines, and knowledge-augmented generative systems.
  • Contributions to open-source AI projects or research publications.
