Role Summary

The Senior AI Engineer will lead the design and development of advanced Generative AI systems, including embedding pipelines, vector database architectures, retrieval-augmented generation (RAG) frameworks, model evaluation pipelines, and enterprise-grade LLM integrations. The role requires deep expertise in transformer architectures, fine-tuning and optimizing LLMs, and implementing GPU-accelerated AI workloads using PyTorch, TensorFlow, and CUDA. The engineer will collaborate with cross-functional teams to build scalable, secure, and highly performant AI platforms.

Key Responsibilities

LLM & RAG Architecture

Design, build, and optimize end-to-end RAG systems including retrievers, rankers, context assembly, and generative components.
Develop and fine-tune LLMs (open-source and proprietary) for domain-specific use cases.
Implement prompt engineering, prompt orchestration, and guardrails for enterprise applications.
Create and optimize embedding generation workflows using transformer-based models.

Vector Database & Retrieval Systems

Architect high-performance vector search solutions using vector databases and similarity-search libraries (e.g., FAISS, Pinecone, Weaviate, Milvus, pgvector).
Implement indexing strategies, ANN algorithms, sharding, and scaling approaches for large embedding stores.
Ensure latency optimization, relevance tuning, and reliability of retrieval pipelines.

Evaluation & Monitoring Pipelines

Build automated evaluation frameworks for RAG/LLM pipelines using metrics such as faithfulness, relevance, hallucination rate, and latency.
Operationalize model monitoring, drift detection, feedback loops, and continuous improvement workflows.
Integrate human-in-the-loop (HITL) evaluation mechanisms for production AI systems.

ML Engineering & Orchestration

Develop scalable embedding and model-serving pipelines using Airflow, Kubeflow, Ray, or similar orchestration frameworks.
Optimize model performance on GPUs using CUDA kernels, mixed-precision training, and distributed training techniques.
Implement CI/CD for ML pipelines, model versioning, and reproducibility using MLOps practices.

Integration & Platform Engineering

Build APIs, microservices, and inference endpoints to integrate LLM capabilities into enterprise applications.
Collaborate with data engineering teams to integrate AI services with data lakes, warehouses, and unstructured content repositories.
Ensure security, compliance, observability, and uptime for all AI services.

Required Skills & Qualifications

5–10 years of hands-on experience in AI/ML engineering.
At least 3 full-cycle AI/LLM projects delivered in enterprise or production environments.
Deep understanding of transformer architectures, LLM internals, fine-tuning strategies, and RAG frameworks.
Strong proficiency in Python, PyTorch, TensorFlow, and GPU-accelerated development using CUDA.
Experience with vector search technologies (FAISS, Pinecone, Weaviate, Milvus, etc.).
Expertise in building embedding pipelines, evaluation systems, and scalable ML workflows.
Strong understanding of distributed systems, containerization (Docker), Kubernetes, and API development.
Knowledge of data security, privacy, and governance considerations for enterprise AI.
Bachelor’s or Master’s degree in Computer Science, AI/ML, Data Science, or a related technical field.

Preferred Qualifications

Experience with commercial and open-weight LLM ecosystems (OpenAI, Anthropic, Meta Llama, Mistral, etc.).
Familiarity with GPU inference serving and distributed training tooling (NVIDIA Triton Inference Server, DeepSpeed, Hugging Face Accelerate).
Prior work in information retrieval, NLP pipelines, and knowledge-augmented generative systems.
Contributions to open-source AI projects or research publications.

Senior AI Engineer – LLMs, RAG, and Vector Systems