Key Responsibilities

Design and operate production LLM serving stacks (vLLM, TGI, Triton) and vector databases (Pinecone, Weaviate, Qdrant)
Build evaluation harnesses for AI features covering accuracy, hallucination, regression, latency, and cost
Own prompt registries, versioning, model routing, A/B testing, and rollback paths as production artifacts
Instrument AI workflows with LangSmith, OpenTelemetry, Prometheus, and Grafana; define SLOs and lead incident response
Drive cost discipline through batching, prompt caching, smaller-model routing, and inference optimization
Mentor engineers and set team standards for using AI-assisted engineering tools (Claude, Cursor)

Requirements

5+ years of engineering experience, including 2+ years in MLOps or production AI infrastructure
Hands-on production ownership of LLM/ML systems at scale, including on-call duty and scaling decisions
Proficiency in Python, FastAPI, Docker, Kubernetes, and AWS (EC2, S3, EKS, IAM)
Experience with inference tooling (vLLM, TGI, Triton) and with evaluation and observability tooling (LangSmith, Prometheus)
Strong written and verbal communication skills, with the ability to defend architectural decisions and mentor teams

Senior MLOps Engineer - Unico Connect