About SpotDraft
SpotDraft is an AI-powered Legal Intelligence Platform for high-growth companies. We are building a product that makes contracting convenient, fast, and easy for businesses, because we know how much potential is unlocked when legal teams are equipped with the right tools and systems. Customers such as PhonePe, Chargebee, Unacademy, Meesho, and Cred currently use SpotDraft to streamline contracting within their organisations. On average, SpotDraft saves in-house legal counsel 10 hours per week and helps close deals 25% faster.
Job Summary
SpotDraft is revolutionizing legal operations with AI-powered tools that help legal teams work faster and smarter. Our flagship products leverage cutting-edge LLM technology to automate routine legal work and accelerate contract review.
- Sidebar is an AI-powered team of legal assistants that handles routine legal work, from contract analysis and legal research to compliance tracking and drafting. It also covers work beyond contracts, including policy questions, regulatory compliance, and strategic advice, and learns from your organization's knowledge to become a specialized legal co-pilot.
- VerifAI is an AI contract review tool that works as a Microsoft Word add-in, helping legal teams review contracts up to 70% faster. It uses generative AI to check contracts against personal or organizational guidelines and answer open-ended questions, automatically flagging deviations and suggesting improvements.
The Role
As a Senior Applied AI Engineer, you'll architect and deploy the production AI systems powering these products. You'll build distributed, fault-tolerant services that process millions of legal documents, designing multi-agent architectures and optimizing LLM performance for accuracy, speed, and cost efficiency in our mission-critical SaaS environment.
What You’ll Do
Build Production AI Features
- Design and implement end-to-end AI features using transformer-based architectures, fine-tuned models, and prompt engineering at scale for legal document workflows (summarization, clause extraction, document comparison, intelligent drafting).
- Build stateful agentic systems using ReAct/Tree-of-Thoughts frameworks with memory persistence, tool orchestration (function calling), multi-turn reasoning, and chain-of-thought prompting, capable of handling 10K+ concurrent sessions.
Architect Scalable RAG & Context Systems
- Design semantic chunking, hybrid search (BM25 + dense embeddings), reranking models, query decomposition, and HyDE (Hypothetical Document Embeddings) for large legal documents.
- Build retrieval-grounded generation pipelines with dynamic context window allocation and efficient vector database queries for improved precision/recall.
Optimize Infrastructure & Performance
- Design distributed systems with horizontal scaling, load balancing, and circuit breakers for LLM inference services processing millions of requests/day.
- Implement inference optimizations: request batching, semantic caching, KV-cache optimization, quantization (INT8/FP16), and model serving frameworks (vLLM, TensorRT-LLM, TGI).
- Build low-latency API gateways with rate limiting, retry logic, and fallback mechanisms across multiple LLM providers (OpenAI, Anthropic, Google Gemini).
Ensure Quality & Reliability
- Build multi-layered hallucination mitigation with fact-checking agents, confidence calibration, and citation validation.
- Design evaluation harnesses with automated metrics (BLEU, ROUGE, BERTScore, semantic similarity), LLM-as-judge pipelines, and human-in-the-loop labeling workflows.
- Create experimentation platforms for A/B testing prompt variations, model comparisons, and configuration changes with statistical rigor.
MLOps & Observability
- Build MLOps pipelines with automated training, evaluation, and deployment using Kubernetes, Argo Workflows, and feature stores.
- Implement blue-green deployments, shadow traffic testing, and automated rollback mechanisms with SLO-based monitoring.
- Create monitoring stacks with distributed tracing, anomaly detection, RLHF feedback loops, and dashboards tracking latency (p50/p95/p99), token consumption, error rates, and cost per request.
What We’re Looking For
Must Have
- Experience: 6+ years in AI/ML engineering with a proven track record deploying LLM-based systems at scale.
- LLM Expertise: Deep hands-on experience with GPT-4, Claude, Gemini, or Llama models via API integration, fine-tuning (LoRA, QLoRA), and prompt optimization techniques.
- Production AI Systems: Shipped production AI features (chatbots, RAG systems, document intelligence) with measurable impact on latency, accuracy, and cost metrics.
- Technical Depth:
  - Expert-level Python with strong software engineering fundamentals (design patterns, testing, profiling).
  - Deep understanding of transformer architectures, attention mechanisms, and embedding models.
  - Proficiency with vector databases (Pinecone, Weaviate, Qdrant, pgvector) and semantic search.
  - Experience with distributed systems, microservices architecture, and async programming (asyncio, FastAPI, gRPC).
  - Strong knowledge of data structures, algorithms, and computational complexity for optimization.
- MLOps & Infrastructure: Hands-on experience with containerization (Docker, Kubernetes), CI/CD pipelines, and cloud platforms (AWS SageMaker, GCP Vertex AI, Azure OpenAI).
- Performance Optimization: Demonstrated ability to optimize inference latency (e.g., p95 < 2s) and scale systems to handle traffic spikes.
- Evaluation & Metrics: Experience designing evaluation frameworks, building synthetic test datasets, and implementing A/B testing for ML systems.
- Systems Thinking: Ability to make architecture trade-offs considering latency, cost, reliability, and maintainability at scale.
Good to Have
- Agentic Frameworks: Experience with LangGraph, LangSmith, and the LangChain ecosystem, or other agentic libraries such as LlamaIndex, Google's Agent Development Kit (ADK), and CrewAI.
- Large-Scale Systems: Experience with multi-region deployments and distributed training (DeepSpeed, FSDP).
- Advanced RAG & Model Training: Implementation of query routing, text embeddings, instruction tuning, RLHF, or DPO for domain adaptation.
- Document Analysis & Multi-Modal AI: Experience with vision-language models (GPT-4V, Claude 3) or document layout analysis for extracting text, tables, and images.
- Cost Optimization & Research: Track record of reducing inference costs through caching, prompt compression, or model distillation.