About the Role
Sarvam AI is building the next-generation media AI engine to automate movie and OTT content creation & localization across India’s diverse languages. We're seeking a Senior Machine Learning Engineer to oversee the orchestration of our end-to-end AI in Media pipeline, which combines cutting-edge models, agile workflows, and media-grade quality automation.
This role requires in-depth expertise in speech and audio AI, agile pipeline orchestration, and scalable ML system design. The ideal candidate has a strong track record of building and deploying production-grade ML pipelines, leading technical teams, and solving complex problems in audio/video processing and generative media workflows.
Key Responsibilities
- Own the design and evolution of Sarvam’s Media AI pipeline by combining agent-based orchestration, scalable ML integration, and rapid experimentation.
- Build and lead the development of production-grade ML pipelines, ensuring robustness, automation, and adaptability to media workflows.
- Prototype and deploy agentic systems (multi-agent frameworks, LLM-driven decision engines, self-optimizing workflows) to automate task sequencing, error handling, and quality evaluation.
- Drive continual improvements through prompt engineering, RAG, fine-tuning, and metric-driven performance monitoring.
- Define pipeline-wide standards for quality, testing, fallback mechanisms, and edge-case handling.
- Collaborate with model researchers, infra/platform teams, and product stakeholders to ensure end-to-end reliability.
- Provide technical leadership and mentorship across the orchestration and pipeline engineering team.
Technical Expertise
- 4-7 years of experience in ML/AI engineering, including at least 1 year working with speech/audio, TTS, or translation models and related pipelines.
- Hands-on expertise in Generative AI: LLMs, prompt design, and building complex agentic systems.
- Strong proficiency in Python, PyTorch/TensorFlow, MCP, and agent orchestration and observability tooling (LangChain, LangGraph, Langfuse, or custom DAGs).
- Proven track record of deploying ML pipelines in production at scale with monitoring and alerting; comfortable with CUDA profiling and tensor debugging.
- Familiarity with ASR, TTS, translation, and lip-sync models such as Whisper, wav2vec, F5, Coqui TTS, FastSpeech, Bark, and Wav2Lip.
- Experience with cloud platforms (GCP/AWS), containerised deployments (Docker/Kubernetes), and scalable inference and MLOps tooling (Triton, ONNX, Kubeflow, MLflow) is a plus.
Leadership & Soft Skills
- Entrepreneurial spirit with hands-on experience in 0-to-1 startup phases.
- Proven ability to lead and mentor ML teams, balancing research rigour with production needs.
- Strong systems thinking and cross-functional coordination (models, infra, inference, QA, product).
- Passion for keeping pace with the latest in Gen AI, LLMs, and ML engineering.
- Experience working with media/OTT post-production pipelines is a plus.