ABBYY

Senior Machine Learning Engineer - AI-Assisted Data Annotation

Department: Engineering
Job Type / Location: Bangalore
Experience Required: 5+ years

About the Role

We are seeking a Senior Machine Learning Engineer – AI-Assisted Data Annotation to own the automated annotation track within ABBYY’s Document AI Data team. This role sits at the intersection of large model capabilities and production data engineering, leveraging LLMs and vision-language models to generate high-quality training data at scale. You will design and build AI-assisted annotation pipelines, ensuring outputs are accurate, measurable, and reliable for downstream model training. This is an ideal role for engineers who combine deep model expertise with strong system-building instincts and thrive in fast-moving, experimental environments.

Key Responsibilities

Technical Development & Innovation

  • Design and implement AI-powered annotation pipelines using large models to generate ground truth labels at scale.
  • Develop and refine prompting strategies, few-shot examples, and fine-tuning approaches to improve accuracy and consistency.
  • Build systems for label verification, confidence scoring, and quality validation.
  • Evaluate which tasks are suitable for automated annotation vs. human review, and define decision criteria.
  • Create evaluation frameworks to benchmark automated annotations against human-labeled data.
  • Continuously improve annotation quality using feedback from human review workflows.

Project Ownership & Leadership

  • Own the automated annotation track end-to-end, from architecture through production monitoring.
  • Drive technical decisions across model selection, pipeline design, and validation strategies.
  • Define integration points with platform infrastructure and model serving systems.
  • Collaborate with Data Operations to design human-in-the-loop workflows for efficient review.
  • Contribute to roadmap planning with Principal-level technical leadership.

Infrastructure & Scale

  • Build and optimize large-scale inference pipelines for processing millions of documents.
  • Implement monitoring and alerting for quality degradation and system failures.
  • Design batching, caching, and fallback mechanisms to balance cost, throughput, and accuracy.
  • Collaborate with Platform teams on model serving, APIs, and infrastructure scaling.
  • Maintain clear documentation of annotation strategies, metrics, and known limitations.

Qualifications

Education & Experience

  • MS or PhD in Computer Science, Engineering, Mathematics, or a related field.
  • 5+ years of experience in Machine Learning / AI, with a focus on:
    • Large Language Models (LLMs)
    • Vision-Language Models (VLMs)
    • Data annotation or labeling systems
  • Demonstrated success using large AI models to automate annotation at production scale.
  • Strong background in evaluation design and quality measurement.

Technical Expertise

  • Deep expertise in LLMs and VLMs, including prompting, instruction tuning, and output evaluation.
  • Strong understanding of document understanding tasks (classification, extraction, layout analysis, semantic parsing).
  • Experience designing label quality metrics, confidence scoring, and agreement analysis.
  • Strong programming skills in Python and proficiency with PyTorch or similar frameworks.
  • Experience with large-scale inference pipelines and model serving systems.
  • Familiarity with human-in-the-loop annotation systems and automation trade-offs.

Leadership & Communication

  • Proven ability to independently own complex technical workstreams.
  • Track record of effective collaboration with data operations, platform, and modeling teams.
  • Ability to clearly communicate quality trade-offs and system behavior to diverse stakeholders.
  • Rigorous, data-driven problem-solving approach.
