
Scale AI

Machine Learning Engineer, Global Public Sector

Department
Engineering
Job Type / Location
Doha
Experience Required
3+ years

About the Role

Scale is hiring ML Research Engineers to bridge the gap between emerging AI capabilities and mission-critical, real-world impact. In our Global Public Sector (GPS) division, we don’t just implement tools; we conduct applied research to solve the unique challenges of sovereign AI.

Your role is to move beyond off-the-shelf implementations. You will lead research into Agent Design, Reliability, and AI Safety, developing novel system architectures that power high-stakes government applications. You will be the bridge between a research paper and a production-ready system that functions at scale.

The Mission

  • Applied Agent Research: Leading the design of reliable, multi-step agentic systems and long-horizon reasoning frameworks that can solve complex problems for national security and public policy.
  • Systemic Evaluation & Red-Teaming: Developing rigorous benchmarks and evaluation protocols to ensure AI systems are safe, unbiased, and performant in high-stakes, non-commercial environments.
  • Model Optimisation & Selection: Conducting deep-dive research into model performance (both open-weight and closed) to identify the best tools for niche domains, optimising them through context engineering, RAG, and other inference-time techniques.

What You Will Do

  • Architect Agentic Systems: Design and build agent architectures, including the harnesses, tool-use protocols, and logic flows that allow LLMs to function as reliable, autonomous agents in complex workflows.
  • Drive Reliability & Safety: Research and implement robust evaluation frameworks. This includes red-teaming for sovereign AI requirements and developing strategies to mitigate hallucinations in regulated data environments.
  • Synthesise Deep Research: Build agents capable of autonomous information synthesis and long-horizon reasoning, enabling users to analyse massive datasets and extract actionable insights.
  • Optimise for Niche Domains: Evaluate and adapt models for specialised use cases, such as LLM reasoning for low-resource languages, complex OCR tasks, or working in GPU-constrained environments.
  • Build Evaluation Frontiers: Create new, automated benchmarks that define what success looks like for AI in the public sector, ensuring our systems meet the highest standards of accuracy and sovereignty.
  • Consult as a Technical Authority: Act as a subject matter expert for public sector leaders, advising on the practical limits, safety requirements, and performance trade-offs of emerging AI technologies.

Ideally, You Have

  • Engineering Rigour: Exceptional proficiency in Python and experience building agentic harnesses or AI infrastructure. You write production-ready code that is modular, scalable, and reliable.
  • Applied Research Mindset: A track record of taking theoretical AI concepts and turning them into functional prototypes or products. You know how to read a paper and determine if its methods are actually viable for a production system.
  • Evaluation Expertise: Experience in LLM benchmarking, red-teaming, or building evaluations that go beyond standard academic datasets.
  • Advanced Degree: A Master’s or PhD in Computer Science, Mathematics, or a related field (with a focus on ML) is preferred, but we value demonstrated impact and engineering excellence.

Nice to Haves

  • Agentic Systems Expert: Deep experience in building multi-agent systems, including chain-of-thought optimisation and tool-calling reliability.
  • Sovereign AI Experience: Experience working with highly regulated data environments, on-premise deployments, or sensitive government use cases.
  • Inference Optimisation: Knowledge of how to optimise model performance for environments with limited GPU capacity or specific latency requirements.
  • Zero-to-One Mindset: You are comfortable navigating ambiguity and enjoy defining research directions from scratch to solve a specific product or mission need.
