Role Summary

Embedded directly in a product team (such as search, chat, documents, or audio), you will enhance AI-powered features through rigorous evaluation, prompt and orchestration design, and rapid experimentation. You will own AI quality end-to-end for your domain: defining it, measuring it, running experiments, and shipping improvements. This role involves close collaboration with the Science team to deliver measurable gains in quality, latency, safety, and reliability.

What you will do

Design and run evaluations for your product area, including reference tests, heuristics, and model-graded checks tailored to search relevance, chat quality, document understanding, or audio performance.
Define and track key metrics such as task success, helpfulness, hallucination proxies, safety flags, latency, and cost.
Own prompt and orchestration design, which involves writing, testing, and iterating on prompts and system prompts.
Run A/B tests on prompts, models, and configurations; analyze results; and make data-driven rollout or rollback decisions.
Set up observability for LLM calls, including structured logging, tracing, dashboards, and alerts.
Operate model releases, managing canary and shadow traffic, sign-offs, SLO-based rollback criteria, and regression detection.
Improve core behaviors in your product area, focusing on aspects like memory policies, intent classification, routing, tool-call reliability, or retrieval quality.
Create templates and documentation to enable other teams to author evaluations and ship safely.
Partner with the Science team to diagnose regressions and lead post-mortems.

About You

3-4 years of experience; ideal candidates are ML engineers moving closer to product, or software engineers with significant AI/ML production experience.
Strong TypeScript or Python skills, with team placement depending on fit.
Hands-on experience with production LLMs, including prompts, tool/function calling, and system prompts.
Proficiency in evals and A/B testing; able to design metrics, not just execute them.
Comfortable implementing directly in product code rather than solely in notebooks.
Experience with observability tools: logging, tracing, dashboards, and alerting.
Strong product mindset: able to form hypotheses, run experiments, interpret results, and ship solutions.
Clear communicator, autonomous, and focused on production impact over experimentation for its own sake.

Ideal Qualifications

Experience with safety systems: moderation, PII handling/redaction, guardrails.
Familiarity with release operations: canary/shadowing, automated rollbacks, experiment platforms.
Prior work on search ranking, chat systems, document AI, or audio ML features.

AI Engineer, Product
