Machine Learning Engineer, Applied AI
As an MLE, you'll join our Product Innovations team and work across the full applied ML stack - deploying models, building the evaluation systems that tell us whether they actually work, and making the data and infrastructure decisions that turn experimental data science into cost-efficient products. You'll partner closely with our Data Science and Engineering teams on our vector embeddings ecosystem, ground truth pipelines, model evaluation, and the pre/post-processing decisions that determine product quality.
This is a production-focused role with some research opportunities. You'll be the engineer who makes sure our ML systems - both our traditional NLP and embedding models and our LLM-powered features - work reliably at scale (millions of records per day), are continuously evaluated against ground truth, and improve over time.
What you'll do
- Deploy and monitor ML systems in production, from classical NLP and embedding models to LLM-powered features - where "production" means millions of records per day
- Own the evaluation stack - golden datasets, "model-as-a-judge" frameworks, inter-annotator agreement, and regression tests that gate releases
- Build and maintain our vector embeddings ecosystem and the retrieval, classification, and similarity patterns that sit on top of it
- Partner with Data Science on annotation workflows, PII scrubbing, and ground-truth pipelines
- Improve our MLOps foundations - versioning, observability, drift detection - so the rest of the team can ship faster
- Translate fuzzy product problems into measurable AI features with clear success criteria
What you've done
- 4–7 years of professional software or ML engineering experience, including 2+ years shipping ML systems to production
- Strong Python; comfort with the modern data/ML stack
- Hands-on experience deploying and monitoring models in at least one major cloud (AWS or GCP); willingness to learn the other
- Production experience with NLP or ML systems - classification, NER, embeddings, ranking, similarity, or LLM-powered features (most candidates have done some mix of traditional ML and LLM work; we care that you've shipped, not which camp you came up in)
- Practical experience with evaluation for ML or LLM systems - golden datasets, model-as-a-judge, inter-annotator agreement (IAA), precision/recall, or equivalent. You don't need to have built an evaluation system from scratch, but you should know why these systems matter and how to improve them
- Collaborative communicator - you work well alongside data scientists and engineers, and can clearly explain ideas, requirements, and tradeoffs to non-technical stakeholders
Bonus
- Experience with vector databases or retrieval systems at scale
- Experience with managed ML services on AWS (SageMaker) and/or GCP (Vertex AI)
- Annotation workflow experience (Label Studio, Scale AI, or similar) and a point of view on how to measure and improve inter-annotator agreement
- Familiarity with PII scrubbing patterns and privacy-by-design data handling
- Open-source contributions, blog posts, or talks on LLM/embedding production work