About the Role
To build truly global AI, our models must be trained on data that reflects the world's diversity of languages and cultures. We are searching for a Research Engineer to own the quality and coverage of the data behind our models. You will be our in-house expert on global data, ensuring our models perform exceptionally well across dozens of languages. You have a keen eye for linguistic nuance and a passion for building inclusive, representative datasets at scale.
Your Impact
- Design and build large-scale datasets for model training, and run controlled modeling experiments to measure their impact on model performance and behavior.
- Build evaluations of speech models, using both manual annotation and automated metrics at scale.
- Implement techniques for steering data generation to improve model intelligence and mitigate bias.
- Build automated quality control systems to validate and filter generated data.
- Partner with product teams to ensure support for key languages and markets.
What You Bring
- Experience building or working with large multilingual datasets.
- Experience with generative models (speech, text, or multimodal).
- Ability to help guide human annotation and evaluation across multiple languages.
- Strong applied ML background with a focus on data-centric approaches.
- Excitement for building scalable systems that bridge research and production.