Stripe is a financial infrastructure platform for businesses, processing over $1.9T in payments volume annually. The ML Platform team builds platforms and services that enable ML engineers and data scientists across Stripe to take data and build features and models from prototype to production reliably, at low latency, and at scale. They work closely with product teams, data scientists, and platform infrastructure teams to build powerful, flexible, and user-friendly systems that substantially increase ML velocity across the company.
As a Staff Engineer, Machine Learning Platform, you will serve as a technical lead across the ML Platform space and a key contributor to the evolution of the platforms that power Stripe's ML-driven products. You will be empowered to make decisions with a large impact on Stripe, influencing investments and strategy while making systems more reliable, secure, and a delight to use. This role involves cross-functional collaboration with other technical staff, data science, product, and senior leadership to maximize the impact of ML at Stripe. You will help define the long-term strategy and lead the technical direction for the next generation of ML infrastructure.
Responsibilities include taking ownership of end-to-end architecture and system design for complex ML Platform projects, defining technical direction for projects with high ambiguity, and transforming complex user needs into long-lasting platform strategy. You will design system architecture and solutions for challenging problems in the ML Platform domain, such as low-latency model inference, large-scale feature stores, real-time monitoring, and LLM/agent orchestration. The role also involves turning high-leverage ideas into tangible, robust solutions that shape platform and product roadmaps, scoping and leading large projects with significant business impact, and working directly with ML engineers, data scientists, and product teams to translate their needs into scalable technical solutions. You will arbitrate critical decisions, balancing competing priorities while meeting latency, reliability, cost, and security constraints. You will also serve as a key engineering representative, advise leadership on technical considerations related to the end-to-end ML lifecycle, drive cross-team technical initiatives, and mentor other engineers.
Minimum requirements include 10+ years of professional software development experience with a solid background in service-oriented architecture and large-scale distributed systems, a track record of serving as a technical lead, experience working on production ML platform services, strong product instincts, and communication skills. Demonstrated ability to work cross-functionally and thrive with autonomy in ambiguous environments is also required, along with hands-on experience using AI tools to accelerate work.
Preferred qualifications include experience building large-scale serving or data infrastructure for machine learning use cases (e.g., model inference, feature stores, real-time feature computation, model registries), familiarity with LLMs, LLM frameworks, and agentic AI patterns, and experience rapidly developing prototypes. Familiarity with cloud services (e.g., AWS) and cloud-based AI/ML services (e.g., SageMaker, Bedrock, Databricks, OpenAI), experience training and shipping machine learning models, ability to synthesize ideas and set a technical vision, comfort working with geographically distributed teams, and a passion for side projects or open source are also preferred.