About Stripe
Stripe is a financial infrastructure platform for businesses, used by millions of companies globally. Our mission is to increase the GDP of the internet, and working here offers an opportunity to contribute to the global economy.
About the Team
The ML Platform team at Stripe is responsible for building the platforms and services that enable ML engineers and data scientists across Stripe to develop and deploy features and models from prototype to production reliably, with low latency, and at scale. We collaborate closely with product teams, data scientists, and platform infrastructure teams to create powerful, flexible, and user-friendly systems that significantly enhance ML velocity throughout the company. Stripe processes over $1.9 trillion in payments volume per year, which provides a vast amount of data for machine learning.
What you’ll do
As a Staff Engineer, you will serve as a technical lead across the ML Platform space, contributing significantly to the evolution of platforms powering Stripe's ML-driven products. You will make high-impact decisions, influence investments and strategy, and enhance the reliability, security, and usability of our systems. This role involves cross-functional collaboration with tech staff, data science, product, and senior leadership to maximize the impact of ML at Stripe.
Responsibilities
- Take ownership of end-to-end architecture and system design for large, complex projects within ML Platform.
- Define technical direction for projects with high ambiguity, transforming complex user needs into a long-lasting platform strategy.
- Design system architecture and solutions for challenging ML Platform problems, including low-latency model inference, large-scale feature stores, real-time monitoring, and LLM/agent orchestration.
- Translate high-leverage ideas into robust solutions that shape platform and product roadmaps, combining technical excellence with creative problem-solving.
- Scope and lead large projects with significant business impact, managing them from requirements through design, implementation, and production operation.
- Work directly with ML engineers, data scientists, and product teams to translate their needs into functional requirements and scalable technical solutions.
- Arbitrate critical decisions, balancing competing priorities while meeting latency, reliability, cost, and security constraints.
- Serve as a key engineering representative, engaging senior leaders across Stripe and advising leadership on technical considerations for the end-to-end ML lifecycle.
- Drive cross-team technical initiatives to improve ML development velocity and MLOps maturity company-wide.
- Mentor and grow other engineers, acting as a role model for designing, implementing, and operating excellent software systems.
Who you are
Minimum requirements
- 10+ years of professional software development experience, or equivalent domain expertise, with a strong background in service-oriented architecture and large-scale distributed systems.
- Track record of serving as a technical lead, providing technical direction, leading multi-team initiatives, and mentoring team members.
- Experience working on production ML platform services.
- Strong product instincts and a deep understanding of the business context.
- Strong communication skills, including the ability to explain complex technical concepts to both technical and non-technical stakeholders.
- Demonstrated ability to work cross-functionally, collaborating effectively with ML engineers, data scientists, software engineers, product managers, and business stakeholders.
- Ability to thrive with high autonomy and responsibility, and comfort operating in ambiguous environments.
- Hands-on experience using AI tools to accelerate work.
Preferred qualifications
- Experience building large-scale serving or data infrastructure for machine learning use cases (e.g., model inference, feature stores, real-time feature computation, model registries).
- Familiarity with LLMs, LLM frameworks, and agentic AI patterns (e.g., tool use, multi-agent orchestration, retrieval-augmented generation).
- Experience rapidly developing prototypes and iterating based on user feedback.
- Familiarity with cloud services (e.g., AWS) and cloud-based AI/ML services (e.g., SageMaker, Bedrock, Databricks, OpenAI).
- Experience training and shipping machine learning models to production to solve critical business problems.
- Ability to synthesize ideas across the organization while setting a compelling technical vision.
- Comfort working with geographically distributed teams.
- Passion for side projects, open source, or self-driven technical initiatives.