Company Description

At Amwell, we’re transforming healthcare for all—powered by technology and inspired by people. Here, your ideas don’t just matter—they drive real change, improving lives on a global scale.

We marry technology and innovation with clinical excellence to provide trusted solutions that solve the healthcare industry’s biggest pain points, and we are on a mission to enable greater access to more convenient, affordable, and effective care.

We do this through our technology-enabled care platform that is designed to help our clients achieve their digital care ambitions – today and in the future. We offer programs spanning the full care continuum, including urgent, acute and specialty care, behavioral health, and services for the treatment of chronic conditions such as heart and cardiometabolic diseases. Programs are powered by Amwell as well as our growing partner network.

For almost two decades, Amwell has proudly served some of the largest and most sophisticated healthcare organizations in the U.S. and worldwide. Our team is passionate about technology’s role in transforming care delivery and making it more equitable, accessible, efficient, cost-effective and navigable for all.

Brief Overview

As a Staff Site Reliability Engineer (P4), you will define and elevate the reliability standards across the platform. This role goes beyond owning individual services — you will establish the patterns, practices, and tooling that enable all teams to build and operate reliable systems at scale.

You will operate across team boundaries, identifying systemic reliability risks and designing cross-cutting solutions that improve the overall health of the platform. Acting as a bridge between service-level reliability and organizational maturity, you will help ensure reliability becomes a built-in property of the system rather than a reactive effort.

This role combines deep technical expertise with strong leadership and influence. You will mentor senior engineers, guide architectural decisions, and promote a culture of proactive reliability, observability, and operational excellence across the organization.

Core Responsibilities

Define and evolve reliability standards, patterns, and tooling adopted across the platform.
Own the reliability posture for critical service domains and drive architectural reviews to ensure reliability, operability, and recovery are first-class concerns.
Design and implement cross-cutting reliability mechanisms such as circuit breakers, retry policies, graceful degradation, and load shedding.
Establish and maintain scalable SLO frameworks that teams can adopt with minimal friction.
Lead complex, multi-service incident response as an incident commander and drive high-quality postmortems focused on systemic improvements.
Identify recurring incident patterns and implement structural solutions to prevent future failures.
