This is Adyen
Adyen provides payments, data, and financial products in a single solution for customers like Meta, Uber, H&M, and Microsoft - making us the financial technology platform of choice. At Adyen, everything we do is engineered for ambition.
For our teams, we create an environment with opportunities for our people to succeed, backed by the culture and support to ensure they are enabled to truly own their careers. We are motivated individuals who tackle unique technical challenges at scale and solve them as a team. Together, we deliver innovative and ethical solutions that help businesses achieve their ambitions faster.
Platform Monitoring Engineer / Incident Manager
The team sits within Engineering under the Platform Excellence pillar. It combines unwavering attention to detail with a deep understanding of how platform-wide monitoring affects all merchants.
In this role, you will be on call to monitor platform performance, coordinate and command incidents, communicate with our customers, work on monitoring frameworks, and provide feedback to product engineering teams to improve the reliability of the platform. You will start and lead initiatives across our platform offerings that prioritize merchant impact, so that we detect issues proactively, inform merchants quickly, and increase the reliability of our platform.
What you’ll do
- On-call: The team operates in a follow-the-sun model, and you will participate in the EMEA shift (9:00 AM - 6:00 PM). Observe platform and merchant performance and proactively detect issues to mitigate risks in partnership with Engineering teams.
- Incident Management: Coordinate the mitigation, recovery, and resolution of high-impact incidents, ensuring a rapid and effective response across teams. Represent the customer perspective during incidents, maintaining a strong customer-centric approach.
- Communication: Communicate expertly with merchants in real time during an incident, presenting the most accurate and up-to-date information to keep them informed. Escalate critical incidents when needed and provide structured communication to senior management.
- Problem Management: Go beyond reactive incident response by analyzing incident trends to identify recurring issues and systemic weaknesses. Partner with engineering and product teams to advocate for long-term fixes over repeated short-term patches.
- Work together with Operations, Product, and Engineering teams to integrate, grow, and continuously improve our monitoring strategy and increase our reliability.
- Investigate alerts and provide feedback to engineering teams to build effective logging and alerts across the platform architecture.
- Mitigate the risk of merchant impact by acting on alerts in partnership with Engineering teams, and contribute to the monitoring playbook by documenting what you learn.
- Improve operations by leading/proj