Adyen

Senior Observability Infrastructure Engineer

Department
Engineering
Job Type / Location
Remote
Experience Required
9+ years

This is Adyen

Adyen provides payments, data, and financial products in a single solution for customers like Facebook, Uber, H&M, and Microsoft - making us the financial technology platform of choice. At Adyen, everything we do is engineered for ambition.

For our teams, we create an environment with opportunities for our people to succeed, backed by the culture and support to ensure they are enabled to truly own their careers. The people of Adyen are motivated individuals who tackle unique technical challenges at scale and solve them as a team. Together, we deliver innovative and ethical solutions that help businesses achieve their ambitions faster.

Senior Observability Infrastructure Engineer

We are looking for an experienced Observability Infrastructure Engineer to join our Platform Engineering organization. You will be part of the team responsible for building and running our Observability pillars on-premises and on Kubernetes. Our systems collect, process, and store the logs, metrics, and traces that allow hundreds of product teams to monitor their services in real time.

This is a role for a builder and problem solver who enjoys deep technical troubleshooting across distributed systems and then turns recurring issues into automated, repeatable solutions. You will work in a large-scale environment where we manage petabytes of data and thousands of servers. We are currently in the middle of a major transformation focused on automating operations and enabling self-service for our users.

What you will do

  • Build the next generation of our platform: Design and implement the future architecture of our logging and metrics systems. You will play a key role in redesigning our infrastructure to support new global regions, ensuring data isolation and regulatory compliance in different geographies, and more.
  • Own infrastructure operations: You will take full ownership of our hybrid infrastructure, managing the lifecycle of over 1,500 servers across both bare-metal and Kubernetes environments.
  • Automate to reduce toil: You will write code in Go or Python to eliminate manual operational tasks. Your goal is to build self-healing systems that do not require manual intervention during the night. You will improve our CI pipelines to ensure that changes to our clusters are safe, predictable, and automated.
  • Optimize for scale and performance: You will dive deep into performance bottlenecks within our distributed tracing and logging pipelines. We deal with high-volume data streams that can overwhelm standard configurations. You will tune our Elasticsearch clusters, optimize Prometheus and VictoriaMetrics storage, and ensure our OpenTelemetry implementation can handle peak traffic without missing a beat.
  • Reliability and Engineering: You will participate in on-call rotations, but your primary focus will be engineering solutions that st