As a Site Reliability Engineer on the Frontier Systems Infrastructure team at OpenAI, you will be responsible for designing and implementing scalable and reliable systems infrastructure. You will work closely with the engineering teams to ensure the reliability and performance of our systems, and collaborate with other teams to identify and prioritize infrastructure needs.

Key Responsibilities:

Design and implement scalable and reliable systems infrastructure
Collaborate with engineering teams to ensure system reliability and performance
Identify and prioritize infrastructure needs with other teams
Develop and maintain monitoring and alerting systems
Improve system efficiency and reduce costs

Requirements:

3+ years of experience in site reliability engineering or a related field
Strong understanding of cloud computing platforms, such as AWS
Experience with containerization and orchestration tools, such as Docker and Kubernetes
Proficiency in programming languages, such as Python and Node.js
Strong problem-solving skills and ability to work in a fast-paced environment

Site Reliability Engineer, Frontier Systems Infrastructure

View Assessment Process

Think you'll be a good fit?