logo

openai

Site Reliability Engineer, Frontier Systems Infrastructure

Department
Engineering
Job Type / Location
remote
Experience Required
3+ years
Posted On

As a Site Reliability Engineer on the Frontier Systems Infrastructure team at OpenAI, you will be responsible for designing and implementing scalable and reliable systems infrastructure. You will work closely with the engineering teams to ensure the reliability and performance of our systems, and collaborate with other teams to identify and prioritize infrastructure needs.

Key Responsibilities:

  • Design and implement scalable and reliable systems infrastructure
  • Collaborate with engineering teams to ensure system reliability and performance
  • Identify and prioritize infrastructure needs with other teams
  • Develop and maintain monitoring and alerting systems
  • Improve system efficiency and reduce costs

Requirements:

  • 3+ years of experience in site reliability engineering or a related field
  • Strong understanding of cloud computing platforms, such as AWS
  • Experience with containerization and orchestration tools, such as Docker and Kubernetes
  • Proficiency in programming languages, such as Python and Node.js
  • Strong problem-solving skills and ability to work in a fast-paced environment

View Assessment Process

Think you'll be a good fit?