Cohere

Site Reliability Engineer, Inference Infrastructure

Department
Engineering
Job Type / Location
Remote
Experience Required
3+ years

About the Role

Cohere is seeking a Site Reliability Engineer, Inference Infrastructure. This role focuses on the reliability and performance of our inference infrastructure.

Responsibilities

  • Ensure high availability and reliability of Cohere's inference systems.
  • Optimize the performance and scalability of inference infrastructure.
  • Work across teams to troubleshoot and resolve production issues.
  • Implement and maintain monitoring, alerting, and logging solutions.
  • Develop automation to streamline operational tasks.

Requirements

  • Experience with site reliability engineering or a similar role.
  • Strong background in managing large-scale distributed systems.
  • Proficiency in cloud platforms (e.g., AWS, GCP, Azure).
  • Experience with containerization and orchestration (e.g., Docker, Kubernetes).
  • Solid understanding of Linux operating systems.
  • Ability to work in a fast-paced, dynamic environment.
