CoreWeave

Senior Software Engineer, Compute Architecture

Department
Engineering
Job Type / Location
Livingston, WA
Experience Required
5+ years

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

About the Role

As a Senior Software Engineer within our Compute Architecture organization, you will help build the software control plane for hardware lifecycle management across large-scale GPU data centers. The METALDEV team builds Go-based distributed services that bring infrastructure online, monitor production hardware health, automate safe operational workflows, and give operators the observability and control needed to manage GPU servers and rack-scale systems with reliability and confidence. This is a software-first role at the intersection of distributed systems, production reliability, and hardware-aware automation, ideal for engineers who want their code to operate real-world infrastructure at massive scale.

What You’ll Do

  • Design, build, and operate Go-based services that manage the lifecycle of large-scale GPU data center infrastructure.
  • Build automation for data center bring-up, hardware discovery, health monitoring, remediation, and production operations.
  • Develop reliable APIs, services, and workflows for managing BMCs, firmware state, server health, and rack-level infrastructure.
  • Improve observability, alerting, and operational tooling so production issues can be detected, understood, and resolved quickly.
  • Translate incidents and hardware failure modes into software improvements that make the platform more resilient.
  • Partner with hardware-adjacent, infrastructure, operations, and software teams to design systems that work safely at fleet scale.

Who You Are

  • 5+ years of experience building and operating infrastructure or backend systems.
  • Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent practical experience.
  • Strong proficiency in Go for building production services and tools.
  • Experience designing and building gRPC and REST APIs.
  • Experience with Kubernetes and containerized workloads in production environments.
  • Familiarity with observability tooling such as Prometheus and Grafana.

Preferred

  • Experience working with GPU-based systems.
  • Experience with low-level hardware management such as BMCs or Redfish.
  • Experience operating large-scale distributed systems or high-throughput infrastructure.
  • Experience collaborating with or contributing
