About the Role
Join the NVIDIA Enterprise Software Infrastructure Platform team as a Senior Staff Platform Infrastructure Engineer. We are building a best-in-class platform to enable accelerated computing for a variety of Enterprise applications. Our rapidly growing team is passionate about building the infrastructure that powers NVIDIA's Enterprise AI software offerings. As a Senior Staff Engineer, you will be a key contributor to our core platform and infrastructure, designing and implementing features that empower our engineering teams to build and deliver cutting-edge software products.
What you'll be doing:
- Lead the design, development, and maintenance of core platform infrastructure components and services.
- Drive the adoption of best practices for scalability, reliability, security, and performance across the platform.
- Collaborate with cross-functional teams to understand their infrastructure needs and provide innovative solutions.
- Mentor junior engineers and contribute to a culture of technical excellence and continuous improvement.
- Participate in on-call rotations and provide operational support for critical infrastructure systems.
What we need to see:
- BS, MS, or PhD in Computer Science or a related technical field, or equivalent experience.
- 8+ years of experience in software development with a focus on platform infrastructure, distributed systems, or site reliability engineering.
- Strong proficiency in one or more programming languages such as C++, Java, Python, Go, or Rust.
- Expertise with cloud infrastructure technologies (e.g., Kubernetes, Docker, public cloud platforms).
- Experience with version control and CI/CD pipelines and tools (e.g., Git, Jenkins, TeamCity).
- Deep understanding of microservices architecture, distributed systems, and SRE principles.
- Excellent problem-solving skills, with a track record of tackling complex technical challenges.
- Strong communication and collaboration skills, with the ability to influence technical direction.
Ways to stand out from the crowd:
- Experience building and operating large-scale, highly available distributed systems.
- Contributions to open-source projects related to platform infrastructure.
- Familiarity with observability tools and practices (monitoring, logging, alerting).
- Experience working in an Agile/Scrum development environment.