NVIDIA

Senior Software Engineer - GPU Workload Automation

Department
Engineering
Job Type / Location
Santa Clara
Experience Required
5+ years

About the Role

Join the NVIDIA GPU Workload Automation Team, where we are building a large-scale automation system that orchestrates the lifecycle of GPU workloads for internal engineering and research teams. We are passionate about our work and are constantly looking for ways to improve performance, scalability, and usability. If you share our passion for GPU computing, large-scale distributed systems, and automation, we want to hear from you!

What you'll be doing:

  • Design and implement highly scalable and reliable distributed systems for GPU workload automation.
  • Develop and maintain tools and services for job scheduling, resource management, and performance monitoring.
  • Collaborate with engineering and research teams to understand their needs and deliver innovative solutions.
  • Troubleshoot and debug complex issues in a distributed environment.
  • Stay up to date with the latest trends and technologies in GPU computing, distributed systems, and automation.

What we need to see:

  • BS or MS in Computer Science or a related field, or equivalent experience.
  • 5+ years of experience in software development, with a focus on large-scale distributed systems.
  • Strong programming skills in Python or Go.
  • Experience with Docker, Kubernetes, and other container and orchestration technologies.
  • Experience with cloud platforms such as OpenStack, AWS, Azure, or GCP.
  • Experience with CI/CD tools such as GitLab CI or Jenkins.
  • Experience with monitoring and logging tools such as Grafana and Prometheus.
  • Strong problem-solving and debugging skills.
  • Excellent communication and collaboration skills.

Ways to stand out from the crowd:

  • Experience with GPU computing and deep learning frameworks (e.g., TensorFlow, PyTorch).
  • Experience with job schedulers such as Slurm, LSF, or PBS.
  • Experience with MPI libraries such as Open MPI, MVAPICH, or Intel MPI.
  • Experience with Jupyter notebooks and other interactive computing environments.
  • Contributions to open-source projects.
