Deepinfra Inc.

Inference Optimization Engineer - Deepinfra Inc.

Department
Engineering
Job Type / Location
Remote
Experience Required
5+ years

Key Responsibilities

  • Optimize ML inference engines for GPU acceleration
  • Develop quantization and pruning techniques for model efficiency
  • Profile and benchmark inference performance across hardware
  • Implement compiler optimizations for ML workloads
  • Collaborate with hardware teams to improve compute efficiency
  • Automate performance testing and regression detection

Requirements

  • 5+ years in systems programming or performance engineering
  • Expertise in C++ and GPU computing (CUDA/OpenCL)
  • Experience with compiler toolchains and optimization techniques
  • Knowledge of ML model architectures and their performance bottlenecks
  • Strong debugging and profiling skills
