DeepInfra

Inference Optimization Engineer - DeepInfra

Department: Engineering
Job Type / Location: Remote
Experience Required: 5+ years

Key Responsibilities

  • Optimize inference engines for maximum throughput and minimal latency
  • Develop GPU-accelerated algorithms for large language models
  • Implement model quantization and pruning techniques
  • Profile and benchmark inference performance across hardware platforms
  • Collaborate with hardware teams to maximize device utilization
  • Create tools for automated performance testing and validation

Requirements

  • 5+ years in systems programming or performance engineering
  • Expertise in C++ and GPU computing (CUDA/OpenCL)
  • Experience with model optimization techniques
  • Strong debugging and profiling skills
  • Knowledge of computer architecture and memory hierarchies
