Deepinfra Inc.

Inference Optimization Engineer - Deepinfra Inc.

Department: Engineering
Job Type / Location: Remote
Experience Required: 5+ years
Posted On: June 3, 2026

Key Responsibilities

Optimize ML inference engines for GPU acceleration
Develop quantization and pruning techniques for model efficiency
Profile and benchmark inference performance across hardware
Implement compiler optimizations for ML workloads
Collaborate with hardware teams to improve compute efficiency
Automate performance testing and regression detection

Requirements

5+ years in systems programming or performance engineering
Expertise in C++ and GPU computing (CUDA/OpenCL)
Experience with compiler toolchains and optimization techniques
Knowledge of ML model architectures and their performance bottlenecks
Strong debugging and profiling skills
