
DeepInfra

Inference Optimization Engineer - DeepInfra

Department: Engineering
Job Type / Location: Remote
Experience Required: 5+ years
Posted On: June 1, 2026

Key Responsibilities

Optimize inference engines for maximum throughput and minimal latency
Develop GPU-accelerated algorithms for large language models
Implement model quantization and pruning techniques
Profile and benchmark inference performance across hardware platforms
Collaborate with hardware teams to maximize device utilization
Create tools for automated performance testing and validation

Requirements

5+ years in systems programming or performance engineering
Expertise in C++ and GPU computing (CUDA/OpenCL)
Experience with model optimization techniques
Strong debugging and profiling skills
Knowledge of computer architecture and memory hierarchies
