Deepinfra Inc.

Inference Engineer - Deepinfra Inc.

Department: Engineering
Job Type / Location: Remote
Experience Required: 5+ years
Posted On: June 1, 2026

Key Responsibilities

Optimize and deploy ML models for high-performance inference at scale
Develop low-latency systems for real-time AI applications
Implement quantization, pruning, and other optimization techniques
Collaborate with hardware teams to maximize utilization of the underlying hardware
Benchmark and profile inference performance across different platforms
Ensure reliability and efficiency of production inference pipelines

Requirements

3+ years in systems programming or ML inference optimization
Expertise in C++ and Python for performance-critical applications
Experience with GPU computing and CUDA programming
Knowledge of model optimization techniques and hardware acceleration
Strong debugging and profiling skills for performance tuning
