Key Responsibilities
- Optimize inference engines for GPU acceleration
- Implement quantization and pruning techniques for model efficiency
- Develop low-latency inference pipelines
- Profile and benchmark inference workloads to identify and eliminate performance bottlenecks
- Collaborate with ML engineers to integrate optimized models
- Research novel optimization techniques for large language models
Requirements
- 3+ years of experience in systems programming or ML optimization
- Expertise in C++ and GPU computing (CUDA/OpenCL)
- Experience with model quantization and pruning
- Strong understanding of computer architecture
- Familiarity with large language models (LLMs) and transformer architectures