
DeepInfra

AI Inference Engineer - DeepInfra

Department: Engineering
Job Type / Location: Remote
Experience Required: 5+ years
Posted On: June 3, 2026

Key Responsibilities

Develop high-performance inference engines for AI models across diverse hardware platforms
Optimize model architectures for low-latency and high-throughput inference
Implement quantization, pruning, and other optimization techniques
Collaborate with hardware teams to leverage GPU/TPU acceleration
Design benchmarking frameworks to evaluate inference performance
Ensure cross-platform compatibility and scalability of inference solutions

Requirements

5+ years of experience in AI inference or related fields
Expertise in Python and C++, plus hands-on GPU programming experience
Strong understanding of model optimization techniques
Experience with CUDA, OpenCL, or similar acceleration frameworks
Familiarity with AI frameworks (PyTorch, TensorFlow) and deployment tools

