DeepInfra

AI Inference Engineer - DeepInfra

Department: Engineering
Job Type / Location: Remote
Experience Required: 5+ years

Key Responsibilities

  • Develop high-performance inference engines for AI models across diverse hardware platforms
  • Optimize model architectures for low-latency and high-throughput inference
  • Implement quantization, pruning, and other optimization techniques
  • Collaborate with hardware teams to leverage GPU/TPU acceleration
  • Design benchmarking frameworks to evaluate inference performance
  • Ensure cross-platform compatibility and scalability of inference solutions

Requirements

  • 5+ years of experience in AI inference or related fields
  • Expertise in Python and C++, plus hands-on GPU programming experience
  • Strong understanding of model optimization techniques
  • Experience with CUDA, OpenCL, or similar acceleration frameworks
  • Familiarity with AI frameworks (PyTorch, TensorFlow) and deployment tools
