Deepinfra Inc.

Inference Engineer - Deepinfra Inc.

Department: Engineering
Job Type / Location: Remote
Experience Required: 5+ years

Key Responsibilities

  • Optimize and deploy ML models for high-performance inference at scale
  • Develop low-latency systems for real-time AI applications
  • Implement quantization, pruning, and other optimization techniques
  • Collaborate with hardware teams to maximize hardware utilization
  • Benchmark and profile inference performance across different platforms
  • Ensure reliability and efficiency of production inference pipelines

Requirements

  • 3+ years of experience in systems programming or ML inference optimization
  • Expertise in C++ and Python for performance-critical applications
  • Experience with GPU computing and CUDA programming
  • Knowledge of model optimization techniques and hardware acceleration
  • Strong debugging and profiling skills for performance tuning
