Deepinfra Inc.

Inference Optimization Engineer

Department
Engineering
Job Type / Location
Remote
Experience Required
5+ years
Posted On

Key Responsibilities

  • Optimize inference engines for GPU acceleration
  • Implement quantization and pruning techniques for model efficiency
  • Develop low-latency inference pipelines
  • Profile and benchmark performance bottlenecks
  • Collaborate with ML engineers to integrate optimized models
  • Research novel optimization techniques for large language models

Requirements

  • 3+ years of experience in systems programming or ML optimization
  • Expertise in C++ and GPU computing (CUDA/OpenCL)
  • Experience with model quantization and pruning
  • Strong understanding of computer architecture
  • Familiarity with LLMs and transformer architectures
