Key Responsibilities
- Develop high-performance inference engines for AI models
- Optimize model architectures for low-latency and high-throughput inference
- Implement GPU-accelerated computing solutions
- Collaborate with ML teams to integrate optimized models into production systems
- Profile and benchmark inference performance
- Ensure compatibility across diverse hardware platforms

Requirements
- 2+ years of experience in systems programming or AI inference
- Strong proficiency in C++ and Python
- Experience with GPU computing (CUDA/OpenCL) and model optimization
- Knowledge of neural network architectures and performance tuning
- Familiarity with Linux and performance profiling tools