Key Responsibilities
- Optimize ML inference engines for GPU acceleration
- Develop quantization and pruning techniques for model efficiency
- Profile and benchmark inference performance across hardware platforms
- Implement compiler optimizations for ML workloads
- Collaborate with hardware teams to improve compute efficiency
- Automate performance testing and regression detection
Requirements
- 5+ years in systems programming or performance engineering
- Expertise in C++ and GPU computing (CUDA/OpenCL)
- Experience with compiler toolchains and optimization techniques
- Knowledge of ML model architectures and their performance bottlenecks
- Strong debugging and profiling skills