Role Description
As an AI HPC Engineer at ACL, you will contribute to ACL's mission of delivering advanced parallel computing research by:
- Building theoretical models that characterize KLA's AI-based image processing algorithms in computing terms such as memory bandwidth and computational FLOPS.
- Closing the gap between achieved performance and the theoretical peak performance of current and next-generation hardware, such as GPUs and AI accelerators, by enhancing the algorithms.
- Porting and optimizing algorithms on current and next-generation CPUs, GPUs, and AI accelerators by leveraging constructs in modern high-performance programming languages such as C++14/C++17, and lower-level programming models and toolkits such as SIMD extensions (SSE/AVX), CUDA, and OpenVINO.
- Exploring paths to price-optimized performance on next-generation devices that implement new approaches to accelerating AI training and inference.
Expected Background
- 3-7 years of experience in GPU programming using CUDA.
- Ph.D. or M.S. in EE, CS, or CSE.
- Bachelor's graduates with an exceptional background and prior experience in the HPC field will also be considered.
- Strong foundation in computer architecture, with an interest in high-performance parallel processing at the device level (GPUs or CPUs with SIMD).
- Strong mental model of computational workloads and of how to map different algorithms onto parallel architectures.
- Proficiency in programming in C, C++, and Python.
- Good understanding of, and exposure to, the Linux operating system at the user level.
- Exposure to multiprocessing and multithreading concepts.
- A self-motivated individual with good communication skills.
Bonus Skills
- Hands-on experience with GPU programming using CUDA, OpenCL, or SYCL, and with modern CPU programming constructs such as those in C++14/C++17.
- Exposure to profiling tools such as Nsight or VTune.
- Experience with large-scale distributed HPC systems.
- Familiarity with AI frameworks such as TensorFlow.
- Hands-on experience developing and optimizing computer vision algorithms at scale.