Abstract: Deep Learning (DL) applications have tremendous computation requirements, which makes running them on traditional general-purpose processors (CPUs) very inefficient. Modern computer systems therefore deploy hardware acceleration, which involves offloading compute-intensive and memory-intensive tasks to specialized hardware. Among hardware acceleration alternatives, FPGAs lie in the middle of the programmability-efficiency spectrum, with GPUs being more programmable and ASICs being more efficient. FPGAs provide massive parallelism and are reconfigurable, which makes them very well suited to the fast-changing needs of DL applications. But how can we minimize the gap between ASICs and FPGAs in terms of performance and efficiency while retaining the key strength of FPGAs, their reconfigurability?
This talk will dive into research that attempts to answer this question by exploring better reconfigurable fabrics for Deep Learning. It will discuss the evolution of FPGAs into domain-optimized reconfigurable fabrics and introduce two new blocks, Tensor Slices and CoMeFa RAMs. The talk will provide a look at the architecture of these blocks and at the performance improvement and energy reduction that DL applications can obtain on FPGAs containing them. The methodology for FPGA architecture exploration, including tools and benchmarks, will be discussed as well. The talk will conclude with a forward look at the challenges and possibilities that exist within the realm of reconfigurable computing for Deep Learning.
Bio: Aman Arora is a PhD candidate in the Department of Electrical and Computer Engineering at The University of Texas at Austin. His research interests are in reconfigurable computing and domain-specific acceleration for machine/deep learning. His PhD dissertation research focuses on optimizing the architecture of FPGAs to make them better Deep Learning accelerators. His work received a Best Paper Award at the IEEE FCCM conference in 2022, and he currently holds a fellowship from the UT Austin Graduate School. He has over 12 years of experience in the semiconductor industry in design, verification, testing, and architecture roles. Most recently, he worked on the GPU Deep Learning architecture team at NVIDIA. He is on the job market for a faculty position, looking to start as an Assistant Professor in Summer/Fall 2023.