Time: 9:00-10:30, June 27, 2016
Venue: Room 202, Office Building, Software Campus
Title: Speeding Up Inference for Deep Neural Networks
Abstract:
Deep neural networks can classify images and other inputs with high accuracy. In feedforward inference, however, layer-by-layer processing can incur long delays, which are intolerable for real-time applications such as millimeter-wave antenna control in emerging 5G cellular systems.
We observe that not all data items are equally difficult to recognize. In particular, some samples are relatively easy, in the sense that a deep neural network can classify them quickly via an early exit, skipping all later layers and thereby speeding up inference. This presentation will describe Dynamic Adaptation during Testing (DAT), a method that exploits this observation by automatically configuring early-exit criteria. By adapting to the test set at hand, DAT can significantly shorten inference time without retraining the network. We have evaluated DAT by augmenting a well-studied network (ResNet). We demonstrate that, using the same pre-trained network, DAT can automatically shorten inference latency by 7.4x for easy test samples and by 2.8x for hard samples. DAT is joint work with Harvard graduate students Brad McDanel and Surat Teerapittayanon.
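To illustrate the general idea of early exit, here is a minimal Python sketch. It is not the DAT method itself: the abstract does not specify DAT's exit criteria or how they are configured, so the entropy-based confidence test, the fixed threshold, and all names (`early_exit_predict`, `stages`, `exit_heads`, `thresholds`) are assumptions made for illustration. The toy "network" is two hand-written functions standing in for groups of layers.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; lower means a more confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def early_exit_predict(stages, exit_heads, thresholds, x):
    """Run stages in order and return (label, exit_index).

    After each stage, an exit head produces class logits. If the entropy of
    the resulting distribution is at or below that exit's threshold, the
    sample leaves the network there and later stages are skipped. The final
    stage always produces an answer.
    """
    h = x
    last = len(stages) - 1
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = stage(h)
        probs = softmax(head(h))
        if i == last or entropy(probs) <= thresholds[i]:
            label = max(range(len(probs)), key=probs.__getitem__)
            return label, i

# Hypothetical two-stage toy network (assumption, for illustration only).
stages = [lambda h: [2.0 * v for v in h], lambda h: [v + 1.0 for v in h]]
exit_heads = [lambda h: h, lambda h: h]   # identity heads: features are logits
thresholds = [0.3]                        # one threshold per non-final exit

easy = early_exit_predict(stages, exit_heads, thresholds, [5.0, 0.0])
hard = early_exit_predict(stages, exit_heads, thresholds, [0.1, 0.2])
print(easy, hard)
```

In this sketch the first input is classified confidently at the first exit, while the second is ambiguous there and passes through to the final stage. A method that adapts to the test set would presumably choose the thresholds from data rather than fix them by hand as done here.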
Bio: HT Kung is William H. Gates Professor of Computer Science and Electrical Engineering at Harvard University. He is interested in computer systems, networking, sensing and wireless communications, with a current focus on machine learning, high-performance computing and the Internet of Things. He received his Ph.D. from Carnegie Mellon University in 1974, and then taught there for 19 years before coming to Harvard in 1992. Professor Kung is well known for his pioneering work on systolic arrays in parallel processing and optimistic concurrency control in database systems. His academic honors include membership in the National Academy of Engineering and the ACM SIGOPS 2015 Hall of Fame Award (with John Robinson), which recognizes the most influential operating systems papers published at least ten years earlier.