Yung-Hsiang Lu is a professor at the Elmore Family School of Electrical and Computer Engineering at Purdue University. He is an IEEE Fellow (2021), an ACM Distinguished Scientist (2013), and an ACM Distinguished Speaker (2013). From 2015 to 2019, he was a co-founder and adviser of a technology startup that received SBIR-1 and SBIR-2 (Small Business Innovation Research) awards. From 2020 to 2022, he was the director of the John Martinson Engineering Entrepreneurial Center at Purdue University. His research topics include efficient computer vision for embedded systems, cloud computing, and mobile computing. He leads a research project that analyzes real-time video streams from thousands of network cameras. He has been the lead organizer of the IEEE Low-Power Computer Vision Challenge since 2015. He has published two books: Intermediate C Programming (ISBN 9781498711630) and Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence (editor, ISBN 9780367744700).
Title: Efficient Computer Vision for Embedded Systems
Since deep learning became popular a decade ago, computer vision has been adopted in a wide range of applications. Many of these applications must run on embedded systems with limited resources (energy, time, memory capacity, etc.). This speech will survey methods designed to improve the efficiency of computer vision, including quantization, neural architecture search, and trade-offs between accuracy and speed. It will also introduce a new architecture called the modular neural network. This architecture breaks a deep neural network into multiple shallower networks and can significantly reduce model size and execution time. A modular neural network is a tree-like structure that progressively analyzes different features in images and divides images into groups based on visual similarity. Modular neural networks can be used for image classification, object counting, and re-identification. Finally, the speech will explain how to use contextual information to reduce the computation needed for convolution. Context suggests where objects may appear: for example, a vehicle may appear on a road but not in the sky. This contextual information can reduce the search space in object detection and shorten execution time.
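The idea that context shrinks the search space can be illustrated with a toy sliding-window search. This is a minimal sketch, not the method presented in the talk: it assumes a hypothetical binary context mask (`True` where a vehicle may appear, such as road pixels) and simply skips candidate windows whose centers fall outside that mask, so an expensive detector (not shown) would run on fewer windows. The names `sliding_windows`, `context_filter`, and `road_mask` are illustrative only.

```python
from typing import List, Tuple

# A window is (x, y, width, height) in pixel coordinates.
Box = Tuple[int, int, int, int]


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> List[Box]:
    """Enumerate square candidate windows of size `win` on a regular grid."""
    return [
        (x, y, win, win)
        for y in range(0, img_h - win + 1, stride)
        for x in range(0, img_w - win + 1, stride)
    ]


def context_filter(windows: List[Box], allowed: List[List[bool]]) -> List[Box]:
    """Keep only windows whose center lies where the object may appear.

    `allowed[row][col]` is a hypothetical per-pixel context mask,
    e.g. True on road pixels and False on sky pixels.
    """
    kept = []
    for x, y, w, h in windows:
        cx, cy = x + w // 2, y + h // 2
        if allowed[cy][cx]:
            kept.append((x, y, w, h))
    return kept


# Toy 64x64 scene: assume the top half is sky and the bottom half is road.
IMG = 64
road_mask = [[row >= IMG // 2 for _ in range(IMG)] for row in range(IMG)]

windows = sliding_windows(IMG, IMG, win=16, stride=8)
candidates = context_filter(windows, road_mask)
# An (omitted) vehicle detector would now run on `candidates` instead of `windows`.
print(len(windows), len(candidates))  # 49 28
```

In this toy scene the mask removes 21 of 49 windows before any convolution runs; how much a real system saves depends on how much of the scene the context rules out.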
Title: World-Wide Camera Networks
More than 80% of consumer Internet traffic is video, and most of it is recorded video. Meanwhile, many organizations (such as national parks, vacation resorts, and departments of transportation) provide real-time visual data (images or videos). These real-time sources allow Internet users to observe events remotely. This speech explains how to discover real-time visual data on the Internet. The discovery process uses a crawler to visit many web pages. The information on these web pages is analyzed to identify candidate sources of real-time data. Each candidate is downloaded multiple times over an extended period, and changes between downloads are detected to determine whether the source is likely to provide real-time data. The data can be useful during emergencies; for example, viewers may check whether a street is flooded and impassable. The data can also be used to observe long-term trends, such as how people reacted to movement restrictions during the COVID-19 pandemic.
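The "download repeatedly and detect changes" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the discovery system described in the speech: the helper names, the `count` and `interval_s` values, and the `min_change_ratio` threshold are all hypothetical, and change is detected by comparing SHA-256 digests of the raw bytes from consecutive downloads.

```python
import hashlib
import time
import urllib.request
from typing import List


def fetch_snapshots(url: str, count: int = 5, interval_s: float = 60.0) -> List[bytes]:
    """Download the same URL `count` times, waiting `interval_s` seconds between requests."""
    snapshots = []
    for i in range(count):
        with urllib.request.urlopen(url, timeout=10) as resp:
            snapshots.append(resp.read())
        if i < count - 1:
            time.sleep(interval_s)
    return snapshots


def is_likely_live(snapshots: List[bytes], min_change_ratio: float = 0.5) -> bool:
    """Return True if the content changed between enough consecutive downloads.

    `min_change_ratio` is a hypothetical threshold: the fraction of
    consecutive pairs whose bytes must differ.
    """
    if len(snapshots) < 2:
        return False  # a single download cannot reveal change
    digests = [hashlib.sha256(s).hexdigest() for s in snapshots]
    changes = sum(1 for a, b in zip(digests, digests[1:]) if a != b)
    return changes / (len(digests) - 1) >= min_change_ratio


# Offline usage with synthetic data: a static image vs. a changing one.
static = [b"same-jpeg-bytes"] * 4
changing = [b"frame-%d" % i for i in range(4)]
print(is_likely_live(static), is_likely_live(changing))  # False True
```

Byte-level hashing is a deliberate simplification: a server that re-encodes an unchanged scene would look live, so a more robust check would likely compare decoded pixels with a tolerance.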