2018 24th International Conference on Pattern Recognition (ICPR)
Beijing, China
Aug. 20, 2018 to Aug. 24, 2018
ISSN: 1051-4651
ISBN: 978-1-5386-3789-0
pp: 2050-2055
Li Yang , National ASIC System Engineering Research Center, Southeast University, Nanjing, China
Zhi Qi , National ASIC System Engineering Research Center, Southeast University, Nanjing, China
Zeheng Liu , National ASIC System Engineering Research Center, Southeast University, Nanjing, China
Shanshan Zhou , National ASIC System Engineering Research Center, Southeast University, Nanjing, China
Yang Zhang , National ASIC System Engineering Research Center, Southeast University, Nanjing, China
Hao Liu , National ASIC System Engineering Research Center, Southeast University, Nanjing, China
Jianhui Wu , National ASIC System Engineering Research Center, Southeast University, Nanjing, China
Longxing Shi , National ASIC System Engineering Research Center, Southeast University, Nanjing, China
ABSTRACT
Hand detection is an essential step in many tasks, including HCI applications. However, robustly detecting varied hands under cluttered backgrounds, motion blur, or changing illumination remains a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection, yet at a high computational expense. In this paper, we propose a light CNN network that uses a modified MobileNet as the feature extractor together with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps at various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. For accurate estimation of hand orientation by the CNN, we estimate the projections of two orthogonal vectors along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. Evaluated on the challenging Oxford hand dataset, our method reaches 83.2% average precision (AP) at 139 FPS on an Nvidia Titan X, outperforming previous methods in both accuracy and efficiency.
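The orientation recovery described in the abstract can be illustrated with a minimal geometric sketch. Assuming the network regresses the horizontal and vertical projections of the box's two orthogonal side vectors (the function name and input convention below are hypothetical, not from the paper), the side lengths and rotation angle follow directly:

```python
import math

def recover_box(v1, v2):
    """Recover a rotated box's side lengths and orientation from the
    axis projections of its two orthogonal side vectors.

    v1, v2: (dx, dy) projections of the box's side vectors along the
    horizontal and vertical axes.
    Returns (width, height, angle_in_degrees).
    """
    w = math.hypot(*v1)                 # length of the first side vector
    h = math.hypot(*v2)                 # length of the orthogonal side vector
    theta = math.degrees(math.atan2(v1[1], v1[0]))  # rotation of the box
    return w, h, theta

# Example: side vectors (3, 4) and (-4, 3) are orthogonal;
# they describe a 5x5 box rotated by atan2(4, 3) ~ 53.13 degrees.
w, h, theta = recover_box((3, 4), (-4, 3))
```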
INDEX TERMS
Feature extraction, Estimation, Computer architecture, Object detection, Convolution, Detectors, Skin
CITATION

L. Yang et al., "A Light CNN based Method for Hand Detection and Orientation Estimation," 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018, pp. 2050-2055.
doi:10.1109/ICPR.2018.8545493