Qiang Ji
2021-2023 Distinguished Visitor


Qiang Ji is a Professor with the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute (RPI). He received his Ph.D. degree in Electrical Engineering from the University of Washington. He was a program director at the National Science Foundation, where he managed NSF’s computer vision and machine learning programs. He also held teaching and research positions at the University of Illinois at Urbana-Champaign, Carnegie Mellon University, and the University of Nevada at Reno. His research interests are in computer vision, probabilistic machine learning, and their applications. He has published over 300 papers, received multiple awards for his work, served as an editor for multiple international journals, and organized numerous international conferences and workshops. He is a fellow of the IEEE and the IAPR.

Rensselaer Polytechnic Institute

Email: qiangjirpi@gmail.com

DVP term expires December 2023


Joint Top-down and Bottom-up Inference for Data Efficient and Generalizable Visual Learning

Substantial progress has been made in computer vision recently as a result of the latest developments in deep learning. Despite these developments, current data-driven visual learning methods are purely bottom-up, inefficient, and do not generalize well beyond their training data. In contrast, human vision performs joint top-down and bottom-up learning. It is hence much more data efficient and can readily generalize to different tasks. To emulate human vision, we propose to systematically identify the related prior knowledge and encode it in a prior model. Visual recognition can then be formulated by integrating the top-down projections from the prior model with the visual features learnt from a bottom-up data model. In this talk, I will discuss the proposed research, its motivations, the identification of related prior knowledge, and the specific methods for representing and encoding prior knowledge and for integrating it with image data for different computer vision tasks. I will also review recent related work from other groups.

Hierarchical Context Modeling for Video Event Recognition

Current video event recognition research remains largely target-centered. Target-centered event recognition faces great challenges due to large intra-class target variation, limited image resolution, and significant variation in background and illumination. To mitigate these challenges, we introduce a context-augmented video event recognition approach that augments conventional target-driven event recognition with contextual information of different types and at different levels. Specifically, we explicitly capture contexts at three levels: the feature level, the semantic level, and the prior level. At the feature level, we introduce two types of contextual features: appearance context features and interaction context features. The appearance context features capture the appearance of the contextual objects, and the interaction context features capture the interactions between the contextual objects and the target objects. At the semantic level, we propose a deep model based on the deep Boltzmann machine to learn event object representations and their interactions. At the prior level, we utilize two types of prior-level contexts: scene priming and dynamic cueing. Finally, we introduce a hierarchical context model that systematically integrates the contextual information at different levels. Through the hierarchical context model, contexts at different levels jointly contribute to event recognition.

We evaluate the proposed hierarchical context model for event recognition on benchmark surveillance video datasets, including the VIRAT 1.0, VIRAT 2.0, and UT-Interaction datasets. Results show that incorporating contexts at each level can improve event recognition performance, and that jointly integrating the three levels of contexts through the hierarchical context model achieves the greatest improvement. It outperforms state-of-the-art video event recognition methods.

User Affect Modeling, Recognition, and Assistance

User emotional states can seriously affect a user’s psychomotor and decision-making capabilities. The goal of this research is to develop a system that recognizes task-specific negative user affective states (e.g., fatigue and stress) and provides appropriate intervention to compensate for the performance decrement resulting from these negative states. The proposed system consists of two major components: multi-modality user state sensing, and user affect and assistance modeling.

For user state sensing, we develop a real-time, non-invasive system that provides user state measurements from sensors of different modalities. The sensory measurements include physical appearance (facial expression, eye movements, and head movements) extracted from remote video cameras, physiological measurements collected from an emotional mouse we developed, behavioral data from user interaction with the computer, and performance measures. For user affect and assistance modeling, we introduce a general unified decision-theoretic framework based on dynamic influence diagrams for simultaneously modeling user affect recognition and assistance. Using the framework, affective state recognition is achieved through active probabilistic inference from the available sensory data.

Specifically, we introduce an active sensing strategy that performs purposive and sufficing information integration in order to infer the user’s affective state in a timely and efficient manner. User assistance is automatically accomplished through a decision-making process that balances the benefits of keeping the user in productive affective states against the costs of performing user assistance. An information-theoretic approach is introduced to probabilistically determine the most appropriate user augmentation and its application timing in order to maximize the chance of returning the user to a productive affective state while minimizing the associated costs. Validation of the proposed framework via a simulation study demonstrates its capability for efficient user affect recognition as well as timely and appropriate user assistance. The affect recognition component of the prototype system is subsequently validated through a real-world study involving human subjects.



