Speaker - Mubarak Shah (Center for Research in Computer Vision, University of Central Florida, USA)
http://crcv.ucf.edu/people/faculty/shah.php
Title - SPATIOTEMPORAL GRAPHS FOR OBJECT SEGMENTATION AND HUMAN POSE ESTIMATION IN VIDEOS.
Abstract -
Images and videos can be naturally represented by graphs: spatial graphs for images and
spatiotemporal graphs for videos. However, different applications usually call for different
graph formulations, and the algorithms for each formulation differ in complexity. Formulating
the problem wisely, so that it admits an accurate and efficient solution, is therefore one of the
core issues in computer vision research. In this talk, I will explore three problems in this domain
and demonstrate how each of them can be formulated in terms of spatiotemporal graphs to obtain
accurate and efficient solutions.
The first problem is video object segmentation, where the goal is to segment the primary
moving objects in a video. In our framework, we use object proposals, which are object-like regions
obtained from low-level visual cues. Each object proposal has an associated objectness score, which
indicates how likely it is that the proposal corresponds to an object. The problem is formulated as a
directed acyclic graph in which nodes represent the object proposals and edges represent the
spatiotemporal relationships between nodes. A dynamic programming solution is employed to select
one object proposal from each video frame while ensuring that the selected proposals are consistent
across frames. Gaussian mixture models (GMMs) are used to model the background and foreground,
and Markov Random Fields (MRFs) are employed to smooth the pixel-level segmentation.
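The proposal-selection step can be illustrated with a Viterbi-style dynamic program over a layered DAG that has one layer of proposals per frame. This is a minimal sketch under stated assumptions, not the method presented in the talk. It assumes proposals are axis-aligned boxes and uses bounding-box IoU as a hypothetical stand-in for the spatiotemporal edge weight. That weight is combined with the objectness scores through a hypothetical trade-off parameter `lam`. The GMM and MRF refinement stages are not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical proposal representation: axis-aligned box (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (stand-in for an edge weight)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_proposals(
    boxes: Sequence[Sequence[Box]],
    scores: Sequence[Sequence[float]],
    lam: float = 1.0,
) -> List[int]:
    """Pick one proposal index per frame, maximizing the sum of objectness
    scores plus lam * IoU between proposals chosen in consecutive frames.
    Assumes every frame has at least one proposal."""
    best = [list(scores[0])]  # best[t][i]: best path score ending at proposal i of frame t
    back: List[List[int]] = [[-1] * len(scores[0])]  # back[t][i]: chosen predecessor in frame t-1
    for t in range(1, len(boxes)):
        cur, ptr = [], []
        for i, box in enumerate(boxes[t]):
            # Score of reaching proposal i from each proposal j of the previous frame.
            cands = [best[t - 1][j] + lam * iou(boxes[t - 1][j], box)
                     for j in range(len(boxes[t - 1]))]
            j_best = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[j_best] + scores[t][i])
            ptr.append(j_best)
        best.append(cur)
        back.append(ptr)
    # Backtrack from the best-scoring proposal in the last frame.
    i = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [i]
    for t in range(len(boxes) - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    return path[::-1]
```

With `lam=0` the program reduces to picking the highest-scoring proposal in each frame independently. Increasing `lam` trades per-frame objectness for temporal consistency, which is the role the spatiotemporal edges play in the graph.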
The formulation above considers object segmentation in only a single video. Next, we consider
multiple videos and model the video co-segmentation problem as a spatiotemporal graph. The goal
here is to simultaneously segment the moving objects in multiple videos and assign common objects
the same labels. The problem is formulated as a regulated maximum clique problem over object
proposals. The object proposals are tracked across adjacent frames to generate a pool of candidate
tracklets. An undirected graph is then built, with nodes corresponding to the tracklets from all the
videos and edges representing the similarities between tracklets. A modified Bron-Kerbosch algorithm
is applied to this graph to select the prominent objects contained in the videos, thereby relating
the segmentation of each object across the different videos.
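For readers unfamiliar with Bron-Kerbosch, the sketch below shows the standard algorithm with pivoting, which enumerates all maximal cliques of an undirected graph. It then selects the clique with the largest total node weight. The talk's modified algorithm and its regulated objective are not specified here. The weighted selection rule and the per-tracklet `weight` dictionary are therefore only hypothetical placeholders. Tracklet similarities are assumed to have already been thresholded into symmetric graph edges.

```python
from typing import Dict, FrozenSet, Iterator, Set

Graph = Dict[str, Set[str]]  # tracklet id -> ids of similar tracklets (symmetric)


def bron_kerbosch(r: Set[str], p: Set[str], x: Set[str], adj: Graph) -> Iterator[FrozenSet[str]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting).
    r: current clique, p: candidates that can extend it, x: already-explored vertices."""
    if not p and not x:
        yield frozenset(r)  # r cannot be extended: it is a maximal clique
        return
    # Choose the pivot adjacent to the most candidates to prune redundant branches.
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        yield from bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj)
        p = p - {v}  # v has been fully explored ...
        x = x | {v}  # ... so exclude it from later cliques


def heaviest_clique(adj: Graph, weight: Dict[str, float]) -> FrozenSet[str]:
    """Hypothetical selection rule: the maximal clique with the largest total weight."""
    cliques = bron_kerbosch(set(), set(adj), set(), adj)
    return max(cliques, key=lambda c: sum(weight[v] for v in c))
```

A weighted criterion can prefer a smaller clique of strong tracklets over a larger clique of weak ones. This is one simple way a "regulated" objective might differ from plain maximum-cardinality clique search.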
In online and surveillance videos, the most important object class is the human. In contrast to
generic video object segmentation and co-segmentation, specific knowledge about humans, encoded
by a pose (i.e., the human skeleton), can be employed to help segment and track people in videos.
We formulate human pose estimation in videos using a spatiotemporal graph, in which the nodes
represent different body parts in the video frames and the edges represent the spatiotemporal
relationships between body parts in adjacent frames. The graph is carefully designed to admit an
exact and efficient solution. The main goal of the new formulation is to remove the simple cycles
present in traditional graph-based formulations. Dynamic programming is employed at different
stages of the method to select the best tracklets and human pose configurations.
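Removing cycles matters because max-sum inference on a tree-structured graph can be solved exactly by dynamic programming. The algorithm passes messages from the leaves up to the root and then backtracks. The sketch below shows this generic technique for a single frame's body-part tree, in the spirit of pictorial structures. It is not the talk's spatiotemporal formulation. The part names, the candidate-state scores (`unary`), and the `pairwise` compatibility function are all hypothetical inputs.

```python
from typing import Callable, Dict, List

# pairwise(parent, parent_state, child, child_state) -> compatibility score (hypothetical)
Pairwise = Callable[[str, int, str, int], float]


def tree_max_sum(
    children: Dict[str, List[str]],
    root: str,
    unary: Dict[str, List[float]],
    pairwise: Pairwise,
) -> Dict[str, int]:
    """Exact MAP assignment of one candidate state per part on a tree."""
    # best_child_state[c][k]: best state of child c when its parent is in state k
    best_child_state: Dict[str, List[int]] = {}

    def subtree(node: str) -> List[float]:
        """For each state of `node`, the best total score of its subtree."""
        scores = list(unary[node])
        for c in children.get(node, []):
            child_scores = subtree(c)
            argmaxes = []
            for k in range(len(scores)):
                vals = [child_scores[j] + pairwise(node, k, c, j)
                        for j in range(len(child_scores))]
                j_best = max(range(len(vals)), key=vals.__getitem__)
                scores[k] += vals[j_best]
                argmaxes.append(j_best)
            best_child_state[c] = argmaxes
        return scores

    root_scores = subtree(root)
    # Backtrack from the root's best state down to the leaves.
    assignment = {root: max(range(len(root_scores)), key=root_scores.__getitem__)}
    stack = [root]
    while stack:
        node = stack.pop()
        for c in children.get(node, []):
            assignment[c] = best_child_state[c][assignment[node]]
            stack.append(c)
    return assignment
```

Each edge is processed once with a cost proportional to the product of the two parts' state counts, so inference is exact and polynomial. Once cycles are present, the same message passing is no longer guaranteed to be exact.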
Short CV -
Dr. Mubarak Shah, Trustee Chair Professor of Computer Science, is the founding director of the Center for Research in Computer Vision at UCF. His research interests include video surveillance, visual tracking, human activity recognition, visual analysis of crowded scenes, video registration, and UAV video analysis. Dr. Shah is a fellow of IEEE, AAAS, IAPR, and SPIE. In 2006, he received the Pegasus Professor award, the highest award at UCF. He is an ACM Distinguished Speaker. He was an IEEE Distinguished Visitor Speaker from 1997 to 2000 and received the IEEE Outstanding Engineering Educator Award in 1997. He received the Harris Corporation's Engineering Achievement Award in 1999; the TOKTEN awards from UNDP in 1995, 1997, and 2000; the Teaching Incentive Program award in 1995 and 2003; the Research Incentive Award in 2003 and 2009; the Millionaires' Club awards in 2005 and 2006; and the University Distinguished Researcher award in 2007. He also received an honorable mention for the ICCV 2005 "Where Am I?" Challenge Problem and was nominated for the best paper award at the ACM Multimedia Conference in 2005. He is an editor of the international book series on Video Computing, editor-in-chief of the Machine Vision and Applications journal, and an associate editor of the ACM Computing Surveys journal. He was an associate editor of the IEEE Transactions on PAMI and a guest editor of the special issue of the International Journal of Computer Vision on Video Computing.