[back to Tsung-Yu Lin]


For visual surveillance tracking, to separate multiple targets when they are mutually occluded in the view of a single camera is extremely challenging. Now that multiple cameras can provide different perspectives, how to localize the targets on the 3D ground plane in a network of cameras becomes an important issue. Previous methods usually solve the localization problem iteratively based on background subtraction results, yet high-level image information is neglected. In order to fully exploit the image information and thus to gain superior localization precision, we suggest incorporating human detection into surveillance system. We develop a novel formulation combining human detection and background subtraction result to localize targets on the ground plane and solve it by convex optimization.

system overview

Figure The overview of our method. The first step is to train a human detector and then create a map PHUMc. Secondly, foreground regions are used in generating the map PBSc. Finally, these two kinds of information are integrated into an objective function of true occupancy map, and the solution is obtained by convex optimization.


We evaluate our method on three public datasets, namely, 6p sequence, Terrace sequence, and Passageway sequence. Each sequence contains four views and ground truth map. First, we test our method on Terrace sequence and show that fusion the maps generate by human detection and background subtraction can improve the localization result over using a single map respectively. We also show the results on other two datasets.


Table 1. Compare our fusion method on Terrace sequence to using human detection and background subtraction respectively


Table 2. Evaluation of our method on 6p sequence and Passageway sequence

Some visualization results. The boxes are rendered by back-projecting cubes, which height are 1.75m, located on 3D ground plane.

localized boxes


Tung-Ying Lee, Tsung-Yu Lin, Szu-Hao Huang, Shang-Hong Lai, Shang-Chih Hung, People localization in a camera network combining background subtraction and scene-aware human detection, Proceedings of the 17th international conference on Advances in multimedia modeling(MMM), 2011, [pdf]