TRIDIMENSIONAL PROBABILISTIC TRACKING FOR SPORTS

Fábio A. S. Dias∗, Neucimar J. Leite, Siome K. Goldenstein, Ricardo M. L. de Barros
University of Campinas (UNICAMP), P.O. 6176, 13084-971, Campinas, SP, Brazil

∗ Thanks to FAPESP for funding, project 06/59526-8.

ABSTRACT
In this work we present a simple but powerful method for multi-camera people tracking, aimed at tracking in collective sports. We use tridimensional information to perform the tracking, a concept not widely explored in this context, and use the video images only to reconstruct the tridimensional information of the current scene. To deal with falsely reconstructed objects, we consider a generalization of the concept of visual rhythm, which transforms the tracking problem into a segmentation problem that is solved by a simple graph-based approach.

Index Terms: Sport Tracking, Tridimensional reconstruction, Probabilistic object identification, Generalized Visual Rhythm, Trajectory definition.

1. INTRODUCTION
This work introduces a new and simple method for people tracking in sports, using the tridimensional information of the scene as the base data for tracking. This approach is not widely explored in general people tracking and is even less explored in sports tracking. In the sports context, however, it can reveal its full potential for dealing with occlusions, fast motion and unconstrained movement such as jumping.

The proposed method has few requirements: a reasonable frame rate, color images, and synchronized cameras with good coverage. With a good frame rate we can assume smooth transitions; color images provide more information for object identification; and camera coverage is needed to provide different points of view of the same object, which improves the reconstruction result. Each part of the field of action should be covered by at least two cameras.

The main contribution of this work is the generalization of the visual rhythm concept and its use to reduce the complexity of the tracking problem, transforming it into a segmentation problem that makes the spatio-temporal continuity explicit.

This paper is organized as follows: section 2 places this work among related works in the considered context; section 3 introduces a probabilistic object detection method based on Bayes' Law; section 4 introduces a probabilistic reconstruction method that uses the computed object probabilities; section 5 introduces the generalization of the visual rhythm concept and its application to the tridimensional video; section 6 describes the completion of small gaps using generalized multiscale directional information; and section 7 introduces the method used to estimate each trajectory. In section 8 we present some experimental results, and in section 9 we discuss the results and propose some future work.

2. RELATED WORK
Table 1 summarizes the related works. The Sport column gives the sport considered, 3D indicates whether the method works on tridimensional information, Stat.C. indicates the need for static cameras, Sup.C. indicates the need for overhead cameras, and B.M. indicates the use of a background model.

#          Sport
[1]        Handball
[2]        Tennis
[3]        Squash
[4]        Soccer
[5]        Soccer
[6]        Handball
[7]        Soccer
[8]        Soccer
[9]        Soccer
[10]       Soccer
[11]       Soccer
[12]       Soccer
[13]       People
[14]       People
This work  Handball

Marks per column (the row of each mark is not recoverable here):
3D:      X X X
Stat.C.: X X X X X X X X X X X X X
Sup.C.:  X X X -
B.M.:    X X X X X X X X X X

Table 1. Relevant related works

This table also includes related work in general people tracking because of its similarity to the proposed work. In [13], dynamic programming is used to estimate the trajectories and the moving subjects do not leave the floor. In [14], a very similar approach is used for the reconstruction step, but aimed at people counting.

3. OBJECT DETECTION
The first step of the proposed method is object detection. Assuming static cameras and a static background, we build a statistical background model, modelling the background as pixel-wise Gaussian distributions and the foreground as a histogram in RGB color space. To improve the classification results and reduce processing time, this step uses only a small subset of the video frames, equally spaced in time, to estimate these distributions. The method then identifies pairs of frames with maximum difference, computed as follows:

$$\forall S_i \in S, \quad \max_{S_j \in S} \sum |S_i - S_j| \qquad (1)$$

where $S$ represents the subset of frames used to compute the model. The frames of each pair are subtracted, and all pixels with a difference smaller than or equal to a threshold are used to model the background. The remaining pixels are used to compute a tridimensional color histogram of the foreground distribution. Additionally, more frames can be used for each comparison frame, which improves the results for videos with fast motion.

Using the previously computed foreground and background models, we perform the object detection based on Bayes' Law. We define the following events for each pixel:
• $C_i$: the pixel has chromatic information of class $i$. These classes are defined by the histogram, with each bin defining a different class.
• $B$: the pixel belongs to the background.
• $F$: the pixel belongs to the foreground.

With these events, we formulate the following equation, based on Bayes' Law, to estimate the probability that a given pixel belongs to the foreground, given its chromatic information:

$$P(F \mid C_i) = 1 - \frac{P(C_i \mid B)\,P(B)}{P(C_i \mid B)\,P(B) + P(C_i \mid \neg B)\,(1 - P(B))} \qquad (2)$$

The value of $P(C_i \mid B)$ is obtained from the corresponding Gaussian and $P(C_i \mid \neg B)$ from the foreground histogram.

Despite its simplicity, this method has two major advantages over traditional background subtraction. First, the result is a probabilistic measure of the existence of a non-background object, which in this work is used as is for further exploration of the scene. Second, it can easily overcome small differences between foreground and background colors, the typical failure scenario of the traditional approach, occasionally leading to a result with almost no false negatives and some false positives, which is a desirable behavior in our context.
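As an illustration of equation (2), the following minimal Python sketch computes the foreground probability of one pixel from the two likelihoods and the background prior. The function name `foreground_probability`, the explicit prior argument `p_b` and the fallback used when both likelihoods are zero are assumptions made for this example; the text does not specify how P(B) is chosen or how that degenerate case is handled.

```python
def foreground_probability(p_c_given_b, p_c_given_not_b, p_b):
    """Equation (2): P(F | C_i) = 1 - P(B | C_i).

    p_c_given_b     -- P(C_i | B), taken from the pixel's background Gaussian
    p_c_given_not_b -- P(C_i | not B), taken from the foreground histogram
    p_b             -- prior P(B); its value is an assumption of this sketch
    """
    numerator = p_c_given_b * p_b
    evidence = numerator + p_c_given_not_b * (1.0 - p_b)
    if evidence == 0.0:
        # Assumption: if neither model explains the color, fall back to the prior.
        return 1.0 - p_b
    return 1.0 - numerator / evidence
```

With equal likelihoods under both models and a prior of 0.5 the sketch returns 0.5, and a color class that never occurs in the background model yields a foreground probability of 1.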

The threshold used in this step represents the expected color fluctuation of the scene. Its influence on the performance of the method can be reduced by using more frames for the estimation.

4. TRIDIMENSIONAL RECONSTRUCTION
In order to use tridimensional information as the basis for people tracking, we first need to reconstruct the current scene. We use a reconstruction algorithm inspired by space carving. In our method, the probability of existence of a moving object in a given voxel $V_i$ is estimated by:

$$P(V_i) = \prod_{c=1}^{N_c} \frac{1}{N_p} \sum_{x_i \in proj(V_i, c)} P(x_i^c) \qquad (3)$$

where $N_c$ is the number of cameras viewing the voxel $V_i$, $proj(V_i, c)$ is the set of all pixels belonging to the projection of the voxel $V_i$ in camera $c$, $N_p$ is the number of pixels in this projection, and $P(x_i^c)$ is the probability that the pixel $x_i$ of camera $c$ belongs to a foreground object, as calculated in the object detection step. Put simply, the probability that a given voxel is occupied is a function of the foreground probabilities over its projection in each image. We average the probabilities over each projected region and multiply the averages across cameras, which represents a probabilistic AND operation. To reduce small noise, tridimensional morphological operations are applied to the scene.

This reconstruction technique leads to a major problem, shared with visual hull reconstruction: the generation of false volumes, often called phantoms, which result from all possible combinations of the identified blobs. We deal with this problem during the trajectory estimation step. However, we strongly recommend using as many cameras as possible to reduce the number of generated phantoms.

5. INFORMATION SAMPLING BY GENERALIZED VISUAL RHYTHM
With the reconstructed tridimensional information, the final step of this work is to estimate the trajectories of all moving objects in the current scene. We consider a generalization of the visual rhythm concept, widely used for cut detection in video [15, 16, 17]. In the classic approach, each frame of a given video is sampled to produce unidimensional information, which is arranged together with all other samples into bidimensional information. The resulting image contains the relevant information of the video and often makes the relevant features explicit.

In this work, we sample each tridimensional frame to a bidimensional sample and re-arrange these samples to obtain tridimensional information. The sampling function is the maximum operation in the Z direction. To re-arrange the samples, we use the stacking operation, replacing the Z axis with the time axis. This simple operation reinforces one of the most reliable features of the considered videos: space-time continuity. At the considered frame rate, the trajectories of the moving objects correspond to connected volumes in the final volume, which transforms the tracking problem into a segmentation problem with the advantage of well-defined shapes to segment.

6. SMALL GAPS COMPLETION BY GENERALIZED MULTISCALE DIRECTIONAL INFORMATION
To improve the robustness of the method, we reconnect small gaps in the data by generalizing to a tridimensional domain the approach suggested in [18] for fingerprint images, motivated by the abstract similarity between fingerprint images and the re-arranged sampled data considered here.

The tridimensional generalization of [18] is straightforward: we use two angles to define the orientation vector. For each considered direction, we also define two perpendicular vectors, chosen arbitrarily. The directional information for each voxel is the direction with maximum information. In this work, we define information as twice the sum of the sampled data along the considered direction minus the sum along the perpendicular directions, computed in a window centered on the current voxel, with a size based on the manual initialization. The multiscale approach is also used, removing small noise in the orientation field. Instead of a watershed operation, as suggested in the original work, we simply apply a morphological dilation, using the corresponding directions as structuring elements.

7. TRAJECTORY ESTIMATION
With the re-arranged sampled data from the reconstruction step, we need to identify each connected volume in order to define the respective trajectories.
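The data this step starts from is produced by equation (3) followed by the sampling of section 5. A minimal Python sketch of both operations is given below, using nested lists rather than an array library. The function names, the `volume[z][y][x]` indexing convention and the value returned for a voxel seen by no camera are assumptions for illustration, and the morphological filtering mentioned in section 4 is omitted.

```python
from math import prod  # available from Python 3.8


def voxel_probability(projections):
    """Equation (3): product over cameras of the mean foreground probability.

    `projections` holds one non-empty list per camera viewing the voxel; each
    list contains the probabilities P(x_i^c) of the pixels in the projection
    of the voxel onto that camera.
    """
    if not projections:
        # Assumption: a voxel seen by no camera is treated as empty.
        return 0.0
    return prod(sum(pixels) / len(pixels) for pixels in projections)


def sample_frame(volume):
    """Section 5 sampling: maximum along Z of a volume indexed volume[z][y][x]."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(depth)) for x in range(width)]
            for y in range(height)]


def visual_rhythm(volumes):
    """Stack the 2D samples so that time replaces Z: result[t][y][x]."""
    return [sample_frame(volume) for volume in volumes]
```

Because the per-camera averages are multiplied, a single camera that sees only background in the projection drives the voxel probability to zero, which matches the AND-like behavior described in section 4.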
Ideally, the only connected components are the real moving objects, but due to non-ideal camera positioning or illumination this assumption may fail. To increase noise robustness, we perform a manual initialization of the method, which provides a reliable count of the subjects in the scene. We then binarize the data; however, due to noise, we cannot simply apply a threshold. Instead, we compute the gradient of the tridimensional data, followed by a morphological closing with a structuring element whose size is inferred from the initial frame. This operation aims to capture the regions with higher values, even in the presence of constant noise. The morphological operation closes the tridimensional solid and filters some noise.

In this binarized data, we identify connected components. If a selected volume goes from the beginning to the end of the volume along the temporal Z axis, it must contain the trajectory of at least one object. Furthermore, due to the manual initialization, the number of objects is equal to the number of connected areas in the first frame.

To identify the corresponding trajectories, we label each connected component in each temporal slice and build a directed graph, connecting a label X to a label Y if they share pixel locations in temporally consecutive frames. Next, we process this connectivity graph as follows:

1. If a label does not point to another label and is not at the topmost level, it should be removed.
2. If a label is not pointed to by another label and is not at the bottom-most level, it should be removed.
3. If a label Y is pointed to only by a label X and X points only to Y, Y should be equal to X.
4. If a label X points only to a label Y, other labels which also point to Y should no longer point to Y.
5. If a label is pointed to by many labels, it must be partitioned.
6. If a partitioned label is much smaller than the known true size of the objects from the manual initialization, it should be removed.

Rules 1 and 2 eliminate phantoms that abruptly appear and disappear, enforcing the spatio-temporal constraint. Rule 3 makes the equivalence between labels explicit, merging labels which imply each other. Rule 4 deals with the case of only one possible path, removing that option from the other possible paths of different labels. Rules 5 and 6 enforce the spatio-temporal continuity by breaking components wrongly connected in the reconstruction step, while removing small labels to cover the situation of phantoms smoothly appearing and disappearing along the trajectory of real subjects.

To process the information, we first split the labels using rules 4, 5 and 6, then remove the unconnected labels using rules 1 and 2, and then merge the equivalent labels using rule 3. All operations are repeated until convergence.

8. EXPERIMENTAL RESULTS
In the first video considered, we used four Basler A602fc cameras with an external hardware trigger at 35 fps and three moving persons, as shown in figures 1, 2, 3 and 4, which include the corresponding object identification results. We used a threshold of 10, an RGB histogram with 128x128x128 bins and 100 frames to model the objects, as described in section 3. The sequence has 1000 frames and each voxel measures 10 cm x 10 cm x 10 cm.

Fig. 1. Frame #1 from camera 1: (a) original image; (b) object identification result.
Fig. 2. Frame #1 from camera 2: (a) original image; (b) object identification result.
Fig. 3. Frame #1 from camera 3: (a) original image; (b) object identification result.
Fig. 4. Frame #1 from camera 4: (a) original image; (b) object identification result.

The object identification method also detects cast shadows, moving background objects and some illumination noise. However, the reconstruction process eliminates a great part of this noise. Therefore, this step can tolerate false positives, but methods that produce false negatives should be avoided because they remove existing volumes from the reconstructed scene. Figure 5 shows the result of the manual initialization and the reconstruction of the subsequent frame.

Fig. 5. Reconstruction results: (a) reconstructed scene of the manual initialization; (b) reconstructed scene of the subsequent frame.

We proceed with the information sampling, considering batches of 200 frames; the first batch is shown in figure 6(a). The corresponding result of the trajectory estimation step is shown in figure 6(b). The multiscale directional filtering was not used in this example.

Fig. 6. Trajectory estimation results: (a) original sampled information; (b) estimated trajectories.

9. CONCLUSION AND FUTURE WORK
In this paper we presented a simple approach to tracking for sports that exploits the tridimensional nature of the scene. Despite its originality and potential, more work is needed to achieve the robustness necessary for practical applications, such as:
• Removing the binarization step and dealing with the raw probabilistic data, exploring the sampling operation to reduce the complexity of the problem and avoiding approximations such as those in [13].
• Exploring the use of the multiscale directional information to define a tridimensional skeleton of the considered trajectories.
• Additional processing of the resulting trajectories, removing noise and smoothing the resulting solids.
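Returning to the connectivity graph of section 7, rules 1 and 2 can be sketched in Python as shown below. The sketch assumes that the bottom-most level is the first temporal slice, that edges point forward in time, and that each slice is given as a dictionary from a label to its set of pixel locations; these choices and the names `build_graph` and `prune` are hypothetical, and rules 3 to 6 (merging and partitioning) are not covered.

```python
def build_graph(slices):
    """Build the directed label graph.

    slices[t] maps a label to the set of (x, y) pixel locations it covers in
    temporal slice t. An edge ((t, a), (t + 1, b)) is created when labels a
    and b share at least one pixel location in consecutive slices.
    """
    edges = set()
    for t in range(len(slices) - 1):
        for a, pixels_a in slices[t].items():
            for b, pixels_b in slices[t + 1].items():
                if pixels_a & pixels_b:
                    edges.add(((t, a), (t + 1, b)))
    return edges


def prune(slices, edges):
    """Apply rules 1 and 2 repeatedly until no label is removed."""
    last = len(slices) - 1
    nodes = {(t, label) for t, labels in enumerate(slices) for label in labels}
    edges = set(edges)
    while True:
        has_successor = {src for src, _ in edges}
        has_predecessor = {dst for _, dst in edges}
        dead = {
            n for n in nodes
            if (n[0] != last and n not in has_successor)   # rule 1
            or (n[0] != 0 and n not in has_predecessor)    # rule 2
        }
        if not dead:
            return nodes, edges
        nodes -= dead
        edges = {(s, d) for s, d in edges if s not in dead and d not in dead}
```

The loop is needed because removing one label can leave a neighbor without a predecessor or successor, so a phantom that appears mid-sequence is removed slice by slice.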

10. REFERENCES
[1] J. Pers and S. Kovacic, “Computer vision system for tracking players in sports games,” in Proceedings of the First Int’l Workshop on Image and Signal Processing and Analysis, 2000, pp. 81–86.
[2] G. Pingali, A. Opalach, Y. Jean, and I. Carlbom, “Visualization of sports using motion trajectories: providing insights into performance, style, and strategy,” in Visualization, 2001. VIS ’01. Proceedings, October 2001, pp. 75–83, IEEE.
[3] J. Pers, G. Vuckovic, S. Kovacic, and B. Dezman, “A low-cost real-time tracker of live sport events,” in 2nd International Symposium on Image and Signal Processing and Analysis, June 2001, pp. 362–365.
[4] E.L. Andrade, E. Khan, J.C. Woods, and M. Ghanbari, “Player identification in interactive sport scenes using region space analysis prior information and number recognition,” in Visual Information Engineering, 2003. VIE 2003. International Conference on, July 2003, pp. 57–60.
[5] Bruno Müller and Ricardo de Oliveira Anido, “Distributed real-time soccer tracking,” in VSSN ’04: Proceedings of the ACM 2nd international workshop on Video surveillance & sensor networks, New York, NY, USA, 2004, pp. 97–103, ACM Press.
[6] M. Kristan, J. Pers, M. Perse, S. Kovacic, and M. Bon, “Multiple interacting targets tracking with application to team sports,” in Image and Signal Processing and Analysis, 2005. ISPA 2005. Proceedings of the 4th International Symposium on, September 2005, pp. 322–327.
[7] J.B. Hayet, T. Mathes, J. Czyz, J. Piater, J. Verly, and B. Macq, “A modular multi-camera framework for team sports tracking,” in Advanced Video and Signal Based Surveillance. AVSS, September 2005, pp. 493–498, IEEE.
[8] Wayne Chelliah Naidoo and Jules Raymond Tapamo, “Soccer video analysis by ball, player and referee tracking,” in SAICSIT ’06: Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries, Republic of South Africa, 2006, pp. 51–60, South African Institute for Computer Scientists and Information Technologists.
[9] Aye Pa Pa Mya, Thin Lai Lai Thein, and Myint Myint Sein, “Extracting the motion pattern of the players from a video stream of the football game,” in SICE-ICASE, 2006. International Joint Conference, October 2006, pp. 5624–5627.
[10] Pascual J. Figueroa, Neucimar J. Leite, and Ricardo M. L. Barros, “Tracking soccer players aiming their kinematical motion analysis,” Comput. Vis. Image Underst., vol. 101, no. 2, pp. 122–135, 2006.
[11] Guangyu Zhu, Qingming Huang, Changsheng Xu, Yong Rui, Shuqiang Jiang, Wen Gao, and Hongxun Yao, “Trajectory based event tactics analysis in broadcast sports video,” in MULTIMEDIA ’07: Proceedings of the 15th international conference on Multimedia, New York, NY, USA, 2007, pp. 58–67, ACM Press.
[12] Jacek Czyz, Branko Ristic, and Benoit Macq, “A particle filter for joint detection and tracking of color objects,” Image and Vision Computing, vol. 25, no. 8, pp. 1271–1281, August 2007.
[13] F. Fleuret, J. Berclaz, R. Lengagne, and P. Fua, “Multicamera people tracking with a probabilistic occupancy map,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 2, pp. 267–282, Feb. 2008.
[14] D.B. Yang, H.H. Gonzalez-Banos, and L.J. Guibas, “Counting people in crowds with a real-time network of simple image sensors,” Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pp. 122–129 vol. 1, Oct. 2003.
[15] Hyeokman Kim, Jinho Lee, Jae-Heon Yang, Sanghoon Sull, Woonkyung M. Kim, and S. Moon-Ho Song, “Visual rhythm and shot verification,” Multimedia Tools and Applications, vol. 15, no. 3, pp. 227–245, December 2001.
[16] Chong-Wah Ngo, “A robust dissolve detector by support vector machine,” in MULTIMEDIA ’03: Proceedings of the eleventh ACM international conference on Multimedia, New York, NY, USA, 2003, pp. 283–286, ACM Press.
[17] Francisco N. Bezerra and Neucimar J. Leite, “Using string matching to detect video transitions,” Pattern Analysis & Applications, vol. 10, no. 1, pp. 45–54, October 2006.
[18] M.D. Oliveira and N.J. Leite, “Reconnection of fingerprint ridges based on morphological operators and multiscale directional information,” Computer Graphics and Image Processing, 2004. Proceedings. 17th Brazilian Symposium on, pp. 122–129, Oct. 2004.
