Proceedings of the 7th Philippine Computing Science Congress (PCSC 2007), February 2007

Hypothesis Generation of Vision-Based Vehicle Tracking System

Joel P. Ilao, Mong Tuyet Trinh Dang, Alvin Paul D. Daquioag, Timothy Joseph R. Ramos, Franz Allan V. See
College of Computer Studies, De La Salle University
2401 Taft Ave., Malate, Manila, Philippines
(632)-526-4247

[email protected] [email protected] [email protected] [email protected] [email protected]

ABSTRACT

Vision-based Vehicle Tracking (ViVeT) is a vision-based system for tracking multiple vehicles on Philippine roadways. ViVeT mainly consists of a video acquisition and a tracking module. Prior to tracking, a stationary digital video camera records footage of a roadway, which is converted into an image sequence. The actual tracking then starts with the processing of the image sequence to hypothesize and verify all possible locations of vehicles. The system is designed to operate on roadways subject to varying levels of occlusion, shadow and luminance, taking into consideration localized factors relating to these identified problems. In this paper, the design and implementation of the Hypothesis Generation module of ViVeT is presented. The relevant features and the parameter setup are analyzed and discussed. A critical evaluation of performance in terms of accuracy is also outlined. Finally, the paper closes with a summary of findings and recommendations for future work.

Keywords: Vehicle Tracking, Hypothesis Generation, Object Detection, Background Subtraction, Motion Segmentation

1. INTRODUCTION

The use of computer vision has started to make its way into the field of traffic surveillance. Through video image processing, valuable traffic data such as vehicle count and density are generated automatically with minimal human intervention. To date, this technology has been implemented by many countries for purposes of traffic surveillance, management and control, breaking away from manual surveillance and the problems encountered with conventional in-pavement loops. In the Philippines, however, this technology has not yet been adopted; manual schemes and inductive loops remain the dominant methodology for traffic management. Moreover, despite the abundance of image processing algorithms, vehicle-tracking approaches are still

affected by common problems arising from environmental conditions [1]. The most common examples are occlusion, shadow, and varying luminance at different times of the day [5]. Attempts have been made to improve vehicle tracking algorithms in response to these problems [3][5][9]. Though algorithms are becoming more complex, no vehicle extraction approach thoroughly solves them [5]. There is also the possibility that the performance of these systems might degrade significantly when deployed in other countries due to rain, smog or other unforeseen localized conditions. In this regard, ViVeT is an effort to implement a multiple vehicle tracking system for use on Philippine roadways. Vision-based vehicle tracking systems can be seen as consisting of four modules, namely: video acquisition, hypothesis generation, hypothesis verification and vehicle tracking [2]. The hypothesis generation module detects all objects in the scene that are likely to be vehicles. This can be accomplished through knowledge-based, stereo-based or motion-based approaches. Knowledge-based methods exploit certain features of vehicles and roadways; for instance, the intensity of the road and the shadows of vehicles can be modeled to estimate the possible presence of vehicles. Stereo-based approaches, on the other hand, use two images from a pair of cameras, captured simultaneously to form a 3D model from which features of a vehicle, such as its silhouette, can be extracted. Lastly, in motion-based approaches, vehicles are detected through background subtraction or motion segmentation: pixels with a small associated velocity vector are regarded as part of the background [4]. Hypothesis Generation deals most directly with the dominant problems in vision-based vehicle tracking; the performance of the other modules (i.e., Hypothesis Verification and Vehicle Tracking) depends on how well the Hypothesis Generation module is designed and implemented.


2. ViVeT SYSTEM

Figure 1. System Setup

ViVeT is a vision-based system for tracking multiple vehicles on Philippine roadways. It is designed and implemented to detect vehicles and determine their locations throughout the image sequence. A digital camera is elevated and aimed at a street section facing upstream traffic, with a view depth of approximately thirty (30) meters (see Figure 1), and positioned at the level of an overpass structure or Metro Rail Transit (MRT) station. Images are captured at a resolution of 320x240 pixels in RGB format. Figure 2 shows the block diagram of the system. The main modules are Video Acquisition, Hypothesis Generation, Hypothesis Verification and Vehicle Tracking. Given a video file, the Video Acquisition module converts it into a stream of images ready for processing. The second module, Hypothesis Generation, uses the image sequence to identify all possible locations of vehicles. Subsequently, these hypotheses are verified by the third module, Hypothesis Verification, which distinguishes actual vehicles from non-vehicles. Finally, the Vehicle Tracking module updates the locations of detected vehicles throughout the image sequence.
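The four-module flow described above can be sketched as a simple pipeline of stages. Every function body here is a purely illustrative placeholder (whole-frame candidate boxes, a pass-through verifier), not ViVeT's actual implementation; only the module boundaries follow the text:

```python
import numpy as np

def video_acquisition(video):
    """Convert a 'video' (here, a stack of frames) into an image sequence."""
    return [frame for frame in video]

def hypothesis_generation(frames):
    """Propose candidate vehicle regions per frame (placeholder: one whole-frame box)."""
    h, w = frames[0].shape[:2]
    return [[(0, 0, w, h)] for _ in frames]

def hypothesis_verification(candidates):
    """Keep only candidates that pass a validity check (placeholder: keep all)."""
    return [[box for box in frame_boxes] for frame_boxes in candidates]

def vehicle_tracking(verified):
    """Assign persistent IDs to verified detections across frames."""
    tracks = {}
    for t, frame_boxes in enumerate(verified):
        for i, box in enumerate(frame_boxes):
            tracks.setdefault(i, []).append((t, box))
    return tracks

# Usage with synthetic 320x240 RGB frames, the resolution used by ViVeT.
video = np.zeros((5, 240, 320, 3), dtype=np.uint8)
frames = video_acquisition(video)
tracks = vehicle_tracking(hypothesis_verification(hypothesis_generation(frames)))
print(len(frames), len(tracks[0]))  # → 5 5
```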

Figure 2. ViVeT System Block Diagram

3. HYPOTHESIS GENERATION

The Hypothesis Generation module is responsible for determining the locations of candidate vehicles in the image stream taken by the camera. This can be accomplished through knowledge-, stereo- or motion-based approaches. The actual implementation of the module uses a motion-based approach to detect possible vehicle locations. Figure 3 shows the cascaded block diagram used in implementing this approach.

Figure 3. Block Diagram of Hypothesis Generation Module

3.1 Background Subtraction

Masks of moving vehicles on a roadway can be extracted from the static background by matching a background model against the current image frame through image differencing. If the difference in brightness values between a background pixel and the current frame pixel at the same image coordinates is higher than a set threshold, that pixel is classified as part of the foreground or motion mask. However, the background model needs to be updated over time to account for varying lighting conditions. Equation 1 shows how the background is updated: βt and βt+1 are the current and next background model estimates, respectively, Ft is the current frame being processed, T is the nonchanging background of the video, Dt is the binary moving-object mask, and α = 0.05 is the learning rate constant.

βt+1 = βt(1 − α) + α(Ft(1 − Dt) + T·Dt)    (Equation 1)

After obtaining the next background model estimate, Dt is computed by taking the difference image between the current frame and the background frame. Thresholding is then performed to separate the foreground from the background, using the approach suggested by Otsu [6]. The system distinguishes vehicles from non-vehicles by comparing the silhouettes of the motion masks of candidate vehicles to contour templates of valid vehicles. Shadows, however, may significantly affect the accuracy of recognition, and therefore need to be removed prior to determining individual motion mask contours. The shadow removal implementation follows the approach suggested by [4], which operates in HSV color space but uses varying threshold values. This approach relates the intensity change of the Value (V) component of a foreground pixel to the V component of the corresponding background pixel using a cubic model curve, which determines whether the pixel is part of a shadow. Using this approach, different thresholds can be assigned for different intensity values. The saturation component is also examined to enforce stricter shadow removal. To remove shadows effectively, a pre-calibration step is required from the system user: before processing a video, the user must select shadow pixels from a representative frame. This way, the system can better characterize local shadow properties under changing light conditions. Figure 5 illustrates the system GUI during the shadow calibration step, and the shadow masks generated after calibration.
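A minimal NumPy sketch of the chain described in this section is given below: Equation 1's background update, Otsu thresholding of the difference image, and a shadow test in the spirit of [4]. The fixed Value-ratio band and saturation tolerance stand in for ViVeT's calibrated cubic-curve thresholds, so treat those constants as illustrative assumptions rather than the system's actual values:

```python
import numpy as np

def otsu_threshold(gray):
    """Threshold selection from the gray-level histogram (Otsu [6]):
    pick the level that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                        # class-0 probability per level
    mu = np.cumsum(p * np.arange(256))          # cumulative mean per level
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan                  # guard degenerate classes
    sigma_b = (mu[-1] * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b))

def update_background(beta_t, frame, T, alpha=0.05):
    """Equation 1: beta_{t+1} = beta_t(1-a) + a(F_t(1-D_t) + T*D_t).
    Returns the next background estimate and the moving-object mask D_t,
    obtained by Otsu-thresholding the absolute difference image."""
    diff = np.abs(frame.astype(int) - beta_t.astype(int)).astype(np.uint8)
    D_t = (diff > otsu_threshold(diff)).astype(float)
    beta_next = beta_t * (1 - alpha) + alpha * (frame * (1 - D_t) + T * D_t)
    return beta_next, D_t

def shadow_mask(fg_v, bg_v, fg_s, bg_s, lo=0.4, hi=0.9, tau_s=0.1):
    """HSV shadow test: a foreground pixel is shadow when its Value is a
    bounded fraction of the background Value and its saturation changes
    little. ViVeT replaces the fixed [lo, hi] band with a calibrated cubic
    curve over intensity; lo, hi and tau_s here are assumed placeholders."""
    ratio = fg_v / np.maximum(bg_v, 1e-6)
    return (ratio >= lo) & (ratio <= hi) & (np.abs(fg_s - bg_s) <= tau_s)

# Usage on a synthetic frame: a 2x2 bright blob against a dark background.
f = np.zeros((8, 8), dtype=np.uint8)
f[2:4, 2:4] = 200
_, D = update_background(np.zeros((8, 8)), f, np.zeros((8, 8)))
print(int(D.sum()))  # → 4
```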

Figure 5. Shadow Calibration Step (top), Shadow Masks after Calibration (bottom)

3.2 Object Thresholding

A vehicle can also be recognized by the number of pixels comprising it. Empirical tests indicated that the average minimum pixel count of a possible vehicle is 800 in an image of size 320x240 pixels. Therefore, all blobs whose pixel counts fall below this value are automatically disregarded by the system.

Figure 4. Flowchart of Object Identification Processing Block

3.3 Object Identification

The system is designed for vehicle tracking; therefore, the only relevant descriptors of the objects are the centroid, contour and bounding box. These properties are enough to track the vehicles along the image sequence. Calculating the texture and grayscale distribution of every object adds computational load but does not contribute significantly to the accuracy of the system. The contour of an object is approximated by finding the convex hull of each foreground mask that has passed the thresholding stage. However, it is not enough to use the vertices of the convex hull, since the number of vertices of a blob changes throughout the image sequence. These vertices also have unequal distances between them, and would therefore give rise to oscillations on contours with sparse sample points as compared to those with dense sample points. To solve this, a smooth contour is obtained using cubic spline approximation with a fixed number of equidistant points.
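The contour step just described (a convex hull over the mask pixels, then resampling to a fixed number of equidistant points along the perimeter) can be sketched as follows. Piecewise-linear arc-length resampling stands in here for the cubic spline the system actually uses:

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone-chain convex hull; points is an (N, 2) array.
    Returns hull vertices in counterclockwise order."""
    pts = sorted(map(tuple, points))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower = half(pts)
    upper = half(reversed(pts))
    return np.array(lower[:-1] + upper[:-1])

def resample_contour(hull, n=32):
    """Resample a closed polygon to n equidistant points along its perimeter
    (linear interpolation; ViVeT uses a cubic spline approximation)."""
    closed = np.vstack([hull, hull[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # arc length at each vertex
    t = np.linspace(0.0, s[-1], n, endpoint=False)   # equidistant arc lengths
    x = np.interp(t, s, closed[:, 0])
    y = np.interp(t, s, closed[:, 1])
    return np.column_stack([x, y])

# Usage: a square blob with one interior point; the hull keeps the 4 corners.
mask_points = np.array([[0, 0], [4, 0], [4, 4], [0, 4], [2, 2]])
hull = convex_hull(mask_points)
contour = resample_contour(hull, n=8)
print(hull.shape, contour.shape)  # → (4, 2) (8, 2)
```

The fixed point count makes contours from different frames directly comparable, which is what removes the oscillation problem noted above.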


4. RESULTS AND DISCUSSION

The minimum footage length chosen is 5 minutes, taken during daytime between 6 AM and 5 PM following a planned schedule. This schedule is determined by considering the amount of traffic congestion, traffic flow, shadow, luminance level, rain, and smog, as suggested by Philippine traffic authorities such as the Metropolitan Manila Development Authority (MMDA) and the Traffic Engineering Center (TEC). Video footage is taken from the right side of the road for ease of deployment. This orientation is also selected to lessen the amount of occlusion, considering that large vehicles (i.e. buses and trucks) are designated to use the left lane. The camera is set up at MRT stations and bridges approximately 5 meters above the ground. The following tests show how the design of the Hypothesis Generation module affects the performance of the ViVeT system.

Based on the summary of results, the performance of the system improves as the average brightness of the video increases; conversely, performance degrades as the ambient lighting darkens. Ambient luminance affects the contrast of the image, which is used by the three main modules of the system. Figure 6 shows the gray value distributions of a high- and a low-luminance image. The histogram on the left is bimodal, signifying a high-contrast image, since the intensity levels are distributed over a wide range of gray values. Conversely, the histogram on the right has a single mode, representing a low-contrast image, since the intensity levels are concentrated in a short range of gray values with little variance in the gray level distribution.
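The luminance measure used in the tests that follow (mean grayscale intensity per frame, binned into the ranges studied) can be sketched as below. The plain channel average is an assumption; a luma-weighted grayscale conversion would be an equally reasonable choice:

```python
import numpy as np

def mean_brightness(rgb_frame):
    """Mean grayscale intensity of an RGB frame (plain average of all
    channel values; the paper does not specify the exact conversion)."""
    return float(rgb_frame.mean())

def luminance_segment(value, edges=(107, 112, 117, 122, 127)):
    """Map a mean-brightness value to one of the test segments, e.g.
    '112-117'. Returns None outside the 107-127 range studied."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi or (hi == edges[-1] and value == hi):
            return f"{lo}-{hi}"
    return None

# Usage: a uniform 320x240 frame of intensity 115 falls in the 112-117 bin.
frame = np.full((240, 320, 3), 115, dtype=np.uint8)
print(mean_brightness(frame), luminance_segment(mean_brightness(frame)))  # → 115.0 112-117
```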

4.1 Image luminance tests based on varying times of the day

This test aims to assess system performance under different lighting conditions, setting aside all other factors such as occlusion and shadow. Therefore, videos exhibiting different lighting scenarios but with minimal shadow and occlusion are selected from the sample set. The brightness values of all selected video footage are then computed by taking the mean intensity (grayscale value) of every image in the frame sequences. The mean brightness values of the selected sample set fall between 107 and 127, with the latter being the brightest. This range is further divided into four segments, namely 107-112, 112-117, 117-122 and 122-127; 30 random samples are tested for each segment to show the performance degradation of the system with respect to luminance. A sample histogram of the images is presented to give a graphical analysis of contrast in terms of grayscale value distribution and how it affects the performance of the system.

Table 4.1. Summary of Luminance Test Results

Luminance                107-112  112-117  117-122  122-127
Total Vehicles Detected       48       53       53       47
Actual Vehicles               62       52       55       49
False Positives                7        9        5        3
False Negatives               21        7        7        3
True Positives                41       44       48       44
%True Positives            59.42    73.33    80.00    88.00
%False Positives           10.14    15.00     8.33     6.00
%False Negatives           30.43    11.67    11.67     6.00

Figure 6. Histogram of a High Luminance Image (Left), and Histogram of a Low Luminance Image (Right)

Hypothesis Generation works by generating a threshold that separates gray value intensities for segmentation of the foreground and background. Similarly, Hypothesis Verification relies partly on contrast information to construct the boundary descriptors of a vehicle. Low luminance and contrast hinder the Hypothesis Generation module from generating an optimal threshold, which may end up removing portions of, or even the whole, vehicle. As shown in Figure 8, the percentage of vehicles missed (false negatives) and the percentage of objects wrongly identified as vehicles (false positives) increase as the luminance of the image drops.
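The percentage figures in Table 4.1 follow directly from the raw counts, taken relative to all detection events (TP + FP + FN); the 107-112 segment, for example:

```python
def detection_rates(tp, fp, fn):
    """Percentages of true positives, false positives and false negatives
    relative to TP + FP + FN, the convention that reproduces Table 4.1."""
    total = tp + fp + fn
    return (round(100 * tp / total, 2),
            round(100 * fp / total, 2),
            round(100 * fn / total, 2))

# 107-112 luminance segment: 41 TP, 7 FP, 21 FN (from Table 4.1).
print(detection_rates(41, 7, 21))  # → (59.42, 10.14, 30.43)
```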

4.2 Occlusion Test

One of the major causes of misdetection is occlusion. Vehicles occluding, or occluded by, other vehicles put great stress on the system. The system requires a vehicle to be seen for two consecutive frames before its tracking starts. In real-world scenarios, there is no guarantee that occluded vehicles will ever separate within the camera's span of view. This test aims to measure how well the system can track vehicles in the presence of occlusion. The occlusion testing covers the following three cases (see Figure 7): (1) vehicles are initially detected as individual objects and then occlude; (2) vehicles occlude due to shadow or low resolution; and (3) vehicles occlude from the start of the frame sequence to the end.
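The occlusion measure used in this test is the occluded area over the sum of the two vehicle areas (Equation 2), expressed as a percentage; the first sample row of Table 4.2 illustrates it:

```python
def occlusion_percent(occluded_area, vehicle_area1, vehicle_area2):
    """Equation 2, reported as a percentage of the combined vehicle areas."""
    return 100.0 * occluded_area / (vehicle_area1 + vehicle_area2)

# First sample from Table 4.2: 38 occluded pixels over areas of 3452 and 1055.
print(round(occlusion_percent(38, 3452, 1055), 3))  # → 0.843
```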


Figure 7. Different Occlusion Cases: Case 1 (top row), Case 2 (middle row), and Case 3 (bottom row)

The test is done by taking samples of occluded vehicles and estimating the percentage of occlusion (in pixels), computed using Equation 2. The correctness of detection is then compared against the amount of occlusion present.

%Occlusion = OccludedArea / (VehicleArea1 + VehicleArea2) × 100    (Equation 2)

Table 4.2 summarizes the results of occlusion testing. The segmented foreground mask is used as the basis for the initial estimate of candidate vehicle locations. Shadows that are not removed may connect blobs of vehicles that do not actually occlude, which degrades the segmented foreground mask. Furthermore, in video files with occlusion cases, vehicles usually enter the Field of View (FOV) already occluded by other vehicles. This results in poor detection, since the system cannot perform motion estimation or occlusion reasoning until an object's tracker is initialized, which happens only after the object's second appearance. In other cases of low occlusion percentage, the system is able to track the vehicles but fails once a vehicle reaches the upper portion of the FOV. The motion estimation of the system functions favorably in the lower 60-70% of the FOV, because it relies on edge correlation to determine the actual locations of vehicles; as a vehicle draws away from the camera, the resolution degrades, affecting the edge content of the vehicle.

Table 4.2. Summary of Occlusion Test Results

% Occlusion  Occluded Area (pixels)  Area1 (pixels)  Area2 (pixels)  Actual  Found
      0.843                      38            3452            1055       2      2
      1.211                      27            1727             559       2      1
      1.793                      45            1100            1410       2      1
      1.821                      35            1286             655       2      1
      2.386                      33             846             537       2      1
      2.395                      74            2202             888       2      2
      2.526                      46            1228             593       2      1
      2.749                     131             983            3783       2      1
      3.137                     114            2908             726       2      1
      3.420                     214            1431            4835       2      1
      4.294                     107            1430            1062       2      1

4.3 Shadow Test

This test verifies the performance of the system with respect to the shadows cast at different times of the day. It shows how well the system can identify vehicles despite distortions in the image caused by shadows. To accomplish this, videos exhibiting varying levels of shadow but with ideal luminance and minimal occlusion are selected as the sample set. In the test data used, the average amount of shadow cast by the vehicles ranges from 17 to 50%. This range is further divided into three segments, with 30 random samples tested from each segment.

Table 4.3. Summary of Shadow Test Results

% of Shadow Cast   17-28  28-39  39-50
% Accuracy         90.00  83.33  66.67

Based on the results, the accuracy of the system increases as the percentage of shadow cast decreases. The presence of shadows can significantly degrade the performance of the system, since it may cause the system to wrongly segment vehicle boundaries. Even with the shadow removal algorithm, only a portion of the shadow can be removed. Another cause of degradation is large areas of cast shadow, which lead to occlusion cases: vehicles that are apart from each other are joined by their shadows, resulting in spurious identifications.

4.4 Overall Results

Overall, the system correctly identifies 74.58% of vehicles, with 11.86% falsely identified vehicles (false positives) and 13.56% missed vehicles (false negatives).

5. CONCLUSIONS

This paper describes the design of the Hypothesis Generation module of a vehicle tracking system developed with the local factors typical of Philippine roadways in mind. The Hypothesis Generation module is capable of hypothesizing candidate vehicles with proper shadow calibration, even under varying luminance and isolated cases of shadow. Shadow detection and segmentation under varying luminance is done using a probabilistic model that aids the segmentation of foreground objects. Moreover, an updating background model is used to adapt to changes in ambient luminance. Occlusion is the dominant problem of the system. Modern tracking systems employ higher setups (approximately 14 meters) to minimize the amount of occlusion on the roadway. However, for ease of deployment and testing, the system was implemented at lower heights (i.e. MRT bridges and pedestrian overpasses, approximately 5 meters), increasing the possibility of occlusion. To improve performance, it is recommended that video recording also be done from higher platforms to reduce cases of occlusion.


6. ACKNOWLEDGMENTS The researchers express their gratitude to the Traffic Engineering Center of the Metro Manila Development Authority – Traffic Management Group for the assistance and information they have given during the course of the research.

7. REFERENCES

[1] Asokawa, Y., Ohashi, K., Kimachi, M., Wu, Y., & Ogata, S., "Automatic vehicle recognition by silhouette theory" (1998) [online]. Available: http://www.society.omron.com/traffic/svs/pdf/ITS98_Seoul.pdf. (January 27, 2005)

[2] De Guzman, S., Espiritu, Traffic sign recognition system. Archives, College of Computer Studies, De La Salle University (2004)

[3] Highet, R., "Optical vehicle tracking - a framework and tracking solution" (2004) [online]. Available: http://nz.cosc.canterbury.ac.nz/research/reports/HonsReps/2004/hons_0404.pdf. (February 18, 2005)

[4] Kastrinaki, V., Zervakis, M., Kalaitzakis, K., “A Survey of Video Processing Techniques For Traffic applications” (2002) [online]. Available: http://www.elci.tuc.gr/downloads/Kalaitzakis/J.25.pdf. (June 26, 2005)

[5] Koller, D., Weber, J., & Malik, J., "Robust multiple car tracking with occlusion reasoning" (1994) [online]. Available: http://sunsite.berkeley.edu/Dienst/Repository/2.0/Body/ncstrl.ucb/CSD-93-780/pdf. (January 12, 2005)

[6] Otsu, Nobuyuki, "A Threshold Selection Method from Gray-Level Histograms" (1979) [online]. Available: http://140.128.142.166/chsu/archives/AThresholdSelectionMethodfromGray-Level.ppt. (November 25, 2005)

[7] Piccardi, Massimo, "Background subtraction techniques - a review" (2004) [online]. Available: http://wwwstaff.it.uts.edu.au/~massimo/BackgroundSubtractionReviewPiccardi.pdf. (November 25, 2005)

[8] Prati, A., Cucchiara, R., Mikic, I., & Trivedi, M., "Analysis and Detection of Shadows in Video Streams: A Comparative Evaluation" [online]. Available: http://cvrr.ucsd.edu/aton/publications/pdfpapers/Andrea_cvpr01.pdf. (November 25, 2005)

[9] Saito, A., Kimachi, M., & Ogata, S., "New video vehicle detection field proven robust and accurate", 6th World Congress on Intelligent Transportation Systems (1999) [online]. Available: http://www.society.omron.com/traffic/svs/pdf/its99_toronto.pdf. (January 27, 2005)
