
Learning-based License Plate Detection on Edge Features

Wing Teng Ho, Wooi Hen Yap, Yong Haur Tay
Computer Vision and Intelligent Systems (CVIS) Group, Universiti Tunku Abdul Rahman, Malaysia

Abstract
This paper presents an AdaBoost learning-based method for license plate detection in unconstrained environments (cluttered scenes, changing illumination, in-plane and out-of-plane rotation of license plates). Our approach is motivated by the idea that a learning-based method can implicitly derive a robust object model through training on a large set of positive and negative samples. In addition, edge information (obtained with the Canny edge detector) rather than intensity information is used to train the license plate detector (LPD), since edges have shown a better representation than intensity for the license plate problem. We compare our approach against intensity-based training, compare cascades with different numbers of stages, and report the detection speed of our LPD. Our approach achieves a true positive rate of ~70%, with a detection speed of ~80 ms for an image size of 320 x 240.

Key words: license plate detection, machine learning, AdaBoost, Canny edge detection

1. Introduction
As the number of vehicles grows, license plate detection is becoming more important. It can be applied to traffic control, security systems, automated vehicle verification, car park payment systems, and similar applications. A license plate detector (LPD) locates the position of the license plate in a given image. Detecting a license plate is challenging when the background is cluttered and when the image is affected by disturbances such as varying illumination and plate rotation.

The rest of the paper is structured as follows. Section 2 presents background information related to license plate detection. Section 3 introduces the architecture of our algorithm, and sections 4 to 7 describe the Haar-like features, the AdaBoost learning algorithm, the training of the cascade and the detection architecture. The experiments are described in section 8, results and discussion are presented in section 9, and section 10 concludes the paper.

2. Background

Viola and Jones [3] presented a framework for face detection that achieves a high detection rate together with extremely rapid image processing. Motivated by [1], they introduced a new image representation, the integral image, which allows the features used in detection to be calculated very rapidly. They classify image patterns with Haar-like features, which are reminiscent of the Haar wavelets used by Papageorgiou and Poggio [1]. They then used the AdaBoost learning algorithm to select a small number of simple Haar-like features from the over-complete feature set. Each selected feature acts as a weak classifier, and the weak classifiers are combined into a strong classifier.

Chen and Yuille [4] demonstrated an algorithm for detecting text in natural images, also based on AdaBoost. They claimed that the set of features used for face detection by Viola and Jones [3] might not be suitable for detecting text, because text has less spatial similarity than faces. A face can be regarded as a spatially similar object, since its features (eyes, nose and mouth) are in approximately the same spatial position for any face.

Some algorithms are designed specifically for a particular object detection problem, such as the adaptive algorithm for text detection from natural scenes by Gao and Yang [5]. They developed a prototype system that can recognize Chinese signs. The algorithm has a hierarchical structure with different conditions regulated in each layer, and it can be applied to text in other languages by modifying the layout constraints. Nevertheless, it is difficult to design an algorithm that can detect different objects without changing the architecture of the system.

Shapiro et al. [6] presented an image-based car license plate recognition (CLPR) system. The system consists of several processes, including license plate localization, which locates the license plate in an image. They use a heuristic scheme to estimate the plate's vertical boundaries, which are required during the vertical projection. The system assumes that plates are oriented horizontally and are characterized by frequent intensity alternations between the character foreground and the plate's background. With a heuristic scheme, an LPD is arguably less tolerant than a learning-based approach when it comes to varying conditions (illumination, scale, significant blur, occluded license plates, etc.).

3. System Architecture
We developed a framework that consists of two main components: the training stage and the testing stage. We train the LPD with the AdaBoost learning algorithm, which selects the best features as weak classifiers, and we combine the selected classifiers in a cascade. As shown in Figure 1, our license plate detection architecture has several stages. The first step takes the original image from the user and converts it to gray-scale format. The next stage pre-processes the gray-scale image with an edge filter; we use Canny edge detection for this preprocessing. After Canny edge detection has filtered noise from the image, the trained classifier determines whether a given region is a license plate or not.

Fig 1. LPD System Architecture (original image, conversion to gray-scale image, Canny edge detection, license plate detector, result).

3.1 Canny Edge Detection
Although the architecture we implement is a learning-based algorithm, we also pre-process the original image before detection. The same Canny edge detector is applied to the training samples, so the system is both trained and tested on edge-detected images. Edge detection is one of the most important processes in image analysis [7]. An edge represents the boundary of an object and can be used to identify the shape and area of that object. When there is a contrast difference between the object and the background, the object edges become more obvious after edge detection. In our license plate detection architecture, applying the Canny edge detector can improve the detection rate, since it removes some of the noise and makes the edges of the license plate text more visible. There are many edge detection methods; we use the Canny edge detector in our license plate detection. It is one of the most popular edge detection methods because it is designed around three criteria: optimal detection with no false responses, good localization with minimal distance between the detected and the actual edge position, and a single response per edge, which eliminates multiple responses to a single edge [8].

4. Haar-like Features
Rather than operating on pixels directly, license plate detection classifiers may act on simple features. There are two motivations for using simple features: (1) features often encode salient domain knowledge that is essential for learning and hard to learn from raw pixels, and (2) a feature-based system operates much faster than a pixel-based one. In their paper [3], Viola and Jones proposed simple Haar-like basis functions as features. In general, these Haar-like basis functions are simple 2D wavelet constructs consisting of at least two non-overlapping rectangular regions, depicted as white or black. A feature value is computed as the difference between the sum of the pixels within the black region and the sum of the pixels within the white region. In our work, we use three types of features as originally proposed by Viola and Jones, as shown in Figure 2.

Fig 2. Simple Haar-like basis functions used as features in license plate detection scheme.
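The feature computation described in this section can be illustrated with a minimal sketch, assuming a gray-scale image stored as a list of rows. The function names (integral_image, rect_sum, two_rect_feature) and the sign convention (black sum minus white sum) are illustrative assumptions, not taken from the paper's implementation. The point of the integral image is that any rectangle sum needs only four table lookups, regardless of the rectangle size.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle feature: left half white, right half black (assumed layout).

    Returns the black sum minus the white sum.
    """
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return black - white


# Tiny example: a vertical edge between a dark left half and a bright right half.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
table = integral_image(image)
edge_response = two_rect_feature(table, 0, 0, 4, 2)
```

On an edge-detected image the same code applies unchanged; only the input pixel values differ.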

5. License Plate Learning with AdaBoost
AdaBoost stands for adaptive boosting, a machine learning technique introduced by Freund and Schapire [9]. It is used primarily to boost the classification performance of a simple learning algorithm (e.g. a simple perceptron) by combining a collection of weak classifiers into one stronger classifier. A weak classifier is any simple classifier that performs slightly better than chance, preferably with a low computation time. Freund and Schapire [9] found that a committee of weak classifiers, when combined properly, often outperforms strong classifiers such as Support Vector Machines (SVM) and neural networks. Boosting comes in many variants, such as Discrete AdaBoost, Real AdaBoost and Gentle AdaBoost [10].

Haar-like features are overcomplete in the sense that, for a given window, the number of rectangle features is far larger than the number of pixels; for instance, a 24x24 window has 45,936 possible Haar-like rectangular features [3] compared to 576 (24x24) pixels. Even though these rectangle features can be calculated efficiently using the integral image, Viola and Jones [3] postulated that only a small number of them need to be combined to form a good classifier. The challenge is to find these features.

In our license plate detection system, we used Gentle AdaBoost to train the strong classifiers. Gentle AdaBoost was chosen because it outperformed the other variants in an empirical analysis carried out by Kuranov et al. [11]. Note that the boosting variants differ only in how they re-weight the training examples after each training iteration. To boost the performance of a strong classifier, the AdaBoost algorithm searches a pool of weak classifiers for the one with the lowest classification error and adds it to the combination. This learning method is also known as a greedy feature selection process. During training, each selected classifier is applied to the training examples so that the examples can be re-weighted to emphasize those that were classified incorrectly in the next iteration. After training, we have a weighted combination of weak classifiers in the form of perceptrons, together with a simple binary threshold value that is determined automatically. Every weak classifier has an associated weight: good classifiers are assigned larger weights, while poorer classifiers are assigned smaller weights. Figure 3 shows the learning algorithm [3].

• Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i = 0, 1$ for negative and positive examples respectively.

• Initialise weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of negative and positive examples respectively.

• For $t = 1, \ldots, T$:

  1. For each feature $j$, choose a classifier $h_t$ by minimizing a weighted squared error
     $\varepsilon_j = \sum_i w_i \left( h_j(x_i) - y_i \right)^2$

  2. Update the classifier
     $F(x) \leftarrow F(x) + h_t(x)$

  3. Update weights by
     $w_i \leftarrow w_i \, e^{-y_i h_t(x_i)}$

• Final strong classifier is
  $F(x) = \mathrm{sgn}\left( \sum_{i=1}^{T} \alpha_i \cdot h_i(x) + b \right)$

Fig 3. Gentle AdaBoost procedure used to construct a strong classifier. $T$ is the number of weak classifiers. The final strong classifier is a weighted linear combination of the $T$ weak classifiers $h_i(x)$ with bias offset $b$.
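The boosting loop of Fig 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the training code used in the paper: labels are taken as -1 and +1 (the usual Gentle AdaBoost convention, whereas Fig 3 writes them as 0 and 1), initial weights are uniform, and the weak learners are regression stumps on raw feature values. The names fit_stump, stump_predict, gentle_adaboost and classify are hypothetical. With real-valued stumps the confidence of each weak learner is carried by its output, so no separate weights $\alpha_i$ appear in this sketch.

```python
import math


def fit_stump(X, y, w):
    """Weighted least-squares regression stump over all features and thresholds.

    Returns (feature index, threshold, left value, right value).
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= theta]
            right = [i for i, row in enumerate(X) if row[j] > theta]
            # Optimal constant on each side is the weighted mean of the labels.
            a = sum(w[i] * y[i] for i in left) / sum(w[i] for i in left)
            b = sum(w[i] * y[i] for i in right) / sum(w[i] for i in right)
            err = sum(w[i] * (y[i] - a) ** 2 for i in left)
            err += sum(w[i] * (y[i] - b) ** 2 for i in right)
            if best is None or err < best[0]:
                best = (err, j, theta, a, b)
    if best is None:
        raise ValueError("every feature is constant; no stump can be fitted")
    return best[1:]


def stump_predict(stump, row):
    j, theta, a, b = stump
    return a if row[j] <= theta else b


def gentle_adaboost(X, y, rounds):
    """Train `rounds` stumps; y must contain -1 / +1 labels."""
    n = len(X)
    w = [1.0 / n] * n
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)               # step 1: minimise weighted squared error
        stumps.append(stump)                     # step 2: F(x) <- F(x) + h_t(x)
        w = [wi * math.exp(-yi * stump_predict(stump, xi))
             for wi, yi, xi in zip(w, y, X)]     # step 3: re-weight the examples
        total = sum(w)
        w = [wi / total for wi in w]             # keep the weights normalised
    return stumps


def classify(stumps, row):
    """Sign of the summed stump outputs."""
    score = sum(stump_predict(s, row) for s in stumps)
    return 1 if score >= 0 else -1


# Toy data: one feature, negatives have small values and positives large ones.
features = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
labels = [-1, -1, -1, 1, 1, 1]
model = gentle_adaboost(features, labels, rounds=3)
```

In a detector, each row of the feature matrix would hold Haar-like feature values of one training window, so selecting a stump also selects a feature.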

6. Training a Cascade of Strong Classifiers
A cascade of classifiers can be constructed stage by stage with the AdaBoost algorithm to achieve a high detection rate while radically reducing computation time. Cascaded classifiers detect quickly because the initial stages reject the majority of non-license plate windows while passing almost all positive windows. This mechanism is effective in lowering false positives because the majority of the windows within an image are negative. In subsequent stages, more complex classifiers are applied only to a small fraction of candidate windows. The overall detection process can be depicted as a degenerate decision tree; see Figure 4.
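The early-rejection behaviour of a cascade can be sketched in a few lines. This is an illustrative sketch only: each stage is assumed to be a pair of a scoring function and a threshold, and the name run_cascade as well as the toy scoring functions are hypothetical. A window is rejected at the first stage whose score falls below its threshold, so most negative windows cost only one or two stage evaluations.

```python
def run_cascade(stages, window):
    """Return (accepted, number of stages evaluated) for one candidate window."""
    for k, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, k          # rejected early, later stages never run
    return True, len(stages)         # survived every stage


# Toy stages: a cheap first test followed by a stricter second one.
toy_stages = [
    (lambda win: sum(win), 1),
    (lambda win: max(win), 3),
]

rejected_early = run_cascade(toy_stages, [0, 0])
rejected_late = run_cascade(toy_stages, [1, 1])
accepted = run_cascade(toy_stages, [1, 4])
```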

Fig 4. Schematic diagram of a 4-stage cascade of classifiers. Every processing node is a strong classifier. Sub-windows within the image are filtered by the processing nodes. Initially, a large number of negative windows are rejected with very little processing time. Subsequent nodes eliminate additional negative windows at additional processing cost. After several stages, only a small number of candidate windows are considered by the complex classifiers for the final decision.

The construction of cascaded classifiers is driven by a set of detection and performance goals. In our experiment, each stage classifier was trained to achieve a hit rate of 0.995 on frontal license plate patterns and a false positive rate of 0.5; a total of 16 stages were trained. Theoretically, our cascade can reach a false positive rate of about $0.5^{16} \approx 1.5 \times 10^{-5}$ and a hit rate of about $0.995^{16} \approx 0.92$. For a detailed discussion on how to determine the performance rates of cascaded classifiers, readers can refer to [3].

The AdaBoost procedure in the previous section attempts only to minimise errors; it is not designed to optimise the performance trade-off, that is, to obtain the highest hit rate at the lowest possible false positive rate. One simple scheme to trade off performance against error rates is to adjust the threshold of the classifier created by AdaBoost. Consider a stage classifier of the form

$F(x) = \mathrm{sgn}\left( \sum_{i=1}^{T} \alpha_i \cdot h_i(x \mid t_i) + b \right)$ with $b = 0$.

Any stage classifier can be post-optimised for a given hit rate. The free parameters are the thresholds $t_i$ and the offset $b$, while $\alpha_i$ must be chosen according to the AdaBoost loss function to preserve the properties of AdaBoost [12]. The parameters $t_i$ and $b$ are selected in a gradient-descent manner by slowly increasing the value of $t_i$ while ensuring that performance does not degrade. However, a true gradient cannot be computed since $F(x)$ is not continuous.
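The cascade-level rates quoted in this section follow from multiplying the per-stage rates, under the usual simplifying assumption that the stages behave independently. A short check of the arithmetic, with cascade_rates as a hypothetical helper name:

```python
def cascade_rates(stage_hit_rate, stage_fp_rate, n_stages):
    """Overall hit rate and false positive rate of an n-stage cascade,
    assuming every stage has the same rates and stages are independent."""
    return stage_hit_rate ** n_stages, stage_fp_rate ** n_stages


overall_hit, overall_fp = cascade_rates(0.995, 0.5, 16)
print(f"hit rate ~ {overall_hit:.2f}, false positive rate ~ {overall_fp:.1e}")
```

The same helper shows why longer cascades trade hit rate for false positives: every additional stage multiplies both numbers by a factor below one.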

7. Detection Architecture
Our LPD runs a detection window across the image at multiple scales and locations. The window is scaled by a factor of 1.1, which means that the window size increases by 10% between subsequent scans, starting from a minimum size of 40 × 10. The features at each scale are obtained by scaling the base window features by the current scale factor; this can be done at any scale at the same cost. The LPD also scans across locations. Subsequent locations are obtained by shifting the window by some number of pixels, $\Delta$. The shift depends on the scale of the detector: if the current scale is $s$, the window is shifted by $\mathrm{round}(s\Delta)$, where round denotes rounding to the nearest integer.

Two integral images are computed independently from the given image: the intensity integral $ii$ and the squared intensity integral $ii^2$. Recall that integral images are used rather than pixel intensities because features can then be evaluated very rapidly, sometimes hundreds of times faster.

The detection window may proceed only if its size is smaller than the maximum window size, maxWinSize, which is computed as (width − 10) × (height − 10). For instance, if the image size is 720 × 486, maxWinSize is 710 × 476. The detection window must always be smaller than the image to avoid scanning out of bounds.

The final detection output often contains multiple detections around each license plate, since the LPD is insensitive to small translations and/or scale changes. These extraneous detections can be regarded as one kind of false positive. In practice, multiple detections are combined in a simple manner to return the final detection. To do this, overlapping windows must be identified accurately before they are combined into a single detection. In certain cases, this combination scheme also decreases the number of false positives when many windows overlap.
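The multi-scale scan described at the start of this section can be sketched as a generator of candidate windows. This is a sketch under assumptions: the value of $\Delta$ is not given in the text, so delta=1.0 is a placeholder, the name scan_windows and its parameters are hypothetical, and rounding is implemented as round-half-up. The base size of 40 × 10, the scale step of 1.1 and the 10-pixel margin for maxWinSize are taken from the text.

```python
import math


def round_half_up(value):
    """Round to the nearest integer, with halves rounded up."""
    return int(math.floor(value + 0.5))


def scan_windows(img_w, img_h, base_w=40, base_h=10,
                 scale_step=1.1, delta=1.0, margin=10):
    """Yield (x, y, width, height) for every detection window position."""
    max_w, max_h = img_w - margin, img_h - margin    # maxWinSize
    scale = 1.0
    while True:
        win_w = round_half_up(base_w * scale)
        win_h = round_half_up(base_h * scale)
        # The window proceeds only while it is smaller than maxWinSize.
        if win_w >= max_w or win_h >= max_h:
            break
        step = max(1, round_half_up(scale * delta))  # shift grows with scale
        for y in range(0, img_h - win_h + 1, step):
            for x in range(0, img_w - win_w + 1, step):
                yield x, y, win_w, win_h
        scale *= scale_step


# Small example image so the full list stays short.
windows = list(scan_windows(100, 50))
```

Each yielded window would then be passed through the cascade, with feature rectangles scaled by the same factor as the window.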

8. Experiments
We prepared 2000 artificial license plates as positive samples and 4000 non-license plate images as negative samples. The main reason for using artificial license plates is that the available real-world license plate databases are not sufficient. We rescaled the license plates to a resolution of 40 x 10 and added noise, brightness effects, and in-plane and out-of-plane rotation. We conducted two experiments: in the first, the original artificial license plate images are used as training samples; in the second, edge-detected images are used as training samples. We used the Canny edge detector to convert the original artificial license plates into edge images. Figure 8 shows the original artificial license plates and Figure 9 shows the edge images of our training samples. During AdaBoost training, we set the target number of cascade stages to 15, and both experiments use the same settings so that the results can be compared fairly.

Fig 8. Artificial license plates without Canny edge detection.

Fig 9. Artificial license plates with Canny edge detection.

To compare performance, both LPDs (trained with and without edge-detected training images) are tested on our test set. The test set consists of 83 images collected from real-world situations, with varying illumination, in-plane and out-of-plane rotations, and different license plate configurations (fonts, spatial arrangement, etc.). For each test image, we manually labeled the ground-truth location of the license plates so that we can quantify LPD performance by how much its output deviates from the expected ground-truth position and size. Some images from the test set are shown in Figure 10.

Fig 10. Some images from our test set.

9. Results and Discussion

License plate areas have a high density of vertically and horizontally connected edges (see Figure 11). In contrast, the edges found in non-license plate areas are mostly noise from the edge detection process and usually have random directions. This finding implies that edge information contains salient information that is important for distinguishing license plates from non-license plate regions. It agrees with the previous method in [6].

Fig 11. Top-left: Intensity image. Bottom-left: License plate in intensity image. Top-right: Edge-detected image (Canny). Bottom-right: License plate in edge-detected image.

In Figure 12, the true positive rate (TPR) is the fraction of test images for which the detection is correct with regard to position and scale, whereas the false alarm rate (FAR) is the fraction of area incorrectly identified as license plate. Evidently, the LPD trained with edge information greatly outperformed the LPD trained without it. For instance, with FAR fixed at ~1.0 in Figure 12, the edge-trained LPD obtained a TPR near 0.7, while the non-edge-trained LPD only achieved a TPR of 0.1.

Fig 12. Performance comparison of LPDs. Edge-trained LPD consists of 14 stages of cascaded classifiers. Gray-trained LPD consists of 8 stages of cascaded classifiers.

Fig 13. Performance comparison of LPDs with different numbers of stages trained on Canny edge-detected images.

Another interesting observation is that training the LPD with more stages sometimes lowers the TPR rather than improving it. This is evident in Figure 13, where the 14-stage LPD consistently outperformed the 17-stage LPD when FAR is fixed at around 0.5 and above. One possible explanation is that a longer cascade behaves more strictly than a shorter one because of its longer filtering pipeline. Correct license plate areas may then be rejected at the additional stages, leading to a lower TPR. Some detection results on our test images are shown in Fig. 14. Our LPD takes ~80 ms to process an image of size 320 x 240 on an AMD Athlon 64 X2 Dual Core Processor at 2.01 GHz with 2 GB RAM.

Fig 14. Detection results for Canny edge-trained LPD.

10. Conclusion and Future Works
In this paper, we have proposed an AdaBoost learning-based method to construct an LPD on Haar-like features. Rather than learning on intensity images, we proposed to learn Haar-like features on edge information. Edge information has demonstrated better discriminative power than intensity information, since license plate areas respond strongly to horizontal and vertical edges. Our experiments have shown that our approach achieves significant improvements in accuracy over the intensity-based approach. Processing an image of 320 x 240 takes only ~80 ms. However, our approach is still at a preliminary stage, and further improvement is necessary to reduce the false alarm rate and to increase the true positive rate. One possible direction is to visualize the features selected by AdaBoost learning to get an intuitive idea of whether Haar-like features are adequate for license plate detection. Another is to investigate whether alternative features are more appropriate for learning license plates.

11. Acknowledgements
This research is partly funded by Malaysian MOSTI ScienceFund SF-01-02-11-SF0019.

12. References
[1]. Papageorgiou, C. P., & Poggio, T. (1999). A Trainable Object Detection System: Car Detection in Static Images. A.I. Memo No 1673, C.B.C.L Paper No 180. Massachusetts Institute of Technology.
[2]. Papageorgiou, C. P., & Poggio, T. (2000). A Trainable System for Object Detection. International Journal of Computer Vision 38(1), 15-33.
[3]. Viola, P., & Jones, M. (2004). Robust Real-Time Face Detection. International Journal of Computer Vision 57(2), 137-154.
[4]. Chen, X.R., & Yuille, A.L. (2004). Detecting and Reading Text in Natural Scenes. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’04).
[5]. Gao, J., & Yang, J. (2001). An Adaptive Algorithm for Text Detection from Natural Scenes. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’01).
[6]. Shapiro, V., Gluhchev, G., & Dimov, D. (2006). Towards a Multinational Car License Plate Recognition System. Machine Vision and Applications 17, 173-183.
[7]. Parker, J.R. (1997). Algorithms for Image Processing and Computer Vision. Wiley Computer Publishing, John Wiley & Sons, Inc, Professional, Reference and Trade Group, United States of America.
[8]. Nixon, M., & Aguado, A. (2002). Feature Extraction & Image Processing (First edition). British Library Cataloguing in Publishing Data.
[9]. Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory: Eurocolt’95. Springer-Verlag, pp. 23–37.
[10]. Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. Machine Learning: Proceedings of the Thirteenth International Conference. San Francisco: Morgan Kaufmann, pp. 148–156.
[11]. Kuranov, A., Lienhart, R., et al. (2003). Empirical analysis of detection cascades of boosted classifiers for rapid object detection. In: Lecture Notes in Computer Science. Heidelberg: Springer-Verlag, pp. 294–304.
[12]. Lienhart, R., & Maydt, J. (2002). An extended set of Haar-like features for rapid object detection. IEEE International Conference on Image Processing, September, pp. 900–903.
