A Robust Lip Center Detection in Cell Phone Environment

Thanh Trung Pham*, Min Gyu Song*, Jin Young Kim*, Seung You Na*, Sung Taek Hwang**
* Dept. of ECE, Chonnam National University, Buk-Gu Yongbong-Dong 300, Gwangju, 500-757, South Korea
** Telecommunication R&D Center, Samsung Electronics, 416 Metan-Dong, Yeongtong-Gu, Suwon-si, Gyeonggi-do, 443-747, South Korea
Emails: {[email protected], [email protected]}
Abstract- In this paper we present a new approach for the detection of lip centers based on eye localization, adopted into a lip reading system in mobile environments. First, the centers of the left and right eyes are localized directly. Then we use the geometry characteristics of faces to extract rough lip regions. Next, we use two threshold-adaptation steps to binarize lip images: the first estimates a standard lip threshold for each image, and the second computes thresholds for the left and right lip images based on the standard lip threshold. Finally, we apply a Sobel edge map based filter with projections to detect precise lip centers. An experimental study shows that our algorithm works well under various illumination conditions, which is one of the typical difficulties in image processing and computer vision problems.

Keywords - eye localization, lip-reading, lip detection.
I. INTRODUCTION
During the last few years, lip reading or speech reading has attracted much attention as a way to enhance speech recognition performance [1-5]. The motivation is to exploit the visual information of lip movements, which carries information about speech articulation. The first step in a lip reading system is to detect lip regions in face images. A common approach comprises methods based on face detection: the face region is detected using skin color information, and lip detection is then performed using the information of the facial structure. However, color information is so vulnerable to illumination changes that exact face detection is not easy under the dynamic lighting conditions of indoor and outdoor environments. In mobile applications of lip reading, on the other hand, only one face is assumed to be located near the image center. Thus, we can detect eye centers directly without face detection. The eye center is a very good feature for estimating a proper face region including the lip. In this paper, we propose a robust approach for lip center detection under various lighting conditions, especially in mobile environments.

Over the last few years, there have been several studies on lip detection. One common approach is to use skin color information to detect the face region in an image and then extract a lip region. In [6] the authors introduced a new color transformation method based on the RGB color space for lip segmentation. This method can enhance the discrimination between skin and lips. R. Stiefelhagen et al. [7] studied real-time lip tracking for lip reading. They used a stochastic skin color model to detect face regions. Then the two eyes are localized using iterative thresholding. The lip
978-1-4244-3555-5/08/$25.00 ©2008 IEEE
search-region is then extracted based on the eye center locations. Finally, lip corners are estimated using horizontal and vertical integral projections of a gray-scale image. In those methods, color information is taken into account. However, color information is very sensitive to illumination changes, which can lead to low performance under the dynamic lighting conditions of indoor and outdoor environments. Our proposed lip detection method is based on eye localization (figure 1). The centers of the left and right eyes are localized using facial geometry properties and GMM validation in the Y component image of the YCbCr color space. Then a rough lip region can easily be extracted based on the positional relations of the eyes, nose, and mouth. In a lip candidate region, we apply an adaptive thresholding technique to estimate the threshold values used for binarizing the lip region. Vertical and horizontal projections integrated with a Sobel edge map based filter are used to detect the lip center exactly.

Fig. 1. Overview of the system: facial image → eye localization → lip detection → lip reading system.
In the remainder of this paper, section 2 explains the lip detection algorithm in detail. Experimental results are discussed in section 3.

II. PROPOSED LIP DETECTION ALGORITHM
Lip detection is one of the preprocessing steps in a lip reading system. The objective is to detect lip features such as lip corners, lip width, lip center, and so on. So far, various approaches have been proposed for lip detection. A common approach comprises methods based on face detection, in which the
face region is detected by skin color information, and then other facial features such as the eyes and nostrils are also detected. Finally, lip detection is performed using the information of the facial structure. However, skin color is very sensitive to lighting conditions, which vary from indoor to outdoor environments. This can degrade the detection rate in mobile applications. Our proposed method localizes the positions of the left and right eyes directly using a Gaussian probability model; then we can easily get the rough lip region (LR) by using the geometry properties of a face. Within the lip region, the lighting may differ from left to right and from top to bottom because of dynamic illumination changes. So we binarize the left lip region (LLR) and the right lip region (RLR) separately using a two-step threshold adaptation. The standard lip threshold is calculated from the median value of the lip region, and this value is adaptively used to estimate appropriate thresholds for the left and right lip regions. From the binarized lip image, we can use vertical and horizontal projections to detect the lip center and other lip features. However, the rough lip region may include other objects such as the nose, the chin, and also shadow. Our experience shows that the shadow of the chin often causes wrong detection, so we apply a Sobel edge map based filter to reject the shadow contribution in the vertical projection. Figure 2 shows an overview of the lip detection process.
Fig. 2. Lip center detection process: eye centers → rough lip region → threshold adaptation → lip region binarization → lip center estimation → lip center & lip width.

1. Eye localization
In our previous work, we proposed a robust eye localization algorithm [8] applicable to a lip reading system. The image is first converted to the YCbCr color space. Then it is binarized using a thresholding technique. As a result, we segment eye candidate regions such as eyes, eyebrows, nostrils, lips, spectacles, hair, ears, and gaps. Geometry characteristics of eyes are then applied to reject non-eye regions and keep possible ones. Finally, we use a GMM to validate the eye candidates and pick the eye pair with the highest probability. Because the eye detection is applied to a lip reading system, near-exact eye centers are acceptable instead of precise eye centers, so the center point of an eye region can be taken as the eye location. The eye localization method is summarized in figure 3.

Fig. 3. Flow chart of eye localization: input RGB image → Y image → eye candidate segmentation → eye validation using GMM → eye centers.

2. Get threshold for lip region by threshold adaptation
In mobile lip-reading applications, the face is assumed to be located at the center of the screen or image. So we can define a center mask (figure 5) and assume that this mask region contains skin information of the face. The threshold (LipTh) is computed from the median intensity value within this mask region. In our study, experience shows that appropriate thresholds and median values are distributed correlatively (figure 4). We approximate this relation with a linear function (equation 1).

Fig. 4. Correlative distribution of median value and lip threshold.
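Returning to the eye localization step above, the GMM-based validation of candidate eye pairs can be sketched in pure Python. The pair features used here (normalized horizontal separation and vertical offset of the two centers) and the single diagonal Gaussian score are illustrative stand-ins for the paper's trained GMM; the mean and variance constants are hypothetical placeholders, not trained values.

```python
import math

# Illustrative model parameters (NOT the paper's trained GMM):
# mean/variance of (horizontal separation ratio, vertical offset ratio).
MEAN = (1.0, 0.0)
VAR = (0.04, 0.01)

def pair_score(left, right, face_width):
    """Score a candidate eye pair with a diagonal Gaussian (stand-in for the GMM)."""
    dx = (right[0] - left[0]) / face_width      # normalized horizontal separation
    dy = (right[1] - left[1]) / face_width      # normalized vertical offset
    score = 1.0
    for f, m, v in zip((dx, dy), MEAN, VAR):
        score *= math.exp(-(f - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    return score

def best_pair(candidates, face_width):
    """Pick the candidate pair with the highest probability, as in the validation step."""
    pairs = [(l, r) for l in candidates for r in candidates if r[0] > l[0]]
    return max(pairs, key=lambda p: pair_score(p[0], p[1], face_width))
```

For example, among the candidate centers `[(30, 40), (90, 42), (60, 80)]` with an assumed face width of 60 pixels, the horizontally separated, nearly level pair scores highest.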
LipTh = 0.584 ∗ Median + 15.10    (1)
Fig. 5. Center mask.
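As a sketch of how equation 1 is applied, the following pure-Python function computes the median intensity of the center-mask pixels and maps it to the standard lip threshold. Taking the flattened list of mask pixels as input is an implementation assumption.

```python
def lip_threshold(mask_pixels):
    """Estimate the standard lip threshold (equation 1) from center-mask intensities."""
    vals = sorted(mask_pixels)
    n = len(vals)
    # Median of the mask intensities (mean of the two middle values if n is even).
    median = vals[n // 2] if n % 2 else (vals[n // 2 - 1] + vals[n // 2]) / 2
    return 0.584 * median + 15.10
```

For a mask whose median intensity is 100, the standard lip threshold comes out as 0.584 × 100 + 15.10 = 73.5.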
3. Crop rough lip region based on facial geometry properties
In faces, the left and right eyes are symmetrical about a vertical axis that usually passes through the nose and mouth. The inner distance between the two eyes is approximately equal to the lip width; also, the outer distance is approximately equal to the distance from the midpoint of the two eyes to the mouth center (figure 6). We can take advantage of these characteristics to determine the potential region containing the mouth. Figure 7 shows a typical rough lip region.
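A minimal sketch of this geometric crop, assuming only the two eye centers are available: the 1.3 and 0.7 factors that convert the eye-center distance into the outer and inner eye distances, and the widening margin, are illustrative values, not the paper's.

```python
def rough_lip_region(left_eye, right_eye, margin=0.25):
    """Estimate a rough lip bounding box (x0, y0, x1, y1) from the two eye centers.

    Uses the facial geometry relations above: the mouth center lies roughly one
    outer-eye distance below the eye midpoint, and the lip width is about the
    inner-eye distance. The 1.3/0.7 conversion factors are assumptions.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    d = rx - lx                       # distance between eye centers
    outer, inner = 1.3 * d, 0.7 * d   # assumed outer/inner eye distances
    cx, cy = (lx + rx) / 2, (ly + ry) / 2
    mouth_y = cy + outer              # mouth center below the eye midpoint
    half_w = inner * (0.5 + margin)   # widen the box by a safety margin
    half_h = inner * 0.5
    return (cx - half_w, mouth_y - half_h, cx + half_w, mouth_y + half_h)
```

With eye centers at (0, 0) and (100, 0), the sketch places the mouth center 130 pixels below the midpoint and returns a box roughly 105 pixels wide.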
Fig. 6. Distance relation of eye and mouth.
Fig. 7. Rough cropped lip region.

4. Binarize lip region using adaptive threshold technique
Illumination change is one of the main difficulties in image processing and computer vision problems. In our proposed approach, an adaptive threshold technique is introduced to binarize the lip region (LR). This method can deal with various lighting conditions. In outdoor environments especially, light may come from the left, for example, so the left part of a face is usually brighter than the right. The lip region is therefore divided into two parts: the left region (LLR) and the right region (RLR). The threshold for each part (LTh and RTh) is computed from the threshold LipTh acquired in the previous step. We then use these threshold values to binarize the left and right lip regions with a simple logical operator (equations 2, 3, 4):

LBL = LLR < LTh  (LBL: Left Binarized Lip)  (2)
RBL = RLR < RTh  (RBL: Right Binarized Lip)  (3)
BL = LBL ∪ RBL  (BL: Binarized Lip)  (4)

We apply this binarization operator iteratively until the stop condition is met. The stop criterion is the ratio of the number of foreground (1) pixels, A, to the total number of pixels in the lip region, T. Our experience shows that this ratio should lie in the range from 0.02 to 0.5 in order to get a good binarization result (figure 10). Figures 8 and 9 explain our proposed binarization algorithm: if the ratio A/T falls below the lower bound, LipTh is increased by 10; if it exceeds the upper bound, LipTh is decreased by 10.

Fig. 8. Adaptive lip region binarization algorithm.
Compute the median values of the left (LM) and right (RM) lip regions.
If abs(LM - RM) < 20 then
    LTh = LipTh; RTh = LipTh;
Else
    If LM > RM then LTh = LipTh - a; RTh = LipTh + a; EndIf
    If LM < RM then LTh = LipTh + a; RTh = LipTh - a; EndIf   // a: constant
EndIf
Fig. 9. Algorithm for getting thresholds for the left and right lip regions.
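The two-step adaptation (equations 2-4 with the side-threshold rule of figure 9 and the iterative adjustment of figure 8) can be sketched as follows in pure Python. The stop bounds (0.05 and 0.2), the step of 10, and the constant a = 10 follow the figure-8 flow as we read it and are tunable assumptions rather than definitive values.

```python
def side_thresholds(lm, rm, lip_th, a=10):
    """Figure 9: derive left/right thresholds from the side medians LM and RM."""
    if abs(lm - rm) < 20:
        return lip_th, lip_th
    if lm > rm:
        return lip_th - a, lip_th + a
    return lip_th + a, lip_th - a

def binarize_lip(llr, rlr, lip_th, lo=0.05, hi=0.2, step=10, max_iter=20):
    """Equations 2-4 with the iterative adjustment of figure 8.

    llr/rlr are lists of grayscale rows; pixels below the side threshold become 1.
    The foreground ratio A/T is pushed into [lo, hi] by shifting LipTh by `step`.
    """
    def median(vals):
        s = sorted(vals)
        return s[len(s) // 2]

    lm = median([p for row in llr for p in row])
    rm = median([p for row in rlr for p in row])
    for _ in range(max_iter):
        lth, rth = side_thresholds(lm, rm, lip_th)
        # BL = LBL ∪ RBL: concatenate the two binarized halves row by row.
        bl = [[1 if p < lth else 0 for p in lrow] + [1 if p < rth else 0 for p in rrow]
              for lrow, rrow in zip(llr, rlr)]
        a = sum(map(sum, bl))            # number of foreground pixels (A)
        t = len(bl) * len(bl[0])         # total pixels in the lip region (T)
        if lo <= a / t <= hi:
            break
        lip_th += step if a / t < lo else -step   # figure 8: LipTh ± 10
    return bl
```

Starting from a deliberately low threshold on a tiny 1×4 pair of half-images, the loop raises LipTh until exactly the darkest pixel is selected and the ratio falls inside the accepted band.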
Fig. 11. a: lip region with chin; b: binarized lip; c: vertical projection.
Fig. 10. Lip and binarized lip samples.

5. Estimate lip center using Sobel edge map based filter
The common approach for detecting lip features such as the lip corners, the upper lip, and the lower lip is to use vertical and horizontal projections of the binarized lip region. But in some cases the rough lip region can contain the chin and its shadow (figure 11). This can lead to wrong detection because the chin shadow contributes heavily to both projections. So we propose an edge map based filter to remove this kind of unnecessary information.

The edge map of the lip image is obtained with the Sobel edge detector. As a result, the edge map contains the mouth boundary together with other edges from the nostrils, chin, and so on (figure 13). The chin edges can be removed from the edge map by using curvature information: the curvature of the lip contour is smaller than that of the chin contour, so the near left and right neighbors of chin-edge pixels are usually background pixels. We take advantage of this property to remove edge pixels whose neighbors are background. Figure 12 shows the neighbor pixels defined in our study. For each edge pixel, we examine its left and right neighbor pixels; if they are not edge pixels, the current pixel is changed to background. As a result, we obtain a better edge map without chin edges. This edge map is projected vertically and normalized to create a filter, called the edge map based filter (figure 14). The product of the vertical projection of the original binarized lip image and this filter creates a new projection that better reveals the interesting lip points (figure 16). The vertical position of the lip center is taken as the maximum point of the filtered vertical projection. We can then use this position to crop a potential lip region and detect the two lip corners, and hence the lip width, by horizontal projection (figure 17).

Fig. 12. Neighbor pixels.
Fig. 13. Edge map image.
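The edge-map filtering pipeline of this section can be sketched in pure Python: a Sobel edge map, removal of edge pixels whose left/right neighbors are background (the chin-edge rule of figure 12), and the filtered vertical projection whose maximum gives the lip center row. The gradient threshold and the neighbor offset are assumed values.

```python
def sobel_edges(img, thresh=100):
    """Binary edge map from the Sobel gradient magnitude (pure-Python sketch)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            if abs(gx) + abs(gy) > thresh:
                edges[y][x] = 1
    return edges

def drop_chin_edges(edges, offset=2):
    """Remove edge pixels whose left and right neighbors (figure 12) are background."""
    h, w = len(edges), len(edges[0])
    out = [row[:] for row in edges]
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                left = x >= offset and edges[y][x - offset]
                right = x + offset < w and edges[y][x + offset]
                if not (left or right):
                    out[y][x] = 0      # isolated horizontally: treat as chin edge
    return out

def filtered_projection(binarized, edges):
    """Vertical projection of the binarized lip weighted by the edge-map filter."""
    proj = [sum(row) for row in binarized]             # row-wise foreground counts
    filt = [sum(row) for row in edges]                 # row-wise edge counts
    peak = max(filt) or 1                              # normalize the filter
    return [p * f / peak for p, f in zip(proj, filt)]

def lip_center_row(binarized, edges):
    """Row index of the lip center: the maximum of the filtered projection."""
    proj = filtered_projection(binarized, edges)
    return max(range(len(proj)), key=proj.__getitem__)
```

On a toy image with a vertical intensity boundary, the Sobel sketch marks the interior boundary pixels, and an isolated edge pixel with background on both sides is discarded by the chin-edge rule.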
Fig. 14. a: edge map; b: filter.
Fig. 15. Normal vertical projection.
Fig. 16. a: filter; b: normal projection; c: filtered projection.
Fig. 17. Horizontal projection of cropped lip region.

III. EXPERIMENTAL RESULTS
In our research, we recorded videos of 105 different persons in standard, indoor, and outdoor environments. For each person we recorded 110 videos, each associated with one of 110 spoken words. This data is used as training data for the eye GMM model and also to validate our approach for both eye localization and lip detection. In the outdoor videos, the lighting changes greatly and comes from many directions, for example from the left, the right, or above (figure 18). Table 1 shows the detection rates, and figures 19-21 show typical detection results for different lip states and environments.

Table 1. Detection rate (number of samples: 105).
Standard DB: 97.14%
Indoor DB: 95.24%
Outdoor DB: 94.29%

Fig. 18. Some outdoor samples with different lighting conditions.
Fig. 19. Detection results - Standard DB (closed, semi-opened, and opened mouth).
Fig. 20. Detection results - Indoor DB (closed, semi-opened, and opened mouth).
Fig. 21. Detection results - Outdoor DB (closed, semi-opened, and opened mouth).

IV. CONCLUSION
In this study, we proposed a simple algorithm for lip center detection applicable to lip reading systems in mobile environments. The eye centers are first localized, and the geometry properties of the face are then used to extract a rough lip region. A feasible threshold adaptation technique was introduced to binarize the left and right lip regions separately. We finally estimate the lip center using a Sobel edge map based filter integrated with vertical and horizontal projections. This technique can overcome the drawbacks of illumination changes. Experiments show that our approach achieves good detection rates under various lighting conditions and across many skin colors.

REFERENCES
[1] K. Mase, A. Pentland, "Automatic Lipreading by Optical Flow Analysis," Systems and Computers in Japan, pp. 67-76, 1991.
[2] E. D. Petajan, "Automatic Lipreading to Enhance Speech Recognition," Proceedings of IEEE Communications Society Global Telecommunications Conference, 1984.
[3] D. G. Stork, G. Wolff, E. Levine, "Neural Network Lipreading System for Improved Speech Recognition," Proceedings of IJCNN, 1992.
[4] P. Duchnowski, U. Meier, A. Waibel, "See Me, Hear Me: Integrating Automatic Speech Recognition and Lipreading," Proceedings of ICSLP, 1994.
[5] C. Bregler, H. Hild, S. Manke, A. Waibel, "Improving Connected Letter Recognition by Lipreading," Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 557-560, 1993.
[6] N. Eveno, A. Caplier, P. Y. Coulon, "A New Color Transformation for Lips Segmentation," Proceedings of IEEE Fourth Workshop on Multimedia Signal Processing, pp. 3-8, 2001.
[7] R. Stiefelhagen, U. Meier, J. Yang, "Real-Time Lip-Tracking for Lip Reading," Proceedings of Eurospeech 97, 5th European Conference on Speech Communication and Technology, 1997.
[8] T. T. Pham, J. Y. Kim, S. Y. Na, S. T. Hwang, "Robust Eye Localization for Lip Reading in Mobile Environment," Proceedings of SCIS&ISIS, Japan, 2008.