Video-Based Face Detection Using Dynamic Template Matching

A Thesis Submitted to the Council of the Faculty of Science and Science Education, School of Science, University of Sulaimani, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science
By
Yusra Ahmed Salih Higher Diploma in Computer Science, 2008 Supervised By
Dr. Aree Ali Mohammed Lecturer
January 2012
Rêbendan 2711
Dedication I dedicate this work to: My lovely family My dear parents My loving sisters and brothers My faithful friends, the companions of the long road … With my love Yusra A. Salih
Acknowledgements Praise be to Allah for His grace in allowing me to finish this work. First of all, I would like to express my deepest gratitude and appreciation to my supervisor, Dr. Aree Ali Mohammed, for his valuable advice, academic guidance, and patience throughout my thesis preparation at the University of Sulaimani. I would like to thank the Dean of the Computer Science Institute, Dr. Soran A. Mohammed, and the Head of the Network Department, Mr. Harith Raad. I would like to thank the Dean of the Faculty of Science and Education Sciences, University of Sulaimani, and the Head of the Computer Science Department, Dr. Kamaran Hama Ali. I would like to express my sincere gratitude to Prof. Dr. Astrid Laubenheimer for her invaluable guidance, encouragement, and important feedback throughout this thesis. A great deal of thanks goes to Dr. Joachim Lembach, Director of the International Office, Mr. Hans Wünstel, and Mr. Rebwar Omer for their kindness and support during my stay in Germany. I wish to express my gratitude to the members of the LFM Laboratory and the students of the Information Technology Department, Karlsruhe University of Applied Sciences, Germany, for participating in our test videos and face database. It is impossible to remember everyone, and I apologize to those I have inadvertently left out. Lastly, thank you all!
Abstract Video-based human face processing techniques, including face detection, tracking, and recognition, have attracted a great deal of research interest because of their value in applications such as video surveillance, structuring, indexing, retrieval, and summarization. To improve the results of the Viola-Jones face detection method on video, different algorithmic strategies are designed and implemented. The first improvement minimizes type I false positive alarms using a manual thresholding algorithm. To reduce missed detections (false negatives), template-matching-based face tracking is used. Finally, a dynamic thresholding algorithm is applied to minimize type II false positive alarms. The detection time of the hybrid detection and tracking phases is optimized by embedding and implementing a region of interest approach. In this thesis, a hybrid face detection and tracking scheme based on dynamic template matching is proposed. At first, detection is performed in each frame, and then the tracker is applied to the detected faces in subsequent frames. The test results of the proposed face detection method on two different video scenarios indicate that the hybrid face detector and tracker performs much better than the Viola-Jones detector in terms of detection rate and detection time. For optimal parameters, the detection rate of the proposed hybrid method is 97.2% and the average detection time is 151.217 ms. The disadvantage of the proposed FDM is that it is not applicable to video scenes involving fast person movement or varying lighting environments, and it does not handle multiple facial positions.
Table of Contents

Chapter One : General Introduction
1.1 Introduction
1.2 Face Detection
1.3 Object Tracking
1.4 Face Recognition
1.5 Literature Survey
1.6 Aim of the Thesis
1.7 Research Limitation
1.8 Thesis Layout

Chapter Two : Face Detection, Tracking and Recognition Concepts
2.1 Introduction
2.2 Still Image Face Detection
2.3 Face Detection Methods
2.3.1 Appearance-Based Methods
2.4 The Viola-Jones Face Detector Method
2.4.1 Integral Image
2.4.2 The Modified AdaBoost Algorithm
2.4.3 The Cascaded Classifier
2.4.4 Implementation of the Viola-Jones Detector using OpenCV
2.5 Face Tracking
2.5.1 Basic Mathematical Framework
2.5.2 Face Tracking Methods
2.5.2.1 Template Matching Approach
2.6 Face Recognition
2.6.1 Face Recognition Techniques
2.6.1.1 Principal Component Analysis (PCA)
2.6.1.2 Mathematical Theory
2.6.2 Face Recognition Applications

Chapter Three : Design and Implementation of the Proposed Face Detection Method
3.1 Introduction
3.2 Face Detector Scheme
3.3 Face Tracking Scheme
3.3.1 False Positive Reduction
3.3.1.1 Manual Threshold Algorithm
3.3.1.2 Dynamic Threshold Algorithm
3.3.2 False Negative Reduction
3.4 Face Recognition Scheme

Chapter Four : Test Evaluation
4.1 Introduction
4.2 Video Scenarios Description
4.3 Test Results
4.3.1 Effect of Involved Parameters (Scenario 1)
4.3.1.1 Window Size
4.3.1.2 Neighbor Threshold
4.3.1.3 Scale Factor
4.3.1.4 Result Discussion
4.3.2 Effect of Involved Parameters (Scenario 2)
4.3.2.1 Window Size
4.3.2.2 Neighbor Threshold
4.3.2.3 Scale Factor
4.3.2.4 Scenario 2 Results Evaluation
4.4 Improved Detection Rates (Scenario 1)
4.4.1 Reduce False Positive Type I Using Manual Thresholding
4.4.2 Reduce False Negative Using Template Matching
4.4.3 Reduce False Positive Type II Using Dynamic Thresholding
4.4.4 Improvement of Detection Time
4.4.4.1 Face Detection Time
4.4.4.2 Face Tracking Time
4.5 Face Recognition

Chapter Five : Conclusions and Suggestions for Future Work
5.1 Conclusions
5.2 Suggestions for Future Work

References
List of Figures
1.1 Video based face recognition system
2.1 General scheme for face detection methods
2.2 The integral image
2.3 Sum calculation
2.4 The different types of features
2.5 The classifier cascade
2.6 Steps of the Haar training process
2.7 Classification of face recognition methods
2.8 a) An example of dominant Eigenfaces; b) An average face
2.9 Calculating Eigenfaces
2.10 Face verification procedure using Mt Eigenfaces
3.1 General block diagram of the proposed FDM
3.2 Face detection scheme
3.3 Face detection process
3.4 Face tracking process
3.5 False positive alarms
3.6 False positive alarm between two successive frames
3.7 Window's face width values after running the face detector
3.8 Average window's face width
3.9 Manual threshold process
3.10 Unsolved types of false positive alarms
3.11 Euclidian distance measure
3.12 Dynamic threshold algorithm
3.13 Template matching based tracking
3.14 The framework of face recognition process
4.1 Video scenario 1
4.2 Video scenario 2
4.3 Neighbor threshold setting to zero
4.4 DR versus WS
4.5 DR versus NT
4.6 DR versus SF
4.7 Training phase in face recognition system
4.8 Test phase in face recognition system
List of Tables
4.1 Video scenarios description
4.2 Effect of involved parameters
4.3 Effect of window size on detection rate and time (Scale factor = 1.2 and Neighbor threshold = 1)
4.4 Effect of window size on detection rate and time (Scale factor = 2.2 and Neighbor threshold = 2)
4.5 Effect of window size on detection rate and time (Scale factor = 3.2 and Neighbor threshold = 3)
4.6 Effect of neighbor threshold on detection rate and time (Scale factor = 1.2 and Window size = 20×20)
4.7 Effect of neighbor threshold on detection rate and time (Scale factor = 2.2 and Window size = 30×30)
4.8 Effect of neighbor threshold on detection rate and time (Scale factor = 3.2 and Window size = 40×40)
4.9 Effect of scale factor on detection rate and time (Neighbor threshold = 1 and Window size = 20×20)
4.10 Effect of scale factor on detection rate and time (Neighbor threshold = 2 and Window size = 30×30)
4.11 Effect of scale factor on detection rate and time (Neighbor threshold = 2 and Window size = 40×40)
4.12 Effect of window size on detection rate and time (Scale factor = 1.2 and Neighbor threshold = 1)
4.13 Effect of window size on detection rate and time (Scale factor = 2.2 and Neighbor threshold = 2)
4.14 Effect of window size on detection rate and time (Scale factor = 3.2 and Neighbor threshold = 3)
4.15 Effect of neighbor threshold on detection rate and time (Scale factor = 1.2 and Window size = 20×20)
4.16 Effect of neighbor threshold on detection rate and time (Scale factor = 2.2 and Window size = 30×30)
4.17 Effect of neighbor threshold on detection rate and time (Scale factor = 3.2 and Window size = 40×40)
4.18 Effect of scale factor on detection rate and time (Neighbor threshold = 1 and Window size = 20×20)
4.19 Effect of scale factor on detection rate and time (Neighbor threshold = 2 and Window size = 30×30)
4.20 Effect of scale factor on detection rate and time (Neighbor threshold = 3 and Window size = 40×40)
4.21 False positive type I reduction
4.22 False negative reduction
4.23 False positive type II reduction
4.24 Detection time with and without ROI
4.25 Tracking time with and without ROI
List of Abbreviations
ATM
Automated Teller Machine
CCTV
Closed Circuit Television
DNA
Deoxyribonucleic Acid
DR
Detection Rate
DT
Detection Time
EGM
Elastic Graph Matching
FDM
Face Detection Method
FN
False Negative
FP
False Positive
GDT
Generalized Distance Transform
HMM
Hidden Markov Model
ICA
Independent Component Analysis
LDA
Linear Discriminant Analysis
NT
Neighbor Threshold
OM
Orientation Map
OpenCV
Open Source Computer Vision
PCA
Principal Component Analysis
PIN
Personal Identification Number
ROI
Region of Interest
SAD
Sum Absolute Difference
SDM
Square Difference Measure
SF
Scale Factor
SIM
Subscriber Identification Module
SSD
Sum Squared Difference
SVM
Support Vector Machine
TP
True Positive
WS
Window Size
XML
Extensible Markup Language
Chapter 1
General Introduction
Chapter One General Introduction 1.1 Introduction The use of different biometric systems is becoming commonplace in our society. To determine a person's identity, several characteristics can be analyzed: for instance, physical features such as fingerprints, iris, retina, face, hand geometry, hand veins, and DNA, and behavioral features such as gait or signature. Face recognition has received significant attention during the last two decades, and many researchers study its various aspects. There are at least two reasons for this: the first is a wide range of commercial and security applications, and the second is the availability of feasible computer technology to develop and implement applications that demand strong computational power. Today, automatic recognition of human faces is a field that gathers researchers from many disciplines, such as image processing, pattern recognition, computer vision, graphics, and psychology [Mar, 09]. Human face research mainly includes human face recognition and human face detection. Human face recognition is defined as identifying or verifying a person from a digital still image or video image, while human face detection is defined as determining the locations and sizes of human faces in images.
In order to perform face recognition, one first needs to build a database containing many human face images. To verify an image against the database, the system needs to detect the human face within the image and then analyze the similarity among all the images in the database. Up until now, no one has developed a fully mature human face
recognition system with high accuracy and speed. In any human face processing system, however, the first step is to detect human faces, so face detection became an independent research topic in its own right because of the requirements of face recognition systems [GU, 08]. Figure (1.1) shows the whole process of a video-based face recognition system.
Fig (1.1) Video based face recognition system.
1.2 Face Detection Face detection systems identify faces in images and video sequences using computers. An ideal face detection system should be able to identify and locate all faces regardless of their position, scale, orientation, lighting
conditions, expressions, and so on. Due to the large intra-class variations in facial appearance, face detection has been a challenging problem in the field of computer vision [Jor, 06]. Face detection is the first stage of a face recognition system. A great deal of research has been done in this area, but most of it is efficient and effective only for still images, so it cannot be applied directly to video sequences. In video scenes, a human face can take on unlimited orientations and positions, so its detection poses a variety of challenges to researchers. Generally, there are three main approaches to video-based face detection. The first is frame-based detection; in this approach, many traditional still-image methods can be applied, such as the statistical modeling method [Mog, 97], the neural network-based method [Row, 98], the boosting-based method [Vio, 01], and color-based face detection [Hsu, 02]. However, the main drawback of this approach is that it ignores the temporal information provided by the video sequence. The second approach integrates detection and tracking: the face is detected in the first frame and then tracked through the whole sequence. Since detection and tracking are independent, and information from only one source is used at a time, loss of information is unavoidable. Finally, instead of detecting in each frame independently, the temporal approach exploits temporal relationships between frames to detect multiple human faces in a video sequence. In general, such a method consists of two phases, namely detection and prediction-by-update tracking. This helps to stabilize detection and makes it less sensitive to thresholds compared to the other two detection categories [Wan, 09].
1.3 Object Tracking Object tracking is an important task within the field of computer vision. The proliferation of high-powered computers, the availability of high-quality and inexpensive video cameras, and the increasing need for automated video analysis have generated a great deal of interest in object tracking algorithms. There are three key steps in video analysis: detection of interesting moving objects, tracking of such objects from frame to frame, and analysis of object tracks to recognize their behavior. In its simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene. Additionally, depending on the tracking domain, a tracker can also provide object-centric information, such as the orientation, area, or shape of an object. Object tracking approaches can be separated into three main groups [Yil, 06]:
1. Point Tracking. An object is represented by a number of points, and correspondences between these points over consecutive frames are tracked. The points are combined into a model of the object, and the correspondences are evaluated under a number of constraints, such as motion models. An example of point tracking is the Kalman filter [Bro, 86].
2. Kernel Tracking. The kernel refers to the object's shape and appearance; for example, the kernel can be a rectangular template or an elliptical shape with an associated histogram. This group of object tracking methods computes the motion of an object in order to track it from one frame to the next. The appearance models can be separated into template-based and density-based models. Popular examples of this method are Template Tracking [Hag, 96] and Mean-shift Tracking [Com, 03].
3. Silhouette Tracking. The object is tracked via estimation of the object region in each frame. This can be done by shape matching or contour
evolution. Recent approaches in this group are [Yil, 04] for contour-evolution-based methods and [Kang, 04] for shape matching.
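To make the kernel tracking idea concrete, the sketch below tracks a rectangular template between frames with an exhaustive Sum of Absolute Differences (SAD) search in a small neighborhood of the previous position. This is an illustrative NumPy sketch of the general technique, not the thesis implementation; the function name and parameters are chosen here for illustration.

```python
import numpy as np

def sad_track(frame, template, search_origin, search_radius):
    """Locate `template` in `frame` by exhaustive Sum of Absolute
    Differences (SAD) search around `search_origin` = (row, col).
    Returns the best top-left position and its SAD score."""
    th, tw = template.shape
    r0, c0 = search_origin
    best_score, best_pos = np.inf, (r0, c0)
    for r in range(max(0, r0 - search_radius),
                   min(frame.shape[0] - th, r0 + search_radius) + 1):
        for c in range(max(0, c0 - search_radius),
                       min(frame.shape[1] - tw, c0 + search_radius) + 1):
            patch = frame[r:r + th, c:c + tw].astype(np.int32)
            score = np.abs(patch - template.astype(np.int32)).sum()
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

Restricting the search to a radius around the previous position is what makes tracking cheaper than re-detecting over the whole frame.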
1.4 Face Recognition Face recognition is the ability to establish a person's identity based on facial characteristics. Automated face recognition requires various techniques from different research fields, including computer vision, image processing, pattern recognition, and machine learning. In a typical face recognition system, face images from a number of subjects are enrolled into the system as gallery data, and the face image of a test subject (probe image) is matched to the gallery data using a one-to-one or one-to-many scheme. One-to-one and one-to-many matching are called verification and identification, respectively. Face recognition has a wide range of applications, including law enforcement, civil applications, and surveillance systems. Most facial recognition algorithms demonstrate promising results on still images, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Elastic Graph Matching (EGM). Compared with still images, video can provide more information, such as temporal information. Therefore, video-based face recognition has gained more attention recently [Par, 09].
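The PCA-based recognition mentioned above (the eigenface approach, detailed later in Section 2.6) can be sketched briefly: gallery faces are projected into a low-dimensional subspace, and a probe is identified as the nearest gallery face in that subspace. This is a minimal NumPy illustration of the general idea, with hypothetical function names; it is not the implementation used in the thesis.

```python
import numpy as np

def train_eigenfaces(faces, k):
    """faces: (n_images, n_pixels) matrix of flattened gallery faces.
    Returns the mean face and the top-k eigenfaces (principal axes)."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data yields the principal components directly
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]          # each row of vt[:k] is one eigenface

def project(face, mean, eigenfaces):
    """Weights of a face in the eigenface subspace."""
    return eigenfaces @ (face - mean)

def identify(probe, gallery_weights, mean, eigenfaces):
    """One-to-many matching: index of the nearest gallery face."""
    w = project(probe, mean, eigenfaces)
    dists = np.linalg.norm(gallery_weights - w, axis=1)
    return int(np.argmin(dists))
```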
1.5 Literature Survey A number of studies on face detection and recognition have been published within the last ten years. Some of the relevant published works are listed and annotated below:
1. P. Viola and M. Jones [Vio, 01] presented a machine learning approach for visual object detection which is capable of processing images extremely rapidly while achieving high detection rates. In the domain of face detection, the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
2. Z. Jin et al. [Jin, 05] proposed a face detection algorithm which combines template matching with skin color information, segmenting eye-pair candidates via a linear transformation and making a face/non-face decision. Experiments showed that the proposed algorithm was effective and efficient in detecting frontal faces of different races and under different lighting conditions; however, it fails to detect faces with large pose changes and very small faces, owing to failures in segmenting the eye-pairs.
3. H. Lee and D. Kim [Lee, 07] proposed a robust face tracking method based on the condensation algorithm and importance sampling. They presented two separate trackers which used skin color and facial shape information as the measured cues, respectively. They also proposed an adaptive color model based on the condensation algorithm to handle illumination changes during face tracking. Experiments showed good robustness even when other skin-colored objects or other faces appeared in the image, and even with a cluttered background or changing illumination. Compared with other face tracking methods, which use only skin color as the measurement cue, it tracks better and more robustly.
4. E. Ben-Israel [Isra, 07] experimented with basic computer vision algorithms related to tracking and used them to implement a simple human tracker based on the OpenCV mean-shift tracker. Advances from this
work can be done in two major aspects: (1) automatic generation of the human template for mask generation, and (2) improvement of the tracking phase by choosing a smaller tracking region, possibly based on the segmentation data from the initialization phase.
5. K. Nallaperumal et al. [Nal, 08] proposed a new face detection technique which used a mixed Gaussian color model, adaptive thresholding, and template matching. The proposed face detection algorithm involves three stages: applying the mixed Gaussian color model, the adaptive threshold algorithm, and template matching. Experimental results showed that the algorithm was quite practical and faster in comparison to techniques such as neural networks.
6. H. Ryu et al. [Ryu, 08] proposed a face tracking approach based on template matching for a video indexing application. First, the face template is represented by two projection histograms of the face region, and matching methods are used to determine the candidate face region. Next, the facial features are extracted from the candidate face region along with a dissimilarity measure. The template used to determine the face region in the next frame is dynamically updated using the refined face region. Thus, the proposed method can adapt to scale changes with less computational cost.
7. Y. W. Wu and X. Y. Ai [Wu, 08] proposed a method to improve the performance of face detection in color images. This improvement is achieved by integrating the AdaBoost learning algorithm with skin color information. The complete system was tested on a variety of color images and compared with other relevant methods. Experimental results show that the
proposed system leads to competitive results and improves detection performance substantially.
8. N. D. Thanh et al. [Tha, 09] proposed a novel weighted template matching method employing a generalized distance transform (GDT) and an orientation map (OM). Based on this matching method, a two-stage human detection method consisting of template matching and Bayesian verification was developed. Experimental results showed that the proposed method can effectively reduce the false positive and false negative detection rates.
9. D. Chen et al. [Che, 10] proposed an approach using a skin-color model to integrate the AdaBoost algorithm, and then a subsidiary discriminant function to improve the detection process. They used the theory of Haar-like features, the integral image, and the AdaBoost algorithm proposed by Paul Viola, and then researched its improvement. Experimental results showed that the improved algorithm can detect faces in real time with a higher success rate and a lower false alarm rate.
10. Q. Zhao and H. Cai [Zha, 10] first used background-image differencing and the Kalman filter to track and extract the region of the human body, and then used the AdaBoost algorithm to detect the human face within that region. Finally, an improved Hidden Markov Model, named the Pseudo-two-dimensional Hidden Markov Model, was used for face image feature extraction and recognition. Experiments demonstrated that the method could extract the moving human body in the video and then detect and recognize the face effectively, and also showed that recognition was impacted by the lighting when the video was captured. In addition, the
speed of human motion is one of the influencing factors: movement that is too fast or too slow can lead to failure in extracting the body contour.
11. K. Ramirez et al. [Ram, 11] proposed a face recognition and verification algorithm based on histogram equalization to standardize the face illumination, thereby reducing the variations prior to feature extraction, using the image phase spectrum and principal component analysis, which allow the reduction of the data dimension without much information loss. Evaluation results showed the desirable features of the proposed scheme, reaching a recognition rate over 97% and a verification error lower than 0.003%.
1.6 Aim of the Thesis The aim of this research is to develop and implement an efficient real-time video-based face detection method. First, a face is detected in each frame using a machine learning algorithm, and then the face is tracked using dynamic template matching. The moving objects (faces) within the video scenes must be detected and tracked with minimal false positive and false negative alarms. This minimization leads to an improved face detection rate.
1.7 Research Limitation Although video-based face detection is a difficult problem in general, it can be addressed by imposing some limiting conditions, as follows: 1) The user is cooperative for identification: the user walks up to the camera and stands facing it, as required for creating the image database for single-face recognition.
2) Only limited variations in tilt and in-depth rotation from a frontal face are considered.
1.8 Thesis Layout In addition to the current chapter, this thesis comprises four further chapters: Chapter two, “Face Detection, Tracking and Recognition Concepts”, presents the theoretical background of face recognition and face detection. Moreover, object tracking, the Haar-like feature method, the Principal Component Analysis algorithm, and the dynamic template matching algorithm are detailed. Chapter three, “Design and Implementation of the Proposed Face Detection Method”, describes the design steps of face detection in video through block diagrams and flowcharts. The designed system is then applied in a real-time application. Chapter four, “Test Evaluation”, presents the test results reflecting the accuracy of the proposed face detection method. The test material consisted of two videos produced with invariant illumination and limited tilt and in-depth rotation from a frontal face. Chapter five, “Conclusions and Suggestions for Future Work”, is devoted to a presentation of some of the conclusions derived from the test results. The chapter also contains some suggestions for future work.
Chapter Two Face Detection, Tracking and Recognition Concepts 2.1 Introduction Face sequence detection can be divided into two main phases: face detection and face tracking. Most related studies address only one of the two topics, or combine them building on previous research. Face detection is usually performed on still images, and many tracking algorithms are presented with manually given initial object locations and focus on following the track of a single object. Thus, combining these two approaches is not as straightforward as it might seem, and it has some special demands that need to be taken into account when designing a full sequence detection algorithm [Sar, 10]. This chapter describes the methods and theory used to carry out the implementation of our video-based face recognition system in three parts. The first part describes the methods used to implement the Viola-Jones detector. The second part describes the principle of the template matching method, and the third part concerns the mathematical theory behind principal component analysis for face recognition.
2.2 Still Image Face Detection Given an arbitrary image, the aim of face detection is to determine whether there are any faces in the image and, if present, return the image location and extent of each
face [Yang, 02]. Face detection has a number of applications: it can be part of a face recognition system, a surveillance system, or a video-based human-computer interface. Efficient face detection at frame rate is an impressive goal; it is analogous to face tracking that requires no knowledge of previous frames. Fast face detection has an apparent application to practical face tracking in the sense that it can be used to initialize tracking [Sil, 05]. Given a single image, the goal of face detection is to identify all image regions which contain a face, regardless of position, orientation, and lighting conditions. Such a problem is challenging because faces are non-rigid and have a high degree of variability in scale, location, orientation (upright, rotated), and pose (frontal, profile). Facial expression, occlusion, and lighting conditions also change the overall appearance of faces. Two important characteristics of a trained face detector are its detection and error rates. The detection rate is defined as the ratio between the number of faces correctly detected and the number of faces determined by a human. In general, two types of errors can occur: false negatives, in which faces are missed, resulting in a low detection rate; and false positives, in which an image region is declared to be a face but is not. The detection and false positive rates are normally related, since one can often tune the parameters of the detection system to increase the detection rate while also increasing the number of false detections [Jor, 06].
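These definitions can be made concrete with a small scoring routine. The sketch below uses hypothetical function names, and the intersection-over-union matching criterion is an assumption of this illustration (the text defines only the rate itself): a detection counts as a true positive when it sufficiently overlaps a ground-truth face.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def detection_rate(detections, ground_truth, thresh=0.5):
    """DR = faces correctly detected / faces present.  A detection is a
    true positive if it overlaps a ground-truth face with IoU >= thresh;
    detections matching no face are counted as false positives."""
    tp = sum(1 for g in ground_truth
             if any(iou(d, g) >= thresh for d in detections))
    fp = sum(1 for d in detections
             if all(iou(d, g) < thresh for g in ground_truth))
    return tp / len(ground_truth), fp
```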
2.3 Face Detection Methods The existing techniques to detect faces from a single intensity or color image are divided into four major categories:
1. Knowledge-based methods.
2. Feature invariant approaches (facial features, texture, skin color, and multiple features).
3. Template matching methods (predefined face templates and deformable templates).
4. Appearance-based methods (Eigenface, distribution-based, Neural Network, Support Vector Machine (SVM), and Hidden Markov Model (HMM)).
Figure (2.1) summarizes these methods.
Fig (2.1) General scheme for face detection methods.
2.3.1 Appearance-Based Methods The “templates” in appearance-based methods are learned from example images. These methods rely on techniques from statistical analysis and machine learning to find the relevant characteristics of face and non-face images. The learned characteristics take the form of distribution models that are subsequently used for face detection [Sil, 05].
A popular method for appearance-based face detection is to use Haar-like features and a trained classifier. The idea is to scan a sub-window capable of detecting faces across a given input image. The standard image processing approach would be to rescale the input image to different sizes and then run a fixed-size detector over these images. Viola and Jones instead devised a scale-invariant detector that requires the same number of calculations whatever the window size.
2.4 The Viola-Jones Face Detector Method
This is a widely used algorithm with a strong mathematical base. Viola and Jones were the first to develop the Haar cascade object detector [Vio, 01], which was later improved by Rainer Lienhart [Lie, 02]. The idea is to first train a classifier with a number of sample views of an object. The next sections elaborate on this detector.
2.4.1 Integral Image
The first step of the Viola-Jones face detection algorithm is to turn the input image into an integral image. This is done by making each pixel equal to the sum of all pixels above and to the left of the concerned pixel. This is demonstrated in Figure (2.2).
Fig (2.2) The integral image
This allows for the calculation of the sum of all pixels inside any given rectangle using only four values. These values are the pixels in the integral image that coincide with the corners of the rectangle in the input image. This is demonstrated in Figure (2.3). Sum of grey rectangle = D - (B + C) + A
Fig (2.3) Sum calculation.
Since both rectangles B and C include rectangle A, the sum of A has to be added back to the calculation. It has now been demonstrated how the sum of pixels within rectangles of arbitrary size can be calculated in constant time.
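The integral image and the four-corner rectangle sum can be sketched in a few lines of Python (a minimal illustration for this chapter, not the thesis implementation; the 3x3 image is made up):

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1], padded with a zero row/column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h,
    using only the four corner values: D - (B + C) + A."""
    A = ii[y][x]
    B = ii[y][x + w]
    C = ii[y + h][x]
    D = ii[y + h][x + w]
    return D - (B + C) + A

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 3, 3))   # 45, the sum of all pixels
print(rect_sum(ii, 1, 1, 2, 2))   # 5 + 6 + 8 + 9 = 28
```

Whatever the rectangle size, the sum costs exactly four look-ups, which is what makes the feature evaluation below so cheap.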
The Viola-Jones face detector analyzes a given sub-window using features consisting of two or more rectangles. The different types of features are shown in Figure (2.4).
Fig (2.4) The different types of features, Type 1 through Type 5 [Ara, 08].
Each feature results in a single value which is calculated by subtracting the sum of the white rectangle(s) from the sum of the black rectangle(s). Viola and Jones empirically found that a detector with a base resolution of 24x24 pixels gives satisfactory results. When allowing for all possible sizes and positions of the features in Figure (2.4), a total of approximately 160,000 different features can be constructed. Thus, the number of possible features vastly outnumbers the 576 pixels contained in the detector at base resolution. These features may seem overly simple for such an advanced task as face detection, but what the features lack in complexity they most certainly make up for in computational efficiency. One could understand the features as the computer's way of perceiving an input image, the hope being that some features will yield large values when on top of a face. Of course, operations could also be carried out directly on the raw pixels, but the variation due to different pose and individual characteristics would be expected to hamper this approach. The goal is now to smartly construct a mesh of features capable of detecting faces, and this is the topic of the next section.
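As an illustration of how a single feature value is obtained, the following sketch evaluates a hypothetical Type 1 two-rectangle feature by direct summation (in the real detector each rectangle sum would be four integral-image look-ups; the patch values are made up):

```python
def region_sum(img, x, y, w, h):
    """Naive rectangle sum; stands in for the integral-image look-up."""
    return sum(img[y + j][x + i] for j in range(h) for i in range(w))

def two_rect_feature(img, x, y, w, h):
    """Hypothetical Type 1 edge feature: a white rectangle on the left and a
    black one on the right, each w//2 wide. Value = white sum - black sum."""
    half = w // 2
    white = region_sum(img, x, y, half, h)
    black = region_sum(img, x + half, y, half, h)
    return white - black

# A bright-left / dark-right patch yields a large positive response.
patch = [[9, 9, 1, 1],
         [9, 9, 1, 1]]
print(two_rect_feature(patch, 0, 0, 4, 2))  # 36 - 4 = 32
```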
2.4.2 The Modified AdaBoost Algorithm As stated above, approximately 160,000 feature values can be calculated within a detector at base resolution. Among all these features some few are expected to give almost consistently high values when on top of a face. In order to find these features Viola-Jones use a modified version of the AdaBoost algorithm developed by Freund and Schapire [Bis, 06]. AdaBoost is a machine learning boosting algorithm capable of constructing a strong classifier through a weighted combination of weak classifiers. (A weak classifier classifies correctly in only a little bit more than half the cases.) To match this terminology to the presented theory, each feature is considered to be a potential weak classifier. A weak classifier is mathematically described as:
$$h(x, f, p, \theta) = \begin{cases} 1 & \text{if } p f(x) < p \theta \\ 0 & \text{otherwise} \end{cases} \qquad (2.1)$$

where $x$ is a 24x24 pixel sub-window, $f$ is the applied feature, $p$ the polarity and $\theta$ the threshold that decides whether $x$ should be classified as a positive (a face) or a negative (a non-face). Since only a small number of the possible 160,000 features are expected to be potential weak classifiers, the AdaBoost algorithm is modified to select only the best features. Viola-Jones' modified AdaBoost algorithm is presented in pseudo code [Vio, 04]:

Given example images $(x_1, y_1), \ldots, (x_m, y_m)$ where $y_i = 0, 1$ for negative and positive examples respectively. Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of negative and positive examples respectively.

For $t = 1, \ldots, T$:
1. Normalize the weights, $w_{t,i} \leftarrow w_{t,i} / \sum_{j=1}^{n} w_{t,j}$, so that $w_t$ is a probability distribution.
2. For each feature $j$, train a classifier $h_j$ which is restricted to using a single feature. The error is evaluated with respect to $w_t$: $\epsilon_j = \sum_i w_i \left| h_j(x_i) - y_i \right|$.
3. Choose the classifier $h_t$ with the lowest error $\epsilon_t$.
4. Update the weights: $w_{t+1,i} = w_{t,i} \beta_t^{1 - e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$.

The final strong classifier is:

$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \geq \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$

where $\alpha_t = \log \frac{1}{\beta_t}$.

An important part of the modified AdaBoost algorithm is the determination of the best feature, polarity and threshold. There seems to be no smart solution to this problem and Viola-Jones suggest a simple brute force method.
This means that the determination of each new weak classifier
involves evaluating each feature on all the training examples in order to find the best performing feature. This is expected to be the most time consuming part of the training procedure. The best performing feature is chosen based on the weighted error it produces. This weighted error is a function of the weights belonging to the training examples. As seen in part 4 of the above algorithm the weight of a correctly classified example is decreased and the
weight of a misclassified example is kept constant. As a result it is more 'expensive' for the second feature (in the final classifier) to misclassify an example also misclassified by the first feature, than an example classified correctly. An alternative interpretation is that the second feature is forced to focus harder on the examples misclassified by the first. The point being that the weights are a vital part of the mechanics of the AdaBoost algorithm. With the integral image, the computationally efficient features and the modified AdaBoost algorithm in place, it seems like the face detector is ready for implementation [Jen, 08].
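The pseudo code and the weight-update mechanics above can be made concrete with a toy AdaBoost run on scalar feature values (a sketch only; the feature values, labels and number of rounds are fabricated and this is not the trained face classifier):

```python
import math

def train_stump(xs, ys, w):
    """Brute-force the best threshold and polarity for one scalar feature,
    mirroring steps 2-3 of the pseudo code."""
    best = None
    for theta in xs:
        for p in (1, -1):
            preds = [1 if p * x < p * theta else 0 for x in xs]
            err = sum(wi for wi, pr, y in zip(w, preds, ys) if pr != y)
            if best is None or err < best[0]:
                best = (err, theta, p)
    return best

def adaboost(xs, ys, T):
    m, l = ys.count(0), ys.count(1)
    w = [1 / (2 * m) if y == 0 else 1 / (2 * l) for y in ys]
    stumps = []
    for _ in range(T):
        s = sum(w); w = [wi / s for wi in w]        # step 1: normalize
        err, theta, p = train_stump(xs, ys, w)      # steps 2-3
        err = max(err, 1e-10)                       # avoid division by zero
        beta = err / (1 - err)
        alpha = math.log(1 / beta)
        preds = [1 if p * x < p * theta else 0 for x in xs]
        w = [wi * (beta if pr == y else 1.0)        # step 4: update weights
             for wi, pr, y in zip(w, preds, ys)]
        stumps.append((alpha, theta, p))
    return stumps

def strong_classify(stumps, x):
    score = sum(a for a, theta, p in stumps if p * x < p * theta)
    return 1 if score >= 0.5 * sum(a for a, _, _ in stumps) else 0

xs = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]   # toy feature values
ys = [1, 1, 1, 0, 0, 0]                # "faces" have small values here
clf = adaboost(xs, ys, T=3)
print([strong_classify(clf, x) for x in xs])  # [1, 1, 1, 0, 0, 0]
```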
2.4.3 The Cascaded Classifier The basic principle of the Viola-Jones face detection algorithm is to scan the detector many times through the same image, each time with a new size. Even if an image contains one or more faces, it is obvious that an excessively large number of the evaluated sub-windows would still be negatives (non-faces). This realization leads to a different formulation of the problem: instead of finding faces, the algorithm should discard non-faces. The thought behind this statement is that it is faster to discard a non-face than to find a face. With this in mind, a detector consisting of only one (strong) classifier suddenly seems inefficient, since the evaluation time is constant no matter the input. Hence the need for a cascaded classifier arises. The cascaded classifier is composed of stages, each containing a strong classifier. The job of each stage is to determine whether a given sub-window is definitely not a face or maybe a face. When a sub-window is classified as a non-face by a given stage it is immediately discarded. Conversely a sub-window classified as a
maybe-face is passed on to the next stage in the cascade. It follows that the more stages a given sub-window passes, the higher the chance the subwindow actually contains a face. The concept is illustrated in Figure (2.5).
Fig (2.5) The classifier cascade
In a single stage classifier one would normally accept false negatives in order to reduce the false positive rate. However, for the first stages in the staged classifier false positives are not considered to be a problem since the succeeding stages are expected to sort them out. Therefore Viola-Jones prescribe the acceptance of many false positives in the initial stages. Consequently the amount of false negatives in the final staged classifier is expected to be very small. Viola-Jones also refers to the cascaded classifier as an attention cascade. This name implies that more attention (computing power) is directed towards the regions of the image suspected to contain faces. It follows that when
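The reject-early behaviour of the cascade can be sketched as follows (the stage tests and thresholds are purely illustrative stand-ins for trained strong classifiers):

```python
def cascade_classify(window, stages):
    """Each stage is a (stage_score, threshold) pair. A sub-window is
    discarded the moment any stage scores it below its threshold; only
    windows passing every stage are reported as possible faces."""
    for stage_score, threshold in stages:
        if stage_score(window) < threshold:
            return False        # definitely not a face: stop early
    return True                 # passed all stages: maybe a face

# Hypothetical stages: a very cheap early test, a slightly stronger later one.
stages = [
    (lambda w: sum(w) / len(w), 2.0),   # stage 1: mean intensity check
    (lambda w: max(w) - min(w), 3.0),   # stage 2: contrast check
]
print(cascade_classify([5, 5, 1, 5], stages))  # passes both stages -> True
print(cascade_classify([1, 1, 1, 1], stages))  # rejected by stage 1 -> False
```

Because most sub-windows fail an early, cheap stage, the expensive later stages run on only a small fraction of the image.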
training a given stage, say n, the negative examples should of course be the false positives generated by stage n-1. The majority of thoughts presented in the 'Methods' sections are taken from the original Viola-Jones paper [Vio, 04]. 2.4.4 Implementation of the Viola-Jones Detector using OpenCV OpenCV is an open source computer vision library originally developed by Intel [Web, 01]. As discussed in the previous section, the "Boosted Cascade of Simple Features" object detection algorithm introduced by Viola and Jones is built into OpenCV and is utilized in this thesis. The steps shown in Figure (2.6) were followed to train the Haar classifier.
Fig (2.6) Steps of the Haar training process.
The first step is to select positive/negative samples out of the training image database. Negative samples are taken from arbitrary images, to avoid camera variation when testing is performed on different databases. These images do not contain face representations. Negative samples are listed in a background description file, where each line contains the filename (relative to the directory of the description file) of a negative sample image. This file was created once manually and was used across databases for face detection. Positive samples were selected from different reflections, illuminations and backgrounds of different people, to mimic real-world scenes. First and foremost, the face was localized to include the eyes, eyebrows, nose and mouth, and the number of object instances, their locations, and the face width and height were determined. The face was localized so that the rectangle is close to the object border.
2.5 Face Tracking Face tracking is a crucial part of most face processing systems. It requires accurate target (i.e. face) detection and motion estimation when an individual is moving. Generally, this process is required to facilitate the face region localization and segmentation necessary prior to face recognition. Accurate face tracking is a challenging task since many factors can cause the tracking algorithm to fail. Some of the major challenges encountered by face tracking systems are robustness to pose changes, lighting variations, and facial deformations due to changes of expression and face occlusion. These factors might cause the algorithm to lose track of the subject’s face and drift (i.e. lose face detection for initialization) [Li, 08].
2.5.1 Basic Mathematical Framework Here we provide an overview of the basic mathematical framework that explains the process by which most trackers work. Let $p \in R^P$ denote a parameter vector that is the desired output of the tracker. It could be the 2D location of the face in the image, the 3D pose of the face, or a more complex set of quantities that also include lighting and deformation parameters. We define a synthesis function $f: R^2 \times R^P \rightarrow R^2$ that can take an image pixel $v \in R^2$ at time $t-1$ and transform it to $f(v, p)$ at time $t$. For a 2D tracker, this function $f$ could be a transformation between two images at two consecutive time instants. For a 3D model-based tracker, this can be considered as a rendering function of the object at pose $p$ in the camera frame to the pixel coordinates $v$ in the image plane. Given an input image $I(v)$, we want to align the synthesized image with it so as to obtain:

$$\hat{p} = \arg\min_p g\left( f(v, p) - I(v) \right) \qquad (2.2)$$

where $\hat{p}$ denotes the estimated parameter vector for this input image $I(v)$. The essence of this approach is the well-known Lucas-Kanade tracking, an efficient and accurate implementation of which has been proposed using the inverse compositional approach [Bak, 04]. Depending on the choice of $v$ and $p$, the method is applicable to the overall face image, a collection of discrete features, or a 3D face model. The cost function $g$ is often implemented as an $L_2$ norm, i.e., the sum of the squares of the errors over the entire region of interest. However, other distance metrics may be used. Thus a face tracker is often implemented as a least-squares optimization problem.
Let us consider the problem of estimating the change $\Delta p_t = \hat{m}_t$ in the parameter vector between two consecutive frames $I_t(v)$ and $I_{t-1}(v)$ as:

$$\hat{m}_t = \arg\min_m \sum_v \left[ f(v, \hat{p}_{t-1} + m) - I_t(v) \right]^2 \qquad (2.3)$$

and

$$\hat{p}_t = \hat{p}_{t-1} + \hat{m}_t \qquad (2.4)$$

The optimization of the above equation can be achieved by assuming that a current estimate of $m$ is known and iteratively solving for increments $\Delta m$ such that

$$\sum_v \left[ f(v, \hat{p}_{t-1} + m + \Delta m) - I_t(v) \right]^2 \qquad (2.5)$$

is minimized.
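As a concrete, deliberately simplified instance of equation (2.3), a pure-translation tracker can search by brute force for the shift m that minimizes the squared difference between frames (a one-dimensional toy; real trackers solve this iteratively as described above):

```python
def estimate_shift(prev, curr, max_shift):
    """Find the integer shift m minimizing the average of
    (prev[v - m] - curr[v])^2 over the overlapping samples, i.e. a
    brute-force 1-D version of equation (2.3)."""
    best_m, best_cost = 0, float("inf")
    n = len(curr)
    for m in range(-max_shift, max_shift + 1):
        cost, count = 0, 0
        for v in range(n):
            if 0 <= v - m < n:
                cost += (prev[v - m] - curr[v]) ** 2
                count += 1
        cost /= count  # average, so shorter overlaps are not unfairly cheap
        if cost < best_cost:
            best_m, best_cost = m, cost
    return best_m

prev = [0, 0, 9, 9, 9, 0, 0, 0]   # a "face" blob at positions 2-4
curr = [0, 0, 0, 0, 9, 9, 9, 0]   # the same blob moved right by 2
print(estimate_shift(prev, curr, max_shift=3))   # 2
```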
2.5.2 Face Tracking Methods
Various methods have been proposed to overcome face tracking challenges. According to [Wu, 04], face tracking methods can be classified into three main groups: low-level feature approaches, template matching approaches and statistical inference approaches. The low-level feature approaches make use of low-level face knowledge, such as skin color, background knowledge (background subtraction or rectangular features) or motion information, to track faces. The template matching approaches involve tracking contours with snakes, 3D face model matching, shape and face matching, and wavelet network matching. The third tracking category, statistical inference approaches, includes Kalman filtering techniques for unimodal Gaussian representations and Monte Carlo approaches for non-Gaussian nonlinear target tracking [Cor, 07].
2.5.2.1 Template Matching Approach
Template matching is a technique used in digital image processing for comparing portions of images against each other, by sliding a patch (the template) over the input image using different methods. Once the patch has been tested at all possible locations on a pixel-by-pixel basis, a matrix is created containing a numerical index of how well the patch matches at each location [Cas, 09]. Template matching can be subdivided into two approaches: feature-based and template-based matching. The feature-based approach uses features of the search and template images, such as edges or corners, as the primary match-measuring metrics to find the best matching location of the template in the source image. The template-based, or global, approach uses the entire template, generally with a sum-comparing metric (SAD, SSD, cross-correlation, etc.) that determines the best location by testing all, or a sample of, the viable test locations within the search image. A. Template-Based Approach
For templates without strong features, or when the bulk of the template image constitutes the matching image, a template-based approach may be effective. As mentioned above, since template-based matching may require sampling a large number of points, the number of sampling points can be reduced in several ways: by reducing the resolution of the search and template images by the same factor and performing the operation on the resulting downsized images (image pyramids), by providing a search window of data points within the search image so that the template does not have to be evaluated at every viable data point, or by a combination of both.
B. Template Matching Measurements
The "matching error" between the patch and any given location inside the image where it is being searched can be computed using different methods. This section gives a brief description of each of them. In the following mathematical expressions, $I$ denotes the input image, $T$ the template and $R$ the result.

1. Square difference matching. These methods match the squared difference, which means that a perfect match would be 0 and bad matches would lead to large values.

$$R_{sq\_diff}(x, y) = \sum_{x', y'} \left[ T(x', y') - I(x + x', y + y') \right]^2 \qquad (2.6)$$

2. Correlation matching. These methods multiplicatively match the template against the image, which means that a perfect match would be the largest.

$$R_{ccorr}(x, y) = \sum_{x', y'} \left[ T(x', y') \cdot I(x + x', y + y') \right]^2 \qquad (2.7)$$

3. Correlation coefficient matching. These methods match a template relative to its mean against the image relative to its mean. The best match would be 1 and the worst one would be -1. A value of 0 means that there is no correlation [Cas, 09].

$$R_{ccoeff}(x, y) = \sum_{x', y'} \left[ T'(x', y') \cdot I'(x + x', y + y') \right]^2 \qquad (2.8)$$
where

$$T'(x', y') = T(x', y') - \frac{1}{w \cdot h} \sum_{x'', y''} T(x'', y'') \qquad (2.9)$$

$$I'(x + x', y + y') = I(x + x', y + y') - \frac{1}{w \cdot h} \sum_{x'', y''} I(x + x'', y + y'') \qquad (2.10)$$
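Equations (2.6) to (2.10) can be written out directly; the following pure-Python sketch evaluates the measures for one template position (a minimal illustration following the squared forms given above):

```python
def region(I, x, y, w, h):
    """Extract the w x h patch of I with top-left corner (x, y)."""
    return [[I[y + j][x + i] for i in range(w)] for j in range(h)]

def mean(M):
    return sum(sum(r) for r in M) / (len(M) * len(M[0]))

def sq_diff(T, patch):
    # Equation (2.6): 0 for a perfect match, large for bad matches.
    return sum((t - p) ** 2 for tr, pr in zip(T, patch) for t, p in zip(tr, pr))

def ccorr(T, patch):
    # Equation (2.7): largest for a strong match.
    return sum((t * p) ** 2 for tr, pr in zip(T, patch) for t, p in zip(tr, pr))

def ccoeff(T, patch):
    # Equation (2.8), using the mean-removed T' and I' of (2.9)-(2.10).
    mt, mp = mean(T), mean(patch)
    return sum(((t - mt) * (p - mp)) ** 2
               for tr, pr in zip(T, patch) for t, p in zip(tr, pr))

T = [[1, 2], [3, 4]]
I = [[9, 1, 2],
     [9, 3, 4]]
patch = region(I, 1, 0, 2, 2)                 # the exact template inside I
print(sq_diff(T, patch))                      # 0: perfect match
print(sq_diff(T, region(I, 0, 0, 2, 2)))      # 102: mismatch elsewhere
```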
2.6 Face Recognition Face recognition has received considerable interest as a widely accepted biometric because of the ease of collecting samples of a person, with or without the subject's intention. Face recognition refers to an automated or semi-automated process of matching facial images. It constitutes a wide group of technologies which all work with faces but use different scanning techniques. Most common by far is 2D face recognition, which is easier and less expensive compared to the other approaches [Kim, 01].
2.6.1 Face Recognition Techniques
All available face recognition techniques can be classified into four categories based on the way they represent the face:
1. Appearance-based, which uses holistic texture features.
2. Model-based, which employs the shape and texture of the face, along with 3D depth information.
3. Template-based face recognition.
4. Techniques using neural networks.
Figure (2.7) summarizes these types:
Fig (2.7) Classification of face recognition methods.
2.6.1.1 Principal Component Analysis (PCA)
PCA is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences. The purpose of PCA is to reduce the large dimensionality of the data space (observed variables) to a smaller intrinsic dimensionality of feature space (independent variables), which is needed to describe the data economically [Sar, 05].
2.6.1.2 Mathematical Theory
Principal Component Analysis (PCA) has been widely adopted to capture the face space in a low-dimensional feature space: the eigenspace. PCA is a classical technique for multivariate analysis. Let a face image I(x, y) be a two-dimensional N by N array of intensity values, or a vector of dimension $N^2$. An image of size 100 by 100 can then be represented by a vector of dimension 10,000 or, equivalently, a point in a 10,000-dimensional space. An ensemble of images, then, maps to a collection of points in this huge space. Images of faces, being similar in overall configuration, will not be randomly distributed in this huge space and can thus be described by a relatively low-dimensional subspace. The main idea of principal component analysis is to find the vectors which best account for the distribution of face images within the entire image space. These vectors define the subspace of face images, which we call "face space". Here, a description of how to perform PCA in the context of face detection is given.
Consider a data set $X = \{x_1, x_2, x_3, \ldots, x_M\}$ of $N^2$-dimensional vectors. This data set might for example be a set of $M$ face images. The mean, $\mu$, and the covariance matrix, $\Sigma$, of the data are given by:

$$\mu = \frac{1}{M} \sum_{m=1}^{M} x_m \qquad (2.11)$$

$$\Sigma = \frac{1}{M} \sum_{m=1}^{M} (x_m - \mu)(x_m - \mu)^T \qquad (2.12)$$
where $\Sigma$ is an $N^2 \times N^2$ symmetric matrix. This matrix characterizes the scatter of the data set. A non-zero vector $u_k$ for which

$$\Sigma u_k = \lambda_k u_k \qquad (2.13)$$

is an eigenvector of the covariance matrix, with corresponding eigenvalue $\lambda_k$. If $\lambda_1, \lambda_2, \ldots, \lambda_K$ are the $K$ largest, distinct eigenvalues, then the matrix $U = [u_1, u_2, u_3, \ldots, u_K]$ represents the $K$ dominant eigenvectors. These eigenvectors are mutually orthogonal and span a K-dimensional subspace called the principal subspace. Figure (2.8) is an example of eigenvectors. If $U$ is the matrix of $K$ dominant eigenvectors,
Fig (2.8) a) An example of dominant Eigenfaces. b) An average face
an $N^2$-dimensional input vector $x$ can be linearly transformed into a $K$-dimensional vector $\omega$ by:

$$\omega = U^T (x - \mu) \qquad (2.14)$$

After applying the linear transformation $U^T$, the set of transformed vectors $\{\omega_1, \omega_2, \omega_3, \ldots, \omega_M\}$ has scatter $U^T \Sigma U$. PCA chooses $U$ so as to maximize the determinant, $|U^T \Sigma U|$, of this scatter matrix. In other words, PCA retains most of the variance. An original vector $x$ can be approximately reconstructed from its transformed vector $\omega$ as:

$$\tilde{x} = \mu + \sum_{k=1}^{K} \omega_k u_k \qquad (2.15)$$

In fact, PCA enables the training data to be reconstructed in a way that minimizes the squared reconstruction error, $\epsilon_{total}$, over the data set, where

$$\epsilon_{total} = \frac{1}{2} \sum_{m=1}^{M} \left\| x_m - \tilde{x}_m \right\|^2 \qquad (2.16)$$

Each reconstruction error, $\| x_m - \tilde{x}_m \|$, indicates how well the image patch is fitted to the face space. This distance from face space is used as a measure of "faceness": a large reconstruction error means that the image patch appears to be a non-face [Kim, 01]. Both calculating eigenfaces and face verification using eigenfaces are represented in Figures (2.9) and (2.10) respectively.
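A compact numerical sketch of equations (2.11) to (2.16) using NumPy, with synthetic vectors standing in for face images (the data and dimensions are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "images": M vectors that truly lie in a K-dimensional subspace.
M, D, K = 20, 16, 2
basis = rng.normal(size=(K, D))
X = rng.normal(size=(M, K)) @ basis + 5.0           # data set {x_1..x_M}

mu = X.mean(axis=0)                                 # equation (2.11)
C = (X - mu).T @ (X - mu) / M                       # equation (2.12)
vals, vecs = np.linalg.eigh(C)                      # eigenpairs, eq. (2.13)
U = vecs[:, np.argsort(vals)[::-1][:K]]             # K dominant eigenvectors

omega = (X - mu) @ U                                # projection, eq. (2.14)
X_rec = mu + omega @ U.T                            # reconstruction, eq. (2.15)
err = 0.5 * np.sum((X - X_rec) ** 2)                # total error, eq. (2.16)
print(round(float(err), 6))   # ~0: the data really is K-dimensional

# A vector far from the "face space" reconstructs poorly: low "faceness".
outlier = rng.normal(size=D) * 50
o_rec = mu + ((outlier - mu) @ U) @ U.T
print(np.linalg.norm(outlier - o_rec) > np.linalg.norm(X[0] - X_rec[0]))
```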
Fig (2.9) Calculating Eigenfaces.
Fig (2.10) Face verification procedure using Eigenfaces.
2.6.2 Face Recognition Applications
Every day we face new technologies that prompt us to enter a PIN code or password, for example for money transactions on the internet, to get cash from an ATM, or even to use our cell phone SIM card, plus a dozen others to access the internet and so on. Therefore, the need for reliable methods of biometric personal identification is obvious. In fact, there are such reliable methods, like fingerprint analysis and retinal or iris scans; however, these methods rely on the cooperation of the participant. Face recognition systems, on the other hand, can perform person identification without the cooperation or knowledge of the participant, which is advantageous in some applications such as surveillance, suspect tracking and investigation [Sez, 05]. Typical applications of face recognition systems can be listed in four main categories: a. Entertainment: Video Games, Virtual Reality, Training Programs, Human-Robot Interaction, Human-Computer Interaction. b. Smart Cards: Driver's Licenses, Entitlement Programs, Immigration, National ID, Passports, Voter Registration, Welfare Fraud. c. Information Security: TV Parental Control, Personal Device Logon, Desktop Logon, Application Security, Database Security, File Encryption, Intranet Security, Internet Access, Medical Records, Secure Trading Terminals. d. Law Enforcement and Surveillance: Advanced Video Surveillance, CCTV Control, Portal Control, Post-Event Analysis, Shoplifting, Suspect Tracking and Investigation.
Chapter 3
Design and Implementation of the Proposed FDM
Chapter Three Design and Implementation of the Proposed Face Detection Method 3.1 Introduction This work aims to improve the outcomes of the face detection method developed by Viola-Jones in the context of video-based applications. The whole recognition system consists of three main modules. The first module is started by loading a video file, which is captured by a webcam, and then extracting it into frames. These frames are subjected to the Viola-Jones face detector to find the single faces in the video.
The second and most important module of the work is the face tracking phase. The new contribution appears in two steps. Firstly, the detected face in the current frame is used to initialize the tracker in the next frame. This process is continued for the remaining frames in the video. This contribution eliminates the false negatives (missed detections) that were one of the drawbacks of the Haar-like feature detector on video. Secondly, to reduce the false positive alarms, a dynamic thresholding algorithm is applied to each detected face before initializing the tracker in the next frame. Finally, the tracked faces are recognized using an image-based face recognition method (PCA). The detection rate is highly increased when the hybrid detector and tracker method is applied. This work is tested on a special video scenario where one or more persons are moving in front of the webcam, for example at an ATM in a bank or in a building security control. In the next sections, the proposed hybrid FDM is described in detail using flowchart diagrams and pseudo code algorithms. The process steps
are also presented for each part (face detection, face tracking and face recognition). Figure (3.1) shows the general diagram of the proposed FDM. The system is implemented in Visual C++ 2010 with the OpenCV 2.2 library as the development tool.
Fig (3.1) General block diagram of the proposed FDM.
3.2 Face Detector Scheme The method used to detect a single frontal face in a video is taken from the OpenCV 2.2 library (face detector) developed by Viola-Jones. This method was essentially developed for still-image face detection; in this research the Viola-Jones scheme is implemented for a video application. The face detector steps can be summarized as follows:
Step 1: Load video. Load a video file which is compatible with the OpenCV library and obtained by real-time capturing. A color conversion algorithm is applied to each frame after extracting it from the video. This conversion changes the color space from color to gray, which is more sensitive to the human vision system.
Step 2: Initialize detector. The Haar cascade classifier is loaded, and the data stored in an XML file is used to decide how to classify each image location.
Step 3: Running the detector. The face detector examines each image location and classifies it as "Face" or "Not Face." Classification assumes a fixed scale for the face, say 50x50 pixels. Since faces in an image might be smaller or larger than this, the classifier runs over the image several times to search for faces across a range of scales.
Step 4: Detector results. A specific function in OpenCV is used to display the detected faces on the window at different resolutions. This is due to the varying distance of the objects (faces) from the webcam. Therefore the faces are displayed with different sizes, such as the 24x24 default value of the classifier, 26x26, 31x31, 44x44 and so on. Figure (3.2) illustrates the face detection steps.
Fig (3.2) Face detection scheme
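The multi-scale scan of Step 3 can be illustrated with a dummy classifier standing in for the Haar cascade (the scanning strides, scale step and "classifier" here are assumptions for illustration, not OpenCV's internals):

```python
def scan_image(img_w, img_h, classify, base=24, scale_step=1.25):
    """Slide a base x base detector over the image at a range of scales,
    as in Step 3. `classify(x, y, size)` is a stand-in for the cascade."""
    detections = []
    size = base
    while size <= min(img_w, img_h):
        step = max(1, size // 4)               # stride grows with the window
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                if classify(x, y, size):
                    detections.append((x, y, size, size))
        size = int(size * scale_step)          # rescale and scan again
    return detections

# Dummy classifier: "a face" sits near (24, 24) with side roughly 30 pixels.
def fake_classifier(x, y, size):
    return abs(x - 24) < 4 and abs(y - 24) < 4 and abs(size - 30) < 4

hits = scan_image(120, 90, fake_classifier)
print(hits)   # windows near (24, 24) at the matching scale
```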
In Figure (3.3) the flowchart of the face detection is shown including the above mentioned steps.
Fig (3.3) Face detection process.
3.3 Face Tracking Scheme The face detection scheme used in this work does not give satisfactory results in terms of detection rate. This drawback encouraged us to develop a hybrid detection method that combines the Viola-Jones detector with a template-based tracking algorithm. The reasons behind using these algorithms are:
1. Obtaining the template of the face in the face detector part.
2. Reusing this template to initialize a tracker.
3. The template matching algorithm is very efficient and simple to implement.
4. It is robust against scale variation and lighting conditions.
5. It is fast because a region of interest (ROI) is used to reduce the search area within the frame.
The detected face obtained from the current frame is used to initialize the tracker. Due to the slow change of the face position between two successive frames, the template position of the face is first found in the current frame and an ROI is applied to the next frame in order to reduce the search area. The ROI is selected according to the properties of the face template (top-left location, width, height). The size of the ROI is chosen to be twice the size of the face template, on the condition that the ROI window does not exceed the frame size. The ROI size depends on the size of the template in the current frame (i.e., it grows and shrinks dynamically). The face tracking process follows the face detector so that the objects (faces) in the next frames are easily found, avoiding missed detections. Each tracked face template can be considered as a reference for initializing the tracker itself in the next frames in case the detector fails to detect (false negative alarm).
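The ROI selection described above, twice the template size and clamped to the frame, can be sketched as (coordinates are illustrative):

```python
def roi_for_template(x, y, w, h, frame_w, frame_h):
    """Centre a search window of twice the template size on the template
    at (x, y, w, h), clamping it so it never exceeds the frame."""
    rw, rh = 2 * w, 2 * h
    rx = max(0, x - w // 2)
    ry = max(0, y - h // 2)
    rx = min(rx, max(0, frame_w - rw))   # pull back inside the frame
    ry = min(ry, max(0, frame_h - rh))
    rw = min(rw, frame_w)
    rh = min(rh, frame_h)
    return rx, ry, rw, rh

print(roi_for_template(100, 80, 60, 60, 320, 240))  # (70, 50, 120, 120)
print(roi_for_template(0, 0, 60, 60, 320, 240))     # clamped at the corner
```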
Figure (3.4) depicts the face tracking process between two successive frames.
Fig (3.4) Face tracking process.
3.3.1 False Positive Reduction In image-based face detection a false positive is a detected object that is not a face, while in the video context it is also a detected face that represents the face of another person. The scenarios used to run the face detector and tracker aim to track the nearest person walking toward the webcam. This situation leads to both types of false positive alarms. Therefore, the main objective of using tracking is to reduce the false positive alarms and obtain a high detection rate. In Figure (3.5) the occurrence of false positive alarms is shown.
Fig (3.5) False positive alarms.
3.3.1.1 Manual Threshold Algorithm
After running the face detector, the possibility of false positive occurrences between frames, as shown in Figure (3.6), can be reduced before running the tracker, as described in the following steps.
Fig (3.6) False positive alarm between two successive frames.
Step 1: Set the standard average window size for a face to be recognized equal to (120 x 90) pixels. A further assumption is to develop a new scheme depending on the width of the window; the best value, found by trial and error, was equal to 110. Step 2: Initialize two arrays to store the width of the face window, as shown in Figure (3.7), and the average width of the face window in each frame. Step 3: Calculate the average width. Step 4: Accumulate the average width of the face window.
Step 5: Store the data results (width and average width) in a text file.
Fig (3.7) Window's face width values after running the face detector.
Step 6: Plot a curve of the average width against the frame number, as illustrated in Figure (3.8). Step 7: Select the threshold from the curve as the value nearest to the standard average width of the face window (here, 110 pixels). Step 8: Discard each value greater than 110 and consider it a non-recognized face. This selection is an important step for initializing the face tracker.
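Steps 1 to 8 amount to a simple filter on the detected window widths; the sketch below uses made-up widths, with the 110-pixel threshold taken from Step 7:

```python
THRESHOLD = 110   # selected from the average-width curve (Step 7)

def filter_detections(widths, threshold=THRESHOLD):
    """Step 8: keep a detection only if its window width does not exceed
    the threshold; wider windows are treated as false positives."""
    return [w for w in widths if w <= threshold]

widths = [92, 104, 150, 108, 230, 96]      # per-frame face-window widths
avg = sum(widths) / len(widths)            # Steps 3-4: average width
print(round(avg, 1))                       # 130.0
print(filter_detections(widths))           # [92, 104, 108, 96]
```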
Fig (3.8) Average window's face width
The flowchart of the threshold algorithm before tracking is shown in Figure (3.9).
Fig (3.9) Manual threshold process.
3.3.1.2 Dynamic Threshold Algorithm
In the manual thresholding process some false positive alarms are not totally removed, as shown in Figure (3.10). To solve these types of alarms, a dynamic threshold algorithm is developed and implemented together with the face tracker.
Fig (3.10) Unsolved types of false positive alarms.
The algorithm steps are described below:
Step 1: Get a frame.
Step 2: Run the face detector.
Step 3: Apply manual thresholding as described in section 3.3.1.1.
Step 4: Determine the center of the detected face.
Step 5: Get the next frame.
Step 6: Run the face tracker.
Step 7: Determine the center of the tracked face.
Step 8: Determine the Euclidean distance between the centers of the detected and tracked faces. The graph is shown in Figure (3.11).
Fig (3.11) Euclidean distance measure.
Step 9: Accumulate five of the obtained distance values and find their median.
Step 10: Normalize the distance by dividing it by the face width.
Step 11: Apply the threshold: if the normalized distance is greater than ten times the median value, the last tracked face is kept by the tracker; otherwise the last detected face is tracked.
Step 12: Visualize the face.
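Steps 8-11 can be sketched as below, assuming faces are summarized by their (x, y) centers; the function name, argument layout and history handling are our assumptions, not the thesis code.

```python
# Sketch of the dynamic threshold decision: compare the Euclidean distance
# between detected and tracked face centers against 10x the median of the
# last five distances (normalized by face width), as in Steps 8-11.
import math
import statistics

def choose_face(detected_center, tracked_center, face_width, recent_distances):
    """Return ('tracked' | 'detected', updated distance history)."""
    d = math.dist(detected_center, tracked_center)  # Step 8
    history = (recent_distances + [d])[-5:]         # Step 9: keep 5 values
    median = statistics.median(history)
    normalized = d / face_width                     # Step 10
    # Step 11: a large normalized jump keeps the tracker's face,
    # otherwise the last detected face is tracked.
    choice = 'tracked' if normalized > 10 * median else 'detected'
    return choice, history
```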
The flowchart of the dynamic threshold algorithm that involves all the above steps is shown in Figure (3.12).
Fig (3.12) Dynamic threshold algorithm.
3.3.2 False Negative Reduction
The second contribution is the reduction of false negative alarms, i.e., faces left undetected between frames. Template matching based on the Square Difference Matching (SDM) metric is used to implement the face tracker. The strength of the proposed tracker is that it is efficient and simple. Moreover, it is scale, pose and rotation independent, so in theory it keeps tracking the face even when the face is not in a frontal position. Initially, a face detector based on Viola-Jones object detection is executed to detect the face in the first frame of the sequence to be tracked. This is repeated until at least two successfully detected faces are found. Each time, the last face detected in the current frame is used to track the face in the next frame; this process updates the template position, which is used to initialize the tracker. The steps of the algorithm are as follows:
Step 1: Allocate memory for a template of the same size as indicated in the face detection part.
Step 2: Save the detected face in the template located on the next frame.
Step 3: Allocate memory for the tracking result.
Step 4: Find the optimal match by sliding the template over the next frame using SDM.
Step 5: Normalize the obtained results to values between 0 and 1, separating a perfect match (0) from a non-match (1).
Step 6: Visualize the perfect match on the next frame, taking the location of the minimum value as the best match.
Figure (3.13) shows the flowchart of the template matching based tracking method.
Fig (3.13) Template matching based tracking.
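The SDM matching of Steps 4-6 can be sketched as follows. Plain nested lists stand in for grayscale images; the thesis itself uses OpenCV's square-difference template matching, so this pure-Python version is only an illustration.

```python
# Sketch of Square Difference Matching: slide the template over the frame,
# score each position by the sum of squared differences, normalize scores to
# [0, 1], and take the minimum (0 = perfect match) as the best location.

def ssd_match(frame, template):
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    scores = {}
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            scores[(x, y)] = sum(
                (frame[y + i][x + j] - template[i][j]) ** 2
                for i in range(th) for j in range(tw))
    worst = max(scores.values()) or 1
    normalized = {pos: s / worst for pos, s in scores.items()}  # Step 5
    best = min(normalized, key=normalized.get)                  # Step 6
    return best, normalized

frame = [[0, 0, 0, 0],
         [0, 5, 6, 0],
         [0, 7, 8, 0],
         [0, 0, 0, 0]]
template = [[5, 6],
            [7, 8]]
best, norms = ssd_match(frame, template)   # best match at (x=1, y=1)
```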
3.4 Face Recognition Scheme
Face recognition comes after performing both the face detection and tracking processes. Some preprocessing operations (resizing the face window and histogram equalization) must be applied to the tracked faces before starting the recognition process. The framework of the recognition process is presented in Figure (3.14).
Fig (3.14) The framework of face recognition process.
In this part a special OpenCV library developed by Robin Hewitt [Hew, 07] is used. The obtained tracked faces are subjected to the PCA algorithm in order to get the eigenfaces.
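As a rough illustration of what the PCA step produces (the thesis relies on Hewitt's OpenCV eigenface code, not on this sketch), the first eigenface is the dominant eigenvector of the covariance of the mean-centered face vectors, which can be found by power iteration:

```python
# Sketch: faces flattened to vectors are mean-centered; the dominant
# eigenvector of their covariance matrix (the first eigenface) is found
# by power iteration. Toy data only; real faces would be long vectors.

def first_eigenface(faces, iters=200):
    n, d = len(faces), len(faces[0])
    mean = [sum(f[j] for f in faces) / n for j in range(d)]
    centered = [[f[j] - mean[j] for j in range(d)] for f in faces]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]                     # d x d covariance
    v = [1.0] * d                                 # power iteration
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

# Toy "faces": almost all variation lies along the first coordinate,
# so the first eigenface is close to (1, 0, 0) up to sign.
faces = [[1.0, 0.0, 0.0], [3.0, 0.1, 0.0], [5.0, -0.1, 0.0]]
mean, eigenface = first_eigenface(faces)
```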
Chapter Four
Test Evaluation
4.1 Introduction
In this research work, the FDM designed and implemented in the previous chapter is tested using special video scenarios. The proposed FDM combines the face detector and tracker processes, which improves the detection rate and detection time. The performance of the proposed FDM was evaluated by tuning some of the involved parameters, such as scale factor, neighbor threshold and window size; these parameters affect the system accuracy (detection, tracking and recognition). The efficiency of the proposed FDM was evaluated using the detection rate and detection time measures. Microsoft Visual Studio 2010 (VC++) with the OpenCV 2.2 library was used as the development tool to build the required programs. The programs were tested on Windows 7 Enterprise (Intel(R) Core(TM) CPU, 2.77 GHz processor and 4 GB RAM).
4.2 Video Scenarios Description
The experiments were carried out on 5 persons. While one or more persons moved randomly in the background, each person walked up to a camera and stood facing it. Two different scenarios (taken from the Department of Information Technology / Faculty of Engineering and IT / Karlsruhe University of Applied Sciences / Germany) were used to evaluate the performance of the proposed FDM; the tested videos are illustrated in Figures (4.1) and (4.2)
respectively. In this work, the first video scenario was captured in a natural environment without any special lighting, using a Logitech webcam.
Fig (4.1) Video scenario 1
Fig (4.2) Video scenario 2.
The other scenario was captured in an environment (with fast person movement) in which the lighting conditions affect the accuracy of the Viola-Jones detector. Table (4.1) describes the properties of video scenarios 1 and 2.

Table (4.1) Video scenarios description
Video No. | Video Description
Scenario 1 | Length=00:01:07, Size=4.71 MB, Frame width=640, Frame height=480, Frame rate=15 frames/second, Type: DivX Video, Color system: RGB
Scenario 2 | Length=00:00:35, Size=3.87 MB, Frame width=640, Frame height=360, Frame rate=30 frames/second
4.3 Test Results
The implemented methods (Viola-Jones, manual thresholding and dynamic thresholding) were tested on the video scenarios, applied to 1000 frames of scenario 1 and 900 frames of scenario 2, in a real-time environment. First, the parameters that play an important role in the face detection process were tuned to obtain the optimal values. This optimization increases the detection rate and fixes the parameters for further tests. In the next sections the results of each method are presented for the two video scenarios.
4.3.1 Effect of Involved Parameters (Scenario 1)
Several parameters (window size, neighbor threshold, scale factor) affect the Viola-Jones face detector, as presented in the following subsections. Table (4.2) summarizes which parameters affect the performance of the Viola-Jones face detector. Table (4.2) Effect of involved parameters.
Parameters | Affects detection rate | Affects detection time
Window size | No | Yes
Neighbor threshold | Yes | No
Scale factor | Yes | Yes
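The measures reported in the tables of this chapter can be expressed as below. This is our illustrative sketch, assuming DR is the percentage of true positives over the number of processed frames (1000 for scenario 1, 900 for scenario 2) and DT is an average per-frame time in milliseconds.

```python
# Sketch of the evaluation measures: detection rate (DR, percent of frames
# with a true positive) and average detection time (DT) per frame.

def detection_rate(tp, total_frames):
    return round(100.0 * tp / total_frames, 1)

def average_detection_time(total_time_ms, total_frames):
    return total_time_ms / total_frames

dr = detection_rate(913, 1000)   # best Viola-Jones run in table (4.6)
```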
4.3.1.1 Window Size
This is the size of the smallest face to search for. The optimal selection of the window size minimizes the number of false face detections. In table (4.3) the
results of detection rate and time for three different window sizes are shown with fixed scale factor and neighbor threshold. Tables (4.4) and (4.5) present the same effect but with different fixed parameters.

Table (4.3) Effect of window size on detection rate and time (Scale factor = 1.2 and Neighbor threshold = 1)
Window size | Detected as faces | TP | FP | FN | DR | DT/ms
20×20 | 983 | 652 | 331 | 17 | 65.2 | 237.569
30×30 | 969 | 683 | 286 | 31 | 68.3 | 210.652
40×40 | 957 | 669 | 288 | 43 | 66.9 | 190.820
Table (4.4) Effect of window size on detection rate and time (Scale factor = 2.2 and Neighbor threshold = 2)
Window size | Detected as faces | TP | FP | FN | DR | DT/ms
20×20 | 890 | 800 | 90 | 110 | 80.0 | 162.680
30×30 | 889 | 806 | 83 | 111 | 80.6 | 121.202
40×40 | 889 | 793 | 96 | 111 | 79.3 | 146.712
Table (4.5) Effect of window size on detection rate and time (Scale factor = 3.2 and Neighbor threshold = 3)
Window size | Detected as faces | TP | FP | FN | DR | DT/ms
20×20 | 531 | 466 | 66 | 469 | 46.6 | 263.383
30×30 | 551 | 505 | 46 | 449 | 50.5 | 114.119
40×40 | 551 | 505 | 46 | 449 | 50.5 | 123.702
4.3.1.2 Neighbor Threshold
This is the way to derive an average rectangle from the raw detections (i.e., to replace a group of rectangles with one rectangle). If this value is set to 0, all raw detections are displayed, as illustrated in Figure (4.3).
Fig (4.3) Neighbor threshold setting to zero
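A much-simplified sketch of this grouping idea is given below; the real grouping is performed inside OpenCV's detector, and the center-distance rule and tolerance used here are our assumptions.

```python
# Sketch: raw detections (x, y, w, h) whose centers lie close together are
# grouped; a group surviving the neighbor threshold is replaced by the
# average rectangle of its members.

def group_detections(rects, min_neighbors, tol=20):
    groups = []
    for r in rects:
        cx, cy = r[0] + r[2] / 2, r[1] + r[3] / 2
        for g in groups:
            gx, gy = g[0][0] + g[0][2] / 2, g[0][1] + g[0][3] / 2
            if abs(cx - gx) <= tol and abs(cy - gy) <= tol:
                g.append(r)
                break
        else:
            groups.append([r])
    return [tuple(sum(r[i] for r in g) // len(g) for i in range(4))
            for g in groups if len(g) >= min_neighbors]

raw = [(100, 100, 50, 50), (104, 98, 52, 50), (300, 300, 40, 40)]
faces = group_detections(raw, min_neighbors=2)   # lone box is dropped
```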
In table (4.6) the results of detection rate and time for three different neighbor thresholds are shown with fixed window size and scale factor. Tables (4.7) and (4.8) present the same effect but with different fixed parameters.

Table (4.6) Effect of neighbor threshold on detection rate and time (Scale factor = 1.2 and Window size = 20×20)
Neighbor threshold | Detected as faces | TP | FP | FN | DR | DT/ms
1 | 983 | 652 | 331 | 17 | 65.2 | 237.569
2 | 969 | 882 | 147 | 31 | 88.2 | 236.696
3 | 964 | 913 | 51 | 36 | 91.3 | 243.090
Table (4.7) Effect of neighbor threshold on detection rate and time (Scale factor = 2.2 and Window size = 30×30)
Neighbor threshold | Detected as faces | TP | FP | FN | DR | DT/ms
1 | 926 | 843 | 83 | 83 | 84.3 | 159.627
2 | 889 | 806 | 83 | 111 | 80.6 | 121.202
3 | 843 | 763 | 80 | 157 | 76.3 | 143.880
Table (4.8) Effect of neighbor threshold on detection rate and time (Scale factor = 3.2 and Window size = 40×40)
Neighbor threshold | Detected as faces | TP | FP | FN | DR | DT/ms
1 | 788 | 693 | 95 | 212 | 69.3 | 132.577
2 | 659 | 594 | 65 | 341 | 59.4 | 126.316
3 | 551 | 505 | 46 | 449 | 50.5 | 123.702
4.3.1.3 Scale Factor
When this parameter is set higher, the detector runs faster because the face scale is increased more in each pass. In table (4.9) the results of detection rate and time for three different scale factors are shown with fixed window size and neighbor threshold. Tables (4.10) and (4.11) present the same effect but with different fixed parameters.

Table (4.9) Effect of scale factor on detection rate and time (Neighbor threshold = 1 and Window size = 20×20)
Scale factor | Detected as faces | TP | FP | FN | DR | DT/ms
1.2 | 983 | 652 | 331 | 17 | 65.2 | 237.569
2.2 | 937 | 842 | 95 | 63 | 84.2 | 165.117
3.2 | 804 | 691 | 113 | 196 | 69.1 | 198.424
Table (4.10) Effect of scale factor on detection rate and time (Neighbor threshold = 2 and Window size = 30×30)
Scale factor | Detected as faces | TP | FP | FN | DR | DT/ms
1.2 | 959 | 814 | 145 | 41 | 81.4 | 216.164
2.2 | 889 | 806 | 83 | 111 | 80.6 | 121.202
3.2 | 659 | 584 | 75 | 341 | 58.4 | 125.870
Table (4.11) Effect of scale factor on detection rate and time (Neighbor threshold = 3 and Window size = 40×40)
Scale factor | Detected as faces | TP | FP | FN | DR | DT/ms
1.2 | 946 | 898 | 48 | 54 | 89.8 | 227.373
2.2 | 843 | 743 | 100 | 157 | 74.3 | 134.554
3.2 | 551 | 505 | 46 | 449 | 50.5 | 123.207
4.3.1.4 Results Discussion
The results obtained from the Viola-Jones detector while tuning the parameters, presented in the tables above, can be summarized as follows:
1. The window size does not affect the detection rate, while the detection time increases as the window size becomes smaller.
2. The detection rate is directly proportional to the neighbor threshold when the minimum window size and scale factor are chosen, see table (4.6); the detection time, however, is inversely proportional to the neighbor threshold.
3. A high detection rate is obtained when the minimum scale factor is used, provided that the neighbor threshold is high, see table (4.11).
According to the test results, the Viola-Jones detector still suffers from the problem of miss detection (false negative and positive alarms). To
overcome this problem, the optimal parameters from the above tables are used to reduce the false alarms in the next sections. In Figures (4.4 - 4.6) the detection rate versus the involved parameters of the face detector is shown.
Fig (4.4) DR versus WS
Fig (4.5) DR versus NT
Fig (4.6) DR versus SF
4.3.2 Effect of the Involved Parameters (Scenario 2)
This scenario differs from scenario 1 in the following conditions:
1. Lighting condition (i.e., no natural environment): increases the false positives of type I.
2. The frame rate is high (30 f/s): increases the false negative ratio.
3. Fast movement of the person, and movement in different directions, which increases the false negative alarms.
4. A large distance between the webcam and the walking person, which affects the aspect ratio of the face window size. For that reason, the manual thresholding cannot be applied to the whole video frames.
The same parameters as in section 4.3.1 were tuned to obtain the optimal values. After conducting the test on 900 frames, we found that the best parameters remain the same (i.e., WS=20×20, SF=1.2 and NT=3).
4.3.2.1 Window Size
In table (4.12) the results of detection rate and time for three different window sizes are shown with fixed scale factor and neighbor threshold. Tables (4.13) and (4.14) present the same effect but with different fixed parameters.

Table (4.12) Effect of window size on detection rate and time (Scale factor = 1.2 and Neighbor threshold = 1)
Window size | Detected as faces | TP | FP | FN | DR | DT/ms
20×20 | 814 | 663 | 151 | 86 | 73.66 | 347.284
30×30 | 803 | 644 | 159 | 97 | 71.53 | 300.661
40×40 | 563 | 423 | 140 | 337 | 47.00 | 275.951
Table (4.13) Effect of window size on detection rate and time (Scale factor = 2.2 and Neighbor threshold = 2)
Window size | Detected as faces | TP | FP | FN | DR | DT/ms
20×20 | 216 | 207 | 9 | 684 | 23.00 | 281.897
30×30 | 216 | 207 | 9 | 684 | 23.00 | 109.933
40×40 | 216 | 207 | 9 | 684 | 23.00 | 160.659
Table (4.14) Effect of window size on detection rate and time (Scale factor = 3.2 and Neighbor threshold = 3)
Window size | Detected as faces | TP | FP | FN | DR | DT/ms
20×20 | 165 | 165 | 0 | 735 | 18.22 | 80.183
30×30 | 165 | 165 | 0 | 735 | 18.22 | 59.534
40×40 | 165 | 165 | 0 | 735 | 18.22 | 59.620
4.3.2.2 Neighbor Threshold
In table (4.15) the results of detection rate and time for three different neighbor thresholds are shown with fixed window size and scale factor. Tables (4.16) and (4.17) present the same effect but with different fixed parameters.

Table (4.15) Effect of neighbor threshold on detection rate and time (Scale factor = 1.2 and Window size = 20×20)
Neighbor threshold | Detected as faces | TP | FP | FN | DR | DT/ms
1 | 814 | 663 | 151 | 86 | 73.66 | 347.284
2 | 802 | 748 | 54 | 98 | 83.11 | 581.982
3 | 773 | 754 | 19 | 127 | 85.88 | 558.817
Table (4.16) Effect of neighbor threshold on detection rate and time (Scale factor = 2.2 and Window size = 30×30)
Neighbor threshold | Detected as faces | TP | FP | FN | DR | DT/ms
1 | 717 | 668 | 49 | 183 | 74.22 | 164.422
2 | 216 | 207 | 9 | 684 | 23.11 | 281.897
3 | 184 | 179 | 5 | 716 | 19.88 | 160.813
Table (4.17) Effect of neighbor threshold on detection rate and time (Scale factor = 3.2 and Window size = 40×40)
Neighbor threshold | Detected as faces | TP | FP | FN | DR | DT/ms
1 | 198 | 177 | 21 | 702 | 19.66 | 79.200
2 | 179 | 175 | 4 | 721 | 19.44 | 78.373
3 | 165 | 165 | 0 | 735 | 18.22 | 80.183
4.3.2.3 Scale Factor
In table (4.18) the results of detection rate and time for three different scale factors are shown with fixed window size and neighbor threshold. Tables (4.19) and (4.20) present the same effect but with different fixed parameters.

Table (4.18) Effect of scale factor on detection rate and time (Neighbor threshold = 1 and Window size = 20×20)
Scale factor | Detected as faces | TP | FP | FN | DR | DT/ms
1.2 | 814 | 663 | 151 | 86 | 73.66 | 347.284
2.2 | 303 | 281 | 22 | 597 | 31.22 | 240.086
3.2 | 198 | 179 | 19 | 702 | 19.88 | 159.980
Table (4.19) Effect of scale factor on detection rate and time (Neighbor threshold = 2 and Window size = 30×30)
Scale factor | Detected as faces | TP | FP | FN | DR | DT/ms
1.2 | 712 | 658 | 54 | 188 | 73.11 | 431.02
2.2 | 216 | 207 | 9 | 684 | 23.00 | 281.893
3.2 | 179 | 175 | 4 | 721 | 19.44 | 78.619
Table (4.20) Effect of scale factor on detection rate and time (Neighbor threshold = 3 and Window size = 40×40)
Scale factor | Detected as faces | TP | FP | FN | DR | DT/ms
1.2 | 411 | 393 | 18 | 489 | 43.66 | 236.933
2.2 | 184 | 179 | 5 | 716 | 19.88 | 155.619
3.2 | 165 | 165 | 0 | 735 | 18.22 | 80.183
4.3.2.4 Scenario 2 Results Evaluation
Based on the obtained results, the occurrence of false negative alarms remained high because of the video properties described in section (4.3.2). In this scenario the negative alarms typically occurred between frames separated by a large time interval (frame #20, frame #40, and so on). The face detector therefore failed for this scenario, and its detection rate could not be accepted. The tracker needs to be initialized by the detector from the previous frames (updating the template position depends on the detected faces), so the template matching based face tracker also failed for this scenario. Therefore, the proposed FDM is applied only to scenario 1.
4.4 Improved Detection Rates (Scenario 1)
The face tracker, which follows the Viola-Jones face detector,
mainly depends on the face detected in the previous frame of the video. That is, if false positive alarms of type I (see section 3.3.1.1) are fed directly to the tracker, the tracker is subject to error, which strongly affects the performance of the proposed FDM. Therefore, the FP alarms of type I, occurring before tracking, are reduced by applying manual thresholding. The type II false positive alarms, which occur after tracking (template matching), are minimized using the dynamic thresholding algorithm (Euclidean distance). Template matching is also used to minimize the false negative alarms.
4.4.1 Reduce False Positive Type I Using Manual Thresholding
The optimal parameters of the face detector are WS=20×20, SF=1.2 and NT=3. The results of the manual thresholding algorithm tested on video scenario 1 with the optimal parameters are shown in table (4.21). Table (4.21) False positive type I reduction
Methods | Detected as faces | TP | FP | FN | DR
Viola-Jones | 964 | 913 | 51 | 36 | 91.3
Manual thresholding | 914 | 898 | 6 | 86 | 89.8
4.4.2 Reduce False Negative Using Template Matching
The undetected faces (FN) can be reduced by tracking faces between the current and previous frames. The tracker uses the template matching method; the results after tracking are shown in table (4.22).

Table (4.22) False negative reduction
Methods | Detected as faces | TP | FP | FN | DR
Viola-Jones | 964 | 913 | 51 | 36 | 91.3
Manual thresholding | 914 | 898 | 6 | 86 | 89.8
Template matching | 976 | 966 | 10 | 24 | 96.6

4.4.3 Reduce False Positive Type II Using Dynamic Thresholding
As indicated in table (4.22), false negative alarms still exist because the Viola-Jones face detector with the optimal parameters starts detecting only from frame number 23. On the other hand, the false positive alarms of type II are reduced from 10 to 4 by the dynamic thresholding algorithm, as presented in table (4.23). This minimization increases the recognition performance on the tracked faces. Table (4.23) False positive type II reduction
Methods | Detected as faces | TP | FP | FN | DR
Viola-Jones | 964 | 913 | 51 | 36 | 91.3
Manual thresholding | 914 | 898 | 6 | 86 | 89.8
Template matching | 976 | 966 | 10 | 24 | 96.6
Dynamic thresholding | 976 | 972 | 4 | 24 | 97.2
4.4.4 Improvement of Detection Time
The Viola-Jones face detector with the optimal parameters obtained in section (4.3.1.2) is very slow because of the small window size (20×20) and scale factor (1.2). This is due to the trade-off between improving detection rate and detection time. For scenario 1, the detection time is calculated for the face detector and tracker with and without the region of interest (ROI) algorithm.
4.4.4.1 Face Detection Time
The Viola-Jones detector is tested on 1000 frames to determine the detection time using the ROI method. In table (4.24) the comparison between the detection time with and without ROI is shown.
Table (4.24) Detection time with and without ROI
Viola-Jones | Detected as faces | TP | FP | FN | DR | DT/ms
Without ROI | 964 | 913 | 51 | 36 | 91.3 | 243.090
With ROI | 964 | 913 | 51 | 36 | 91.3 | 78.807
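The ROI speed-up can be sketched as follows: the detector (or tracker) is run only inside a window around the face found in the previous frame instead of over the whole 640×480 frame. The margin value and function name are our assumptions, not the thesis code.

```python
# Sketch: compute a clamped region of interest around the previous face
# rectangle (x, y, w, h); searching only this region reduces detection time.

def roi_around(face, frame_w, frame_h, margin=40):
    x, y, w, h = face
    rx, ry = max(0, x - margin), max(0, y - margin)
    rw = min(frame_w, x + w + margin) - rx
    rh = min(frame_h, y + h + margin) - ry
    return rx, ry, rw, rh

roi = roi_around((300, 200, 120, 90), 640, 480)   # (260, 160, 200, 170)
```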
4.4.4.2 Face Tracking Time
The template matching based tracker is tested on 1000 frames to determine the tracking time using the ROI method. In table (4.25) the comparison between the tracking time with and without ROI is shown. Table (4.25) Tracking time with and without ROI
Template matching | Detected as faces | TP | FP | FN | DR | DT/ms
Without ROI | 976 | 966 | 10 | 24 | 96.6 | 222.023
With ROI | 976 | 966 | 10 | 24 | 96.6 | 72.410
From tables (4.24) and (4.25), the total time of the proposed FDM is: Total time = detection time + tracking time = 78.807 + 72.410 = 151.217 ms.
4.5 Face Recognition
The face recognition process is carried out in two phases (training and testing). The training faces are obtained after applying the Viola-Jones face detector. To get the eigenfaces for each person, the PCA algorithm is applied, and the results are stored in an XML file. In this work the training images are taken from 10 different persons, with 15 faces for each one. The test phase is the recognition of an unidentified person's face. The FDM is used to detect the face, and PCA is applied to the test face to find its eigenface. The matching algorithm, which depends on the Euclidean distance measure, is performed between the eigenvalues of the training and testing eigen
faces. Figures (4.7) and (4.8) present both training and testing phases for face recognition.
Fig (4.7) Training phase in face recognition system.
Fig (4.8) Test phase in face recognition system.
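The matching step can be sketched as a nearest-neighbor search over PCA coefficient vectors using the Euclidean distance; the person names and coefficient values below are illustrative, not taken from the thesis face database.

```python
# Sketch: each face is represented by its PCA projection coefficients; a test
# face is assigned to the training person whose coefficients are nearest in
# Euclidean distance.
import math

def recognize(test_coeffs, training):
    """training maps person -> coefficient vector; returns closest person."""
    return min(training, key=lambda person: math.dist(training[person],
                                                      test_coeffs))

training = {"person_a": [0.9, 0.1], "person_b": [-0.7, 0.6]}
who = recognize([0.8, 0.2], training)   # -> "person_a"
```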
Chapter Five
Conclusions and Suggestions for Future Work
5.1 Conclusions
In this research, a hybrid method (face detection and tracking) is developed for a face recognition system. The face detector uses the Viola-Jones algorithm, while the face tracker is based on the dynamic template matching algorithm. The proposed FDM aims at an effective reduction of the false positive and negative alarms which occur in video frames. It is tested on two video scenarios. Based on the test results presented in the previous chapter, the following conclusions about the performance of the FDM can be drawn:
1. The test results of Viola-Jones video based detection with the best suitable parameters showed a high number of false alarms.
2. The Viola-Jones detector is improved in terms of detection rate and time using the proposed FDM:
a. DR = 91.3% improved to DR = 97.2%.
b. DT = 465.113 ms improved to DT = 151.217 ms (using the ROI algorithm).
3. The manual threshold algorithm reduced the false positive alarms of type I from 56 objects (face and non-face) to 6 objects.
4. Face tracking based on the template matching algorithm reduced the false negative alarms from 86 (undetected faces) to 24.
5. Face tracking based on the dynamic template matching algorithm reduced the false positive alarms of type II from 10 objects (face and non-face) to 4 objects.
6. The proposed FDM was robust against a complex background when two persons were moving towards the webcam; that is, the frontal face was uniquely detected.
7. The main advantages of the proposed FDM are:
a. Reduction of false positive and negative alarms.
b. Reduction of time complexity by using ROI.
c. The proposed FDM can be considered a simple detector for frontal face recognition.
8. The drawback of the proposed FDM is the difficulty of handling scenarios that involve illumination changes, fast object movement and pose variation (scenario 2).
5.2 Suggestions for Future Work
The following suggestions can be taken into consideration for future research work:
1. Extending the proposed FDM to detect and track multiple faces instead of a single face.
2. The FDM can also be improved in terms of recognition rate.
3. Using motion estimation techniques to overcome the problem of fast object movement; this would make the proposed FDM more robust against different video scenarios.
4. Using algorithms other than template matching for face tracking (Kalman filter, optical flow, etc.).
5. Combining eye detection with the Viola-Jones face detector to reduce false positive alarms; in this case the tracker might be more robust to pose variation.
6. For video scenario 2, a preprocessing step based on histogram equalization can be added to address the illumination problem.
7. The proposed FDM can be adapted to different applications by using other biometric techniques.
References
[Ara, 08] S. R. Arachchige, "Face Recognition in Low Resolution Video Sequences using Super Resolution", Kate Gleason College of Engineering, Department of Computer Engineering, Master's Thesis, August 2008.
[Bak, 04] S. Baker, and I. Matthews, "Lucas-Kanade 20 Years On: A Unifying Framework", International Journal of Computer Vision, Vol. 56, pp. 221-255, 2004.
[Bis, 06] C. M. Bishop, "Pattern Recognition and Machine Learning", ISBN-10: 0387310738, 1st Edition, Springer, 2006.
[Bro, 86] T. J. Broida, and R. Chellappa, "Estimation of Object Motion Parameters from Noisy Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 1, pp. 90-99, 1986.
[Cas, 09] D. Casas, "Real-Time Face Tracking Methods", Master's Thesis, University Autónoma de Barcelona, 2009.
[Che, 10] D. Chen, J. Wang, and Y. Zhou, "Face Detection Method Research and Implementation Based on AdaBoost", Intelligent Information Processing
and Trusted Computing (IPTC), ISBN: 978-1-4244-8148-4, pp.643-646, 2010.
[Com, 03] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking", IEEE Transaction on Pattern Analysis and Machine Intelligence 25, pp.564– 575, 2003.
[Cor, 07] P. Corcoran, M. C. Jonita, and J. Bacivarov, "Next Generation Face Tracking Technology Using AAM Techniques", Signals, Circuits and System, ISSCS (2007), ISBN: 1-4244-0969-1, pp. 1-4, 2007.
[GU, 08] Q. GU, "Finding and Segmenting Human Faces", Uppsala University, Master Thesis, February 2008. [Hag, 96] G. D. Hager, and P. N. Belhumeur," Real-time tracking of image regions with changes in geometry and illumination", IEEE Conference on Computer Vision and Pattern Recognition, pp. 403, USA, 1996.
[Hew, 07] R. Hewitt, “Seeing with opencv, part 5: Implementing eigenface”. Servo Magazine, pp.50, May 2007.
[Hsu, 02] R.L. Hsu, M. Abdel-Mottaleb and A.K. Jain, "Face Detection in Color Images", IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 696-706, May 2002. [Isra, 07] E. Ben-Israel," Tracking of Humans Using Masked Histograms and Mean Shift", Introductory Project in Computer Vision, March 2007. [Jen, 08] O. H. Jensen "Implementing the Viola-Jones Face Detection Algorithm", Master Thesis in Informatics and Mathematical Modeling, Technical University of Denmark, 2008. [Jin, 05] Z. Jin, Z. Lou, J. Yang, and Q. Sun," Face Detection Using Template Matching and Skin Color Information", International Conference on Intelligent Computing, PP.23-26, China, August 2005. [Jor, 06] A. Jorgensen, "AdaBoost and Histograms for Fast Face Detection", Master Thesis of Computer Science, Stockholm, Sweden 2006. [Kang, 04] J. Kang, I. Cohen, and G. Medioni, " Object Reacquisition using Invariant Appearance Model", Proceedings of 17th International Conference on Pattern Recognition, Vol.4, pp. 759-762, USA, 2004.
[Kim, 01] J. Kim, "Face Localization for Face Recognition in Video", Department of Electrical and Electronic Engineering, Yonsei University, Master's Thesis, 2001.
[Lee, 07] H. Lee, and D. Kim, "Robust face tracking by integration of two separate trackers: Skin Color and facial shape", Pattern Recognition, Vol.40, No. 11, pp: 3225 – 3235, 2007.
[Li, 08] Y. Li, H. Ai, T.Yamashita, S. Lao, and M. Kawade, ," Tracking in Low Frame Rate Video: A Cascade Particle Filter with Discriminative Observers of Different Life Spans", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.30, No.10, 2008.
[Lie, 02] R. Lienhart, and J. Maydt, “An extended Set of Haar-like Features for Rapid Object Detection”, In Proceedings of International Conference on Image Processing , vol. 1, No. 9, pp. 900-903, 2002.
[Mar, 09] J. G. Martil, "Face Recognition in Controlled Environments using Multiple Images", Master's Thesis, March 2009.
[Mog, 97] B. Moghaddam and A. Pentland, "Probabilistic Visual Learning for Object Representation", IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 696-710, July 1997.
[Nal, 08] K. Nallaperumal, R. Subban, R. K. Selvakumar, A. L. Fred, C. N. KennadyBabu, S.S. Vinsley, and C. Seldev, "Human Face Detection in Color Images using Mixed Gaussian Color Models", International Journal of Imaging Science and Engineering (IJISE),GA,USA,ISSN:19349955,Vol.2, No.1, January 2008.
[Par, 09] U. Park, " Face Recognition: face in video, age invariance, and facial marks", A Dissertation of Doctor of Philosophy in Computer Science, 2009.
[Ram, 11] K. Ramirez, D. Cruz, and H. Perez, "Face Recognition and Verification using Histogram Equalization", ISSN: 1792-4863, ISBN: 978-960-474231-8, 2011.
[Row, 98] H.A. Rowley, S. Baluja, and T. Kanade, "Neural Network-Based Face Detection", IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol.20, No. 1, pp. 23-38, 1998.
[Ryu, 08] H. Ryu, M. Kim, V. Dinh, S. Chun, and S. Sull," Robust Face Tracking Based on Region Correspondence and its Application for Person Based Indexing System " ,International Journal of Innovative Computing, Information and Control ICIC ,ISSN 1349-4198,Vol.4, No. 11, November 2008.
[Sar, 10] J. Sarvanko, "Face Sequence Detection for a Web-Based Annotation Application" University of Oulu, Department of Electrical and Information Engineering, Master’s Thesis, 62 p, 2010.
[Sil, 05] S. Silva, "Remote Surveillance and Face Tracking with Mobile Phones (Smart Eyes)", University of the Western Cape, Department of Computer Science. Master’s Thesis, 106 p, May 2005,
[Sez, 05] O. G. Sezer, "Super Resolution Techniques for Face Recognition from Video", Sabanci University, Master's Thesis, Spring 2005.
[Tha, 09] N. D.Thanh, W. Li, and P. Ogunbona," A Novel Template Matching Method for Human Detection," In Proceedings of 16th IEEE International Conference on Image Processing (ICIP), ISSN: 1522-4880, pp.2549-2552, Cairo, 2009.
[Vio, 01] P. Viola, and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", In Proceedings of International Conference on Computer Vision and Pattern Recognition, pp. 511-518, 2001.
[Vio, 04] P. Viola, and M. Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision 57(2), 2004.
[Wan, 09] H. Wang, Y. Wang, And Y. Cao, "Video-based Face Recognition: A Survey ", World Academy of Science, Engineering and Technology, Vol. 60, pp. 239, 2009.
[Wu, 04] H. Wu, and J. S. Zelek,"The Extension of Statistical Face Detection to Face Tracking", In Proceedings of the 1st IEEE Canadian Conference on Computer and Robot Vision, DOI:10.1109/CCCRV.1301415 , pp. 10-17, Washington, DC, USA, 2004.
[Wu, 08] Y. W. Wu, and X. Y. Ai, "An improvement of face detection using AdaBoost with color information", ISECS International Colloquium on Computing, Communication, Control, and Management, 10.1109/CCCM.366, pp. 317-321, 2008.
[Yang, 02] M. H. Yang, D. J. Kriegman, and N. Ahuja, “Detecting Faces in Images: A Survey”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, January 2002.
[Yil, 04] A.Yilmaz, X. Li, and M. Shah," Contour-based object tracking with occlusion handling in video acquired using mobile cameras", IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 26, No.11, pp.1531-1536, 2004.
[Yil, 06] A. Yilmaz, O. Javed, M. Shah, " Object tracking: A survey", ACM Computing Surveys, Vol.38, No. 4, Article 13, 2006.
[Zha, 10] Q. Zhao, and H. Cai, "The Research and Implementation of Face Detection and Recognition Based on Video Sequences", In Proceedings of 2nd IEEE International Conference on Future Computer and Communication, DOI: 10.1109/ICFCC 5497778, pp.318-321, Wuhan, 2010.
[Web, 01] Open Source Computer Vision Library, http://www.intel.com/technology/computing/opencv/.
Abstract (translated from Arabic)
Human face processing techniques based on video, including face detection, tracking, and recognition, have attracted considerable research interest because of their value in various applications such as video surveillance, structuring, indexing, retrieval, and summarization. To improve the results of the Viola-Jones face detection method in video, different algorithmic strategies are designed and implemented. The first improvement minimizes type I false positive alarms using a manual thresholding algorithm. To reduce missed face detections (false negatives), face tracking based on template matching is used. Finally, a dynamic thresholding algorithm is applied to minimize type II false positive alarms. The detection time of the hybrid detection and tracking phases is optimized by embedding and implementing a region-of-interest approach. In this thesis, a hybrid face detection and tracking scheme based on dynamic template matching is proposed: detection is first performed in each frame, and the tracker is then applied to the detected faces in subsequent frames. The test results of the proposed face detection method (FDM) on two different video scenarios indicate that the hybrid face detector and tracker performs much better than the Viola-Jones detector in terms of detection rate and detection time. For optimal parameters, the detection rate of the proposed hybrid method is 97.2% and the average detection time is 151.217 ms. The disadvantage of the proposed FDM is that it is not applicable to video scenes involving fast person movement and varying lighting environments, nor to multiple facial positions.
Title page (translated from Arabic)
Face Detection from Video Using Dynamic Template Matching
Yusra Ahmed Salih, Higher Diploma in Computer Science, 2008
Supervised by: Dr. Aree Ali Mohammed
January 2012
1433 (AH)
Abstract (translated from Kurdish)
Techniques for processing the human face in video, including detection, tracking, and recognition, have become a focus of much research because of their usefulness in various applications such as video surveillance, structuring, indexing, retrieval, and summarization. To improve the detection rate of the Viola-Jones face detection method in video, different algorithmic strategies are designed and implemented. The first improvement reduces type I false positive alarms by applying a manual thresholding level. On the other hand, to reduce missed detections of faces (false negatives), face tracking based on template matching is employed. Finally, a dynamic (live) thresholding level is used to reduce type II false positive alarms. The detection time of the combined detection and tracking phases is improved by implementing a region-of-interest approach. In this work, a hybrid face detection and tracking system based on dynamic template matching is proposed: at first, faces are detected in each frame, and the tracker is then applied to the detected faces in the following frames. The experimental results of the proposed face detection method (FDM) on two video scenarios show that the hybrid detector and tracker is much better than the Viola-Jones detector in terms of detection rate and detection time. For the best parameters, the detection rate of the proposed hybrid method is 97.2% and the average detection time is 151.217 ms. The shortcoming of the proposed FDM is that it is not applicable to different video scenes involving fast person movement and bright lighting environments, and it is also not used for multiple facial positions.
Title page (translated from Kurdish)
Face Detection via Video Using Dynamic Template Matching
A thesis submitted to the Faculty of Science and Science Education, School of Science, University of Sulaimani, in partial fulfillment of the requirements for the degree of Master of Science in Computer Science
By
Yusra Ahmed Salih, Higher Diploma in Computer Science, 2008
Supervised by
Dr. Aree Ali Mohammed
Rêbendan 2711 / January 2012