An Improved Likelihood Model for Eye Tracking

Dan Witzner Hansen (a) and Riad I. Hammoud (b)

(a) IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S, Denmark

(b) Delphi Electronics and Safety, One Corporate Center, P.O. Box 9005, Kokomo, IN 46904-9005, USA

Abstract

While existing eye detection and tracking algorithms can work reasonably well in a controlled environment, they tend to perform poorly under real-world imaging conditions, where the lighting produces shadows and the person's eyes can be occluded by e.g. glasses or makeup. As a result, pixel clusters associated with the eyes tend to be grouped together with background features. This problem occurs both for eye detection and eye tracking. Problems that especially plague eye tracking include head movement, eye blinking and light changes, all of which can cause the eyes to suddenly disappear. The usual approach in such cases is to abandon the tracking routine and re-initialize eye detection, which may itself be difficult due to the missing data. Accordingly, what is needed is an efficient method of reliably tracking a person's eyes between successively produced video image frames, even in situations where the person's head turns, the eyes momentarily close and/or the lighting conditions are variable. The present paper is directed to an efficient and reliable method of tracking a human eye between successively produced infrared interlaced image frames under challenging lighting conditions. It proposes a log likelihood-ratio function of foreground and background models in a particle filter-based eye tracking framework. It fuses key information from the even and odd infrared fields (dark and bright pupil) and their corresponding subtractive image into one single observation model. Experimental validation shows good performance of the proposed eye tracker in challenging conditions that include moderate head motion and significant local and global lighting changes. The paper also presents an eye detector that relies on physiological infrared eye responses and a modified version of a cascaded classifier.

Key words: Eye position tracking, Filtering, Observation models, Structured infrared light, Non-visible spectrum, Eye tracking systems and countermeasures, HCI.

Email addresses: [email protected] (Dan Witzner Hansen), [email protected] (Riad I. Hammoud).

Preprint submitted to Elsevier Science

6th December 2005

1 Introduction

Vision systems frequently entail detecting and tracking a person's eyes in a stream of images generated by a video camera. In motor vehicles, for example, a camera can be used to generate an image of the driver's face, and portions of the image corresponding to the driver's eyes can be analyzed to assess driver gaze or drowsiness [12, 19, 36, 46, 47, 49]. In its early stages, eye tracking was employed by psychologists to conduct human factors experiments [9, 44, 58]. More recently, remarkable attention has been given to this technology, within which two broad categories can be distinguished: diagnostic and interactive [10]. Diagnostic eye trackers provide an objective and quantitative method to record the viewer's point of regard. Such data are useful in human attention analysis, such as examining what and when people look at commercials, instruments in plane cockpits and user interfaces [1, 15, 22, 45, 56]. By contrast, interactive user interfaces based on eye tracking react to the user's gaze, either by using the gaze for control [2, 3, 23, 35, 55] or by being gaze-contingent [10, 28, 29, 50]. The system thus adapts its behavior according to the gaze input, which reflects the person's intent. This property makes eye tracking systems a unique and effective tool for disabled people, for whom eye movements may be the only means of communication and interaction with the computer [32]. In this paper we address only vision-based eye trackers, and we use the terms vision-based eye tracker and eye tracker interchangeably.

A vision-based eye tracking system includes methods for eye detection, eye position tracking, and eye analysis [17, 19]. In general, the eye detection method identifies the regions of a video image that correspond to the person's eyes, the eye tracking method tracks the eye location from one video image to the next, and the eye analysis method characterizes the state of the person's eyes (degree of eye openness and gaze vector, for example). Video-based eye detection and tracking methods employ either passive or active light. While passive methods rely on ambient light, active methods utilize a specific infrared light scheme. Since good light conditions lead to greater success and less effort in algorithm research and development, most existing eye trackers exploit the benefits of active near-infrared light. These systems are particularly efficient in indoor and dim environments where ambient light is suppressed. The use of infrared light is convenient since it is not visible to the human eye and therefore does not distract the user. Adjustment of infrared illumination levels, appropriate sensors and light filters can be used to overcome bright ambient light. Available and affordable commercial image sensors are sensitive to the infrared wavelength range of 780–880 nm, which is still within the limits of human eye safety.

When IR light falls on the cornea of the eye, part of it is reflected back in a narrow ray pointing directly towards the light source. Several reflections (glints) occur on the boundaries of the lens and the cornea, the so-called Purkinje images [10]. If a user looks directly at the light source, the distance between the glint and the center of the pupil is small; when the user looks away, the distance increases. Several light sources, and thus several reflections, may limit the need for explicit head pose estimation [59].

If a light source is located close to the optical axis of the camera (on-axis light), the captured image shows a bright pupil. This effect is similar to the red-eye effect when using a flash in photography. When a light source is located away from the optical axis of the camera (off-axis), the image shows a dark pupil. An example of dark and bright pupil images is shown in figure 4. Several objects in the background may generate patterns similar to the dark and bright pupil images, but the bright and dark pupil effects rarely occur simultaneously for objects other than eyes. Detection and tracking of eyes based on active remote IR illumination therefore often utilize the difference of the dark and bright pupil images to identify potential eye candidates and define observations.

Addressed problems. While existing eye detection and tracking algorithms can work reasonably well in a controlled environment, they tend to perform poorly under real-world imaging conditions, where the light produces shadows and the person's eyes can be occluded by glasses or makeup. As a result, pixel clusters associated with the eyes tend to be grouped together with background features and discarded when subjected to appearance-based or energy minimization-based testing. This problem occurs both in eye detection methods that initially locate the eyes, and in eye tracking methods that track the eye from one image frame to the next. Problems that especially plague eye tracking include head movement, eye blinking and light changes, all of which can cause previously detected eyes to suddenly disappear. In the case of large light changes, the iris becomes overly light or dark and the pupil disappears. Furthermore, the contrast between the sclera and the surrounding skin regions becomes especially low in the infrared spectrum. The usual approach in such cases is to abandon the tracking method and re-initialize the eye detection method, which of course places a heavy processing burden on the system and slows the system response when eye tracking data is used as input to computer fatigue and distraction countermeasures. Accordingly, what is needed is an efficient method of reliably tracking a person's eyes between successive video image frames, even in situations where the person's head turns, the eyes momentarily close and/or the lighting conditions vary. One major limitation of existing IR eye position trackers is their use of predefined thresholds, which are difficult to set generically since light conditions and head pose influence the image observations of the eye.

Paper contribution. The present paper is directed to an efficient and reliable method of tracking a human eye between successively produced infrared interlaced image frames under challenging lighting conditions. It proposes a log likelihood-ratio function of foreground and background models in a particle filter-based eye tracking framework. It fuses key information from the even and odd infrared fields (dark and bright pupil) and their corresponding subtractive image into one single observation model. Experimental validation shows good performance of the proposed eye tracker in extreme conditions that include fast head motion and significant local and global lighting changes.

The paper also presents an eye detector that relies on physiological infrared eye responses and a modified version of a cascaded classifier.

Paper organization. The paper is organized as follows: Section 2 provides a brief overview of the system and the proposed detection and tracking methods. Section 3 reviews current eye tracking methods. Section 4 presents a detection method based on a cascaded classifier applied to dark and bright pupil images, and section 5 describes the proposed tracking method in detail. The new eye region likelihood model for dark and bright pupil images is described in section 6. Experimental results are presented and discussed in section 7. Finally, section 8 concludes the paper.

2 System setup and approach overview

The implemented system utilizes a near-infrared illumination scheme at a wavelength of 780 nm that generates interlaced dark/bright pupil images. A sketch of the infrared camera setup is depicted in figure 1.


Figure 1. Illustration of the LED setup as implemented in [24] to generate dark and bright pupil images. The first light source (bright pupil LEDs) is located at an angle of less than 2.5 degrees from the imaging axis, while the second light source (dark pupil LEDs) is arranged at an angle greater than 4.5 degrees from the imaging axis. The first light source produces the bright pupil effect, while the second produces the dark pupil effect. When the light source is placed off-axis (top), the camera does not capture the light returning from the eye.

The proposed system consists of an auto-initialization module (i.e. eye detection) and an eye position tracking module.

In this paper we describe these two components in detail. Both the proposed detection and tracking methods process bright/dark pupil images; these infrared pupil responses substantially help in identifying potential eye candidates and observations. We employ a boosted Haar wavelet-based classifier [52] to reduce the number of false positives, but to match the properties of the eye's appearance, a set of additional Haar features [37] is proposed in this paper.

We propose a novel log likelihood-ratio function of the foreground and background models in a particle filter-based eye tracking framework. Particle filters are attractive because they can represent multi-modal distributions and have proven robust in clutter. However, the absence of bright pupils and the relatively large difference in the appearance of bright pupil images between subjects [42] pose problems for existing IR-based eye tracking methods. These differences are caused by the user's distance to the camera, changing light conditions, small out-of-plane face rotations, ethnic background, and whether the eyes are open and unoccluded. One reason for these problems lies in the frequent use of thresholds: the large difference between the dark and bright pupil images makes it tempting to apply threshold values and connected components to the difference image. Appropriate threshold values may be difficult or impossible to set and may additionally throw away useful information. Relying solely on observations from the difference image is not necessarily sufficient, as the bright pupil effect may disappear. To improve robustness when the bright pupil image is not reliable, we include the intensity distributions of the eye regions in conjunction with the information from the difference image in one unified model. Our tracking method is capable of tracking the eye under changing light conditions and when the bright pupil effect disappears under moderate head movements.

Figure 2. System overview. The eye is initially found through a cascaded Haar classifier. The eyes are then tracked through particle filtering. The weighted mean of the particle set is used for initializing a local mean shift mode optimization.
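As a concrete illustration of this pipeline, the following minimal Python sketch shows the control flow only. The helper names (detect_eyes, particle_filter) are placeholders for the components described in sections 4 to 6, not the authors' actual implementation.

```python
import numpy as np

def track_sequence(frames, detect_eyes, particle_filter):
    """frames: iterable of (bright_field, dark_field) image pairs."""
    state = None
    for bright, dark in frames:
        # Subtractive image of the interlaced bright/dark pupil fields.
        diff = bright.astype(np.int16) - dark.astype(np.int16)
        if state is None:
            # Auto-initialization: cascaded classifier on pupil candidates.
            state = detect_eyes(bright, dark, diff)
        else:
            # Recursive estimation with the combined likelihood model.
            state = particle_filter.step(bright, dark, diff)
    return state
```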


3 Current Eye Tracking Methods

Methods based on active light are the most predominant in both research [11, 18, 41, 59, 61] and commercial systems [36, 47, 49]. Eye detection and tracking based on active remote IR illumination is simple and effective: the eye can be tracked efficiently by tracking the bright areas in the difference image resulting from subtracting the dark pupil image from the bright pupil image.

Eye tracking and detection rely on an eye model. Eye models fall broadly within three categories, namely deformable templates, appearance-based and feature-based methods. Deformable template [21, 30, 60] and appearance-based [34, 39, 54] methods rely on building models directly on the appearance of the eye region, while feature-based methods [13, 26, 43, 57] rely on extraction of local features of the region. The latter methods are largely bottom-up, while template and appearance-based methods are generally top-down. That is, feature-based methods rely on fitting the image features to the model, while appearance and deformable template-based methods strive to fit the model to the image.

In general, appearance models detect and track eyes based on the photometry of the eye region using fixed templates [16, 39], probabilistic principal components [25] or more advanced learning-based techniques such as support vectors and Gaussian mixtures [18, 53, 61]. Feature-based methods extract particular features such as skin color or the color distribution of the eye region. These features could be the region between the eyes [33], filter bank responses [14, 26] and facial symmetries [38]. Eye tracking methods committed to explicit feature detection (such as edges) rely on thresholds. Defining thresholds can generally be difficult since light conditions and image focus change; methods relying on explicit feature detection may therefore be vulnerable to these changes.

Deformable template-based methods [7, 21, 60] rely on a generic template which is matched to the image, for example through energy minimization. In particular, deformable templates [60] construct an eye model in which the eye is located through energy minimization. In experiments it has been found that the initial position of the template is critical. While the shape and boundaries of the eye are important to model, so is the texture within the regions. Active Appearance Models [6] incorporate both shape and texture in one model and have also been used for eye modeling [21].

Temporal filtering and geometric constraints are often incorporated to remove spurious candidates and to group the positive candidates. Most of these methods require distinct bright/dark pupil images to work well and thus strongly depend on the brightness and size of the pupils. Consequently, these methods are affected by factors such as occlusion from eye closure and head orientation, interference from external illumination, and the distance of the subject to the camera. Ebisawa et al. [11] use a novel synchronization scheme in which the difference between images obtained from on-axis and off-axis light emitters is used for tracking. Kalman filtering and mean shift tracking have more recently been applied in similar approaches [61]. Efforts have been made to improve eye tracking under various light conditions.

Sunlight and glasses can seriously disturb the reflective properties of IR light. Methods using IR can therefore be less reliable in these situations, and methods exist that address these issues [61].

4 Eye Detection Through AdaBoost

The first step toward computing the gaze vector is to localize the position of the eye in the 2D input video frames. The localization should be automatic, avoid user intervention and face calibration, exhibit real-time performance, and yet be accurate. In case of false detection of the eye or failed auto-initialization of the system, the eye tracker would follow the false pattern, leading to useless gaze results and false alarms. With that in mind, several methods have been proposed for robust and accurate eye position detection. They differ mainly in defining the search space and the modeling procedure, as detailed below. The search space could be, for instance, the entire input frame with each pixel considered a potential eye candidate, or it could be limited to the upper part of the face, provided the face has been detected beforehand. The eye model must be sufficiently generic to handle significant variability in eye appearance among subjects (Asian, European, African, male, female, with/without glasses, eye makeup, half-opened eyes, etc.).

4.1 General steps of eye detection

The steps common to all automatic eye detection methods can be briefly summarized as follows; a code sketch of the candidate-extraction step is given after the list.

1) Extraction of potential eye candidates. A compact search space reduces the chances of false eye detection. Simple heuristics and detection methods may be employed to reduce the search space for the possible locations of the target eye. Often, these methods exploit the reflective and dynamic properties of the eye, such as the dark and bright pupil difference. Other methods construct the initial candidate set based on anthropometric constraints. Several robust methods obtain a set of potential eye candidates through the infrared responses of the pupil (often the difference image [25, 31]). Similarly, regions of eye motion can be identified by subtracting two successive frames of the subject [33]; if the head is kept still, the difference can be attributed to eye blinking, iris shifts or noise. Another strategy commonly adopted in eye detection consists of detecting the face first, after which each pixel in the upper block of the face is considered a potential eye candidate [40, 48]. Low resolution of the eye regions is one of the limitations of using full facial models in single-camera systems for gaze-based interaction.

2) Determination of an initial set of eyes. Given an initial set of potential candidates, the false candidates have to be filtered away, and a more committed model is therefore needed. Each eye candidate is verified according to an eye model [4, 25, 31, 53]. The steps of using the dark/bright difference image and a more advanced eye detector (a boosted classifier [52]) are shown in figure 4.

3) Selection of the final set of eyes. The detection methods in the previous step may eliminate a majority of false eye candidates. Depending on the complexity of the scene, the initial set of eyes may still contain more than two good candidates. One method for the final filtering is temporal coherence: tracking all hypothetical eye regions simultaneously and then eliminating the candidates that are inconsistent over time [25]. Other approaches make use of key information such as anthropometric constraints and eye blinks [16].
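To make step 1 concrete, here is a minimal Python sketch of candidate extraction from the dark/bright difference image using thresholding and connected components, the strategy used here for detection. The threshold value and blob-size limits are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

def pupil_candidates(bright, dark, thresh=40, min_area=4, max_area=400):
    # Bright-pupil field minus dark-pupil field (saturated subtraction):
    # pupils show up as bright blobs in the difference image.
    diff = cv2.subtract(bright, dark)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    candidates = []
    for i in range(1, n):                     # label 0 is the background
        area = stats[i, cv2.CC_STAT_AREA]
        if min_area <= area <= max_area:      # keep blobs of plausible pupil size
            candidates.append(tuple(centroids[i]))
    return candidates
```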

4.2 Improved eye detection algorithm

We follow the common strategy for the initial eye region detection, as described above. We use a modified version of the boosted classifier proposed by Viola and Jones [51] for step two of the detection strategy (see section 4.1). Here the feature set is extended (compared to [51]) with center-surround-like features to direct the classification towards the problem of eye detection (i.e. features similar to the image of the pupil under on- and off-axis IR illumination).

Figure 3. The set of Haar features originally proposed and the center-surround feature added for eye detection.

Figure 3 shows the base set of simple feature classifiers used in the method. The regions in the image which jointly exhibit a large difference in the dark/bright difference image and are accepted by the boosted classifier are assumed to be feasible candidates for the initial eye regions.

We employ a simple temporal coherence filter to ensure additional stability over time. That is, if a hypothetical region of a given scale and position has been detected sufficiently often within a given time frame, the eye is assumed to be located. Figure 4 illustrates an example of eye detection using our proposed detection algorithm; a sketch of the verification and coherence steps is given after the figure caption below.

Figure 4. Illustration of the eye detection results (e) using our improved detection algorithm. The input dark and bright images and their corresponding subtractive image are shown in (a), (b) and (c), respectively. These images were captured by a wide-angle infrared camera. Figure (d) shows a few false potential eye candidates, and (e) the final set of detected eyes.
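A rough sketch of the verification and temporal coherence steps follows. It uses OpenCV's stock Haar cascade API as a stand-in for the modified classifier of this section (the extended center-surround features are not reproduced here), and the model file name, window size and persistence count are assumptions.

```python
import cv2
from collections import Counter

cascade = cv2.CascadeClassifier("haarcascade_eye.xml")  # assumed model file
hits = Counter()  # persists across frames: counts detections per spatial bin

def verify_and_filter(gray, candidates, win=24, need=5):
    """Return candidate centers accepted by the cascade in >= `need` frames."""
    accepted = []
    for (cx, cy) in candidates:
        # Crop a window around the difference-image candidate and verify it.
        x, y = int(cx - win // 2), int(cy - win // 2)
        roi = gray[max(y, 0):y + win, max(x, 0):x + win]
        if roi.size and len(cascade.detectMultiScale(roi)) > 0:
            key = (int(cx) // win, int(cy) // win)   # coarse spatial bin
            hits[key] += 1
            if hits[key] >= need:                    # temporal coherence test
                accepted.append((cx, cy))
    return accepted
```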

5 Tracking Method

The proposed method is based on recursive estimation of the state variables of the eye region through the well-known particle filter. Given a sequence of $T$ frames, at time $t$ only data from the previous $t-1$ images are available. The state and measurement at time $t$ are represented by $x_t$ and $y_t$ respectively, and the state and measurement histories are represented by $\mathbf{x}_t = (x_1, \ldots, x_t)$ and $\mathbf{y}_t = (y_1, \ldots, y_t)$. At time $t$ the observation $y_t$ is assumed independent of the previous state $x_{t-1}$ and previous observation $y_{t-1}$ given the current state $x_t$.

Particle filtering is used for the estimation, as it allows for maintaining multiple hypotheses, which in turn makes it robust in clutter and capable of recovering from occlusion. The aim of particle filtering is to approximate the filtering distribution $P(x_t \mid \mathbf{y}_t)$ by a weighted sample set $S_t^N = \{(x_t^{(n)}, \pi_t^{(n)})\}_{n=1}^{N}$, where $x_t^{(n)}$ is the $n$th instance of a state at time $t$ with weight $\pi_t^{(n)}$. This sample set evolves into a new sample set $S_{t+1}^N$, representing the posterior pdf (probability density function) $P(x_{t+1} \mid \mathbf{y}_{t+1})$ at time $t+1$.

The object location in the particle filter is usually represented by the sample mean. Particle filtering is particularly suitable for pupil tracking, because changes in pupil position are fast and do not follow a smooth and predictable pattern. The robustness of particle filters lies in maintaining a set of hypotheses. Generally, the larger the number of hypotheses, the better the chances of accurate tracking results, but the slower the tracking speed; there is thus a trade-off between tracking accuracy and speed. Using particle filters in large images may require a large set of particles to sufficiently sample the spatial parameters. Adding particles to the particle set may only improve accuracy slowly, due to the sampling strategy, and this added accuracy may become costly in terms of computation time.

On the other hand, Mean Shift [5] is an efficient method to estimate the local mode of a distribution using gradient-based optimization. Our method combines particle filtering with the Mean Shift algorithm: particle filtering is used to obtain an estimate of the pupil position, and the Mean Shift algorithm is then applied to find the local mode, using the sample mean estimate from the particle filter for initialization. In this way the particle filter samples the posterior effectively, while Mean Shift reaches the local maximum.
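The following is a minimal sketch of one time step of this scheme: resample, propagate with first-order autoregressive dynamics (equation 9 below), reweight with the likelihood model of section 6, and refine the sample mean with mean shift. The names `log_likelihood` and `mean_shift_refine` are placeholders for those components, not the authors' API.

```python
import numpy as np

def particle_filter_step(particles, weights, sigma, log_likelihood, mean_shift_refine):
    N = len(particles)
    # 1. Resample particles according to the current weights.
    idx = np.random.choice(N, size=N, p=weights)
    particles = particles[idx]
    # 2. Predict: first-order AR dynamics with Gaussian noise (cf. equation 9).
    particles = particles + np.random.normal(0.0, sigma, particles.shape)
    # 3. Measure: weight each hypothesis by the likelihood model.
    logw = np.array([log_likelihood(p) for p in particles])
    w = np.exp(logw - logw.max())          # subtract max for numerical stability
    weights = w / w.sum()
    # 4. Estimate: the sample mean initializes the local mean-shift optimization.
    estimate = mean_shift_refine((particles * weights[:, None]).sum(axis=0))
    return particles, weights, estimate
```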

6 Likelihood Model

In this section we describe our contribution, the likelihood model used in the particle filter-based eye tracker. The model uses the distribution of the eye regions in the dark and bright pupil images. The log likelihood ratio of the foreground and background models is used to avoid the use of thresholds. The use of thresholds should be minimized where possible, as they may throw away useful information and can be difficult to define so as to be applicable in a wide variety of conditions. The problem with thresholds becomes apparent for eye tracking in cases where the bright pupil disappears or when light conditions change.

6.1 Likelihood of the image

The observations depend on the eye region position in the image. This means that the likelihoods computed for different locations are not comparable, as they are likelihoods of different observations. A better evaluation function is given by the likelihood of the entire image $I$ given a region at location $\mu$, as a function $f^*(I \mid \mu)$ of the region location $\mu$. Denote by $f_a(I)$ the likelihood of the image given no eye region and by $f_R(I \mid \mu)$ the ratio $f^*(I \mid \mu)/f_a(I)$; the log-likelihood of the entire image can then be decomposed as follows:

$$\log f^*(I \mid \mu) = \log f_a(I) + \log f_R(I \mid \mu) \qquad (1)$$

The first term on the right-hand side of equation 1 involves complex statistical dependencies between pixels and is expensive to calculate, as all image pixels must be inspected. Most importantly, estimating this term is needless, as it is an additive term that is independent of the presence and location of the eye region. Consequently, in order to fit the eye model to the image, we consider only the log-likelihood ratio $\log f_R(I \mid \mu)$. This derivation is fairly standard in the field of active contours; see e.g. [8]. Note that $f_R(I \mid \mu)$ is the ratio between the likelihood for the hypothesis that the target is present and the null hypothesis that the target is not present (equation 6). Hence the likelihood ratio can also be used for testing the hypothesis of the presence of the eye region.
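Spelling out the consequence: since $\log f_a(I)$ does not depend on $\mu$, it drops out of the maximization over region locations,

$$\arg\max_{\mu} \log f^*(I \mid \mu) = \arg\max_{\mu} \bigl[\log f_a(I) + \log f_R(I \mid \mu)\bigr] = \arg\max_{\mu} \log f_R(I \mid \mu),$$

so fitting the model only requires the ratio term.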

6.2 Example of an Eye Tracker

In this section we describe the components of the particle filter-based eye region tracker. The model uses the distributions of the eye regions in the dark and bright pupil images as well as the dark/bright pupil difference image. As argued above, thresholds may throw away useful information and are difficult to define for a wide variety of conditions, particularly when the bright pupil disappears or when light conditions change; the log likelihood ratio of the foreground and background models is therefore used to avoid thresholds in the likelihood model [20].

6.3 Eye Region Model

Each pixel within a hypothesized eye region $\lambda_j$ is considered. A region is defined by its location $\mu_j$ and scale $\Sigma_j$, and we assume that the probability of an image coordinate $u$ given the region is a Gaussian function of the distance to the mean $\mu_j$:

$$g_j(u \mid \lambda_j) = \frac{1}{2\pi\sqrt{|\Sigma_j|}} \exp\!\left(-\frac{1}{2}\,\Delta u_j^T \Sigma_j^{-1} \Delta u_j\right) \qquad (2)$$

where $\Delta u_j$ is the displacement of $u$ from the centroid $\mu_j$.

Target regions can a priori have any distribution of gray levels, and this distribution may change over time, for example due to head rotations and external disturbances from other light sources.

In other words, the object region can be represented through its distribution without being overly committed. The Gaussian probability of the coordinates reflects the importance (or reliability) of each particular area of the object. The obvious reason for employing this kernel is that the peripheral areas are assumed to be the least reliable due to background clutter. The localization and positiveness of the kernel enable the calculation of the local mean and furthermore regularize the distribution. Differentiability is desirable for computational efficiency: a differentiable kernel yields a differentiable similarity measure, which may in turn be used for finding the mode of the distribution through gradient-based optimization (mean shift).
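As a sketch of how equation 2 is used in practice, the following computes normalized per-pixel weights for a hypothesized region, assuming for simplicity an axis-aligned covariance; variable names are illustrative.

```python
import numpy as np

def gaussian_region_weights(mu, sigma, height, width):
    # Weight each pixel by a Gaussian in its displacement from the region
    # center (equation 2), so peripheral, clutter-prone pixels count less.
    ys, xs = np.mgrid[0:height, 0:width]
    dx, dy = xs - mu[0], ys - mu[1]
    quad = (dx / sigma[0]) ** 2 + (dy / sigma[1]) ** 2  # Mahalanobis distance
    w = np.exp(-0.5 * quad) / (2 * np.pi * sigma[0] * sigma[1])
    return w / w.sum()                                   # normalize over region
```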

Similarity of Target and Candidate Distributions

The model consists of a target model $q$ and a candidate model $p(y)$ evaluated at location $y$. The similarity between the two distributions is expressed through a measurement derived from the Bhattacharyya distance, which is defined by:

$$\xi(y) \equiv \sqrt{1 - \rho(y)} \qquad (3)$$

where

$$\rho(y) \equiv \rho[p(y), q] = \sum_{z=1}^{m} \sqrt{p_z(y)\, q_z} \qquad (4)$$

To make the observation model exploit feature probabilities from both the dark and bright pupil images, the target and candidate distributions are constructed using the joint feature histogram of the two images within the region $\lambda_j$.
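A minimal sketch of this construction: a kernel-weighted joint histogram over the two pupil images, compared with the Bhattacharyya coefficient of equation 4 and the distance of equation 3. The bin count and the use of the weights from the previous sketch are assumed choices.

```python
import numpy as np

def joint_histogram(dark, bright, weights, bins=16):
    # Quantize both images and accumulate kernel weights into a joint
    # (dark, bright) histogram over the region.
    d = dark.ravel().astype(int) * bins // 256
    b = bright.ravel().astype(int) * bins // 256
    h = np.zeros((bins, bins))
    np.add.at(h, (d, b), weights.ravel())   # kernel-weighted accumulation
    return h / h.sum()

def bhattacharyya_distance(p, q):
    rho = np.sum(np.sqrt(p * q))            # equation 4
    return np.sqrt(max(1.0 - rho, 0.0))     # equation 3
```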

Background Model

The background model is defined over those pixels where the eye is not present. We assume the background model depends only on the dark/bright difference image. It is well known that the pdf of gray-level differences is well approximated by a Laplacian [27]:

$$f_L(\Delta I) = \frac{1}{Z_L} \exp\!\left(-\frac{|\Delta I|}{\sigma}\right) \qquad (5)$$

where $\Delta I$ is the gray-level difference and $\sigma$ is related to the width of the Laplacian function. If there is no known object in the region $\lambda_j$, the pdf of the gray levels follows the Laplacian distribution in equation 5. Assuming statistical independence between gray-level differences in a hypothesized region $\lambda_h$, the pdf of the observation in the absence of an eye is given by

$$f_a(I) \equiv \prod_{i \in \lambda_h} f_L[\Delta I(i)] \qquad (6)$$

Note that the absence of an eye does not imply the absence of high intensity values: there can be regions within the background that exhibit properties similar to the bright and dark pupil effects. Even though the background is occluded by the eye, it is still present at every location. By explicitly modeling the background, thresholds become needless.

Likelihood ratio

We are now in a position to formulate the likelihood ratio $f_R(I \mid \mu)$, given by equation 7:

$$p(I \mid x) = \frac{f_e}{f_a} = \frac{\xi(y)}{\prod_i f_L[\Delta I(i)]} \qquad (7)$$

Notice that the likelihood term fuses information from the dark and bright pupil images and their difference image into one model. In contrast to other eye tracking methods using dark and bright pupil images, this leads to a model which avoids explicit feature detection.
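A possible evaluation of this ratio in the log domain is sketched below. The Laplacian background term follows equations 5 and 6 (with the constant normalizer dropped, since it cancels when comparing hypotheses); mapping the Bhattacharyya distance $\xi$ to a foreground likelihood through a Gaussian is our assumption, as the paper states only the ratio form of equation 7.

```python
import numpy as np

def log_likelihood_ratio(xi, diff_region, sigma=10.0, xi_scale=0.1):
    # Foreground: a small Bhattacharyya distance xi should give a high
    # likelihood; the Gaussian in xi is an assumed parametrization, not
    # the paper's exact form.
    log_fe = -0.5 * (xi / xi_scale) ** 2
    # Background (equations 5-6): independent Laplacian gray-level
    # differences, summed in the log domain; log Z_L terms are dropped
    # as constant across hypotheses.
    log_fa = -np.abs(diff_region.astype(float)).sum() / sigma
    return log_fe - log_fa                  # log of the ratio in equation 7
```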

Eye State Model and Dynamics

The state of the eye region is modeled as a rectangle with position $(x, y)$ and scale $s$. The dimensions of the rectangle are fixed once initialized. The state to be estimated is therefore given by

$$X = (x, y, s)^T \qquad (8)$$

Pupil movements can be very rapid from one image frame to the next, and acceleration is therefore less important to model. As no a priori knowledge of the movements is available, the dynamics are modeled as a first-order autoregressive process using a time-dependent Gaussian noise model:

$$x_{t+1} = x_t + \nu_t, \qquad \nu_t \sim N(0, \Sigma_t) \qquad (9)$$

where $\Sigma_t$ is the covariance matrix of the noise $\nu_t$ at time $t$. The time dependence reflects that size changes may also change the apparent movements of the eye region. For this reason, the elements of the covariance matrix corresponding to the first two components of the state model ($x$ and $y$) are changed according to a linear function of the size of the sample mean in the previous time step.
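A sketch of this adaptive propagation, with assumed scale factors (the paper does not give the coefficients of the linear function):

```python
import numpy as np

def propagate(state, prev_size, base_pos_sigma=2.0, scale_sigma=0.02, k=0.5):
    # Positional noise grows linearly with the estimated region size from
    # the previous time step (equation 9 with time-dependent covariance).
    pos_sigma = base_pos_sigma + k * prev_size
    noise = np.random.normal(0.0, [pos_sigma, pos_sigma, scale_sigma])
    return state + noise                    # state = (x, y, s)
```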

Method        Detection rate   False negatives   False positives (per frame)
Haar          94.67%           6.35%             8.10
Validation    91.17%           2.93%             0.07

Figure 5. Detection results: comparing the Haar classifier with (bottom row) and without (top row) glint verification.

7 Experimental Results

In this section, we present experiments conducted to validate the performance of the initial eye region detector and the eye tracker under different conditions.

7.1 Detection

The classifier is trained on 2600 positive examples and 5000 negative examples, resulting in a cascade classifier consisting of 19 stages. The detection results on a set of 260 images are shown in figure 5. The table compares detection with the boosted classifier with (verification) and without the use of dark and bright pupil images. The overall detection rate (in single frames) decreases when using the dark/bright pupil differences as verification. The reason is that there can be a large difference in the apparent brightness of the bright pupil images between subjects [42], so positive candidates may be rejected. More importantly, the number of false positives is dramatically reduced when using the verification method. The limitation of this approach is that this stage relies on thresholding.

7.2 Eye Tracker

In all experiments the number of particles $N$ is set to 100, with one iteration per frame. The noise parameters in the dynamical model are defined manually, but kept constant for the initial frame in each sequence. The noise term $\nu_t$ may change during tracking due to the adaptive dynamical model. Figure 10 shows the results of using the tracker in the presence of glasses, under challenging scenarios with significant head poses and under drastic and challenging light conditions. The bright pupil is reduced or vanishes during these sequences, but even in these cases tracking is maintained. The tracker may be slightly inaccurate when the bright pupil disappears (i.e. during eye closure) and when several or large reflections are present on glasses; however, tracking is not completely lost.

The accuracy of the method is validated against a manually annotated set of images. The Euclidean distance between the center of the rectangle found by the tracker and the annotated points is used for the evaluation.

Figure 6. Tracking Results

Figure 7. Tracking Results

The accuracy of the tracker is calculated over 997 frames divided amongst 5 subjects in images of size 320 × 256 pixels. Figure 11 shows the accuracy (in pixels) and the corresponding variance of the tracker when neglecting the local optimization of the Mean Shift method. The accuracy improves with the number of particles used, but it appears to converge around 100 particles, and so does the variance. The use of mean shift improves accuracy on average by 1 pixel; this added accuracy comes at the cost of a few additional iterations of the Mean Shift method. Notice that tracking performance could be further improved by additional local optimization, for instance through thresholding and other well-known methods that have previously been used for eye tracking. Postponing thresholding as long as possible is important for maintaining robustness, but thresholds can be applied locally, when sufficient information about the image data is available, to improve accuracy. A sketch of the accuracy measure is given below.
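For completeness, a sketch of the accuracy measure: the mean and variance of the per-frame Euclidean deviation between tracked and annotated eye centers.

```python
import numpy as np

def tracking_accuracy(tracked_centers, annotations):
    # Per-frame Euclidean distance (in pixels) between the center of the
    # tracked rectangle and the manually annotated eye position.
    t = np.asarray(tracked_centers, dtype=float)
    a = np.asarray(annotations, dtype=float)
    err = np.linalg.norm(t - a, axis=1)
    return err.mean(), err.var()
```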

Figure 8. Tracking Results

Figure 9. Tracking the eye (eye region in red, center of the region in blue) with glasses and under challenging light conditions.

8 Conclusion

During the last decade, a tremendous effort has been made to develop robust and cheap eye tracking systems and applications based on video input. Most current methods use IR light and thresholding of the difference image to obtain the eye observations for detecting and tracking the eyes. Thresholding is tempting but may lead to tracking errors. In this paper we also use thresholding of the difference image to obtain a set of candidates for the detection method, but we avoid explicit feature detection while tracking. This is an important distinction, as tracking must be maintained even when image observations are poor.

For initial detection of the eye, we extract a set of candidate regions from the difference image of the dark and bright pupil images. These regions are validated through a cascaded Haar-based eye classifier. Good results are obtained on near-frontal face images.

Figure 10. Tracking the eye (eye region in red, center of the region in blue) with glasses and under challenging light conditions.


Figure 11. Mean accuracy (deviation in pixels) and variance of the tracker as a function of the number of particles.

To handle cases where the image observations are poor, we propose a likelihood model used in a particle filter setting to track the eye of a person. In contrast to many recent methods, this tracking model avoids the use of explicit feature detection. The tracker performs well in challenging sequences where the user undergoes moderate head pose changes and varying light conditions. Even though thresholding is not used during tracking, it can still be applied, where appropriate, after tracking has been performed, and in this way improve accuracy even further.

References

[1] Geerd Anders. Pilot's attention allocation during approach and landing – eye- and head-tracking research. In 11th International Symposium on Aviation Psychology, 2001.

[2] Patrick Baudisch, Doug DeCarlo, Andrew T. Duchowski, and Wilson S. Geisler. Focusing on the essential: considering attention in display design. Communications of the ACM, 46(3):60–66, 2003.

[3] Richard A. Bolt. Gaze-orchestrated dynamic windows. In SIGGRAPH '81: Proceedings of the 8th Annual Conference on Computer Graphics and Interactive Techniques, pages 109–119, New York, NY, USA, 1981. ACM Press.

[4] C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.

[5] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In IEEE Computer Vision and Pattern Recognition (CVPR), pages II:142–149, 2000.

[6] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. In Proc. European Conf. on Computer Vision, volume 2, pages 484–498. Springer, 1998.

[7] T. F. Cootes and C. J. Taylor. Active shape models – 'smart snakes'. In Proc. British Machine Vision Conf. (BMVC92), pages 266–275, 1992.

[8] J. Coughlan, A. Yuille, C. English, and D. Snow. Efficient deformable template detection and localization without user initialization. Computer Vision and Image Understanding, 78(3):303–319, 2000.

[9] Edmund B. Delabarre. A method of recording eye-movements. American Journal of Psychology, 9(4):572–574, 1898.

[10] Andrew T. Duchowski. Eye Tracking Methodology: Theory and Practice. Springer, 2003.

[11] Y. Ebisawa and S. Satoh. Effectiveness of pupil area detection technique using two light sources and image difference method. In 15th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society, pages 1268–1269, 1993.

[12] N. Edenborough, R. I. Hammoud, A. Harbach, et al. Driver state monitor from Delphi. In Demo session, IEEE Computer Vision and Pattern Recognition Conference, 2005.

[13] I. R. Fasel, B. Fortenberry, and J. R. Movellan. A generative framework for real time object detection and classification. Computer Vision and Image Understanding, 98(1):182–210, April 2005.

[14] I. R. Fasel, B. Fortenberry, and J. R. Movellan. A generative framework for real time object detection and classification. Computer Vision and Image Understanding, 98(1):182–210, April 2005.

[15] Joseph H. Goldberg and Anna M. Wichansky. Eye tracking in usability evaluation: A practitioner's guide, pages 493–516. Elsevier Science, Amsterdam, 2003.

[16] K. Grauman, M. Betke, J. Gips, and G. R. Bradski. Communication via eye blinks: Detection and duration analysis in real time. In IEEE Computer Vision and Pattern Recognition (CVPR), pages I:1010–1017, 2001.

[17] R. I. Hammoud and D. W. Hansen. Biophysics of the eye in computer vision: methods and advanced technologies. In F. Sadjadi, editor, Physics of the Automatic Target Recognition. Springer, 2005. To appear.


[18] Riad I. Hammoud. A robust eye position tracker based on invariant local features, eye motion and infrared-eye responses. In SPIE Defense and Security Symposium, Automatic Target Recognition Conference, Proceedings of SPIE Vol. 5807, 2005.

[19] Riad I. Hammoud, Andrew Wilhelm, Phillip Malawey, and Gerald J. Witt. Efficient real-time algorithms for eye state and head pose tracking in advanced driver support systems. In IEEE Computer Vision and Pattern Recognition Conference, 2005.

[20] Dan Witzner Hansen and Riad I. Hammoud. Boosting particle filter-based eye tracker performance through adapted likelihood function. In IEEE International Conference on Advanced Video and Signal based Surveillance, pages 111–116, 2005.

[21] Dan Witzner Hansen, John Paulin Hansen, Mads Nielsen, Anders Sewerin Johansen, and Mikkel B. Stegmann. Eye typing using Markov and active appearance models. In IEEE Workshop on Applications of Computer Vision, pages 132–136, 2003.

[22] John Paulin Hansen, A. W. Andersen, and P. Roed. Eye-gaze control of multimedia systems. In Y. Anzai, K. Ogawa, and H. Mori, editors, Advances in Human Factors/Ergonomics: Symbiosis of Human and Artifact, pages 37–42. Elsevier Science B.V., 1995.

[23] John Paulin Hansen, Kristian Tørning, Anders Sewerin Johansen, Kenji Itoh, and Hirotaka Aoki. Gaze typing compared with input by head and hand. In Proceedings of the Eye Tracking Research & Applications Symposium, pages 131–138, 2004.

[24] Andrew P. Harbach, Gregory K. Scharenbroch, Gerald J. Witt, Timothy J. Newman, Nancy Edenborough, and Riad I. Hammoud. Imaging system and method for monitoring an eye. United States Patent US 2005/0100191 A1, May 2005.

[25] A. Haro, M. Flickner, and I. Essa. Detecting and tracking eyes by using their physiological properties, dynamics, and appearance. In IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head Island, South Carolina, USA, June 2000.

[26] R. Herpers, M. Michaelis, K. Lichtenauer, and G. Sommer. Edge and keypoint detection in facial regions. In International Conference on Automatic Face and Gesture Recognition, pages 212–217, 1996.

[27] J. Huang and D. Mumford. Statistics of natural images and models. In IEEE Computer Vision and Pattern Recognition (CVPR), pages I:541–547, 1999.

[28] Aulikki Hyrskykari, Päivi Majaranta, and Kari-Jouko Räihä. Proactive response to eye movements. In Matthias Rauterberg, Marino Menozzi, and Janet Wesson, editors, INTERACT '03: IFIP TC13 International Conference on Human-Computer Interaction, pages 129–136, Amsterdam, 2003. IOS Press.

[29] Aulikki Hyrskykari, Päivi Majaranta, and Kari-Jouko Räihä. From gaze control to attentive interfaces. In Proceedings of the 11th International Conference on Human-Computer Interaction (HCII 2005). IOS Press, 2005.

[30] J. P. Ivins and J. Porrill. A deformable model of the human iris for measuring small 3-dimensional eye movements. Machine Vision and Applications, 11(1):42–51, 1998.


[31] Qiang Ji and Xiaojie Yang. Real-time eye, gaze, and face pose tracking for monitoring driver vigilance. Real-Time Imaging, 8(5):357–377, 2002.

[32] Inger Jordansen, Stina Boedeker, Mick Donegan, Lisa Oosthuizen, M. Girolamo, and John Paulin Hansen. Report on a market study and demographics of user population (COGAIN IST-2003-511598: Deliverable 7.2). http://www.cogain.org/results/reports/COGAIN-D7.2.pdf, 2005.

[33] S. Kawato and N. Tetsutani. Detection and tracking of eyes for gaze-camera control, 2002.

[34] Irwin King and Lei Xu. Localized principal component analysis learning for face feature extraction and recognition. In Proceedings of the Workshop on 3D Computer Vision, pages 124–128, Shatin, Hong Kong, 1997.

[35] C. Lankford. Effective eye-gaze input into Windows. In Eye Tracking Research & Applications Symposium 2000 (ETRA '00), pages 23–27, 2000.

[36] LC Technologies Inc. http://www.eyegaze.com, 2004.

[37] Rainer Lienhart and Jochen Maydt. An extended set of Haar-like features for rapid object detection. In IEEE ICIP 2002, volume 1, pages 900–903, September 2002.

[38] G. Loy and A. Zelinsky. Fast radial symmetry for detecting points of interest. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 959–973, August 2003.

[39] Y. Matsumoto and A. Zelinsky. An algorithm for real-time stereo vision implementation of head pose and gaze direction measurement. In International Conference on Automatic Face and Gesture Recognition, pages 499–504, 2000.

[40] Fan Johnson Messom. Machine vision for an intelligent tutor.

[41] C. H. Morimoto, D. Koons, A. Amir, and M. Flickner. Pupil detection and tracking using multiple light sources. Image and Vision Computing, 18(4):331–335, 2000.

[42] Karlene Nguyen, Cindy Wagner, David Koons, and Myron Flickner. Differences in the infrared bright pupil response of human eyes. In ETRA '02: Proceedings of the Symposium on Eye Tracking Research & Applications, pages 133–138, New York, NY, USA, 2002. ACM Press.

[43] M. Nixon. Eye spacing measurements for facial recognition. Applications of Digital Image Processing, 575(VIII):279–285, 1985.

[44] V. Ponsoda, D. Scott, and J. M. Findlay. A probability vector and transition matrix analysis of eye movements during visual search. Acta Psychologica, 88:167–185, 1995.

[45] K. Rayner, C. Rotello, C. M. Steward, A. J. Keir, and A. Duffy. Integrating text and pictorial information: Eye movements when looking at print advertisements. Journal of Experimental Psychology: Applied, 7(3):219–226, 2001.

[46] Seeingmachines. http://www.seeingmachines.com/, 2005.

[47] Smart Eye AB. http://www.smarteye.se, 2004.


[48] K. K. Sung and T. Poggio. Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):39–51, 1998.

[49] Tobii Technologies. http://www.tobii.se/, 2004.

[50] Roel Vertegaal, Ivo Weevers, and Changuk Sohn. GAZE-2: an attentive video conferencing system. In CHI '02: Extended Abstracts on Human Factors in Computing Systems, pages 736–737, New York, NY, USA, 2002. ACM Press.

[51] P. Viola and M. Jones. Robust real-time face detection. In International Conference on Computer Vision, page II:747, 2001.

[52] Paul A. Viola and Michael J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.

[53] V. Vogelhuber and C. Schmid. Face detection based on generic local descriptors and spatial constraints. In Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, volume 1, pages 1084–1087, 2000.

[54] R. Wagner and H. L. Galiana. Evaluation of three template matching algorithms for registering images of the eye. IEEE Transactions on Biomedical Engineering, 39(12):1313–1319, 1992.

[55] D. J. Ward and D. J. C. MacKay. Fast hands-free writing by gaze direction. Nature, 418(6900):838, 2002.

[56] M. Wedel and R. Pieters. Eye fixations on advertisements and memory for brands: A model and findings. Marketing Science, 19(4):297–312, 2000.

[57] Jie Yang, Rainer Stiefelhagen, Uwe Meier, and Alex Waibel. Robust detection of facial features by generalized symmetry. In International Conference on Pattern Recognition, pages I:117–120, 1992.

[58] A. L. Yarbus. Eye Movements and Vision. Plenum Press, 1967.

[59] D. H. Yoo and M. J. Chung. A novel non-intrusive eye gaze estimation using cross-ratio under large head motion. Computer Vision and Image Understanding, 98(1):25–51, April 2005.

[60] A. L. Yuille, P. W. Hallinan, and D. S. Cohen. Feature extraction from faces using deformable templates. International Journal of Computer Vision, 8(2):99–111, 1992.

[61] Z. Zhu and Q. Ji. Robust real-time eye detection and tracking under variable lighting conditions and various face orientations. Computer Vision and Image Understanding, 98(1):124–154, April 2005.

