Semantically-based Human Scanpath Estimation with HMMs

Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin

Institute for Infocomm Research (I2R), Singapore; Nanyang Technological University; Chinese Academy of Sciences; University of Technology, Sydney; Microsoft Research, Beijing

Presenter: Huiying Liu

Scanpath estimation: what is a scanpath? A scanpath is an eye gaze sequence.

Our purpose: estimate human scanpaths.

User scanpaths

Estimated scanpaths

Motivation: potential applications. Understand human behavior when watching an image. Training: show trainees how experienced viewers watch a scene.

Medical diagnosis


Driving


Motivation: potential applications. Understand human behavior when watching an image. Training: show trainees how experienced viewers watch a scene. Design: guide the audience to view the content in a designed sequence.


Scanpath estimation

Motivation
• There is much prior work on gaze density estimation and salient region detection, but much less on scanpath estimation.

[Figure: taxonomy of related work. Gaze density estimation and salient region detection: biological methods (Itti's, GBVS, AIM, SUN, SGC), contrast- and information-based saliency, proto-objects, and learning-based methods (MTL, CRF, SVM, MIL, ranking, C2OH). Scanpath estimation: saliency ranking, proto-object ranking, biologically inspired methods (WW).]

Existing scanpath estimation methods

• Itti saliency ranking: saliency map → saliency ranking → scanpath.
• Proto-object based saliency ranking: proto objects → saliency ranking → scanpath.
L. Itti et al., A model of saliency-based visual attention for rapid scene analysis. T-PAMI, 1998.
D. Walther et al., Modeling attention to salient proto-objects. Neural Networks, 2006.

• Biologically inspired method (WW).
W. Wang et al., Simulating human saccadic scanpaths on natural images. CVPR, 2011.

Overview: Three factors affecting gaze shift
• Feature saliency: salient regions attract more attention; measured by feature differences of YUV color and Gabor features.
• Semantic content: gaze focuses on meaningful content; modeled with an HMM over BoVW representations.
• Spatial position: gaze tends to shift to nearby positions; a Lévy flight modeled with a Cauchy distribution.

Three factors affecting gaze shift. A gaze point is g_t = (y_t, z_t, u_t), where y_t is the feature saliency, z_t the semantic content, and u_t the spatial position of the gazed region:

p(g_{t+1} | g_1, …, g_t) = p(y_{t+1}, z_{t+1}, u_{t+1} | y_1, z_1, u_1, …, y_t, z_t, u_t)

Assume gaze shift is a Markov process:
p(g_{t+1} | g_t) = p(y_{t+1}, z_{t+1}, u_{t+1} | y_t, z_t, u_t)

Further assume independence between the three factors:
p(g_{t+1} | g_t) = p(y_{t+1} | y_t) p(z_{t+1} | z_t) p(u_{t+1} | u_t)
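Purely as an illustration of this factorization, here is a minimal NumPy sketch; the arrays p_feature, p_semantic, and p_spatial are hypothetical outputs of the three factor models described on the following slides, not code from the paper.

```python
import numpy as np

def gaze_shift_probability(p_feature, p_semantic, p_spatial):
    """Combine p(y_{t+1}|y_t), p(z_{t+1}|z_t), p(u_{t+1}|u_t) under the independence assumption.

    Each argument is a length-R array of per-candidate-region probabilities produced by the
    feature-saliency, semantic (HMM) and spatial (Cauchy) models introduced later.
    """
    p = np.asarray(p_feature) * np.asarray(p_semantic) * np.asarray(p_spatial)
    return p / p.sum()   # renormalize over the R candidate regions

# e.g. the next gaze point can be taken as the most probable candidate region:
# next_region = np.argmax(gaze_shift_probability(p_y, p_z, p_u))
```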


Feature Saliency

Weight between two regions: W_{r,s} = ‖y_s − y_r‖

Transfer probability:
p(y_s | y_r) = W_{r,s} / Σ_{s=1}^{R} W_{r,s}

A region whose features differ more from the currently gazed region is gazed more frequently, i.e., feature difference drives saliency.
J. Harel et al., Graph-based visual saliency. NIPS, 2006.
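A minimal sketch of this transfer probability, assuming one descriptor per region (e.g. mean YUV color concatenated with Gabor responses); the function name is illustrative, not from the paper.

```python
import numpy as np

def feature_transition_probs(features):
    """p(y_s | y_r) from pairwise feature differences W[r, s] = ||y_s - y_r||.

    `features` is an (R, D) array of per-region descriptors.  Each row of W is normalized,
    so regions whose features differ more from the current region receive more probability.
    """
    features = np.asarray(features, dtype=float)
    diff = features[:, None, :] - features[None, :, :]    # (R, R, D) pairwise differences
    W = np.linalg.norm(diff, axis=-1)                     # W[r, s] = ||y_s - y_r||
    return W / W.sum(axis=1, keepdims=True)               # row r gives p(y_s | y_r)
```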

   , ,  

Semantic content: HMM

The semantic content factor is modeled with an HMM λ = (π, A, B):
• Prior distribution π over the hidden states
• Transition matrix A, with a_{i,j} = p(z_{t+1} = j | z_t = i)
• Emission matrix B, with b_{i,k} = p(w_k | z_i)

The hidden states z_i represent semantic topics. Each gazed region is an observation described with a bag-of-visual-words (BoVW) histogram over the visual words w_1, …, w_K.

[Figure: HMM with hidden states z_1, …, z_4 emitting observations x_1, …, x_4.]

Gaze shift prediction (forward method)

Given the gazed regions g_1, g_2, …, g_t and their BoVW representations x_1, x_2, …, x_t:

b_i(x) = p(x | z = i) = Π_{k=1}^{K} b_{i,k}^{x^k}

α_{t,i} = p(z_t = i, x_1, …, x_t) is the probability of state i at time t:
α_{1,i} = b_i(x_1) π_i
α_{t,i} = b_i(x_t) Σ_{j=1}^{M} a_{j,i} α_{t−1,j},  t = 2, …, T

The probability that region x is gazed next:
p(x_{t+1} = x | x_1, …, x_t) ∝ p(x_1, …, x_t, x) = Σ_{i=1}^{M} b_i(x) Σ_{j=1}^{M} a_{j,i} α_{t,j}


Model learning (Baum-Welch)

Forward:
α_{1,i} = b_i(x_1) π_i
α_{t,i} = b_i(x_t) Σ_{j=1}^{M} a_{j,i} α_{t−1,j}

Backward:
β_{T,i} = 1
β_{t−1,i} = Σ_{j=1}^{M} a_{i,j} b_j(x_t) β_{t,j}

State posteriors for the n-th training scanpath (length T_n, superscript (n) omitted where clear):
ξ_{t,i,j} = α_{t,i} a_{i,j} b_j(x_{t+1}) β_{t+1,j} / Σ_{i=1}^{M} Σ_{j=1}^{M} α_{t,i} a_{i,j} b_j(x_{t+1}) β_{t+1,j}
γ_{t,i} = Σ_{j=1}^{M} ξ_{t,i,j} for t < T_n, and γ_{T_n,i} ∝ α_{T_n,i}

Parameter updates over the N training scanpaths:
π_i = (1/N) Σ_{n=1}^{N} γ^{(n)}_{1,i}
a_{i,j} = Σ_{n=1}^{N} Σ_{t=1}^{T_n−1} ξ^{(n)}_{t,i,j} / Σ_{n=1}^{N} Σ_{t=1}^{T_n−1} Σ_{j=1}^{M} ξ^{(n)}_{t,i,j}
b_{i,k} = Σ_{n=1}^{N} Σ_{t=1}^{T_n} γ^{(n)}_{t,i} x^{(n)}_{t,k} / Σ_{k=1}^{K} Σ_{n=1}^{N} Σ_{t=1}^{T_n} γ^{(n)}_{t,i} x^{(n)}_{t,k}

Brief discussion about the HMM: the trained parameters have real meaning. The hidden states do represent semantic topics, and the learned prior distribution and transition matrix reflect actual human gaze behavior.
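To make the learning procedure above concrete, here is a compact NumPy sketch of one Baum-Welch re-estimation pass. It is illustrative only: it reuses bow_likelihood() and forward() from the prediction sketch earlier and omits the scaling normally added for numerical stability.

```python
import numpy as np

def backward(X, A, B):
    """Backward variables beta[t, i], with beta[T-1, i] = 1."""
    T, M = len(X), A.shape[0]
    beta = np.ones((T, M))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (bow_likelihood(X[t + 1], B) * beta[t + 1])   # sum_j a_{i,j} b_j(x_{t+1}) beta_{t+1,j}
    return beta

def baum_welch_step(scanpaths, pi, A, B):
    """One re-estimation pass over a list of training scanpaths (each a (T, K) array of BoVW counts)."""
    M, K = B.shape
    pi_new, A_num, B_num = np.zeros(M), np.zeros((M, M)), np.zeros((M, K))
    for X in scanpaths:
        alpha, beta = forward(X, pi, A, B), backward(X, A, B)
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)            # gamma[t, i] = p(z_t = i | x_1..x_T)
        pi_new += gamma[0]
        for t in range(len(X) - 1):
            xi = alpha[t][:, None] * A * bow_likelihood(X[t + 1], B) * beta[t + 1]
            A_num += xi / xi.sum()                           # normalized xi[t, i, j]
        B_num += gamma.T @ X                                 # expected word counts per state
    return (pi_new / len(scanpaths),
            A_num / A_num.sum(axis=1, keepdims=True),
            B_num / B_num.sum(axis=1, keepdims=True))
```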

D. Brockmann et al. "Are human scanpaths Levy flights?." Artificial Neural Networks, 1999.

Spatial position

The step length of gaze shifts follows a Lévy flight, modeled here with a Cauchy distribution:

p(u_{t+1} = u | u_t) = γ / ( 2π (‖u − u_t‖² + γ²)^{3/2} )

Equivalently, the step length d = ‖u_{t+1} − u_t‖ is distributed as
p(d) = γ d / (d² + γ²)^{3/2}

where γ is the scale parameter.
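As an illustrative sketch of this spatial factor; the function name, the candidate region centers, and the scale parameter gamma are assumptions, not values from the paper.

```python
import numpy as np

def spatial_transition_probs(centers, current, gamma=1.0):
    """p(u_{t+1} = u | u_t) from the 2-D Cauchy model of the gaze step.

    `centers` is an (R, 2) array of candidate region centers, `current` the current gaze
    position u_t, and `gamma` an assumed scale parameter.  Nearby regions receive higher
    probability, while the heavy tail still allows occasional long jumps (Levy-flight-like).
    """
    d2 = np.sum((np.asarray(centers) - np.asarray(current)) ** 2, axis=1)   # ||u - u_t||^2
    p = gamma / (2 * np.pi * (d2 + gamma ** 2) ** 1.5)                      # 2-D Cauchy density
    return p / p.sum()                                                      # normalize over regions
```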

Test data

NUSEF: 476 of the 758 images, with on average 25 users per image; two subsets are also used, NUSEF-portrait and NUSEF-face.

JUDD: 1003 images, with 15 users per image.

Measurement: similarity between sequences, computed with the Smith-Waterman local alignment algorithm.

For sequences S1 and S2 of lengths L1 and L2, build an (L1+1) × (L2+1) score matrix H:

h_{i,0} = 0 for 0 ≤ i ≤ L1,  h_{0,j} = 0 for 0 ≤ j ≤ L2

w(S1_i, S2_j) = w_match if S1_i = S2_j, and w_mismatch if S1_i ≠ S2_j

h_{i,j} = max{ 0,
  h_{i−1,j−1} + w(S1_i, S2_j)   (match / mismatch),
  h_{i−1,j} + w(S1_i, −)        (deletion),
  h_{i,j−1} + w(−, S2_j)        (insertion) }

Example: S1 = ACACACTA and S2 = AGCACACA align as
S1' = A-CACACTA
S2' = AGCACAC-A

Worked example with w_match = 2 and w_mismatch = w_insertion = w_deletion = −1:

S1_1 = A = S2_1, so h_{1,1} = max{0, h_{0,0} + 2, h_{0,1} − 1, h_{1,0} − 1} = 2
S1_2 = C ≠ S2_1 = A, so h_{2,1} = max{0, h_{1,0} − 1, h_{1,1} − 1, h_{2,0} − 1} = 1

Traceback matrix O (rows: S1 = ACACACTA, columns: S2 = AGCACACA; ↘ match/mismatch, ↓ deletion, → insertion):

O =
      -  A  G  C  A  C  A  C  A
  -   0  0  0  0  0  0  0  0  0
  A   0  ↘  ↓  ↓  ↘  ↓  ↘  ↓  ↘
  C   0  →  ↘  ↘  ↓  ↘  ↓  ↘  ↓
  A   0  ↘  ↓  →  ↘  ↓  ↘  ↓  ↘
  C   0  →  ↘  ↘  →  ↘  ↓  ↘  ↓
  A   0  ↘  ↓  →  ↘  →  ↘  ↓  ↘
  C   0  →  ↘  ↘  →  ↘  →  ↘  ↓
  T   0  →  ↘  →  →  →  →  →  ↘
  A   0  ↘  ↓  →  ↘  →  ↘  →  ↘

Score matrix H:

H =
      -  A  G  C  A  C  A  C  A
  -   0  0  0  0  0  0  0  0  0
  A   0  2  1  0  2  1  2  1  2
  C   0  1  1  3  2  4  3  4  3
  A   0  2  1  2  5  4  6  5  6
  C   0  1  1  3  4  7  6  8  7
  A   0  2  1  2  5  6  9  8 10
  C   0  1  1  3  4  7  8 11 10
  T   0  0  0  2  3  6  7 10 10
  A   0  2  1  1  4  5  8  9 12

Similarity = max(H) = 12, which traces back to the local alignment
S1' = A-CACACTA
S2' = AGCACAC-A
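A small, self-contained sketch of the score computation (traceback omitted); it reproduces the worked example above.

```python
import numpy as np

def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local-alignment similarity between two symbol sequences."""
    L1, L2 = len(s1), len(s2)
    H = np.zeros((L1 + 1, L2 + 1))
    for i in range(1, L1 + 1):
        for j in range(1, L2 + 1):
            w = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i, j] = max(0.0,
                          H[i - 1, j - 1] + w,   # match / mismatch
                          H[i - 1, j] + gap,     # deletion
                          H[i, j - 1] + gap)     # insertion
    return H.max()                               # similarity = max over H

print(smith_waterman("ACACACTA", "AGCACACA"))    # 12.0, as in the example above
```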

Parameter test: codebook size

A large codebook overfits the dataset.

Parameter test: number of hidden states

Performance is stable with respect to the number of hidden states.

Parameter test: number of training samples

Performance increases quickly at first and then levels off; it is stable with respect to the number of training samples.

Effectiveness of gaze factors

Semantic content modeled with the HMM is effective and outperforms feature saliency and spatial position used alone. The full combination of the three factors performs significantly better than any individual factor.

Comparison with other methods

Our method significantly outperforms Itti's method and the proto-object based method (t-test, p = 0.05). It also significantly outperforms WW on NUSEF-face, NUSEF-portrait, and NUSEF, and is comparable with it on JUDD.

Examples

Conclusion
A scanpath estimation method is proposed. It considers three factors: feature saliency, spatial position, and semantic content. Semantic content is represented with an HMM, and spatial position is modeled with a Cauchy distribution.

Experiments verify the effectiveness of the method; it outperforms the existing methods.

E-health and Medical Imaging Research at I2R

integer linear program with a precise geometric interpretation which is globally .... simple analytic formula to define the eigenvector of an adjusted Laplacian, we ... 2http://www-01.ibm.com/software/commerce/optimization/ cplex-optimizer/ ...