Computer Vision for the Blind: a Comparison of Face Detectors in a Relevant Scenario

Marco De Marco, Gianfranco Fenu, Eric Medvet, Felice Andrea Pellegrino
Department of Engineering and Architecture, University of Trieste, Italy

Goodtechs, 30/11–1/12 2016, Venice (Italy) http://machinelearning.inginf.units.it


Blindness

Many assistive systems have been proposed to aid blind and visually impaired persons
Some of them consist of a smart First Person Video (FPV) device, worn by the blind person, for easing social interactions:
Is there anybody around? How many people?
Is there someone I know? Is there someone approaching me?

Face Detection is an essential step: how effective are current detectors on real FPV images?


Real FPV images

Motion blur (mannerism, floppy device)
Suboptimal framing
Rapidly varying light conditions
Occlusions
Distortion (wide-angle device)


Real FPV images

[Example FPV frames shown on the slide]

Our work in brief

1. 4 relevant video sequences, manually annotated
2. Experimental comparison of 6 recent face detectors
3. Are detectors effective? What kind of faces do they struggle to detect?


Video sequences

Name         Resolution   Camera  Location  # frames  # faces
Coffee-shop  1280 × 720   GX9     Indoor    361       809
Library      1280 × 720   GX9     Indoor    361       1074
Office       1920 × 1080  CUBE    Indoor    558       206
Bus-stop     1920 × 1080  CUBE    Outdoor   448       1610

Acquired by a blind person (with all the privacy-related issues correctly addressed)
Two different worn devices (124° and 135°)
Many interactions


Manual annotation

For each frame, each face larger than 20 px:
bounding box (following specific criteria)
centers of eyes and mouth
occlusion flag
lateral flag
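As a rough illustration, a per-face annotation could be stored as a record like the following (a sketch only; the field names and layout are hypothetical, not the authors' actual annotation format):

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical per-face annotation record; names are illustrative only.
@dataclass
class FaceAnnotation:
    frame: int                       # frame index within the sequence
    bbox: Tuple[int, int, int, int]  # (x, y, width, height), in pixels
    left_eye: Tuple[int, int]        # (x, y) center of the left eye
    right_eye: Tuple[int, int]       # (x, y) center of the right eye
    mouth: Tuple[int, int]           # (x, y) center of the mouth
    occluded: bool                   # occlusion flag
    lateral: bool                    # lateral (yaw) flag
```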


Face features

Aimed at better characterizing the detectors' behavior:
normalized bounding box area (NBBA): are smaller (farther) faces harder to detect?
normalized distance from the center of the image (NDFC): are peripheral (distorted) faces harder to detect?
root mean square contrast (RMSC) within the bounding box
roll angle: are oblique faces harder to detect?
occlusion: are occluded faces harder to detect?
lateral (yaw): are lateral faces harder to detect?
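A minimal sketch of how the first three features could be computed from a bounding box and a grayscale frame; the exact normalizations used in the paper may differ (e.g., NDFC is normalized here by the half-diagonal, which is our assumption):

```python
import numpy as np

def nbba(bbox, img_w, img_h):
    """Normalized bounding box area: face area over image area."""
    x, y, w, h = bbox
    return (w * h) / (img_w * img_h)

def ndfc(bbox, img_w, img_h):
    """Normalized distance of the bbox center from the image center
    (here normalized by the half-diagonal, so the result is in [0, 1])."""
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    return np.hypot(cx - img_w / 2, cy - img_h / 2) / (np.hypot(img_w, img_h) / 2)

def rmsc(gray, bbox):
    """Root mean square contrast: std of normalized intensities in the bbox."""
    x, y, w, h = bbox
    patch = gray[y:y + h, x:x + w].astype(np.float64) / 255.0
    return patch.std()
```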


Face features

We set a statically chosen threshold τ on each feature, assuming a simple relation between the feature and the ease of detection:
e.g., NBBA ≤ τ means small, hence harder to detect
e.g., NDFC ≥ τ means distorted, hence harder to detect
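The resulting easy/hard split could be expressed as a per-feature predicate; thresholds and the direction of each inequality follow the examples above (the helper name and its interface are ours):

```python
# Hypothetical helper: flag a face as "presumed harder" for one feature,
# given its threshold tau and the assumed direction of the relation.
def is_presumed_harder(value, tau, harder_if_below):
    # NBBA: harder_if_below=True  (NBBA <= tau -> small -> harder)
    # NDFC: harder_if_below=False (NDFC >= tau -> peripheral -> harder)
    return value <= tau if harder_if_below else value >= tau
```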


Contenders

Recent, with available implementation:
Viola-Jones (VJ), from the Matlab Computer Vision Toolbox
GMS Vision, from the Android SDK + OpenCV for frame grabbing
Normalized Pixel Difference (NPD), authors' code
Pixel Intensity Comparison (PICO), authors' code
Face-Id, deep learning, both detection and recognition, authors' code
Visage, commercial solution for 2D/3D face identification, demo tool

All with default parameters (for fairness)
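For a flavor of what "default parameters" means in practice, here is an off-the-shelf Viola-Jones run in OpenCV (the slides used the Matlab toolbox; OpenCV is shown only as a readily available equivalent, and "frame.png" is a placeholder):

```python
import cv2

# Off-the-shelf Viola-Jones face detector with default parameters.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("frame.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray)  # array of (x, y, w, h) boxes
```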


Detector assessment

Computed on a sequence:
true positives (TP): number of detected faces
false positives (FP): number of detections which are not faces
false negatives (FN): number of undetected faces

Cast as:
precision, ratio of detected faces among all detections: TP / (TP + FP)
recall, ratio of detected faces among all faces: TP / (TP + FN)
false positives per frame (FPPF): FP / n_f, with n_f the number of frames; meaningful for video: how often does a wrong detection occur?

Comparable among sequences
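A direct transcription of these definitions into code (a minimal sketch; the guard against empty denominators is our addition):

```python
def assessment(tp: int, fp: int, fn: int, n_frames: int):
    """Precision, recall and false positives per frame for one sequence."""
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    fppf = fp / n_frames
    return precision, recall, fppf

# Usage with purely illustrative counts:
# precision, recall, fppf = assessment(tp=90, fp=10, fn=60, n_frames=100)
```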



Detected vs. undetected face

On a single frame:
zero or more detections d_i (regions deemed to contain a face)
zero or more faces g_j (manually annotated bounding boxes)

How to decide/count TP, FP, FN?

1. for each i, j, compute the Intersection to Union Areas Ratio: IUAR(d_i, g_j) = area(d_i ∩ g_j) / area(d_i ∪ g_j)
2. find the best matches (using the Hungarian algorithm)
3. decide:
   if IUAR(d_i, g_j) > 0.5, d_i is a TP
   if IUAR(d_i, g_j) ≤ 0.5, d_i is a FP
   if g_j is not assigned to any d_i, g_j is a FN
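A sketch of this matching step using SciPy's Hungarian solver; boxes are assumed to be (x, y, w, h) tuples, and the 0.5 IUAR threshold is the one from the slide:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iuar(d, g):
    """Intersection to Union Areas Ratio of two (x, y, w, h) boxes."""
    ix = max(0, min(d[0] + d[2], g[0] + g[2]) - max(d[0], g[0]))
    iy = max(0, min(d[1] + d[3], g[1] + g[3]) - max(d[1], g[1]))
    inter = ix * iy
    union = d[2] * d[3] + g[2] * g[3] - inter
    return inter / union if union > 0 else 0.0

def count_tp_fp_fn(detections, faces, thr=0.5):
    """Match detections to annotated faces on one frame, count TP, FP, FN."""
    if not detections or not faces:
        return 0, len(detections), len(faces)
    # Cost is -IUAR, since the Hungarian algorithm minimizes total cost.
    cost = np.array([[-iuar(d, g) for g in faces] for d in detections])
    rows, cols = linear_sum_assignment(cost)
    tp = sum(1 for r, c in zip(rows, cols) if -cost[r, c] > thr)
    fp = len(detections) - tp   # unmatched or low-IUAR detections
    fn = len(faces) - tp        # faces with no valid matching detection
    return tp, fp, fn
```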


Results: general

Method       Sequence     Precision  Recall  FPPF
Viola-Jones  Coffee-shop  0.129      0.367   5.543
             Library      0.140      0.267   4.867
             Office       0.031      0.709   8.197
             Bus-stop     0.222      0.725   9.158
             Average      0.132      0.513   7.196
GMS          Coffee-shop  0.364      0.015   0.058
             Library      1.000      0.004   0.000
             Office       0.387      0.141   0.082
             Bus-stop     0.202      0.020   0.290
             Average      0.284      0.021   0.114
NPD          Coffee-shop  0.228      0.305   2.319
             Library      0.159      0.222   3.504
             Office       0.256      0.583   0.625
             Bus-stop     0.687      0.747   1.221
             Average      0.376      0.489   1.735
PICO         Coffee-shop  0.337      0.121   0.535
             Library      0.030      0.003   0.266
             Office       0.538      0.413   0.131
             Bus-stop     0.202      0.020   0.290
             Average      0.589      0.160   0.238
Face-Id      Coffee-shop  0.143      0.001   0.017
             Library      0.889      0.007   0.003
             Office       −          0.0     0.0
             Bus-stop     1.000      0.001   0.000
             Average      0.611      0.003   0.004
Visage       Coffee-shop  0.043      0.002   0.125
             Library      0.045      0.001   0.058
             Office       0.137      0.068   0.158
             Bus-stop     0.072      0.006   0.286
             Average      0.087      0.007   0.163

All detectors perform poorly on average
Best is NPD on Bus-stop, but with 1.2 FPPF!
Clear trade-off between precision (FPPF) and recall
Differences among detectors (e.g., Face-Id vs. VJ)




Results: recall w.r.t. features

Method       NBBA           NDFC           Roll           RMSC           L/NL           O/NO
             <τ     ≥τ      <τ     ≥τ      <τ     ≥τ      <τ     ≥τ      NL     L       NO     O
Face-Id      0.001  0.001   0.001  0.002   0.001  0.000   0.000  0.002   0.001  0.001   0.002  0.000
GMS          0.006  0.039   0.004  0.041   0.041  0.002   0.006  0.039   0.044  0.001   0.043  0.002
NPD          0.304  0.160   0.122  0.342   0.443  0.009   0.277  0.188   0.441  0.024   0.441  0.023
PICO         0.054  0.143   0.046  0.151   0.190  0.005   0.093  0.104   0.196  0.001   0.187  0.010
Viola-Jones  0.364  0.149   0.148  0.366   0.491  0.004   0.325  0.189   0.486  0.027   0.475  0.038
Visage       0.002  0.018   0.002  0.018   0.019  0.000   0.003  0.017   0.019  0.001   0.020  0.000

(entries are recall values; L = lateral, NL = non-lateral; O = occluded, NO = non-occluded)

Occluded, lateral, and oblique (roll ≥ τ) faces are much harder to detect
Larger faces (NBBA ≥ τ) are easier to detect, except with NPD and Viola-Jones (detectors' parameters)
Contrast eases detection, except with NPD and Viola-Jones
Unclear impact of NDFC: faces far from the center appear easier to detect; further investigation needed




Concluding remarks and future work

The considered detectors perform poorly in this scenario
Future work:
change the scenario: e.g., detection of approaching faces
exploit video, rather than a set of still images


Thanks!

