Learning a selectivity–invariance–selectivity feature extraction architecture for images
Michael U. Gutmann ([email protected])
Aapo Hyvärinen ([email protected])
University of Helsinki
Motivation

■ We are very good at detecting specific patterns while being invariant/tolerant to possible variations.
■ It is the pairing of selectivity with invariance which is important ("tolerant selectivity").
■ Tolerant selectivities occur at multiple levels. Lower- and higher-level examples:
  (a) "Low-level": same face, luminance and contrast vary
  (b) "Higher-level": same face, facial expression varies
      (From "Facial Expressions: A Visual Reference for Artists" by Mark Simon.)
Question asked and methodology

■ Basic hypothesis: Higher-level tolerant selectivities emerge through a sequence of elementary selectivity and invariance computations.
  (see for example: Riesenhuber & Poggio, Nature 1999; Kouh & Poggio, NeCo 2008; Rust & Stocker, Curr Op Neurobiol 2010)
■ Question asked: In a system with three processing layers, what should be selected and tolerated at each level of the hierarchy?
■ Methodology:
  ◆ Learn the selectivity and invariance computations from images, using as few assumptions as possible.
  ◆ Learning ≡ fitting a probability density function
Data and preprocessing

■ Tiny images dataset, converted to gray scale: complete scenes downsampled to 32 by 32 images (Torralba et al, TPAMI 2008)
■ Preprocessing:
  ◆ Removing the DC component
  ◆ Normalizing the norm after whitening
  ◆ Reducing the dimension from 32 · 32 = 1024 to 200
■ Preprocessing can be considered a form of luminance and contrast gain control, followed by low-pass filtering.

[Figure: examples from the tiny images dataset before preprocessing.]
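The preprocessing steps above can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions: the slide does not specify the whitening method, so PCA whitening is assumed here, and the function name `preprocess` and its exact ordering of steps are hypothetical.

```python
import numpy as np

def preprocess(images, dim=200):
    """Sketch of the preprocessing: DC removal, PCA whitening with
    dimension reduction (assumed method), then norm normalization."""
    X = images.reshape(len(images), -1).astype(float)   # flatten 32x32 -> 1024
    X -= X.mean(axis=1, keepdims=True)                  # remove the DC component
    X -= X.mean(axis=0)                                 # center across the sample
    C = np.cov(X, rowvar=False)                         # covariance for whitening
    eigval, eigvec = np.linalg.eigh(C)
    keep = np.argsort(eigval)[::-1][:dim]               # keep the top `dim` components
    W = eigvec[:, keep] / np.sqrt(eigval[keep])         # whitening matrix (1024 x dim)
    Z = X @ W                                           # whitened, dimension-reduced data
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)       # normalize the norm
    return Z
```

The dimension reduction to 200 components discards the high-frequency directions, which is the low-pass filtering interpretation mentioned on the slide.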
Feature extraction architecture

■ Let $x \in \mathbb{R}^{200}$ be a vectorized image after preprocessing. Feature extraction with three processing layers:

  $y_i^{(1)} = w_i^{(1)\top} x, \quad i = 1, \ldots, 100$

  $y_k^{(2)} = f_{\mathrm{th}}\!\left( \ln\!\left( \sum_{i=1}^{100} w_{ki}^{(2)} \big(y_i^{(1)}\big)^2 + 1 \right) + b_k^{(2)} \right), \quad k = 1, \ldots, 50$

  $\tilde{y}^{(2)} = \text{gain control}\big(y^{(2)}\big), \qquad y_j^{(3)} = f_{\mathrm{th}}\!\left( w_j^{(3)\top} \tilde{y}^{(2)} + b_j^{(3)} \right), \quad j = 1, \ldots, n^{(3)}$

■ Thresholding function $f_{\mathrm{th}}(u)$: smooth version of $\max(u, 0)$
■ Gain control: centering, normalizing the norm after whitening, dimension reduction (similar to the preprocessing)
■ Parameters of interest: feature vectors $w_i^{(1)}$, pooling weights $w_{ki}^{(2)} \geq 0$, higher-order feature vectors $w_j^{(3)}$
■ Other parameters: the thresholds $b_k^{(2)}$ and $b_j^{(3)}$
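A forward-pass sketch of the three layers: a sketch only, since the slide does not give the exact form of $f_{\mathrm{th}}$ (a softplus with sharpness `beta` is assumed) and the gain control is simplified here to centering plus norm normalization, omitting the whitening and dimension-reduction steps. All names and shapes are hypothetical.

```python
import numpy as np

def f_th(u, beta=10.0):
    # smooth version of max(u, 0): a softplus with sharpness beta (assumed form)
    return np.logaddexp(0.0, beta * u) / beta

def features(x, W1, W2, b2, W3, b3):
    """Three-layer feature extraction following the slide's equations.
    W1: (200, 100) first-layer vectors w_i^(1); W2: (50, 100) nonnegative
    pooling weights w_ki^(2); W3: (50, n3) higher-order vectors w_j^(3)."""
    y1 = W1.T @ x                                 # y_i^(1) = w_i^(1)^T x
    y2 = f_th(np.log(W2 @ y1**2 + 1.0) + b2)      # pooled, log-compressed, thresholded
    y2t = y2 - y2.mean()                          # simplified gain control:
    y2t = y2t / np.linalg.norm(y2t)               #   centering + norm normalization
    y3 = f_th(W3.T @ y2t + b3)                    # y_j^(3)
    return y1, y2, y3

rng = np.random.default_rng(0)
x = rng.normal(size=200)
W1 = rng.normal(size=(200, 100))
W2 = rng.random((50, 100))                        # pooling weights kept >= 0
W3 = rng.normal(size=(50, 10))
y1, y2, y3 = features(x, W1, W2, b2=-1.0, W3=W3, b3=0.0)
```

The squaring of $y_i^{(1)}$ before the nonnegative pooling is what makes the second-layer outputs insensitive to the sign and, with suitable pooling weights, to the exact phase of the first-layer features.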
Learning

■ First, learn the parameters of layers one and two. Keeping them fixed, learn the parameters of layer three.
■ For layers one and two, fit the pdf

  $p\big(x; \underbrace{w_i^{(1)}, w_{ki}^{(2)}, b_k^{(2)}}_{\text{Parameters}}\big) \propto \exp\left( \sum_{k=1}^{50} y_k^{(2)} \right)$

■ For layer three, fit the pdf

  $p\big(x; \underbrace{w_j^{(3)}, b_j^{(3)}}_{\text{Parameters}}\big) \propto \exp\left( \sum_{j=1}^{n^{(3)}} y_j^{(3)} \right)$

■ Basic idea: the overall activity of the feature outputs determines how probable the input is.
■ We do not know the partition functions, so the likelihood is intractable. Use noise-contrastive estimation for the fitting. (Gutmann and Hyvärinen, JMLR 2012)
Noise-contrastive estimation (Gutmann and Hyvärinen, JMLR 2012)

■ Purpose: learn the parameters $\theta$ of a pdf $p_\theta$ when you do not know the partition function.
  Here: $p_\theta(x) = p\big(x; w_i^{(1)}, w_{ki}^{(2)}, b_k^{(2)}\big)$ or $p_\theta(x) = p\big(x; w_j^{(3)}, b_j^{(3)}\big)$
■ Intuition: learn the differences between the data and auxiliary "noise" whose properties you know; from the differences, deduce the properties of the observed data.
■ More concretely:
  1. Choose a random variable $z$ with a known pdf $p_z$ from which sampling is easy. Here: the uniform distribution on the sphere where the data is defined.
  2. Obtain an auxiliary sample of $z$ ("noise").
  3. Perform logistic regression on the data and the auxiliary "noise"; use the ratio $p_\theta / p_z$ in the regression function.
■ The procedure provides a consistent estimator of $\theta$.
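The three steps can be illustrated on a toy problem: estimating the precision and the unknown log-normalizer of an unnormalized 1-D Gaussian, contrasted against uniform noise. This is not the image model from the slides, and all names (`nce_fit`, `log_pm`, etc.) are illustrative; the objective is the standard NCE logistic-regression loss with equal data and noise sample sizes.

```python
import numpy as np
from scipy.optimize import minimize

def nce_fit(data, noise, log_p_model, log_p_noise, theta0):
    """Noise-contrastive estimation sketch: logistic regression between data
    and noise, with the log-ratio log p_theta - log p_z as regression function."""
    def neg_obj(theta):
        gx = log_p_model(data, theta) - log_p_noise(data)    # log-ratio on data
        gz = log_p_model(noise, theta) - log_p_noise(noise)  # log-ratio on noise
        # -[mean log sigmoid(gx) + mean log(1 - sigmoid(gz))], written stably
        return np.logaddexp(0.0, -gx).mean() + np.logaddexp(0.0, gz).mean()
    return minimize(neg_obj, theta0).x

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)                   # observed data, N(0, 1)
z = rng.uniform(-5.0, 5.0, 5000)                 # auxiliary noise with known pdf
log_pz = lambda u: np.full_like(u, -np.log(10.0))
# unnormalized model: log p_theta(u) = -0.5 * theta[0] * u^2 + theta[1],
# where theta[1] plays the role of the unknown (negative) log partition function
log_pm = lambda u, th: -0.5 * th[0] * u**2 + th[1]
prec, log_c = nce_fit(x, z, log_pm, log_pz, np.array([0.5, 0.0]))
```

Note that the normalizer is treated as just another parameter: this is the point of NCE, since logistic regression only needs the ratio $p_\theta / p_z$, not a normalized $p_\theta$.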
Results, layers one and two

■ The $w_i^{(1)}$ are Gabor-like; the $w_{ki}^{(2)}$ are sparse (94.5% of the weights are $< 10^{-6}$; 5.1% are $> 10$).
■ Mostly complex-cell-like pooling. Each row of the figure corresponds to a different

  $y_k^{(2)} = f_{\mathrm{th}}\!\left( \ln\!\left( \sum_{i=1}^{100} w_{ki}^{(2)} \big(w_i^{(1)\top} x\big)^2 + 1 \right) + b_k^{(2)} \right)$

[Figures: a subset of the features $w_i^{(1)}$ and their icons; all the learned features for layers one and two.]
Results, layer three

■ Features with enhanced selectivity to orientation and space (horizontal, vertical, diagonal).
■ The features act on the gain-controlled second-layer outputs:

  $\tilde{y}^{(2)} = \text{gain control}\big(y^{(2)}\big), \qquad y_j^{(3)} = f_{\mathrm{th}}\!\left( w_j^{(3)\top} \tilde{y}^{(2)} + b_j^{(3)} \right)$

■ If the $k$-th element of $w_j^{(3)}$ is positive, activity of $y_k^{(2)}$ is detected (corresponding icon colored in red); if it is negative, inactivity of $y_k^{(2)}$ is detected (icon colored in blue).

[Figure: the complete set $w_1^{(3)}, \ldots, w_{10}^{(3)}$ for $n^{(3)} = 10$. See paper for $n^{(3)} = 100$.]
Results, layer three

■ Descriptors of overall image properties?

[Figure: the features, together with the images giving maximal and minimal activation. Feature outputs were computed for 10000 randomly chosen tiny images.]
Summary

■ Selectivity and invariance/tolerance are important for any feature extraction system.
■ Question asked: In a system with three processing layers, what should be selected and tolerated at each level of the hierarchy?
■ Looked for an answer by fitting probabilistic models to images:
  → First layer: selectivity to Gabor-like image structure
  → Second layer: tolerance to the exact orientation or localization of the stimulus ("complex cells")
  → Third layer: enhanced selectivity to the orientation and/or location of the stimulus