Learning a selectivity–invariance–selectivity feature extraction architecture for images

Michael U. Gutmann
University of Helsinki
[email protected]

Aapo Hyvärinen
University of Helsinki
[email protected]

Motivation

Outline: Motivation · Research question · Data · Architecture · Learning · Results · Summary

■ We are very good at detecting specific patterns while being invariant/tolerant to possible variations. It is the pairing of selectivity with invariance that is important ("tolerant selectivity").
■ Tolerant selectivities occur at multiple levels. Lower- and higher-level examples:

[Figure: (a) "Low-level": same face, luminance and contrast vary. (b) "Higher-level": same face, facial expression varies. (From "Facial Expressions: A Visual Reference for Artists" by Mark Simon.)]

Question asked and methodology

■ Basic hypothesis: higher-level tolerant selectivities emerge through a sequence of elementary selectivity and invariance computations (see, for example, Riesenhuber & Poggio, Nature 1999; Kouh & Poggio, Neural Computation 2008; Rust & Stocker, Current Opinion in Neurobiology 2010).
■ Question asked: in a system with three processing layers, what should be selected and tolerated at each level of the hierarchy?
■ Methodology:
  ◆ Learn the selectivity and invariance computations from images, using as few assumptions as possible.
  ◆ Learning ≡ fitting a probability density function.

Data and preprocessing

■ Data: tiny images dataset (Torralba et al., TPAMI 2008), converted to gray scale; complete scenes downsampled to 32 × 32 images.
■ Preprocessing:
  ◆ Removing the DC component
  ◆ Normalizing the norm after whitening
  ◆ Reducing the dimension from 32 · 32 = 1024 to 200
■ Preprocessing can be considered a form of luminance and contrast gain control, followed by low-pass filtering.

[Figure: examples from the tiny images dataset before preprocessing.]
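A minimal sketch of such a preprocessing pipeline in NumPy, under assumptions: the data matrix `X` (one vectorized image per row), the function name `preprocess`, and the random stand-in data are illustrative, not the authors' code.

```python
import numpy as np

def preprocess(X, d=200):
    """DC removal, PCA whitening with dimension reduction, norm normalization."""
    # 1. Remove the DC component (mean of each image).
    X = X - X.mean(axis=1, keepdims=True)

    # 2. PCA-based whitening, keeping the d leading components
    #    (reduces 1024 dimensions to d = 200; acts as low-pass filtering).
    C = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:d]          # strongest components first
    W = eigvecs[:, order] / np.sqrt(eigvals[order])  # whitening matrix (1024 x d)
    Z = X @ W

    # 3. Normalize the norm after whitening (contrast gain control).
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return Z

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1024))   # stand-in for 500 vectorized 32x32 tiny images
Z = preprocess(X)
print(Z.shape)                      # (500, 200)
```

After this step every image is a unit-norm vector in R^200, matching the input assumed by the architecture on the next slide.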

Feature extraction architecture

■ Let x ∈ R^200 be a vectorized image after preprocessing. Feature extraction with three processing layers:

    y_i^{(1)} = w_i^{(1)T} x                                              i = 1, …, 100
    y_k^{(2)} = f_th( ln( Σ_{i=1}^{100} w_{ki}^{(2)} (y_i^{(1)})^2 + 1 ) + b_k^{(2)} )   k = 1, …, 50
    ỹ^{(2)} = gain control( y^{(2)} )
    y_j^{(3)} = f_th( w_j^{(3)T} ỹ^{(2)} + b_j^{(3)} )                    j = 1, …, n^{(3)}

■ Thresholding function f_th(u): smooth version of max(u, 0).
■ Gain control: centering, normalizing the norm after whitening, dimension reduction (similar to the preprocessing).
■ Parameters of interest: feature vectors w_i^{(1)}, pooling weights w_{ki}^{(2)} ≥ 0, higher-order feature vectors w_j^{(3)}.
■ Other parameters: the thresholds b_k^{(2)} and b_j^{(3)}.
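The three layers above can be sketched as a forward pass in NumPy. Assumptions not in the slides: the parameters are randomly initialized stand-ins for the learned ones, the softplus form of f_th and its smoothness parameter `beta` are one possible choice of "smooth max(u, 0)", and the gain control is simplified to centering plus norm normalization (the slides also include whitening and dimension reduction).

```python
import numpy as np

rng = np.random.default_rng(0)

def f_th(u, beta=10.0):
    # Smooth version of max(u, 0); the softplus form and beta are assumptions.
    return np.logaddexp(0.0, beta * u) / beta

# Randomly initialized stand-ins for the learned parameters.
W1 = rng.normal(size=(200, 100))           # first-layer feature vectors w_i^(1)
W2 = np.abs(rng.normal(size=(50, 100)))    # nonnegative pooling weights w_ki^(2)
b2 = rng.normal(size=50)                   # thresholds b_k^(2)
W3 = rng.normal(size=(50, 10))             # higher-order features w_j^(3), n^(3) = 10
b3 = rng.normal(size=10)                   # thresholds b_j^(3)

def gain_control(y):
    # Simplified stand-in: centering and norm normalization only.
    y = y - y.mean()
    return y / np.linalg.norm(y)

def forward(x):
    y1 = W1.T @ x                                  # layer 1: linear filtering
    y2 = f_th(np.log(W2 @ y1**2 + 1.0) + b2)       # layer 2: log of pooled energies
    y3 = f_th(W3.T @ gain_control(y2) + b3)        # layer 3: linear + threshold
    return y1, y2, y3

x = rng.normal(size=200)                   # a preprocessed image
y1, y2, y3 = forward(x)
print(y1.shape, y2.shape, y3.shape)        # (100,) (50,) (10,)
```

Squaring y^(1) before pooling makes layer two invariant to the sign of the first-layer outputs, which is the elementary invariance computation the hypothesis refers to.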

Learning

■ First, learn the parameters of layers one and two. Keeping them fixed, learn the parameters of layer three.
■ For layers one and two, fit the pdf

    p(x; w_i^{(1)}, w_{ki}^{(2)}, b_k^{(2)}) ∝ exp( Σ_{k=1}^{50} y_k^{(2)} )

■ For layer three, fit the pdf

    p(x; w_j^{(3)}, b_j^{(3)}) ∝ exp( Σ_{j=1}^{n^{(3)}} y_j^{(3)} )

■ Basic idea: the overall activity of the feature outputs determines how probable the input is.
■ We do not know the partition functions, so the likelihood is intractable. Use noise-contrastive estimation for the fitting (Gutmann and Hyvärinen, JMLR 2012).

Noise-contrastive estimation (Gutmann and Hyvärinen, JMLR 2012)

■ Purpose: learn the parameters θ of a pdf p_θ when you do not know the partition function. Here: p_θ(x) = p(x; w_i^{(1)}, w_{ki}^{(2)}, b_k^{(2)}) or p_θ(x) = p(x; w_j^{(3)}, b_j^{(3)}).
■ Intuition: learn the differences between the data and auxiliary "noise" whose properties you know; deduce the properties of the observed data from these differences.
■ More concretely:
  1. Choose a random variable z with a known pdf p_z from which sampling is easy. Here: the uniform distribution on the sphere where the data is defined.
  2. Obtain an auxiliary sample of z (the "noise").
  3. Perform logistic regression on the data and the auxiliary "noise", using the ratio p_θ/p_z in the regression function.
■ The procedure provides a consistent estimator of θ.
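The three steps above can be sketched on a toy problem. Assumptions not in the slides: the model is a 1-D zero-mean Gaussian with unknown scale and an unknown normalization absorbed into a parameter c, the noise is uniform on [-5, 5] rather than on a sphere, and plain gradient descent with the step size and iteration count below is used for the logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
x_data = rng.normal(loc=0.0, scale=1.0, size=2000)   # observed data, true sigma = 1
x_noise = rng.uniform(-5.0, 5.0, size=2000)          # auxiliary noise, known pdf
log_pz = np.log(1.0 / 10.0)                          # uniform density on [-5, 5]

def log_p_theta(x, theta):
    # Unnormalized log-model: -0.5 x^2 / sigma^2 + c, theta = (log sigma, c);
    # c stands in for the unknown log partition function.
    log_sigma, c = theta
    return -0.5 * x**2 * np.exp(-2.0 * log_sigma) + c

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

theta = np.array([1.0, 0.0])
for _ in range(5000):
    # Regression function G(x) = log p_theta(x) - log p_z(x).
    hd = sigmoid(log_p_theta(x_data, theta) - log_pz)
    hn = sigmoid(log_p_theta(x_noise, theta) - log_pz)
    # Gradient of G with respect to (log sigma, c).
    gd = np.stack([x_data**2 * np.exp(-2.0 * theta[0]), np.ones_like(x_data)])
    gn = np.stack([x_noise**2 * np.exp(-2.0 * theta[0]), np.ones_like(x_noise)])
    # Gradient of the logistic-regression loss: data labeled 1, noise labeled 0.
    grad = -(gd * (1.0 - hd)).mean(axis=1) + (gn * hn).mean(axis=1)
    theta -= 0.1 * grad

sigma_hat = np.exp(theta[0])
print(sigma_hat)   # estimated sigma; should land close to the true value 1.0
```

The key point the sketch illustrates: because c is estimated alongside the other parameters, the partition function never has to be computed, yet the scale is recovered consistently.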

Results, layers one and two

■ The w_i^{(1)} are Gabor-like; the w_{ki}^{(2)} are sparse (94.5% below 10^{-6}; 5.1% above 10).
■ Mostly complex-cell-like pooling:

    y_k^{(2)} = f_th( ln( Σ_{i=1}^{100} w_{ki}^{(2)} (w_i^{(1)T} x)^2 + 1 ) + b_k^{(2)} )

[Figures: subset of the features and their icons; all the learned features for layers one and two. Each row corresponds to a different y_k^{(2)}.]

Results, layer three

■ Features with enhanced selectivity to orientation (horizontal, vertical, diagonal) and space:

    ỹ^{(2)} = gain control( y^{(2)} )
    y_j^{(3)} = f_th( w_j^{(3)T} ỹ^{(2)} + b_j^{(3)} )

■ If the k-th element of w_j^{(3)} is positive, activity of y_k^{(2)} is detected; the corresponding icon is colored in red.
■ If the k-th element of w_j^{(3)} is negative, inactivity of y_k^{(2)} is detected; the corresponding icon is colored in blue.

[Figure: complete set of w_1^{(3)}, …, w_10^{(3)} for n^{(3)} = 10. See paper for n^{(3)} = 100.]

Results, layer three

■ Descriptors of overall image properties?

[Figure: features, together with the images giving maximal and minimal activation. Feature outputs were computed for 10000 randomly chosen tiny images.]

Summary

■ Selectivity and invariance/tolerance are important for any feature extraction system.
■ Question asked: in a system with three processing layers, what should be selected and tolerated at each level of the hierarchy?
■ Looked for an answer by fitting probabilistic models to images:
  → First layer: selectivity to Gabor-like image structure
  → Second layer: tolerance to the exact orientation or localization of the stimulus ("complex cells")
  → Third layer: enhanced selectivity to orientation and/or location of the stimulus
