Self-Taught Learning
Based on the paper "Self-taught Learning: Transfer Learning from Unlabeled Data"
Prateekshit Pandey, BTech CSE, 2nd year
February 7, 2014
Outline
- Machine learning
- Problem
- Self Taught Learning
- Final Note
Introduction
What is machine learning?
Learning covers a broad range of categories: experiencing, gaining knowledge, understanding, and so on. There is no fixed definition, but this is what Wikipedia says: "Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data." Separate disciplines that have contributed to ML:
- Statistics
- Brain models
- Psychological models
- Evolutionary models
- and many more...
Computational model of the brain
McCulloch-Pitts neuron
- Also called a linear threshold gate.
- Introduced by Warren McCulloch and Walter Pitts in 1943.
- A set of inputs $I_1, I_2, I_3, \ldots, I_m$ and an output $y$ which is binary.
- Mathematically:
\[ \text{sum} = \sum_{i=1}^{m} I_i W_i, \qquad y = g(\text{sum}) \]
where $g$ is a linear step function at threshold $T$.

Figure 1: McCulloch-Pitts neuron model
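As a concrete illustration, here is a minimal sketch of a McCulloch-Pitts neuron in Python with NumPy; the weights, threshold, and inputs below are made-up examples, not values from the slides:

```python
import numpy as np

def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: weighted sum followed by a step function."""
    total = np.dot(inputs, weights)           # sum = sum_i I_i * W_i
    return 1 if total >= threshold else 0     # y = g(sum), step at threshold T

# Example: with these illustrative weights and threshold, the neuron
# fires only when both inputs are active (a logical AND).
inputs = np.array([1, 1])
weights = np.array([1.0, 1.0])
print(mcp_neuron(inputs, weights, threshold=2.0))  # -> 1
```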
Neural Networks
- A set of such neurons aligned together forms a network.
- Layer: a set of neurons working on the same set of inputs.
Figure 2: 3-layered neural net
The layers are evaluated one after the other, as in the sketch below. Learning can be done in various ways, largely based on regression.
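To make "one layer after the other" concrete, here is a minimal NumPy sketch of a forward pass through a small layered net; the layer sizes and random weights are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative layer sizes: 4 inputs -> 3 hidden units -> 2 outputs.
W1 = rng.normal(size=(3, 4))   # weights from input layer to hidden layer
W2 = rng.normal(size=(2, 3))   # weights from hidden layer to output layer

def forward(x):
    """Evaluate the layers one after the other."""
    a1 = sigmoid(W1 @ x)       # hidden-layer activations
    a2 = sigmoid(W2 @ a1)      # output-layer activations
    return a2

print(forward(np.array([1.0, 0.0, -1.0, 0.5])))
```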
Analogy to biological model
Figure 3: Biological and artificial neural nets
Real world problems in ML
Real world problems
- Perform speaker identification, provided unlimited access to natural sounds.
- Perform classification of elephants and rhinos, provided unlimited access to natural images.
- Perform email foldering of ICML reviewing emails and NIPS reviewing emails, provided unlimited access to news articles (text).
Conclusion: real-world tasks always come with a mix of labeled and unlabeled data.
Problems faced
- Labeled data: difficult and expensive to obtain, and a small dataset is bad for generalization.
- Unlabeled data: expensive to find unlabeled data of the desired domain.
Motivation: exploit the abundance of unlabeled data to generalize over a larger scale of data.
Motivation
Previous algorithms and their shortcomings
- Supervised learning: works well only when labeled data is abundant.
- Semi-supervised learning: assumes that the unlabeled data can be labeled with the same labels as the classification task.
- Transfer learning: requires additional labeled data.
Idea: transfer knowledge from unlabeled data.
Advantages and further motivations
- Use unlabeled data (from the same domain) without any restrictions.
- More accurately reflects how humans may learn, since much of human learning is believed to be from unlabeled data.
Problem Formalism
Problem formalisation
- Number of classes to classify data: $C$
- A set of $m$ labeled examples: $\{(x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m)}, y^{(m)})\}$, where $x_l^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \{1, 2, \ldots, C\}$
- A set of $k$ unlabeled examples: $\{x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(k)}\}$, where $x_u^{(i)} \in \mathbb{R}^n$
- The learning algorithm outputs a hypothesis $h : \mathbb{R}^n \to \{1, 2, \ldots, C\}$.
- The hypothesis function tries to mimic the input-output relationship represented by the labeled training data.
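In code, the setup amounts to a small labeled set plus a large unlabeled pool of the same dimensionality; a sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

n, C = 64, 4          # input dimension and number of classes (illustrative)
m, k = 100, 10_000    # few labeled examples, many unlabeled ones

X_labeled = rng.normal(size=(m, n))      # x_l^(i) in R^n
y = rng.integers(1, C + 1, size=m)       # y^(i) in {1, ..., C}
X_unlabeled = rng.normal(size=(k, n))    # x_u^(i) in R^n

# The goal: a hypothesis h : R^n -> {1, ..., C} learned from both sets.
```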
A sample approach
Learning high level features - I
- Use large amounts of unlabeled data to learn a higher-level, more succinct representation of the inputs.
- Data elements tend to be internally correlated.
- Applying the learned representation to the labeled data, we obtain a higher-level representation of that data.
- This makes the task of supervised learning much easier.
Learning high level features - II
Following a modified version of sparse coding by Olshausen & Field (1996). Optimization objective:
\[ \min_{b, a} \sum_{i=1}^{k} \left\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \right\|_2^2 + \beta \left\| a^{(i)} \right\|_1 \]
- Number of bases: $s$
- Bases: $b = \{b_1, b_2, \ldots, b_s\}$, with $b_j \in \mathbb{R}^n$
- Activations: $a = \{a^{(1)}, a^{(2)}, \ldots, a^{(k)}\}$, with $a^{(i)} \in \mathbb{R}^s$
- The number of bases $s$ can be much larger than $n$.
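Scikit-learn's dictionary learning solves essentially this objective (squared reconstruction error plus an L1 penalty on the activations), so a minimal sketch might look as follows; the data, number of bases, and penalty weight are illustrative assumptions, not values from the paper:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 64))   # k=1000 unlabeled x_u in R^64

# s bases b_j in R^n; s may exceed n (an overcomplete dictionary).
s, beta = 128, 0.5
learner = DictionaryLearning(
    n_components=s,                  # number of bases s
    alpha=beta,                      # weight of the L1 sparsity penalty
    transform_algorithm="lasso_lars",
    random_state=0,
)
activations = learner.fit_transform(X_unlabeled)  # sparse a^(i) in R^s
bases = learner.components_                       # rows are the bases b_j
```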
Learning high level features - III
The optimization objective balances two terms:
- The first (quadratic) term pushes each $x_u^{(i)}$ to be reconstructed well as a weighted linear combination of the bases.
- The second term encourages each activation vector $a^{(i)}$ to have a low L1 norm, thus encouraging the activations to be sparse.
Autoencoders
Alternate way: Autoencoders
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs.
Figure 4: Autoencoder
Autoencoders: Overview
- The autoencoder tries to learn a function $h_{W,b}(x) \approx x$.
- By limiting the number of hidden units, we can discover interesting structure in the data.
- Compressed representation: if the number of units in the hidden layer is less than the number of inputs (see the sketch below).
- Sparse representation: if the number of units in the hidden layer is more than the number of inputs.
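A minimal NumPy sketch of the autoencoder's forward pass; the sizes and sigmoid activations are illustrative choices, with the hidden layer smaller than the input to give a compressed representation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 64, 16            # hidden < input: compressed representation

W1, b1 = rng.normal(scale=0.1, size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros(n_in)

def autoencode(x):
    """h_{W,b}(x) ~ x: encode into the hidden layer, then decode back."""
    a2 = sigmoid(W1 @ x + b1)      # hidden-layer activations (the features)
    xhat = sigmoid(W2 @ a2 + b2)   # reconstruction of the input
    return xhat, a2

x = rng.random(n_in)
xhat, features = autoencode(x)
loss = 0.5 * np.sum((xhat - x) ** 2)   # reconstruction error to minimize
```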
Algorithm
How it works
By backpropagation, the error term for the middle (encoding) layer is given by:
\[ \delta_i^{(2)} = \left( \sum_{j} W_{ji}^{(2)} \delta_j^{(3)} \right) f'(z_i^{(2)}) \]
We introduce a term
\[ \hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j^{(2)}(x^{(i)}) \]
and enforce $\hat{\rho}_j = \rho$, where $\rho$ is the sparsity parameter and is kept close to zero (e.g. 0.05). Because of the sparsity constraint, a penalty term is added to the error calculation for the encoding layer:
\[ \delta_i^{(2)} = \left( \sum_{j} W_{ji}^{(2)} \delta_j^{(3)} + \beta \left( -\frac{\rho}{\hat{\rho}_i} + \frac{1-\rho}{1-\hat{\rho}_i} \right) \right) f'(z_i^{(2)}) \]
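A minimal NumPy sketch of this penalized delta computation, assuming sigmoid activations (so $f'(z) = a(1-a)$); the value of $\beta$ here is an illustrative choice:

```python
import numpy as np

def sparse_delta2(W2, delta3, a2, rho_hat, rho=0.05, beta=3.0):
    """Hidden-layer error term with the sparsity penalty added.

    W2:      (n_out, n_hidden) weights from hidden to output layer
    delta3:  (n_out,) output-layer error term
    a2:      (n_hidden,) hidden activations f(z^(2)) for a sigmoid f
    rho_hat: (n_hidden,) average activation of each hidden unit
    """
    penalty = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat))
    fprime = a2 * (1.0 - a2)               # sigmoid derivative f'(z^(2))
    return (W2.T @ delta3 + penalty) * fprime
```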
Unsupervised Feature Construction
After learning a set of bases $b$ from the unlabeled data, we compute the features $\hat{a}(x_l^{(i)})$ for the labeled data. We do this by solving the following optimization problem:
\[ \hat{a}(x_l^{(i)}) = \arg\min_{a^{(i)}} \left\| x_l^{(i)} - \sum_j a_j^{(i)} b_j \right\|_2^2 + \beta \left\| a^{(i)} \right\|_1 \]
Because of the L1 regularization term, we obtain a sparse representation of the labeled data.
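This per-example L1-penalized fit is sparse encoding against a fixed dictionary, which scikit-learn exposes as sparse_encode; a sketch with illustrative data and penalty weight:

```python
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)
bases = rng.normal(size=(128, 64))        # learned bases b, one per row
X_labeled = rng.normal(size=(100, 64))    # m=100 labeled x_l in R^64

# a_hat(x_l^(i)) = argmin_a ||x_l^(i) - sum_j a_j b_j||^2 + beta * ||a||_1
features = sparse_encode(
    X_labeled, bases, algorithm="lasso_lars", alpha=0.5  # alpha plays beta's role
)
# features[i] is the sparse representation a_hat(x_l^(i)) in R^128
```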
Algorithm: Self-taught learning via Sparse Coding

Input: Labeled training set $T = \{(x_l^{(1)}, y_l^{(1)}), (x_l^{(2)}, y_l^{(2)}), \ldots, (x_l^{(m)}, y_l^{(m)})\}$ and unlabeled data $\{x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(k)}\}$.
Output: Learned classifier for the classification task.
Algorithm:
1. Using the unlabeled data $\{x_u^{(i)}\}$, solve the optimization problem to obtain the bases $b$.
2. Compute features for the classification task to obtain a new labeled training set $\hat{T} = \{(\hat{a}(x_l^{(i)}), y^{(i)})\}_{i=1}^{m}$, where
\[ \hat{a}(x_l^{(i)}) = \arg\min_{a^{(i)}} \left\| x_l^{(i)} - \sum_j a_j^{(i)} b_j \right\|_2^2 + \beta \left\| a^{(i)} \right\|_1 \]
3. Learn a classifier $C$ by applying a supervised learning algorithm (e.g. an SVM) to the labeled training set $\hat{T}$.
Result: the learned classifier $C$.
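Putting the three steps together, a minimal end-to-end sketch with scikit-learn; the synthetic data, sizes, $\beta$, and the choice of a linear SVM are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 64))   # abundant unlabeled data
X_labeled = rng.normal(size=(100, 64))      # scarce labeled data
y = rng.integers(1, 5, size=100)            # labels in {1, ..., C=4}

# Step 1: learn bases b from the unlabeled data.
learner = DictionaryLearning(n_components=128, alpha=0.5,
                             transform_algorithm="lasso_lars", random_state=0)
learner.fit(X_unlabeled)
bases = learner.components_

# Step 2: compute sparse features a_hat(x_l) for the labeled data.
features = sparse_encode(X_labeled, bases, algorithm="lasso_lars", alpha=0.5)

# Step 3: train a standard supervised classifier on the new representation.
clf = LinearSVC().fit(features, y)
```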
Comparison with other methods
Principal Component Analysis (PCA) identifies a lower-dimensional subspace of maximal variation within the unlabeled data. PCA seems convenient, but it has some limitations:
- PCA results in a linear feature extraction: the features $a_j^{(i)}$ are simply a linear function of the input.
- PCA assumes the bases to be orthogonal, hence the number of PCA features cannot exceed $n$.
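For contrast, the PCA analogue of the feature-learning step is a purely linear projection, and its component count is capped by the input dimension; a small illustrative sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 64))     # n = 64 input dimensions

# PCA features are a linear function of the input...
pca = PCA(n_components=32).fit(X_unlabeled)
linear_features = pca.transform(X_unlabeled)  # centered X @ components_.T

# ...and the orthogonal bases mean n_components can never exceed n=64,
# whereas sparse coding may use an overcomplete set of bases (s > n).
```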
Final note
Open problem
Self-taught learning empirically leads to significant gains in a large variety of domains. An important theoretical open question: characterizing how the "similarity" between the unlabeled and labeled data affects the performance of self-taught learning.
Thank You