Self-Taught Learning
Based on the paper "Self-taught Learning: Transfer Learning from Unlabeled Data"
Prateekshit Pandey, BTech CSE, 2nd year
February 7, 2014
Outline
- Machine learning
- Problem
- Self Taught Learning
- Final Note
Introduction
What is machine learning?
Learning covers a broad range of categories: experiencing, gaining knowledge, understanding, and so on. There is no fixed definition, but this is what Wikipedia says: "Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data." Separate disciplines that have contributed to ML:
- Statistics
- Brain models
- Psychological models
- Evolutionary models
- and many more...
Computational model of the brain
McCulloch-Pitts neuron
- Also called a linear threshold gate.
- Introduced by Warren McCulloch and Walter Pitts in 1943.
- A set of inputs $I_1, I_2, I_3, \ldots, I_m$ and an output $y$ which is binary.
- Mathematically:
\[ \text{sum} = \sum_{i=1}^{m} I_i W_i, \qquad y = g(\text{sum}) \]
where $g$ is a linear step function at threshold $T$.

Figure 1: McCulloch-Pitts neuron model
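As a concrete illustration, here is a minimal sketch of a McCulloch-Pitts neuron in Python with NumPy; the weights, threshold, and inputs below are made-up examples, not values from the slides:

```python
import numpy as np

def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: weighted sum followed by a step function."""
    total = np.dot(inputs, weights)           # sum = sum_i I_i * W_i
    return 1 if total >= threshold else 0     # y = g(sum), step at threshold T

# Example: with these illustrative weights and threshold, the neuron
# fires only when both inputs are active (a logical AND).
inputs = np.array([1, 1])
weights = np.array([1.0, 1.0])
print(mcp_neuron(inputs, weights, threshold=2.0))  # -> 1
```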
Neural Networks
- A set of such neurons aligned together forms a network.
- Layer: a set of neurons working on the same set of inputs.
Figure 2: 3-layered neural net
The layers are evaluated one after the other, as in the sketch below. Learning can be done in various ways, largely based on regression.
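To make "one layer after the other" concrete, here is a minimal NumPy sketch of a forward pass through a small layered net; the layer sizes and random weights are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Illustrative layer sizes: 4 inputs -> 3 hidden units -> 2 outputs.
W1 = rng.normal(size=(3, 4))   # weights from input layer to hidden layer
W2 = rng.normal(size=(2, 3))   # weights from hidden layer to output layer

def forward(x):
    """Evaluate the layers one after the other."""
    a1 = sigmoid(W1 @ x)       # hidden-layer activations
    a2 = sigmoid(W2 @ a1)      # output-layer activations
    return a2

print(forward(np.array([1.0, 0.0, -1.0, 0.5])))
```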
Analogy to biological model
Figure 3: Biological and artificial neural nets
Real world problems in ML
Real world problems
- Perform speaker identification, provided unlimited access to natural sounds.
- Perform classification of elephants and rhinos, provided unlimited access to natural images.
- Perform email foldering of ICML reviewing emails and NIPS reviewing emails, provided unlimited access to news articles (text).
Conclusion: real-world tasks always come with a mix of labeled and unlabeled data.
Problems faced
- Labeled data: difficult and expensive to obtain, and a small dataset is bad for generalization.
- Unlabeled data: expensive to find unlabeled data of the desired domain.
Motivation: exploit the abundance of unlabeled data to generalize over a larger scale of data.
Motivation
Previous algorithms and their shortcomings
- Supervised learning: works well only when labeled data is abundant.
- Semi-supervised learning: assumes that the unlabeled data can be labeled with the same labels as the classification task.
- Transfer learning: requires additional labeled data.
Idea: transfer knowledge from unlabeled data.
Advantages and further motivations
- Use unlabeled data (from the same domain) without any restrictions.
- More accurately reflects how humans may learn, since much of human learning is believed to be from unlabeled data.
Problem Formalism
Problem formalisation
- Number of classes to classify data: $C$
- A set of $m$ labeled examples: $\{(x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \ldots, (x_l^{(m)}, y^{(m)})\}$, where $x_l^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \{1, 2, \ldots, C\}$
- A set of $k$ unlabeled examples: $\{x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(k)}\}$, where $x_u^{(i)} \in \mathbb{R}^n$
- The learning algorithm outputs a hypothesis $h : \mathbb{R}^n \to \{1, 2, \ldots, C\}$.
- The hypothesis function tries to mimic the input-output relationship represented by the labeled training data.
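In code, the setup amounts to a small labeled set plus a large unlabeled pool of the same dimensionality; a sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

n, C = 64, 4          # input dimension and number of classes (illustrative)
m, k = 100, 10_000    # few labeled examples, many unlabeled ones

X_labeled = rng.normal(size=(m, n))      # x_l^(i) in R^n
y = rng.integers(1, C + 1, size=m)       # y^(i) in {1, ..., C}
X_unlabeled = rng.normal(size=(k, n))    # x_u^(i) in R^n

# The goal: a hypothesis h : R^n -> {1, ..., C} learned from both sets.
```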
A sample approach
Learning high level features - I
- Use large amounts of unlabeled data to learn a higher-level, more succinct representation of the inputs.
- Data elements tend to be internally correlated.
- Applying the learned representation to the labeled data, we obtain a higher-level representation of that data.
- This makes the task of supervised learning much easier.
Learning high level features - II
Following a modified version of sparse coding by Olshausen & Field (1996). Optimization objective:
\[ \min_{b, a} \sum_{i=1}^{k} \left\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \right\|_2^2 + \beta \left\| a^{(i)} \right\|_1 \]
- Number of bases: $s$
- Bases: $b = \{b_1, b_2, \ldots, b_s\}$, with $b_j \in \mathbb{R}^n$
- Activations: $a = \{a^{(1)}, a^{(2)}, \ldots, a^{(k)}\}$, with $a^{(i)} \in \mathbb{R}^s$
- The number of bases $s$ can be much larger than $n$.
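Scikit-learn's dictionary learning solves essentially this objective (squared reconstruction error plus an L1 penalty on the activations), so a minimal sketch might look as follows; the data, number of bases, and penalty weight are illustrative assumptions, not values from the paper:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 64))   # k=1000 unlabeled x_u in R^64

# s bases b_j in R^n; s may exceed n (an overcomplete dictionary).
s, beta = 128, 0.5
learner = DictionaryLearning(
    n_components=s,                  # number of bases s
    alpha=beta,                      # weight of the L1 sparsity penalty
    transform_algorithm="lasso_lars",
    random_state=0,
)
activations = learner.fit_transform(X_unlabeled)  # sparse a^(i) in R^s
bases = learner.components_                       # rows are the bases b_j
```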
Learning high level features - III
The optimization objective balances two terms:
- The first (quadratic) term pushes each $x_u^{(i)}$ to be reconstructed well as a weighted linear combination of the bases.
- The second term encourages each activation vector $a^{(i)}$ to have a low L1 norm, thus encouraging the activations to be sparse.
Autoencoders
Alternate way: Autoencoders
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs.
Figure 4: Autoencoder
Autoencoders: Overview
- The autoencoder tries to learn a function $h_{W,b}(x) \approx x$.
- By limiting the number of hidden units, we can discover interesting structure in the data.
- Compressed representation: if the number of units in the hidden layer is less than the number of inputs (see the sketch below).
- Sparse representation: if the number of units in the hidden layer is more than the number of inputs.
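A minimal NumPy sketch of the autoencoder's forward pass; the sizes and sigmoid activations are illustrative choices, with the hidden layer smaller than the input to give a compressed representation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 64, 16            # hidden < input: compressed representation

W1, b1 = rng.normal(scale=0.1, size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros(n_in)

def autoencode(x):
    """h_{W,b}(x) ~ x: encode into the hidden layer, then decode back."""
    a2 = sigmoid(W1 @ x + b1)      # hidden-layer activations (the features)
    xhat = sigmoid(W2 @ a2 + b2)   # reconstruction of the input
    return xhat, a2

x = rng.random(n_in)
xhat, features = autoencode(x)
loss = 0.5 * np.sum((xhat - x) ** 2)   # reconstruction error to minimize
```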
Algorithm
How it works
By backpropagation, the error term for the middle (encoding) layer is given by:
\[ \delta_i^{(2)} = \left( \sum_{j} W_{ji}^{(2)} \delta_j^{(3)} \right) f'(z_i^{(2)}) \]
We introduce a term
\[ \hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j^{(2)}(x^{(i)}) \]
and enforce $\hat{\rho}_j = \rho$, where $\rho$ is the sparsity parameter and is kept close to zero (e.g. 0.05). Because of the sparsity constraint, a penalty term is added to the error calculation for the encoding layer:
\[ \delta_i^{(2)} = \left( \sum_{j} W_{ji}^{(2)} \delta_j^{(3)} + \beta \left( -\frac{\rho}{\hat{\rho}_i} + \frac{1-\rho}{1-\hat{\rho}_i} \right) \right) f'(z_i^{(2)}) \]
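A minimal NumPy sketch of this penalized delta computation, assuming sigmoid activations (so $f'(z) = a(1-a)$); the value of $\beta$ here is an illustrative choice:

```python
import numpy as np

def sparse_delta2(W2, delta3, a2, rho_hat, rho=0.05, beta=3.0):
    """Hidden-layer error term with the sparsity penalty added.

    W2:      (n_out, n_hidden) weights from hidden to output layer
    delta3:  (n_out,) output-layer error term
    a2:      (n_hidden,) hidden activations f(z^(2)) for a sigmoid f
    rho_hat: (n_hidden,) average activation of each hidden unit
    """
    penalty = beta * (-rho / rho_hat + (1.0 - rho) / (1.0 - rho_hat))
    fprime = a2 * (1.0 - a2)               # sigmoid derivative f'(z^(2))
    return (W2.T @ delta3 + penalty) * fprime
```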
Unsupervised Feature Construction
After learning a set of bases $b$ from the unlabeled data, we compute the features $\hat{a}(x_l^{(i)})$ for the labeled data. We do this by solving the following optimization problem:
\[ \hat{a}(x_l^{(i)}) = \arg\min_{a^{(i)}} \left\| x_l^{(i)} - \sum_j a_j^{(i)} b_j \right\|_2^2 + \beta \left\| a^{(i)} \right\|_1 \]
Because of the L1 regularization term, we obtain a sparse representation of the labeled data.
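This per-example L1-penalized fit is sparse encoding against a fixed dictionary, which scikit-learn exposes as sparse_encode; a sketch with illustrative data and penalty weight:

```python
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)
bases = rng.normal(size=(128, 64))        # learned bases b, one per row
X_labeled = rng.normal(size=(100, 64))    # m=100 labeled x_l in R^64

# a_hat(x_l^(i)) = argmin_a ||x_l^(i) - sum_j a_j b_j||^2 + beta * ||a||_1
features = sparse_encode(
    X_labeled, bases, algorithm="lasso_lars", alpha=0.5  # alpha plays beta's role
)
# features[i] is the sparse representation a_hat(x_l^(i)) in R^128
```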
Algorithm: Self-taught learning via Sparse Coding

Input: Labeled training set $T = \{(x_l^{(1)}, y_l^{(1)}), (x_l^{(2)}, y_l^{(2)}), \ldots, (x_l^{(m)}, y_l^{(m)})\}$ and unlabeled data $\{x_u^{(1)}, x_u^{(2)}, \ldots, x_u^{(k)}\}$.
Output: Learned classifier for the classification task.
Algorithm:
1. Using the unlabeled data $\{x_u^{(i)}\}$, solve the optimization problem to obtain the bases $b$.
2. Compute features for the classification task to obtain a new labeled training set $\hat{T} = \{(\hat{a}(x_l^{(i)}), y^{(i)})\}_{i=1}^{m}$, where
\[ \hat{a}(x_l^{(i)}) = \arg\min_{a^{(i)}} \left\| x_l^{(i)} - \sum_j a_j^{(i)} b_j \right\|_2^2 + \beta \left\| a^{(i)} \right\|_1 \]
3. Learn a classifier $C$ by applying a supervised learning algorithm (e.g. an SVM) to the labeled training set $\hat{T}$.
Result: the learned classifier $C$.
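Putting the three steps together, a minimal end-to-end sketch with scikit-learn; the synthetic data, sizes, $\beta$, and the choice of a linear SVM are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 64))   # abundant unlabeled data
X_labeled = rng.normal(size=(100, 64))      # scarce labeled data
y = rng.integers(1, 5, size=100)            # labels in {1, ..., C=4}

# Step 1: learn bases b from the unlabeled data.
learner = DictionaryLearning(n_components=128, alpha=0.5,
                             transform_algorithm="lasso_lars", random_state=0)
learner.fit(X_unlabeled)
bases = learner.components_

# Step 2: compute sparse features a_hat(x_l) for the labeled data.
features = sparse_encode(X_labeled, bases, algorithm="lasso_lars", alpha=0.5)

# Step 3: train a standard supervised classifier on the new representation.
clf = LinearSVC().fit(features, y)
```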
Comparison with other methods
Principal Component Analysis (PCA) identifies a lower-dimensional subspace of maximal variation within the unlabeled data. PCA seems convenient, but it has some limitations:
- PCA results in a linear feature extraction: the features $a_j^{(i)}$ are simply a linear function of the input.
- PCA assumes the bases to be orthogonal, hence the number of PCA features cannot exceed $n$.
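For contrast, the PCA analogue of the feature-learning step is a purely linear projection, and its component count is capped by the input dimension; a small illustrative sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 64))     # n = 64 input dimensions

# PCA features are a linear function of the input...
pca = PCA(n_components=32).fit(X_unlabeled)
linear_features = pca.transform(X_unlabeled)  # centered X @ components_.T

# ...and the orthogonal bases mean n_components can never exceed n=64,
# whereas sparse coding may use an overcomplete set of bases (s > n).
```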
Final note
Open problem
Self-taught learning empirically leads to significant gains in a large variety of domains. An important theoretical open question: characterizing how the "similarity" between the unlabeled and labeled data affects the performance of self-taught learning.
Thank You