RBMs and autoencoders

Hopfield network
• Content-addressable memory
• The goal is to memorize the training dataset
• Core idea:
– Training: store some patterns in the network (set the weights accordingly)
– Inference: show the net a corrupted pattern; the net reconstructs the original one

Examples
[Figure: stored patterns]

Operation – pattern reconstruction

Anatomy of a Hopfield net
• No self-loops: $w_{ii} = 0$

Inference: let $x$ be the state of all neurons.
1. While not converged:
   1. Pick a neuron $k$ at random
   2. Set $x_k = \mathrm{sign}(W_{k:} x)$

• Symmetric, bidirectional weights: $w_{ij} = w_{ji}$

Training: idea: "fire together, wire together"
$$W = XX^T, \quad W_{ii} = 0$$
where $X = [x^{(1)}, \ldots, x^{(N)}]$ is the matrix of the $N$ stored patterns.
Note: one can also train via SGD on energies!

Core concept: energy!
Each state $x$ of the net is associated with a scalar value called the energy:
$$E(x) = -\tfrac{1}{2} x^T W x$$
Stored patterns correspond to the states with the lowest energy, and inference searches for them! Also note that inference has to stop, because the energy is bounded from below and non-increasing during inference.
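A minimal NumPy sketch of the training and inference rules above (function names, the step budget, and breaking the tie $\mathrm{sign}(0)$ towards $+1$ are my own choices):

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian one-shot training: W = X X^T with a zeroed diagonal.
    `patterns` is a (d, N) array whose columns are +/-1 patterns."""
    W = patterns @ patterns.T
    np.fill_diagonal(W, 0)          # no self-loops: w_ii = 0
    return W

def energy(W, x):
    """E(x) = -1/2 x^T W x; non-increasing under the update below."""
    return -0.5 * x @ W @ x

def recall(W, x, n_steps=1000, rng=None):
    """Asynchronous inference: pick a random neuron, set it to the sign
    of its weighted input, repeat."""
    rng = rng or np.random.default_rng(0)
    x = x.copy()
    for _ in range(n_steps):
        k = rng.integers(len(x))
        x[k] = 1 if W[k] @ x >= 0 else -1
    return x

# Store two orthogonal 8-bit patterns, then reconstruct a corrupted copy.
X = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
              [1, 1, 1, 1, -1, -1, -1, -1]]).T
W = train_hopfield(X)
noisy = X[:, 0].copy(); noisy[:2] *= -1   # flip two bits
print(recall(W, noisy))                   # should recover X[:, 0]
```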

A few things to remember
• What is the energy function?
• Why do we want symmetric weights?
• Why must we update neurons one by one?
• If in doubt: see MacKay (http://www.inference.phy.cam.ac.uk/itprnn/book.html), chapters 42 & 43!

How many patterns can we store?
• The capacity of a Hopfield net is limited.
– Two nearby energy minima can merge to create a new one.

• There are attractors that are not training patterns – a linear combination of patterns is likely to be an attractor too

• How to remove spurious attractors?
– Unlearn them! Start with a random input, run inference, then change the weights to make the resulting attractor more energetic.
– In a way, the network dreams of patterns and forgets them…
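A sketch of one such unlearning step, reusing the `recall` helper from the earlier sketch; the learning rate `eta` and the step counts are illustrative choices of mine:

```python
def unlearn(W, d, eta=0.01, n_steps=1000, rng=None):
    """One 'dreaming' step: settle into an attractor from a random state,
    then raise that attractor's energy by Hebbian anti-learning."""
    rng = rng or np.random.default_rng(1)
    x = rng.choice([-1, 1], size=d)
    x = recall(W, x, n_steps=n_steps, rng=rng)  # find some attractor
    W = W - eta * np.outer(x, x)                # push its energy up
    np.fill_diagonal(W, 0)
    return W
```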

Input-output for Hopfield net
• Divide the neurons into two sets, visible and hidden.
[Figure: visible units and connections; hidden units and connections; visible-to-hidden connections]

• Set the visible units to a pattern; set the hidden units randomly
• Run inference only on the hidden units
• In the end, the state of the hidden units will be an explanation of the visible ones!
• Q: how to train the hidden weights? Q: how to escape poor minima during inference?

Probabilistic Hopfield net (aka Boltzmann Machine)
• Change the inference rule:
– From: set $x_k = \mathrm{sign}(W_{k:} x)$
– To: sample $x_k = +1$ with probability $\sigma(2\beta W_{k:} x)$ and $x_k = -1$ with probability $1 - \sigma(2\beta W_{k:} x)$
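A minimal sketch of this stochastic update (names are mine; the diagonal of $W$ is assumed zero, as before, so a unit's own state does not enter its input):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def gibbs_step(W, x, beta=1.0, rng=None):
    """Resample one random unit from its conditional distribution:
    P(x_k = +1 | rest) = sigma(2 * beta * W[k] @ x)."""
    rng = rng or np.random.default_rng(0)
    k = rng.integers(len(x))
    p = sigmoid(2.0 * beta * W[k] @ x)
    x[k] = 1 if rng.random() < p else -1
    return x
```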

• Thus inference is a random walk through neuron activations
• Still, some configurations are more probable than others. After a very long time (at equilibrium), the probability of a configuration is
$$P(x|W) = \frac{e^{-\beta E(x)}}{Z(W)} = \frac{e^{\frac{\beta}{2} x^T W x}}{Z(W)}$$
where $E(x) = -\tfrac{1}{2} x^T W x$ is the energy, $\beta$ is the inverse temperature (it smooths out the probabilities), and $Z(W)$ is the normalization constant, also called the partition function

Boltzmann machine example
• What is the conditional probability of a neuron being 1, given all other neurons?
$$P(x_k = 1 \mid x_{\neg k}, W) = \frac{e^{-\beta E(x \mid x_k = 1)}}{e^{-\beta E(x \mid x_k = 1)} + e^{-\beta E(x \mid x_k = -1)}} = \frac{1}{1 + e^{-2\beta W_{k:} x}} = \sigma(2\beta W_{k:} x)$$
• The energy function properly defines the inference procedure (it is technically called Gibbs sampling).
• We now see that inference is just sampling one neuron conditioned on the other ones, until we're bored!

How to train a Boltzmann machine

• The Boltzmann machine gives probabilities
• Train by maximizing the log-likelihood!
• After a few transformations we get (that's a nice exercise):
$$\frac{\partial}{\partial w_{ij}} \ln \prod_{n=1}^{N} P(x^{(n)} \mid W) = \sum_{n} \left( x_i^{(n)} x_j^{(n)} - \mathbb{E}_{P(x|W)}[x_i x_j] \right) = N \left( \mathbb{E}_{x \sim \mathrm{Data}}[x_i x_j] - \mathbb{E}_{P(x|W)}[x_i x_j] \right)$$

• The gradient has two terms:
– The Hebbian term $\mathbb{E}_{x \sim \mathrm{Data}}[x_i x_j]$, which moves the weights towards correlations in the data. This step "pulls down" the energy of data samples!
– The un-learning term $\mathbb{E}_{P(x|W)}[x_i x_j]$, which moves the weights away from correlations when the net is "dreaming". This step "pulls up" the energy of spurious attractors!
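A sketch of this two-term gradient for a fully visible Boltzmann machine, reusing `gibbs_step` from the sketch above; the burn-in length and sample count are arbitrary illustrative choices:

```python
def bm_gradient(W, data, beta=1.0, n_samples=100, burn_in=1000, rng=None):
    """Estimate d lnL / dW = N * (E_data[x_i x_j] - E_model[x_i x_j]).
    `data` is an (N, d) array of +/-1 rows."""
    rng = rng or np.random.default_rng(0)
    N, d = data.shape
    pos = data.T @ data / N              # Hebbian term: data correlations
    x = rng.choice([-1, 1], size=d)      # un-learning term: let the net "dream"
    for _ in range(burn_in):
        x = gibbs_step(W, x, beta, rng)
    neg = np.zeros_like(W)
    for _ in range(n_samples):
        for _ in range(d):               # roughly one sweep between samples
            x = gibbs_step(W, x, beta, rng)
        neg += np.outer(x, x)
    grad = N * (pos - neg / n_samples)
    np.fill_diagonal(grad, 0)            # keep w_ii = 0
    return grad
```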

What about hidden units?
$$\frac{\partial}{\partial w_{ij}} \ln \prod_{n=1}^{N} P(x^{(n)} \mid W) = N \left( \mathbb{E}_{x \sim \mathrm{Data}}[x_i x_j] - \mathbb{E}_{P(x|W)}[x_i x_j] \right)$$

Now we know how to learn hidden weights:
• To estimate $\mathbb{E}_{x \sim \mathrm{Data}}[x_i x_j]$, set the visible units to a data pattern, run a random walk over the hidden units, and observe the correlations
• To estimate $\mathbb{E}_{P(x|W)}[x_i x_j]$, run a random walk over all units and observe the correlations
This is a very nice learning rule: each unit only looks at what its neighbors are doing (no weird backprop) and either strengthens its connections to mimic its neighbors or, during un-learning, weakens them. That's sort of a two-phase, awake/asleep mode of operation.

NB: the name wake-sleep algorithm is used for a similar model, the Helmholtz machine

Restricted Boltzmann Machine
[Figure: bipartite graph of visible and hidden units]

• Boltzmann learning with hidden units is slow, because each gradient evaluation requires two long Gibbs samplings (random walks).
• Idea: forbid visible-to-visible and hidden-to-hidden connections!
• Notice how:
– the hiddens are independent conditioned on the visibles,
– the visibles are independent conditioned on the hiddens

• Only need to run one Gibbs sampling (and it can be truncated to just a few steps) → fast training!
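A minimal contrastive-divergence (CD-1) sketch of this truncated training. Two assumptions of mine to keep it short: binary 0/1 units rather than the ±1 convention above, and no bias terms:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def cd1_update(W, v0, lr=0.1, rng=None):
    """One CD-1 step. `v0` is a (batch, n_visible) array of 0/1 data;
    W has shape (n_visible, n_hidden)."""
    rng = rng or np.random.default_rng(0)
    # Up: hiddens are independent given the visibles, so sample them all at once
    ph0 = sigmoid(v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down and up again: a single truncated Gibbs step, not a long random walk
    pv1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    # Data ("wake") correlations minus model ("dream") correlations
    grad = v0.T @ h0 - v1.T @ ph1
    return W + lr * grad / len(v0)
```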

RBM stacking: The coolest idea of 2006

Article: https://www.cs.toronto.edu/~hinton/science.pdf Demo: http://www.cs.toronto.edu/~hinton/adi/index.htm

Intermezzo: autoencoding net

• Idea: train a net that pushes the data through a bottleneck (e.g. a small layer, a sparse layer…)
• Prevent it from learning the identity: sparsity, noise, other constraints.
• Use the code for further tasks…
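A minimal bottleneck autoencoder sketch in PyTorch; the layer sizes, the MSE objective, and the random stand-in batch are illustrative choices:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Encoder squeezes the input through a small code layer; the
    bottleneck is what prevents learning the identity."""
    def __init__(self, d_in=784, d_code=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                     nn.Linear(128, d_code))
        self.decoder = nn.Sequential(nn.Linear(d_code, 128), nn.ReLU(),
                                     nn.Linear(128, d_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                      # stand-in for a data batch
opt.zero_grad()
loss = nn.functional.mse_loss(model(x), x)   # reconstruction objective
loss.backward(); opt.step()
```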

The Deep Learning revolution (2006-today)
• Learn a hierarchy of features by greedily pretraining RBMs or autoencoders!
• Stack them together to initialize a deep net
• Fine-tune via backprop.
• Core idea: discover a hierarchy of transformations, yielding more and more abstract features!
• Note: with the advances in training deep nets, this layerwise pretraining is no longer necessary to train deep nets. But it is good to know about!
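A sketch of the greedy layerwise recipe using small autoencoder layers (the layer widths, step count, and random stand-in data are all illustrative):

```python
import torch
import torch.nn as nn

sizes = [784, 256, 64, 16]                 # illustrative layer widths
layers, codes = [], torch.rand(256, 784)   # stand-in dataset
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
    opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
    for _ in range(100):                   # pretrain this layer in isolation
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(torch.relu(enc(codes))), codes)
        loss.backward(); opt.step()
    layers.append(enc)
    with torch.no_grad():
        codes = torch.relu(enc(codes))     # its codes feed the next layer
# Stack the pretrained encoders into a deep net; fine-tune with backprop.
deep_net = nn.Sequential(*[m for enc in layers for m in (enc, nn.ReLU())])
```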

VAE: Variational auto-encoder

http://arxiv.org/abs/1312.6114

• Simplified idea:
– learn a generator network (from some noise input, generate a data sample)
– learn an approximate inference network (for a data sample, infer the noise input that would recreate it)
Encoder: $q(z|x) = \mathcal{N}(\mu, \sigma)$. Decoder: $p(z)$ and $p(x|z)$.
[Figure: encoder-decoder with injected noise]
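A minimal VAE sketch in PyTorch with the reparameterized noise injection; the sizes and the Bernoulli (binary cross-entropy) reconstruction term are illustrative choices:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Encoder q(z|x) = N(mu, sigma); decoder maps z back to x-space."""
    def __init__(self, d_in=784, d_z=8):
        super().__init__()
        self.enc = nn.Linear(d_in, 128)
        self.mu = nn.Linear(128, d_z)
        self.log_var = nn.Linear(128, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(),
                                 nn.Linear(128, d_in), nn.Sigmoid())

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization: inject noise so sampling stays differentiable
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return self.dec(z), mu, log_var

def vae_loss(x, x_hat, mu, log_var):
    # Reconstruction term plus KL(q(z|x) || p(z)) with a N(0, I) prior
    rec = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return rec + kl

x = torch.rand(64, 784)                    # stand-in for data in [0, 1]
x_hat, mu, log_var = VAE()(x)
loss = vae_loss(x, x_hat, mu, log_var)
```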

GAN: Generative Adversarial Net – an autoencoder à rebours (in reverse)

http://arxiv.org/abs/1406.2661 http://www.cs.toronto.edu/~dtarlow/pos14/talks/goodfellow.pdf
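A minimal sketch of the adversarial game (one discriminator step, then one generator step); the architectures, sizes, and learning rates are illustrative:

```python
import torch
import torch.nn as nn

# Generator maps noise to samples; discriminator tells data from fakes.
G = nn.Sequential(nn.Linear(16, 128), nn.ReLU(),
                  nn.Linear(128, 784), nn.Sigmoid())
D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                  nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(64, 784)                 # stand-in for a data batch
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Discriminator step: push real -> 1 and generated -> 0
fake = G(torch.randn(64, 16))
d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator output 1 on fakes
g_loss = bce(D(fake), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```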

VAE + GAN

Ladder network
Problem with autoencoders:
• We want a hierarchy of more and more abstract features.
• But we also want perfect reconstruction.
• Thus all the information has to be carried all the way up! It is hard to abstract when you must remember!
Solution:
• Introduce "horizontal" connections that capture the details
• Inject noise
• And done!
http://arxiv.org/abs/1507.02672
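A loose sketch of the lateral-connection idea only, not the full model from the paper; the learned combinator, the noise level, and the per-layer denoising cost below are simplifications of mine:

```python
import torch
import torch.nn as nn

class LadderCombine(nn.Module):
    """Mix a noisy lateral (encoder) signal with the top-down (decoder)
    signal, so details can travel sideways instead of all the way up."""
    def __init__(self, d):
        super().__init__()
        self.mix = nn.Linear(2 * d, d)      # simplified learned combinator

    def forward(self, lateral, top_down):
        return self.mix(torch.cat([lateral, top_down], dim=-1))

d = 64
enc1, enc2 = nn.Linear(784, d), nn.Linear(d, 10)
dec2, dec1, comb1 = nn.Linear(10, d), nn.Linear(d, 784), LadderCombine(d)

x = torch.rand(32, 784)
h1_clean = torch.relu(enc1(x))                        # clean path (no noise)
h1 = torch.relu(enc1(x) + 0.1 * torch.randn(32, d))   # inject noise
y = enc2(h1)                                          # abstract top-level code
r1 = comb1(h1, torch.relu(dec2(y)))                   # lateral + top-down
x_hat = dec1(r1)
loss = (nn.functional.mse_loss(x_hat, x)                  # reconstruction
        + nn.functional.mse_loss(r1, h1_clean.detach()))  # denoising cost
```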
