EECS542 Presentation
Recurrent Neural Networks, Part I
Xinchen Yan, Brian Wang
Sept. 18, 2014

Outline • Temporal Data Processing • Hidden Markov Model • Recurrent Neural Networks • Long Short-Term Memory

Temporal Data Processing • Speech Recognition • Object Tracking • Activity Recognition • Pose Estimation

Figure credit: Feng

Sequence Labeling
• Sequence Classification: one label for the whole sequence (e.g. "verb")
• Segment Classification: a label per pre-segmented chunk; error measured by Hamming distance (e.g. "swim")
• Temporal Classification: an unsegmented label sequence; error measured by edit distance (e.g. "swim")

Example: The Dishonest Casino
• A casino has two dice:
  • Fair die: Pr(X = k) = 1/6, for 1 ≤ k ≤ 6
  • Loaded die: Pr(X = k) = 1/10, for 1 ≤ k ≤ 5; Pr(X = 6) = 1/2
• The casino player switches back-and-forth between the fair and loaded dice once every 20 turns
Slide credit: Eric Xing
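For concreteness, the casino's generative process can be simulated directly. A minimal sketch (the `roll_casino` helper and its fixed seed are illustrative, not from the slides):

```python
import random

def roll_casino(n_rolls, switch_every=20, seed=0):
    """Simulate the dishonest casino: the player switches between the
    fair die and the loaded die once every `switch_every` turns."""
    rng = random.Random(seed)
    fair = [1/6] * 6                 # Pr(X = k) = 1/6 for every face
    loaded = [1/10] * 5 + [1/2]      # Pr(X = 6) = 1/2, Pr(X = k) = 1/10 otherwise
    states, rolls = [], []
    use_fair = True
    for t in range(n_rolls):
        if t > 0 and t % switch_every == 0:
            use_fair = not use_fair  # switch dice every `switch_every` turns
        weights = fair if use_fair else loaded
        rolls.append(rng.choices(range(1, 7), weights=weights)[0])
        states.append("F" if use_fair else "L")
    return states, rolls

states, rolls = roll_casino(60)
```

The hidden state sequence `states` is exactly what the Decoding question below asks us to recover from `rolls` alone.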

Example: The Dishonest Casino • Given: A sequence of rolls by the casino player 124552656214614613613666166466 …. • Questions: • How likely is this sequence, given our model of how the casino works? (Evaluation) • What portion of the sequence was generated with the fair die, and what portion with the loaded die? (Decoding) • How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back? (Learning) Slide credit: Eric Xing

Hidden Markov Model
[Figure: HMM unrolled in time — a chain of hidden states, each emitting an output]
• Hidden state: H_t
• Output: O_t
• Transition probability between two states: P(H_t = k | H_{t−1} = j)
• Start probability: P(H_1 = j)
• Emission probability associated with each state: P(O_t = i | H_t = j)

Hidden Markov Model (Generative Model)
[Figure: graphical model — the hidden chain H_t generates the observations O_t]
Slide credit: Eric Xing
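The Evaluation question can be answered with the forward algorithm, which sums the joint probability over all hidden-state paths in O(T·K²) time. A minimal sketch for the casino model; the start and transition probabilities below are illustrative guesses consistent with "switches roughly once every 20 turns", not values from the slides:

```python
def hmm_forward(obs, start, trans, emit):
    """Forward algorithm: Pr(observation sequence | model), summing over
    all hidden-state paths in O(T * K^2) time (K states, T observations)."""
    K = len(start)
    # alpha[j] = Pr(o_1 .. o_t, H_t = j)
    alpha = [start[j] * emit[j][obs[0]] for j in range(K)]
    for o in obs[1:]:
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(K))
                 for j in range(K)]
    return sum(alpha)

# Dishonest casino: state 0 = fair, state 1 = loaded; outcome k-1 = face k.
start = [0.5, 0.5]                    # assumed: either die equally likely at the start
trans = [[0.95, 0.05], [0.05, 0.95]]  # assumed: switch roughly once every 20 turns
emit  = [[1/6] * 6, [1/10] * 5 + [1/2]]
p = hmm_forward([5, 5, 5], start, trans, emit)  # probability of rolling three sixes
```

Decoding (the most likely state path) replaces the `sum` over previous states with a `max` plus backpointers, giving the Viterbi algorithm.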

Example: dice rolling vs. speech signal
Dice rolling:
• Output sequence: discrete values
• Limited number of states
Speech signal:
• Output sequence: continuous data
• Many sources of variation: syntax, semantics, accent, rate, volume, etc.
• Temporal segmentation

Limitations of HMM
• Modeling continuous data
• Long-term dependencies
• With N hidden states, an HMM can only remember log(N) bits about what it has generated so far

Slide credit: G. Hinton

Feed-Forward NN vs. Recurrent NN

• "Piped" (acyclic) vs. cyclic connections
• Function vs. dynamical system

Definition: Recurrent Neural Networks
• Observation/input vector: x_t
• Hidden state vector: h_t
• Output vector: y_t
• Weight matrices:
  - Input-hidden weights: W_I
  - Hidden-hidden weights: W_H
  - Hidden-output weights: W_O
• Update rules:
  h_t = σ(W_I x_t + W_H h_{t−1} + b)
  y_t = W_O h_t
[Figure: RNN cell — input x_t and previous hidden state feed h_t, which produces y_t]
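The update rules can be sketched in plain Python. A toy sketch, assuming a sigmoid nonlinearity for σ; the weight values are arbitrary illustrative choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, W_I, W_H, W_O, b):
    """Apply the RNN update rules over a whole input sequence:
    h_t = sigmoid(W_I x_t + W_H h_{t-1} + b),  y_t = W_O h_t,
    starting from h_0 = 0."""
    h = [0.0] * len(W_H)
    ys = []
    for x in xs:
        z = [zi + zh + bb for zi, zh, bb in zip(matvec(W_I, x), matvec(W_H, h), b)]
        h = [sigmoid(v) for v in z]   # new hidden state
        ys.append(matvec(W_O, h))     # output at this step
    return ys

# 1-d inputs, 2 hidden units, 1-d outputs (toy weights)
W_I = [[0.5], [-0.3]]
W_H = [[0.1, 0.2], [0.0, 0.4]]
W_O = [[1.0, -1.0]]
b = [0.0, 0.0]
ys = rnn_forward([[1.0], [0.0], [1.0]], W_I, W_H, W_O, b)
```

Note that the same three matrices are reused at every step — this is the weight sharing made explicit when the network is unfolded in time on the next slide.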

Unfolded RNN: Shared Weights

Recurrent Neural Networks
[Figure: RNN unfolded in time — at each step an input feeds a hidden layer, which feeds an output; the same weights are shared across all time steps]
• Power of RNN:
  - Distributed hidden units
  - Non-linear dynamics: h_t = σ(W_I x_t + W_H h_{t−1} + b)
• Quote from Hinton

Providing input to recurrent networks
• We can specify inputs in several ways:
  • Specify the initial states of all the units.
  • Specify the initial states of a subset of the units.
  • Specify the states of the same subset of the units at every time step.
    (This is the natural way to model most sequential data.)
[Figure: recurrent net with weights w1–w4 unrolled in time]
Slide credit: G. Hinton

Teaching signals for recurrent networks
• We can specify targets in several ways:
  • Specify desired final activities of all the units.
  • Specify desired activities of all units for the last few steps.
    - Good for learning attractors.
    - It is easy to add in extra error derivatives as we backpropagate.
  • Specify the desired activity of a subset of the units.
    - The other units are input or hidden units.
[Figure: recurrent net with weights w1–w4 unrolled in time]
Slide credit: G. Hinton

Next Q: Training Recurrent Neural Networks

Backprop through time (BPTT)
[Figure: a recurrent net with weights w1–w4 unrolled into a layered feed-forward net for time = 0, 1, 2, 3, with the weights shared across copies]
Slide credit: G. Hinton

Recall: Training a FFNN
• The algorithm:
  - Provide: x, y
  - Learn: W^(1), …, W^(L)
  - Forward pass: z^(i+1) = W^(i) a^(i),  a^(i+1) = f(z^(i+1))
  - Backprop: δ^(i) = (W^(i))^T δ^(i+1) ⊙ f′(z^(i))
  - Gradient: ∇_{W^(i)} J(W, b; x, y) = δ^(i+1) (a^(i))^T
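These forward/backprop rules can be checked on a toy network. A scalar sketch (each "layer" is a single scalar weight, so the matrix transposes reduce to plain multiplication; the weights and targets are arbitrary illustrative values):

```python
import math

def f(z):
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation

def f_prime(z):
    return f(z) * (1.0 - f(z))

def forward_backward(ws, x, y):
    """Forward pass then backprop through a chain of scalar 'layers':
    z(i+1) = w(i) * a(i),  a(i+1) = f(z(i+1)),  loss J = 0.5 * (a(L) - y)^2.
    Returns dJ/dw(i) for every layer i."""
    zs, acts = [], [x]
    a = x
    for w in ws:
        z = w * a
        a = f(z)
        zs.append(z)
        acts.append(a)
    delta = (a - y) * f_prime(zs[-1])   # delta at the output layer
    grads = [0.0] * len(ws)
    for i in reversed(range(len(ws))):
        grads[i] = delta * acts[i]      # grad(i) = delta(i+1) * a(i)
        if i > 0:
            # delta(i) = w(i) * delta(i+1) * f'(z(i))  -- scalar form of
            # delta(i) = (W^T delta(i+1)) . f'(z(i))
            delta = ws[i] * delta * f_prime(zs[i - 1])
    return grads

grads = forward_backward([0.7, -1.2, 0.5], x=0.3, y=0.9)
```

A finite-difference check on J confirms the analytic gradients, which is the standard sanity test for any backprop implementation.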

About function f
• Example: sigmoid function
  f(z) = 1 / (1 + e^(−z)) ∈ (0, 1)
  f′(z) = f(z)(1 − f(z)) ≤ 0.25
• Backprop analysis (RNN): magnitude of gradients after q steps
  ∂δ^(i)/∂δ^(L) = ∏_{m=1..q} (W^T) f′(z^(i+m−1))
  ‖∂δ^(i)/∂δ^(L)‖ ≤ (‖W‖ · max f′)^q
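This bound can be checked numerically: even in the sigmoid's best case (f′(0) = 0.25), the backpropagated gradient scales like (‖W‖ · 0.25)^q. A toy scalar sketch (the recurrent weight values are illustrative):

```python
import math

def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)            # maximized at z = 0, where it equals 0.25

def grad_magnitude(q, w, z=0.0):
    """Magnitude of a backpropagated gradient after q steps, each step
    multiplying by w * f'(z) (the scalar analogue of W^T f'(z))."""
    g = 1.0
    for _ in range(q):
        g *= w * sigmoid_prime(z)
    return g

vanishing = grad_magnitude(50, w=2.0)   # (2 * 0.25)^50 = 2^-50: vanishes
exploding = grad_magnitude(50, w=8.0)   # (8 * 0.25)^50 = 2^50: explodes
```

For any fixed w the product is geometric in q, so over long time lags the gradient either vanishes or explodes — which is exactly the long-range-dependency problem the next slides address.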

Exploding/Vanishing gradients • Better Initialization • Long-range dependencies

Long Short-Term Memory
• Memory block: the basic unit
• Stores and accesses information over time
[Figure: LSTM blocks unrolled in time — at step t a block receives input X_t, the previous hidden state H_{t−1}, and the previous cell state C_{t−1}, and emits Y_t, H_t, and C_t]

Memory Cell and Gates • Input Gate: 𝑖𝑡 • Forget Gate: 𝑓𝑡 • Output Gate: 𝑜𝑡 • Cell Activation Vector: 𝑐𝑡

Example: LSTM network • 4 input units • 5 output units • 1 block - 2 LSTM cells

LSTM: Forward Pass
• Input gate:  i_t = σ(W_xi x_t + W_hi h_{t−1} + W_ci c_{t−1} + b_i)
• Forget gate: f_t = σ(W_xf x_t + W_hf h_{t−1} + W_cf c_{t−1} + b_f)
• Output gate: o_t = σ(W_xo x_t + W_ho h_{t−1} + W_co c_t + b_o)

Preservation of gradient information • LSTM: 1 input unit, 1 hidden unit, & 1 output unit • Node: black (activated) • Gate: “-” (closed), “o” (open)

LSTM: Forward Pass
• Memory cell:   c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t−1} + b_c)
• Hidden vector: h_t = o_t ⊙ tanh(c_t)
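The full forward pass for one step can be sketched for a single scalar cell (all weights scalar, gathered in a dict; a toy sketch of the gate equations above, with peephole weights W_c* as on the slides — the specific weight values are illustrative):

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One step of a single LSTM cell, following the slide equations."""
    i = sigma(W["xi"]*x + W["hi"]*h_prev + W["ci"]*c_prev + W["bi"])  # input gate
    f = sigma(W["xf"]*x + W["hf"]*h_prev + W["cf"]*c_prev + W["bf"])  # forget gate
    c = f * c_prev + i * math.tanh(W["xc"]*x + W["hc"]*h_prev + W["bc"])
    o = sigma(W["xo"]*x + W["ho"]*h_prev + W["co"]*c + W["bo"])       # output gate peeks at c_t
    h = o * math.tanh(c)
    return h, c

# With the forget gate saturated open and the input gate closed,
# the cell state is carried through unchanged -- the linear "memory".
W = {k: 0.0 for k in ("xi", "hi", "ci", "xf", "hf", "cf",
                      "xc", "hc", "bc", "xo", "ho", "co", "bo")}
W["bi"], W["bf"] = -100.0, 100.0   # i_t ~ 0, f_t ~ 1
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, W=W)
```

The gated update illustrates the point of the next slide: because c_t is updated additively rather than squashed at every step, information (and gradient) can survive across many steps.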

How does LSTM deal with vanishing/exploding gradients?
• RNN hidden unit: h_t = σ(W_I x_t + W_H h_{t−1} + b_h)
• LSTM memory cell (linear unit): c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t−1} + b_c)
• Because the cell state is updated additively rather than squashed through σ at every step, error can flow back through c_t without shrinking at each step

Summary • HMM: discrete data • RNN: continuous domain • LSTM: long-range dependencies

The End Thank you!
