Bayesian Hypernetworks
David Krueger*, Chin-Wei Huang*, Riashat Islam, Ryan Turner, Aaron Courville
MILA and McGill RLLAB

What’s a Bayesian Hypernet?
Hypernet: a DNN that generates the parameters of another DNN (the “primary net”)

Task: predict y from x

(Diagram: noise Z → Hypernet → parameters of the Primary Net; X → Primary Net → Y. For Z, think GAN / VAE / Real NVP.)
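The picture above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the layer sizes, the single affine map used as the hypernet, and the names `hypernet`, `primary_net` and `sample_predictions` are all hypothetical, and this toy hypernet is untrained and not constrained to be invertible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a tiny primary net: 2 inputs -> 4 hidden -> 1 output.
N_IN, N_HID, N_OUT, N_NOISE = 2, 4, 1, 3
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT

# Toy hypernet (assumption): one affine map from noise z to a flat parameter vector.
H_W = rng.normal(scale=0.5, size=(N_NOISE, N_PARAMS))
H_b = rng.normal(scale=0.5, size=N_PARAMS)


def hypernet(z):
    """Map a noise sample z to a flat parameter vector theta."""
    return z @ H_W + H_b


def primary_net(x, theta):
    """Run the primary MLP on inputs x with parameters unpacked from theta."""
    i = 0
    W1 = theta[i:i + N_IN * N_HID].reshape(N_IN, N_HID)
    i += N_IN * N_HID
    b1 = theta[i:i + N_HID]
    i += N_HID
    W2 = theta[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT)
    i += N_HID * N_OUT
    b2 = theta[i:i + N_OUT]
    return np.tanh(x @ W1 + b1) @ W2 + b2


def sample_predictions(x, n_samples=100):
    """Draw one parameter vector per noise sample and predict with each."""
    zs = rng.normal(size=(n_samples, N_NOISE))
    return np.stack([primary_net(x, hypernet(z)) for z in zs])


x = np.array([[0.5, -1.0], [2.0, 0.3]])
preds = sample_predictions(x)
# The spread across samples is what an uncertainty estimate would be read from.
print(preds.shape, preds.mean(axis=0).ravel(), preds.std(axis=0).ravel())
```

Each noise draw yields a different primary net, so predictions for the same x form a distribution rather than a single value.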

What is a Bayesian Neural Net?
Bayes Rule:
Predict using ensemble:
Argmax

“Weight Uncertainty in Neural Networks” - Blundell et al 2015
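The formulas on this slide did not survive extraction. As a reminder, the standard statements are given below; the symbols (parameters θ, dataset D) are assumed here rather than taken from the slide, and the “Argmax” label presumably refers to a point estimate shown for contrast.

```latex
% Bayes rule for the posterior over network parameters
p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}

% Prediction with the posterior "ensemble" (posterior predictive)
p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta

% Point estimate (MAP), for contrast
\theta_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid \mathcal{D})
```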

What’s special about Bayesian Neural Nets?
Bayes Rule:

“Knows what it knows” “That’s my best guess”

“I’m 99% sure!”

“Calibrated confidence”

“Weight Uncertainty in Neural Networks” - Blundell et al 2015

Example: self-driving cars
Q: Is there a person in the road?
Car: No, and... “That’s my best guess” / “I’m 51% sure!” / “I’m 99.999999% sure!”

__________ Humans want?

Existential risk

AI Safety “Is the default outcome doom?”

Concrete Problems in AI Safety

● Five “concrete problems”, calibrated confidence helps in 4/5

← Reward uncertainty

← I don’t know, ask Tom Everitt

← active learning

← safe exploration

← anomaly detection

Technique

Variational Inference for Bayesian DNNs
● ELBO (terms of the equation annotated: constant, maximize, minimize)
● Examples: Weight Uncertainty, Variational Dropout / MC dropout

Encourages stochasticity!
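The ELBO equation itself was an image and is missing here. A standard decomposition that matches the three annotations is sketched below, under the assumption that the slide used the usual notation (approximate posterior q(θ), prior p(θ), dataset D): the log evidence is constant in q, so maximizing the ELBO is the same as minimizing the KL divergence to the true posterior.

```latex
\underbrace{\log p(\mathcal{D})}_{\text{constant}}
  = \underbrace{\mathbb{E}_{q(\theta)}\big[\log p(\mathcal{D} \mid \theta)\big]
      - \mathrm{KL}\big(q(\theta)\,\|\,p(\theta)\big)}_{\text{ELBO (maximize)}}
  + \underbrace{\mathrm{KL}\big(q(\theta)\,\|\,p(\theta \mid \mathcal{D})\big)}_{\text{minimize}}

% Equivalent form that exposes the entropy of q
\mathrm{ELBO} = \mathbb{E}_{q(\theta)}\big[\log p(\mathcal{D}, \theta)\big] + \mathcal{H}[q]
```

The entropy term rewards spread in q, which is one reading of the “Encourages stochasticity!” annotation.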

Problem with Variational Inference: KL divergence
Variational inference can underestimate uncertainty!

P = true posterior (mixture of Gaussians)

Q = variational approx (Gaussian)
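A small numerical check of this effect, as a sketch under assumptions: the mixture (two unit-variance Gaussians at -5 and +5), the grids, and the brute-force search are all illustrative choices, not taken from the slide. It minimizes KL(Q || P), the direction used in variational inference, over a single Gaussian Q.

```python
import numpy as np


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


# Integration grid for simple Riemann sums.
x = np.linspace(-30.0, 30.0, 6001)
dx = x[1] - x[0]

# P: equal mixture of N(-5, 1) and N(+5, 1) (illustrative choice).
log_p = np.logaddexp(log_normal(x, -5.0, 1.0), log_normal(x, 5.0, 1.0)) + np.log(0.5)
p = np.exp(log_p)
true_mean = np.sum(x * p) * dx
true_std = np.sqrt(np.sum((x - true_mean) ** 2 * p) * dx)


def reverse_kl(mu, sigma):
    """KL(Q || P) for Q = N(mu, sigma^2), computed on the grid."""
    log_q = log_normal(x, mu, sigma)
    q = np.exp(log_q)
    return np.sum(q * (log_q - log_p)) * dx


# Brute-force search over the mean and standard deviation of Q.
mus = np.linspace(-8.0, 8.0, 161)
sigmas = np.linspace(0.5, 6.0, 56)
kl = np.array([[reverse_kl(m, s) for s in sigmas] for m in mus])
i, j = np.unravel_index(np.argmin(kl), kl.shape)
best_mu, best_sigma = mus[i], sigmas[j]

print("std of P:", true_std)
print("best Q mean and std:", best_mu, best_sigma)
```

In this setup the best Gaussian sits on one mode and its standard deviation is far smaller than that of P, which is the underestimated uncertainty the slide warns about.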

Are Bayesian Hypernets the solution?
● Previous work: approximate posterior is factorial: q(θ) = ∏_i q(θ_i)
● Use a DNN!
  ○ ⇒ can be dependent, multimodal

(Diagram: Z → Hypernet → parameters of the Primary Net; X → Primary Net → Y)

Note: h must be invertible! ...but the image of h can be a subset of R^|theta|, unlike with NICE (generative model)
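The reason for the invertibility requirement can be written out as follows. This is a sketch in assumed notation (noise ε with density q_ε, hypernet h; ε and θ taken to have the same dimension), not a formula recovered from the slide. Variational inference needs log q(θ), and for θ in the image of an invertible h it follows from the change of variables:

```latex
\theta = h(\epsilon), \qquad \epsilon \sim q_\epsilon(\epsilon)

\log q(\theta) = \log q_\epsilon(\epsilon)
  - \log \left| \det \frac{\partial h(\epsilon)}{\partial \epsilon} \right|,
  \qquad \epsilon = h^{-1}(\theta)
```

Without an inverse and a tractable Jacobian determinant, the density of the sampled parameters, and hence the entropy part of the objective, would not be available in closed form.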

Some Qualitative Results: Multimodality

Correlation

Background: Hypernetworks

“Dynamic Filter Networks” - De Brabandere et al. 2016
“Learning feed-forward one-shot learners” - Bertinetto et al. 2016
“HyperNetworks” - Ha et al. 2016

Background: Weight Normalization

“Weight Normalization” - Salimans and Kingma 2016 (slide from NIPS 2016 talk)
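The slide image is missing. As a reminder, weight normalization reparameterizes each weight vector as w = g · v / ||v||, separating its length g from its direction v. A minimal NumPy sketch follows; the function name and shapes are illustrative assumptions.

```python
import numpy as np


def weight_norm(v, g):
    """Return w = g * v / ||v||, one weight vector per column of v."""
    norms = np.linalg.norm(v, axis=0, keepdims=True)
    return g * v / norms


rng = np.random.default_rng(0)
v = rng.normal(size=(3, 5))               # direction parameters, 5 output units
g = np.array([0.5, 1.0, 2.0, 3.0, 4.0])   # one scale per output unit
w = weight_norm(v, g)

# After the reparameterization, each column of w has norm g.
print(np.linalg.norm(w, axis=0))
```

One way this could connect to hypernets, stated here as an assumption since the slide does not say so, is to let the hypernet produce only the scales g, which are far fewer than the full weights.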

Background: Invertible Deep Generative Models

Key property: tractable likelihood (via change of variable):

(figure and equation: “Density Estimation via Real NVP” - Dinh et al. 2016)
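The equation is missing from the extraction. The change-of-variables rule it refers to is log p_X(x) = log p_Z(f(x)) + log |det ∂f(x)/∂x| for an invertible f. The sketch below shows one affine coupling layer in the style of Real NVP, where the Jacobian is triangular and its log-determinant is just a sum. The tiny linear-plus-tanh scale and translation functions, the dimensions, and the standard normal base density are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 4, 2  # total dimension and size of the pass-through part

# Hypothetical scale s(.) and translation t(.) functions (stand-ins for networks).
Ws = rng.normal(scale=0.5, size=(d, D - d))
Wt = rng.normal(scale=0.5, size=(d, D - d))


def s(x1):
    return np.tanh(x1 @ Ws)


def t(x1):
    return x1 @ Wt


def forward(x):
    """Affine coupling: copy x1, transform x2; return output and log|det J|."""
    x1, x2 = x[:, :d], x[:, d:]
    y2 = x2 * np.exp(s(x1)) + t(x1)
    log_det = s(x1).sum(axis=1)  # Jacobian is triangular with diagonal exp(s)
    return np.concatenate([x1, y2], axis=1), log_det


def inverse(y):
    """Exact inverse of forward; needs no inverse of s or t."""
    y1, y2 = y[:, :d], y[:, d:]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))
    return np.concatenate([y1, x2], axis=1)


def log_standard_normal(z):
    return -0.5 * (z ** 2).sum(axis=1) - 0.5 * z.shape[1] * np.log(2.0 * np.pi)


def log_prob(x):
    """log p_X(x) = log p_Z(f(x)) + log|det df/dx| with a standard normal p_Z."""
    z, log_det = forward(x)
    return log_standard_normal(z) + log_det


x = rng.normal(size=(5, D))
y, log_det = forward(x)
print(np.abs(inverse(y) - x).max(), log_prob(x))
```

A full model would stack several such layers and alternate which half is passed through, so that every dimension gets transformed.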

Some results (5000 examples of MNIST):

QUESTIONS?

Kill all humans??
