Model-induced Regularization

Shinichi Nakajima
Nikon Corporation, Tokyo 140-8601, Japan
[email protected]

Masashi Sugiyama
Tokyo Institute of Technology and JST PRESTO, Tokyo 152-8552, Japan
[email protected]

Abstract

When Bayesian estimation is applied to modern probabilistic models, an unintended strong regularization is often observed. We explain the mechanism of this effect and introduce relevant works. Suppose we are given i.i.d. samples x_1, ..., x_n ∈ R taken from a Gaussian model with mean parameter u ∈ R:

    p(x) = N(x; u, 1^2).    (1)

Assuming a Gaussian prior, p_u(u) = N(u; 0, c_u^2), where c_u^2 > 0 is a variance hyperparameter, we can perform Bayesian estimation, controlling c_u^2 as a regularization constant. What if we set c_u^2 to a large value (c_u^2 → ∞)? The answer may seem trivial: we get an unregularized estimator. (More accurately, the mode of the Bayesian predictive distribution coincides with the maximum likelihood (ML) estimator.)
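To make this concrete, here is the standard conjugate-posterior calculation for model (1); this is textbook material we add for illustration, not part of the original abstract. Writing \bar{x} for the sample mean,

    p(u | x_1, ..., x_n) = N( u ; n\bar{x} / (n + 1/c_u^2) , 1 / (n + 1/c_u^2) ),    \bar{x} = (1/n) \sum_{i=1}^n x_i.

The shrinkage factor n / (n + 1/c_u^2) tends to 1 as c_u^2 → ∞, so the posterior mean, and with it the mode of the predictive distribution, converges to the ML estimator \bar{x}.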

Suppose next the following model:

    p(x) = N(x; ab, 1^2).    (2)

Here, the parameters are a, b ∈ R, whose product corresponds to the parameter u in the original model (1). Let us assume Gaussian priors on a and b: p_a(a) = N(a; 0, c_a^2) and p_b(b) = N(b; 0, c_b^2). Will we similarly get an unregularized estimator of u = ab when c_a^2, c_b^2 → ∞? The answer is NO: the estimator tends to be strongly regularized, as the numerical sketch below illustrates. We call this effect model-induced regularization (MIR), since it is inherent in the model likelihood function. Eq.(2) is in fact a special case of the matrix factorization model, and MIR therefore explains the empirically observed superiority (Salakhutdinov & Mnih, 2008) of full-Bayesian estimation over maximum a posteriori (MAP) estimation. Note that MIR is caused by the density non-uniformity of distribution functions in the parameter space, and is therefore observed only when at least one parameter is integrated out. (No parameter is integrated out in MAP.) Other popular models in machine learning, e.g., mixture models and hidden Markov models, have a structure similar to Eq.(2) and hence also exhibit MIR.
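The following minimal sketch (our own illustration, not code from the abstract) computes the Bayes estimate of u = ab for model (2) by brute-force two-dimensional quadrature. The sample values, prior variances, and grid settings are arbitrary choices for demonstration.

import numpy as np

# Minimal numerical sketch of MIR for model (2): p(x) = N(x; ab, 1^2).
# We place nearly flat Gaussian priors on a and b (huge c_a^2 = c_b^2)
# and compute the posterior mean of u = ab by quadrature on a 2-D grid.
# Grid bounds and resolution are illustration-only choices.

x = np.array([0.1, 0.9, 0.4, 0.7, 0.2])   # toy i.i.d. sample, unit noise variance
xbar = x.mean()                           # ML estimate of u (and of ab)

c2 = 1e4                                  # prior variances c_a^2 = c_b^2
grid = np.linspace(-8.0, 8.0, 801)
a, b = np.meshgrid(grid, grid, indexing="ij")

# log of the unnormalized posterior over (a, b):
# sum_i log N(x_i; ab, 1) + log N(a; 0, c2) + log N(b; 0, c2), up to constants
log_post = (-0.5 * ((x[:, None, None] - a * b) ** 2).sum(axis=0)
            - 0.5 * (a ** 2 + b ** 2) / c2)
w = np.exp(log_post - log_post.max())     # numerically stabilized weights

u_bayes = (a * b * w).sum() / w.sum()     # posterior mean of u = ab
print(f"ML estimate (sample mean) : {xbar:.3f}")
print(f"Posterior mean of u = ab  : {u_bayes:.3f}")   # shrunk toward zero

With this toy sample the posterior mean comes out noticeably smaller in magnitude than the sample mean, and increasing c2 further does not remove the gap; under model (1), the same flattening of the prior would drive the estimator to the sample mean.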

The origin of MIR can be explained in terms of the Jeffreys prior (Jeffreys, 1946), with which the two models, (1) and (2), yield equivalent estimation. Another explanation was given in the context of visual recognition (Freeman, 1994). Although the idea of the Jeffreys prior is widely known, the strength of this effect seems to be underestimated. In our poster, we will explain why MIR occurs; the calculation below gives one concrete view of the mechanism. We will then introduce works that relate MIR to singularities of probabilistic models: a powerful procedure for the quantitative evaluation of MIR has been developed and applied to various models (Watanabe, 2009), and the theoretical analysis has been extended to the variational Bayesian (VB) approximation. We will also introduce works that clarified the strength of MIR when VB is applied (Nakajima & Sugiyama, 2010).
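One concrete view (a standard calculation we add for illustration): the prior density that the independent Gaussian priors on a and b induce on u = ab is the normal-product distribution

    p(u) = ∫∫ δ(u − ab) N(a; 0, c_a^2) N(b; 0, c_b^2) da db = (1 / (π c_a c_b)) K_0( |u| / (c_a c_b) ),

where K_0 is the modified Bessel function of the second kind. This density diverges logarithmically as u → 0 for every finite c_a c_b, so integrating out a and b always places extra mass near u = 0. This is the density non-uniformity that shrinks the estimator, and no choice of large c_a^2, c_b^2 makes it disappear.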

References

Freeman, W. (1994). The Generic Viewpoint Assumption in a Framework for Visual Perception. Nature, 368, 542–545.
Jeffreys, H. (1946). An Invariant Form for the Prior Probability in Estimation Problems. Proceedings of the Royal Society of London, Series A, pp. 453–461.
Nakajima, S., & Sugiyama, M. (2010). Implicit Regularization in Variational Bayesian Matrix Factorization. In Proceedings of ICML 2010.
Salakhutdinov, R., & Mnih, A. (2008). Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo. In Proceedings of ICML 2008.
Watanabe, S. (2009). Algebraic Geometry and Statistical Learning Theory. Cambridge, UK: Cambridge University Press.
