Elastic Net Regularization in Learning Theory

Christine De Mol ([email protected]), Department of Mathematics and ECARES, Université Libre de Bruxelles, Campus Plaine CP 217, Bd du Triomphe, 1050 Brussels, Belgium

Ernesto De Vito ([email protected]), Università di Genova, Stradone Sant'Agostino 37, 16123 Genova, Italy, and INFN, Sezione di Genova, Via Dodecaneso 33, 16146 Genova, Italy

Lorenzo Rosasco ([email protected]), Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, and DISI, Università di Genova, Via Dodecaneso 35, 16146 Genova, Italy

Abstract

In many applications of supervised learning, a main goal, besides achieving good generalization, is to detect which features are meaningful for building an estimator. There are at least two main difficulties in solving this type of problem: the initial number of potentially relevant features is often much larger than the number of examples, and many of the variables (including the relevant ones) are often strongly dependent. Note that if our only criterion for selecting variables is their generalization property, then, when faced with two highly dependent variables, we obtain the same discriminative power whether we select one or both of them. Both issues make the problem of variable selection ill-posed and suggest that the minimizer of the empirical risk is prone to overfit the data. Our study explores the use of regularization techniques to restore well-posedness and to ensure statistically meaningful solutions within the framework of statistical learning. A key to solving these problems is to assume that the number of relevant features is small. Such a sparsity assumption advocates the use of sparsity-enhancing learning algorithms, and indeed this class of methods has recently attracted increasing attention as a way to deal with high-dimensional data. In this paper we develop a sparsity-based algorithm that can deal with features that may be dependent and infinite in number. To this aim we study a regularization procedure based on penalized empirical risk minimization with a penalty that is a weighted sum of an ℓ1 and an ℓ2 norm, namely elastic-net regularization (Zou & Hastie, 2005). Such a method allows us to obtain estimators that are both sparse and stable with respect to the data. The ℓ1 term in the penalty promotes sparsity of the estimator. The effect of the ℓ2 term is twofold: first, a whole group of correlated relevant variables is selected rather than a single variable in the group; second, when the variables are dependent, stability is ensured with respect to both noise and random sampling.

Our study focuses on the generalization properties as well as the algorithmic properties of elastic-net regularization. We prove that if the hypothesis space induced by the set of features is sufficiently rich, the algorithm is universally consistent. Under suitable assumptions on the target function we derive sample bounds, and when such assumptions are not available we propose an adaptive parameter choice that achieves the same rates. A key step in our study is the characterization of the optimization problem defining the elastic net using tools from convex analysis. As a by-product, we define an iterative thresholding algorithm (Daubechies et al., 2004) to compute the regularized solution; it is very easy to implement and is an alternative to the LARS iteration used in Zou and Hastie (2005).
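For concreteness, the penalized empirical risk minimization described above can be written as follows. The notation here (feature map Φ, penalty weights τ for the ℓ1 term and μ for the ℓ2 term) is ours; the exact parameterization of the weighted sum in the paper may differ:

```latex
\hat{\beta}^{\,\tau,\mu}
  \;=\; \operatorname*{arg\,min}_{\beta}\;
  \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \langle \beta, \Phi(x_i) \rangle\bigr)^2
  \;+\; \tau\,\|\beta\|_{1} \;+\; \mu\,\|\beta\|_{2}^{2}
```

The ℓ1 term is non-smooth, which is what produces sparse minimizers, while the ℓ2 term makes the objective strictly convex, which is the source of the stability and grouping effects mentioned above.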
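The iterative thresholding algorithm mentioned in the abstract can be read as a proximal-gradient (ISTA-type) iteration in the spirit of Daubechies et al. (2004): a gradient step on the smooth quadratic part followed by component-wise soft thresholding. The sketch below is a minimal implementation under that reading, for the linear (finite-dimensional) case; the function names, step-size rule, and stopping criterion are illustrative choices, not the paper's:

```python
import numpy as np

def soft_threshold(v, kappa):
    """Component-wise soft thresholding: the proximity operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def elastic_net_ista(X, y, tau, mu, n_iter=1000, tol=1e-8):
    """Minimize (1/n) * ||y - X @ beta||^2 + tau * ||beta||_1 + mu * ||beta||_2^2
    by iterative soft thresholding (proximal gradient descent)."""
    n, d = X.shape
    beta = np.zeros(d)
    # Lipschitz constant of the gradient of the smooth part (quadratic + ell_2 penalty)
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n + 2.0 * mu
    eta = 1.0 / L  # step size guaranteeing descent
    for _ in range(n_iter):
        # Gradient step on the smooth part ...
        grad = 2.0 / n * X.T @ (X @ beta - y) + 2.0 * mu * beta
        # ... followed by soft thresholding, the prox of the ell_1 term.
        beta_new = soft_threshold(beta - eta * grad, eta * tau)
        if np.linalg.norm(beta_new - beta) <= tol * max(np.linalg.norm(beta), 1.0):
            beta = beta_new
            break
        beta = beta_new
    return beta
```

Note how the ℓ2 weight mu enters twice: it adds a term to the gradient and enlarges the Lipschitz constant, and because it makes the smooth part strongly convex it also speeds up and stabilizes the iteration, consistent with the stability claim above.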

References

Daubechies, I., Defrise, M., & De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57, 1413–1457.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2), 301–320.
