Raphael Fonteneau¹, Nathan Korda², Rémi Munos³


An Optimistic Posterior Sampling Strategy for Bayesian Reinforcement Learning

Raphael Fonteneau¹, Nathan Korda², Rémi Munos³

¹ Department of Electrical Engineering and Computer Science, University of Liège, Belgium
² Department of Engineering Science, Oxford University, United Kingdom
³ Inria Lille – Nord Europe, France / Microsoft Research New England, USA

Abstract


We consider the problem of decision making in the context of unknown Markov Decision Processes (MDPs) with finite state and action spaces. In a Bayesian reinforcement learning framework, we propose an optimistic posterior sampling (OPS) strategy based on the maximization of the state-action value functions of MDPs sampled from the posterior. Preliminary experiments on a 5-state chain MDP suggest that OPS can outperform Thompson sampling.

Introduction

● The design of algorithms addressing the Exploration/Exploitation (E/E) dilemma in MDPs remains challenging.
● This contribution lies within the class of Bayesian Reinforcement Learning techniques.
● We propose to combine the optimism in the face of uncertainty principle with posterior sampling techniques.

Background and problem statement

● Let … be an unknown MDP, with …
● Optimality criterion for a given policy: …
● The Bayesian setting: …
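The abstract describes OPS in terms of the state-action value functions of MDPs. For a known finite MDP, one standard way to compute the optimal state-action value function is value iteration. The sketch below assumes a discounted criterion with factor `gamma`, a tabular transition array `P[s][a][s2]` and expected rewards `R[s][a]`; these names and the discounted setting are illustrative assumptions, not the poster's notation.

```python
from typing import List


def value_iteration(P: List[List[List[float]]],
                    R: List[List[float]],
                    gamma: float = 0.95,
                    tol: float = 1e-10,
                    max_iter: int = 100_000) -> List[List[float]]:
    """Return Q*[s][a] for a finite MDP with transition probabilities
    P[s][a][s2] and expected rewards R[s][a] (discounted criterion)."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Bellman optimality backup:
        # Q(s, a) = R(s, a) + gamma * sum_{s2} P(s2 | s, a) * V(s2)
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_actions)]
             for s in range(n_states)]
        new_V = [max(row) for row in Q]  # V(s) = max_a Q(s, a)
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:  # sup-norm change is small enough: stop
            break
    return Q
```

For instance, with a single state whose two self-looping actions pay 0 and 1 and `gamma = 0.9`, the optimal value is V = 1/(1 - 0.9) = 10, so the fixed point is Q = [9, 10].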

The OPS algorithm:

Given …, we define: …
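The abstract describes OPS as maximizing the state-action value functions of MDPs sampled from the posterior, and the experiments note that n = 1 recovers Thompson sampling. One reading consistent with both, offered as a sketch rather than the authors' exact definition, is: at each decision step, draw n MDPs from the posterior, solve each, and play argmax_a max_i Q_i(s, a). The sketch additionally assumes independent Dirichlet posteriors over each transition distribution P(·|s, a), known expected rewards `R[s][a]`, and a discounted criterion; `ops_action`, `update_posterior` and the other names are hypothetical.

```python
import random
from typing import List, Optional

Counts = List[List[List[float]]]  # counts[s][a][s2]: Dirichlet parameters (assumed prior)


def sample_dirichlet(alphas: List[float], rng: random.Random) -> List[float]:
    # Normalised independent Gamma(alpha_i, 1) draws form a Dirichlet sample.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def optimal_q(P, R, gamma: float, n_iter: int = 500) -> List[List[float]]:
    # Value iteration with a fixed number of Bellman optimality backups.
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(n_iter):
        Q = [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_a)] for s in range(n_s)]
        V = [max(row) for row in Q]
    return Q


def ops_action(counts: Counts, R: List[List[float]], state: int, n: int,
               gamma: float = 0.95, rng: Optional[random.Random] = None) -> int:
    """Return argmax_a max_{i=1..n} Q_i(state, a), where Q_i is the optimal
    Q-function of the i-th MDP drawn from the posterior (n = 1: Thompson sampling)."""
    if rng is None:
        rng = random.Random()
    n_s, n_a = len(counts), len(counts[0])
    best = [float("-inf")] * n_a
    for _ in range(n):
        # Draw one transition model from the posterior.
        P = [[sample_dirichlet(counts[s][a], rng) for a in range(n_a)]
             for s in range(n_s)]
        q = optimal_q(P, R, gamma)[state]
        # Optimism: keep the largest sampled value of each action.
        best = [max(b, v) for b, v in zip(best, q)]
    return max(range(n_a), key=lambda a: best[a])


def update_posterior(counts: Counts, s: int, a: int, s_next: int) -> None:
    # Conjugate Dirichlet update after observing the transition (s, a) -> s_next.
    counts[s][a][s_next] += 1.0
```

Setting `n = 1` gives a Thompson-sampling agent; larger `n` makes the choice more optimistic, since each action is scored by its best value across the n samples. The cost grows linearly in `n`, because every sampled MDP is solved separately.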

The goal is to efficiently exploit the posterior distribution for guiding exploration in order to generate a sequence of policies which maximizes a given E/E criterion. Such a criterion can be, for instance, the expected (either finite or discounted) sum of rewards collected, or the performance of the policy found after a given phase.
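The finite and discounted sums of rewards mentioned above can be written down directly. The sketch below evaluates them on one observed reward sequence; the expected value in the criterion would be estimated by averaging over independent runs. The function names and the discount factor `gamma` are illustrative assumptions.

```python
from typing import Sequence


def finite_horizon_return(rewards: Sequence[float], horizon: int) -> float:
    # Undiscounted sum of the first `horizon` rewards collected.
    return float(sum(rewards[:horizon]))


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    # sum_t gamma**t * r_t, accumulated backwards (Horner's scheme).
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def monte_carlo_estimate(returns: Sequence[float]) -> float:
    # Expected return estimated by averaging the returns of independent runs.
    return sum(returns) / len(returns)
```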

Experimental results

● Benchmark: the 5-state chain MDP.
● On this benchmark, OPS provides better results than Thompson sampling (which corresponds to OPS with n = 1).
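The chain benchmark can be simulated in a few lines. The poster does not list its parameters, so the values below (two actions, a slip probability of 0.2 under which the other action is executed, reward 2 for returning to the first state and reward 10 for staying at the end of the chain) are an assumption based on how this benchmark is commonly described in the Bayesian RL literature. `chain_step`, `rollout`, `FORWARD` and `BACK` are hypothetical names.

```python
import random
from typing import Callable, Tuple

N_STATES = 5
FORWARD, BACK = 0, 1  # hypothetical action labels


def chain_step(state: int, action: int, rng: random.Random,
               slip: float = 0.2) -> Tuple[int, float]:
    """One transition of the chain; returns (next_state, reward)."""
    if rng.random() < slip:
        action = 1 - action  # with probability `slip`, the other action is executed
    if action == BACK:
        return 0, 2.0  # return to the first state with a small reward
    if state == N_STATES - 1:
        return state, 10.0  # staying at the end of the chain pays the large reward
    return state + 1, 0.0  # move one step along the chain, no reward


def rollout(policy: Callable[[int], int], horizon: int,
            seed: int = 0, slip: float = 0.2) -> float:
    # Undiscounted sum of rewards collected by `policy` from the first state.
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(horizon):
        state, reward = chain_step(state, policy(state), rng, slip)
        total += reward
    return total
```

The small immediate reward for going back makes a myopic agent settle near the first state, while the large reward requires several unrewarded forward steps; this is why the benchmark is used to test exploration.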

Acknowledgements

Raphael Fonteneau is a postdoctoral fellow of the F.R.S.-FNRS (Belgian Fund for Scientific Research). We also thank the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 270327 (CompLACS) and the Belgian Network DYSCO, funded by the IAP Programme initiated by the Belgian State, Science Policy Office.
