An Optimistic Posterior Sampling Strategy for Bayesian Reinforcement Learning

Raphael Fonteneau1, Nathan Korda2, Rémi Munos3

1 Department of Electrical Engineering and Computer Science, University of Liège, Belgium
2 Department of Engineering Science, Oxford University, United Kingdom
3 Inria Lille – Nord Europe, France / Microsoft Research New England, USA

Abstract

Optimistic Posterior Sampling

We consider the problem of decision making in the context of unknown Markov Decision Processes (MDPs) with finite state and action spaces. In a Bayesian reinforcement learning framework, we propose an optimistic posterior sampling strategy based on the maximization of state-action value functions of MDPs sampled from the posterior. First experiments are promising.

Introduction

● The design of algorithms addressing the Exploration/Exploitation dilemma in MDPs remains challenging.
● This contribution lies within the class of Bayesian Reinforcement Learning techniques.
● We propose to combine the optimism in the face of uncertainty principle with posterior sampling techniques.

Background and problem statement

● Let [...] be an unknown MDP, with [...]
● Optimality criterion for a given policy: [...]

The Bayesian setting:

● Given [...], we define: [...]

The goal is to efficiently exploit the posterior distribution for guiding exploration in order to generate a sequence of policies which maximizes a given E/E criterion. Such a criterion can be, for instance, the expected (either finite or discounted) sum of rewards collected, or the performance of the policy found after a given phase.

The OPS algorithm: [...]
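The posterior-sampling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the poster text does not confirm: independent Dirichlet posteriors over transitions (stored as pseudo-counts in `counts`), known expected rewards `R`, a discounted criterion with factor `gamma`, and an optimistic step that takes, for each action, the largest Q-value among the `n` sampled MDPs. All function names are hypothetical.

```python
import numpy as np


def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Optimal Q-function of a known MDP.

    P has shape (S, A, S) (transition probabilities),
    R has shape (S, A) (expected immediate rewards).
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)               # greedy state values
        Q_new = R + gamma * (P @ V)     # Bellman optimality backup
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q


def sample_transitions(counts, rng):
    """Draw one transition model from independent Dirichlet posteriors.

    counts[s, a] holds strictly positive pseudo-counts
    (prior plus observed transitions) for the pair (s, a).
    """
    n_states, n_actions, _ = counts.shape
    P = np.empty_like(counts, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = rng.dirichlet(counts[s, a])
    return P


def ops_action(state, counts, R, n, rng, gamma=0.95):
    """Assumed optimistic rule: sample n MDPs, solve each one, and play
    the action whose best sampled Q-value in `state` is largest."""
    q_samples = np.stack([
        value_iteration(sample_transitions(counts, rng), R, gamma)[state]
        for _ in range(n)
    ])                                  # shape (n, A)
    return int(q_samples.max(axis=0).argmax())
```

With `n=1` the rule reduces to acting greedily in a single posterior sample, which is Thompson sampling. After each observed transition `(s, a, s')`, the posterior would be updated by incrementing `counts[s, a, s']` by one.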

Experimental results

● The 5-state chain MDP: [...]
● On this benchmark, OPS provides better results than Thompson sampling (which corresponds to OPS with n=1).
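Since the figure describing the benchmark did not survive, the sketch below builds a chain MDP under assumed parameters taken from the version commonly used in the Bayesian reinforcement learning literature: action 0 advances along the chain (reward 10 when taken in the last state, 0 otherwise), action 1 returns to the first state (reward 2), and each action has the opposite effect with probability 0.2. These values and the name `make_chain` are assumptions, not details recovered from the poster.

```python
import numpy as np


def make_chain(n_states=5, slip=0.2, r_reset=2.0, r_goal=10.0):
    """Build (P, R) for a chain MDP with two actions.

    Action 0 tries to advance, action 1 tries to reset to state 0;
    with probability `slip` the opposite move happens instead.
    P has shape (S, 2, S); R has shape (S, 2) and holds expected rewards.
    """
    P = np.zeros((n_states, 2, n_states))
    R = np.zeros((n_states, 2))
    last = n_states - 1
    for s in range(n_states):
        forward = min(s + 1, last)      # the last state loops on itself
        fwd_reward = r_goal if s == last else 0.0
        # (action, probability that the realised move is "advance")
        for a, p_forward in ((0, 1.0 - slip), (1, slip)):
            P[s, a, forward] += p_forward
            P[s, a, 0] += 1.0 - p_forward
            R[s, a] = p_forward * fwd_reward + (1.0 - p_forward) * r_reset
    return P, R


P, R = make_chain()
```

The difficulty of this kind of benchmark is that resetting pays a small reward immediately, while the large reward is only reachable after several consecutive advancing moves, so an agent that explores too little tends to settle on the reset action.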

Acknowledgements

Raphael Fonteneau is a postdoctoral fellow of the F.R.S.-FNRS (Belgian Fund for Scientific Research). We also thank the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 270327 (CompLACS) and the Belgian Network DYSCO funded by the IAP Programme, initiated by the Belgian State, Science Policy Office.
