Relaxation Schemes for Min Max Generalization in Deterministic Batch Mode Reinforcement Learning

Raphael Fonteneau, Damien Ernst, Bernard Boigelot, Quentin Louveaux

University of Liège, Belgium

Abstract

We study the min max optimization problem introduced in [1] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case, for which we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. Both relaxation schemes are shown to provide better results than those given in [1].

Introduction

Discrete-time optimal control problems arise in many fields (engineering, finance, medicine, artificial intelligence, etc.). Batch mode reinforcement learning (RL) is a powerful tool for solving such problems when the only information available on the system is contained in a batch collection of trajectories of the system.

Batch mode RL algorithms are challenged when dealing with large or continuous spaces. In such cases, the main approach is to combine Dynamic Programming with function approximators, which can often lead to hazardous generalization.

To overcome this difficulty, [1] proposes a min max-type strategy for generalizing in deterministic, Lipschitz continuous environments with continuous state spaces, finite action spaces and a finite time horizon.

In this work, we further investigate the min max optimization problem introduced in [1]. In particular, we propose two relaxation schemes that both provide better results than those given in [1]. Proofs of the results are given in [2].

The T-stage min max generalization optimization problem

Formalization

● Deterministic discrete-time system, finite optimization horizon
● Continuous normed state space, finite action space
● Reward function
● T-stage return of a sequence of actions

Assumption: the batch mode setting

● The system dynamics and the reward function are unknown.
● For each action, a sample of system transitions is known.

Assumption: Lipschitz continuity

Problem statement

● Under these assumptions, what is the worst possible return that can be obtained for a specific sequence of actions?
● Once this problem is solved, the min max approach to generalization aims at identifying a sequence of actions which maximizes its worst possible return.

Focus on the 2-stage problem

Two relaxation schemes for the 2-stage problem

● Trust-region
● Lagrangian relaxation

Comparing the bounds

Experimental results

● Comparison of the two relaxation schemes with the solution (called CGRL) proposed in [1]
● Distribution of the returns of control policies at the end of the sampling process
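To make the worst-case question of the problem statement concrete, the sketch below treats only the simplest one-stage case, where the return is a single reward. It is not one of the two relaxation schemes of this work, and it is not the CGRL algorithm of [1]. It assumes a known Lipschitz constant of the reward function, the Euclidean norm on the state space, and samples that are mutually consistent with that constant. The names `worst_case_reward`, `min_max_action`, `batch` and `lipschitz_const` are hypothetical and chosen for illustration.

```python
import math


def worst_case_reward(x0, samples, lipschitz_const):
    """Smallest reward at x0 compatible with the samples of one action.

    samples: list of (state, reward) pairs observed for that action.
    Any reward function that is Lipschitz with the given constant and
    agrees with a sample (x, r) satisfies
        reward(x0) >= r - lipschitz_const * ||x0 - x||,
    so the largest of these lower bounds is the worst possible reward.
    """
    return max(r - lipschitz_const * math.dist(x0, x) for x, r in samples)


def min_max_action(x0, batch, lipschitz_const):
    """Pick the action whose worst possible one-stage reward is largest.

    batch: dict mapping each action to its list of (state, reward) samples.
    Returns the chosen action together with its worst-case reward.
    """
    bounds = {
        action: worst_case_reward(x0, samples, lipschitz_const)
        for action, samples in batch.items()
    }
    best = max(bounds, key=bounds.get)
    return best, bounds[best]


if __name__ == "__main__":
    # Toy batch, invented for illustration: two actions, 2-D states.
    batch = {
        "left": [((0.0, 0.0), 1.0), ((2.0, 0.0), 0.5)],
        "right": [((1.0, 1.0), 2.0)],
    }
    action, bound = min_max_action((1.0, 0.0), batch, lipschitz_const=1.0)
    print(action, bound)
```

With more than one stage, the successor state of each transition is itself only known up to a Lipschitz bound, which couples the stages. That coupling is what makes the general problem hard and motivates the relaxation schemes.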

References

[1] R. Fonteneau, S.A. Murphy, L. Wehenkel and D. Ernst. Towards min max generalization in reinforcement learning. Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers. Series: Communications in Computer and Information Science (CCIS), Volume 129, pp. 61-77. Editors: J. Filipe, A. Fred and B. Sharp. Springer, Heidelberg, 2011.

[2] R. Fonteneau, D. Ernst, B. Boigelot and Q. Louveaux. Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes. Submitted.

Acknowledgements

Raphael Fonteneau is a Postdoctoral Fellow of the FRS-FNRS. This paper presents research results of the Belgian Network DYSCO and the PASCAL2 European Network of Excellence. The authors also thank Yurii Nesterov for pointing out the idea of using Lagrangian relaxation.

[Figure panels: regular grid; uniform sampling (average over 100 runs)]
