POMDP Homomorphisms
Alicia Peregrin Wolfe
Autonomous Learning Laboratory, University of Massachusetts, Amherst
[email protected]

Abstract

The problem of finding hidden state in a POMDP and the problem of finding state abstractions for MDPs are closely related. This work analyzes the connection between existing Predictive State Representation (PSR) methods and homomorphic reductions of Markov processes. We formally define a POMDP homomorphism, then extend PSR reduction methods to find POMDP homomorphisms when the original POMDP is known. The resulting methods find more compact abstract models than PSR reduction methods in situations where different observations have the same meaning for some task or set of tasks.

Linear PSR Algorithm

• Tests: t = a1o1a2o2a3o3, with prediction P(t | s) = P(o1o2o3 | s, a1a2a3).
• State is represented by a set of linearly independent tests qi ∈ Q.
• State mapping: f(s1) = f(s2) ⇔ ∀ qi P(qi | s1) = P(qi | s2). For example (rows are states, columns are test predictions):

        q1     q2     ...
  s1    0.3    0.2    ...
  s2    0.4    0.5    ...
  s3    0.3    0.2    ...

• If mt is the prediction for test t, the update vector for the state consists of maoqi/mao for all tests qi.
• Action/observation mapping: g(a), k(o) = g(a'), k(o') ⇔ ∀ qi maoqi/mao = ma'o'qi/ma'o'.

(A minimal sketch of these test predictions follows this list.)
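The following is a minimal sketch, not from the poster, of how these test predictions and the state mapping f can be computed when the POMDP is known and tabular. The tensors T (with T[a][s, s'] = P(s' | s, a)) and Z (with Z[a][s', o] = P(o | s', a)), and both function names, are assumptions made for the example.

```python
import numpy as np

def test_prediction(T, Z, test, s):
    """P(o1...on | s, a1...an) for test = [(a1, o1), ..., (an, on)], from state s."""
    v = np.zeros(T[0].shape[0])
    v[s] = 1.0                          # all probability mass on the current state s
    for a, o in test:
        v = (v @ T[a]) * Z[a][:, o]     # one transition step, weighted by P(o | s', a)
    return v.sum()

def state_mapping(T, Z, tests):
    """Group states whose predictions agree on every test: f(s1) = f(s2)."""
    groups = {}
    for s in range(T[0].shape[0]):
        key = tuple(round(test_prediction(T, Z, q, s), 10) for q in tests)
        groups.setdefault(key, []).append(s)
    return list(groups.values())        # each group is one abstract state
```

The rounding in state_mapping is only a crude way of comparing prediction vectors; a rank-based comparison of the full prediction matrix is the usual choice.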

Model Minimization

Find a smaller model which maintains only the relevant properties of the original model, with respect to some output variable y.

[Diagram: the true state transition, the abstract state transition, and the output function (y).]

Homomorphisms

Modified PSR reduction for finding a POMDP homomorphism (a sketch in code follows the list):
1. Initial set of tests: a1y1 (one time step, y observed)
2. Split a, o pairs which help predict Q
3. Extend tests by one time step using g(a), k(o)
4. Repeat (2, 3) until no change
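Below is a structural sketch of this loop under the same tabular-POMDP assumptions as the previous sketch (T for transitions, Z for observations). It is one plausible reading of the steps: step 1 is simplified to one-step action-observation tests, and the merging of (a, o) pairs through g and k in step 2 is only indicated by a comment rather than implemented.

```python
import numpy as np

def prediction_matrix(T, Z, tests):
    """Row s, column j holds P(test_j | s) for a known tabular POMDP."""
    n_states = T[0].shape[0]
    M = np.zeros((n_states, len(tests)))
    for j, q in enumerate(tests):
        V = np.eye(n_states)                    # row s: current distribution over states
        for a, o in q:
            V = (V @ T[a]) * Z[a][:, o]         # transition, then weight by P(o | s', a)
        M[:, j] = V.sum(axis=1)
    return M

def independent_tests(T, Z, tests):
    """Greedily keep tests whose prediction columns are linearly independent."""
    kept = []
    for q in tests:
        if np.linalg.matrix_rank(prediction_matrix(T, Z, kept + [q])) == len(kept) + 1:
            kept.append(q)
    return kept

def reduce_tests(T, Z, n_actions, n_obs, max_len=4):
    """Steps 1-4: grow the core test set one step at a time until it stops changing."""
    # Step 1 (simplified): one-step tests over raw observations; the poster starts from a1 y1.
    tests = independent_tests(T, Z, [[(a, o)] for a in range(n_actions)
                                              for o in range(n_obs)])
    while True:
        # Step 2 would merge (a, o) pairs using g and k before extending.
        # Step 3: one-step extensions a o q of the current core tests.
        extended = tests + [[(a, o)] + q for q in tests if len(q) < max_len
                            for a in range(n_actions) for o in range(n_obs)]
        new_tests = independent_tests(T, Z, extended)
        if len(new_tests) == len(tests):         # Step 4: fixed point reached
            return new_tests
        tests = new_tests
```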

POMDP Homomorphisms

Reduction over states, actions and observations.

State, action and observation mappings:

f: S → S'    g: A → A'    ka: O → O'

Seek to predict some specific output variable y, where y is a function of the observation.
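For concreteness, the three mappings can be stored as plain index arrays; the sizes and values below are invented for illustration and are not taken from the poster. This is also the form assumed by the constraint-checking sketch given after the Constraints box below.

```python
import numpy as np

# A hypothetical candidate reduction of a 6-state, 2-action, 3-observation POMDP.
f = np.array([0, 0, 1, 1, 2, 2])   # f: S -> S'   six states collapse to three
g = np.array([0, 0])               # g: A -> A'   both actions map to one abstract action
k = np.array([[0, 0, 1],           # ka: O -> O'  one row per original action a
              [0, 1, 1]])
```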

Constraints (Bayes Net View)

[Figure: dynamic Bayes net structure over st, at, ot+1, st+1 and yt.]

Output:
• P(yt | f(st)) = P(yt | st)

Transitions:
• P(f(st+1) | f(st), g(at), ka(ot+1)) = P(f(st+1) | st, at, ot+1)

Observations:
• P(ka(ot+1) | f(st), g(at)) = P(ka(ot+1) | st, at)

(A sketch that checks the last two constraints on a known POMDP follows.)
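Here is a minimal sketch, again assuming a known tabular POMDP and the index-array mappings from the earlier example, that checks the observation and transition constraints for a candidate reduction; the output constraint can be checked the same way once P(y | s) is available for the task. The function name and representation are assumptions, not the poster's notation.

```python
import numpy as np

def check_constraints(T, Z, f, g, k, tol=1e-6):
    """Verify the observation and transition constraints for candidate mappings f, g, k.

    T[a][s, s'] = P(s' | s, a);  Z[a][s', o] = P(o | s', a);  f, g are index arrays
    over states and actions, and k[a] maps observations for action a.
    """
    n_s, n_a, n_o = len(f), len(g), Z[0].shape[1]
    abstract_states = sorted(set(f))
    obs_ref, trans_ref = {}, {}
    for s in range(n_s):
        for a in range(n_a):
            for o_abs in set(k[a]):
                obs_class = [o for o in range(n_o) if k[a][o] == o_abs]
                # P(ka(o') | s, a): total probability of the abstract observation class.
                p_class = sum((T[a][s] * Z[a][:, o]).sum() for o in obs_class)
                key = (f[s], g[a], o_abs)
                if not np.isclose(obs_ref.setdefault(key, p_class), p_class, atol=tol):
                    return False                          # observation constraint violated
                for o in obs_class:
                    joint = T[a][s] * Z[a][:, o]          # P(s', o' | s, a) over s'
                    if joint.sum() < tol:
                        continue                          # this (s, a, o') never occurs
                    post = joint / joint.sum()            # P(s' | s, a, o')
                    # P(f(s') | s, a, o'): posterior probability of each abstract state.
                    agg = np.array([post[f == c].sum() for c in abstract_states])
                    ref = trans_ref.setdefault(key, agg)
                    if not np.allclose(ref, agg, atol=tol):
                        return False                      # transition constraint violated
    return True
```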

Value Function Homomorphisms

Start with the immediate reward as the only basis vector, as in (Poupart, 2002).
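A minimal sketch of that construction, under the same tabular assumptions as the earlier sketches: grow a basis from the immediate reward vectors and keep it closed under the one-step matrices M_ao, in the spirit of value-directed compression (Poupart, 2002). The reward layout R[a][s] and all names here are assumptions for the example, not the method as published.

```python
import numpy as np

def krylov_reward_basis(T, Z, R, tol=1e-9):
    """Smallest subspace containing the reward vectors and closed under the M_ao maps.

    T[a][s, s'] = P(s' | s, a);  Z[a][s', o] = P(o | s', a);  R[a][s] is the
    immediate reward for taking action a in state s.
    """
    n_a, n_o = len(T), Z[0].shape[1]

    def in_span(basis, v):
        if not basis:
            return np.linalg.norm(v) < tol
        B = np.column_stack(basis)
        coeffs = np.linalg.lstsq(B, v, rcond=None)[0]
        return np.linalg.norm(v - B @ coeffs) < tol

    basis = []
    frontier = [np.asarray(R[a], dtype=float) for a in range(n_a)]   # start from rewards
    while frontier:
        v = frontier.pop()
        if in_span(basis, v):
            continue
        basis.append(v)                              # a genuinely new direction
        for a in range(n_a):
            for o in range(n_o):
                M_ao = T[a] * Z[a][:, o]             # M_ao[s, s'] = P(s', o | s, a)
                frontier.append(M_ao @ v)            # its image must also be representable
    return np.column_stack(basis)                    # columns span the compressed space
```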

Results

Original POMDP: [Figure: states with outputs y ∈ {0, 1, 3}, observations labeled y = a, y = b, y = c, and transitions with probability 0.5 / 0.5.]

Task one: a = b = c
Task two: a = 2, b = 1, c = 1.5 ((a + b)/2 = c)

[Chart comparing the homomorphic reduction, PSR reduction, value function reduction, and output function reduction on each task.]

State Specific Action/Observation Mappings

If the agent could believe that it might be in s1 or s2, it cannot have different action mappings in those states. History-specific action/observation mappings may be easier.

Acknowledgements

This research was facilitated in part by a National Physical Science Consortium Fellowship and by stipend support from Sandia National Laboratories, CA. This research was also funded in part by NSF grant CCF 0432143.

Citations

Thomas Dean and Robert Givan. Model minimization in Markov decision processes. AAAI, 1997.
Masoumeh T. Izadi and Doina Precup. Model minimization by linear PSR. IJCAI, 2005.
Michael L. Littman, Richard S. Sutton, and Satinder P. Singh. Predictive representations of state. NIPS, 2001.
Pascal Poupart and Craig Boutilier. Value-directed compression of POMDPs. NIPS, 2002.
B. Ravindran. An Algebraic Approach to Abstraction in Reinforcement Learning. PhD thesis, University of Massachusetts, 2004.
Alicia Peregrin Wolfe and Andrew G. Barto. Decision tree methods for finding reusable MDP homomorphisms. AAAI, 2006.
