Reasoning in MDPs by Option Model Composition

David Silver

In this talk I will introduce a theoretical framework for reasoning in Markov Decision Processes (MDPs). Unlike MDP planning, which uses knowledge to produce a plan, MDP reasoning uses knowledge to produce deeper knowledge. The basic building blocks of this knowledge are option models: temporal abstractions that, like macro-operators in classical planning, jump directly from a start state to an end state. The fundamental operation of MDP reasoning is to compose option models together, starting with low-level models, into increasingly abstract option models. Specifically, each option model is composed with the model that maximises progress towards a particular subgoal.

I will illustrate this new framework with an iterative algorithm that simultaneously constructs optimal option models for all subgoals and searches over those option models to make rapid progress towards other subgoals. The talk will conclude with a demonstration of reasoning in an MDP formulation of a classical AI task: the n-disc Tower of Hanoi problem. To solve this problem, classical MDP planning algorithms require a number of iterations that is exponential in n (the shortest solution takes 2^n - 1 moves, and such planners extend the horizon by one step per iteration), whereas MDP reasoning algorithms require only a linear number of iterations.
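As a rough illustration of the composition operation described above, here is a minimal sketch in Python. It assumes option models in the standard expected-model form from the options framework: a pair (R, P), where R[s] is the expected discounted reward accumulated while the option runs from state s, and P[s, s'] is the discounted probability of the option terminating in state s'. The function names, the toy chain MDP, and the numbers are illustrative assumptions, not taken from the talk.

import numpy as np

def compose(model_a, model_b):
    """Model of executing option a to termination, then option b.

    Reward: a's reward, plus b's reward discounted through a's model.
    Transition: a's discounted termination distribution chained into b's.
    """
    R_a, P_a = model_a
    R_b, P_b = model_b
    return R_a + P_a @ R_b, P_a @ P_b

# Toy 3-state chain under discount 0.9: a one-step "move right" option.
# State 2 is absorbing; reward 1 is received until absorption.
gamma = 0.9
P_step = gamma * np.array([[0.0, 1.0, 0.0],
                           [0.0, 0.0, 1.0],
                           [0.0, 0.0, 1.0]])
R_step = np.array([1.0, 1.0, 0.0])
step = (R_step, P_step)

# Composing the low-level model with itself yields a more abstract model
# that "jumps" two steps at a time, as in the framework sketched above.
R2, P2 = compose(step, step)
print(R2)  # [1.9, 1.0, 0.0]: reward 1 + 0.9 * 1 when starting from state 0
print(P2)  # discounted two-step termination distribution (factor 0.81)

Note that each composition can double the temporal horizon of a model, so n compositions can reach horizons of order 2^n; this is, plausibly, the intuition behind the linear iteration count on the Tower of Hanoi problem, in contrast to planners that extend the horizon one primitive step at a time.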