Chapter 8: Markov Decision Theory

8.1 Introduction

We consider a Markov chain whose transition probabilities are decided at each transition by the actions of a controller. Each action has an associated cost, and the goal of the controller is to minimize the expected cost up to a finite horizon $N$. The Markov chain has a countable state space. If the chain is in state $i$ at time $t$ and the controller picks an action $a_t$ from a finite action set associated with state $i$, then two things occur:

The transition probabilities from state $i$ to state $j$ are given by $K_{ij}(t, a_t)$.
A cost $C(t, i, a_t)$ is incurred.
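To make the two ingredients concrete, here is a minimal sketch of a controlled chain with two states. Everything specific in it is an assumption made for illustration: the state labels, the action names "stay" and "move", the numerical values, and the helper `step` do not come from the text. Only the roles of `K` and `C` mirror the transition probabilities and the cost described above.

```python
import random

# Hypothetical two-state example; all names and numbers are illustrative assumptions.
STATES = [0, 1]
ACTIONS = {0: ["stay", "move"], 1: ["stay", "move"]}  # finite action set per state


def K(i, j, t, a):
    """Transition probability from state i to state j at time t under action a."""
    p_switch = 0.1 if a == "stay" else 0.8
    return p_switch if j != i else 1.0 - p_switch


def C(t, i, a):
    """Cost incurred at time t in state i when action a is taken."""
    state_cost = 2.0 if i == 1 else 0.0
    action_cost = 1.0 if a == "move" else 0.0
    return state_cost + action_cost


def step(i, t, a, rng):
    """Sample the next state from the row K(i, ., t, a) and return it with the cost."""
    weights = [K(i, j, t, a) for j in STATES]
    j = rng.choices(STATES, weights=weights, k=1)[0]
    return j, C(t, i, a)
```

Calling `step(1, 0, "move", random.Random(0))` returns a sampled next state together with the cost incurred for choosing "move" in state 1 at time 0. In this sketch `K` and `C` happen not to vary with `t`, although the setting in the text allows both to depend on time.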

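Let $X_t$ represent the state at time $t$ and let $A_t$ represent the action taken at time $t$. Define the past or history up to time $t$ by

$$H_t = (X_0, A_0, X_1, A_1, \ldots, X_{t-1}, A_{t-1}, X_t).$$

The preceding assumptions imply

$$P(X_{t+1} = j \mid X_0 = x_0, A_0 = a_0, \ldots, X_t = i, A_t = a_t) = K_{ij}(t, a_t),$$

where $(x_0, \ldots, x_{t-1}, i)$ and $(a_0, \ldots, a_t)$ are the sequences of states and actions taken until time $t$. This Markovian structure leads to a considerable simplification of the decision-making process, as we shall see.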

The controller, who is not clairvoyant, must operate according to some policy, which at each time $t$ assigns an action $A_t$ depending on the past up to $t$ and the time to the horizon; that is, $A_t$ is a function of the history up to time $t$ and of the remaining time $N - t$.

This policy may, in principle, even depend on...
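As a rough illustration of a policy that depends on both the past and the time to the horizon, the sketch below evaluates the exact expected cost of such a policy on a hypothetical two-state chain. The transition function `K`, the cost function `C`, the action names, the encoding of the history as a tuple `(x_0, a_0, ..., x_t)`, and the convention that costs are incurred at times 0 through N - 1 are all assumptions made for this example, not details taken from the text.

```python
# Hypothetical two-state example; all names and numbers are illustrative assumptions.
STATES = (0, 1)


def K(i, j, t, a):
    """Transition probability from state i to state j at time t under action a."""
    p_switch = 0.1 if a == "stay" else 0.8
    return p_switch if j != i else 1.0 - p_switch


def C(t, i, a):
    """Cost incurred at time t in state i when action a is taken."""
    return (2.0 if i == 1 else 0.0) + (1.0 if a == "move" else 0.0)


def policy(history, t, N):
    """Pick an action from the history (x_0, a_0, ..., x_t) and the time to go N - t."""
    current_state = history[-1]
    time_to_go = N - t
    # Pay to leave the costly state 1 only if at least one more period follows.
    if current_state == 1 and time_to_go > 1:
        return "move"
    return "stay"


def expected_cost(policy, history, t, N):
    """Exact expected cost over times t, ..., N - 1 (assumed convention)."""
    if t == N:
        return 0.0
    i = history[-1]
    a = policy(history, t, N)
    total = C(t, i, a)
    # Average the cost-to-go over every possible next state j.
    for j in STATES:
        total += K(i, j, t, a) * expected_cost(policy, history + (a, j), t + 1, N)
    return total
```

Starting in state 1 with N = 2, `expected_cost(policy, (1,), 0, 2)` comes to about 3.4, whereas a policy that always returns "stay" gives about 3.8. The recursion enumerates every history, so it is practical only for small examples like this one.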


Elements Of Applied Probability For Engineering, Mathematics And Systems Science
