Elements Of Applied Probability For Engineering, Mathematics And Systems Science

Chapter 8: Markov Decision Theory

8.1 Introduction

We consider a Markov chain whose transition probabilities are decided at each transition according to the actions of a controller. Each action has an associated cost and the goal of the controller is to minimize the expected cost up to a finite horizon N. Let be a countable state space of the Markov chain. If the chain is in state i at time t and the controller picks an action a t from a finite action set associated with state i then two things occur:

  • The transition probabilities from state i to state j are then given by K ij( t, a t).

  • A cost C( t,i,a t) is incurred.

Let X t represent the state at time t and let A t represent the action taken at time t. Define the past or history up to time t by

The preceding assumptions imply

where and are the sequence of states and actions taken until time t. This Markovian structure leads to a considerable simplification of the decision making process as we shall see.

The controller who is not clairvoyant must operate according to some policy ? ? ? which at each time t assigns an action A t depending on the past up to t and the time to the horizon; that is

This policy may, in principal, even depend on...

UNLIMITED FREE
ACCESS
TO THE WORLD'S BEST IDEAS

SUBMIT
Already a GlobalSpec user? Log in.

This is embarrasing...

An error occurred while processing the form. Please try again in a few minutes.

Customize Your GlobalSpec Experience

Category: Thermostats
Finish!
Privacy Policy

This is embarrasing...

An error occurred while processing the form. Please try again in a few minutes.