Elements Of Applied Probability For Engineering, Mathematics And Systems Science

We consider a Markov chain whose transition probabilities are decided at each transition according to the actions of a controller. Each action has an associated cost and the goal of the controller is to minimize the expected cost up to a finite horizon N. Let
be a countable state space of the Markov chain. If the chain is in state i at time t and the controller picks an action a t from a finite action set
associated with state i then two things occur:
The transition probabilities from state i to state j are then given by K ij( t, a t).
A cost C( t,i,a t) is incurred.
Let X t represent the state at time t and let A t represent the action taken at time t. Define the past or history up to time t by
The preceding assumptions imply
where
and
are the sequence of states and actions taken until time t. This Markovian structure leads to a considerable simplification of the decision making process as we shall see.
The controller who is not clairvoyant must operate according to some policy ? ? ? which at each time t assigns an action A t depending on the past up to t and the time to the horizon; that is
This policy may, in principal, even depend on...