An Introduction to Statistical Signal Processing

The idea of conditional probability can be used to provide a general representation of a joint distribution as a product, but a more complicated product than arises with an iid vector. As one would hope, the complicated form reduces to the simpler form when the vector is iid. The individual terms of the product have useful interpretations.
The use of conditional probabilities allows us to break up many problems in a convenient form and focus on the relations among random variables. Examples to be treated include statistical detection, statistical classification, and additive noise.
We begin with the discrete alphabet case, where elementary conditional probability suffices. Results of similar form can be derived for the continuous case, but nonelementary conditional probability is required to interpret them correctly.
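As a small numerical sketch of the discrete case, the following Python fragment builds a conditional pmf from a joint pmf by dividing by the marginal, and then checks that the product of marginal and conditional recovers the joint pmf. The joint pmf values, the alphabets {0, 1} and {0, 1, 2}, and the helper names `marginal_x` and `conditional_y_given_x` are illustrative assumptions and do not come from the text. Exact fractions are used so that the equalities can be checked without rounding error.

```python
from fractions import Fraction

# Hypothetical joint pmf p_{X,Y}(x, y), keyed by (x, y); the values are made up
# for illustration and sum to 1.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 4),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}


def marginal_x(joint_pmf):
    """Marginal pmf p_X(x), obtained by summing the joint pmf over y."""
    p_x = {}
    for (x, _y), p in joint_pmf.items():
        p_x[x] = p_x.get(x, Fraction(0)) + p
    return p_x


def conditional_y_given_x(joint_pmf):
    """Conditional pmf p_{Y|X}(y|x), keyed by (y, x), defined only where p_X(x) > 0."""
    p_x = marginal_x(joint_pmf)
    return {
        (y, x): p / p_x[x]
        for (x, y), p in joint_pmf.items()
        if p_x[x] > 0
    }


p_x = marginal_x(joint)
p_y_given_x = conditional_y_given_x(joint)

# For each fixed x, the conditional pmf is itself a pmf in y: it sums to 1.
for x in p_x:
    assert sum(p for (_y, xx), p in p_y_given_x.items() if xx == x) == 1

# The joint pmf is recovered as the product p_X(x) * p_{Y|X}(y|x).
for (x, y), p in joint.items():
    assert p_x[x] * p_y_given_x[(y, x)] == p
```

Keying the conditional pmf by `(y, x)` mirrors the argument order of the usual notation for a conditional pmf of Y given X; any other consistent convention would serve equally well.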
Begin with the simple case of a discrete random vector $(X, Y)$ with alphabet $A_X \times A_Y$ described by a pmf $p_{X,Y}(x, y)$. Let $p_X$ and $p_Y$ denote the corresponding marginal pmfs. Define for each $x \in A_X$ for which $p_X(x) > 0$ the conditional pmf $p_{Y|X}(y \mid x)$; $y \in A_Y$ as the elementary conditional probability of $Y = y$ given $X = x$, that is,
$$p_{Y|X}(y \mid x) = P(Y = y \mid X = x) = \frac{P(Y = y, X = x)}{P(X = x)} = \frac{p_{X,Y}(x, y)}{p_X(x)},$$
where we have assumed that