Random variable monads
haskell monad probability
In this note we give a mathematical translation of the monadic treatment to random variables in this amazing stackexchange answer, which is my (figuratively) sixth attempt to understanding monads.
We show the meanings of return
, bind
/ (>>=)
, join
, fmap
/ (<$>)
and ap
/ (<*>)
and the monad laws in this concrete example.
See Hackage for a clear and concise definition of monads.
For notational consistency, we use capital letters for random variables in Haskell code, as is the convention in probability theory.
We also use ==
to stand for derivation in Haskell code.
For any state space \(S\), let \(R(S)\) be the space of random variables on \(S\). In the following \(S, T, P\) are notations of state spaces, and \(X, Y, Z\) random variables.
In the stackexchange answer (we ignore the psuedorandomness and pretend everything is truly random),

return x
is defined as the atomic random variable \(\delta_x\) 
For \(f: S \to R(T)\),
X >>= f
is the random variable \(\expe_X f(X)\). It can be thought of in terms of conditional random variables. Say \(X \in R(S)\) and \(Y \in R(T)\) are two random variables, and that \(Y\) given \(X = x\) is \(f(x)\). ThenX >>= f
is the (unconditional) random variable \(Y\). 
join
can be derived from(>>=)
: Let \(X \in R(R(S))\) be a random random variable, thenjoin X = X >>= id
\(= X >>= id_{R(S)} = \expe X\) is a random variable on \(S\), obtained by averaging over the random variables \(X\) can take values in. 
X >> Y
is simply \(Y\).
Deriving fmap
/ (<$>)
:

And for \(g: S \to T\), let \(\mu_X\) be the distribution of \(X\), then
fmap g X = X >>= (return . g)
\(= \expe_X (\delta_{g(\cdot)})(X) = \int \delta_{g(x)} \mu_X(d x) = g(X)\) which agrees with the definition of therandomMap
function in that answer.
As for ap
/ (<*>)
:

Let \(F \in R(T^S)\) and \(X \in R(S)\), then
F <*> X
is defined asF <*> X = do {f < F; x < X; return (f x)} == do{f < F; fmap f X} == F >>= (\f > fmap f X)
which is equal to \(\expe_F F(X)\), a random variable on \(T\) by averaging over \(F\).
Now we translate the monad laws:

For \(x \in S\) and \(f: S \to R(T)\),
return x >>= f == f x
: \(LHS = \delta_x >>= f = \expe_{X \sim \delta_x} f(X) = f(x) = RHS\). 
For \(X \in R(S)\),
X >>= return == X
: \(LHS = \expe_X \delta_X = X = RHS\). 
For \(X \in R(S), f: S \to R(T), g: T \to R(P)\),
X >>= (\x > f x >>= g) == (X >>= f) >>= g
: Suppose \(Y \in R(T)\) and \(Z \in R(P)\) such that \((YX = x) = f(x)\) and \((ZY = y) = g(y)\), then \begin{align} LHS &= \expe_X (\expe_{f(\cdot)} g \circ f (\cdot)) (X) = \expe_X \expe_{YX = \cdot} g (YX = \cdot) = \expe_X (ZX = \cdot) (X) = Z\\ RHS &= \expe_{\expe_X f(X)} g(\expe_X f(X)) = \expe_Y g(Y) = Z. \end{align} In the LHS we average first over \(Y\) then over \(X\) whereas in the RHS we average over \(X\) then \(Y\).