Law of the unconscious statistician

The law of the unconscious statistician is a theorem which expresses the expected value of a function \[g(X)\] of a random variable \[X\] in terms of \[g\] and the probability distribution of \[X\].

Discrete probability distribution

Let \[X\] be a discrete random variable with support \[\mathcal{X}\] and probability mass function \[f_{X}(x)\]. Then the expected value of \[g(X)\] is \[E(g(X))=\sum_{x\in \mathcal{X}}g(x)f_{X}(x)\].
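As a sanity check (a minimal sketch; the distribution and choice of \[g\] below are arbitrary), we can compare LOTUS against computing \[E(Y)\] directly from the PMF of \[Y=g(X)\]:

```python
from fractions import Fraction

# An arbitrary discrete distribution: support and PMF of X.
pmf_X = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}

g = lambda x: x * x  # g(X) = X^2

# LOTUS: E(g(X)) = sum over x of g(x) * f_X(x).
lotus = sum(g(x) * p for x, p in pmf_X.items())

# Direct route: first build the PMF of Y = g(X), then E(Y) = sum of y * f_Y(y).
pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[g(x)] = pmf_Y.get(g(x), 0) + p
direct = sum(y * p for y, p in pmf_Y.items())

assert lotus == direct == Fraction(9, 4)
```

Both routes give \[9/4\]; LOTUS just skips the intermediate construction of \[f_{Y}\].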

Say we've computed \[E(X)\] for some random variable \[X\]: \[E(X)=\sum_{x\in \mathcal{X}}xf_{X}(x)\]. Now we're looking to compute the variance, \[E(X^{2})-E(X)^{2}\]. Logically, we would need to compute \[E(X^{2})\], the expected value of a new random variable \[Y=X^{2}\]. Well, the unconscious statistician often doesn't feel like computing another PMF, and instead reasons by analogy: if \[E(X)=\sum_{x\in \mathcal{X}}xf_{X}(x)\], then surely we can simply replace the \[x\] with \[x^{2}\], giving \[E(X^{2})=\sum_{x\in \mathcal{X}}x^{2}f_{X}(x)\]. This step isn't justified on its face, but the laziness turns out to be vindicated in general: \[E(g(X))=\sum_{x\in\mathcal{X}}g(x)f_{X}(x)\]. This also explains the origin of the name, as some statisticians unknowingly present this identity as the definition of expected value rather than as a theorem.
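For instance (a small worked example of this variance computation, not from the original), the variance of a fair six-sided die can be computed exactly this way:

```python
from fractions import Fraction

# Fair six-sided die: f_X(x) = 1/6 for x in {1, ..., 6}.
support = range(1, 7)
p = Fraction(1, 6)

E_X = sum(x * p for x in support)       # E(X) = 7/2
E_X2 = sum(x * x * p for x in support)  # LOTUS: E(X^2) = 91/6
variance = E_X2 - E_X ** 2              # 91/6 - 49/4 = 35/12

assert variance == Fraction(35, 12)
```

No PMF for \[Y=X^{2}\] was ever built; LOTUS lets us reuse \[f_{X}\] directly.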

Now, we shall prove this theorem. The expected value of \[Y=g(X)\] is defined as \[E(Y)=\sum_{y\in \mathcal{Y}}yf_{Y}(y)\], where \[\mathcal{Y}=g(\mathcal{X})\] is the support of \[Y\]. Writing the PMF as \[f_{Y}(y)=P(g(X)=y)\], where \[P\] is the probability measure, we get \[E(g(X))=\sum_{y\in \mathcal{Y}}y\cdot P(g(X)=y)\]. The event \[\{g(X)=y\}\] is exactly the event that \[X\] lands in the preimage \[g^{-1}(\{y\})=\{x\in\mathcal{X}:g(x)=y\}\], so \[P(g(X)=y)=\sum_{x:g(x)=y}f_{X}(x)\]. Then,

\begin{align*} E(g(X))&=\sum_{y\in \mathcal{Y}}y\cdot P(g(X)=y)\\ &=\sum_{y\in \mathcal{Y}}y\sum_{x:g(x)=y}f_{X}(x)\\ &=\sum_{y\in \mathcal{Y}}\sum_{x:g(x)=y}yf_{X}(x)\\ &=\sum_{y\in \mathcal{Y}}\sum_{x:g(x)=y}g(x)f_{X}(x) \end{align*}

Note that the preimages \[g^{-1}(\{y\})\] for \[y\in\mathcal{Y}\] partition \[\mathcal{X}\], so summing over \[y\in\mathcal{Y}\] and then over all \[x\] with \[g(x)=y\] is the same as summing over all \[x\in\mathcal{X}\]. We conclude that \[E(g(X))=\sum_{x\in \mathcal{X}}g(x)f_{X}(x)\].
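This partition step can be checked mechanically even when \[g\] is not injective (the toy distribution below is my own choice):

```python
from fractions import Fraction

pmf_X = {-2: Fraction(1, 6), -1: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 6)}
g = lambda x: x * x  # not injective: g(-2) = g(2) and g(-1) = g(1)

# Double sum: over each y in the support of Y, then over the preimage of y.
supp_Y = {g(x) for x in pmf_X}
double_sum = sum(
    sum(g(x) * p for x, p in pmf_X.items() if g(x) == y)
    for y in supp_Y
)

# Single sum: directly over the support of X.
single_sum = sum(g(x) * p for x, p in pmf_X.items())

assert double_sum == single_sum == 2
```

Each \[x\] is counted exactly once in the double sum because it belongs to exactly one preimage.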

Continuous probability distribution

Similarly, if \[X\] is continuous with probability density function \[f_{X}\], then the expected value of \[g(X)\] is \[E(g(X))=\int_{-\infty}^{\infty}g(x)f_{X}(x)\,dx\].
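As a numerical illustration (a sketch; the standard normal density, \[g(x)=x^{2}\], and the truncation of the integral to \[[-8,8]\] are my choices), the integral can be approximated with a trapezoidal rule:

```python
import math

# Standard normal pdf and the function g(x) = x^2.
f_X = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
g = lambda x: x * x

# Trapezoidal approximation of the integral of g(x) f_X(x) over [-8, 8];
# the tails beyond +/-8 are negligible for the normal density.
n, a, b = 100_000, -8.0, 8.0
h = (b - a) / n
vals = [g(a + i * h) * f_X(a + i * h) for i in range(n + 1)]
integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# For X ~ N(0, 1), LOTUS gives E(X^2) = Var(X) + E(X)^2 = 1.
assert abs(integral - 1.0) < 1e-6
```

The approximation recovers \[E(X^{2})=1\] without ever deriving the density of \[X^{2}\].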

Let \[y=g(x)\], where \[g\] is differentiable and strictly increasing, and let \[F_{X}\] denote the cumulative distribution function of \[X\]. By the inverse function rule, \[\frac{d}{dy}g^{-1}(y)=\frac{1}{g^{\prime}(g^{-1}(y))}\]. Substituting \[x=g^{-1}(y)\], \[dx=\frac{1}{g^{\prime}(g^{-1}(y))}dy\], gives \[E(g(X))=\int_{-\infty}^{\infty}g(g^{-1}(y))f_{X}(g^{-1}(y))\frac{1}{g^{\prime}(g^{-1}(y))}\,dy\]. Next, consider the relationship between the CDFs of \[Y\] and \[X\]: since \[g\] is increasing, \[F_{Y}(y)=P(Y\le y)=P(g(X)\le y)=P(X\le g^{-1}(y))=F_{X}(g^{-1}(y))\]. Then, differentiating \[F_{Y}(y)\] to obtain the PDF,

\begin{align*} f_{Y}(y)&=\frac{d}{dy}F_{Y}(y)\\ &=\frac{d}{dy}F_{X}(g^{-1}(y))\\ &=f_{X}(g^{-1}(y))\cdot \frac{d}{dy}g^{-1}(y)\\ &=f_{X}(g^{-1}(y))\frac{1}{g^{\prime}(g^{-1}(y))} \end{align*}
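As a concrete instance of this density formula (a worked example of my own, not from the original): take \[X\sim \mathrm{Exp}(1)\], so \[f_{X}(x)=e^{-x}\] for \[x>0\], and \[g(x)=\sqrt{x}\], which is differentiable and strictly increasing on \[(0,\infty)\]. Then \[g^{-1}(y)=y^{2}\] and \[g^{\prime}(x)=\frac{1}{2\sqrt{x}}\], so

\begin{align*} f_{Y}(y)&=f_{X}(g^{-1}(y))\frac{1}{g^{\prime}(g^{-1}(y))}\\ &=e^{-y^{2}}\cdot 2y\\ &=2ye^{-y^{2}},\qquad y>0 \end{align*}

which is the density of \[\sqrt{X}\], a Rayleigh-type distribution.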

With this result, and noting that \[g(g^{-1}(y))=y\], the substituted integral becomes \[\int_{-\infty}^{\infty}g(g^{-1}(y))f_{X}(g^{-1}(y))\frac{1}{g^{\prime}(g^{-1}(y))}\,dy=\int_{-\infty}^{\infty}yf_{Y}(y)\,dy=E(Y)=E(g(X))\]. Therefore, \[E(g(X))=\int_{-\infty}^{\infty}g(x)f_{X}(x)\,dx\].
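A Monte Carlo sanity check of the continuous case (a sketch; \[X\sim \mathrm{Exp}(1)\] and \[g(x)=\sqrt{x}\] are my choices): averaging \[g\] over samples of \[X\] should approach \[\int g(x)f_{X}(x)\,dx\], which for this pair is \[E(\sqrt{X})=\Gamma(3/2)=\sqrt{\pi}/2\approx 0.8862\].

```python
import math
import random

random.seed(0)  # deterministic run

# Sample X ~ Exp(1) and average g(X) = sqrt(X).
n = 200_000
estimate = sum(math.sqrt(random.expovariate(1.0)) for _ in range(n)) / n

exact = math.sqrt(math.pi) / 2  # E(sqrt(X)) = Gamma(3/2) for Exp(1)
assert abs(estimate - exact) < 0.01
```

The sample mean converges to the LOTUS integral without ever using the density of \[\sqrt{X}\].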
