Example 5.4.2

Setting up the likelihood function

Haokun asks how one actually shows that \(Y_{\mathsf{max}}\) is the maximum likelihood estimator. Thanks to Haokun for asking this question.

The setup is slightly more complicated than what you see in Example 5.2.4, but you have to start from the same idea: the likelihood function is a function of \(\theta\) and the data.

Let \(\theta\) be fixed. Without loss of generality, suppose there is some \(y_1\) such that \(y_1 > \theta\) (this could have been any other \(y_i\)).1 Then the density at this \(y_1\) has to be zero, i.e. \(f\left(y_1;\theta\right)=0\). Since we are in the IID case, the likelihood function has to be a product of the form \(L\left(\theta\right)=L\left(\theta; y_1,\ldots,y_n\right)=\displaystyle\prod_{i=1}^n f\left(y_i;\theta\right)\). Because one of these densities is equal to zero, \(L\left(\theta\right)=0\). Therefore, \(L\left(\theta\right)=0\) whenever \(y_{\mathsf{max}}> \theta\): once there is even one \(y_i > \theta\), \(y_{\mathsf{max}}\) has to be greater than \(\theta\).

Now, suppose \(y_{\mathsf{max}} \leq \theta\). For every \(y_i\), we have \(\displaystyle f\left(y_i;\theta\right)=\frac{2y_i}{\theta^2}\). Again, since we are in the IID case, the likelihood function has to be a product of the form \[L\left(\theta\right)=L\left(\theta; y_1,\ldots,y_n\right)=\displaystyle\prod_{i=1}^n f\left(y_i;\theta\right)=\prod_{i=1}^n \frac{2y_i}{\theta^2}=\frac{2^n}{\theta^{2n}}\prod_{i=1}^n y_i\].

Putting all these together, \[L\left(\theta\right)=\begin{cases} \displaystyle\frac{2^n}{\theta^{2n}}\prod_{i=1}^n y_i & \mathrm{if}\ y_{\mathsf{max}} \leq \theta \\ 0 & \mathrm{if}\ y_{\mathsf{max}} > \theta \end{cases}\] Another way to write this is \[L\left(\theta\right)= \displaystyle\frac{2^n}{\theta^{2n}}\left(\prod_{i=1}^n y_i \right) I_{[0,\theta]}\left(y_{\mathsf{max}}\right).\]
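To make the piecewise form concrete, here is a minimal sketch in Python (my own, not part of the example; the data values are made up) that evaluates this likelihood function directly:

```python
import numpy as np

def likelihood(theta, y):
    """Likelihood L(theta) for an IID sample from f(y; theta) = 2y / theta^2 on [0, theta]."""
    y = np.asarray(y)
    if y.max() > theta:
        return 0.0  # some y_i lies outside [0, theta], so one density factor is zero
    n = len(y)
    return (2.0 ** n / theta ** (2 * n)) * np.prod(y)

y = [0.3, 0.7, 0.5]          # hypothetical observed data, y_max = 0.7
print(likelihood(0.6, y))    # 0.0, because y_max = 0.7 > 0.6
print(likelihood(0.7, y))    # positive, and the largest value over all theta
print(likelihood(0.9, y))    # positive but smaller, since L decreases for theta >= y_max
```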

To maximize this likelihood function, \(\widehat{\theta}_{\mathsf{MLE}}\) has to be the value of \(\theta\) which makes the likelihood function the largest, and that value is \(y_{\mathsf{max}}\). If \(\theta < y_{\mathsf{max}}\), the likelihood function at \(\theta\) is zero, while if \(\theta \geq y_{\mathsf{max}}\), the likelihood function is positive. This means that \(\widehat{\theta}_{\mathsf{MLE}}\) has to be greater than or equal to \(y_{\mathsf{max}}\). We now rule out the “greater than” part of the statement. Whenever \(\theta \geq y_{\mathsf{max}}\), the likelihood function \(\displaystyle\frac{2^n}{\theta^{2n}}\left(\prod_{i=1}^n y_i \right)\) is a strictly decreasing function of \(\theta\). Therefore, the highest value of the likelihood function is achieved at \(\widehat{\theta}_{\mathsf{MLE}} = y_{\mathsf{max}}\).

Below you will find a rough picture of the graph:
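In case the picture does not reproduce well here, a small matplotlib sketch (mine, reusing the made-up data from the snippet above) draws the same shape: zero to the left of \(y_{\mathsf{max}}\), a jump up at \(y_{\mathsf{max}}\), and strictly decreasing afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.array([0.3, 0.7, 0.5])  # hypothetical data, y_max = 0.7
y_max = y.max()
n = len(y)

thetas = np.linspace(0.01, 2.0, 500)
# L(theta) = (2^n / theta^(2n)) * prod(y_i) when theta >= y_max, and 0 otherwise.
L = np.where(thetas >= y_max,
             (2.0 ** n / thetas ** (2 * n)) * np.prod(y),
             0.0)

plt.plot(thetas, L)
plt.axvline(y_max, linestyle="--", label=r"$y_{\max}$")
plt.xlabel(r"$\theta$")
plt.ylabel(r"$L(\theta)$")
plt.legend()
plt.show()
```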

Capital \(Y\) or small \(y\)

Haokun asks about the difference between the cases where I write \(Y\) in capital letters and in small letters. Thanks to Haokun for asking this question.

The argument earlier, when we set up the likelihood function, was for the observed data \(\left(y_1,\ldots,y_n\right)\). But it applies equally when we treat \(\left(Y_1,\ldots,Y_n\right)\) as random variables. This reflects what we did in the first four weeks of the course, where we usually get to see an estimate \(\overline{y}\) from our one dataset while we study the performance of the procedure or estimator \(\overline{Y}\).

When we talk about the properties of the MLE, we are in a setting where the data \(\left(Y_1,\ldots,Y_n\right)\) are treated as random variables. This is what you see here, when we discussed why MLE works. You also see it here and in Theorem 5.5.1, where you take the expected value of the negative of the second derivative of the log-likelihood. The \(Y\)’s are in capital letters because the data are treated as random variables.
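To see the capital-letter view in action, here is a small simulation sketch (mine, not from the course materials) that treats \(\left(Y_1,\ldots,Y_n\right)\) as random draws from \(f\left(y;\theta\right)=2y/\theta^2\) and looks at how the estimator \(Y_{\mathsf{max}}\) behaves across repeated samples. It uses inverse-CDF sampling, since \(F\left(y;\theta\right)=y^2/\theta^2\) inverts to \(y=\theta\sqrt{u}\):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 100_000  # assumed true theta, sample size, and replications

# Inverse-CDF sampling: F(y; theta) = y^2 / theta^2, so Y = theta * sqrt(U) with U ~ Uniform(0, 1).
u = rng.uniform(size=(reps, n))
samples = theta * np.sqrt(u)

# Y_max computed on each of the `reps` simulated datasets: one draw of the estimator per dataset.
y_max = samples.max(axis=1)

print(y_max.mean())  # slightly below theta: Y_max is biased downward
print(y_max.std())   # spread of the estimator across repeated samples
```

As a quick check on the simulation, in this model \(E\left(Y_{\mathsf{max}}\right)=\frac{2n}{2n+1}\theta\), so with \(n=10\) and \(\theta=2\) the simulated mean should land near \(\frac{20}{21}\times 2 \approx 1.905\).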

Footnotes

  1. If you are wondering what happens when there is some \(y_1<0\), the likelihood function is also zero in that case, and we can rule it out in the same way I show in the Apr 13 course diary.↩︎