Homework 02 Solution

Exercise A: ID numbers and contributions

1 bonus point earned for doing this exercise properly.

Exercise B: Chips Ahoy, Part 2

You have personally observed that the number of chocolate chips on a Chips Ahoy cookie is not the same across all cookies. In other words, there is variability in the number of chocolate chips even though these cookies are mass-produced and have quality control standards. You will now go through an argument meant to motivate the choice of a parametric statistical model to describe the variability of \(X\).

Chips Ahoy, Part 1 gave you a chance to work through what could be a suitable model for the number of chocolate chips on a cookie. You have reached the conclusion that \(X\sim \mathrm{Poi}\left(\lambda\right)\), where \(\lambda=m/n\). Both \(m\) and \(n\) are unobservable from the point of view of the consumer. In this exercise, you will be constructing a rough confidence interval for \(\lambda\) from the point of view of the consumer. Let \(Y_1,\ldots, Y_J\) be IID draws from \(\mathrm{Poi}\left(\lambda\right)\), where \(J\) is the sample size¹.

  1. (3 points) Write down a proof, along with your reasoning, of the following statement: \[\mathbb{P}\left(\left|\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \right) \geq 1-\frac{1}{c^2}\]

ANSWER: Observe that \(Y_1,\ldots, Y_J\) are IID draws from \(\mathrm{Poi}\left(\lambda\right)\). Therefore, \(\mathbb{E}\left(\overline{Y}\right)=\lambda\) and \(\mathsf{Var}\left(\overline{Y}\right)=\lambda/J\). By Chebyshev’s inequality, we have for any \(\varepsilon > 0\), \[\mathbb{P}\left(\left|\overline{Y}-\mathbb{E}\left(\overline{Y}\right)\right| \leq \varepsilon \right)\ \geq 1-\frac{\mathsf{Var}\left(\overline{Y}\right)}{\varepsilon^2} \Rightarrow \mathbb{P}\left(\left|\overline{Y}-\lambda\right| \leq \varepsilon \right)\ \geq 1-\frac{\lambda}{J\varepsilon^2}. \] If we choose \(\varepsilon=c\sqrt{\lambda/J}\), we obtain \[\mathbb{P}\left(\left|\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \right) \geq 1-\frac{1}{c^2}.\]

  2. (1 point) Note that \[\left|\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \Leftrightarrow \left(\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2.\] Do some algebra and show that \[\left(\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2 \Leftrightarrow \lambda^2+\lambda\left(-2\overline{Y}-\frac{c^2}{J}\right)+\left(\overline{Y}\right)^2\leq 0\]

I do not show a detailed solution for this part, as it is straightforward algebra; a brief sketch is included below for completeness.
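One route the algebra can take (just a sketch, since the graded answer omits it) is to multiply through by \(\lambda/J>0\), expand the square, and collect terms in \(\lambda\): \[\left(\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2 \Leftrightarrow \left(\overline{Y}-\lambda\right)^2 \leq \frac{c^2\lambda}{J} \Leftrightarrow \lambda^2-2\lambda\overline{Y}+\left(\overline{Y}\right)^2-\frac{c^2}{J}\lambda \leq 0 \Leftrightarrow \lambda^2+\lambda\left(-2\overline{Y}-\frac{c^2}{J}\right)+\left(\overline{Y}\right)^2\leq 0.\]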

  3. (4 points) Consider a modified version of the definition of a confidence interval found here. In particular, let \(\theta\) be an unknown parameter of interest or estimand and \(c>0\). Let \(L\) and \(U\) be statistics. \((L, U)\) is a finite-sample conservative \(100\left(1-1/c^2\right)\%\) confidence interval for \(\theta\) if \[\mathbb{P}\left(L\leq \theta \leq U\right) \geq 1-\frac{1}{c^2}\] for every \(\theta\). Use the result in Item 2 to construct a finite-sample conservative \(100\left(1-1/c^2\right)\%\) confidence interval for \(\lambda\). You should give me expressions for \(L\) and \(U\), which will achieve the lower bound \(1-1/c^2\).

First, solve the quadratic equation \[\lambda^2+\lambda\left(-2\overline{Y}-\frac{c^2}{J}\right)+\left(\overline{Y}\right)^2= 0.\] The solutions are given by \[L=\frac{2\overline{Y}+\dfrac{c^2}{J}-\sqrt{\dfrac{4\overline{Y}c^2}{J}+\dfrac{c^4}{J^2}}}{2}, \ \ \ U=\frac{2\overline{Y}+\dfrac{c^2}{J}+\sqrt{\dfrac{4\overline{Y}c^2}{J}+\dfrac{c^4}{J^2}}}{2}.\] Therefore, the solutions to the inequality \[\lambda^2+\lambda\left(-2\overline{Y}-\frac{c^2}{J}\right)+\left(\overline{Y}\right)^2\leq 0\] are given by \(L\leq \lambda \leq U\). Next, show that the desired guarantee is attained: \[\mathbb{P}\left(L\leq \lambda \leq U\right)=\mathbb{P}\left(\left(\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2\right)=\mathbb{P}\left(\left|\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \right) \geq 1-\dfrac{1}{c^2}.\]
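As a side note, these endpoints can be packaged into a small R helper; the function name poisson_cheb_ci below is my own illustrative choice and not part of the assignment.

poisson_cheb_ci <- function(ybar, J, c)
{
  # endpoints L and U of the conservative interval derived above
  half <- sqrt(4*ybar*c^2/J + c^4/J^2)
  c(L = (2*ybar + c^2/J - half)/2, U = (2*ybar + c^2/J + half)/2)
}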

  4. (1 point) Recall we constructed a confidence interval for \(\mu\) in the IID normal case. We used an approach where we pretended to know \(\sigma^2\). Do we have to pretend to know \(\sigma^2\) for the Poisson case? Explain.

We cannot even pretend to know \(\sigma^2\) in the Poisson case: here \(\sigma^2=\lambda\), so pretending to know the variance would amount to pretending to know the estimand itself. Instead, the construction in Item 3 handles the unknown variance directly by solving the quadratic inequality in \(\lambda\).

  5. (2 points) You will now apply the interval found in Item 3 to obtain a finite-sample conservative 95% confidence interval for \(\lambda\) using our cookies dataset. You may have to use read.csv(). Report the R code you have used and your finding.

We set \(c^2=20\) because a finite-sample conservative 95% confidence interval for \(\lambda\) requires \(1-1/c^2=0.95\), which gives \(c^2=20\). Below, we produce the calculations which apply the interval obtained in Item 3.

cookie <- read.csv("data_cookie.csv")  # load the cookies dataset
ybar <- mean(cookie$numchoc)           # sample mean number of chips per cookie
J <- length(cookie$numchoc)            # sample size
c <- sqrt(20)                          # c^2 = 20 gives 1 - 1/c^2 = 0.95
c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
[1] 13.64160 16.71778
  6. (4 points) You will be using a Monte Carlo simulation to evaluate the performance of the finite-sample conservative \(100\left(1-1/c^2\right)\%\) confidence interval for \(\lambda\). Consider the R code we used to demonstrate what a confidence interval means in the IID normal case here. I created a modified version below. You have to fill in the missing parts accordingly for the case of a finite-sample conservative \(100\left(1-1/c^2\right)\%\) confidence interval for \(\lambda\). Report the code you have used and your finding for mean(results[1,] < lambda & lambda < results[2,]). Discuss whether your finding matches the theoretical guarantee developed in Item 3.

ANSWER: Below you will find the R code used. The simulated coverage from mean(results[1,] < lambda & lambda < results[2,]) is about 0.955, well above the guaranteed lower bound \(1-1/c^2=3/4\) when \(c=2\), so the finding is consistent with the theoretical guarantee developed in Item 3. In fact, the lower bound is quite conservative.

# Set a value for lambda to generate artificial datasets following a Poisson distribution
lambda <- 14
# Set c for the desired coverage level 1 - 1/c^2
c <- 2
# Create a function depending on sample size J
cons.ci <- function(J)
{
  y <- rpois(J, lambda)
  ybar <- mean(y)
  c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
}
# Construct the interval 10^4 times; you can change 10^4 into something else
results <- replicate(10^4, cons.ci(J))
# Calculate the proportion of intervals that contain lambda
mean(results[1,] < lambda & lambda < results[2,])
[1] 0.9553

NOTE: It is not necessary to match the simulation design to the Chips Ahoy example, but you may do so. For one thing, we do not actually know the true value of \(\lambda\).

R NOTE: The object J in the call cons.ci(J) refers to the J available in memory, which should match J <- length(cookie$numchoc). The J in the function definition of cons.ci is only a placeholder (a formal argument).
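As a quick illustration of this point (a sketch only, not part of the graded answer, assuming cookie and cons.ci are already in memory), the placeholder J is simply bound to whatever value appears in the call:

# the J in memory, as in Item 5
J <- length(cookie$numchoc)
# one interval from an artificial Poisson sample of size J
cons.ci(J)
# same function, but now the artificial sample has size 10
cons.ci(10)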

Exercise C: The IID Uniform case

This exercise should force you to start working on the examples and exercises in the book. This exercise is a version of LM Examples 5.4.2, 5.4.6, 5.7.1, and LM Exercises 5.4.18, 5.7.4, 5.7.5. You may also need to use integration by parts at some point for the mathematical derivations.

Let \(Y_1,\ldots,Y_n\) be IID \(\mathsf{U}\left(0,\theta\right)\), where \(\theta>0\). The common pdf of these random variables is given by \[f\left(y\right)=\begin{cases}\dfrac{1}{\theta} & \ \ \ 0\leq y\leq \theta\\ 0 & \ \ \ \mathrm{otherwise} \end{cases},\] the common mean is \(\theta/2\), and the common variance is \(\theta^2/12\). The estimand for this exercise is \(\theta\).

You will be considering three estimators for \(\theta\): \[\widehat{\theta}_1=\left(n+1\right)Y_{\mathsf{min}}, \ \ \ \widehat{\theta}_2=2\overline{Y}, \ \ \ \widehat{\theta}_3=\dfrac{n+1}{n}Y_{\mathsf{max}}.\] You will learn later that \(\widehat{\theta}_1\), which depends on the smallest order statistic², \(Y_{\mathsf{min}}\), is a bias-adjusted estimator that applies the plug-in principle. You will also learn later that \(\widehat{\theta}_2=2\overline{Y}\) is a method of moments estimator, and that \(\widehat{\theta}_3\), which depends on the largest order statistic \(Y_{\mathsf{max}}\), is a bias-adjusted maximum likelihood estimator.

ANSWER: First, let me write down all the calculus-based results I am going to need. The cdf for \(\mathsf{U}\left(0,\theta\right)\) is given by \[F\left(y\right)=\int_{-\infty}^y f\left(s\right)\,\,ds=\begin{cases} 0 & \mathrm{if}\ y<0\\ \displaystyle\int_{-\infty}^0 f\left(s\right)\,\,ds+\int_0^y f\left(s\right)\,\,ds = \int_0^y \frac{1}{\theta}\,\,ds=\frac{y}{\theta} & \mathrm{if}\ 0\leq y \leq \theta\\ \displaystyle\int_{-\infty}^\theta f\left(s\right)\,\,ds+\int_\theta^y f\left(s\right)\,\,ds = 1 & \mathrm{if}\ y > \theta \end{cases}\] Next, by Theorem 3.10.1, the pdfs of the largest and smallest order statistics are \(n\left[F\left(y\right)\right]^{n-1}f\left(y\right)\) and \(n\left[1-F\left(y\right)\right]^{n-1}f\left(y\right)\), respectively, so here the pdfs of \(Y_{\mathsf{max}}\) and \(Y_{\mathsf{min}}\) are given by \[f_{Y_{\mathsf{max}}}\left(y\right)=\frac{ny^{n-1}}{\theta^n}, \ \ f_{Y_{\mathsf{min}}}\left(y\right)=\dfrac{n}{\theta}\left(1-\frac{y}{\theta}\right)^{n-1}\] for \(0\leq y\leq \theta\). Finally, I collect the following integration results below to tidy up the calculations later.

  • For positive integers \(k\), we have \[\int \left(1-\frac{s}{\theta}\right)^k \,\, ds = -\frac{\theta}{k+1}\left(1-\frac{s}{\theta}\right)^{k+1}\] This result is obtained using integration by substitution.
  • For positive integers \(m\) and \(k\), we have \[\int s^m\left(1-\frac{s}{\theta}\right)^k \,\, ds = -\frac{\theta s^m}{k+1}\left(1-\frac{s}{\theta}\right)^{k+1}+\frac{m\theta}{k+1} \int s^{m-1}\left(1-\frac{s}{\theta}\right)^{k+1}\,\, ds\] So, if \(m=1\) and \(k=n-1\), \(\displaystyle\int_0^\theta s^m\left(1-\frac{s}{\theta}\right)^k \,\, ds = \frac{\theta^2}{n\left(n+1\right)}\). If \(m=2\) and \(k=n-1\), \[\int_0^\theta s^m\left(1-\frac{s}{\theta}\right)^k \,\, ds = \frac{2\theta^3}{n\left(n+1\right)\left(n+2\right)}.\]
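The two definite integrals above can also be spot-checked numerically; the sketch below uses integrate() with illustrative values \(n=7\) and \(\theta=2\), which are my own choices and not part of the exercise.

# Numerical spot-check of the two integration results, with n = 7 and theta = 2
n <- 7
theta <- 2
integrate(function(s) s*(1 - s/theta)^(n - 1), lower = 0, upper = theta)$value
theta^2/(n*(n + 1))                      # should agree with the value above
integrate(function(s) s^2*(1 - s/theta)^(n - 1), lower = 0, upper = theta)$value
2*theta^3/(n*(n + 1)*(n + 2))            # should agree with the value above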
  1. (6 points) Find \(\mathbb{E}\left(\widehat{\theta}_1\right)\), \(\mathbb{E}\left(\widehat{\theta}_2\right)\), and \(\mathbb{E}\left(\widehat{\theta}_3\right)\). One of the calculations is much easier. Can you explain why?

ANSWER:

We need to derive the pdf of \(Y_{\mathsf{min}}\) and to directly calculate some integrals. These results were collected earlier. So, we have

\[\mathbb{E}\left(\widehat{\theta}_1\right)=\left(n+1\right)\mathbb{E}\left(Y_{\mathsf{min}}\right)=\displaystyle\frac{\left(n+1\right)n}{\theta}\int_0^\theta s\left(1-\frac{s}{\theta}\right)^{n-1}\,\, ds= \frac{\left(n+1\right)n}{\theta} \cdot \frac{\theta^2}{n\left(n+1\right)} =\theta\] Because \(Y_1,\ldots, Y_n\) are IID \(\mathsf{U}\left(0,\theta\right)\), \(\mathbb{E}\left(\overline{Y}\right)=\theta/2\). Therefore,

\[\mathbb{E}\left(\widehat{\theta}_2\right)=2\mathbb{E}\left(\overline{Y}\right)=2\theta/2=\theta\] We also need to derive the pdf of \(Y_{\mathsf{max}}\) and to directly calculate some integrals. So, we have \[\displaystyle \mathbb{E}\left(\widehat{\theta}_3\right)=\frac{n+1}{n}\mathbb{E}\left(Y_{\mathsf{max}}\right)=\frac{n+1}{\theta^n}\int_0^\theta s^n\,\, ds = \frac{n+1}{\theta^n}\cdot\frac{\theta^{n+1}}{n+1}=\theta\]

The easiest to work out was \(\mathbb{E}\left(\widehat{\theta}_2\right)\) because you do not need the distribution of \(\overline{Y}\): linearity of expectation gives \(\mathbb{E}\left(\overline{Y}\right)=\theta/2\) directly, which substantially simplifies the calculation compared to the other two estimators.

NOTE: \(Y_{\mathsf{min}}\) took a lot more time to work out compared to \(Y_{\mathsf{max}}\), as the latter does not require integration by parts.

  2. (6 points) Find \(\mathsf{Var}\left(\widehat{\theta}_1\right)\), \(\mathsf{Var}\left(\widehat{\theta}_2\right)\), and \(\mathsf{Var}\left(\widehat{\theta}_3\right)\). Which estimator has a variance that goes to zero more rapidly?

ANSWER: Using the previously collected results, we have

\[\mathbb{E}\left(\widehat{\theta}_1^2\right)= \left(n+1\right)^2\mathbb{E}\left(Y_{\mathsf{min}}^2\right)=\frac{\left(n+1\right)^2 n}{\theta} \cdot \frac{2\theta^3}{n\left(n+1\right)\left(n+2\right)}=\frac{2\left(n+1\right)\theta^2}{n+2}.\] As a result, \[\mathsf{Var}\left(\widehat{\theta}_1\right)=\frac{2\left(n+1\right)\theta^2}{n+2}-\theta^2=\frac{n}{n+2}\theta^2.\] Next, \(Y_1,\ldots, Y_n\) are IID \(\mathsf{U}\left(0,\theta\right)\), so \(\mathsf{Var}\left(\overline{Y}\right)=\theta^2/\left(12n\right)\). Therefore, \[\mathsf{Var}\left(\widehat{\theta}_2\right)=4\mathsf{Var}\left(\overline{Y}\right)=\frac{1}{3n}\theta^2.\] Finally, \[\mathbb{E}\left(\widehat{\theta}_3^2\right)=\left(\frac{n+1}{n}\right)^2\mathbb{E}\left(Y_{\mathsf{max}}^2\right)=\left(\frac{n+1}{n}\right)^2 \cdot \frac{n}{\theta^n}\int_0^\theta s^{n+1}\,\, ds=\frac{\left(n+1\right)^2}{n\left(n+2\right)}\theta^2.\] Therefore, \[\mathsf{Var}\left(\widehat{\theta}_3\right)=\frac{\left(n+1\right)^2}{n\left(n+2\right)}\theta^2-\theta^2=\frac{1}{n\left(n+2\right)}\theta^2\]

\(\widehat{\theta}_3\) has a variance that goes to zero more rapidly: it is of order \(1/n^2\), while \(\mathsf{Var}\left(\widehat{\theta}_2\right)\) is of order \(1/n\) and \(\mathsf{Var}\left(\widehat{\theta}_1\right)\) does not go to zero at all.

  3. (3 points) Read the portion of the notes related to squared-error consistency here. There you will find the definition of squared-error consistency, bias, asymptotically unbiased, and MSE. Using what you already have in Item 1, what would be the bias of \(Y_{\mathsf{min}}\), \(2\overline{Y}\), and \(Y_{\mathsf{max}}\)? Are these estimators unbiased for \(\theta\)? Asymptotically unbiased for \(\theta\)? Why do you think \(\widehat{\theta}_1\) and \(\widehat{\theta}_3\) are labeled bias-adjusted?

ANSWER: The bias of \(Y_{\mathsf{min}}\) for \(\theta\) is \(\displaystyle\mathbb{E}\left(Y_{\mathsf{min}}\right)-\theta=-\frac{n}{n+1}\theta\). The bias of \(2\overline{Y}\) for \(\theta\) is 0. The bias of \(Y_{\mathsf{max}}\) for \(\theta\) is \(\displaystyle\mathbb{E}\left(Y_{\mathsf{max}}\right)-\theta=-\frac{1}{n+1}\theta\). Therefore, only \(2\overline{Y}\) is unbiased for \(\theta\).

The limits of the biases of \(2\overline{Y}\) and \(Y_{\mathsf{max}}\) as \(n\to\infty\) are both equal to zero, so these two estimators are asymptotically unbiased for \(\theta\). In contrast, the bias of \(Y_{\mathsf{min}}\) tends to \(-\theta\) as \(n\to\infty\), so \(Y_{\mathsf{min}}\) is neither unbiased nor asymptotically unbiased for \(\theta\).

Since \(\displaystyle\mathbb{E}\left(Y_{\mathsf{max}}\right)=\frac{n}{n+1}\theta\), we must have \(\displaystyle\mathbb{E}\left(\frac{n+1}{n}Y_{\mathsf{max}}\right)=\theta\). Observe that \(\displaystyle\frac{n+1}{n}Y_{\mathsf{max}}\) is exactly \(\widehat{\theta}_3\).

Similarly, since \(\displaystyle\mathbb{E}\left(Y_{\mathsf{min}}\right)=\frac{1}{n+1}\theta\), we must have \(\displaystyle\mathbb{E}\left(\left(n+1\right)Y_{\mathsf{min}}\right)=\theta\). Observe that \(\displaystyle\left(n+1\right)Y_{\mathsf{min}}\) is exactly \(\widehat{\theta}_1\). Therefore, in both cases, we can justify the label bias-adjusted estimators.

  4. (3 points) Determine whether or not \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) are squared-error consistent for \(\theta\). Show your work and reasoning.

Since \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) are all unbiased for \(\theta\), they are also asymptotically unbiased for \(\theta\).

From Item 2, \[\lim_{n\to\infty}\mathsf{Var}\left(\widehat{\theta}_1\right)=\lim_{n\to\infty}\frac{n}{n+2}\theta^2=\theta^2 \neq 0.\] Therefore, \(\widehat{\theta}_1\) is not squared-error consistent for \(\theta\).

From Item 2, observe that both \(\widehat{\theta}_2\) and \(\widehat{\theta}_3\) have variances that go to zero as \(n\to\infty\). Because \(\widehat{\theta}_2\) and \(\widehat{\theta}_3\) are asymptotically unbiased and have variances that go to zero asymptotically, both these estimators are squared-error consistent for \(\theta\).
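In MSE terms (a compact restatement of Items 1 and 2, using \(\mathsf{MSE}=\mathrm{bias}^2+\mathrm{variance}\)): \[\mathsf{MSE}\left(\widehat{\theta}_1\right)=\frac{n}{n+2}\theta^2 \to \theta^2, \ \ \ \mathsf{MSE}\left(\widehat{\theta}_2\right)=\frac{\theta^2}{3n} \to 0, \ \ \ \mathsf{MSE}\left(\widehat{\theta}_3\right)=\frac{\theta^2}{n\left(n+2\right)} \to 0\] as \(n\to\infty\), which matches the conclusions above.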

Below you see sample code to demonstrate the apply() function. To learn more about it, type ?apply. The idea is to “apply” a function (either built-in or user-defined) to either the rows or the columns of an array. This is a convenient way to repeatedly do something to the columns or rows of an array without explicitly writing a for loop. For example, you encountered colMeans(), which may be formulated in terms of apply(); see the short check after the demo below.

n <- 5
mu <- 1
sigma.sq <- 4
nsim <- 8 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, rnorm(n, mu, sqrt(sigma.sq)))
ymat
           [,1]        [,2]      [,3]     [,4]       [,5]      [,6]      [,7]
[1,]  1.3904051 -0.03111589  2.369097 2.213403 2.22615523 -2.934541 2.3109174
[2,]  0.3284039  2.29255777  1.778515 3.661879 1.00847223 -2.319194 3.6014871
[3,] -2.1726322  0.03044256 -3.045596 1.200483 3.73690601 -1.535897 3.0837312
[4,]  0.6038961  0.26164353  5.022951 2.845416 0.01033531  2.566453 0.4153617
[5,]  0.7250735  1.18166540  2.054126 2.044209 2.23500607  3.795071 1.8485274
           [,8]
[1,]  3.9297191
[2,]  2.2290166
[3,]  1.7287033
[4,] -0.6197381
[5,] -1.8608037
apply(ymat, 2, mean)
[1]  0.17502928  0.74703867  1.63581857  2.39307784  1.84337497 -0.08562156
[7]  2.25200495  1.08137944
apply(ymat, 1, mean)
[1] 1.4342550 1.5726422 0.3782676 1.3882898 1.5028593
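As a quick check of the point about colMeans() (assuming the ymat from the demo above is still in memory), the two approaches agree:

# colMeans() is equivalent to applying mean() over the columns
all.equal(colMeans(ymat), apply(ymat, 2, mean))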
  5. (6 points) You are going to repeatedly generate \(10^4\) artificial datasets from \(\mathsf{U}\left(0,\theta\right)\). Set \(n=5\) and \(n=500\), but you can choose any valid value of \(\theta\). You may have to use the apply(), min(), max(), and runif() commands for this item. Your tasks are (a) to apply the estimators \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) to the artificial datasets, (b) to produce histograms of the sampling distributions of \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\), and (c) to report your own R code by modifying the code below. What do you notice about the sampling distributions of each estimator?
theta <- 1
nsim <- 10^4 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, runif(5, min = 0, max = theta))
# calculate estimates: (n+1) = 6 and (n+1)/n = 6/5 when n = 5
theta1hat <- 6*apply(ymat, 2, min)
theta2hat <- 2*colMeans(ymat)
theta3hat <- 6/5*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=5
par(mfrow=c(1,3))
hist(theta1hat, freq = FALSE)
hist(theta2hat, freq = FALSE)
hist(theta3hat, freq = FALSE)

c(mean(theta1hat), mean(theta2hat), mean(theta3hat))
[1] 0.9935800 0.9930326 0.9979351
c(var(theta1hat), var(theta2hat), var(theta3hat))
[1] 0.70735197 0.06603466 0.02905609
# n=500
ymat <- replicate(nsim, runif(500, min = 0, max = theta))
# calculate estimates: (n+1) = 501 and (n+1)/n = 501/500 when n = 500
theta1hat <- 501*apply(ymat, 2, min)
theta2hat <- 2*colMeans(ymat)
theta3hat <- 501/500*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=500
par(mfrow=c(1,3))
hist(theta1hat, freq = FALSE)
hist(theta2hat, freq = FALSE)
hist(theta3hat, freq = FALSE)

c(mean(theta1hat), mean(theta2hat), mean(theta3hat))
[1] 1.0046896 1.0000454 0.9999998
c(var(theta1hat), var(theta2hat), var(theta3hat))
[1] 9.782793e-01 6.656409e-04 3.967086e-06
  6. (1 point) Report your findings for c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) and c(var(theta1hat), var(theta2hat), var(theta3hat)) when \(n=5\) and \(n=500\). Compare with the theoretical results obtained in Items 1 and 2.

The findings from the simulation can be found in Item 5, where we fixed \(\theta=1\) by design. We should see values close to 1 for c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) regardless of whether \(n=5\) or \(n=500\).

For c(var(theta1hat), var(theta2hat), var(theta3hat)), we should obtain values roughly around 0.714, 0.067, 0.029, respectively, when \(n=5\). When \(n=500\), we should obtain values roughly around 1, \(6.7\times 10^{-4}\), and \(4\times 10^{-6}\).
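These theoretical values come directly from the variance formulas in Item 2 with \(\theta=1\); for reference, they can be computed in R as follows.

# Theoretical variances n/(n+2), 1/(3n), and 1/(n(n+2)), times theta^2 with theta = 1
theta <- 1
n <- 5
c(n/(n + 2), 1/(3*n), 1/(n*(n + 2)))*theta^2
n <- 500
c(n/(n + 2), 1/(3*n), 1/(n*(n + 2)))*theta^2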

The theoretical and simulation results match very well for the case where \(\theta=1\).

Footnotes

  1. I used \(J\) for sample size instead of \(n\), which is what we usually use in class. The main reason is that I have already used up \(n\) for the exercise.↩︎

  2. If you forgot this concept, consult LM Section 3.10.↩︎