Homework 02 Solution

Exercise A: ID numbers and contributions

1 bonus point earned for doing this exercise properly.

Exercise B: Chips Ahoy, Part 2

You have personally observed that the number of chocolate chips on a Chips Ahoy cookie is not the same across all cookies. In other words, there is variability in the number of chocolate chips even though these cookies are mass-produced and subject to quality control standards. You will now go through an argument meant to motivate the choice of a parametric statistical model to describe the variability of $X$, the number of chocolate chips on a cookie.

Chips Ahoy, Part 1 gave you a chance to work through what could be a suitable model for the number of chocolate chips per cookie. You have reached the conclusion that $X \sim \mathrm{Poi}(\lambda)$, where $\lambda = m/n$. Both $m$ and $n$ are unobservable from the point of view of the consumer. In this exercise, you will be constructing a rough confidence interval for $\lambda$ from the point of view of the consumer. Let $Y_1, \ldots, Y_J$ be IID draws from $\mathrm{Poi}(\lambda)$, where $J$ is the sample size.

  1. (3 points) Write down a proof, along with your reasoning, of the following statement: $$P\left(\left|\frac{\bar{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c\right) \geq 1 - \frac{1}{c^2}$$

ANSWER: Observe that $Y_1, \ldots, Y_J$ are IID draws from $\mathrm{Poi}(\lambda)$. Therefore, $E(\bar{Y}) = \lambda$ and $\mathrm{Var}(\bar{Y}) = \lambda/J$. By Chebyshev's inequality, we have for any $\varepsilon > 0$, $$P\left(\left|\bar{Y} - E(\bar{Y})\right| \leq \varepsilon\right) \geq 1 - \frac{\mathrm{Var}(\bar{Y})}{\varepsilon^2} \implies P\left(\left|\bar{Y} - \lambda\right| \leq \varepsilon\right) \geq 1 - \frac{\lambda}{J\varepsilon^2}.$$ If we choose $\varepsilon = c\sqrt{\lambda/J}$, we obtain $$P\left(\left|\frac{\bar{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c\right) \geq 1 - \frac{1}{c^2}.$$
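
As an optional sanity check (not required for the proof), a short simulation can confirm that the standardized sample mean stays within $c$ of zero at least $100(1 - 1/c^2)\%$ of the time; the values of lambda, J, and c below are arbitrary choices for illustration.

# Optional check of the Chebyshev-based bound for the Poisson sample mean
# (lambda, J, and c below are arbitrary illustrative choices)
lambda <- 14
J <- 30
c <- 2
ybars <- replicate(10^4, mean(rpois(J, lambda)))
# Proportion of standardized sample means within c; should be at least 1 - 1/c^2 = 0.75
mean(abs((ybars - lambda)/sqrt(lambda/J)) <= c)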

  2. (1 point) Note that $$\left|\frac{\bar{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \iff \left(\frac{\bar{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2.$$ Do some algebra and show that $$\left(\frac{\bar{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2 \iff \lambda^2 + \lambda\left(-2\bar{Y} - \frac{c^2}{J}\right) + \bar{Y}^2 \leq 0$$

I do not show the solution for this part, as this is straightforward algebra.
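
For readers who want to check their work, here is a minimal sketch of the algebra (multiplying both sides by $\lambda/J > 0$ preserves the direction of the inequality):

$$\left(\frac{\bar{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2 \iff (\bar{Y} - \lambda)^2 \leq \frac{c^2\lambda}{J} \iff \bar{Y}^2 - 2\lambda\bar{Y} + \lambda^2 - \frac{c^2}{J}\lambda \leq 0 \iff \lambda^2 + \lambda\left(-2\bar{Y} - \frac{c^2}{J}\right) + \bar{Y}^2 \leq 0.$$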

  3. (4 points) Consider a modified version of the definition of a confidence interval found here. In particular, let $\theta$ be an unknown parameter of interest or estimand and $c > 0$. Let $L$ and $U$ be statistics. $(L, U)$ is a finite-sample conservative $100(1 - 1/c^2)\%$ confidence interval for $\theta$ if $P(L \leq \theta \leq U) \geq 1 - \frac{1}{c^2}$ for every $\theta$. Use the result in Item 2 to construct a finite-sample conservative $100(1 - 1/c^2)\%$ confidence interval for $\lambda$. You should give me expressions for $L$ and $U$ which will achieve the lower bound $1 - 1/c^2$.

First, solve the quadratic equation $\lambda^2 + \lambda\left(-2\bar{Y} - \frac{c^2}{J}\right) + \bar{Y}^2 = 0$ for $\lambda$. Since $\left(2\bar{Y} + \frac{c^2}{J}\right)^2 - 4\bar{Y}^2 = \frac{4\bar{Y}c^2}{J} + \frac{c^4}{J^2}$, the quadratic formula gives $$L = \frac{2\bar{Y} + \frac{c^2}{J} - \sqrt{\frac{4\bar{Y}c^2}{J} + \frac{c^4}{J^2}}}{2}, \qquad U = \frac{2\bar{Y} + \frac{c^2}{J} + \sqrt{\frac{4\bar{Y}c^2}{J} + \frac{c^4}{J^2}}}{2}.$$ Because the quadratic in $\lambda$ opens upward, the solutions to the inequality $\lambda^2 + \lambda\left(-2\bar{Y} - \frac{c^2}{J}\right) + \bar{Y}^2 \leq 0$ are given by $L \leq \lambda \leq U$. Next, show that the desired guarantee is attained: $$P(L \leq \lambda \leq U) = P\left(\left(\frac{\bar{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2\right) = P\left(\left|\frac{\bar{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c\right) \geq 1 - \frac{1}{c^2}.$$

  4. (1 point) Recall that we constructed a confidence interval for $\mu$ in the IID normal case. We used an approach where we pretended to know $\sigma^2$. Do we have to pretend to know $\sigma^2$ for the Poisson case? Explain.

We cannot even pretend to know $\sigma^2$ for the Poisson case, because $\sigma^2 = \lambda$ and $\lambda$ is precisely the unknown parameter we are trying to estimate.

  5. (2 points) You will now apply the interval found in Item 3 to obtain a finite-sample conservative 95% confidence interval for $\lambda$ using our cookies dataset. You may have to use read.csv(). Report the R code you have used and your finding.

We must set $c^2 = 20$ because we want a finite-sample conservative 95% confidence interval for $\lambda$: solving $1 - 1/c^2 = 0.95$ gives $c^2 = 20$. Below, we produce the calculations which apply the interval obtained in Item 3.

# Load the cookie data, then compute the sample mean and sample size
cookie <- read.csv("data_cookie.csv")
ybar <- mean(cookie$numchoc)
J <- length(cookie$numchoc)
# c^2 = 20 gives a conservative 95% interval
c <- sqrt(20)
# Compute (L, U) from Item 3
c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
[1] 13.64160 16.71778
  6. (4 points) You will be using a Monte Carlo simulation to evaluate the performance of the finite-sample conservative $100(1 - 1/c^2)\%$ confidence interval for $\lambda$. Consider the R code we used to demonstrate what a confidence interval means in the IID normal case here. I created a modified version below. You have to fill in the missing parts accordingly for the case of a finite-sample conservative $100(1 - 1/c^2)\%$ confidence interval for $\lambda$. Report the code you have used and your finding for mean(results[1,] < lambda & lambda < results[2,]). Discuss whether your finding matches the theoretical guarantee developed in Item 3.

ANSWER: Below you will find the R code used. The finding matches the theoretical guarantee developed in Item 3: with $c = 2$, the lower bound is $1 - 1/c^2 = 3/4$, and the simulated coverage of about 0.955 exceeds it, so the bound is in fact quite conservative.

# Set a value for lambda to generate artificial datasets following a Poisson distribution
lambda <- 14
# Set c for the desired coverage level 1 - 1/c^2
c <- 2
# Create a function depending on sample size J
cons.ci <- function(J)
{
  y <- rpois(J, lambda)
  ybar <- mean(y)
  c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
}
# Construct the interval 10^4 times; you can change 10^4 into something else
results <- replicate(10^4, cons.ci(J))
# Calculate the proportion of intervals that contain lambda
mean(results[1,] < lambda & lambda < results[2,])
[1] 0.9553

NOTE: It is not necessary to match the simulation design to the Chips Ahoy example, but you may do so. For one thing, we really do not know the real value of λ.

R NOTE: The object J in cons.ci(J) points to the J available in memory which should match J <- length(cookie$numchoc). The J in the function definition for cons.ci is only a placeholder.
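
A tiny illustration of this scoping behavior, with made-up names (double_it is hypothetical and not part of the assignment):

# The argument in the call is evaluated in the calling environment,
# while the parameter name inside the function is only a local placeholder.
J <- 123                        # J available in memory
double_it <- function(J) 2 * J  # this J is just the formal argument
double_it(J)                    # passes in the global J, returns 246
double_it(5)                    # the placeholder J now holds 5, returns 10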

Exercise C: The IID Uniform case

This exercise should force you to start working on the examples and exercises in the book. This exercise is a version of LM Examples 5.4.2, 5.4.6, 5.7.1, and LM Exercises 5.4.18, 5.7.4, 5.7.5. You may also need to use integration by parts at some point for the mathematical derivations.

Let $Y_1, \ldots, Y_n$ be IID $U(0, \theta)$, where $\theta > 0$. The common pdf of these random variables is given by $$f(y) = \begin{cases} \dfrac{1}{\theta} & 0 \leq y \leq \theta \\ 0 & \text{otherwise,} \end{cases}$$ the common mean is $\theta/2$, and the common variance is $\theta^2/12$. The estimand for this exercise is $\theta$.
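
For completeness, the stated mean and variance follow from direct integration against this pdf:

$$E(Y) = \int_0^\theta \frac{y}{\theta}\,dy = \frac{\theta}{2}, \qquad E(Y^2) = \int_0^\theta \frac{y^2}{\theta}\,dy = \frac{\theta^2}{3}, \qquad \mathrm{Var}(Y) = \frac{\theta^2}{3} - \left(\frac{\theta}{2}\right)^2 = \frac{\theta^2}{12}.$$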

You will be considering three estimators for $\theta$: $$\hat{\theta}_1 = (n+1)Y_{\min}, \qquad \hat{\theta}_2 = 2\bar{Y}, \qquad \hat{\theta}_3 = \frac{n+1}{n}Y_{\max}.$$ You will learn later that $\hat{\theta}_1$, which depends on the smallest order statistic $Y_{\min}$, is a bias-adjusted estimator that applies the plug-in principle. You will also learn later that $\hat{\theta}_2 = 2\bar{Y}$ is a method of moments estimator. You will also learn later that $\hat{\theta}_3$, which depends on the largest order statistic $Y_{\max}$, is a bias-adjusted maximum likelihood estimator.

ANSWER: First, let me write down all the calculus-based results I am going to need. The cdf for $U(0, \theta)$ is given by $$F(y) = \int_{-\infty}^{y} f(s)\,ds = \begin{cases} 0 & y < 0 \\ \displaystyle\int_{0}^{y} \frac{1}{\theta}\,ds = \frac{y}{\theta} & 0 \leq y \leq \theta \\ 1 & y > \theta. \end{cases}$$ Next, by Theorem 3.10.1, the pdfs of $Y_{\max}$ and $Y_{\min}$ are given by $$f_{Y_{\max}}(y) = \frac{n y^{n-1}}{\theta^n}, \qquad f_{Y_{\min}}(y) = \frac{n}{\theta}\left(1 - \frac{y}{\theta}\right)^{n-1}, \qquad 0 \leq y \leq \theta.$$ Finally, I collect the following integration results below to tidy up the calculations later.

  • For positive integers $k$, we have $$\int \left(1 - \frac{s}{\theta}\right)^k ds = -\frac{\theta}{k+1}\left(1 - \frac{s}{\theta}\right)^{k+1}.$$ This result is obtained using integration by substitution.
  • For positive integers $m$ and $k$, we have $$\int s^m \left(1 - \frac{s}{\theta}\right)^k ds = -\frac{\theta s^m}{k+1}\left(1 - \frac{s}{\theta}\right)^{k+1} + \frac{m\theta}{k+1}\int s^{m-1}\left(1 - \frac{s}{\theta}\right)^{k+1} ds,$$ which follows from integration by parts. So, if $m = 1$ and $k = n - 1$, $$\int_0^\theta s\left(1 - \frac{s}{\theta}\right)^{n-1} ds = \frac{\theta^2}{n(n+1)}.$$ If $m = 2$ and $k = n - 1$, $$\int_0^\theta s^2\left(1 - \frac{s}{\theta}\right)^{n-1} ds = \frac{2\theta^3}{n(n+1)(n+2)}.$$ (A numerical check of these two definite integrals follows this list.)
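
As an optional numerical sanity check of the two definite integrals (the values of theta and n below are arbitrary illustrative choices):

# Check the two definite integrals numerically for arbitrary theta and n
theta <- 2
n <- 4
# m = 1, k = n - 1: integrate() should agree with theta^2 / (n * (n + 1))
integrate(function(s) s * (1 - s/theta)^(n - 1), 0, theta)$value
theta^2 / (n * (n + 1))
# m = 2, k = n - 1: integrate() should agree with 2 * theta^3 / (n * (n + 1) * (n + 2))
integrate(function(s) s^2 * (1 - s/theta)^(n - 1), 0, theta)$value
2 * theta^3 / (n * (n + 1) * (n + 2))
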
  1. (6 points) Find $E(\hat{\theta}_1)$, $E(\hat{\theta}_2)$, and $E(\hat{\theta}_3)$. One of the calculations is much easier. Can you explain why?

ANSWER:

We need to derive the pdf of $Y_{\min}$ and to directly calculate some integrals. These results were collected earlier. So, we have

$$E(\hat{\theta}_1) = (n+1)E(Y_{\min}) = (n+1)\,\frac{n}{\theta}\int_0^\theta s\left(1 - \frac{s}{\theta}\right)^{n-1} ds = (n+1)\,\frac{n}{\theta}\cdot\frac{\theta^2}{n(n+1)} = \theta.$$ Because $Y_1, \ldots, Y_n$ are IID $U(0, \theta)$, $E(\bar{Y}) = \theta/2$. Therefore,

$$E(\hat{\theta}_2) = 2E(\bar{Y}) = 2\cdot\frac{\theta}{2} = \theta.$$ We also need to derive the pdf of $Y_{\max}$ and to directly calculate some integrals. So, we have $$E(\hat{\theta}_3) = \frac{n+1}{n}E(Y_{\max}) = \frac{n+1}{n}\cdot\frac{n}{\theta^n}\int_0^\theta s^n\,ds = \frac{n+1}{\theta^n}\cdot\frac{\theta^{n+1}}{n+1} = \theta.$$

The easiest to work out was $E(\hat{\theta}_2)$ because you do not need the distribution of $\bar{Y}$: linearity of expectation gives $E(\bar{Y}) = \theta/2$ directly, which substantially simplifies the calculation compared to the other two estimators.

NOTE: $Y_{\min}$ took a lot more time to work out compared to $Y_{\max}$, as the latter does not require integration by parts.

  2. (6 points) Find $\mathrm{Var}(\hat{\theta}_1)$, $\mathrm{Var}(\hat{\theta}_2)$, and $\mathrm{Var}(\hat{\theta}_3)$. Which estimator has a variance that goes to zero more rapidly?

ANSWER: Using the previously collected results, we have

$$E(\hat{\theta}_1^2) = (n+1)^2 E(Y_{\min}^2) = (n+1)^2\,\frac{n}{\theta}\cdot\frac{2\theta^3}{n(n+1)(n+2)} = \frac{2(n+1)\theta^2}{n+2}.$$ As a result, $$\mathrm{Var}(\hat{\theta}_1) = \frac{2(n+1)\theta^2}{n+2} - \theta^2 = \frac{n}{n+2}\theta^2.$$ Next, $Y_1, \ldots, Y_n$ are IID $U(0, \theta)$, so $\mathrm{Var}(\bar{Y}) = \theta^2/(12n)$. Therefore, $$\mathrm{Var}(\hat{\theta}_2) = 4\,\mathrm{Var}(\bar{Y}) = \frac{1}{3n}\theta^2.$$ Finally, $$E(\hat{\theta}_3^2) = \left(\frac{n+1}{n}\right)^2 E(Y_{\max}^2) = \left(\frac{n+1}{n}\right)^2\frac{n}{\theta^n}\int_0^\theta s^{n+1}\,ds = \frac{(n+1)^2}{n(n+2)}\theta^2.$$ Therefore, $$\mathrm{Var}(\hat{\theta}_3) = \frac{(n+1)^2}{n(n+2)}\theta^2 - \theta^2 = \frac{1}{n(n+2)}\theta^2.$$

$\hat{\theta}_3$ has the variance that goes to zero most rapidly: $\mathrm{Var}(\hat{\theta}_3)$ is of order $1/n^2$, $\mathrm{Var}(\hat{\theta}_2)$ is of order $1/n$, and $\mathrm{Var}(\hat{\theta}_1)$ does not go to zero at all.

  3. (3 points) Read the portion of the notes related to squared-error consistency here. There you will find the definitions of squared-error consistency, bias, asymptotically unbiased, and MSE. Using what you already have in Item 1, what would be the bias of $Y_{\min}$, $2\bar{Y}$, and $Y_{\max}$? Are these estimators unbiased for $\theta$? Asymptotically unbiased for $\theta$? Why do you think $\hat{\theta}_1$ and $\hat{\theta}_3$ are labeled bias-adjusted?

ANSWER: The bias of $Y_{\min}$ for $\theta$ is $E(Y_{\min}) - \theta = \dfrac{1}{n+1}\theta - \theta = -\dfrac{n}{n+1}\theta$. The bias of $2\bar{Y}$ for $\theta$ is 0. The bias of $Y_{\max}$ for $\theta$ is $E(Y_{\max}) - \theta = \dfrac{n}{n+1}\theta - \theta = -\dfrac{1}{n+1}\theta$. Therefore, only $2\bar{Y}$ is unbiased for $\theta$.

The limits of the biases of $2\bar{Y}$ and $Y_{\max}$ as $n \to \infty$ are both equal to zero. Therefore, these two estimators are asymptotically unbiased for $\theta$.

Since $E(Y_{\max}) = \dfrac{n}{n+1}\theta$, we must have $E\!\left(\dfrac{n+1}{n}Y_{\max}\right) = \theta$. Observe that $\dfrac{n+1}{n}Y_{\max}$ is exactly $\hat{\theta}_3$.

In a similar fashion, since $E(Y_{\min}) = \dfrac{1}{n+1}\theta$, we must have $E\!\left((n+1)Y_{\min}\right) = \theta$. Observe that $(n+1)Y_{\min}$ is exactly $\hat{\theta}_1$. Therefore, in both cases, we can justify the label bias-adjusted estimators.

  4. (3 points) Determine whether or not $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$ are squared-error consistent for $\theta$. Show your work and reasoning.

Since $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$ are all unbiased for $\theta$, they are also asymptotically unbiased for $\theta$.

From Item 2, $\lim_{n\to\infty} \mathrm{Var}(\hat{\theta}_1) = \lim_{n\to\infty} \dfrac{n}{n+2}\theta^2 = \theta^2 \neq 0$. Therefore, $\hat{\theta}_1$ is not squared-error consistent for $\theta$.

From Item 2, observe that both $\hat{\theta}_2$ and $\hat{\theta}_3$ have variances that go to zero as $n \to \infty$. Because $\hat{\theta}_2$ and $\hat{\theta}_3$ are asymptotically unbiased and have variances that go to zero asymptotically, both of these estimators are squared-error consistent for $\theta$.

Below you see sample code to demonstrate the apply() function. To learn more about it, type ?apply. The idea is to "apply" a function (either built-in or user-defined) to either the rows or the columns of an array. This is a convenient way to repeatedly do something to the columns or rows of an array without explicitly writing a for loop. For example, you have encountered colMeans(), which may be formulated in terms of apply(); a quick check appears after the output below.

n <- 5
mu <- 1
sigma.sq <- 4
nsim <- 8 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, rnorm(n, mu, sqrt(sigma.sq)))
ymat
           [,1]        [,2]      [,3]     [,4]       [,5]      [,6]      [,7]
[1,]  1.3904051 -0.03111589  2.369097 2.213403 2.22615523 -2.934541 2.3109174
[2,]  0.3284039  2.29255777  1.778515 3.661879 1.00847223 -2.319194 3.6014871
[3,] -2.1726322  0.03044256 -3.045596 1.200483 3.73690601 -1.535897 3.0837312
[4,]  0.6038961  0.26164353  5.022951 2.845416 0.01033531  2.566453 0.4153617
[5,]  0.7250735  1.18166540  2.054126 2.044209 2.23500607  3.795071 1.8485274
           [,8]
[1,]  3.9297191
[2,]  2.2290166
[3,]  1.7287033
[4,] -0.6197381
[5,] -1.8608037
apply(ymat, 2, mean)
[1]  0.17502928  0.74703867  1.63581857  2.39307784  1.84337497 -0.08562156
[7]  2.25200495  1.08137944
apply(ymat, 1, mean)
[1] 1.4342550 1.5726422 0.3782676 1.3882898 1.5028593
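
As a quick check of the claim above that colMeans() may be formulated in terms of apply(), the following should return TRUE for the ymat just created:

# colMeans() computes the same column means as apply(ymat, 2, mean)
all.equal(colMeans(ymat), apply(ymat, 2, mean))
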
  5. (6 points) You are going to repeatedly generate $10^4$ artificial datasets from $U(0, \theta)$. Set $n = 5$ and $n = 500$, but you can choose any valid value of $\theta$. You may have to use the apply(), min(), max(), and runif() commands for this item. Your tasks are (a) to apply the estimators $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$ to the artificial datasets, (b) to produce histograms of the sampling distributions of $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$, and (c) to report your own R code by modifying the code below. What do you notice about the sampling distributions of each estimator?
theta <- 1
nsim <- 10^4 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, runif(5, min = 0, max = theta))
# calculate estimates
theta1hat <- 6*apply(ymat, 2, min)      # (n + 1) * Ymin with n = 5
theta2hat <- 2*colMeans(ymat)           # 2 * Ybar
theta3hat <- 6/5*apply(ymat, 2, max)    # ((n + 1)/n) * Ymax with n = 5
# put a 1x3 canvas for the case of n=5
par(mfrow=c(1,3))
hist(theta1hat, freq = FALSE)
hist(theta2hat, freq = FALSE)
hist(theta3hat, freq = FALSE)

c(mean(theta1hat), mean(theta2hat), mean(theta3hat))
[1] 0.9935800 0.9930326 0.9979351
c(var(theta1hat), var(theta2hat), var(theta3hat))
[1] 0.70735197 0.06603466 0.02905609
# n=500
ymat <- replicate(nsim, runif(500, min = 0, max = theta))
# calculate estimates
theta1hat <- 501*apply(ymat, 2, min)        # (n + 1) * Ymin with n = 500
theta2hat <- 2*colMeans(ymat)               # 2 * Ybar
theta3hat <- 501/500*apply(ymat, 2, max)    # ((n + 1)/n) * Ymax with n = 500
# put a 1x3 canvas for the case of n=500
par(mfrow=c(1,3))
hist(theta1hat, freq = FALSE)
hist(theta2hat, freq = FALSE)
hist(theta3hat, freq = FALSE)

c(mean(theta1hat), mean(theta2hat), mean(theta3hat))
[1] 1.0046896 1.0000454 0.9999998
c(var(theta1hat), var(theta2hat), var(theta3hat))
[1] 9.782793e-01 6.656409e-04 3.967086e-06
  6. (1 point) Report your findings for c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) and c(var(theta1hat), var(theta2hat), var(theta3hat)) when $n = 5$ and $n = 500$. Compare with the theoretical results obtained in Items 1 and 2.

Findings from the simulation can be found in Item 5, where we fixed $\theta = 1$ by design. We should see values close to 1 for c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) regardless of whether $n = 5$ or $n = 500$, since all three estimators are unbiased for $\theta$.

For c(var(theta1hat), var(theta2hat), var(theta3hat)), we should obtain values roughly around 0.714, 0.067, and 0.029, respectively, when $n = 5$. When $n = 500$, we should obtain values roughly around 1, $6.7 \times 10^{-4}$, and $4 \times 10^{-6}$.

The theoretical and simulation results match very well for the case where θ=1.

Footnotes

  1. I used J for sample size instead of n, which is what we usually use in class. The main reason is that I have already used up n for the exercise.

  2. If you forgot this concept, consult LM Section 3.10.