Homework 02

Instructions

  1. For this homework, you may choose to work with another classmate enrolled in the course. If you do, choose only one member of the pair to submit on SPOC. Otherwise, submit your solutions in the usual way.
  2. Submit your solutions in PDF format to our SPOC website. The file name should be of the form IDNumber1_IDNumber2_HW02.pdf, where IDNumber1 and IDNumber2 are the corresponding university ID numbers of the pair who worked on this homework. If you worked alone, then use the file name IDNumber_HW02.pdf. The file size should be less than 20 MB.
  3. The deadline is March 26, 2023 at 1700 Beijing time. This deadline is also indicated in the submission form on SPOC. Late submissions will not be accepted, regardless of the validity of the reason. You may still choose to have your solutions checked, but a late submission gets zero credit.
  4. Make sure to click on the Submit button after you have uploaded your assignment, so that it does not stay in Draft mode.
  5. You are allowed to discuss the exercises with other classmates, but you have to write up your own solutions. This means you are not allowed to directly copy or even look at solutions from any other source.
  6. Some of the solutions may be chosen for presentation in class. If you or your partner is unable to explain your answers in class, then both of you will get zero credit for this homework.
  7. If you do not provide the table in Exercise A, you will automatically get zero credit for this homework. If you do provide the table, you get a 1-point bonus right away.

Exercise A: ID numbers and contributions

Create a table which lists every classmate involved in the development of your solutions. Only put down ID numbers. Indicate the contributions made by each member of the pair, as well as what you have discussed with other classmates, along with their ID numbers. If you work alone, then there is no need to be explicit about the contributions, but if you discussed with other classmates, make sure to indicate what you discussed with them.

The table has the following form:

ID number | Write YES if ID number belongs to submitter | Contribution

Here is an example; it is not complete, but it conveys the general idea.

ID number | Write YES if ID number belongs to submitter | Contribution
ID number 1 | YES | Worked on R code.
ID number 2 | YES | Wrote up the solutions.
ID number 3 | | Discussed Exercise B Item 3.

If you are working alone, your table might look like:

ID number | Write YES if ID number belongs to submitter | Contribution
ID number 1 | YES |
ID number 2 | | Discussed Exercise B Item 3.
ID number 3 | | Discussed Exercise A Item 4.

Exercise B: Chips Ahoy, Part 2

You have personally observed that the number of chocolate chips on a Chips Ahoy cookie is not the same across all cookies. In other words, there is variability in the number of chocolate chips even though these cookies are mass-produced and subject to quality control standards. You will now go through an argument meant to motivate the choice of a parametric statistical model to describe the variability of \(X\), the number of chocolate chips on a cookie.

Chips Ahoy, Part 1 gave you a chance to work through what could be a suitable model for the number of chocolate chips per cookie. You reached the conclusion that \(X\sim \mathrm{Poi}\left(\lambda\right)\), where \(\lambda=m/n\). Both \(m\) and \(n\) are unobservable from the point of view of the consumer. In this exercise, you will construct a rough confidence interval for \(\lambda\) from the point of view of the consumer. Let \(Y_1,\ldots, Y_J\) be IID draws from \(\mathrm{Poi}\left(\lambda\right)\), where \(J\) is the sample size¹.
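Before starting on Item 1 below, it may help to recall the key tool, Chebyshev's inequality: for any random variable \(W\) with finite mean and variance, and any \(c>0\), \[\mathbb{P}\left(\left|W-\mathbb{E}\left(W\right)\right|\geq c\sqrt{\mathsf{Var}\left(W\right)}\right)\leq \frac{1}{c^2}.\] Combine this with the mean and variance of \(\overline{Y}\) under IID \(\mathrm{Poi}\left(\lambda\right)\) sampling.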

  1. (3 points) Write down a proof, along with your reasoning, of the following statement: \[\mathbb{P}\left(\left|\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \right) \geq 1-\frac{1}{c^2}\]
  2. (1 point) Note that \[\left|\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \Leftrightarrow \left(\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2.\] Do some algebra and show that \[\left(\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2 \Leftrightarrow \lambda^2+\lambda\left(-2\overline{Y}-\frac{c^2}{J}\right)+\left(\overline{Y}\right)^2\leq 0\]
  3. (4 points) Consider a modified version of the definition of a confidence interval found here. In particular, let \(\theta\) be an unknown parameter of interest or estimand and \(c>0\). Let \(L\) and \(U\) be statistics. \((L, U)\) is a finite-sample conservative \(100\left(1-1/c^2\right)\%\) confidence interval for \(\theta\) if \[\mathbb{P}\left(L\leq \theta \leq U\right) \geq 1-\frac{1}{c^2}\] for every \(\theta\). Use the result in Item 2 to construct a finite-sample conservative \(100\left(1-1/c^2\right)\%\) confidence interval for \(\lambda\). You should give me expressions for \(L\) and \(U\), which will achieve the lower bound \(1-1/c^2\).
  4. (1 point) Recall we constructed a confidence interval for \(\mu\) in the IID normal case. We used an approach where we pretended to know \(\sigma^2\). Do we have to pretend to know \(\sigma^2\) for the Poisson case? Explain.
  5. (2 points) You will now apply the interval found in Item 3 to obtain a finite-sample conservative 95% confidence interval for \(\lambda\) using our cookies dataset (choose \(c\) so that \(1-1/c^2=0.95\)). You may have to use read.csv(). Report the R code you have used and your finding.
  6. (4 points) You will be using a Monte Carlo simulation to evaluate the performance of the finite-sample conservative \(100\left(1-1/c^2\right)\%\) confidence interval for \(\lambda\). Consider the R code we used to demonstrate what a confidence interval means in the IID normal case here. I created a modified version below. You have to fill in the missing parts accordingly for the case of a finite-sample conservative \(100\left(1-1/c^2\right)\%\) confidence interval for \(\lambda\). Report the code you have used and your finding for mean(results[1,] < lambda & lambda < results[2,]). Discuss whether your finding matches the theoretical guarantee developed in Item 3.
# Set a value for lambda to generate artificial datasets following a Poisson distribution
lambda <- 
# Set c for the desired 1-1/c^2  
c <- 
# Create a function, depending on the sample size J, that returns one interval
cons.ci <- function(J)
{
  y <-   # generate J IID Poisson draws here
  c(, )  # return the endpoints (L, U) found in Item 3
}
# Construct the interval 10^4 times; you can change the sample size 5 into something else
results <- replicate(10^4, cons.ci(5))
# Calculate the proportion of intervals that contain the true lambda
mean(results[1,] < lambda & lambda < results[2,])
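For reference, here is a minimal sketch of the kind of coverage simulation we ran for the IID normal case. It is not the exact class code, but it has the same structure as the skeleton above, assuming the usual z-interval with a known variance; the values of mu, sigma, and the sample size 5 are arbitrary choices for illustration.

# Sketch: coverage of the z-interval for the mean when sigma is known
mu <- 1            # true mean used to generate artificial datasets
sigma <- 2         # known standard deviation
z <- qnorm(0.975)  # critical value for a 95% interval
norm.ci <- function(n)
{
  y <- rnorm(n, mu, sigma)
  c(mean(y) - z*sigma/sqrt(n), mean(y) + z*sigma/sqrt(n))
}
# Construct the interval 10^4 times for samples of size 5
results <- replicate(10^4, norm.ci(5))
# Proportion of intervals containing the true mu; should be close to 0.95
mean(results[1,] < mu & mu < results[2,])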

Exercise C: The IID Uniform case

This exercise should get you started on the examples and exercises in the book. It is a version of LM Examples 5.4.2, 5.4.6, and 5.7.1, and LM Exercises 5.4.18, 5.7.4, and 5.7.5. You may also need integration by parts at some point in the mathematical derivations.

Let \(Y_1,\ldots,Y_n\) be IID \(\mathsf{U}\left(0,\theta\right)\), where \(\theta>0\). The common pdf of these random variables is given by \[f\left(y\right)=\begin{cases}\dfrac{1}{\theta} & \ \ \ 0\leq y\leq \theta\\ 0 & \ \ \ \mathrm{otherwise} \end{cases},\] the common mean is \(\theta/2\), and the common variance is \(\theta^2/12\). The estimand for this exercise is \(\theta\).
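As a warm-up for the calculations below, note that the stated mean and variance follow directly from this pdf: \[\mathbb{E}\left(Y\right)=\int_0^\theta \frac{y}{\theta}\,dy=\frac{\theta}{2}, \qquad \mathbb{E}\left(Y^2\right)=\int_0^\theta \frac{y^2}{\theta}\,dy=\frac{\theta^2}{3}, \qquad \mathsf{Var}\left(Y\right)=\frac{\theta^2}{3}-\left(\frac{\theta}{2}\right)^2=\frac{\theta^2}{12}.\]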

You will be considering three estimators for \(\theta\): \[\widehat{\theta}_1=\left(n+1\right)Y_{\mathsf{min}}, \ \ \ \widehat{\theta}_2=2\overline{Y}, \ \ \ \widehat{\theta}_3=\dfrac{n+1}{n}Y_{\mathsf{max}}.\] You will learn later that \(\widehat{\theta}_1\), which depends on the smallest order statistic², \(Y_{\mathsf{min}}\), is a bias-adjusted estimator that applies the plug-in principle; that \(\widehat{\theta}_2=2\overline{Y}\) is a method of moments estimator; and that \(\widehat{\theta}_3\), which depends on the largest order statistic \(Y_{\mathsf{max}}\), is a bias-adjusted maximum likelihood estimator.
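As a reminder (this is the material in LM Section 3.10 mentioned in the footnote), if \(Y_1,\ldots,Y_n\) are IID with cdf \(F\) and pdf \(f\), then the smallest and largest order statistics have pdfs \[f_{Y_{\mathsf{min}}}\left(y\right)=n\left[1-F\left(y\right)\right]^{n-1}f\left(y\right), \qquad f_{Y_{\mathsf{max}}}\left(y\right)=n\left[F\left(y\right)\right]^{n-1}f\left(y\right).\] For the \(\mathsf{U}\left(0,\theta\right)\) case, \(F\left(y\right)=y/\theta\) for \(0\leq y\leq\theta\). You will need these pdfs for Items 1 and 2.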

  1. (6 points) Find \(\mathbb{E}\left(\widehat{\theta}_1\right)\), \(\mathbb{E}\left(\widehat{\theta}_2\right)\), and \(\mathbb{E}\left(\widehat{\theta}_3\right)\). One of the calculations is much easier than the others. Can you explain why?
  2. (6 points) Find \(\mathsf{Var}\left(\widehat{\theta}_1\right)\), \(\mathsf{Var}\left(\widehat{\theta}_2\right)\), and \(\mathsf{Var}\left(\widehat{\theta}_3\right)\). Which estimator has a variance that goes to zero more rapidly?
  3. (3 points) Read the portion of the notes related to squared-error consistency here. There you will find the definitions of squared-error consistency, bias, asymptotic unbiasedness, and MSE. Using what you already have in Item 1, what would be the bias of \(Y_{\mathsf{min}}\), \(2\overline{Y}\), and \(Y_{\mathsf{max}}\)? Are these estimators unbiased for \(\theta\)? Asymptotically unbiased for \(\theta\)? Why do you think \(\widehat{\theta}_1\) and \(\widehat{\theta}_3\) are labeled bias-adjusted?
  4. (3 points) Determine whether or not \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) are squared-error consistent for \(\theta\). Show your work and reasoning.

Below you see sample code to demonstrate the apply() function. To learn more about it, type ?apply. The idea is to “apply” a function (either built-in or user-defined) to either the rows or columns of an array. This is a convenient way to repeatedly do something to the columns or rows of an array without writing an explicit for loop. For example, you have already encountered colMeans(), which may be formulated in terms of apply().

n <- 5
mu <- 1
sigma.sq <- 4
nsim <- 8 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, rnorm(n, mu, sqrt(sigma.sq)))
ymat
           [,1]      [,2]     [,3]     [,4]       [,5]       [,6]      [,7]
[1,]  1.9671349  1.706468 3.443282 1.933991  0.2663599 -2.8786714 -3.706098
[2,] -0.5778011  4.405135 2.144623 2.051587  1.2024463  3.3212694  4.326703
[3,]  1.4653632  3.060005 1.080083 1.842811  0.4887059 -3.7729695  2.451657
[4,] -1.1812593  3.245220 2.422256 1.579526  1.0272511  6.1636389  3.018296
[5,] -1.0333700 -1.127120 3.530654 1.464714 -1.0018255 -0.1561552  1.564877
          [,8]
[1,]  2.346638
[2,]  2.418972
[3,]  4.376029
[4,]  2.253096
[5,] -1.047758
apply(ymat, 2, mean)
[1] 0.1280135 2.2579415 2.5241799 1.7745257 0.3965875 0.5354224 1.5310869
[8] 2.0693954
apply(ymat, 1, mean)
[1] 0.6348879 2.4116169 1.3739606 2.3160031 0.2742520
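As a check of the claim about colMeans(), the following line reproduces the column-wise call above; colMeans() is simply a faster built-in special case.

colMeans(ymat)  # same vector as apply(ymat, 2, mean)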
  5. (6 points) You are going to generate \(10^4\) artificial datasets from \(\mathsf{U}\left(0,\theta\right)\), once with \(n=5\) and once with \(n=500\); you can choose any valid value of \(\theta\). You may have to use the apply(), min(), max(), and runif() commands for this item (a short apply() illustration is also given after this list). Your tasks are (a) to apply the estimators \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) to the artificial datasets, (b) to produce histograms of the sampling distributions of \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\), and (c) to report your own R code by modifying the skeleton below. What do you notice about the sampling distributions of each estimator?
theta <-   # choose any valid value of theta > 0
nsim <- 10^4 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, )   # generate n draws from U(0, theta) here
# calculate estimates (one per column of ymat)
theta1hat <- 
theta2hat <- 
theta3hat <- 
# put a 1x3 canvas for the case of n=5
par(mfrow=c(1,3))
# (your three hist() calls for n=5 go here)

# put a 1x3 canvas for the case of n=500
par(mfrow=c(1,3))
# (your three hist() calls for n=500 go here)
  6. (1 point) Report your findings for c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) and c(var(theta1hat), var(theta2hat), var(theta3hat)) for both \(n=5\) and \(n=500\). Compare with the theoretical results obtained in Items 1 and 2.
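In case the role of apply() in Item 5 is unclear, here is a small self-contained illustration of combining it with runif(), min(), and max(); the values 4, 3, and 2 below are arbitrary and not the ones you should use.

# Small illustration: column-wise minima and maxima of uniform draws
demo <- replicate(3, runif(4, min = 0, max = 2)) # 4x3 matrix: 3 datasets of 4 draws from U(0,2)
apply(demo, 2, min) # smallest order statistic of each artificial dataset
apply(demo, 2, max) # largest order statistic of each artificial dataset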

Footnotes

  1. I used \(J\) for the sample size instead of \(n\), which is what we usually use in class. The main reason is that I have already used up \(n\) elsewhere in the exercise.

  2. If you forgot this concept, consult LM Section 3.10.