You have personally observed that the number of chocolate chips on a Chips Ahoy cookie is not the same across all cookies. In other words, there is variability in the number of chocolate chips even though these cookies are mass-produced and have quality control standards. You will now go through an argument meant to motivate the choice of a parametric statistical model to describe the variability of \(X\).
Chips Ahoy, Part 1 gave you a chance to work through what could be a suitable model for the number of chocolate chips on a cookie. You have reached the conclusion that \(X\sim \mathrm{Poi}\left(\lambda\right)\), where \(\lambda=m/n\). Both \(m\) and \(n\) are unobservable from the point of view of the consumer. In this exercise, you will be constructing a rough confidence interval for \(\lambda\) from the point of view of the consumer. Let \(Y_1,\ldots, Y_J\) be IID draws from \(\mathrm{Poi}\left(\lambda\right)\), where \(J\) is the sample size. (1 bonus point earned for doing this exercise properly.)
ANSWER: Observe that \(Y_1,\ldots, Y_J\) are IID draws from \(\mathrm{Poi}\left(\lambda\right)\). Therefore, \(\mathbb{E}\left(\overline{Y}\right)=\lambda\) and \(\mathsf{Var}\left(\overline{Y}\right)=\lambda/J\). By Chebyshev’s inequality, we have for any \(\varepsilon > 0\), \[\mathbb{P}\left(\left|\overline{Y}-\mathbb{E}\left(\overline{Y}\right)\right| \leq \varepsilon \right) \geq 1-\frac{\mathsf{Var}\left(\overline{Y}\right)}{\varepsilon^2} \Rightarrow \mathbb{P}\left(\left|\overline{Y}-\lambda\right| \leq \varepsilon \right) \geq 1-\frac{\lambda}{J\varepsilon^2}. \] If we choose \(\varepsilon=c\sqrt{\lambda/J}\), we obtain \[\mathbb{P}\left(\left|\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \right) \geq 1-\frac{1}{c^2}.\]
I do not show the solution for this part, as this is straightforward algebra.
First, solve the quadratic equation \[\lambda^2+\lambda\left(-2\overline{Y}-\frac{c^2}{J}\right)+\left(\overline{Y}\right)^2= 0.\] The solutions are given by \[L=\frac{2\overline{Y}+\dfrac{c^2}{J}-\sqrt{\dfrac{4\overline{Y}c^2}{J}+\dfrac{c^4}{J^2}}}{2}, \ \ \ U=\frac{2\overline{Y}+\dfrac{c^2}{J}+\sqrt{\dfrac{4\overline{Y}c^2}{J}+\dfrac{c^4}{J^2}}}{2}.\] Therefore, the solutions to the inequality \[\lambda^2+\lambda\left(-2\overline{Y}-\frac{c^2}{J}\right)+\left(\overline{Y}\right)^2\leq 0\] are given by \(L\leq \lambda \leq U\). Next, show that the desired guarantee is attained: \[\mathbb{P}\left(L\leq \lambda \leq U\right)=\mathbb{P}\left(\left(\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right)^2 \leq c^2\right)=\mathbb{P}\left(\left|\frac{\overline{Y}-\lambda}{\sqrt{\lambda/J}}\right| \leq c \right) \geq 1-\dfrac{1}{c^2}.\]
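To make the formula concrete, here is a minimal R sketch; the function name poisson.cheb.ci and the illustrative values ybar = 15, J = 30, c = 2 are ours (not part of the exercise). It computes \(L\) and \(U\) and checks that both endpoints are roots of the quadratic above.

# Minimal sketch: compute (L, U) from ybar, J, and c using the formula derived above
poisson.cheb.ci <- function(ybar, J, c) {
  disc <- sqrt(4*ybar*c^2/J + c^4/J^2)
  c(L = (2*ybar + c^2/J - disc)/2, U = (2*ybar + c^2/J + disc)/2)
}
# Quick check: both endpoints should make the quadratic (approximately) zero
ci <- poisson.cheb.ci(ybar = 15, J = 30, c = 2)
ci^2 + ci*(-2*15 - 2^2/30) + 15^2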
We cannot even pretend to know \(\sigma^2\) for the Poisson case because \(\sigma^2=\lambda\) and \(\lambda\) is unknown.
Load the data using read.csv(). Report the R code you have used and your finding. ANSWER: We must set \(c^2=20\) because we want a finite-sample conservative 95% confidence interval for \(\lambda\): solving \(1-1/c^2=0.95\) gives \(c^2=20\). Below are the calculations which apply the interval obtained in Item 3.
cookie <- read.csv("data_cookie.csv")
ybar <- mean(cookie$numchoc)
J <- length(cookie$numchoc)
c <- sqrt(20)
c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
[1] 13.64160 16.71778
Compute mean(results[1,] < lambda & lambda < results[2,]). Discuss whether your finding matches the theoretical guarantee developed in Item 3. ANSWER: Below you will find the R code used. The simulated coverage is consistent with the theoretical guarantee developed in Item 3; in fact, the lower bound of 3/4 (when \(c=2\)) is quite conservative.
# Set a value for lambda to generate artificial datasets following a Poisson distribution
lambda <- 14
# Set c for the desired coverage guarantee 1 - 1/c^2
c <- 2
# Create a function depending on the sample size J
cons.ci <- function(J)
{
  y <- rpois(J, lambda)
  ybar <- mean(y)
  c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
}
# Construct the interval 10^4 times; you can change J into something else
results <- replicate(10^4, cons.ci(J))
# Calculate the proportion of intervals that contain lambda
mean(results[1,] < lambda & lambda < results[2,])
[1] 0.9553
NOTE: It is not necessary to match the simulation design to the Chips Ahoy example, but you may do so. For one thing, we really do not know the real value of \(\lambda\).
R NOTE: The object J in cons.ci(J) points to the \(J\) available in memory, which should match J <- length(cookie$numchoc). The J in the function definition for cons.ci is only a placeholder.
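A minimal sketch of this scoping behavior, using throwaway names of our own choosing (f, g) rather than anything from the exercise:

J <- 10                   # a "global" J, as in the cookie example
f <- function(J) J + 1    # this J is only a placeholder (formal argument)
f(3)                      # returns 4: the argument 3 is used, not the global J
f(J)                      # returns 11: here the global J is passed explicitly
g <- function() J + 1     # no argument: J is looked up in the global environment
g()                       # returns 11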
This exercise should force you to start working on the examples and exercises in the book. This exercise is a version of LM Examples 5.4.2, 5.4.6, 5.7.1, and LM Exercises 5.4.18, 5.7.4, 5.7.5. You may also need to use integration by parts at some point for the mathematical derivations.
Let \(Y_1,\ldots,Y_n\) be IID \(\mathsf{U}\left(0,\theta\right)\), where \(\theta>0\). The common pdf of these random variables is given by \[f\left(y\right)=\begin{cases}\dfrac{1}{\theta} & \ \ \ 0\leq y\leq \theta\\ 0 & \ \ \ \mathrm{otherwise} \end{cases},\] the common mean is \(\theta/2\), and the common variance is \(\theta^2/12\). The estimand for this exercise is \(\theta\).
You will be considering three estimators for \(\theta\): \[\widehat{\theta}_1=\left(n+1\right)Y_{\mathsf{min}}, \ \ \ \widehat{\theta}_2=2\overline{Y}, \ \ \ \widehat{\theta}_3=\dfrac{n+1}{n}Y_{\mathsf{max}}.\] You will learn later that \(\widehat{\theta}_1\), which depends on the smallest order statistic \(Y_{\mathsf{min}}\), is a bias-adjusted estimator that applies the plug-in principle. You will also learn later that \(\widehat{\theta}_2=2\overline{Y}\) is a method of moments estimator. Finally, you will learn later that \(\widehat{\theta}_3\), which depends on the largest order statistic \(Y_{\mathsf{max}}\), is a bias-adjusted maximum likelihood estimator.
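As a quick illustration of how the three estimators are computed in practice (a sketch of ours; the choices \(n=5\) and \(\theta=1\) are arbitrary and not part of the exercise), the snippet below evaluates all three on a single simulated sample. The full simulation study appears in a later item.

# One simulated sample from U(0, theta) with n = 5 and theta = 1 (illustrative values)
n <- 5
theta <- 1
y <- runif(n, min = 0, max = theta)
theta1hat <- (n + 1) * min(y)        # (n+1) * Y_min
theta2hat <- 2 * mean(y)             # 2 * Ybar
theta3hat <- (n + 1) / n * max(y)    # (n+1)/n * Y_max
c(theta1hat, theta2hat, theta3hat)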
ANSWER: First, let me write down all the calculus-based results I am going to need. The cdf for \(\mathsf{U}\left(0,\theta\right)\) is given by \[F\left(y\right)=\int_{-\infty}^y f\left(s\right)\,\,ds=\begin{cases} 0 & \mathrm{if}\ y<0\\ \displaystyle\int_{-\infty}^0 f\left(s\right)\,\,ds+\int_0^y f\left(s\right)\,\,ds = \int_0^y \frac{1}{\theta}\,\,ds=\frac{y}{\theta} & \mathrm{if}\ 0\leq y \leq \theta\\ \displaystyle\int_{-\infty}^\theta f\left(s\right)\,\,ds+\int_\theta^y f\left(s\right)\,\,ds = 1 & \mathrm{if}\ y > \theta \end{cases}\] Next, by Theorem 3.10.1, the pdfs of \(Y_{\mathsf{max}}\) and \(Y_{\mathsf{min}}\) are given by \[f_{Y_{\mathsf{max}}}\left(y\right)=\frac{ny^{n-1}}{\theta^n}, \ \ f_{Y_{\mathsf{min}}}\left(y\right)=\dfrac{n}{\theta}\left(1-\frac{y}{\theta}\right)^{n-1}\] for \(0\leq y\leq \theta\). Finally, I collect the following integration results below to tidy up the calculations later.
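For completeness, the integration results being referred to, which are exactly the values used in the calculations that follow, are \[\int_0^\theta s\left(1-\frac{s}{\theta}\right)^{n-1}\,ds=\frac{\theta^2}{n\left(n+1\right)}, \ \ \ \int_0^\theta s^2\left(1-\frac{s}{\theta}\right)^{n-1}\,ds=\frac{2\theta^3}{n\left(n+1\right)\left(n+2\right)},\] \[\int_0^\theta s^n\,ds=\frac{\theta^{n+1}}{n+1}, \ \ \ \int_0^\theta s^{n+1}\,ds=\frac{\theta^{n+2}}{n+2}.\] The first two follow from the substitution \(u=s/\theta\) together with integration by parts; the last two are direct power-rule integrals.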
ANSWER:
We need to derive the pdf of \(Y_{\mathsf{min}}\) and to directly calculate some integrals. These results were collected earlier. So, we have
\[\mathbb{E}\left(\widehat{\theta}_1\right)=\left(n+1\right)\mathbb{E}\left(Y_{\mathsf{min}}\right)=\displaystyle\frac{\left(n+1\right)n}{\theta}\int_0^\theta s\left(1-\frac{s}{\theta}\right)^{n-1}\,\, ds= \frac{\left(n+1\right)n}{\theta} \cdot \frac{\theta^2}{n\left(n+1\right)} =\theta\] Because \(Y_1,\ldots, Y_n\) are IID \(\mathsf{U}\left(0,\theta\right)\), \(\mathbb{E}\left(\overline{Y}\right)=\theta/2\). Therefore,
\[\mathbb{E}\left(\widehat{\theta}_2\right)=2\mathbb{E}\left(\overline{Y}\right)=2\theta/2=\theta\] We also need to derive the pdf of \(Y_{\mathsf{max}}\) and to directly calculate some integrals. So, we have \[\displaystyle \mathbb{E}\left(\widehat{\theta}_3\right)=\frac{n+1}{n}\mathbb{E}\left(Y_{\mathsf{max}}\right)=\frac{n+1}{\theta^n}\int_0^\theta s^n\,\, ds = \frac{n+1}{\theta^n}\cdot\frac{\theta^{n+1}}{n+1}=\theta\]
The easiest to work out was \(\mathbb{E}\left(\widehat{\theta}_2\right)\) because you do not need the distribution of \(\overline{Y}\): the linearity of expectation applied to \(\overline{Y}\) substantially simplifies the calculations compared to the other two estimators.
NOTE: \(Y_{\mathsf{min}}\) took a lot more time to work out compared to \(Y_{\mathsf{max}}\), as the latter does not require integration by parts.
ANSWER: Using the previously collected results, we have
\[\mathbb{E}\left(\widehat{\theta}_1^2\right)= \left(n+1\right)^2\mathbb{E}\left(Y_{\mathsf{min}}^2\right)=\frac{\left(n+1\right)^2 n}{\theta} \cdot \frac{2\theta^3}{n\left(n+1\right)\left(n+2\right)}=\frac{2\left(n+1\right)\theta^2}{n+2}.\] As a result, \[\mathsf{Var}\left(\widehat{\theta}_1\right)=\frac{2\left(n+1\right)\theta^2}{n+2}-\theta^2=\frac{n}{n+2}\theta^2.\] Next, \(Y_1,\ldots, Y_n\) are IID \(\mathsf{U}\left(0,\theta\right)\), so \(\mathsf{Var}\left(\overline{Y}\right)=\theta^2/\left(12n\right)\). Therefore, \[\mathsf{Var}\left(\widehat{\theta}_2\right)=4\mathsf{Var}\left(\overline{Y}\right)=\frac{1}{3n}\theta^2.\] Finally, \[\mathbb{E}\left(\widehat{\theta}_3^2\right)=\left(\frac{n+1}{n}\right)^2\mathbb{E}\left(Y_{\mathsf{max}}^2\right)=\left(\frac{n+1}{n}\right)^2 \cdot \frac{n}{\theta^n}\int_0^\theta s^{n+1}\,\, ds=\frac{\left(n+1\right)^2}{n\left(n+2\right)}\theta^2.\] Therefore, \[\mathsf{Var}\left(\widehat{\theta}_3\right)=\frac{\left(n+1\right)^2}{n\left(n+2\right)}\theta^2-\theta^2=\frac{1}{n\left(n+2\right)}\theta^2\]
\(\widehat{\theta}_3\) has a variance that goes to zero more rapidly compared to \(\widehat{\theta}_1\) and \(\widehat{\theta}_2\).
ANSWER: The bias of \(Y_{\mathsf{min}}\) for \(\theta\) is \(\displaystyle\mathbb{E}\left(Y_{\mathsf{min}}\right)-\theta=-\frac{n}{n+1}\theta\). The bias of \(2\overline{Y}\) for \(\theta\) is 0. The bias of \(Y_{\mathsf{max}}\) for \(\theta\) is \(\displaystyle\mathbb{E}\left(Y_{\mathsf{max}}\right)-\theta=-\frac{1}{n+1}\theta\). Therefore, only \(2\overline{Y}\) is unbiased for \(\theta\).
The limits of the biases of \(2\overline{Y}\) and \(Y_{\mathsf{max}}\) as \(n\to\infty\) are both equal to zero. Therefore, these two estimators are asymptotically unbiased for \(\theta\).
Since \(\displaystyle\mathbb{E}\left(Y_{\mathsf{max}}\right)=\frac{n}{n+1}\theta\), we must have \(\displaystyle\mathbb{E}\left(\frac{n+1}{n}Y_{\mathsf{max}}\right)=\theta\). Observe that \(\displaystyle\frac{n+1}{n}Y_{\mathsf{max}}\) is exactly \(\widehat{\theta}_3\).
In a similar fashion, since \(\displaystyle\mathbb{E}\left(Y_{\mathsf{min}}\right)=\frac{1}{n+1}\theta\), we must have \(\displaystyle\mathbb{E}\left(\left(n+1\right)Y_{\mathsf{min}}\right)=\theta\). Observe that \(\displaystyle\left(n+1\right)Y_{\mathsf{min}}\) is exactly \(\widehat{\theta}_1\). Therefore, in both cases, we can justify the label bias-adjusted estimators.
Since \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) are all unbiased for \(\theta\), they are also asymptotically unbiased for \(\theta\).
From Item 2, \[\lim_{n\to\infty}\mathsf{Var}\left(\widehat{\theta}_1\right)=\lim_{n\to\infty}\frac{n}{n+2}\theta^2=\theta^2 \neq 0.\] Therefore, \(\widehat{\theta}_1\) is not squared-error consistent for \(\theta\).
From Item 2, observe that both \(\widehat{\theta}_2\) and \(\widehat{\theta}_3\) have variances that go to zero as \(n\to\infty\). Because \(\widehat{\theta}_2\) and \(\widehat{\theta}_3\) are asymptotically unbiased and have variances that go to zero asymptotically, both these estimators are squared-error consistent for \(\theta\).
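To spell out the squared-error consistency argument using the results from Items 1 and 2: all three estimators are unbiased, so the mean squared error of each equals its variance, \[\mathsf{MSE}\left(\widehat{\theta}_1\right)=\frac{n}{n+2}\theta^2 \to \theta^2, \ \ \ \mathsf{MSE}\left(\widehat{\theta}_2\right)=\frac{\theta^2}{3n} \to 0, \ \ \ \mathsf{MSE}\left(\widehat{\theta}_3\right)=\frac{\theta^2}{n\left(n+2\right)} \to 0\] as \(n\to\infty\). Only the last two limits are zero, which is why \(\widehat{\theta}_2\) and \(\widehat{\theta}_3\) are squared-error consistent while \(\widehat{\theta}_1\) is not.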
Below you see sample code to demonstrate the apply() function. To learn more about it, type ?apply. The idea is to “apply” a function (either built-in or user-defined) to either the rows or columns of an array. This is a convenient way to repeatedly do something to the columns or rows of an array without explicitly writing a for loop. For example, you encountered colMeans(), which may be formulated in terms of apply().
n <- 5
mu <- 1
sigma.sq <- 4
nsim <- 8 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, rnorm(n, mu, sqrt(sigma.sq)))
ymat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1.3904051 -0.03111589 2.369097 2.213403 2.22615523 -2.934541 2.3109174
[2,] 0.3284039 2.29255777 1.778515 3.661879 1.00847223 -2.319194 3.6014871
[3,] -2.1726322 0.03044256 -3.045596 1.200483 3.73690601 -1.535897 3.0837312
[4,] 0.6038961 0.26164353 5.022951 2.845416 0.01033531 2.566453 0.4153617
[5,] 0.7250735 1.18166540 2.054126 2.044209 2.23500607 3.795071 1.8485274
[,8]
[1,] 3.9297191
[2,] 2.2290166
[3,] 1.7287033
[4,] -0.6197381
[5,] -1.8608037
apply(ymat, 2, mean)
[1] 0.17502928 0.74703867 1.63581857 2.39307784 1.84337497 -0.08562156
[7] 2.25200495 1.08137944
apply(ymat, 1, mean)
[1] 1.4342550 1.5726422 0.3782676 1.3882898 1.5028593
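As a follow-up to the colMeans() remark above, you can verify on the ymat just created that apply() over columns reproduces colMeans():

# Should return TRUE: apply(ymat, 2, mean) and colMeans(ymat) compute the same thing
all.equal(apply(ymat, 2, mean), colMeans(ymat))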
Use the apply(), min(), max(), and runif() commands for this item. Your tasks are (a) to apply the estimators \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) to the artificial datasets, (b) to produce histograms of the sampling distributions of \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\), and (c) to report your own R code by modifying the code below. What do you notice about the sampling distributions of each estimator?
theta <- 1
nsim <- 10^4 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, runif(5, min = 0, max = theta))
# calculate estimates
theta1hat <- 6*apply(ymat, 2, min)
theta2hat <- 2*colMeans(ymat)
theta3hat <- 6/5*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=5
par(mfrow=c(1,3))
hist(theta1hat, freq = FALSE)
hist(theta2hat, freq = FALSE)
hist(theta3hat, freq = FALSE)
c(mean(theta1hat), mean(theta2hat), mean(theta3hat))
[1] 0.9935800 0.9930326 0.9979351
c(var(theta1hat), var(theta2hat), var(theta3hat))
[1] 0.70735197 0.06603466 0.02905609
# n=500
ymat <- replicate(nsim, runif(500, min = 0, max = theta))
# calculate estimates
theta1hat <- 501*apply(ymat, 2, min)
theta2hat <- 2*colMeans(ymat)
theta3hat <- 501/500*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=500
par(mfrow=c(1,3))
hist(theta1hat, freq = FALSE)
hist(theta2hat, freq = FALSE)
hist(theta3hat, freq = FALSE)
c(mean(theta1hat), mean(theta2hat), mean(theta3hat))
[1] 1.0046896 1.0000454 0.9999998
c(var(theta1hat), var(theta2hat), var(theta3hat))
[1] 9.782793e-01 6.656409e-04 3.967086e-06
Report c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) and c(var(theta1hat), var(theta2hat), var(theta3hat)) when \(n=5\) and \(n=500\). Compare with the theoretical results obtained in Items 1 and 2. ANSWER: Findings from the simulation can be found in Item 5, where we fixed \(\theta=1\) by design. We should see values close to 1 for c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) regardless of whether \(n=5\) or \(n=500\). For c(var(theta1hat), var(theta2hat), var(theta3hat)), we should obtain values roughly around 0.714, 0.067, and 0.029, respectively, when \(n=5\). When \(n=500\), we should obtain values roughly around 1, \(6.7\times 10^{-4}\), and \(4\times 10^{-6}\).
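As a cross-check (a small sketch of ours, not part of the reported solution), the theoretical variances from Item 2 can be evaluated in R for \(\theta=1\) and compared with the simulated values above:

# Theoretical variances n/(n+2), 1/(3n), and 1/(n(n+2)) for theta = 1
theo.var <- function(n) c(n/(n + 2), 1/(3*n), 1/(n*(n + 2)))
theo.var(5)    # approximately 0.714, 0.067, 0.029
theo.var(500)  # approximately 0.996, 6.7e-04, 4.0e-06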
The theoretical and simulation results match very well for the case where \(\theta=1\).