1 bonus point earned for doing this exercise properly.
You have personally observed that the number of chocolate chips on a Chips Ahoy cookie is not the same across all cookies. In other words, there is variability in the number of chocolate chips even though these cookies are mass-produced and subject to quality control standards. You will now go through an argument meant to motivate the choice of a parametric statistical model to describe the variability in the number of chocolate chips per cookie.
Chips Ahoy, Part 1 gave you a chance to work through what could be a suitable model for the number of chocolate chips per cookie. You have reached the conclusion that a Poisson model with rate $\lambda$ is a reasonable choice.
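Before formalizing anything, one quick informal check is possible: a Poisson model forces the mean and the variance to be equal, so the sample mean and sample variance of the counts should be comparable. A minimal sketch, assuming the same data_cookie.csv file that is loaded below (this check is my own addition, not part of the exercise):
# For a Poisson model, E(Y) = Var(Y) = lambda, so compare these two numbers
cookie <- read.csv("data_cookie.csv")
mean(cookie$numchoc)
var(cookie$numchoc)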
ANSWER: Observe that $E(\bar{Y}) = \lambda$ and $\mathrm{Var}(\bar{Y}) = \lambda/J$ under the Poisson model, so Chebyshev's inequality gives $P\big(|\bar{Y} - \lambda| < c\sqrt{\lambda/J}\big) \geq 1 - 1/c^2$.
I do not show the solution for this part, as this is straightforward algebra. First, solve the quadratic equation $\lambda^2 - (2\bar{y} + c^2/J)\lambda + \bar{y}^2 = 0$, which comes from squaring the event $|\bar{y} - \lambda| < c\sqrt{\lambda/J}$. Its two roots, $\frac{1}{2}\big(2\bar{y} + c^2/J \pm \sqrt{4\bar{y}c^2/J + c^4/J^2}\big)$, are the endpoints of the conservative interval for $\lambda$. We cannot even pretend to know $\lambda$, which is exactly why the endpoints are written in terms of the observable $\bar{y}$, $c$, and $J$ alone.
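As a numerical sanity check of this algebra, the closed-form endpoints can be compared with the roots returned by polyroot(). The values of ybar, J, and c below are hypothetical, chosen only for illustration:
# Hypothetical inputs for illustration only
ybar <- 15; J <- 100; c <- sqrt(20)
# Roots of lambda^2 - (2*ybar + c^2/J)*lambda + ybar^2 = 0
sort(Re(polyroot(c(ybar^2, -(2*ybar + c^2/J), 1))))
# Closed-form endpoints; these should coincide with the roots above
c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2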
Load the data with read.csv(). Report the R code you have used and your finding.
ANSWER: We must set $c = \sqrt{20}$ so that $1 - 1/c^2 = 0.95$.
cookie <- read.csv("data_cookie.csv")
ybar <- mean(cookie$numchoc)
J <- length(cookie$numchoc)
c <- sqrt(20)
# endpoints of the conservative interval for lambda
c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
[1] 13.64160 16.71778
Use mean(results[1,] < lambda & lambda < results[2,]) to estimate how often the interval contains $\lambda$. Discuss whether your finding matches the theoretical guarantee developed in Item 3.
ANSWER: Below you will find the R code used. The finding matches the theoretical guarantee developed in Item 3. In fact, the lower bound $1 - 1/c^2 = 3/4$ (when $c = 2$) is comfortably cleared: about 95.5% of the simulated intervals contain $\lambda$.
# Set a value for lambda to generate artificial datasets following a Poisson distribution
lambda <- 14
# Set c for the desired 1 - 1/c^2
c <- 2
# Create a function depending on sample size J
cons.ci <- function(J)
{
  y <- rpois(J, lambda)
  ybar <- mean(y)
  c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
}
# Construct the interval 10000 times; you can change J into something else
results <- replicate(10^4, cons.ci(J))
# Calculate the proportion of intervals that contain lambda
mean(results[1,] < lambda & lambda < results[2,])
[1] 0.9553
NOTE: It is not necessary to match the simulation design to the Chips Ahoy example, but you may do so. For one thing, we really do not know the real value of $\lambda$ there.
R NOTE: The object J in cons.ci(J) points to the J created earlier by J <- length(cookie$numchoc). The J in the function definition for cons.ci is only a placeholder.
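A tiny illustration of this scoping point, using a toy function f() that is not part of the exercise:
J <- 20                # a global J, playing the role of length(cookie$numchoc)
f <- function(J) J + 1 # this J is only a placeholder
f(5)                   # returns 6: the argument 5 is used, not the global J
f(J)                   # returns 21: the value of the global J is passed in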
This exercise should force you to start working on the examples and exercises in the book. This exercise is a version of LM Examples 5.4.2, 5.4.6, 5.7.1, and LM Exercises 5.4.18, 5.7.4, 5.7.5. You may also need to use integration by parts at some point for the mathematical derivations.
Let $Y_1, \ldots, Y_n$ be a random sample from the uniform distribution on $(0, \theta)$.
You will be considering three estimators for $\theta$: $\widehat{\theta}_1 = (n+1)Y_{(1)}$, $\widehat{\theta}_2 = 2\bar{Y}$, and $\widehat{\theta}_3 = \frac{n+1}{n}\,Y_{(n)}$, where $Y_{(1)} = \min_i Y_i$ and $Y_{(n)} = \max_i Y_i$.
ANSWER: First, let me write down all the calculus-based results I am going to need. The cdf for $Y \sim \mathrm{Uniform}(0, \theta)$ is $F_Y(y) = y/\theta$ for $0 \le y \le \theta$, the pdf is $f_Y(y) = 1/\theta$ on the same interval, and consequently $E(Y) = \theta/2$ and $\mathrm{Var}(Y) = \theta^2/12$.
ANSWER: We need to derive the pdf of the order statistics $Y_{(1)}$ and $Y_{(n)}$. Standard arguments give $f_{Y_{(1)}}(y) = \frac{n}{\theta}\left(1 - \frac{y}{\theta}\right)^{n-1}$ and $f_{Y_{(n)}}(y) = \frac{n}{\theta}\left(\frac{y}{\theta}\right)^{n-1}$ for $0 \le y \le \theta$.
The easiest to work out was $\widehat{\theta}_2 = 2\bar{Y}$: since $E(\bar{Y}) = E(Y) = \theta/2$, we get $E(\widehat{\theta}_2) = \theta$ right away.
ANSWER: Using the previously collected results, we have $E(Y_{(1)}) = \theta/(n+1)$ and $E(Y_{(n)}) = n\theta/(n+1)$, so that $E(\widehat{\theta}_1) = (n+1) \cdot \theta/(n+1) = \theta$ and $E(\widehat{\theta}_3) = \frac{n+1}{n} \cdot \frac{n\theta}{n+1} = \theta$. All three estimators are unbiased.
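If you want to double-check these expectations by simulation, here is a minimal sketch (my own addition; n = 5 and theta = 1 are chosen for illustration):
set.seed(1)
n <- 5; theta <- 1; nsim <- 10^5
# Should be near theta/(n+1) = 1/6 and n*theta/(n+1) = 5/6, respectively
mean(replicate(nsim, min(runif(n, 0, theta))))
mean(replicate(nsim, max(runif(n, 0, theta))))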
ANSWER: The bias of each estimator is $E(\widehat{\theta}_i) - \theta = 0$ for $i = 1, 2, 3$. The limit of the bias of each estimator as $n \to \infty$ is therefore also 0. Since $\mathrm{Var}(Y_{(1)}) = \frac{n\theta^2}{(n+1)^2(n+2)}$, we have $\mathrm{Var}(\widehat{\theta}_1) = (n+1)^2\,\mathrm{Var}(Y_{(1)}) = \frac{n\theta^2}{n+2}$. In a similar fashion, $\mathrm{Var}(\widehat{\theta}_2) = 4\,\mathrm{Var}(\bar{Y}) = \frac{\theta^2}{3n}$. Since $\mathrm{Var}(Y_{(n)}) = \frac{n\theta^2}{(n+1)^2(n+2)}$ as well, $\mathrm{Var}(\widehat{\theta}_3) = \left(\frac{n+1}{n}\right)^2 \mathrm{Var}(Y_{(n)}) = \frac{\theta^2}{n(n+2)}$.
From Item 2, all three estimators are unbiased, so their biases trivially vanish as $n \to \infty$.
From Item 2, observe that both $\mathrm{Var}(\widehat{\theta}_2)$ and $\mathrm{Var}(\widehat{\theta}_3)$ tend to 0 as $n \to \infty$, so $\widehat{\theta}_2$ and $\widehat{\theta}_3$ are consistent. In contrast, $\mathrm{Var}(\widehat{\theta}_1) = \frac{n\theta^2}{n+2} \to \theta^2 \neq 0$, so $\widehat{\theta}_1$ is not consistent; a simulation illustrating this is sketched below.
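A minimal sketch of that inconsistency (my own addition; theta = 1 and the sample sizes are chosen for illustration):
set.seed(1)
# Monte Carlo variance of theta1hat = (n+1)*min(Y) for increasing n
sim.var1 <- function(n, theta = 1, nsim = 10^4)
  var(replicate(nsim, (n + 1)*min(runif(n, 0, theta))))
sapply(c(5, 50, 500), sim.var1)
# The three values stay near theta^2 = 1 instead of shrinking toward 0.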
Below you see sample code to demonstrate the apply() function. To learn more about it, type ?apply. The idea is to “apply” a function (either built-in or user-defined) to either the rows or columns of an array. This is a convenient way to repeatedly do something to the columns or rows of an array without explicitly writing a for loop. For example, you encountered colMeans(), which may be formulated in terms of apply(), as demonstrated right after the output below.
n <- 5
mu <- 1
sigma.sq <- 4
nsim <- 8 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, rnorm(n, mu, sqrt(sigma.sq)))
ymat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1.3904051 -0.03111589 2.369097 2.213403 2.22615523 -2.934541 2.3109174
[2,] 0.3284039 2.29255777 1.778515 3.661879 1.00847223 -2.319194 3.6014871
[3,] -2.1726322 0.03044256 -3.045596 1.200483 3.73690601 -1.535897 3.0837312
[4,] 0.6038961 0.26164353 5.022951 2.845416 0.01033531 2.566453 0.4153617
[5,] 0.7250735 1.18166540 2.054126 2.044209 2.23500607 3.795071 1.8485274
[,8]
[1,] 3.9297191
[2,] 2.2290166
[3,] 1.7287033
[4,] -0.6197381
[5,] -1.8608037
apply(ymat, 2, mean)
[1] 0.17502928 0.74703867 1.63581857 2.39307784 1.84337497 -0.08562156
[7] 2.25200495 1.08137944
apply(ymat, 1, mean)
[1] 1.4342550 1.5726422 0.3782676 1.3882898 1.5028593
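To see the equivalence mentioned above: for the matrix ymat just created, colMeans() agrees with apply() over the column margin, and rowMeans() with the row margin:
all.equal(colMeans(ymat), apply(ymat, 2, mean))
[1] TRUE
all.equal(rowMeans(ymat), apply(ymat, 1, mean))
[1] TRUE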
You will need the apply(), min(), max(), and runif() commands for this item. Your tasks are (a) to apply the estimators to artificial datasets with n = 5 and n = 500 and (b) to report your findings.
ANSWER: Below is the R code used.
theta <- 1
nsim <- 10^4 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, runif(5, min = 0, max = theta))
# calculate estimates
theta1hat <- 6*apply(ymat, 2, min)
theta2hat <- 2*colMeans(ymat)
theta3hat <- 6/5*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=5
par(mfrow=c(1,3))
hist(theta1hat, freq = FALSE)
hist(theta2hat, freq = FALSE)
hist(theta3hat, freq = FALSE)
c(mean(theta1hat), mean(theta2hat), mean(theta3hat))
[1] 0.9935800 0.9930326 0.9979351
c(var(theta1hat), var(theta2hat), var(theta3hat))
[1] 0.70735197 0.06603466 0.02905609
# n=500
ymat <- replicate(nsim, runif(500, min = 0, max = theta))
# calculate estimates
theta1hat <- 501*apply(ymat, 2, min)
theta2hat <- 2*colMeans(ymat)
theta3hat <- 501/500*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=500
par(mfrow=c(1,3))
hist(theta1hat, freq = FALSE)
hist(theta2hat, freq = FALSE)
hist(theta3hat, freq = FALSE)
c(mean(theta1hat), mean(theta2hat), mean(theta3hat))
[1] 1.0046896 1.0000454 0.9999998
c(var(theta1hat), var(theta2hat), var(theta3hat))
[1] 9.782793e-01 6.656409e-04 3.967086e-06
Discuss what you expect to see for c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) and c(var(theta1hat), var(theta2hat), var(theta3hat)) when n = 5 and when n = 500.
ANSWER: Findings from the simulation can be found in Item 5, where we fixed $\theta = 1$. Because all three estimators are unbiased, c(mean(theta1hat), mean(theta2hat), mean(theta3hat)) should be close to 1 regardless of whether n = 5 or n = 500. For c(var(theta1hat), var(theta2hat), var(theta3hat)), we should obtain values roughly around 0.714, 0.067, and 0.029, respectively, when n = 5, and roughly around 0.996, 0.00067, and 0.000004 when n = 500. The theoretical and simulation results match very well for both sample sizes; a short calculation of the theoretical values is sketched below.
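For reference, here is a minimal sketch computing the theoretical variances $n\theta^2/(n+2)$, $\theta^2/(3n)$, and $\theta^2/(n(n+2))$; the helper var.theory() is my own naming, not from the book or the assignment:
# Theoretical variances of theta1hat, theta2hat, theta3hat
var.theory <- function(n, theta = 1)
  c(n*theta^2/(n+2), theta^2/(3*n), theta^2/(n*(n+2)))
var.theory(5)    # roughly 0.714, 0.067, 0.029
var.theory(500)  # roughly 0.996, 0.00067, 0.000004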