Homework 03

Exercise A: ID numbers and contributions

1 bonus point earned for doing this exercise properly.

Exercise B: Chips Ahoy, Part 3

You have personally observed that the number of chocolate chips on a Chips Ahoy cookie is not the same across all cookies. In other words, there is variability in the number of chocolate chips even though these cookies are mass-produced and subject to quality control standards. Chips Ahoy, Part 1 gave you a chance to work through what could be a suitable model for the number of chocolate chips on a cookie. Chips Ahoy, Part 2 gave you a chance to start from the model developed in Part 1 and to produce a finite-sample confidence interval based on Chebyshev’s inequality, which the Monte Carlo simulation showed to be quite conservative.

In this exercise, you will construct a confidence interval based on likelihood theory. Afterwards, you will compare the coverage properties of the finite-sample confidence interval based on Chebyshev’s inequality and the likelihood-based confidence interval.

Recall once again that the assumed parametric model is that \(Y_1,\ldots,Y_J \sim\ \mathsf{IID}\ \mathsf{Poi}\left(\lambda\right)\). Under “regularity” conditions, which are satisfied for this model, the MLE \(\widehat{\lambda}_{\mathsf{MLE}}\) has the following asymptotic distribution as \(J\to\infty\) (or, if you wish, the MLE obeys the following central limit theorem): \[\frac{\widehat{\lambda}_{\mathsf{MLE}}-\lambda}{\sqrt{1/I\left(\widehat{\lambda}_{\mathsf{MLE}}\right)}} \overset{d}{\to} N\left(0,1\right)\]

  1. (2 points) Derive the log-likelihood function for \(\lambda\), treating the \(Y_i\)’s as random. Show the details of your work.

ANSWER: Because \(Y_1,\ldots,Y_J \sim\ \mathsf{IID}\ \mathsf{Poi}\left(\lambda\right)\), we have

\[L\left(\lambda\right)=\prod_{i=1}^J \frac{\exp\left(-\lambda\right)\lambda^{Y_i}}{Y_i!}.\] Taking logarithms and simplifying gives the following expression:

\[\ell\left(\lambda\right)=\sum_{i=1}^J \left[\ln \exp\left(-\lambda\right)+\ln \lambda^{Y_i} - \ln Y_i!\right]=-J\lambda+\left(\sum_{i=1}^J Y_i\right)\ln \lambda-\sum_{i=1}^J \ln Y_i!\]

  2. (2 points) Derive the score function for \(\lambda\). Show directly, without using the proof presented in the notes, that its expected value is equal to zero.

ANSWER: The score function is given by the first derivative of the log-likelihood:

\[\ell^\prime\left(\lambda\right)=-J+\left(\sum_{i=1}^J Y_i\right)\frac{1}{\lambda}\]

I now show that \(\mathbb{E}\left(\ell^\prime\left(\lambda\right)\right)=0\). Observe that \[\mathbb{E}\left(\ell^\prime\left(\lambda\right)\right)=-J+\frac{1}{\lambda} \mathbb{E}\left(\sum_{i=1}^J Y_i\right)=-J+\frac{1}{\lambda} \sum_{i=1}^J \mathbb{E}\left(Y_i\right).\] Since \(Y_1,\ldots,Y_J \sim\ \mathsf{IID}\ \mathsf{Poi}\left(\lambda\right)\), we must have \(\mathbb{E}\left(Y_i\right)=\lambda\). As a result, \[\mathbb{E}\left(\ell^\prime\left(\lambda\right)\right)=-J+\frac{1}{\lambda}J\lambda=0.\]
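Setting the score function equal to zero also delivers the MLE in closed form, which is used in the later items: \[-J+\frac{1}{\widehat{\lambda}_{\mathsf{MLE}}}\sum_{i=1}^J Y_i=0 \quad\Longleftrightarrow\quad \widehat{\lambda}_{\mathsf{MLE}}=\frac{1}{J}\sum_{i=1}^J Y_i=\overline{Y}.\]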

  3. (2 points) Derive the Fisher information \(I\left(\lambda\right)\). Show the details of your work. Do you notice something?

ANSWER: The Fisher information is the variance of the score function. Observe that \[\mathsf{Var}\left(\ell^\prime\left(\lambda\right)\right)=\mathsf{Var}\left(-J+\left(\sum_{i=1}^J Y_i\right)\frac{1}{\lambda}\right)=\frac{1}{\lambda^2}\mathsf{Var}\left(\sum_{i=1}^J Y_i\right).\] Since \(Y_1,\ldots,Y_J \sim\ \mathsf{IID}\ \mathsf{Poi}\left(\lambda\right)\) and \(\mathsf{Var}\left(Y_i\right)=\lambda\), we have \[\mathsf{Var}\left(\ell^\prime\left(\lambda\right)\right)=\frac{1}{\lambda^2}J\lambda=\frac{J}{\lambda}.\]

You may have noticed that the resulting Fisher information is the reciprocal of \(\mathsf{Var}\left(\overline{Y}\right)=\lambda/J\).

NOTE: Alternatively, you may choose to compute the expected value of the negative of the second derivative of the log-likelihood.
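Carrying out that alternative computation confirms the result: \[-\frac{\partial^2}{\partial\lambda^2}\,\ell\left(\lambda\right)=\frac{1}{\lambda^2}\sum_{i=1}^J Y_i, \qquad \mathbb{E}\left(\frac{1}{\lambda^2}\sum_{i=1}^J Y_i\right)=\frac{1}{\lambda^2}\cdot J\lambda=\frac{J}{\lambda}.\]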

  4. (2 points) Use the asymptotic distribution of \(\widehat{\lambda}_{\mathsf{MLE}}\) to construct a 95% confidence interval for \(\lambda\). Make sure to simplify \(L\) and \(U\).

ANSWER: Because \[\frac{\widehat{\lambda}_{\mathsf{MLE}}-\lambda}{\sqrt{1/I\left(\widehat{\lambda}_{\mathsf{MLE}}\right)}} \overset{d}{\to} N\left(0,1\right),\] we can approximate \[\mathbb{P}\left(\frac{\widehat{\lambda}_{\mathsf{MLE}}-\lambda}{\sqrt{1/I\left(\widehat{\lambda}_{\mathsf{MLE}}\right)}} \leq z\right)\] by \(\mathbb{P}\left(Z \leq z\right)\), where \(Z\sim N(0,1)\).

From Item 3, we have \(I\left(\widehat{\lambda}_{\mathsf{MLE}}\right)=J/\widehat{\lambda}_{\mathsf{MLE}}\).

Since a 95% confidence interval for \(\lambda\) is the target, we can use the standard normal probabilities as follows: \[\mathbb{P}\left(-2 \leq \frac{\widehat{\lambda}_{\mathsf{MLE}}-\lambda}{\sqrt{1/I\left(\widehat{\lambda}_{\mathsf{MLE}}\right)}} \leq 2\right) \approx 0.95\] In addition, the following inequalities are equivalent: \[-2 \leq \frac{\widehat{\lambda}_{\mathsf{MLE}}-\lambda}{\sqrt{1/I\left(\widehat{\lambda}_{\mathsf{MLE}}\right)}} \leq 2 \Leftrightarrow \widehat{\lambda}_{\mathsf{MLE}}-2\sqrt{\frac{\widehat{\lambda}_{\mathsf{MLE}}}{J}} \leq \lambda \leq \widehat{\lambda}_{\mathsf{MLE}}+2\sqrt{\frac{\widehat{\lambda}_{\mathsf{MLE}}}{J}}.\] Therefore, we have \[\mathbb{P}\left( \widehat{\lambda}_{\mathsf{MLE}}-2\sqrt{\frac{\widehat{\lambda}_{\mathsf{MLE}}}{J}} \leq \lambda \leq \widehat{\lambda}_{\mathsf{MLE}}+2\sqrt{\frac{\widehat{\lambda}_{\mathsf{MLE}}}{J}} \right) \approx 0.95\] So, \[L=\widehat{\lambda}_{\mathsf{MLE}}-2\sqrt{\frac{\widehat{\lambda}_{\mathsf{MLE}}}{J}}, \ \ U= \widehat{\lambda}_{\mathsf{MLE}}+2\sqrt{\frac{\widehat{\lambda}_{\mathsf{MLE}}}{J}} \]

NOTE: You could use \(\pm 1.96\) instead of \(\pm 2\).
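The exact multiplier can be verified with R's quantile function for the standard normal:

```r
# upper 97.5% quantile of N(0,1); the central 95% of the
# distribution lies within this many standard deviations of 0
qnorm(0.975)
```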

  5. (3 points) Adapt the code found in the computational examples I shared with you, and use nlm() to report an estimate of \(\lambda\), a standard error for \(\lambda\), and a 95% confidence interval for \(\lambda\) using the cookie dataset we have collected together in class. Compare with the finite-sample 95% confidence interval you already obtained using Chebyshev’s inequality. Comment.

ANSWER:

# Load cookie dataset
cookie <- read.csv("data_cookie.csv")
# Assign to y the relevant column of the cookie dataset
y <- cookie$numchoc
# Setup the negative log-likelihood
# Alternatively, you can drop the extra y argument 
# and adjust the other lines accordingly but be careful
mlnl <- function(par, y)
{
  -sum(dpois(y, lambda = par, log = TRUE))
}
# Use nlm() to find the optimum; you can experiment with different starting points
output <- nlm(mlnl, 10, hessian = TRUE, print.level = 2, y = y)
iteration = 0
Step:
[1] 0
Parameter:
[1] 10
Function Value
[1] 491.7133
Gradient:
[1] -65.2999

iteration = 1
Step:
[1] 6.217013
Parameter:
[1] 16.21701
Function Value
[1] 352.9323
Gradient:
[1] 8.804251

iteration = 2
Step:
[1] -0.738638
Parameter:
[1] 15.47838
Function Value
[1] 348.4972
Gradient:
[1] 3.116153

iteration = 3
Step:
[1] -0.4046536
Parameter:
[1] 15.07372
Function Value
[1] 347.9086
Gradient:
[1] -0.2363499

iteration = 4
Step:
[1] 0.02852789
Parameter:
[1] 15.10225
Function Value
[1] 347.9053
Gradient:
[1] 0.005886366

iteration = 5
Parameter:
[1] 15.10156
Function Value
[1] 347.9053
Gradient:
[1] 1.084807e-05

Relative gradient close to zero.
Current iterate is probably solution.
# Report estimate of lambda
output$estimate
[1] 15.10156
# Report a standard error
sqrt(solve(output$hessian))
          [,1]
[1,] 0.3435178
# Or: 
sqrt(output$estimate/length(y))
[1] 0.3434835
# Report a 95% confidence interval for lambda
c(output$estimate - 2*sqrt(solve(output$hessian)), output$estimate + 2*sqrt(solve(output$hessian)))
[1] 14.41452 15.78859
# 95% confidence interval for lambda based on Chebyshev's inequality
c <- sqrt(20)
ybar <- mean(y)
J <- length(y)
c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
[1] 13.64160 16.71778

The likelihood-based confidence interval is shorter than the confidence interval based on Chebyshev’s inequality. However, the Chebyshev-based interval has guaranteed coverage at any sample size, whereas the likelihood-based interval relies on a large-sample approximation.
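The difference in length can be anticipated from the formulas: both intervals are approximately centered at \(\overline{Y}\) with half-width proportional to \(\sqrt{\overline{Y}/J}\), so the ratio of their lengths is roughly the ratio of the multipliers, \[\frac{\sqrt{20}}{2}\approx 2.24,\] which is consistent with the two intervals reported above, whose lengths have a ratio of about \(\left(16.71778-13.64160\right)/\left(15.78859-14.41452\right)\approx 2.24\).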

  6. (4 points) Conduct a Monte Carlo simulation where you are to compare the coverage properties and the average lengths of the two procedures for constructing a confidence interval: (a) the finite-sample confidence interval based on Chebyshev’s inequality and (b) the large-sample confidence interval based on likelihood theory. Coverage is how often the interval “captures” the true value, and length is \(U-L\). The idea is to extend the R code used for Homework 02. You have to fill in the missing parts accordingly. I have fixed the values of \(\lambda\) and \(J\) for this item. You should explore the properties of a 75% and a 95% interval. Discuss your findings.

ANSWER: For a 75% confidence interval, we have to adjust \(\pm 2\) to \(\pm 1.15\), because \(\mathbb{P}\left(-1.15 \leq Z \leq 1.15\right) \approx 0.75\) for \(Z\sim N\left(0,1\right)\).
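The multiplier used in the code below can be verified in R: a central 75% interval requires the 87.5% quantile of the standard normal, not the 75% quantile.

```r
# 87.5% quantile of N(0,1): the central 75% of the distribution
# lies within plus or minus this value
qnorm(0.875)
```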

# Set a value for lambda to generate artificial datasets following a Poisson distribution
lambda <- 10
# Set c for the desired 1-1/c^2  
c <- 2
# Set up negative log-likelihood
# Make sure to point to the data 
mlnl <- function(par, y)
{
  -sum(dpois(y, lambda = par, log = TRUE))
}
# Create a function to record confidence intervals
# Argument is the sample size J
cons.ci <- function(J)
{
  # Random draws from Poisson
  y <- rpois(J, lambda)
  # Calculate the sample mean (which is really a method of moments estimator of lambda)
  ybar <- mean(y)
  # Calculate the MLE, include hessian = TRUE, and point to the data
  output <- nlm(mlnl, 10, hessian = TRUE, y = y)
  # Finite-sample confidence interval
  ci.finite <- c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
  # Likelihood-based confidence interval
  # Make sure to use results from nlm() output
  ci.large <- c(output$estimate - 1.15*sqrt(solve(output$hessian)), output$estimate + 1.15*sqrt(solve(output$hessian)))
  # Collect four numbers: the lower and upper limits of the first interval
  # and the lower and upper limits of the second interval
  return(c(ci.finite, ci.large))
}
# Set the sample size for yourself 
J <- 20
# Construct interval 10000 times
results <- replicate(10^4, cons.ci(J))
# Calculate rate of capture
mean(results[1, ] < lambda & lambda < results[2, ])
[1] 0.9552
mean(results[3, ] < lambda & lambda < results[4, ])
[1] 0.7395
# Calculate the average length of the intervals
mean(results[2, ] - results[1, ])
[1] 2.833437
mean(results[4, ] - results[3, ])
[1] 1.625319

For a 95% confidence interval:

# Set a value for lambda to generate artificial datasets following a Poisson distribution
lambda <- 10
# Set c for the desired 1-1/c^2  
c <- sqrt(20)
# Set up negative log-likelihood
# Make sure to point to the data 
mlnl <- function(par, y)
{
  -sum(dpois(y, lambda = par, log = TRUE))
}
# Create a function to record confidence intervals
# Argument is the sample size J
cons.ci <- function(J)
{
  # Random draws from Poisson
  y <- rpois(J, lambda)
  # Calculate the sample mean (which is really a method of moments estimator of lambda)
  ybar <- mean(y)
  # Calculate the MLE, include hessian = TRUE, and point to the data
  output <- nlm(mlnl, 10, hessian = TRUE, y = y)
  # Finite-sample confidence interval
  ci.finite <- c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
  # Likelihood-based confidence interval
  # Make sure to use results from nlm() output
  ci.large <- c(output$estimate - 2*sqrt(solve(output$hessian)), output$estimate + 2*sqrt(solve(output$hessian)))
  # Collect four numbers: the lower and upper limits of the first interval
  # and the lower and upper limits of the second interval
  return(c(ci.finite, ci.large))
}
# Set the sample size for yourself 
J <- 20
# Construct interval 10000 times
results <- replicate(10^4, cons.ci(J))
# Calculate rate of capture
mean(results[1, ] < lambda & lambda < results[2, ])
[1] 1
mean(results[3, ] < lambda & lambda < results[4, ])
[1] 0.9575
# Calculate the average length of the intervals
mean(results[2, ] - results[1, ])
[1] 6.396993
mean(results[4, ] - results[3, ])
[1] 2.825889

DISCUSSION: The coverage rate of the likelihood-based 75% confidence interval (about 0.74) was close to the stated rate, while the 75% confidence interval based on Chebyshev’s inequality was very conservative (its coverage, about 0.96, was much larger than stated). The same pattern holds for the 95% intervals, where the Chebyshev-based interval captured the true value in every simulated sample. What accounts for the difference in results is the much shorter average length of the likelihood-based interval.

Exercise C: Inference for the binomial parameter

This exercise should force you to start working on Sections 5.3 and 6.3 of the book, especially the IID Bernoulli case. The context of this exercise was based on Diez, Barr, and Cetinkaya-Rundel (2014).

Assisted Reproductive Technology (ART) is a collection of techniques that help facilitate pregnancy (e.g. in vitro fertilization). A 2008 report by the Centers for Disease Control and Prevention estimated that ART has been successful in leading to a live birth in 31% of cases. A new fertility clinic claims that their success rate is higher than average. A random sample of 30 of their patients yielded a success rate of 40%. You are to evaluate the claim that this new fertility clinic’s success rate is no different from what was reported in 2008.

We can assume that \(Y_1,\ldots, Y_{30}\) are IID \(\mathsf{Ber}\left(p\right)\), where \(p\) is the unknown estimand. The context described above is very similar to Examples 6.2.1 and 6.2.2. The difference from a statistical view is the distribution of the random variables. Your task is to calculate a \(p\)-value in different ways.1 Let \(\widehat{p}=\overline{Y}\), which is really just a sample proportion. Notice that the book uses \(\widehat{p}\) in two senses: as an estimator and as an estimate.

In the current context, the required \(p\)-value is given by \[\mathbb{P}\left(\widehat{p} \geq 0.4 | p = 0.31\right).\]

  1. (2 points) Why do you think this is an appropriate \(p\)-value to calculate in the given context?

ANSWER: The sample proportions which would be considered “extreme” under the claim that \(p=0.31\) would be those sample proportions larger than or equal to 0.4 because a new clinic would typically advertise its successes rather than its failures.

  2. (2 points) What is the exact distribution of \(\displaystyle\sum_{i=1}^n Y_i\)? Show that you can simply rewrite the required \(p\)-value as \[\mathbb{P}\left(\widehat{p} \geq 0.4 | p = 0.31\right)=\mathbb{P}\left(\sum_{i=1}^{30} Y_i \geq 12 \bigg| p =0.31\right).\] Find this value using the appropriate commands in R. Be careful here and read the documentation to be sure.

ANSWER: The exact distribution of \(\displaystyle\sum_{i=1}^n Y_i\) is a Binomial distribution with \(n=30\) trials and unknown success probability \(p\), i.e. \(\mathsf{Bin}\left(30,p\right)\). Since \(\widehat{p}=\overline{Y}\), the event \(\widehat{p} \geq 0.4\) is the same as \(\overline{Y}\geq 0.4\), which in turn implies \(\displaystyle\sum_{i=1}^{30} Y_i \geq 30\times 0.4 = 12\).

# One way to produce the exact answer using R
pbinom(q = 11, size = 30, prob = 0.31, lower.tail = FALSE)
[1] 0.1908786
# Another acceptable way 
1 - pbinom(q = 11, size = 30, prob = 0.31, lower.tail = TRUE)
[1] 0.1908786
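As a sanity check, the same tail probability can also be accumulated directly from the binomial point probabilities:

```r
# P(sum of Y_i >= 12) as a sum of Bin(30, 0.31) point probabilities
sum(dbinom(12:30, size = 30, prob = 0.31))
```

This agrees with the value of about 0.1909 returned by pbinom() above.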
  3. (3 points) Consider R code based on LM Example 6.2.2 with modifications presented below. Fill in the missing parts of the code and use it to calculate a simulated \(p\)-value. Report the code you used and your result.

ANSWER:

nsim <- 10^4
# Create artificial data 
art.mat <- replicate(nsim, rbinom(30, 1, 0.31))
# Calculate the simulated success rate
sim.success <- colMeans(art.mat)
# Obtain a simulated p-value
mean(sim.success >= 0.4)
[1] 0.1967
  4. (4 points) Yet another way to calculate the required \(p\)-value is to either apply LM Theorem 4.3.1 or to also include a continuity correction (see LM page 240). Find these two values by hand and verify using the appropriate commands in R.

ANSWER: Let \(Z\sim N\left(0,1\right)\). \[\mathbb{P}\left(\widehat{p} \geq 0.4 | p = 0.31\right) = \mathbb{P}\left(\frac{\widehat{p}-p}{\sqrt{\frac{p\left(1-p\right)}{n}}} \geq \frac{0.4-0.31}{\sqrt{\frac{0.31(1-0.31)}{30}}}\Bigg| p=0.31\right) \approx \mathbb{P}\left(Z\geq 1.07\right)=1-0.8577=0.1423\] The latter calculation is based on Table A.1 of LM.

If we employ the continuity correction, then \[\mathbb{P}\left(\widehat{p} \geq 0.4 | p = 0.31\right) = \mathbb{P}\left(\sum_{i=1}^{30} Y_i \geq 12 \bigg| p = 0.31\right) = \mathbb{P}\left(\sum_{i=1}^{30} Y_i \geq 11.5 \bigg| p = 0.31\right).\] After that, we calculate the normal approximation to the latter probability: \[\mathbb{P}\left(\sum_{i=1}^{30} Y_i \geq 11.5 \bigg| p = 0.31\right) = \mathbb{P}\left(\widehat{p} \geq 11.5/30 | p = 0.31\right)= \mathbb{P}\left(\frac{\widehat{p}-p}{\sqrt{\frac{p\left(1-p\right)}{n}}} \geq \frac{11.5/30-0.31}{\sqrt{\frac{0.31(1-0.31)}{30}}}\Bigg| p=0.31\right)\approx \mathbb{P}\left(Z\geq 0.87\right)\approx 1-0.8078=0.1922.\]

# One way to form the commands
pnorm(q = 12, mean = 30*0.31, sd = sqrt(0.31*(1-0.31)*30), lower.tail = FALSE)
[1] 0.1432448
pnorm(q = 11.5, mean = 30*0.31, sd = sqrt(0.31*(1-0.31)*30), lower.tail = FALSE)
[1] 0.1925675
# Another way to form the commands
pnorm(q = (0.4-0.31)/sqrt(0.31*(1-0.31)/30), lower.tail = FALSE)
[1] 0.1432448
pnorm(q = (11.5/30-0.31)/sqrt(0.31*(1-0.31)/30), lower.tail = FALSE)
[1] 0.1925675
  5. (1 point) Given what you have seen so far, do you think the continuity correction is needed? Why?

ANSWER: Yes, the continuity correction is needed to have a better match with the exact probability provided by the binomial distribution. The simulated \(p\)-value matches the probability even better than the normal approximation!

NOTE: The “rule of thumb” in the comment found in LM page 240 is satisfied, but as you can see, the continuity correction is still needed. I would advise against using this “rule of thumb”.

  6. (1 point) Based on the \(p\)-values calculated in different ways, do you think there is evidence to support the claim that this new fertility clinic’s success rate is no different from what was reported in 2008? What is your reasoning?

NOTE: Answers here could vary depending on what the student would pre-specify as large or small.

ANSWER: My answer is that the data support the claim that this new fertility clinic’s success rate is no different from what was reported in 2008 because the \(p\)-value of about 0.19 indicates that a success rate of 0.40 is typical given past performance of ART, which was taken to be \(p=0.31\). Alternatively, 0.4 is only about 1.07 standard errors above \(p=0.31\). Yet another way is to look at the simulated distribution of the sample proportions under \(p=0.31\), presented as a histogram or as a table. Observe that seeing \(\widehat{p}=0.4\) is something typical under \(p=0.31\).

hist(sim.success, freq = FALSE)
# Past performance
abline(v = 0.31, lty = 2)
# New success rate
abline(v = 0.4, col = "#D55E00")

table(sim.success)
sim.success
0.0333333333333333 0.0666666666666667                0.1  0.133333333333333 
                 1                  4                 32                162 
 0.166666666666667                0.2  0.233333333333333  0.266666666666667 
               418                721               1051               1351 
               0.3  0.333333333333333  0.366666666666667                0.4 
              1556               1521               1216                841 
 0.433333333333333  0.466666666666667                0.5  0.533333333333333 
               576                327                128                 56 
 0.566666666666667                0.6  0.633333333333333 
                24                 14                  1 

Consider a different approach to evaluating the claim. Someone makes the following suggestion: Since we really do not know \(p\), we need a way to control how often, in repeated samples, we make the mistake of rejecting \(p=0.31\) even if \(p=0.31\) is true. Let us control this rate at 5%. Based on this target rate, we should reject the claim whenever \(\displaystyle\sum_{i=1}^{30} Y_i \geq 14\) (or if you wish \(\widehat{p} \geq 14/30 \approx 0.47\)).

If we adopt this suggestion, we will not have to reject the claim because we actually observe a sum equal to 12 (or if you wish, we observe a proportion equal to 0.4). You are now going to explore the properties of the suggestion in repeated samples. In what sense is the suggestion useful?

  7. (2 points) Given that you know the exact distribution of \(\displaystyle\sum_{i=1}^{30} Y_i\), write down (do not calculate!!) an expression for \[\mathbb{P}\left(\sum_{i=1}^{30} Y_i \geq 14\bigg| p=0.31\right).\] Use R to calculate the exact value of this probability.

ANSWER: An expression comes directly from the probabilities of a \(\mathsf{Bin}\left(30, 0.31\right)\): \[\mathbb{P}\left(\sum_{i=1}^{30} Y_i \geq 14\bigg| p=0.31\right)=\sum_{k=14}^{30} \binom{30}{k} 0.31^k\times 0.69^{30-k}.\]

# One way
pbinom(q = 13, size = 30, prob = 0.31, lower.tail = FALSE)
[1] 0.05198685
# Another way 
1 - pbinom(q = 13, size = 30, prob = 0.31, lower.tail = TRUE)
[1] 0.05198685
  8. (2 points) Because the previous probability is the same as \[\mathbb{P}\left(\widehat{p} \geq \frac{14}{30} \bigg| p=0.31\right),\] you could use a Monte Carlo simulation to understand what this probability actually means. Below you will see pretty much the same code as in Item 3. Fill in the missing parts of the R code.
nsim <- 10^4
# Create artificial data again
art.mat <- replicate(nsim, rbinom(30, 1, 0.31))
# Calculate the simulated success rates (sample proportions)
sim.success <- colMeans(art.mat)
# Estimate the desired probability using simulation
mean(sim.success >= 14/30)
[1] 0.052
  9. (3 points) Let us see if you understood what you found in Item 8. What distribution was used to generate the 10000 artificial datasets? In how many out of the 10000 artificial datasets did you reject the claim? What do you think was achieved by applying the suggestion?

ANSWER: Each artificial dataset consisted of 30 IID \(\mathsf{Ber}\left(0.31\right)\) draws, so the total number of successes followed the \(\mathsf{Bin}\left(30, 0.31\right)\) distribution. I rejected the claim in about 520 out of the 10000 artificial datasets. By using the suggestion that one should “reject the claim whenever \(\displaystyle\sum_{i=1}^{30} Y_i \geq 14\)”, we were able to control how often, in repeated samples, we make the mistake of rejecting \(p=0.31\) even if \(p=0.31\) is true.

Footnotes

  1. Don’t confuse the \(p\) in \(p\)-value with the estimand \(p\).↩︎