During the first four weeks of the course, I introduced almost everything related to the sample mean of IID normal random variables \(N\left(\mu,\sigma^2\right)\). In particular, I showed how to use the techniques you already know from probability theory to evaluate claims about the population mean \(\mu\).
During the next four weeks of the course, we moved beyond the normal case to parameter estimation and hypothesis testing in other parametric models. Now we move on to more complicated data analysis techniques involving comparisons of population means, while returning to a baseline assumption of normality.
Multi-sample problems
Let \(Y_{ij}\) denote the \(i\)th observation in the \(j\)th sample, where \(j=1,\ldots,k\) and \(i=1,\ldots,n_j\). You could put this in tabular form like LM Table 12.1. Now consider the following special cases:
One-sample problem where \(k=1\), with sample size \(n=n_1\).
Two-sample problem where \(k=2\), with sample sizes \(n_1\) and \(n_2\) such that \(n=n_1+n_2\).
Multi-sample problem with sample sizes \(n_1, n_2, \ldots, n_k\) such that \(n=n_1+n_2+\cdots+n_k\).
The case of equal sample sizes where \(n_1=n_2=\cdots=n_k\).
The total sample size is \[n=\sum_{j=1}^k n_j.\]
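Before turning to the algebra, it may help to see how such data are usually stored in software: one row per observation \(Y_{ij}\), plus a factor recording the batch \(j\), mirroring the layout of LM Table 12.1. Below is a minimal R sketch with invented group labels and simulated values.

```r
# Long-format layout for a multi-sample problem (invented data):
# one row per observation Y_ij, with a factor identifying the batch j.
set.seed(1)
dat <- data.frame(
  batch = factor(rep(c("A", "B", "C"), times = c(4, 5, 6))),  # k = 3, n_j = 4, 5, 6
  y     = rnorm(15)
)
table(dat$batch)   # the sample sizes n_1, ..., n_k
nrow(dat)          # the total sample size n
```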
Algebraic aspects
Most of the statistical theory related to multi-sample problems involves decompositions of sums of squares. The decomposition is algebraic in nature and has nothing to do with statistics, but statistical theory gives meaning to the components of the decomposition. Below, you will be introduced to the notation and will work through the steps of the algebra.
Define the \(k\) batch or treatment means as \[\overline{Y}_{\bullet j}=\frac{1}{n_j}\sum_{i=1}^{n_j} Y_{ij},\ \ j=1,\ldots,k\]
Define the grand or overall mean as \[\overline{Y}_{\bullet\bullet}=\frac{1}{n}\sum_{j=1}^k \sum_{i=1}^{n_j}Y_{ij}=\frac{1}{n}\sum_{j=1}^k n_j \overline{Y}_{\bullet j}\]
Define the total sum of squares or total variation, denoted by SSTOT in LM, to be \[\mathrm{SSTOT}=\sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet\bullet}\right)^2\]
Define the treatment sum of squares or model sum of squares or batch sum of squares or between-samples variation, denoted by SSTR in LM, to be \[\mathrm{SSTR}=\sum_{j=1}^k \sum_{i=1}^{n_j} \left(\overline{Y}_{\bullet j}-\overline{Y}_{\bullet\bullet}\right)^2=\sum_{j=1}^k n_j\left(\overline{Y}_{\bullet j}-\overline{Y}_{\bullet\bullet}\right)^2=\sum_{j=1}^k n_j\overline{Y}_{\bullet j}^2-n\overline{Y}_{\bullet\bullet}^2,\] where the second equality follows because the inner summand does not depend on \(i\).
Define the error sum of squares or residual sum of squares or within-samples variation, denoted by SSE in LM, to be \[\mathrm{SSE}=\sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet j}\right)^2\]
Observe that the within sum of squares could be written as \[\mathrm{SSE}=\sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet j}\right)^2=\sum_{j=1}^k \left(n_j-1\right)S_j^2\] where \(S_j^2\) is the within-sample variance: \[S_j^2=\frac{1}{n_j-1} \sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet j}\right)^2\]
Decomposition of sums of squares is key to understanding the statistical procedures. In fact, you have been seeing decompositions already. One example is the following algebraic relationship you have already seen before: \[\sum_{i=1}^{n}Y_{i}^2=\sum_{i=1}^{n} \left(Y_{i}-\overline{Y}\right)^2+n\overline{Y}^2,\] which may be considered as a special case of \[\sum_{j=1}^k \sum_{i=1}^{n_j}Y_{ij}^2 =\sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet\bullet}\right)^2+n\overline{Y}_{\bullet\bullet}^2\]
Another decomposition of interest is \[\sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet\bullet}\right)^2=\sum_{j=1}^k\sum_{i=1}^{n_j} \left(\overline{Y}_{\bullet j}-\overline{Y}_{\bullet\bullet}\right)^2+\sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet j}\right)^2\]
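To see why this decomposition holds, write \(Y_{ij}-\overline{Y}_{\bullet\bullet}=\left(Y_{ij}-\overline{Y}_{\bullet j}\right)+\left(\overline{Y}_{\bullet j}-\overline{Y}_{\bullet\bullet}\right)\) and expand the square: \[\sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet\bullet}\right)^2=\sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet j}\right)^2+\sum_{j=1}^k\sum_{i=1}^{n_j} \left(\overline{Y}_{\bullet j}-\overline{Y}_{\bullet\bullet}\right)^2+2\sum_{j=1}^k \left(\overline{Y}_{\bullet j}-\overline{Y}_{\bullet\bullet}\right)\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet j}\right).\] The cross term vanishes because \(\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet j}\right)=0\) for every \(j\), leaving \(\mathrm{SSTOT}=\mathrm{SSTR}+\mathrm{SSE}\).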
A similar decomposition is related to the previous one: \[\sum_{j=1}^k \sum_{i=1}^{n_j}Y_{ij}^2 = \left[\sum_{j=1}^k n_j\overline{Y}_{\bullet j}^2-n\overline{Y}_{\bullet\bullet}^2 \right]+ \sum_{j=1}^k\sum_{i=1}^{n_j} \left(Y_{ij}-\overline{Y}_{\bullet j}\right)^2 +n\overline{Y}_{\bullet\bullet}^2\]
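Because these identities are purely algebraic, they hold for any set of numbers and can be checked directly in software. Below is a minimal R sketch, using invented data, that verifies \(\mathrm{SSTOT}=\mathrm{SSTR}+\mathrm{SSE}\) as well as the identity involving \(\sum_j\sum_i Y_{ij}^2\).

```r
# Numerical check (invented data) of the sum-of-squares decompositions.
set.seed(1)
y     <- rnorm(15)
batch <- factor(rep(c("A", "B", "C"), times = c(4, 5, 6)))

grand.mean  <- mean(y)
batch.means <- tapply(y, batch, mean)    # the k batch means
n.j         <- tapply(y, batch, length)  # the sample sizes n_j

SSTOT <- sum((y - grand.mean)^2)
SSTR  <- sum(n.j * (batch.means - grand.mean)^2)
SSE   <- sum((y - batch.means[as.character(batch)])^2)

all.equal(SSTOT, SSTR + SSE)                               # TRUE, up to floating-point error
all.equal(sum(y^2), SSTOT + length(y) * grand.mean^2)      # the earlier special-case identity
```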
Statistical aspects
For the moment, assume that \(\left(Y_{1j}, Y_{2j}, \ldots, Y_{n_j,j}\right)\) is a random sample from some distribution and that these \(k\) random samples are mutually independent. Assume further that for all \(j=1,\ldots,k\) and \(i=1,\ldots,n_j\), we have \(\mathbb{E}\left(Y_{ij}\right)=\mu_j\) and \(\mathsf{Var}\left(Y_{ij}\right)=\sigma^2\). Let the overall mean be denoted by \(\mu\), which has a slightly different meaning in this context: \[\mu=\frac{1}{n}\sum_{j=1}^k n_j\mu_j\] We have the following results:
The batch or treatment means have the following first two moments: \[\mathbb{E}\left(\overline{Y}_{\bullet j}\right)=\mu_j,\ \ \mathsf{Var}\left(\overline{Y}_{\bullet j}\right)=\frac{\sigma^2}{n_j}\]
The grand mean has the following first two moments: \[\mathbb{E}\left(\overline{Y}_{\bullet\bullet}\right)=\frac{1}{n}\sum_{j=1}^k n_j\mu_j=\mu, \ \ \mathsf{Var}\left(\overline{Y}_{\bullet\bullet}\right)=\frac{\sigma^2}{n}\]
LM Theorem 12.2.1 calculates the first moment of SSTR: \[\mathbb{E}\left(\mathrm{SSTR}\right)=\left(k-1\right)\sigma^2+\sum_{j=1}^k n_j\left(\mu_j-\mu\right)^2\]
SSE has the following first moment: \[\mathbb{E}\left(\mathrm{SSE}\right)=\left(n-k\right)\sigma^2\] This follows from \(\mathrm{SSE}=\sum_{j=1}^k\left(n_j-1\right)S_j^2\) together with \(\mathbb{E}\left(S_j^2\right)=\sigma^2\), since \(\sum_{j=1}^k\left(n_j-1\right)=n-k\).
The pairs \(\left(\overline{Y}_{\bullet j}, S^2_j\right)\), \(j=1,\ldots,k\), formed by the \(j\)th batch mean and the \(j\)th sample variance, are mutually independent across \(j\), because the samples themselves are independent. (Without further assumptions, \(\overline{Y}_{\bullet j}\) and \(S^2_j\) need not be independent of each other within a sample.)
Parts of LM Chapters 7, 9, and 12 impose a normality assumption. Because of this distributional assumption, we have the following results:
The \(j\)th batch mean and the \(j\)th sample variance are independent of each other. Therefore, \(\overline{Y}_{\bullet 1}, \ldots, \overline{Y}_{\bullet k}\) and SSE are mutually independent.
SSE and SSTR are independent of each other. The proof may be found in LM Theorem 12.2.3b.
We know the full distribution of each of the batch means. For \(j=1,\ldots,k\), we have \[\overline{Y}_{\bullet j}=\frac{1}{n_j} \sum_{i=1}^{n_j} Y_{ij} \sim N\left(\mu_j, \frac{\sigma^2}{n_j}\right).\]
We know the full distribution of the grand mean: \[\overline{Y}_{\bullet\bullet}=\frac{1}{n}\sum_{j=1}^k\sum_{i=1}^{n_j} Y_{ij} \sim N \left( \frac{1}{n}\sum_{j=1}^k n_j\mu_j, \frac{\sigma^2}{n}\right).\]
We also know the full distribution of \(\mathrm{SSTR}/\sigma^2\), but this was relegated to the appendix of Chapter 12. Refer to the first part of the proof found in LM Theorem 12.A.2.2. \(\mathrm{SSTR}/\sigma^2\) has a non-central chi-square distribution with \(k-1\) degrees of freedom and non-centrality parameter \[\gamma=\frac{1}{\sigma^2}\sum_{j=1}^k n_j\left(\mu_j-\mu\right)^2.\]
We know the full distribution of \(\mathrm{SSE}/\sigma^2\). The proof may be found in LM Theorem 12.2.3a. In particular, \(\mathrm{SSE}/\sigma^2\) has a chi-square distribution with \(n-k\) degrees of freedom.
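These moment results can be checked by simulation. Below is a minimal Monte Carlo sketch with invented batch means, sample sizes, and variance; it compares the simulated averages of SSTR and SSE with the formulas above.

```r
# Monte Carlo check (invented parameters) of E(SSTR) = (k - 1) * sigma^2 + sum(n_j * (mu_j - mu)^2)
# and E(SSE) = (n - k) * sigma^2, generating normal data for convenience.
set.seed(123)
mu.j  <- c(0, 0.5, 1)          # assumed batch means
n.j   <- c(4, 5, 6)            # assumed sample sizes
sigma <- 2
n     <- sum(n.j); k <- length(n.j)
mu    <- sum(n.j * mu.j) / n   # weighted overall mean

one.draw <- function() {
  y      <- rnorm(n, mean = rep(mu.j, times = n.j), sd = sigma)
  batch  <- rep(seq_len(k), times = n.j)
  ybar.j <- tapply(y, batch, mean)
  SSTR   <- sum(n.j * (ybar.j - mean(y))^2)
  SSE    <- sum((y - ybar.j[batch])^2)
  c(SSTR = SSTR, SSE = SSE)
}
sims <- replicate(10000, one.draw())
rowMeans(sims)
# Compare with the theoretical values:
c(SSTR = (k - 1) * sigma^2 + sum(n.j * (mu.j - mu)^2), SSE = (n - k) * sigma^2)
```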
Using the theory so far, you can already work on special cases for one-sample and two-sample problems.
Hypothesis testing involving normal population means
One null hypothesis that LM focuses on is the equality of means, i.e., \(\mu_1=\cdots=\mu_k=\mu_0\), where \(\mu_0\) is a fixed value. This fixed value is typically unknown, but it may be pre-specified. Under this null, we have further results under normality:
\(\mathrm{SSTOT}/\sigma^2\) has a chi-square distribution with \(n-1\) degrees of freedom.
\(\mathrm{SSTR}/\sigma^2\) has a chi-square distribution with \(k-1\) degrees of freedom. A proof which applies moment generating functions may be found in Appendix 12.A.1.
The statistic \[\frac{\mathrm{SSTR}/\left(k-1\right)}{\mathrm{SSE}/\left(n-k\right)}\] has an \(F\)-distribution with \(k-1\) numerator degrees of freedom and \(n-k\) denominator degrees of freedom. The proof may be found in LM Theorem 12.2.5a.
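In R, this \(F\) statistic is what `summary()` applied to an `aov` fit reports. Below is a minimal sketch, using invented data, that computes the ratio by hand and compares it with the built-in output.

```r
# One-way ANOVA F statistic computed by hand and via aov() (invented data).
set.seed(42)
y     <- rnorm(15, mean = rep(c(0, 0.5, 1), times = c(4, 5, 6)), sd = 1)
batch <- factor(rep(c("A", "B", "C"), times = c(4, 5, 6)))
n <- length(y); k <- nlevels(batch)

ybar.j <- tapply(y, batch, mean)
n.j    <- tapply(y, batch, length)
SSTR   <- sum(n.j * (ybar.j - mean(y))^2)
SSE    <- sum((y - ybar.j[as.character(batch)])^2)

F.stat  <- (SSTR / (k - 1)) / (SSE / (n - k))
p.value <- pf(F.stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
c(F = F.stat, p = p.value)

summary(aov(y ~ batch))   # should agree with the hand computation
```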
When you reject \(\mu_1=\cdots=\mu_k=\mu_0\), the test does not tell you which of the \(\mu_j\)'s are responsible for the lack of equality among the means. This leads to "searching" for the culprits, which is the basis of Section 12.3 on multiple comparisons using Tukey's method. These multiple comparisons go through all possible pairs of individual means. Because they are done after seeing a rejection of the null, without a clear idea of which direction to search in, they are called post hoc comparisons.
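In base R, Tukey's method is available through `TukeyHSD()` applied to an `aov` fit. Below is a minimal sketch of these post hoc pairwise comparisons, again with invented data.

```r
# Tukey's honestly significant difference comparisons after a one-way ANOVA (invented data).
set.seed(7)
y     <- rnorm(15, mean = rep(c(0, 1, 2), times = c(4, 5, 6)), sd = 1)
batch <- factor(rep(c("A", "B", "C"), times = c(4, 5, 6)))

fit <- aov(y ~ batch)
TukeyHSD(fit, conf.level = 0.95)   # all pairwise differences with adjusted confidence intervals
```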
On the other hand, if you do not plan on testing \(\mu_1=\cdots=\mu_k=\mu_0\) but instead focus on planned, pre-specified subhypotheses, then the desired contrast may be tested directly. Section 12.4 lays out such a procedure, which is valid under normality and homoscedasticity.
Modern interpretations of the analysis of variance
The biggest practical issues with the analysis of variance stem from the equal-variances assumption (called homoscedasticity) and the normality assumption. Without the equal-variances assumption, everything you have seen falls apart. There is a large body of research, including simulation studies, on these aspects.
It is rare to see the analysis of variance taught to economics and finance audiences. You will see this technique applied more in other social sciences like psychology. For economics and finance audiences, analysis of variance typically “drops out” after computing a linear regression. There is growing recognition that the many tests of hypotheses (especially those discussed in LM Chapters 9 to 14) are special cases of a linear regression. So, what makes the analysis of variance special?
Statisticians have debated what the analysis of variance is really about. Consider Speed (1987), together with its published discussions, and Gelman (2005), also with discussions. The latter paper probably distills the essence of the key idea behind the analysis of variance: it is a tool for summarizing groups of estimates when estimating complex models. In that sense, the focus is not on hypothesis testing at all. Gelman also wrote a compressed version of the latter paper for economists.
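The output reproduced below comes from `multcomp::glht()` applied to the fit `aov(age ~ training.grp, data = infant)`: two user-defined contrasts, of the pre-specified kind discussed alongside Section 12.4 above, are each tested. Below is a minimal sketch of how such output might be produced; the contrast weights shown are placeholders only, since the weights actually used are not visible in the output.

```r
# Sketch of testing a user-defined contrast with the multcomp package.
# The data set `infant` (response `age`, factor `training.grp`) is the one named
# in the fitted model shown in the output below; the contrast weights are placeholders.
library(multcomp)

fit <- aov(age ~ training.grp, data = infant)

# One row per contrast; one column per level of training.grp.
K <- rbind("1" = c(1, -1, 0, 0))   # placeholder weights; length must equal nlevels(infant$training.grp)
summary(glht(fit, linfct = mcp(training.grp = K)))
```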
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: User-defined Contrasts
Fit: aov(formula = age ~ training.grp, data = infant)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
1 == 0 -1.2517 0.8756 -1.43 0.169
(Adjusted p values reported -- single-step method)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: User-defined Contrasts
Fit: aov(formula = age ~ training.grp, data = infant)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
1 == 0 -0.6417 0.9183 -0.699 0.493
(Adjusted p values reported -- single-step method)