Welcome to Mathematical Statistics course webpage!

Course description

Mathematical statistics is one language you can use to discuss statistical topics and applications. In this course, we discuss how to construct estimation and inference procedures that can be applied to real data, and what properties are desirable in the design of these procedures. Along the way, we discuss how these procedures can shed light on research questions you may have.

This course webpage will serve as a “living” class syllabus. Course materials (notes, homework assignments, etc.) will be linked to from here and will be regularly updated.

Goals and prerequisites

The main goal of the course is for you to gain a deeper conceptual, theoretical, and computational understanding of how statistics is applied in scientific, professional, and industrial contexts. I want you to know what statistical methods are available to you and understand in what contexts you should be applying these methods. Although this course may feature statistical methods which seem too simple, understanding these simpler methods is ultimately a step towards gaining confidence in properly using more complex ones.

For the prerequisites of the course, you should have already taken and passed a first course in probability theory and a course in differential and integral calculus. We will review some aspects of these prerequisites by applying what you have learned to what are called normal models.

About the course instructor and teaching assistant

My name is Andrew Adrian Pua. I am teaching this course to economics and finance majors, along with Huihui Li (data science majors) and Wei Zhong (statistics majors). Wei Zhong is the main coordinator of the course.


I am available to answer questions in four ways:

  • in class: Ask immediately before, during, or after the lecture.

  • via the public mathstat2023 DingTalk group: Asking questions in the DingTalk group is for the benefit of everyone.

  • emailing me at andrewypua at outlook dot com: Expect responses within two working days. If I have not responded, please remind me through email or in class. You may also choose to send an email to set up an appointment. How do you write an email properly? Here is some good advice.

  • physically at the Economics Building B405 during office hours Tuesdays and Thursdays 1300 to 1400 and 1700 to 1900.

Teaching assistant

Your teaching assistant (TA) is Zheng Zhesheng. His office hours are TBA.

Course textbook and references

The coordinator of the course requires the use of the following textbook (I will call it LM):

Larsen, R. J., & Marx, M. L. (2018). An Introduction to Mathematical Statistics and Its Applications (6th ed.). Pearson.

The book is available as a reprinted Chinese edition from the China Machine Press 机械工业出版社. The Chinese title is 数理统计及其应用(英文版,原书第6版). I bought a secondhand copy at 多抓鱼 for 43 yuan, but it might be rare. Other copies are available at 京东 and 淘宝.

The course focuses on Chapters 5, 6, 7, 9, 10, and 12. In the notes and during the lecture, I will make references to other chapters of the book.

The coordinator also suggests the following main English language references:

  • (More graduate level) Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Brooks/Cole.
  • (Undergraduate to graduate level) Hogg, R. V., McKean, J. W., & Craig, A. T. (2013). Introduction to Mathematical Statistics. Pearson College Division.

There are other references out there and the choice of reference depends on your taste. Follow Roger Koenker’s advice: “In my experience it is always better to find a book that seems slightly below your comfort level and then try to conscientiously read it – by which I mean fill in the details of the arguments along the way and do a reasonable selection of the problems.” Personally, I learned the hard way that not everything written in a book is always right.

  • (Best read as a supplement, a reviewer comments that this book “demonstrates that statistics is not merely a branch of mathematics”) Cox, D. R. and Donnelly, C. A. (2011). Principles of Applied Statistics. Cambridge University Press.

  • (Thinner than standard books, with coverage of more recent topics; the author writes the book for the mathematically literate and completely avoids normal models and parametric families) Arias-Castro, E. (2022). Principles of Statistical Analysis: Learning from Randomized Experiments. Cambridge University Press. (Free, legal pre-publication version here, R notebook here)

  • (My personal favorite, undergraduate introductory level but makes you think more) Freedman, D. A., Pisani, R., and Purves, R. (1998). Statistics (4th ed.). W. W. Norton & Co.

  • (Undergraduate introductory level, but very different from the usual business statistics book) Stine, R. A. and D. P. Foster. (2011). Statistics for Business: Decision Making and Analysis (3rd ed.). Pearson.

  • (My personal favorite, short chapters, undergraduate to graduate level) Wasserman, L. A. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer-Verlag.

  • (Undergraduate level, first third of the book may be relevant) Goldberger, A. S. (1998). Introductory Econometrics. Harvard University Press.

  • (My personal favorite, undergraduate to graduate level targeted towards economics, first half of the book may be relevant) Goldberger, A. S. (1991). A Course in Econometrics. Harvard University Press.

  • (Short chapters, meant for undergraduate students) Dekking, F. M., Kraaikamp, C., Lopuhaä, H. P., and Meester, L. E. (2005). A Modern Introduction to Probability and Statistics. Springer-Verlag. Available within XMU only.

  • (Standard, undergraduate to graduate level, personally prefer this over LM) DeGroot, M. H. and Schervish, M. J. (2011). Probability and Statistics (4th ed.). Pearson.

  • (Standard, undergraduate to graduate level) Devore, J. L., K. N. Berk, and M. A. Carlton. (2021). Modern Mathematical Statistics with Applications (3rd ed.). Springer-Verlag. Available within XMU only.

  • (Thinner than standard books, undergraduate to graduate level) Abramovich, F. and Ritov, Y. (2013). Statistical Theory: A Concise Introduction. CRC Press.

  • (The first half of Part I is directly tied to the course, but the remainder may be of applied interest) Kenett, R. S. and Zacks, S. (2021). Modern Industrial Statistics: With Applications in R, MINITAB and JMP (3rd ed.). John Wiley & Sons.

  • (Undergraduate to graduate level targeted towards economics) Amemiya, T. (1994). Introduction to Statistics and Econometrics. Harvard University Press.

  • (Graduate level but tailored to the social sciences, suitable for undergraduates with stronger backgrounds) Aronow, P. M., & Miller, B. T. (2019). Foundations of Agnostic Statistics. Cambridge University Press.

  • (Graduate level but targeted towards economics and finance) Gallant, A. R. (1997). An Introduction to Econometric Theory: Measure Theoretic Probability and Statistics with Applications to Economics. Princeton University Press.

  • (Not for everyone, unconventional but dated) Fraser, D. A. S. (1958). Statistics: An Introduction. John Wiley & Sons.

  • (Not for everyone, can be demanding even for the familiar, with emphasis on the likelihood) Fraser, D. A. S. (1976). Probability & Statistics: Theory and Applications. Duxbury.

  • (Not very standard, heavily uses linear algebra, undergraduate to graduate level) Stone, C. J. (1996). A Course in Probability and Statistics. Wadsworth.

  • (Graduate level) Bickel, P. J. and Doksum, K. A. (2015). Mathematical Statistics: Basic Ideas and Selected Topics Volume I. CRC Press.

Course grading

There are five components of the assessment and grading:

  1. Closed-book final exam: This component is worth 40% of the final grade. Coverage is comprehensive. Bring a non-programmable scientific calculator.

  2. Closed-book midterm exam: This component is worth 30% of the final grade. Coverage is likely Chapters 5 and 6, but this is yet to be confirmed. Bring a non-programmable scientific calculator.

  3. Homework, in-class work, work in pursuit of the final project, and statistics diary: This component is a catch-all for the work you put into the course regularly. The large number of items in this component should reduce the pressure to copy someone else’s solutions to the homework (or to let someone else copy yours). I encourage you to make your own mistakes on the homework. This component is worth 15% of the final grade.

    1. Homework is based on material from the textbook and other references. It is to be handed in by the specified date and time.
    2. In-class work is based on activities and worksheets submitted during class time.
    3. Work in pursuit of the final project involves documenting the work leading to the final project: selecting a topic, writing up your understanding of the topic, preliminary results, etc. More details to follow later.
    4. The statistics diary is a modified version of Andrew Gelman’s suggestion in his blog. His blog has examples of students’ work on their diaries. Another example of a diary can be found here. Submit entries at our SPOC website.
  4. Final project: The final project is a short English-language paper on a topic agreed upon between you and the instructor. This component is worth 10% of the final grade. Details about the project can be found here.

  5. Quiz: Quizzes are unannounced but designed to be short to facilitate immediate feedback. You are allowed to refer to the notes and the textbook. This component is worth 5% of the final grade.

Course policies

  1. Follow instructions. Ask for clarification when something is unclear.
  2. I do not take attendance regularly, but if you are marked or found absent six or more times, then you automatically are not allowed to take the final exam. You are invited to sleep in class if necessary. If you need to ask for leave, completely fill out the relevant form here, scan it along with supporting evidence/documents, email it to me, and I will take care of the rest. Do this as early as possible. If that is not possible, send me an email describing the situation and we can take care of the paperwork later on.
  3. You may use electronics such as a laptop or a tablet but not phones. It is easy to be distracted, so do not open anything on your laptop or tablet aside from class-related material. If I catch you doing something unrelated to class, you will be marked absent immediately.
  4. With new technologies making it easier for anyone to avoid hard work, it is becoming more difficult to determine whether a student actually knows or understands anything related to the course, and harder to assess each student’s individual contribution to submitted work. Therefore, using AI-based tools, translation software, solution manuals (especially those explicitly for instructors only), or other explicitly forbidden material will have severe consequences. As always, cheating in any form will also have severe consequences.
  5. In her syllabi, Deirdre McCloskey writes, “All grades are final. No amount of pleading will change your grade unless I make a mistake in adding up grades. Life is much more unfair than this!” Sending me a message pleading or begging will not help.

Information about using the materials

Ownership and citations

Lecture Materials for Mathematical Statistics (2023 version) by Andrew Adrian Pua is licensed under Attribution-ShareAlike 4.0 International

To cite these slides, please use

Pua, Andrew Adrian (Year, Month Day). Lecture Materials for Mathematical Statistics (2023 Version). https://mathstat.neocities.org/.

Finding typos or unclear portions

If you find typos or unclear portions in the notes, please let me know. I will be monitoring your contributions during the semester and I will acknowledge you in these notes. If you make substantial contributions, I will treat you to some non-alcoholic drinks at Cafe Avion located inside the university.

Resources on time management and learning to learn

I would ask you to take this opportunity to reexamine how you learn and study. It does not matter if your motivation is only to pass the exam or something greater, though it would be good for society if you studied for something much greater. I have found the following resources helpful to students I have taught in the past. Of course, I am not sure they will work for you, but do keep an open mind.

Course diary

June 6 and 8

  1. Going back to our first and second cookie datasets, our M&M dataset, slides with analysis
  2. Practice exercises, hints and partial solutions
  3. Notes on the bootstrap

May 30, June 1, and 6


  1. Extending the binomial distribution
  2. Testing equality of proportions
  3. Testing whether a pre-specified distribution is compatible with the data
  4. Testing for independence

Assigned readings: Testing \(H_0: \ p_X=p_Y\) (Section 9.3), Introduction (Section 10.1) The Multinomial Distribution (Section 10.2), Goodness-of-Fit Tests: All Parameters Known (Section 10.3), Goodness-of-Fit Tests: All Parameters Unknown (Section 10.4), Contingency Tables (Section 10.5), Notes on goodness-of-fit testing

Activities: Does the color distribution of M & M’s in China match the color distribution in the US (at least based on company information in 2008)?


  • Test the goodness-of-fit of the Poisson for Case Study 4.2.2 and the goodness-of-fit of the exponential for Case Study 4.2.4.
  • All exercises in LM Section 10.2
  • Pay attention to Case Study 10.3.2 and Example 10.3.1 (for the latter, you encountered something similar for the median test in Example 5.3.2)
  • All exercises in LM Section 10.3: Pay attention to the computational shortcut in 10.3.1, the different ways of setting up a test in 10.3.4 and 10.3.5.
  • All exercises in LM Section 10.4: Pay attention to 10.4.14 as the distribution here may not be very familiar to you.
  • Pay attention to Case Study 10.4.3 as the data shows up as discrete yet the goodness of fit test is for checking whether normality is compatible with the data.
  • Pay attention to Case Study 10.5.2 as the data are originally continuous variables then converted into two categorical variables!
  • All exercises in LM Section 10.5: Most questions here are really very typical. Perhaps the slightly different one is 10.5.5.

May 30:

  • Examples of goodness-of-fit situations: features, what to pay attention to, similarities and differences
  • The form of the goodness-of-fit statistic and its distribution in large samples
  • In-class exercise to test whether the color distribution of M&M’s in China matches the color distribution in the US (at least based on company information in 2008). M&M data: Every student had one dataset and was asked to calculate the test statistic, the critical value at the 5% level, and, if possible, a \(p\)-value.
  • R code based on one of your classmates’ dataset:
observed <- c(1, 3, 4, 2, 4, 1)
null.prob <- c(0.13, 0.13, 0.24, 0.2, 0.16, 0.14)
chisq.test(observed, p = null.prob) # Note there is a warning
qchisq(0.95, 5) # critical value which should match the table from textbook

June 1:

  • Why are there warnings when we apply goodness-of-fit testing to a single M&M dataset? We can also apply the test to the totals for the entire class. Complete dataset here
# Load M&M dataset
mandm <- read.csv("mandm-01.csv")
# obtain totals
observed <- apply(mandm[, 2:7], 2, sum)
null.prob <- c(0.13, 0.13, 0.24, 0.2, 0.16, 0.14)
chisq.test(observed, p = null.prob)
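The warning can be traced to the rule of thumb for the chi-square approximation: every expected count \(np_{i0}\) should be at least 5. A quick check (the class total of 500 candies below is hypothetical):

```r
# Rule of thumb: the chi-square approximation is reasonable when
# every expected count n * p_i0 is at least 5.
null.prob <- c(0.13, 0.13, 0.24, 0.2, 0.16, 0.14)

# One student's bag from the May 30 exercise (15 candies in total)
observed.single <- c(1, 3, 4, 2, 4, 1)
expected.single <- sum(observed.single) * null.prob
expected.single          # every cell falls below 5, hence the warning

# Hypothetical class total of 500 candies: every cell clears 5
expected.class <- 500 * null.prob
all(expected.class >= 5)
```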
  • Applying chi-square testing to test goodness-of-fit for continuous distributions: Not very straightforward and many arbitrary choices are involved.
  • How to do equiprobable partitioning: complicated computationally but may be preferable so that you can achieve \(np_{i0}\geq 5\)
# Example 10.3.1 equiprobable partitioning
# finding right endpoint to guarantee probability equal to 0.2
f <- function(c) 3*c^2-2*c^3-0.2 
# use uniroot() to find a root in an interval which is known to us 
# check Table 10.3.5 as to why this interval is a good place to search
uniroot(f, c(0.2, 0.4))
  • Choosing an appropriate plug-in when the proposed statistical model depends on unknown parameters: Ideally, you use a consistent estimator with low variance as a plug-in. Typically, the MLE would be the right choice. But as discussed in class, it should be the MLE applied to the grouped data rather than the ungrouped data. In practice, people use the latter!
  • You also pay a price for not knowing the parameters! The degrees of freedom are reduced by the number of parameters estimated.
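Since chisq.test() knows nothing about estimated parameters, the reduced degrees of freedom have to be imposed by hand. A sketch with hypothetical Poisson count data (the plug-in here is the ungrouped-style MLE, the common practice mentioned above):

```r
# Goodness-of-fit to a Poisson whose mean is estimated from the data,
# so the degrees of freedom are reduced by 1 (the number of estimated
# parameters). The counts are made up for illustration.
counts <- c(20, 30, 25, 15, 10)      # observed frequencies of 0, 1, 2, 3, 4+
n <- sum(counts)
lambda.hat <- sum(0:4 * counts) / n  # rough plug-in, treating 4+ as exactly 4

p0 <- dpois(0:3, lambda.hat)
p0 <- c(p0, 1 - sum(p0))             # last cell collects 4 and above

expected <- n * p0
chi.sq <- sum((counts - expected)^2 / expected)
df <- length(counts) - 1 - 1         # (k - 1) minus 1 estimated parameter
p.value <- pchisq(chi.sq, df, lower.tail = FALSE)
```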
  • Applications: Benford’s law (a nice elementary explanation of why the law exists in the first place is found here; a nice example of using Benford’s law for detecting corruption in election campaigns, which also discusses what kinds of numbers Benford’s law potentially applies to, can be found here), Case Study 10.3.3 (data which are too good to be true)
  • Application to testing independence of two categorical variables: Why is this an application? Convert the problem into a goodness-of-fit testing problem involving one categorical variable. Finished with an introduction to the sex bias article.
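In R, chisq.test() applied to a matrix carries out this independence test directly; the 2-by-2 counts below are made up for illustration:

```r
# Independence of two categorical variables: chisq.test() on a matrix
# treats it as a contingency table and computes expected counts from
# the margins. The counts here are made up for illustration.
tab <- matrix(c(30, 20,
                10, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(Row = c("A", "B"), Col = c("Yes", "No")))
chisq.test(tab, correct = FALSE)  # df = (2 - 1) * (2 - 1) = 1
```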

May 4, 9, 11, 16, 18, 23, and 25


  1. Distributions connected to the normal
  2. Normal one-sample model, again
  3. Normal multi-sample models
  4. Analysis of variance
  5. Multiple comparisons vs contrasts

Assigned readings:

  1. Section 3.2.2 of Notes on normal models, Notes on the analysis of variance
  2. Deriving the Distribution of \(\dfrac{\overline{Y}-\mu}{S/\sqrt{n}}\) (Section 7.3), Drawing Inferences About \(\mu\) (Section 7.4), Testing \(H_0: \ \mu_X=\mu_Y\) (Section 9.2), Introduction (Section 12.1), The \(F\)-Test (Section 12.2), Testing Subhypotheses with Contrasts (Section 12.4)
  3. Drawing Inferences About \(\sigma^2\) (Section 7.5), Testing \(H_0: \ \sigma^2_X=\sigma^2_Y\) – The \(F\)-Test (Section 9.3)
  4. Confidence Intervals for the Two-Sample Problem (Section 9.5), Multiple Comparisons: Tukey’s Method (Section 12.3)
  5. Contrasts (Section 12.4) and data transformations (Section 12.5)


  • Homework 04, almost complete solutions to HW04
  • Repeat the in-class exercise but this time for \(V_2=\dfrac{1}{2}Y_1-\dfrac{1}{2}Y_2\) and everything else held constant. What happens to your answers? Students often forget the Jacobian, but you can indirectly recover it if you follow the exercise. Try it!
  • (Optional) Look back at the in-class exercise. Is the orthogonal transformation unique?
  • (Extremely painful, but only requires perseverance and an extension of the normal distribution to three dimensions) Repeat the in-class exercise but this time for \(n=3\) and consider the transformation of \(Y_1,Y_2,Y_3\) into \(V_1,V_2,V_3\) as follows: \[\begin{eqnarray*}V_1 &=& \frac{1}{\sqrt{3}}Y_1+\frac{1}{\sqrt{3}}Y_2+\frac{1}{\sqrt{3}}Y_3 \\ V_2 &=&-\frac{1}{\sqrt{2}}Y_1+\frac{1}{\sqrt{2}}Y_2 \\ V_3 &=&-\frac{1}{\sqrt{6}}Y_1-\frac{1}{\sqrt{6}}Y_2+\frac{2}{\sqrt{6}}Y_3 \end{eqnarray*}\] You also have to think about a multivariate normal distribution instead of a bivariate normal.
  • LM Exercises 7.3.1, 7.3.14, 7.3.15: More about practicing quick “tricks” with integration
  • LM Exercises 7.3.7, 7.3.8, 7.3.9, 7.3.11, 7.3.12, 7.4.1, 7.4.2, 7.5.1, 7.5.2, 7.5.3, 7.5.4: More about reading tables, practice this too!
  • LM Exercises 7.3.2 to 7.3.6, 7.5.5 to 7.5.8, 7.5.11 to 7.5.14: These are about the theoretical connections provided by knowing the chi-squared distribution or random variables having a chi-squared distribution. 7.5.5 is linked to the work of Wilson and Hilferty (1931). Related to 7.5.5, ask why this exercise was relevant in the past, when software was not widely available!
  • LM Exercises 7.4.4 to 7.4.6, 7.4.12, 7.4.13, 7.4.15: confidence intervals from a theoretical point of view
  • LM Exercises 7.4.7 to 7.4.11, 7.4.14, 7.4.16: These exercises are about calculation of confidence intervals using the data. But pay attention to interesting questions like 7.4.9 and 7.4.16.
  • LM Exercises 7.4.17 to 7.4.22: hypothesis testing exercises
  • LM Exercise 7.4.23 and 7.4.27: probably the most interesting exercises related to hypothesis testing
  • LM Exercises 7.4.24 to 7.4.26: a good place to apply R, how do you simulate from the indicated distributions?
  • LM Exercises 7.5.9, 7.5.10, 7.5.15 to 7.5.17: calculating confidence intervals and testing claims for \(\sigma^2\) using data
  • LM Theorem 7.A.2.1 in the Appendix to Chapter 7: Work out the details.
  • LM Appendix 7.A.3: Work out the details, especially the form of the GLRT.
  • LM Exercises 12.2.1 to 12.2.6: All computations
  • LM Exercises 12.2.7, 12.2.9: Exercises to check whether you understood the algebraic relationships
  • LM Exercise 12.2.8: An interesting question meant for you to think about the assumptions
  • LM Exercise 12.2.10: We did this in class for the case of 3 groups.
  • LM Exercises 12.2.11 to 12.2.13: Connections to Section 9.2
  • All exercises in Chapter 9
  • All exercises in LM Sections 12.3 and 12.4
  • LM Exercise 12.5.2: presentation here is a bit different, pay attention to the assumptions of ANOVA

May 4:

  • Mostly the big picture – Why should we care about finding the distribution of \(\dfrac{\overline{Y}-\mu}{S/\sqrt{n}}\)?
  • Summarizing the key results: Under IID normality, we have a pivotal quantity \(\dfrac{\overline{Y}-\mu}{S/\sqrt{n}} \sim T_{n-1}\) which can be used to construct confidence intervals, \(\dfrac{\left(n-1\right)S^2}{\sigma^2}\sim \chi^2_{n-1}\), and the independence of \(\overline{Y}\) and \(S\). The latter is NOT intuitive at all and is central to the statistical analysis of normal data.
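The independence of \(\overline{Y}\) and \(S\) can at least be probed by simulation; a sketch (sample size and parameters chosen arbitrarily, and zero correlation is only a symptom of independence, not a proof):

```r
# Under IID normality, the sample mean and sample SD are independent,
# so across many simulated samples their correlation should be near 0.
set.seed(123)
sims <- replicate(10^4, {
  y <- rnorm(20, mean = 5, sd = 2)
  c(mean(y), sd(y))
})
cor(sims[1, ], sims[2, ])          # close to 0

# Contrast with a skewed distribution, where the two are dependent
sims.exp <- replicate(10^4, {
  y <- rexp(20, rate = 1)
  c(mean(y), sd(y))
})
cor(sims.exp[1, ], sims.exp[2, ])  # clearly positive
```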

May 9:

  • In-class exercise meant to provide a way to prove that \(\overline{Y}\) and \(S^2\) are statistically independent under IID normality, at least for \(n=2\). The more general idea for the proof may be found in the Section 3.2.2 of Notes on normal models or in the Appendix for Chapter 7.
  • The beginnings of the analysis of variance (ANOVA) table: Decompose the sum of squares for the data \(\sum_{i=1}^n Y_i^2\) into two independent parts \(V_1^2\) and \(\displaystyle\sum_{i=2}^n V_i^2\). The algebraic decomposition does not require normality, but the independence does. In addition, \(\displaystyle\sum_{i=2}^n V_i^2\) is a sum of independent random variables as well.
  • How to make sense of an ANOVA table in terms of the degrees of freedom and the \(F\)-ratio (or \(t\)-ratio)?
  • In terms of the in-class exercise, \(V_1=\sqrt{2}\cdot\overline{Y}\) and \(V_2^2=S^2\). Furthermore, \(V_1\sim N\left(\sqrt{2}\mu,\sigma^2\right)\), \(V_2\sim N\left(0,\sigma^2\right)\), and \(V_1\) and \(V_2\) are independent. We can show that \[\left(\dfrac{\overline{Y}-\mu}{S/\sqrt{2}}\right)^2=\dfrac{\left(\dfrac{V_1-\sqrt{2}\mu}{\sigma}\right)^2/1}{\left(\dfrac{V_2^2}{\sigma^2}\right)/1}\] has an \(F\) distribution with 1 numerator degree of freedom and 1 denominator degree of freedom.
  • The previous idea extends to \(n>2\), with a similar table and a similar distributional result. In particular, we can show that \[\left(\dfrac{\overline{Y}-\mu}{S/\sqrt{n}}\right)^2=\dfrac{\left(\dfrac{V_1-\sqrt{n}\mu}{\sigma}\right)^2/1}{\left(\dfrac{\sum_{i=2}^n V_i^2}{\sigma^2}\right)/\left(n-1\right)}\] has an \(F\) distribution with 1 numerator degree of freedom and \(n-1\) denominator degrees of freedom.
  • A \(t\)-distributed random variable is related to an \(F\)-distributed random variable. In particular, the square of a \(t\)-distributed random variable with \(n\) degrees of freedom has the same distribution as an \(F\)-distributed random variable with 1 numerator degree of freedom and \(n\) denominator degrees of freedom.
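A quick numerical check of this \(t\)-\(F\) relationship in R (with \(n=10\) chosen arbitrarily):

```r
# If T ~ t_n, then T^2 ~ F_{1, n}. Two consequences worth checking:
n <- 10

# the squared upper-2.5% t quantile equals the upper-5% F quantile
qt(0.975, df = n)^2
qf(0.95, df1 = 1, df2 = n)

# and the CDFs agree: P(T^2 <= x) = P(|T| <= sqrt(x)) = P(F <= x)
x <- 2.7
pt(sqrt(x), n) - pt(-sqrt(x), n)
pf(x, 1, n)
```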

May 11:

  • Summarizing Chapter 7 once more: Pay attention to the key results and the pivotal quantities which are now available. Pay attention to the new distributions connected to the normal – are they symmetric? what is their support? what are their moments? how do we calculate probabilities?
  • Bridge to next chapters: the ANOVA table
  • In-class exercise about testing the equality of means for three groups: The main point is for you to become more confident about your understanding of the topics connecting Chapters 7, 9 and 12. The algebra, which may be tedious, is for improving your ability to interpret expressions and get a sense of where things could be going.

May 16:

  • Finished the in-class exercise: intuition for the form of the pivotal quantity, how to show the distribution of the pivotal quantity
  • Connect with the notation in Sections 12.1 and 12.2: focus on the meaning of the symbols rather than the symbols themselves
  • Assigned data collection task which will be Quiz 02

May 18:

  • Review of what we have done and connecting to the book
  • Key ideas behind ANOVA: testing the null of equality of means is really looking into different “cuts” of the sum of squares of every observation; emphasized the relative nature of the test statistic; how the numerator reflects between-sample variation and how the denominator reflects within-sample variation
  • Different ways to think about the ANOVA tables and computational shortcuts
  • Why don’t economics and finance use ANOVA nowadays?
  • What “fishing” and/or data snooping does to hypothesis tests: What is the problem and how do we make adjustments?

May 23:

  • How exactly do we adjust for data snooping (in the sense of testing multiple hypotheses or constructing multiple confidence intervals, all using the same data)? Introduced how to make \(p\)-value adjustments using the most basic and general approach, due to Bonferroni. Bonferroni corrections work in situations well beyond ANOVA, but they are extremely conservative.
  • Key aspects of the Bonferroni correction are the probability of a union of events being upper bounded by the sum of individual probabilities and that \(p\)-values have a \(\mathsf{U}(0,1)\) distribution under the null.
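In R, the Bonferroni correction is available through p.adjust(); the raw \(p\)-values below are made up for illustration:

```r
# Bonferroni: multiply each raw p-value by the number of tests m (capped
# at 1); rejecting when the adjusted p-value is below alpha controls the
# familywise error rate at alpha. The raw p-values are made up.
p.raw <- c(0.004, 0.020, 0.049, 0.300)
p.adjust(p.raw, method = "bonferroni")
# equivalent to pmin(length(p.raw) * p.raw, 1)
```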
  • LM Section 12.3 introduces an alternative called Tukey’s method. The key is to construct a studentized range. But this studentized range has to have a very particular form and is restricted to the normal case. Worked on those details, studied and explained the proof of LM Theorem 12.3.1.
  • Constructing simultaneous confidence intervals for all pairwise differences looks similar to the confidence intervals in Section 9.5. But they differ in the sense that Tukey’s approach is “honest” because it accounts for the fact that you went “fishing” for a significant difference between any two groups.
  • Worked on Case Study 12.3.1: Pay attention to the quantiles of the Tukey distribution, especially the notation. If you are using unfamiliar tables, you have to check how the tables were created! How do you create Figure 12.3.2?
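Tukey’s method is implemented in base R via aov() followed by TukeyHSD(); a sketch on simulated three-group data (the group means and sizes are made up, not the Case Study 12.3.1 numbers):

```r
# Simultaneous confidence intervals for all pairwise differences via
# Tukey's studentized range. Data simulated for illustration.
set.seed(1)
groups <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = rep(c(5, 5, 7), each = 10), sd = 1)
fit <- aov(y ~ groups)
TukeyHSD(fit, conf.level = 0.95)
# the critical value behind the intervals comes from the studentized
# range distribution: qtukey(0.95, nmeans = 3, df = 27)
```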

May 25:

  • Reminders about the project
  • How are the confidence intervals in LM Section 12.3 different from LM Section 9.5?
  • Reminder about the special case of the test statistic for LM Section 12.2 when there are only two groups
  • Discuss the R implementation of Case Study 12.3.1: document some weird behavior of some commands in R
  • What if you are not “fishing” and you actually already know, even before seeing the data, what subhypotheses to test? Enter the idea of contrasts. The distribution of the contrast estimator under the null repeats what you have seen before for the simplest case of testing \(\mu=\mu_0\) where \(\sigma^2\) is unknown. There is a \(t\)-distributed and an \(F\)-distributed version of the test.
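For a concrete version of the \(t\)-distributed test, here is a by-hand sketch on simulated data; the groups, sample sizes, and contrast weights are all made up for illustration:

```r
# Contrast test by hand: for pre-specified weights c_j with sum(c_j) = 0,
# C.hat = sum(c_j * ybar_j) has standard error sqrt(MSE * sum(c_j^2 / n_j)),
# and C.hat / SE is t-distributed with N - k df under the null.
set.seed(2)
n.j <- c(10, 10, 10)
y <- list(rnorm(10, mean = 5), rnorm(10, mean = 5), rnorm(10, mean = 6))
ybar <- sapply(y, mean)
s2 <- sapply(y, var)
mse <- sum((n.j - 1) * s2) / (sum(n.j) - 3)  # pooled within-group variance

cj <- c(1, 1, -2) / 2   # average of groups 1 and 2 versus group 3
c.hat <- sum(cj * ybar)
se <- sqrt(mse * sum(cj^2 / n.j))
t.stat <- c.hat / se
2 * pt(-abs(t.stat), df = sum(n.j) - 3)      # two-sided p-value
```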
  • Ended with an activity about the color distribution of M&M’s: How to set up the null, what is the intuitive comparison you have to do to determine if the data are compatible with a pre-specified distribution?

Apr 18, 20, 23, 25, 27, and May 4


  1. Testing claims and calibrating decision rules
  2. The \(p\)-value, again
  3. The likelihood ratio test

Activities: Predicting the number and suit of 12 cards randomly chosen from a deck of 52 cards

Assigned readings: The Decision Rule (Section 6.2), Type I and II Errors (Section 6.4), A Notion of Optimality: The Generalized Likelihood Ratio (Section 6.5), Taking a Second Look at Hypothesis Testing (Statistical Significance versus “Practical” Significance) (Section 6.6), Some notes on hypothesis testing

Exercises in LM:

  • 6.2.2: What are the null and alternative hypotheses here? Seeing \(\alpha=0.06\) may feel strange.
  • 6.2.3, 6.2.5: To check whether you understood what changing \(\alpha\) or changing the type of alternative could mean, try to explain your answer rather than just giving a yes or no.
  • 6.2.4: Typical question similar to Examples 6.2.1 and 6.2.2, but nicely articulates (what I also did in class) what are held constant when evaluating a claim like \(\mu=\mu_0\).
  • 6.2.6: Look carefully at the critical region which was proposed.
  • 6.2.7: Probably the most interesting exercise available for this section! Do it, especially (b).
  • 6.2.9: May feel mindless.
  • 6.2.10 and 6.2.11: Need to set up the appropriate null and alternative hypotheses.
  • 6.3.1: More often than not (not just in the exam but in actual practice), you will not even see the setup in (a). You will be asked to set things up in a way that reflects (a).
  • 6.3.2: This is an interesting context for an exercise.
  • 6.3.5: This is an exercise meant to give some connection between confidence intervals and hypothesis testing, but both have different use cases.
  • Revisit Case Study 4.3.1, Examples 5.3.2 and 5.3.3. You have seen these formulated as confidence interval problems, but these could be formulated as hypothesis testing problems. Try the reformulation for yourself! Connect to Exercises 6.3.3 and 6.3.5.
  • Case Study 6.3.2 and Exercise 6.3.6 should be done together.
  • 6.3.7 and 6.3.9 are typical questions where the decision rule is already provided. Your job is to assess the probabilities of both errors.
  • 6.3.8 is very interesting but not used a lot in practice. It gives you a way to deal with the discrete nature of the data.
  • All exercises in Section 6.4 should be done. I highlight some here: 6.4.10 (similar to FastBurger), 6.4.11, 6.4.12 (is this binomial?), 6.4.15 and 6.4.19 (curious why Type II error was emphasized), 6.4.16 (two sample binomial case). Perhaps the most interesting exercises are 6.4.21 and 6.4.22.
  • All exercises in Section 6.5 should be done. 6.5.1 and 6.5.2 are typical exercises which also deal with non-normal, non-binomial cases. 6.5.3 and 6.5.4 are partially solved in the notes. Solutions for 6.5.5 and 6.5.6 are relatively hard to write down.

Apr 18 (remaining 45 minutes):

  • Recap of your first encounter with testing claims and a \(p\)-value calculation
  • How do we actually set up a hypothesis testing problem? Be very aware that there is always a model in the setup!
  • We need to determine the null (the status quo), the alternative (something has changed), a test statistic whose behavior is known under the null (preferably one whose distribution under the null is pivotal), and a standard for deciding whether the data support the null or the alternative.
  • This standard needs to be designed properly. I introduced a way to design this standard through a decision rule which allows you to control the probability of a Type I error to a pre-specified level.

Apr 20:

  • Recap of the ideas behind designing a decision rule for hypothesis testing: The observed data did not play a role in the design of the decision rule.
  • What is a significance level and why is the \(p\)-value called the observed significance level?
  • It is difficult to make both the Type I and Type II error probabilities small simultaneously. The Type II error probability depends on the alternative (which covers a much wider range than the null). The null and the alternative are not treated symmetrically!
  • When do you actually use hypothesis testing? Usually done in the context of trying to make discoveries or for probing what could be present amidst the noise.
  • It is easy to manipulate the “template” for hypothesis testing. What happens when you design a decision rule that guarantees \(\alpha=0.05\) but you “peek” into the data to fish for a “discovery”? The decision rule designed was “Reject the null when \(\bigg| \dfrac{\overline{Y}-494}{124/\sqrt{86}} \bigg| \geq 1.96\)” so that \(\alpha=0.05\). But if you do not reject the null, you “peek” into the data once again and check whether \(\dfrac{\overline{Y}-494}{124/\sqrt{86}} > 1.65\). If the latter condition is satisfied, you count that as a rejection. The original guarantee of \(\alpha=0.05\) no longer holds! Check the simulation below.
# Function with no arguments, but could be modified
# This code is likely to be slow
# Context is LM Example 6.2.1 and 6.2.2
peeking <- function() {
  # The null is true
  data <- rnorm(86, 494, 124)
  # Calculate test statistic
  test.stat <- (mean(data) - 494)/(124/sqrt(86))
  # Two-tailed test first
  if (abs(test.stat) > qnorm(0.975)) {
    return(1) # 1 means reject null
  } else if (test.stat > qnorm(0.95)) {
    # Then one-tailed test
    return(1) # 1 means reject null
  } else {
    return(0) # 0 means fail to reject null
  }
}
# Collect all results
results <- replicate(10^4, peeking())
# Empirical rejection rate; will exceed the nominal 0.05
mean(results)
  • We did an activity where you predicted the number and the suit of 12 cards drawn randomly from a deck of 52 cards.

    • The drawn cards were: 9 clubs, 9 diamonds, 9 clubs (again), 6 clubs, A diamonds, 3 spades, 9 diamonds (again), 3 diamonds, K diamonds, Q diamonds, 5 spades, K clubs.
    • Some students misheard ace of diamonds and thought it was eight of diamonds.
    • I was not very specific about what counts as a correct prediction. So I am going to be specific now. The ordering of the drawn cards should not matter. For example, if six of clubs came out on the 4th draw and you placed six of clubs as your 12th prediction, that still counts as a correct prediction. We had duplicates in the draw. If you wrote 9 clubs twice, then that counts as two correct predictions. If you only wrote it once, then that just counts as one correct prediction.
    • The activity is to determine whether you have ESP or not. What is the statistical model under the null? What is the statistical model under the alternative? Try calculating a \(p\)-value for the test.
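    To get you started on the \(p\)-value calculation, here is a hedged simulation sketch of the null distribution (no ESP, so predictions are unrelated to the draw). It assumes cards are drawn with replacement (the actual draw had repeats), that order does not matter, and that matching is by multiset as described above; the prediction vector is a hypothetical student's guesses, and the helper match_count() is my own.

```r
# Simulate the null distribution of the number of correct predictions
# Assumptions: draws are with replacement, order ignored, multiset matching
match_count <- function(pred, drawn) {
  # Size of the multiset intersection of two card vectors (cards coded 1 to 52)
  sum(pmin(tabulate(pred, nbins = 52), tabulate(drawn, nbins = 52)))
}
prediction <- sample(1:52, 12, replace = TRUE)  # a hypothetical student's guesses
nsim <- 10^4
correct <- replicate(nsim, match_count(prediction, sample(1:52, 12, replace = TRUE)))
# One-sided p-value for a student who matched, say, 2 cards (illustrative)
mean(correct >= 2)
```

    Each student's own \(p\)-value would use their observed number of correct predictions in place of 2.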

Apr 23:

  • Recap of Examples 6.2.1 and 6.2.2: What is a hypothesis test really doing? What do the definitions in the book look like in the example?
  • It is possible to consider two-sided alternatives, but be careful about how you allocate \(\alpha\) in the lower and the upper tails. For the normal case, the symmetry makes things easy.
  • Weird example about FastBurger: How do you set up a test outside the normal and binomial cases? Sometimes the problem or context will make a decision rule available. Minimizing the sum of the probabilities of both types of errors may be difficult.
  • Just like in estimation, there could be many decision rules out there. We typically will choose the one that has good control over Type I error and has the highest power for a broad range of alternatives.
  • Pay attention to Example 6.4.1, which I think is a very useful way to think about how to plan and design experiments or studies for the purpose of discovery and detection.
  • In some sense, the course is almost finished because the remaining chapters for the course involve different types of hypothesis tests. The most distinctive involve tests of goodness of fit and of independence of random variables.
  • Try the following exercise, where \(\sigma\) is unknown. Here you have to use an asymptotically pivotal quantity like \(\dfrac{\overline{Y}-\mu}{S/\sqrt{n}}\overset{d}{\to} N\left(0,1\right)\) to test the claim: Suppose a senator introduces a bill to simplify the tax code. The senator claims that his bill is revenue-neutral. This means that tax revenues will stay the same. Suppose the Treasury Department will evaluate the senator’s claim. The Treasury Department has more than a million representative set of tax returns. An employee from the Treasury Department chooses a random sample of 100 tax returns from this tax file. The employee will then recompute the taxes to be paid under the simplified tax code and compare it with the taxes paid under the old tax code. The employee finds that the sample average of the differences obtained from the 100 tax files was -219 dollars and that the sample standard deviation of the differences is 752 dollars. How would you use the data to evaluate whether there is support for the senator’s claim or for the employee’s claim?
  • Revisited our activity: How do you set up the problem? Every student has their own \(p\)-value. With enough students, we can find a rejection of the null of no ESP even if most would agree that ESP is hard to believe.
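  The senator/revenue-neutrality exercise above can be evaluated numerically. A sketch using the numbers in the exercise, with the null being that the mean difference \(\mu\) is zero (revenue-neutral):

```r
# Asymptotic test of the senator's revenue-neutrality claim (null: mu = 0)
n <- 100
ybar <- -219  # sample mean of the differences, in dollars
s <- 752      # sample standard deviation of the differences
z <- (ybar - 0)/(s/sqrt(n))  # asymptotically standard normal under the null
z                            # about -2.91
2*pnorm(-abs(z))             # two-sided p-value, roughly 0.004
```

  At the usual \(\alpha = 0.05\), the data support the employee's side over the senator's claim.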

Apr 25:

  • Large-sample tests in the binomial case are easy to implement but suffer from approximation problems given the discrete nature of the data (hence the need for a continuity correction), and some combinations of \(n\) and \(p\) may produce problems (see Brown, Cai, and DasGupta (2001)). One way to check is to use simulation. Don’t rely too much on rules of thumb.
  • Exact tests in the binomial case are more complicated: Constructing a decision rule for one-sided or two-sided tests requires indirect calculations (specifically, you need a tabulation of the probability mass function under the null). It is also difficult to exactly attain the desired Type I error rate (see Example 6.3.1, Exercise 6.3.2 where you have two possible decision rules, Exercise 6.3.8 for a randomized decision rule but the idea is harder to use for Exercise 6.3.2). R code used for Exercise 6.3.2:
# Tabulate pmf under the null
dbinom(0:35, 35, 0.67)
# Figure out possible critical values
# Lower threshold
sum(dbinom(0:17, 35, 0.67)) 
# or 
pbinom(17, 35, 0.67)
# Upper threshold
sum(dbinom(29:35, 35, 0.67)) 
# or
pbinom(28, 35, 0.67, lower.tail = FALSE)
# Another set of thresholds
pbinom(18, 35, 0.67)
pbinom(29, 35, 0.67, lower.tail = FALSE)
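As a cross-check on these hand tabulations, R's built-in binom.test() computes exact binomial tests directly (the observed count of 17 successes below is illustrative, not from the exercise):

```r
# Built-in cross-check: exact binomial test with null p = 0.67
# The observed count of 17 successes out of 35 is illustrative
binom.test(17, 35, p = 0.67)
```

Note that binom.test() reports a two-sided p-value by default; compare its handling of the two tails with the thresholds tabulated above.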
  • Exact tests in the binomial case are more complicated: The one-sided \(p\)-value is straightforward to calculate, but the two-sided \(p\)-value is not, as explained in class. You simply do not have the symmetry you enjoy in the normal case.
  • Exact tests for non-normal, non-binomial cases are even more complicated; that is why the exercises always look at samples of size 1. Work on Examples 6.4.2, 6.4.3, and 6.4.4 so that you can see the ingredients you need to be able to construct a decision rule. Example 6.4.4 is more curious for reasons discussed in class.
  • Pay attention to the gamma distribution whose story extends the exponential case. We will see this distribution again in Chapter 7.
  • Pay attention to Exercises 6.4.21 and 6.4.22: These require your ability to construct the distribution of the sum and the product of two IID random variables. But answering these using the computer is actually much easier! Try it.
  • Introduce the idea behind why the likelihood function could be a reasonable starting point for constructing decision rules.

Apr 27:

  • Took some time explaining the value and possible complications of hand calculations for Exercises 6.4.21 and 6.4.22. The hardest thing is to calculate the integrals properly. Pay attention to cases. But these two exercises are simpler to answer with R, which leads to simulated critical values. The important thing is to know how to simulate the null distribution and then find the suitable critical value for your test. R code can be found below:
# Implement Exercise 6.4.21 in R
# Number of simulations
nsim <- 10^4
# Simulated distribution of Y1+Y2 under the null theta=2
# If you do not impose the null, then it would be difficult to simulate!
sumy <- replicate(nsim, sum(runif(2, 0, 2)))
# Show histogram 
# You need the point at which 5% of the probability is below that point based on this histogram
hist(sumy, freq = FALSE)
# Manual calculations
temp <- hist(sumy, freq = FALSE)
# Points on the horizontal axis
# Densities (heights)
# We did trial and error in class
# Simulated critical value will be different every time (but close enough to each other)
# Here is a shortcut for the simulated critical value
quantile(sumy, 0.05)
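The same simulation strategy carries over to Exercise 6.4.22, which (as noted on Apr 25) involves the product rather than the sum. Assuming it uses the product of the same two uniforms under the same null \(\theta = 2\) (an assumption; check the exercise statement), a sketch:

```r
# Parallel sketch for Exercise 6.4.22, assuming it uses the product Y1*Y2
# under the same null theta = 2 (an assumption; check the exercise statement)
nsim <- 10^4
prody <- replicate(nsim, prod(runif(2, 0, 2)))
hist(prody, freq = FALSE)
quantile(prody, 0.05)  # simulated critical value, varies run to run
```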
  • Motivate once again why the likelihood function is a good starting point for testing hypotheses: Likelihood function is the probability of observing the data as a function of the parameter. When there is a claim about a parameter, we can evaluate whether this claim makes it more likely to observe the data we have. Higher likelihood values would indicate support for the claim. But it can be harder to implement this intuition when the hypotheses are not simple, for example, if the alternative is an interval of values.

  • Three objects are related to the likelihood function: the MLE, the score, and the likelihood function itself. There is a testing approach for each of these. The simplest to implement is possibly a direct test using the MLE; it is almost automatic. The score approach uses the fact that the score has zero mean and variance equal to the Fisher information. The likelihood function itself is the one emphasized in the book. Under quadraticity of the log-likelihood, \(-2\) times the difference between the log-likelihood at the MLE \(\widehat{\theta}\) and the log-likelihood at the claimed value \(\theta_0\) (NOT necessarily the true value!) is asymptotically pivotal. In particular, the asymptotic distribution is actually the square of a standard normal. This is a convenient motivation to study Chapter 7!

  • More importantly, the difference between log-likelihoods is ultimately related to the likelihood ratio. This led to the discussion of the generalized likelihood ratio test (GLRT). The test statistic involves a ratio of the supremum of the likelihood function under the null parameter space to the supremum of the likelihood function under the alternative parameter space. These suprema can be complicated to compute, as illustrated in class and as seen in the repeated focus of the exercises on two-sided alternatives.

  • I used supremum instead of maximum (LM sneaks in an equal sign under the alternative, but that is ok). This is to prevent those situations where the maximizer is at the boundaries.

  • We worked on the uniform case discussed in the book. This example is interesting because you have to be careful with constructing the likelihood function and the likelihood ratio. In addition, the likelihood ratio is a function of the fraction \(W=Y_{\mathsf{max}}/\theta_0\). It turns out that this random variable is a pivotal quantity. \(W\) has density equal to \(f_W\left(w\right)=nw^{n-1}\) where \(0<w<1\). Notice that the density does not have any unknown parameters! Thus, you can use the distribution of \(W\) to find the appropriate threshold. This distribution has shown up many times in the exercises of the book! It is actually a member of the family of Beta distributions. To learn more, see how these univariate distributions are related to each other.

  • We ended with an in-class exercise on Exercise 6.5.2. Time yourself! Here is my solution:

    • A likelihood function is given by \[L\left(\lambda\right)=\prod_{i=1}^{10}\lambda\exp\left(-\lambda Y_i\right)=\lambda^{10}\exp\left(-\lambda\sum_{i=1}^{10}Y_i\right).\]
    • The related log-likelihood function is given by \[\ell\left(\lambda\right)=10\ln\lambda-\lambda\sum_{i=1}^{10}Y_i.\] You can choose to simplify things a bit if you want: \[\ell\left(\lambda\right)=10\ln\lambda-10\lambda \overline{Y}.\]
    • The maximized likelihood function under the null \(\lambda=\lambda_0\) (a singleton) is given by \[L\left(\lambda_0\right)=\lambda_0^{10}\exp\left(-10\lambda_0 \overline{Y}\right).\]
    • The MLE \(\widehat{\lambda}_{\mathsf{MLE}}\) is the solution to \[\ell^\prime\left(\widehat{\lambda}_{\mathsf{MLE}}\right)=0 \Rightarrow \frac{10}{\widehat{\lambda}_{\mathsf{MLE}}}-10\overline{Y}=0.\] So, \(\widehat{\lambda}_{\mathsf{MLE}}=\dfrac{1}{\overline{Y}}\).
    • The maximized likelihood function under the alternative \(\lambda\neq\lambda_0\) is given by \[L\left(\widehat{\lambda}_{\mathsf{MLE}}\right)=\widehat{\lambda}_{\mathsf{MLE}}^{10}\exp\left(-10\widehat{\lambda}_{\mathsf{MLE}}\overline{Y}\right)=\left(\dfrac{1}{\overline{Y}}\right)^{10}\exp(-10).\]
    • Thus, \[\Lambda = \left(\lambda_0\overline{Y}\right)^{10}\exp\left(-10\left(\lambda_0\overline{Y}-1\right)\right)\] is the required generalized likelihood ratio. Finding the integral which would have to be evaluated to determine the critical value \(\lambda^*\) requires more work. But the idea is to find \(\lambda^*\) solving the following equality: \[\mathbb{P}\left(\Lambda \leq \lambda^*\big|\lambda=\lambda_0\right)=0.05\]
  • For Exercise 6.5.2, that was all that was required. But it is a good idea to explore the random variable \(W=\lambda_0\overline{Y}\). In a similar manner as what you have seen for the uniform case, try to determine the distribution of \(W\). It is connected to Chapter 7 as well and uses Section 4.6.

  • Note that for Exercise 6.5.2, we don’t have to worry too much here because we are in the two-sided case. Try the one-sided alternative \(\lambda < \lambda_0\) as an exercise.
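  The critical-value equality in the solution above can also be handled by simulation, in the same spirit as the Apr 27 code. A sketch with an illustrative \(\lambda_0 = 1\) (any value works, since \(\Lambda\) depends on the data only through \(\lambda_0\overline{Y}\)):

```r
# Simulate the null distribution of the GLRT statistic for Exercise 6.5.2
# lambda0 = 1 is illustrative; n = 10 as in the exercise
lambda0 <- 1
nsim <- 10^4
Lambda <- replicate(nsim, {
  ybar <- mean(rexp(10, rate = lambda0))
  (lambda0*ybar)^10 * exp(-10*(lambda0*ybar - 1))
})
# Simulated critical value lambda* solving P(Lambda <= lambda* | lambda0) = 0.05
quantile(Lambda, 0.05)
# Compare -2*log(Lambda) with its chi-squared(1) approximation
quantile(-2*log(Lambda), 0.95)
qchisq(0.95, 1)  # about 3.84; the approximation is rough at n = 10
```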

May 4:

  • Showed some of the difficulties encountered in really answering Exercise 6.5.2: Finding the required integral takes work. The idea is to let \(W=\lambda_0\overline{Y}\) in the generalized likelihood ratio. Afterwards, you need to obtain the distribution of \(W\). It can be shown that \(W\) has a gamma distribution. For some reason, I did not show this in class. I am not sure why. I will fix this when we see each other again.
  • We had a quiz and the answers are here.
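In the meantime, the gamma claim can be checked quickly by simulation. Since \(\lambda_0 Y_i \sim \text{Exp}(1)\), \(W = \lambda_0\overline{Y}\) is the mean of 10 Exp(1) draws, which is gamma with shape 10 and rate 10. A sketch:

```r
# Check that W = lambda0*Ybar has a gamma distribution
# lambda0*Yi ~ Exp(1), so W is the mean of 10 Exp(1) draws: Gamma(10, rate = 10)
w <- replicate(10^4, mean(rexp(10, rate = 1)))
hist(w, freq = FALSE, breaks = 50)
curve(dgamma(x, shape = 10, rate = 10), add = TRUE, col = "red", lwd = 2)
```

The overlaid density should track the histogram closely.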

Mar 23, 28, 30, Apr 4, 6, 11, and 13


  1. General-purpose estimation principles: Maximum likelihood, method of moments, and least squares
  2. Sufficiency as another guiding principle for estimation
  3. Maximum likelihood estimation in R

Activities: Regrettably, none for the moment.

Assigned readings: Estimating Parameters (Section 5.2), Minimum-Variance Estimators: The Cramér-Rao Lower Bound (Section 5.5), Sufficient Estimators (Section 5.6), Notes on likelihood functions, Properties of MLE, Computational examples and issues, Method of moments, Other topics related to the likelihood, Example 5.4.2


  • Work out Examples 5.2.1 (Poisson case), 5.2.2 (Gamma case), 5.2.5 (Normal case). The latter has also been worked out in the notes.
  • The special examples are Examples 5.2.3 (truncated Poisson with fused outcomes) and 5.2.4 (looks exponential, but the support depends on the unknown parameter!). For the latter, make sure to put it in your “zoo” of weird examples like the uniform case. For 5.2.3, fused outcomes mean that 4, 5, and 6 are taken together. How does this affect the log-likelihood?
  • Work out Case Study 5.2.1 (Geometric case) on modeling ups and downs of a financial market.
  • Exercises 5.2.1, 5.2.3: Pretty standard, plot the log-likelihood using R, use R to find optima as well.
  • Exercise 5.2.2: Not very standard, but checks whether you understood what maximum likelihood really is. \(p\) takes on only two values, which is different from 5.2.1 where \(p\) could take on any value between 0 and 1.
  • Exercise 5.2.4: Looks like a Poisson?
  • Exercises 5.2.5, 5.2.6, 5.2.7: Pretty standard, but is there a special name for this density? If there is, determine whether you can use a built-in function in R. If there is not, then you have to code the log-likelihood from scratch. Also ask yourself: How does one generate random draws from this distribution? Pay attention to 5.2.7, as it is an example where you can set up a model for data which involve proportions.
  • Exercises 5.2.8 and 5.2.9: These are similar to the examples in LM. But pay attention to how to account for fused outcomes. Try finding optima in R.
  • Exercises 5.2.10 to 5.2.12: Look at the support.
  • Exercises 5.2.13 to 5.2.16: These exercises really involve two parameters, but the problem fixes one of the parameters and takes it as known.
  • For each of the exercises involving MLE: calculate the expected value of the score and the Fisher information.
  • Homework 03 (typo fixed, thanks to Ziyi!), suggested solutions
  • Exercises 5.2.17, 5.2.18, 5.2.21, 5.2.23, 5.2.25, 5.2.26: Here you might have to derive the required moments first and make sure it is linked to \(\theta\). Also pay attention to how many moments you would need. For 5.2.18, the distribution may be familiar. Look up the Beta distribution.
  • Exercise 5.2.19, 5.2.20, 5.2.22: Compare with the corresponding MLE.
  • Exercise 5.2.24: Answered already in the notes.
  • All exercises in Section 5.5: Notice the common things about them (notice that the sample mean shows up again and again). Exercise 5.5.7 is already answered in the notes for MLE.
  • Exercise 5.6.2: Use what we did in class and find a particular dataset which will produce a conditional probability which still depends on \(p\).
  • Exercise 5.6.6: Can you use the conditioning argument here? Consider \(W=\exp\left(\log W \right)\). Pay attention to the supports!!
  • Exercises 5.6.1, 5.6.4, 5.6.5 are typical exercises. If possible, use all the approaches found in the notes so that you can practice all of them to determine which are easy and which are difficult approaches.
  • Exercises 5.6.7 and 5.6.8 are the type of the exercises where the support depends on the unknown parameter.
  • Exercises 5.6.9 to 5.6.11: 5.6.9 is extremely important in how statistics has developed around the 70s and 80s. 5.6.10 and 5.6.11 are specific cases of 5.6.9. Try them!
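On the recurring question of how to generate random draws from a distribution without a built-in R function (raised for Exercises 5.2.5 to 5.2.7): inverse-transform sampling is one generic answer. A sketch using the exponential case, where the inverse cdf is available in closed form:

```r
# Inverse-transform sampling: if U ~ Uniform(0,1) and F is a cdf,
# then F^{-1}(U) has cdf F
lambda <- 2
u <- runif(10^4)
y <- -log(1 - u)/lambda  # closed-form inverse cdf of Exp(lambda)
mean(y)                  # should be close to 1/lambda = 0.5
```

For densities whose cdf cannot be inverted in closed form, the same idea works numerically (e.g., via uniroot()), though at a higher computational cost.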

Mar 23:

  • Review of joint densities and the specific case of IID normal random variables (15 minutes)
  • Point out the sum of squares algebra (connections with Chapter 12) and the joint density depending solely on \(\overline{y}\) and \(s^2\) (connections with Section 5.6 and privacy) (5 minutes)
  • What is a likelihood function? Key idea, the tricky part in the continuous case, maximizing the log-likelihood instead of the likelihood directly (25 minutes)

Mar 28:

  • Why Homework 02 was assigned (15 minutes)
  • Recap of the setup and the algorithm for calculating MLE by hand (15 minutes)
  • Calculating MLE using R: How does the computer look for optima? Numerical issues abound, especially in higher dimensions. If interested, look into numerical optimization and specifically focus on gradient descent methods. (15 minutes)
  • From the simple IID \(N\left(\mu,\sigma^2\right)\) example, there are many points of note which make MLE an attractive approach to estimation but also pitfalls of the approach. (25 minutes)
  • Visual demonstration for the case of IID \(N\left(\mu,\sigma^2\right)\) where \(\sigma^2\) is known: R code used in class is displayed below (new commands involve sapply(), plot(), seq(), abline()). The first and second derivatives of the log-likelihood play a big role, especially in establishing the nice statistical properties of MLE! (15 minutes)
  • Pointing out a problem with a comment in LM about the likelihood not being a function of the data, some of the examples to work out (5 minutes)
# Draw random numbers from N(1, 4)
n <- 5
mu <- 1
sigma.sq <- 4 # change this to 40 if you want to look at the curvature of the log-likelihood
y <- rnorm(n, mu, sqrt(sigma.sq))
# Set up MINUS the log-likelihood (reused code from example)
# BUT sigma.sq is known, rather than a parameter to be estimated
mlnl <- function(par)
  sum(-dnorm(y, mean = par, sd = sqrt(sigma.sq), log = TRUE))
# New part where I want a plot of the log-likelihood for mu
# Place a grid of values for mu, adjust length.out if you wish
mu.val <- seq(-10, 10, length.out = 1000)
# Compute the log-likelihood at every value in mu.val
# The MINUS sign is to enable me to display the log-likelihood rather than the negative of the log-likelihood
log.like <- -sapply(mu.val, mlnl)
# Create a plot: vertical axis are the log-likelihood values, horizontal axis are the values of mu
# type = "l" is to connect the dots, try removing it
plot(mu.val, log.like, type = "l")
# Draw a vertical line at 1, hence v = 1
# Make sure the line is colored red and is dotted
abline(v = 1, col = "red", lty = "dotted")

Mar 30:

  • Weird example discussed in LM Example 5.2.4, be careful of geometric distribution in LM Case Study 5.2.1, pay attention to fused outcomes (LM Example 5.2.3, LM Exercises 5.2.8 and 5.2.9) when forming the likelihood, practical examples where using nlm() blindly may become problematic (45 minutes)
  • R code from Mar 28 was cleaned up to make things look nicer. Discuss what the R code is doing and what the picture is trying to tell you. Differentiate between the log-likelihood and the MLE (which are both random) versus the expected log-likelihood and the truth (which are fixed). Introduce new notation and intuition for looking at the first two derivatives of the log-likelihood. (15 minutes)
  • Look into the properties of the score function and the concept of Fisher information. (15 minutes)
  • How the quadratic shape of the log-likelihood is responsible for what makes MLE work in a broad number of settings (15 minutes)
# Draw random numbers from N(1, 4)
n <- 5
mu <- 1
sigma.sq <- 4 # change this to 40 if you want to look at the curvature of the log-likelihood
y <- rnorm(n, mu, sqrt(sigma.sq))
# Set up MINUS the log-likelihood (reused code from example)
# BUT sigma.sq is known, rather than a parameter to be estimated
mlnl <- function(par)
  sum(-dnorm(y, mean = par, sd = sqrt(sigma.sq), log = TRUE))
# New part where I want a plot of the log-likelihood for mu
# Place a grid of values for mu, adjust length.out if you wish
mu.val <- seq(-10, 10, length.out = 1000)
# Compute the log-likelihood at every value in mu.val
# The MINUS sign is to enable me to display the log-likelihood rather than the negative of the log-likelihood
log.like <- -sapply(mu.val, mlnl)
# Create a plot: vertical axis are the log-likelihood values, horizontal axis are the values of mu
# type = "l" is to connect the dots, try removing it
# fix the vertical axis for a nicer effect
plot(mu.val, log.like, type = "l", ylim = c(-200, 0))
# Draw a vertical line at the MLE for mu
# Make sure the line is colored blue-ish and dotted
abline(v = nlm(mlnl, 0)$estimate, col = "#0072B2", lty = "dotted")
# Draw a vertical line at mu, hence v = mu
# Make sure the line is colored orange-ish and is dotted
abline(v = mu, col = "#D55E00", lty = "dotted")
# Draw a curve representing the expected log-likelihood
curve(-n/2*log(2*pi)-n/2*log(sigma.sq)-n/2-n/(2*sigma.sq)*(1-x)^2, add = TRUE, col = "#CC79A7", lty = "dashed", lwd = 3)

Apr 4:

  • Recap: How to apply the whole toolkit provided by MLE and focused on two ex