In your final report, you should remove this section, as the material here is only for you to understand how to do things in Quarto.
This template is a radically simplified version of Heiss (2022). What you see now is a rendered version of a qmd file. This rendered version is an HTML file similar to the HTML pages you see at the course website. A qmd file is a text file which specifies how the report is formatted and how the elements of the report are “woven” together into a final readable form, which in the case of this template is an HTML file. You have to submit the qmd file as part of your project so that I can run it on my own computer and generate the same HTML file that you did.
You have to install RStudio. You do not need to install R again if you have installed it before. Afterwards, you have to install Quarto. After installing Quarto, you may want to look at a short tutorial here. If you want to follow that tutorial on your computer, you may have to install the specified packages.
But you can also just open the file template-final-report.qmd in RStudio and click on Render. Of course, you have to have the file references.bib for the rendering to work.
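If you prefer working from the R console instead of the Render button, the same render can be triggered programmatically. This assumes the quarto R package is installed (install.packages("quarto")) and that the Quarto command-line tool is available:

```r
# Render the template from the R console; produces the same HTML file
# as clicking Render in RStudio
quarto::quarto_render("template-final-report.qmd")
```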
You might find the Quarto guide on the following topics to be of interest:
You can weave R code into your document.
n_lights <- 2 + 2
n_lights
[1] 4
You can also simply display the code without asking R to evaluate the code.
n_lights <- 2 + 2
n_lights
You can load data, calculate things, make plots.
# Code used in class, modify accordingly for your situation
cookie <- read.csv("https://mathstat.neocities.org/data_cookie.csv")
# Shows the entire dataset
cookie
X id box scale chipnum weight numchoc
1 1 1 3 NA 1 10.23 24
2 2 1 3 NA 2 10.65 18
3 3 2 1 7 1 10.50 16
4 4 2 1 7 2 11.03 18
5 5 3 1 5 1 10.82 19
6 6 3 1 5 2 10.57 14
7 7 4 1 4 1 9.97 13
8 8 4 1 4 2 10.57 16
9 9 5 2 4 1 11.44 14
10 10 5 2 4 2 11.24 17
11 11 6 4 4 1 11.08 17
12 12 6 4 4 2 10.12 12
13 13 7 1 4 1 10.69 15
14 14 7 1 4 2 9.47 13
15 15 8 2 7 1 11.60 9
16 16 8 2 7 2 10.04 13
17 17 9 2 7 1 10.91 18
18 18 9 2 7 2 10.52 14
19 19 10 3 2 1 10.34 17
20 20 10 3 2 2 10.48 17
21 21 11 2 NA 1 10.17 15
22 22 11 2 NA 2 11.00 19
23 23 12 1 4 1 10.74 9
24 24 12 1 4 2 11.22 8
25 25 13 1 5 1 10.37 23
26 26 13 1 5 2 11.04 17
27 27 14 1 5 1 11.11 14
28 28 14 1 5 2 11.22 18
29 29 15 1 4 1 10.85 10
30 30 15 1 4 2 10.28 15
31 31 16 3 2 1 10.72 17
32 32 16 3 2 2 11.25 15
33 33 17 2 4 1 11.22 13
34 34 17 2 4 2 10.83 12
35 35 18 2 7 1 10.14 16
36 36 18 2 7 2 10.11 10
37 37 19 2 7 1 10.54 10
38 38 19 2 7 2 11.29 9
39 39 20 1 5 1 10.53 15
40 40 20 1 5 2 10.05 16
41 41 21 2 7 1 10.27 19
42 42 21 2 7 2 11.36 10
43 43 22 1 5 1 10.07 18
44 44 22 1 5 2 10.34 15
45 45 23 1 4 1 10.98 9
46 46 23 1 4 2 10.51 10
47 47 24 1 4 1 11.05 15
48 48 24 1 4 2 10.25 16
49 49 25 4 4 1 10.18 13
50 50 25 4 4 2 10.66 14
51 51 26 3 2 1 9.39 18
52 52 26 3 2 2 10.07 18
53 53 27 4 6 1 10.48 14
54 54 27 4 6 2 10.47 11
55 55 28 2 4 1 11.64 19
56 56 28 2 4 2 9.86 14
57 57 29 3 1 1 10.44 22
58 58 29 3 1 2 10.75 18
59 59 30 3 2 1 10.50 15
60 60 30 3 2 2 9.94 18
61 61 31 4 NA 1 10.95 15
62 62 31 4 NA 2 10.73 8
63 63 32 4 NA 1 10.63 12
64 64 32 4 NA 2 10.49 14
65 65 33 3 3 1 10.07 15
66 66 33 3 3 2 10.53 13
67 67 34 3 NA 1 10.21 17
68 68 34 3 NA 2 10.68 12
69 69 35 2 7 1 10.22 27
70 70 35 2 7 2 9.58 23
71 71 36 2 7 1 11.30 21
72 72 36 2 7 2 10.94 19
73 73 37 3 2 1 10.52 16
74 74 37 3 2 2 10.17 17
75 75 38 3 1 1 10.13 15
76 76 38 3 1 2 11.13 16
77 77 39 3 1 1 10.06 14
78 78 39 3 1 2 11.32 14
79 79 40 3 1 1 11.57 22
80 80 40 3 1 2 10.19 16
81 81 41 4 NA 1 10.97 13
82 82 41 4 NA 2 9.53 10
83 83 42 1 5 1 10.42 12
84 84 42 1 5 2 10.08 11
85 85 43 4 NA 1 10.38 20
86 86 43 4 NA 2 10.65 11
87 87 44 1 5 1 11.17 13
88 88 44 1 5 2 9.74 12
89 89 45 1 5 1 11.20 17
90 90 45 1 5 2 9.45 13
91 91 46 1 5 1 13.85 15
92 92 46 1 5 2 13.98 15
93 93 47 3 1 1 11.36 19
94 94 47 3 1 2 11.31 19
95 95 48 2 4 1 10.04 15
96 96 48 2 4 2 11.74 14
97 97 49 4 6 1 10.62 15
98 98 49 4 6 2 11.37 19
99 99 50 3 2 1 11.08 19
100 100 50 3 2 2 10.72 20
101 101 51 4 1 1 10.85 16
102 102 51 4 1 2 9.70 17
103 103 52 4 NA 1 9.29 15
104 104 52 4 NA 2 10.31 17
105 105 53 4 6 1 10.52 10
106 106 53 4 6 2 10.48 20
107 107 54 NA 7 1 11.44 14
108 108 54 NA 7 2 9.79 12
109 109 55 3 1 1 9.99 17
110 110 55 3 1 2 10.32 19
111 111 56 4 NA 1 10.06 10
112 112 56 4 NA 2 10.41 11
113 113 57 4 4 1 9.69 14
114 114 57 4 4 2 10.76 13
115 115 58 4 NA 1 10.85 9
116 116 58 4 NA 2 10.79 10
117 117 59 2 4 1 10.11 16
118 118 59 2 4 2 9.41 16
119 119 60 2 4 1 11.08 18
120 120 60 2 4 2 11.80 20
121 121 61 2 4 1 11.20 17
122 122 61 2 4 2 10.61 13
123 123 62 3 2 1 9.44 9
124 124 62 3 2 2 10.96 17
125 125 63 4 NA 1 10.69 7
126 126 63 4 NA 2 10.42 12
127 127 64 4 6 1 10.40 15
128 128 64 4 6 2 10.06 20
# Presents histogram of the number of chocolate chips per cookie
hist(cookie$numchoc)
# Store the histogram as an object
temp <- hist(cookie$numchoc)
# What is stored in temp?
temp
$breaks
[1] 6 8 10 12 14 16 18 20 22 24 26 28
$counts
[1] 3 15 12 24 28 24 15 3 3 0 1
$density
[1] 0.01171875 0.05859375 0.04687500 0.09375000 0.10937500 0.09375000
[7] 0.05859375 0.01171875 0.01171875 0.00000000 0.00390625
$mids
[1] 7 9 11 13 15 17 19 21 23 25 27
$xname
[1] "cookie$numchoc"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
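The components stored in temp can be reused in later calculations. For example, the following line (which assumes the chunk above creating temp has been run) picks out the midpoint of the bin with the highest count:

```r
# Midpoint of the most frequent bin of the stored histogram object
temp$mids[which.max(temp$counts)]
```

With the counts shown above, the largest count (28) belongs to the bin (14, 16], so this returns 15.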
# Confidence interval calculation in HW02
ybar <- mean(cookie$numchoc)
J <- length(cookie$numchoc)
c <- sqrt(20)
c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
[1] 13.64160 16.71778
Here I repeatedly generate \(10^4\) artificial datasets from \(\mathsf{U}\left(0,\theta\right)\). Set \(n\in \{5, 500\}\) and \(\theta=1\). I used the apply(), min(), max(), and runif() commands. I apply the estimators \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) to the artificial datasets, display histograms, and show the means and standard deviations in a table.
set.seed(20230519) # for reproducibility, exact results even on a different computer
theta <- 1
nsim <- 10^4 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, runif(5, min = 0, max = theta))
# calculate estimates
theta1hat.5 <- 6*apply(ymat, 2, min)
theta2hat.5 <- 2*colMeans(ymat)
theta3hat.5 <- 6/5*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=5
par(mfrow=c(1,3))
hist(theta1hat.5, freq = FALSE)
hist(theta2hat.5, freq = FALSE)
hist(theta3hat.5, freq = FALSE)
# n=500
ymat <- replicate(nsim, runif(500, min = 0, max = theta))
# calculate estimates
theta1hat.500 <- 501*apply(ymat, 2, min)
theta2hat.500 <- 2*colMeans(ymat)
theta3hat.500 <- 501/500*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=500
par(mfrow=c(1,3))
hist(theta1hat.500, freq = FALSE)
hist(theta2hat.500, freq = FALSE)
hist(theta3hat.500, freq = FALSE)
| | \(\widehat{\theta}_1\) | \(\widehat{\theta}_2\) | \(\widehat{\theta}_3\) |
|---|---|---|---|
| \(n=5\) | | | |
| Mean | 1 | 1 | 1 |
| SD (times 10) | 8.43 | 2.57 | 1.69 |
| \(n=500\) | | | |
| Mean | 1.02 | 1 | 1 |
| SD (times 10) | 10.05 | 0.26 | 0.02 |
So, all estimators of \(\theta\) are unbiased. The variances of the estimators are:
\[\mathsf{Var}\left(\widehat{\theta}_1\right)=\frac{n}{n+2}\theta^2,\ \mathsf{Var}\left(\widehat{\theta}_2\right) =\frac{1}{3n}\theta^2,\ \mathsf{Var}\left(\widehat{\theta}_3\right)=\frac{1}{n\left(n+2\right)}\theta^2\] Therefore, we expect the SD of the sampling distributions of the estimators to be \(\displaystyle\sqrt{\frac{n}{n+2}}\), \(\displaystyle\sqrt{\frac{1}{3n}}\), \(\displaystyle\sqrt{\frac{1}{n\left(n+2\right)}}\). The results from the Monte Carlo simulation, which were multiplied by 10 for presentation purposes, match the theoretical results quite well. In particular, we should see results close to 8.45, 2.58, and 1.69 for \(n=5\). We should also see results close to 9.98, 0.26, and 0.02 for \(n=500\).
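The theoretical standard deviations can also be computed directly in R. This short chunk (an optional check, not part of the simulation above) evaluates the three formulas for both sample sizes, multiplied by 10 as in the table:

```r
# Theoretical SDs of the three estimators for U(0, theta) with theta = 1,
# times 10 for presentation, for n = 5 and n = 500
n <- c(5, 500)
sd1 <- sqrt(n/(n + 2))        # (n+1) * minimum estimator
sd2 <- sqrt(1/(3*n))          # 2 * sample-mean estimator
sd3 <- sqrt(1/(n*(n + 2)))    # (n+1)/n * maximum estimator
round(10*rbind(sd1, sd2, sd3), 2)
```

The first column gives 8.45, 2.58, and 1.69 for \(n=5\); the second gives 9.98, 0.26, and 0.02 for \(n=500\), matching the values quoted above.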
You can definitely mix text and mathematics. For example:
Let \(Y_1,\ldots,Y_n\) be IID \(\mathsf{U}\left(0,\theta\right)\), where \(\theta>0\). The common pdf of these random variables is given by \[f\left(y\right)=\begin{cases}\dfrac{1}{\theta} & \ \ \ 0\leq y\leq \theta\\ 0 & \ \ \ \mathrm{otherwise} \end{cases},\] the common mean is \(\theta/2\), and the common variance is \(\theta^2/12\). The estimand for this exercise is \(\theta\).
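As a quick sanity check of these moments (purely illustrative, not part of the required analysis), a large uniform sample with \(\theta=1\) should have sample mean near \(1/2\) and sample variance near \(1/12\):

```r
# Simulated check of the U(0, 1) mean (1/2) and variance (1/12)
set.seed(1)
y <- runif(10^6)
round(c(mean(y), var(y)), 4)
c(1/2, 1/12)
```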
You can have single line displayed mathematical expressions:
\[\mathbb{E}\left[\left(\widehat{\theta}-\theta\right)^2\right]=\mathbb{E}\left[\left(\widehat{\theta}-\mathbb{E}\left(\widehat{\theta}\right)\right)^2\right]+\left(\mathbb{E}\left(\widehat{\theta}\right)-\theta\right)^2=\mathsf{Var}\left(\widehat{\theta}\right)+\left(\mathbb{E}\left(\widehat{\theta}\right)-\theta\right)^2.\]
You can have multi-line displayed mathematical expressions:
\[\begin{eqnarray}\mathbb{P}\left(\left|\overline{Y}-\mathbb{E}\left(\overline{Y}\right)\right|\geq \frac{c\sigma}{\sqrt{n}}\right) \leq \frac{\mathsf{Var}\left(\overline{Y}\right)}{c^2\sigma^2/n} &\Rightarrow & \mathbb{P}\left(\left|\frac{\overline{Y}-\mu}{\sigma/\sqrt{n}}\right|\geq c\right) \leq \frac{1}{c^2} \\ &\Rightarrow & \mathbb{P}\left(\left|\frac{\overline{Y}-\mu}{\sigma/\sqrt{n}}\right|\leq c\right) \geq 1-\frac{1}{c^2}.\end{eqnarray}\]
You can mix with lists too:
I collect the following integration results below to tidy up the calculations later.
You can also have tables with mathematics:
Source | Sum of squares (SS) | Degrees of freedom (df) | Mean sum of squares (MS) | Expected MS |
---|---|---|---|---|
Mean | \(V_1^2=n\overline{Y}^2\) | \(1\) | \(n\overline{Y}^2\) | \(\sigma^2+n\mu^2\) |
Deviations | \(\displaystyle\sum_{i=2}^n V_i^2=\displaystyle\sum_{i=1}^n\left(Y_i-\overline{Y}\right)^2\) | \(n-1\) | \(S^2\) | \(\sigma^2\) |
Total | \(\displaystyle\sum_{i=1}^n V_i^2=\displaystyle\sum_{i=1}^n Y_i^2\) | \(n\) |
Notice that in the qmd file there is a line bibliography: references.bib. What this means is that there is a separate file containing bibliographic information. Your job is to maintain this bib file and populate it with the entries which you will eventually cite in your report. The entries use the BibTeX format.
Fortunately, almost every publisher of journal articles has citation tools that make things convenient. For example, when you look up the citation information of Hamermesh and Parker (2005) at this website, you can click on Cite and find Export citation to BibTeX. Sometimes a file will open in your browser, but most of the time a very small file with a .bib or a .txt file extension will be downloaded. Open this file using a text editor and copy/paste the contents to references.bib. Sometimes it might be a good idea to clean up the contents to make them shorter while still keeping the most important pieces of information, including the digital object identifier or DOI.1 For other publishers, your mileage may vary. I leave this for you to explore. But you have to admit that this leaves everyone with no excuse for not citing sources.
If you use R packages that were never used in class, please cite them. For example, if you want to cite the multcomp package, then you type print(citation("multcomp"), bibtex=TRUE) into the R console. It will generate a BibTeX entry. Put that entry into references.bib. Do not forget to put a key. Usually these keys are provided or automatically generated. These keys are markers or identifiers which enable you to tie a citation in the qmd file to the corresponding entry in references.bib. If a key is not provided or if you want to modify it, you can find it in the first line of every BibTeX entry. This first line typically starts with something like @Manual{Rcite, and for this example, Rcite is the key.
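As an illustration, a cleaned-up article entry might look like the sketch below. The key hamermesh2005 is just a hypothetical choice, and the field values should be taken from the publisher's BibTeX export rather than typed from memory:

```bibtex
@Article{hamermesh2005,
  author  = {Daniel S. Hamermesh and Amy Parker},
  title   = {Beauty in the classroom: Instructors' pulchritude and putative pedagogical productivity},
  journal = {Economics of Education Review},
  year    = {2005},
  doi     = {10.1016/j.econedurev.2004.07.013}
}
```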
Open the qmd file which produced this HTML file so that you can see how to cite sources and how the keys play a role.
For reproducibility purposes, I would include something like “All analyses were performed using R Statistical Software v4.2.1 (R Core Team 2022) and tables are displayed using stargazer v5.2.3 (Hlavac 2022).”
Look at the bottom: Instant reference list!
If you choose this project option, you may need to install and load the ipumsr
package. The purpose of the package is to help you load an IPUMS dataset and to use its built-in commands to help you navigate the data. This may take some time to learn.
The code chunk below is not evaluated in Quarto because of eval: false, but this can be removed.
if (!require("ipumsr")) stop("Reading IPUMS data into R requires the ipumsr package. It can be installed using the following command: install.packages('ipumsr')")
Here I use some commands in ipumsr
to load the data:
# Download the XML file from IPUMS and use it to load the definitions of the variables
ddi <- read_ipums_ddi("usa_00008.xml")
# Load the data based on the previous command
data <- read_ipums_micro(ddi)
Here I create a variable age.arrival which may be interesting, and I create a subset of the data that I have loaded previously.
# Calculate age at arrival
data$age.arrival <- data$YRIMMIG - data$BIRTHYR
# born in Mexico, household head, living in California, married with spouse present, and white
filtered <- subset(data, BPL == 200 & RELATE == 1 & STATEFIP == 6 & MARST == 1 & RACE == 1)
filtered # total number of observations used
nrow(filtered)
# show distribution of age at arrival
table(filtered$age.arrival)
# show distribution for males and females
table(filtered$age.arrival, filtered$SEX)
The previous code chunk was left unevaluated. To be able to evaluate it, you have to remove the line #| eval: false and you need access to the XML file usa_00008.xml and the data file usa_00008.dat.gz.
Here you can organize the commands needed to load packages and do other formatting options (if needed).
# To clean R's memory, leave this alone
rm(list=ls())
Here you explain why you have selected the chosen project option. Describe the problem(s) and research question(s) related to your chosen project option. Why do you think the problem(s) and the question(s) are important? What do you think is gained by addressing the problem(s) and/or answering the research question(s)? It is important to express these in your own words, with your own thoughts, and most preferably in a context of your own choosing.
Here you have to discuss what you plan to do for your report. Again, aim for something in between “a simple project done very well” and “a complicated project with flaws”.
Adjust the title to suit your situation. Your chosen project option will eventually determine what will be in this main body. Feel free to use subsections.
Summarize what you have done here and what you hope you could have done if you had more time.
There is a slight problem with some publishers when they export citations to BibTeX. The DOI entry has to be modified so that you remove the first part https://doi.org/.↩︎