In your final report, you should remove this section, as the material here is only for you to understand how to do things in Quarto.
This template is a radically simplified version of Heiss (2022). What you see now is a rendered version of a qmd file. This rendered version is an HTML file similar to the HTML pages you see at the course website. A qmd file is a text file which specifies how the report is formatted and how the elements of the report are “woven” together into a final readable form, which in the case of this template is an HTML file. You have to submit the qmd file as part of your project so that I can run it on my own computer and generate the same HTML file that you did.
You have to install RStudio. You do not need to install R again if you have installed it before. Afterwards, you have to install Quarto. After installing Quarto, you may want to look at a short tutorial here. If you want to follow that tutorial on your computer, you may have to install the specified packages.
But you can also just open the file template-final-report.qmd in RStudio and click on Render. Of course, you have to have the file references.bib for the rendering to work.
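If you prefer working from the R console instead of the Render button, the same render can be triggered programmatically. This assumes the quarto R package is installed (install.packages("quarto")) and that the Quarto command-line tool is available:

```r
# Render the template from the R console; produces the same HTML file
# as clicking Render in RStudio
quarto::quarto_render("template-final-report.qmd")
```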
You might find the Quarto guide on the following topics to be of interest:
You can weave R code into your document.
n_lights <- 2 + 2
n_lights
[1] 4
You can also simply display the code without asking R to evaluate the code.
n_lights <- 2 + 2
n_lights
You can load data, calculate things, make plots.
# Code used in class, modify accordingly for your situation
cookie <- read.csv("https://mathstat.neocities.org/data_cookie.csv")
# Shows the entire dataset
cookie
X id box scale chipnum weight numchoc
1 1 1 3 NA 1 10.23 24
2 2 1 3 NA 2 10.65 18
3 3 2 1 7 1 10.50 16
4 4 2 1 7 2 11.03 18
5 5 3 1 5 1 10.82 19
6 6 3 1 5 2 10.57 14
7 7 4 1 4 1 9.97 13
8 8 4 1 4 2 10.57 16
9 9 5 2 4 1 11.44 14
10 10 5 2 4 2 11.24 17
11 11 6 4 4 1 11.08 17
12 12 6 4 4 2 10.12 12
13 13 7 1 4 1 10.69 15
14 14 7 1 4 2 9.47 13
15 15 8 2 7 1 11.60 9
16 16 8 2 7 2 10.04 13
17 17 9 2 7 1 10.91 18
18 18 9 2 7 2 10.52 14
19 19 10 3 2 1 10.34 17
20 20 10 3 2 2 10.48 17
21 21 11 2 NA 1 10.17 15
22 22 11 2 NA 2 11.00 19
23 23 12 1 4 1 10.74 9
24 24 12 1 4 2 11.22 8
25 25 13 1 5 1 10.37 23
26 26 13 1 5 2 11.04 17
27 27 14 1 5 1 11.11 14
28 28 14 1 5 2 11.22 18
29 29 15 1 4 1 10.85 10
30 30 15 1 4 2 10.28 15
31 31 16 3 2 1 10.72 17
32 32 16 3 2 2 11.25 15
33 33 17 2 4 1 11.22 13
34 34 17 2 4 2 10.83 12
35 35 18 2 7 1 10.14 16
36 36 18 2 7 2 10.11 10
37 37 19 2 7 1 10.54 10
38 38 19 2 7 2 11.29 9
39 39 20 1 5 1 10.53 15
40 40 20 1 5 2 10.05 16
41 41 21 2 7 1 10.27 19
42 42 21 2 7 2 11.36 10
43 43 22 1 5 1 10.07 18
44 44 22 1 5 2 10.34 15
45 45 23 1 4 1 10.98 9
46 46 23 1 4 2 10.51 10
47 47 24 1 4 1 11.05 15
48 48 24 1 4 2 10.25 16
49 49 25 4 4 1 10.18 13
50 50 25 4 4 2 10.66 14
51 51 26 3 2 1 9.39 18
52 52 26 3 2 2 10.07 18
53 53 27 4 6 1 10.48 14
54 54 27 4 6 2 10.47 11
55 55 28 2 4 1 11.64 19
56 56 28 2 4 2 9.86 14
57 57 29 3 1 1 10.44 22
58 58 29 3 1 2 10.75 18
59 59 30 3 2 1 10.50 15
60 60 30 3 2 2 9.94 18
61 61 31 4 NA 1 10.95 15
62 62 31 4 NA 2 10.73 8
63 63 32 4 NA 1 10.63 12
64 64 32 4 NA 2 10.49 14
65 65 33 3 3 1 10.07 15
66 66 33 3 3 2 10.53 13
67 67 34 3 NA 1 10.21 17
68 68 34 3 NA 2 10.68 12
69 69 35 2 7 1 10.22 27
70 70 35 2 7 2 9.58 23
71 71 36 2 7 1 11.30 21
72 72 36 2 7 2 10.94 19
73 73 37 3 2 1 10.52 16
74 74 37 3 2 2 10.17 17
75 75 38 3 1 1 10.13 15
76 76 38 3 1 2 11.13 16
77 77 39 3 1 1 10.06 14
78 78 39 3 1 2 11.32 14
79 79 40 3 1 1 11.57 22
80 80 40 3 1 2 10.19 16
81 81 41 4 NA 1 10.97 13
82 82 41 4 NA 2 9.53 10
83 83 42 1 5 1 10.42 12
84 84 42 1 5 2 10.08 11
85 85 43 4 NA 1 10.38 20
86 86 43 4 NA 2 10.65 11
87 87 44 1 5 1 11.17 13
88 88 44 1 5 2 9.74 12
89 89 45 1 5 1 11.20 17
90 90 45 1 5 2 9.45 13
91 91 46 1 5 1 13.85 15
92 92 46 1 5 2 13.98 15
93 93 47 3 1 1 11.36 19
94 94 47 3 1 2 11.31 19
95 95 48 2 4 1 10.04 15
96 96 48 2 4 2 11.74 14
97 97 49 4 6 1 10.62 15
98 98 49 4 6 2 11.37 19
99 99 50 3 2 1 11.08 19
100 100 50 3 2 2 10.72 20
101 101 51 4 1 1 10.85 16
102 102 51 4 1 2 9.70 17
103 103 52 4 NA 1 9.29 15
104 104 52 4 NA 2 10.31 17
105 105 53 4 6 1 10.52 10
106 106 53 4 6 2 10.48 20
107 107 54 NA 7 1 11.44 14
108 108 54 NA 7 2 9.79 12
109 109 55 3 1 1 9.99 17
110 110 55 3 1 2 10.32 19
111 111 56 4 NA 1 10.06 10
112 112 56 4 NA 2 10.41 11
113 113 57 4 4 1 9.69 14
114 114 57 4 4 2 10.76 13
115 115 58 4 NA 1 10.85 9
116 116 58 4 NA 2 10.79 10
117 117 59 2 4 1 10.11 16
118 118 59 2 4 2 9.41 16
119 119 60 2 4 1 11.08 18
120 120 60 2 4 2 11.80 20
121 121 61 2 4 1 11.20 17
122 122 61 2 4 2 10.61 13
123 123 62 3 2 1 9.44 9
124 124 62 3 2 2 10.96 17
125 125 63 4 NA 1 10.69 7
126 126 63 4 NA 2 10.42 12
127 127 64 4 6 1 10.40 15
128 128 64 4 6 2 10.06 20
# Presents histogram of the number of chocolate chips per cookie
hist(cookie$numchoc)
# Store the histogram as an object
temp <- hist(cookie$numchoc)
# What is stored in temp?
temp
$breaks
[1] 6 8 10 12 14 16 18 20 22 24 26 28
$counts
[1] 3 15 12 24 28 24 15 3 3 0 1
$density
[1] 0.01171875 0.05859375 0.04687500 0.09375000 0.10937500 0.09375000
[7] 0.05859375 0.01171875 0.01171875 0.00000000 0.00390625
$mids
[1] 7 9 11 13 15 17 19 21 23 25 27
$xname
[1] "cookie$numchoc"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
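The components stored in temp can be reused in later calculations. For example, the following line (which assumes the chunk above creating temp has been run) picks out the midpoint of the bin with the highest count:

```r
# Midpoint of the most frequent bin of the stored histogram object
temp$mids[which.max(temp$counts)]
```

With the counts shown above, the largest count (28) belongs to the bin (14, 16], so this returns 15.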
# Confidence interval calculation in HW02
ybar <- mean(cookie$numchoc)
J <- length(cookie$numchoc)
c <- sqrt(20)
c(2*ybar + c^2/J - sqrt(4*ybar*c^2/J + c^4/J^2), 2*ybar + c^2/J + sqrt(4*ybar*c^2/J + c^4/J^2))/2
[1] 13.64160 16.71778
Here I repeatedly generate \(10^4\) artificial datasets from \(\mathsf{U}\left(0,\theta\right)\). Set \(n\in \{5, 500\}\) and \(\theta=1\). I used the apply(), min(), max(), and runif() commands. I apply the estimators \(\widehat{\theta}_1\), \(\widehat{\theta}_2\), and \(\widehat{\theta}_3\) to the artificial datasets, display histograms, and show the means and standard deviations in a table.
set.seed(20230519) # for reproducibility, exact results even on a different computer
theta <- 1
nsim <- 10^4 # number of realizations to be obtained
# repeatedly obtain realizations
ymat <- replicate(nsim, runif(5, min = 0, max = theta))
# calculate estimates
theta1hat.5 <- 6*apply(ymat, 2, min)
theta2hat.5 <- 2*colMeans(ymat)
theta3hat.5 <- 6/5*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=5
par(mfrow=c(1,3))
hist(theta1hat.5, freq = FALSE)
hist(theta2hat.5, freq = FALSE)
hist(theta3hat.5, freq = FALSE)
# n=500
ymat <- replicate(nsim, runif(500, min = 0, max = theta))
# calculate estimates
theta1hat.500 <- 501*apply(ymat, 2, min)
theta2hat.500 <- 2*colMeans(ymat)
theta3hat.500 <- 501/500*apply(ymat, 2, max)
# put a 1x3 canvas for the case of n=500
par(mfrow=c(1,3))
hist(theta1hat.500, freq = FALSE)
hist(theta2hat.500, freq = FALSE)
hist(theta3hat.500, freq = FALSE)
| | \(\widehat{\theta}_1\) | \(\widehat{\theta}_2\) | \(\widehat{\theta}_3\) |
|---|---|---|---|
| \(n=5\) | | | |
| Mean | 1 | 1 | 1 |
| SD (times 10) | 8.43 | 2.57 | 1.69 |
| \(n=500\) | | | |
| Mean | 1.02 | 1 | 1 |
| SD (times 10) | 10.05 | 0.26 | 0.02 |
So, all estimators of \(\theta\) are unbiased. The variances of the estimators are:
\[\mathsf{Var}\left(\widehat{\theta}_1\right)=\frac{n}{n+2}\theta^2,\ \mathsf{Var}\left(\widehat{\theta}_2\right) =\frac{1}{3n}\theta^2,\ \mathsf{Var}\left(\widehat{\theta}_3\right)=\frac{1}{n\left(n+2\right)}\theta^2\] Therefore, we expect the SD of the sampling distributions of the estimators to be \(\displaystyle\sqrt{\frac{n}{n+2}}\), \(\displaystyle\sqrt{\frac{1}{3n}}\), \(\displaystyle\sqrt{\frac{1}{n\left(n+2\right)}}\). The results from the Monte Carlo simulation, which were multiplied by 10 for presentation purposes, match the theoretical results quite well. In particular, we should see results close to 8.45, 2.58, and 1.69 for \(n=5\). We should also see results close to 9.98, 0.26, and 0.02 for \(n=500\).
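The theoretical standard deviations can also be computed directly in R. This short chunk (an optional check, not part of the simulation above) evaluates the three formulas for both sample sizes, multiplied by 10 as in the table:

```r
# Theoretical SDs of the three estimators for U(0, theta) with theta = 1,
# times 10 for presentation, for n = 5 and n = 500
n <- c(5, 500)
sd1 <- sqrt(n/(n + 2))        # (n+1) * minimum estimator
sd2 <- sqrt(1/(3*n))          # 2 * sample-mean estimator
sd3 <- sqrt(1/(n*(n + 2)))    # (n+1)/n * maximum estimator
round(10*rbind(sd1, sd2, sd3), 2)
```

The first column gives 8.45, 2.58, and 1.69 for \(n=5\); the second gives 9.98, 0.26, and 0.02 for \(n=500\), matching the values quoted above.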
You can definitely mix text and mathematics. For example:
Let \(Y_1,\ldots,Y_n\) be IID \(\mathsf{U}\left(0,\theta\right)\), where \(\theta>0\). The common pdf of these random variables is given by \[f\left(y\right)=\begin{cases}\dfrac{1}{\theta} & \ \ \ 0\leq y\leq \theta\\ 0 & \ \ \ \mathrm{otherwise} \end{cases},\] the common mean is \(\theta/2\), and the common variance is \(\theta^2/12\). The estimand for this exercise is \(\theta\).
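As a quick sanity check of these moments (purely illustrative, not part of the required analysis), a large uniform sample with \(\theta=1\) should have sample mean near \(1/2\) and sample variance near \(1/12\):

```r
# Simulated check of the U(0, 1) mean (1/2) and variance (1/12)
set.seed(1)
y <- runif(10^6)
round(c(mean(y), var(y)), 4)
c(1/2, 1/12)
```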
You can have single line displayed mathematical expressions:
\[\mathbb{E}\left[\left(\widehat{\theta}-\theta\right)^2\right]=\mathbb{E}\left[\left(\widehat{\theta}-\mathbb{E}\left(\widehat{\theta}\right)\right)^2\right]+\left(\mathbb{E}\left(\widehat{\theta}\right)-\theta\right)^2=\mathsf{Var}\left(\widehat{\theta}\right)+\left(\mathbb{E}\left(\widehat{\theta}\right)-\theta\right)^2.\]
You can have multi-line displayed mathematical expressions:
\[\begin{eqnarray}\mathbb{P}\left(\left|\overline{Y}-\mathbb{E}\left(\overline{Y}\right)\right|\geq \frac{c\sigma}{\sqrt{n}}\right) \leq \frac{\mathsf{Var}\left(\overline{Y}\right)}{c^2\sigma^2/n} &\Rightarrow & \mathbb{P}\left(\left|\frac{\overline{Y}-\mu}{\sigma/\sqrt{n}}\right|\geq c\right) \leq \frac{1}{c^2} \\ &\Rightarrow & \mathbb{P}\left(\left|\frac{\overline{Y}-\mu}{\sigma/\sqrt{n}}\right|\leq c\right) \geq 1-\frac{1}{c^2}.\end{eqnarray}\]
You can mix with lists too:
I collect the following integration results below to tidy up the calculations later.
You can also have tables with mathematics:
Source | Sum of squares (SS) | Degrees of freedom (df) | Mean sum of squares (MS) | Expected MS |
---|---|---|---|---|
Mean | \(V_1^2=n\overline{Y}^2\) | \(1\) | \(n\overline{Y}^2\) | \(\sigma^2+n\mu^2\) |
Deviations | \(\displaystyle\sum_{i=2}^n V_i^2=\displaystyle\sum_{i=1}^n\left(Y_i-\overline{Y}\right)^2\) | \(n-1\) | \(S^2\) | \(\sigma^2\) |
Total | \(\displaystyle\sum_{i=1}^n V_i^2=\displaystyle\sum_{i=1}^n Y_i^2\) | \(n\) |
Notice that in the qmd file there is a line bibliography: references.bib. What this means is that there is a separate file containing bibliographic information. Your job is to maintain this bib file and populate it with the entries which you will eventually cite in your report. The entries use the BibTeX format.
Fortunately, almost every publisher of journal articles has citation tools that make things convenient. For example, when you look up the citation information of Hamermesh and Parker (2005) at this website, you can click on Cite and find Export citation to BibTeX. Sometimes a file will open in your browser, but most of the time a very small file with a .bib or a .txt file extension will be downloaded. Open this file using a text editor and copy/paste the contents to references.bib. Sometimes it might be a good idea to clean up the contents to make them shorter while still keeping the most important pieces of information, including the digital object identifier or DOI.1 For other publishers, your mileage may vary. I leave this for you to explore. But you have to admit that this leaves everyone with no excuse for not citing sources.
If you use R packages that were never used in class, please cite them. For example, if you want to cite the multcomp package, then you type print(citation("multcomp"), bibtex=TRUE) into the R console. It will generate a BibTeX entry. Put that entry into references.bib. Do not forget to put a key. Usually these keys are provided or automatically generated. These keys are markers or identifiers which enable you to tie a citation in the qmd file to the corresponding entry in references.bib. If a key is not provided or if you want to modify it, you can find it in the first line of every BibTeX entry. This first line typically starts with something like @Manual{Rcite, and for this example, Rcite is the key.
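As an illustration, a cleaned-up article entry might look like the sketch below. The key hamermesh2005 is just a hypothetical choice, and the field values should be taken from the publisher's BibTeX export rather than typed from memory:

```bibtex
@Article{hamermesh2005,
  author  = {Daniel S. Hamermesh and Amy Parker},
  title   = {Beauty in the classroom: Instructors' pulchritude and putative pedagogical productivity},
  journal = {Economics of Education Review},
  year    = {2005},
  doi     = {10.1016/j.econedurev.2004.07.013}
}
```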
Open the qmd file which produced this HTML file so that you can see how to cite sources and how the keys play a role.
For reproducibility purposes, I would include something like “All analyses were performed using R Statistical Software v4.2.1 (R Core Team 2022) and tables are displayed using stargazer v5.2.3 (Hlavac 2022).”
Look at the bottom: Instant reference list!
If you choose this project option, you may need to install and load the ipumsr
package. The purpose of the package is to help you load an IPUMS dataset and to use its built-in commands to help you navigate the data. This may take some time to learn.
The code chunk below is not evaluated in Quarto because of eval: false, but this can be removed.
if (!require("ipumsr")) stop("Reading IPUMS data into R requires the ipumsr package. It can be installed using the following command: install.packages('ipumsr')")
Here I use some commands in ipumsr
to load the data:
# Download the XML file from IPUMS and use it to load the definitions of the variables
ddi <- read_ipums_ddi("usa_00008.xml")
# Load the data based on the previous command
data <- read_ipums_micro(ddi)
Here I create a variable age.arrival which may be interesting, and I create a subset of the data that I have loaded previously.
# Calculate age at arrival
data$age.arrival <- data$YRIMMIG - data$BIRTHYR
# born in Mexico, household head, living in California, married with spouse present, and white
filtered <- subset(data, BPL == 200 & RELATE == 1 & STATEFIP == 6 & MARST == 1 & RACE == 1)
filtered # total number of observations used
nrow(filtered)
# show distribution of age at arrival
table(filtered$age.arrival)
# show distribution for males and females
table(filtered$age.arrival, filtered$SEX)
The previous code chunk was left unevaluated. To be able to evaluate it, you have to remove the line #| eval: false and you need access to the XML file usa_00008.xml and the data file usa_00008.dat.gz.
Here you can organize the commands needed to load packages and do other formatting options (if needed).
# To clean R's memory, leave this alone
rm(list=ls())
Here you explain why you have selected the chosen project option. Describe the problem(s) and research question(s) related to your chosen project option. Why do you think the problem(s) and the question(s) are important? What do you think is gained by addressing the problem(s) and/or answering the research question(s)? It is important to express these in your own words, with your own thoughts, and most preferably in a context of your own choosing.
Here you have to discuss what you plan to do for your report. Again, aim for something in between “a simple project done very well” and “a complicated project with flaws”.
Adjust the title to suit your situation. Your chosen project option will eventually determine what will be in this main body. Feel free to use subsections.
Summarize what you have done here and what you hope you could have done if you had more time.
There is a slight problem with some publishers when they export citations to BibTeX. The DOI entry has to be modified so that you remove the first part https://doi.org/.↩︎