Remarks on student projects

What I have learned from reading your projects and what you can do moving forward

Most students chose to write a tutorial based on one of Chapters 11, 13, and 14 of LM (Larsen and Marx 2018). The next most frequent choice was a deeper exploration of some articles from The American Statistician. Finally, a few students chose to create case studies modeled on similar cases found in The American Statistician.

I thank most of the students for submitting work they should be proud of. The project was meant to be a learning opportunity for you to build confidence in venturing into the potentially risky unknown. Some students preferred to play it safe in their project choices, which is fine, but such choices may not help them grow. I also thank the students whose projects prompted me to ask my own questions for further research and for future teaching material.

On this page, I lay out what I found from reading all your projects and what you can do in future projects, which are likely to be higher-stakes. You will notice that English usage and grammar are not the biggest problems!

Based on your projects, I found that students:

Do not make clear what their contributions were

Remember that it is your project. Therefore, it is important to show what you are contributing on top of what is already available. Why is your contribution important? What do you do differently that is important for others to read and know about?

Use legitimate sources but do not fully integrate or process them with their own understanding (or do not dig deep enough into the material, leading to a superficial understanding)

Many students consult sources beyond LM or the articles from The American Statistician. This is a good thing, but a step every student seems to skip is integrating and processing the material with one's own understanding. For example, some students who chose Chapter 11 on linear regression did not even notice that the conditional expectation is generally nonlinear and that the case of joint normality, where it is exactly linear, is extremely special. The question then is: if the conditional expectation is nonlinear, what exactly does linear regression help us learn? In other words, what is the estimand that linear regression is estimating? Students should have noticed that the example of a conditional expectation in the book does not match any of the curvilinear forms discussed!
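To see why the estimand matters, here is a minimal simulation sketch (in Python rather than R, purely for illustration) of a case where the conditional expectation is nonlinear. Under standard conditions, least squares then estimates the best linear predictor, the line alpha + beta*X minimizing E[(Y - alpha - beta*X)^2], with beta = Cov(X, Y)/Var(X) and alpha = E[Y] - beta*E[X]. The data-generating process, X ~ Uniform(0, 1) and Y = X^2 plus normal noise, is an assumption chosen only for this example.

```python
import random

def ols_line(xs, ys):
    """Least-squares intercept and slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Assumed data-generating process (illustration only):
# X ~ Uniform(0, 1) and Y = X^2 + noise, so E[Y | X = x] = x^2 is NOT linear.
rng = random.Random(2024)
n = 200_000
xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]

intercept, slope = ols_line(xs, ys)

# Population best linear predictor for this design:
#   beta  = Cov(X, X^2) / Var(X) = (1/4 - 1/6) / (1/12) = 1
#   alpha = E[X^2] - beta * E[X] = 1/3 - 1/2 = -1/6
blp_slope, blp_intercept = 1.0, -1.0 / 6.0

print(f"OLS estimate:          intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"Best linear predictor: intercept = {blp_intercept:.3f}, slope = {blp_slope:.3f}")
# The fitted line tracks the best linear approximation to x^2, not x^2 itself:
# at x = 0 it predicts about -1/6, while E[Y | X = 0] = 0.
```

Changing the nonlinear form of E[Y | X] or the distribution of X changes (alpha, beta), which is exactly why a tutorial on linear regression should say what the fitted line is estimating.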

Do not follow instructions

The project options are explicit. You have to choose one of them and not submit anything that deviates from what was explicitly laid out. If you want to deviate, consult the instructor first: ask for permission, not forgiveness. Furthermore, the project options make the targets of each project clear (for example, who the audience is, what is expected, etc.).

Cite sources just for the sake of citing sources

By this I mean that citations (such as Larsen and Marx 2018) are placed at the end without making clear which part of the work is due to the cited source. When you cite a source, make sure you actually read it and determine whether it helps you make your point. The main idea of the cited article should ideally match what you want to say, and you have to make clear exactly how the source played a role in your project.

Copy R code from CSDN (for example)

The origin of R code, especially code that is not yours, has to be disclosed. Remember that it is your work that is being judged. If you include code that is not yours yet claim it (implicitly or explicitly) as your own, it becomes difficult to judge what is truly yours.

Copy material from LM or from the article directly into your work

Note that you are not asked to re-type what can already be read directly in books or articles. You have to exercise judgment and craft your project so that it demonstrates your processing and understanding of the material. You can always refer the reader to the book or cited article. Of course, this does not mean that you cannot highlight certain portions of the book or article. In fact, you can, but you have to be more selective or, if possible, work out detailed derivations, especially given the audience specified for the project.

Do not make sure that their qmd file and other supplemental materials (pictures, csv files, etc.) are complete and can be rendered on an independent computer

Many students ignore this aspect because they think they can finish it at the end. Usually, the polishing and final editing take just as much time as the writing. The point of having you do everything in Quarto is to automate much of the polishing and to ensure reproducibility (a minimal requirement for documentation purposes) from the very beginning. This frees up time for you to focus on what matters more: the content and your contributions!
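One way to catch missing files before submitting is to scan the qmd file for the local files it references. The sketch below is a hypothetical helper, not part of Quarto: the name missing_files and both regular expressions are assumptions, and it only recognizes Markdown images and R calls of the form read.csv("...") or read_csv("..."). The surest check remains rendering the project on a different computer.

```python
import re
from pathlib import Path

# Hypothetical patterns for file references that often appear in a .qmd file:
# Markdown images such as ![caption](figs/plot.png) and R calls such as
# read.csv("data.csv") or read_csv("data.csv").
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
CSV_REF = re.compile(r"read[._]csv\(\s*['\"]([^'\"]+)['\"]")

def missing_files(qmd_text: str, base_dir: Path) -> list[str]:
    """Return referenced local paths that do not exist under base_dir."""
    refs = IMAGE_REF.findall(qmd_text) + CSV_REF.findall(qmd_text)
    missing = []
    for ref in refs:
        if ref.startswith(("http://", "https://")):
            continue  # remote resources are not checked by this sketch
        if not (base_dir / ref).exists() and ref not in missing:
            missing.append(ref)
    return missing
```

For example, calling missing_files(Path("project.qmd").read_text(encoding="utf-8"), Path(".")) from the project folder (project.qmd being a placeholder name) lists every referenced local path that is absent.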

Do not revise according to suggestions

The suggestions for revising your initial report were handed to you personally. The least you can do is listen to them and act on them. If a suggestion does not seem feasible, you have to explain why it is not feasible in your context. One example is the request to include "embed-resources: true" in the qmd file, an HTML format option in the YAML header that bundles images and other resources into a single self-contained HTML file. If you cannot even do this, how can you expect anyone to want to engage with you or even take you seriously? If you think a suggestion is wrong, then explain why it would not be suitable and demonstrate that you have actually engaged with it.

Do not choose sources carefully, especially internet sources

When you choose web sources, look at each source carefully and consider the reputation of the site. For example, linking to a bookseller's website for a book is probably not the most suitable choice; link to the publisher's page for the book instead. Judging a site's reputation requires exposure to a wide variety of sources beyond local Chinese ones, so that you can effectively tell which sources are reputable. For example, chegg.com is one of those websites in a gray area. It has been used for "learning" in the sense of getting answers to homework, but that is not true learning at all. Of course, chegg.com is a platform, so one could argue that this is not its fault.

Misunderstand what "reproduce" means

Some of the projects ask you to reproduce simulations or findings. This means that you either write your own code, following the narrative description in the paper, to produce the paper's simulation results, or use the code available in the supplemental material (and cite it directly!!) to reproduce the simulations or findings. Note that "reproduce" does not mean typing most of the article into your qmd file!!
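When you write your own code from a paper's narrative description, the reproduction should itself be reproducible: fix and report the random seed so that anyone re-rendering your qmd file gets the same numbers (in R, set.seed() plays this role). The sketch below is a hypothetical stand-in, a coverage study of the 95% z-interval for a normal mean with known sigma, not the design of any particular article; the function name and parameter values are assumptions.

```python
import math
import random
import statistics

def coverage_of_z_interval(n_reps: int, n: int, mu: float, sigma: float, seed: int) -> float:
    """Estimate how often xbar +/- 1.96 * sigma / sqrt(n) covers mu for normal data
    with known sigma. A fixed seed makes every run produce identical results."""
    rng = random.Random(seed)  # dedicated generator, seeded once
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / n_reps

estimate = coverage_of_z_interval(n_reps=10_000, n=10, mu=5.0, sigma=2.0, seed=123)
print(f"Estimated coverage: {estimate:.3f} (nominal 0.95)")
```

Reporting the seed alongside the settings (n_reps, n, mu, sigma) lets a reader check your numbers exactly, and comparing the estimate with the paper's reported value is then the actual act of reproduction.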

Do not document the origins of the data

This issue is for those who created case studies, but it applies equally to other types of projects. You have already encountered two classmates who faked their data for the M&M exercise. Specifying the origins of the data allows any reader to try out the analysis for themselves. Every beginning student has probably struggled to track down the data used for a particular piece of research. Being upfront about the origins of the data is a public good. Otherwise, there is no way to engage in a conversation about your project. Remember that in the statistical method, the data are involved in the first three steps of the PPDAC cycle (Problem, Plan, Data, Analysis, Conclusion).

Do not define notation clearly and consistently

This is less serious, but it affects the flow of your writing. Make sure to define what your notation means, especially if it is not common knowledge for the audience specified for the project. Furthermore, keep your notation consistent throughout. For example, if subscripts x and y are used to distinguish two independent samples, then do not switch to subscripts 1 and 2 later, a slip that is especially easy to make when the article or book you are following uses them.
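As an illustration (a hypothetical write-up, not taken from any particular project), here is what defining the X/Y convention once and keeping it fixed might look like, using the pooled two-sample t statistic as the example:

```latex
\begin{aligned}
&X_1, \dots, X_n \overset{\text{iid}}{\sim} N(\mu_X, \sigma^2), \qquad
 Y_1, \dots, Y_m \overset{\text{iid}}{\sim} N(\mu_Y, \sigma^2), \qquad \text{the two samples independent},\\
&\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
 \bar{Y} = \frac{1}{m}\sum_{j=1}^{m} Y_j,\\
&S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad
 S_Y^2 = \frac{1}{m-1}\sum_{j=1}^{m} (Y_j - \bar{Y})^2,\\
&S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}, \qquad
 T = \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}} \sim t_{n+m-2}.
\end{aligned}
```

Every later formula can then reuse n, m, the X/Y subscripts, and S_p without further explanation; switching to sample sizes n_1, n_2 halfway through would force the reader to re-map every symbol.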

Do not format citations carefully (whether by hand or automatically, with checking)

This is less serious but really takes a lot of time, especially if done at the end. One of the major conveniences we have nowadays is instant bibliography generation and the ability to unify the formatting of citations. I would suggest that students start to consult style guides, such as the APA style guide or the Chicago Manual of Style.