Syllabus
Course Times and Office Hours
Lecture
We meet in 116 Natural Sciences Tuesdays and Thursdays, 1:30 - 2:50 pm. This class is in-person and will not have an online component.
Office hours - Prof. K
Fridays 10:15 - 11:45 Thursdays 11:00 am - 12 noon (and by appointment). If you are planning on coming after 11:45, please let me know.
Office Hours for Prof. K are in-person, 105 Old Botany (Old Botany is on W. Circle Drive near the Grand River parking ramp. 105 is on the first floor.)
Sunday evenings office hours are via Slack after 7:30pm or so till about 9pm.
Office Hours - Teaching Assistant
Nick Armstrong - armst427@msu.edu
TA Office Hours: Wednesdays 2-3pm and via Slack (if you are arriving after 2:30pm, please email Nick to let him know. Otherwise, he will depart the Zoom at 2:30)
TA Office Hours Zoom: https://msu.zoom.us/j/94474676913 (Passcode: 592204)
Slack
We will use Slack as a forum for asking questions about the course, including course policies and, primarily, help questions with R. Students are encouraged to help answer each others’ questions, and to use the forum as a first-step for seeking help. I, and our TA, will monitor slack and answer questions regularly. You can join our Slack channel with this link by September 20th
What is This Course and Can / Should You Take It?
Innovations in statistical learning have created many engineering breakthroughs. From real time voice recognition to automatic categorization (and in some cases production) of news stories, machine learning is transforming the way we live our lives. These techniques are, at their heart, novel ways to work with data, and therefore they should have implications for social science. This course explores the intersection of statistical learning (or machine learning) and social science and aims to answer two primary questions about these new techniques:
How does statistical learning work and what kinds of statistical guarantees can be made about the performance of statistical-learning algorithms?
How can statistical learning be used to answer questions that interest social science researchers, such as testing theories or improving social policy?
In order to address these questions, we will cover so-called “standard” techniques such as supervised and unsupervised learning, statistical learning theory and nonparametric and Bayesian approaches. If it were up to me, this course would be titled “Statistical Learning for Social Scientists”—I believe this provides a more appropriate guide to the content of this course. And while this class will cover these novel statistical methodologies in some detail, it is not a substitute for the appropriate class in Computer Science or Statistics. Nor is this a class that teaches specific skills for the job market. Rather, this class will teach you to think about data analytics broadly. We will spend a great deal of time learning how to interpret the output of statistical learning algorithms and approaches, and will also spend a great deal of time on better understanding the basic ideas in statistical learning. This, of course, comes at some cost in terms of time spent on learning computational and/or programming skills.
Enrollment for credit in this course is simply not suitable for those unprepared in or uninterested in elementary statistical theory no matter the intensity of interest in machine learning or “Big Data”. Really.
You will be required to understand elementary mathematics in this course and should have at least some exposure to statistical theory. The class is front-loaded technically: early lectures are more mathematically oriented, while later lectures are more applied.
The topics covered in this course are listed later in this document. I will assign readings sparingly from Introduction to Statistical Learning, henceforth referred to as ISL. This text is available for free online and, for those who like physical books, can be purchased for about $25. Importantly, the lectures deviate a fair bit from the reading, and thus you will rely on your course notes much more than you might in other classes.
If—after you have read this document and preferably after attending the first lecture—you have any questions about whether this course is appropriate for you, please come talk to me.
What This Course is Not
The focus of this course is conceptual. The goal is to create a working understanding of when and how tools from computer science and statistics can be profitably applied to problems in social science. Though students will be required to apply some of these techniques themselves, this course is not…
…a replacement for EC420, EC422, or a course in causal inference.
As social scientists, we are most often concerned with causal inference in order to anaOpelyze and write policies. Statistical learning and the other methods we will discuss in this course are generally not well-suited to these problems, and while I’ll give a short overview of standard methods, this is only to build intuitions. Ultimately, this course has a different focus and you should still pursue standard methodological insights from your home departments.
…a course on the computational aspects of the underlying methods.
There are many important innovations that have made machine learning techniques computationally feasible. We will not discuss these, as there are computer science courses better equipped to cover them. When appropriate, we will discuss whether something is computable, and we will even give rough approximations of the amount of time required (e.g. P vs NP). But we will not discuss how optimizers work or best practices in programming.
…a primer on the nitty-gritty of how to use these tools or a way to pad your resume.
The mechanics of implementation, whether it be programming languages or learning to use APIs, will not be covered in any satisfying level of depth. Students will be expected to learn most of the programming skills on their own. Specifically, while there will be some material to remind you of basic R
commands, this is not a good course for people who are simply looking to learn the mechanics of programming. This course is designed to get you to use both traditional analytics and, eventually, machine learning tools. We will do some review of basic programming, and you will have the opportunity to explore topics that interest you through a final project, but ultimately this is a course that largely focuses on the theoretical and practical aspects of statistical learning as applied to social science and not a class on programming.
Perhaps most importantly, this course is an attempt to push undergraduate education toward the frontiers in social science. Accordingly, please allow some messiness. Some topics may be underdeveloped for a given person’s passions, but given the wide variety of technical skills and overall interests, this is a near certainty. Both the challenge and opportunity of this area comes from the fact that there is no fully developed, wholly unifying framework. Our collective struggle—me from teaching, you from learning—will ultimately bear fruit.
Success in this Course
I promise, you are equipped to succeed in this course.
Learning R
can be difficult at first. Like learning a new language—Spanish, French, or Chinese—it takes dedication and perseverance. Hadley Wickham (the chief data scientist at RStudio and the author of some amazing R packages you’ll be using like) ggplot2—made this wise observation:
It’s easy when you start out programming to get really frustrated and think, “Oh it’s me, I’m really stupid,” or, “I’m not made out to program.” But, that is absolutely not the case. Everyone gets frustrated. I still get frustrated occasionally when writing R code. It’s just a natural part of programming. So, it happens to everyone and gets less and less over time. Don’t blame yourself. Just take a break, do something fun, and then come back and try again later.
Even experienced programmers (like me) find themselves bashing their heads against seemingly intractable errors.1 If you’re finding yourself bashing your head against a wall and not making progress, try the following. First, take a break. Sometimes you just need space to see an error. Next, talk to classmates. Finally, if you genuinely cannot see the solution, e-mail the TA. But, honestly, it’s probably just a typo.
Course materials
The course website can be found at https://ssc442Kirkpatrick.netlify.app (but you know that. You’re on it right now.)
All of the readings and software in this class are free. There are free online version of all the texts including Introduction to Statistical Learning (2nd Ed) and R
/ RStudio are free (don’t pay for RStudio). We will reference outside readings and there exist paper versions of some “books” but you won’t need to buy anything2
R and RStudio/Posit
You will do all of your analysis with the open source (and free!) programming language R
. You will use RStudio (which is undergoing a slow-mo rebrand to “Posit” while the functionality remains the same) as the main program to access R. Think of R
as an engine and RStudio as a car dashboard—R
handles all the calculations produces the actual statistics and graphical output, while RStudio provides a nice interface for running R
code.
R
is free, but it can sometimes be a pain to install and configure. To make life easier, you can (and should!) use the free Posit.cloud (formerly Rstudio.cloud) service, which lets you run a full instance of RStudio in your web browser. This means you won’t have to install anything on your computer to get started with R
. We recommend this for those who may be switching between computers and are trying to get some work done. That said, while Posit.cloud is convenient, it can be slow and it is not designed to be able to handle larger datasets or more complicated analysis and graphics. You also can’t use your own custom fonts with RStudio.cloud.3 And, generally speaking, you should have (from the prerequisite course) sufficient experience to make your R
work. If not, over the course of the semester, you’ll probably want to get around to installing R
, RStudio, and other R
packages on your computer and wean yourself off of RStudio.cloud. If you plan on making a career out of data science, you should consider this a necessary step.
You can find instructions for installing R
, RStudio/Posit, and all the tidyverse packages here. And you may find some other goodies.
Online help
Data science and statistical programming can be difficult. Computers are stupid and little errors in your code can cause hours of headache (even if you’ve been doing this stuff for years!).
Fortunately there are tons of online resources to help you with this. Two of the most important are StackOverflow (a Q&A site with hundreds of thousands of answers to all sorts of programming questions) and RStudio Community (a forum specifically designed for people using RStudio and the tidyverse (i.e. you)).
Searching for help with R
on Google can sometimes be tricky because the program name is, um, a single letter. Google is generally smart enough to figure out what you mean when you search for “r scatterplot”, but if it does struggle, try searching for “rstats” instead (e.g. “rstats scatterplot”). Likewise, whenever using a specific package, try searching for that package name instead of the letter “r” (e.g. “ggplot scatterplot”). Good, concise searches are generally more effective.
Help with Using R: There are some excellent additional tutorials on R available through Rstudio/Posit Clould Primers.
Evaluations and Grades
Your grade in this course will be based on attendance/participation, labs, weekly writings, and a final project.
The general breakdown will be approximately 55% for labs, participation, and weekly writings, and 45% for projects (see below for specific details). The primary focus of the course is a final project; this requires two “mini-projects” to ensure you’re making satisfactory progress. Assignment of numeric grades will follow the standard, where ties (e.g., 91.5%) are rounded to favor the student. Evaluations (read: grades) are designed not to deter anyone from taking this course who might otherwise be interested, but will be taken seriously.
Weekly writings are intended to be an easy way to get some points. Labs will be short homework assignments that require you to do something practical using a basic statistical language. Support will be provided for the R
language only. You must have access to computing resources and the ability to program basic statistical analyses. As mentioned above, this course will not teach you how to program or how to write code in a specific language. If you are unprepared to do implement basic statistical coding, please take (or retake) PLS202. I highly encourage seeking coding advice from those who instruct computer science courses – it’s their job and they are better at it than I am. I’ll try to provide a good service, but I’m really not an expert in computer science.
More in-depth descriptions for all the assignments are on the assignments page. As the course progresses, the assignments themselves will be posted within that page.
To Recap:
Assignment | Points | Percent |
---|---|---|
Class Participation | 25 | 5% |
Weekly Writings (14-2 x 8 ea), drop two lowest | 96 | 18% |
Labs (13-1 x 15 ea), drop one lowest | 180 | 34% |
Mini project 1 | 50 | 9% |
Mini project 2 | 50 | 9% |
Final project | 135 | 25% |
Total | 536 | — |
Grade | Range | Grade | Range |
---|---|---|---|
4.0 | 92-100% | 2.0 | 72-76% |
3.5 | 87-91% | 1.5 | 67-72% |
3.0 | 82-87% | 1.0 | 62-67% |
2.5 | 77-81% | 0.0 | bad-66% |
Class Participation
Participation can take many forms. Most preferred is active participation during class – asking clarifying questions or responding to lecture questions. In the latter half of the semester, we will work in groups in class on small coding exercises and will share results at the end. Sharing your group’s results and code will always count towards participation. Finally, I will often bribe give extra credit points for answering specific questions in class, which I will clearly state as extra credit. Wrong answers get the same credit as right answers. We are here to learn . If you knew everything already, you wouldn’t be in the class.
Academic honesty
Violation of MSU’s Spartan Code of Honor will result in a grade of 0.0 in the course. Moreover, I am required by MSU policy to report suspected cases of academic dishonesty for possible disciplinary action.4
Generative AI
Generative AI is both a computing resource and potential avenue for cheating in violation of the Spartan Code of Honor (see Academic Integrity, below). For this course, some use of generative AI is permitted or forbidden as follows:
For the purposes of learning R coding, data cleaning and processing, visualization, and other coding applications of R, you are fully permitted to use ChatGPT or other generative AI models provided you indicate such a use at the top of your problem set, and include a comment
# used generative AI
in your code. When “stuck” on coding, I have always encouraged students to use Stack Overflow to find similar problems and then translate the found solutions into the problem at hand. It is one of the most important skills in coding. Generative AI facilitates this process, and thus is a tool. Since R itself is a tool to help you learn and apply the material, I view using generative AI for help in specific coding tasks as a suitable use. You must, in all cases, understand what your code is doing, and be able to justify its use. If you cannot tell me, on review, what a line of code is used for, then you have not used generative AI properly.Using generative AI to solve a complete problem set or problem set question beyond the mechanics and specifics of R coding is absolutely prohibited. Using generative AI to solve a problem verbatim without understanding the coding solution is absolutely prohibited. Violation of this policy is a violation of the Spartan Code of Honor and will result in immediate disciplinary action. See Academic honest above. I will not hesitate to file an academic dishonesty report.
Grading
All grades are considered final. Any request for a re-grade beyond simple point-tallying mistakes will require that the entire assignment be re-graded. Any points previously awarded may be changed in either direction in the re-grade.
Resources
Mental health concerns or stressful events may lead to diminished academic performance or reduce a student’s ability to participate in daily activities. Services are available to assist you with addressing these and other concerns you may be experiencing. You can learn more about the broad range of confidential mental health services available on campus via the Counseling & Psychiatric Services (CAPS) website at www.caps.msu.edu.
Accommodations
If you need a special accommodation for a disability, religious observance, or have any other concerns about your ability to perform well in this course, please contact me immediately so that we can discuss the issue and make appropriate arrangements. MSU has a specific policy for religious observance available here.
Michigan State University is committed to providing equal opportunity for participation in all programs, services and activities. Requests for accommodations by persons with disabilities may be made by contacting the Resource Center for Persons with Disabilities at 517-884-RCPD or on the web at rcpd.msu.edu. Once your eligibility for an accommodation has been determined, you will be issued a verified individual services accommodation (“VISA”) form. Please present this form to me at the start of the term and/or two weeks prior to the accommodation date (test, project, etc). Requests received after this date will be honored whenever possible.
COVID-19
Things are hard right now. You most likely know people who have lost their jobs, have tested positive for COVID-19, have been hospitalized, or perhaps have even died. You all have increased (or possibly decreased) work responsibilities and increased family care responsibilities. You might be caring for extra people right now, and you are likely facing uncertain job prospects.
I’m fully committed to making sure that you learn everything you were hoping to learn from this class! I will make whatever accommodations I can to help you finish your exercises, do well on your projects, and learn and understand the class material. Under ordinary conditions, I am flexible and lenient with grading and course expectations when students face difficult challenges. Under pandemic conditions, that flexibility and leniency is intensified.
If you feel like you’re behind or not understanding everything, do not suffer in silence! Please contact me. I’m available at e-mail.
Mandated Reporting
Essays, journals, and other materials submitted for this class are generally considered confidential pursuant to the University’s student record policies. However, students should be aware that University employees, including instructors, may not be able to maintain confidentiality when it conflicts with their responsibility to report certain issues to protect the health and safety of MSU community members and others. As the instructor, I must report the following information to other University offices (including the Department of Police and Public Safety) if you share it with me: • Suspected child abuse/neglect, even if this maltreatment happened when you were a child; • Allegations of sexual assault, relationship violence, stalking, or sexual harassment; and • Credible threats of harm to oneself or to others. These reports may trigger contact from a campus official who will want to talk with you about the incident that you have shared. In almost all cases, it will be your decision whether you wish to speak with that individual. If you would like to talk about these events in a more confidential setting, you are encouraged to make an appointment with the MSU Counseling and Psychiatric Services.
Acknowledgements
This syllabus and course structure was developed by Prof. Ben Bushong with iterative refinements by myself. All credit goes to Prof. Bushong. All errors are my own.
Miscellanea
All class material will be posted on https://ssc442kirkpatrick.netlify.app. D2L will be used sparingly for submission of weekly writings and assignments and distribution of grades.
Using Office Hours
Please use my drop-in Office House Fridays 10:15 - 12:00 and by appointment. I will always stay from 10:15 to 11:00, but if nobody is present at 11:00, I will likely depart my office. If you wish to arrive after 11, just email me to let me know and I will be here.
It would be remarkable if you didn’t need some assistance with the material, and I am here to help. One of the benefits of open office hours is to accommodate many students at once; if fellow students are in my office, please join in and feel very free to show up in groups.
As a general rule, please first seek course-related help from the TA or my drop-in office hours. However, if my scheduled office hours are always infeasible for you, let me know, and then I may encourage you to make appointments with me. I ask that you schedule your studying so that you are prepared to ask questions during office hours – office hours are not a lecture and if you’re not prepared with questions we will end up awkwardly staring at each other for an hour until you leave.
Some gentle requests regarding office hours and on contacting me. First, my office hours end sharply at the end, so don’t arrive 10 minutes before the scheduled end and expect a full session. Please arrive early if you have lengthy questions, or if you don’t want to risk not having time due to others’ questions. You are free to ask me some stuff by e-mail, (e.g. a typo or something on a handout), but please know e-mail sucks for answering many types of questions. “How do I do this lab?” or “What’s up with R
?” are short questions with long answers. Come to office hours.
Contacting Me
Email is a blessing and a curse. Instant communication is wonderful, but often email is the wrong medium to have a productive conversation about course material. Moreover, I get a lot of emails. This means that I am frequently triaging emails into two piles: “my house is burning down” and “everything else”. Your email is unlikely to make the former pile. So… asking questions about course material is always best done in-class or in office hours. Students always roll their eyes when professors say things like that, but it’s true that if you have a question, it’s very likely someone else has the same question.
That said, email is still useful. If you’re going to use it, you should at least use if effectively. There’s a running joke in academia that professors only read an email until they find a question. They then respond to that question and ignore the rest of the email. I won’t do this, but I do think it is helpful to assume that the person on the receiving end of an email will operate this way. By keeping this in mind, you will write a much more concise and easy to understand email.
Some general tips:
- Always include
[SSC442]
in your subject line (brackets included). - Use a short but informative subject line. For example:
[SSC442] Final Project Grading
- Use your University-supplied email for University business. This helps me know who you are.
- One topic, one email. If you have multiple things to discuss, and you anticipate followup replies, it is best to split them into two emails so that the threads do not get cluttered.
- Ask direct questions. If you’re asking multiple questions in one email, use a bulleted list.
- Don’t ask questions that are answered by reading the syllabus! This drives me nuts.
- I’ve also found that students are overly polite in emails. I suppose it may be intimidating to email a professor, and you should try to match the style that the professor prefers, but I view email for a course as a casual form of communication. Said another way: get to the point. Students often send an entire paragraph introducing themselves, but if you use your University email address, and add the course name in the subject, I will already know who you are. Here’s an example of a perfectly reasonable email:
Subject: [SSC442] Lab, Question 2, Typo
Hi Prof. Kirkpatrick,
There seems to be a typo in the Lab on Question 2. The problem says to use a column of data that doesn’t seem to exist. Can you correct this or which should we use?
Thanks, Student McStudentFace
Letters of Recommendation / References
If you are applying for further study or another pursuit that requires letters of recommendation and you’d like me to recommend you, I will be glad to write a letter on your behalf if your final grade is a 4.0. Grades below a 4.0 may be handled on a case-by-case basis. In addition, you should have held at least three substantial conversations with me about the course material or other academic subjects over the course of the semester.
By the end of the course, you will realize that 1) I make many many many errors; 2) that I frequently cannot remember a command or the correct syntax; and 3) that none of this matters too much in the big picture because I know the broad approaches I’m trying to take and I know how to Google stuff. Learn from my idiocy.↩︎
If you’ve got money to burn, you can buy me a burrito.↩︎
This bothers me way more than it should.↩︎
So just don’t cheat or plagiarize. This is an easy problem to avoid.↩︎