Syllabus - CICS 197R - Spring 2021

Course Details

CICS 197R: Special Topics - Introduction to Data Analysis in R
1 credit. Pass/fail grading for undergraduates.
Meets T/R 2:30-3:45 from Feb 2 through Mar 11.

Open to undergraduates from SPIRE, and graduate students with an over-ride (space allowing): https://www.cics.umass.edu/overrides

Instructor

Jasper McChesney
Email: jmcchesney@cs.umass.edu

Regarding email: Please include "197R" in the subject line of all emails. I typically will not answer messages outside of normal business hours, and will not respond instantly even during them.

Learning Objectives

After completing this course, I expect you will have a solid foundation to pursue practical data work in R. You will be able to load, clean, and reshape data; calculate summary statistics and create statistical models; and design useful visualizations. You will be able to discuss programming concepts such as data types and structures, functions, and flow of control. You will be able to write basic functions to perform data work, following good coding practices. Most importantly, you will know enough to be able to continue your education in R, programming, and data analysis on your own, in whatever discipline you work.

Diversity and Accommodations

I intend this course to be a welcoming environment for all kinds of students to learn programming and data analysis, regardless of previous programming experience, academic background, or personal characteristics.

If you have a documented disability on file with Disability Services, you may be eligible for reasonable accommodations in this course; please notify me as early as possible so we can make the proper arrangements.

If you are living in another time zone, you should still ensure you can attend most of the live sessions. In general, during these strange times, I'm happy to be flexible. But you do still need to make your presence known, and learn what you need to learn.

Format and Requirements

Lectures and Practice Problems

Lectures are pre-recorded and can be viewed each week from within Moodle. For each class, I will assign a small set of videos, typically totalling around 30 minutes.

There will also be a problem set for you to attempt. These will not be collected or graded, as they're for you to practice what you've seen in the lectures. Answer keys are provided for each: you're encouraged to look at the answer to each question once you believe you've addressed it, and then move on to the next problem. We'll discuss these problem sets in class (see below).

You should watch the assigned videos and work through the problems before each class they relate to, so you're prepared for the discussion and in-class exercises.

Class Sessions

We will meet virtually 12 times over 6 weeks, on Tuesdays and Thursdays from 2:30 to 3:45, for 75-minute sessions on Zoom. These sessions will be recorded (via Zoom) for later viewing.

Discussion

One part of each session (20-30 minutes) will be a varied discussion of the current material, driven partly by class input. Typical topics include:

Running Data Study

We will also analyze a large dataset together throughout the six weeks. Each session, we will try to apply what we've been learning to the example data study. Typically this will mean I give an in-class assignment for you to try coding. Sometimes you will be in a break-out Zoom call to discuss the assignment with a small group, but otherwise you'll attempt it by yourself. Often I will ask you to submit answers so I have a sense of where the class is in its understanding. These are thus "diagnostic" in function, and no grade is assigned to the answer itself (I merely note that you attempted it).

Some of these exercises will not strictly be about writing code, but also exploring the data and the context around it, and how to present results to an imagined audience -- in that case, you will produce graphs or text and put them into a live Google document.

Final Exam

There will be a final take-home examination, where you will demonstrate the main skills you've learned in the class. It will be due roughly a week after class ends. Your submission will consist of both code and a summary document with graphs and explanations of your analyses.

Basis of Grading

For undergraduates, the course has a mandatory pass/fail grading basis. Graduate students will receive a normal letter grade unless auditing. The final exam will be the primary basis for determining your grade. It does not have to be perfect to earn a pass (or A), but must show good understanding of each major area we've covered during the course, command of the R syntax, an ability to write functions and scripts, and the ability to think about data tasks through coding.

The only other basis of grading will be participation during live class meetings. This will include attempting the diagnostic problems (correct answers are not required), and discussing homework and other problems. Students with marginal final exam scores may nonetheless pass if they participate earnestly and regularly.

Collaboration and Academic Honesty

Homework may be completed with help from class-mates, the internet, or elsewhere (though I suggest attempting it on your own first). The final should be done on your own, but you are free to consult general R references as needed. (You're not allowed to ask specific questions about the exam online.)

Materials

Course Sites

Most materials, including video lectures and assignments, will be posted to Moodle.

The course web page provides certain overview documents: https://people.cs.umass.edu/~jmcchesney/197R/

Software

You will need to install RStudio (available for Windows, Mac, and Linux): https://rstudio.com/products/rstudio/download/#download

Reference

There is no required text for the course. If you want a text, I suggest the free YaRrr! A Pirate's Guide to R. (Silly title aside, it was written by a PhD candidate in the sciences, and is serious in intent.)

For detailed questions about real problems you face, you can always turn to the online forum Stack Overflow.

Schedule of Topics

The first half of the course will be devoted more to the technicalities of the R language, while the second half will broaden to more real-world applications. Note that for each topic, you should watch the lectures and attempt the homework before class.

#    Date   Topics
1    2/2    Expressions, Functions, and Writing Code
2    2/4    Introduction to Vectors
3    2/9    Data Types
4    2/11   Working with Vectors
5    2/16   Matrices
6    2/18   Data Structures: Arrays, Lists, and Data Frames
7    2/23   Reading Data From File
8    2/25   Summarizing and Graphing Data Frames
9    3/2    Conditionals, Loops and Advanced Parameters
10   3/4    Working With Text
11   3/9    Data Transformations
12   3/11   Modeling

Before the First Class

Before we meet for the first time, you should download RStudio (see above), watch the videos for the first unit, and attempt the practice problems associated with them. If you're familiar with R, these will be trivial; if you're not, they should still be easy.