**CICS 397A: Predictive Analytics with Python, Fall 2019**

**https://www.cics.umass.edu/content/fall-19-course-descriptions**

**Course Number**: CICS 397A

**Instructor**: Swarna Reddy

**Teaching Assistants**: To be announced

**Location**: To be announced

**Time**: To be announced (tentatively Tuesday/Thursday, 4:00-5:15pm)

**Instructor office hours**: To be announced

**Link to Piazza**: contains schedule, assignments, etc.

**Course Description**:

Twenty-first-century technological advances are generating ever-greater volumes of data. Sources include the ubiquitous internet and the notorious smartphone. There is an astounding number of opportunities to use these data for good (and bad) in the applied sciences, business, social media, politics, and cyber security, to name a few areas. Gaining insight from these data requires a firm understanding of the mathematics and computational techniques on which data analytic methods are based. That said, the elements of data science are accessible to a fairly broad audience, so our goal is to give course participants an understanding of these elements through application.

The specific course objectives are to educate participants in some of the most commonly used data analytic methods, including methods for reducing massively large data to informative statistics, data visualization, and cluster analysis. Practical data science demands the ability to program in a scripting language, so students in this course will learn and use the most popular of these languages, Python. The first learning goal is to understand these central data analytic methods, and the second learning goal is to know how to use them with Python. Our approach is close to the metal: you'll create the Python scripts from the ground up and apply them to real and fascinating data sets.

The course will use a new approach built around in-class tutorials. The tutorials introduce students who are new to the area to practical data analytics. They are organized by topic, written in Python, and use actual data sets, including political campaign contributions and the complex CDC BRFSS (Behavioral Risk Factor Surveillance System) data. BRFSS was chosen for its complexity: it benefits students interested in the healthcare industry and also gives experience with information retrieval in data science and big data analytics. The course also teaches how to identify and analyze stylistic features in writing; the best-known application of this analysis is plagiarism detection.

**Required Background**:

This course requires a mathematical background in probability, statistics, and calculus; a background in linear algebra is desirable. A general awareness of current big data applications will give better insight into the course. The official prerequisites are either COMPSCI 190 or STAT 240 (or equivalent), and COMPSCI 119.

**Override questions**:

If you'd like to take this course but cannot register, please submit an override request through the online system. Above all, please describe your background in statistics/mathematics and/or computer science. Please list any courses you've taken in those areas, and any other relevant training or experience you might have.

**Textbooks**:

The course readings will primarily be based on the following textbook:

**Algorithms for Data Science** (ISBN-10: 3319457950)

https://www.springer.com/us/book/9783319457956

*About the textbook: This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data is indispensable and so the reader is immersed in Python and R and real data analysis.*

Note: Chapter previews are available on the publisher's website.

**Course Format**:

Class meetings are divided between lectures and working in small groups on programming and data analytics.

**Course requirements**:

50% In-class tutorials and homework

Homework and Tutorials: Homework exercises emphasizing applications of the algorithms will be assigned biweekly. Homework usually includes both written math questions and programming problems.

Tutorials are oriented toward gaining proficiency in programming by guiding the student through the creation of a Python script. Students are responsible for completing 4 tutorials per month (due at the beginning of each month except September).

20% Midterm

30% Final exam

**Major topics**:

1. Data mappings and the concepts of data reduction. Similarity measures and distance metrics.

2. List, set, and dictionary comprehension.

3. Scalable algorithms and associative statistics. Computing univariate and multivariate statistics using big data.

4. Introduction to distributed computing and the Map/Reduce algorithm.

5. Data visualization and ggplot2.

6. Predictive analytics. K-nearest neighbor methods and regression.

7. Cluster analysis. Hierarchical and k-means methods.
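As a rough illustration of topics 2 and 3 (not taken from the course materials or the textbook), the sketch below computes a mean and variance from data that arrives in separate blocks. It assumes "associative statistics" refers to quantities such as counts, sums, and sums of squares, whose partial results from different blocks can simply be added together. The function names `partial_stats`, `combine`, and `mean_and_variance` are hypothetical.

```python
from functools import reduce


def partial_stats(block):
    """Return (count, sum, sum of squares) for one block of numbers."""
    return (len(block), sum(block), sum(x * x for x in block))


def combine(a, b):
    """Add two partial-statistic tuples element-wise."""
    return tuple(x + y for x, y in zip(a, b))


def mean_and_variance(blocks):
    """Compute the mean and population variance from blocks of data.

    Each block is summarized independently, so no block needs to be
    held in memory together with any other.
    """
    n, s, ss = reduce(combine, (partial_stats(b) for b in blocks))
    mean = s / n
    variance = ss / n - mean ** 2  # population (not sample) variance
    return mean, variance


# Hypothetical data split into two blocks, as if read from two files.
blocks = [[1.0, 2.0], [3.0, 4.0, 5.0]]
mean, variance = mean_and_variance(blocks)

# A dictionary comprehension (topic 2): map each block index to its size.
block_sizes = {i: len(b) for i, b in enumerate(blocks)}
```

Because each block is reduced to three numbers before anything is combined, the same pattern carries over to settings where the blocks live on different machines, which is the idea behind topic 4.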

**Academic Honesty**:

We follow the university's Academic Honesty Policy and Procedures.

If you have questions about a particular situation, please ask.