CSE 519: Data Science

Data Science is a rapidly emerging discipline at the intersection of statistics, machine learning, data visualization, and mathematical modeling. This course provides a hands-on introduction to data science by challenging student groups to build predictive models for upcoming events and then validate those models against the actual outcomes. It covers the building blocks of data science, from managing the data itself to algorithmic and analytical techniques. Specific topics include data preparation, exploratory data analysis, statistics, visualization, optimization, unstructured data, and distributed analyses.

Required Textbook: The Data Science Design Manual by Steven S. Skiena, 2017, Springer

Supplementary Textbooks:

  1. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney, 2012, O’Reilly Media
  2. The Visual Display of Quantitative Information by Edward R. Tufte, 2nd edition, 2001, Graphics Press
  3. The Signal and the Noise: Why So Many Predictions Fail - but Some Don’t by Nate Silver, 2015, Penguin Books
  4. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil, 2017, Crown
  5. Data Science from Scratch: First Principles with Python by Joel Grus, 2nd edition, 2019, O’Reilly
  6. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data by Hadley Wickham and Garrett Grolemund, 2017, O’Reilly