Data Science Essentials (DAT203) marks the point where we have enough of a foundation to start forming a bigger picture of data science. To that end, the course provides this definition:
Data Science is the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge, and formulate actionable results.
Cynthia Rudin and Steve Elston are co-presenters in this entertaining, informative and well-organized course. Both are really effective instructors, with quite different teaching styles.
Cynthia covers the more theoretical topics, including a fair number of concepts relating to statistics. She is an entertaining presenter, and a lot of personality and practical examples come through in her presentations. She shares specific data science example projects from her university lab, one of which concerns predicting manhole fires in Manhattan.
Steve does more of the applied instruction, including worked examples in the Azure Machine Learning (ML) environment. I'll confess to having my doubts about his first couple of videos. There is a sense of "mad scientist" in his mannerisms, but once you get past that you'll realize he has a tremendous amount of insight to share. Among other things, he does a great job of orienting students to the Azure ML tool.
Class topics are divided into 6 categories:
- Intro to Data Science
- Probability and Statistics for Data Science
- Simulation and Hypothesis Testing
- Exploring and Visualizing Data
- Data Cleansing and Manipulation
- Introduction to Machine Learning
There are a total of 77 videos and a LOT of content to absorb. The 6 labs and the final exam require you to use the Azure Machine Learning tool to answer questions. The learning curve for this tool is reasonable, and the pressure is reduced a bit because the labs allow unlimited guesses and the final exam gives you two attempts at each question.
As with all the classes offered through edx.org, you are free to download the videos. Doing so lets you speed up or slow down playback using something like VLC Media Player. It also gives you a way to review topics later, when you need to apply them. As part of my wiki authoring of lecture notes, I've embedded references to the module and video to speed up retrieval if needed (see image).
I needed just under 40 hours to complete this course. It has been one of the best courses so far in the MPP sequence.