DAT203-1 - Data Science Essentials

MPPDS - This article is part of a series.
Part 7: This Article

Data Science Essentials (DAT203) marks the point where we have enough of a foundation to start forming a bigger picture of data science. To that end, the course provides this definition:

Data Science is the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge, and formulate actionable results.

Cynthia Rudin and Steve Elston are co-presenters in this entertaining, informative and well-organized course. Both are really effective instructors, with quite different teaching styles.

Cynthia covers the more theoretical topics, including a fair number of statistics concepts. She is an entertaining presenter, and a lot of personality and practical examples come through in her presentations. She draws on specific data science projects from her university lab, one of which concerns predicting manhole fires in Manhattan.

Steve does more of the applied instruction, including worked examples in the Azure Machine Learning (ML) environment. I’ll confess to having my doubts about his first couple of videos. There is a touch of the “mad scientist” in his mannerisms, but once you get past that you’ll realize he has a tremendous amount of insight to share. Among other things, he does a great job of orienting students to the Azure ML tool.

Class topics are divided into 6 categories:

DSIntro_Wiki.gif
  • Intro to Data Science
  • Probability and Statistics for Data Science
  • Simulation and Hypothesis Testing
  • Exploring and Visualizing Data
  • Data Cleansing and Manipulation
  • Introduction to Machine Learning

There are a total of 77 videos and a LOT of content to absorb. The 6 labs and final exam require the use of the Azure Machine Learning tool to answer questions. The learning curve for this tool is reasonable, and the pressure is reduced a bit in that the labs allow unlimited guesses and the final exam gives you two attempts for each question.

As with all the classes offered through edx.org, you are free to download the videos. Doing so lets you speed up or slow down playback using something like VLC Media Player. It also gives you a way to review topics when you need to apply them somewhere down the road. As part of my wiki authoring of lecture notes, I’ve embedded references to the module and video to speed up retrieval if needed (see image).

I required just under 40 hours to complete this course. It has been one of the best courses so far in the MPP sequence.

Author: Jonathan Bartleson