
DAT203.2 - Principles of Machine Learning

MPPDS - This article is part of a series.
Part 8: This Article

Principles of Machine Learning (DAT203.2) is the 7th in a series of 10 courses that form the Microsoft Professional Program in Data Science. In my experience, the further you get into this 10-course sequence, the more enjoyable the classes become. As with Data Science Essentials, this class is co-led by Cynthia Rudin and Steve Elston.

Principles of Machine Learning

The lectures comprise 60 videos spanning 8 hours of lecture time. Watching them and working through the exercises shows the practical value of the data science tools. The course requires hands-on work in the Azure Machine Learning environment with Python or R scripts. All told, it took me about 30 hours to complete.

The course covers the principles of classification and regression, then spends considerable time on more advanced approaches: tree and ensemble methods, optimization-based methods, clustering, and recommenders.


Cynthia discusses feature selection and regularization, both of which help you build models using the most relevant features. While watching the videos, I felt I was getting real-world knowledge of the value of these techniques. The presenters make an effort to explain the practical limitations of the different approaches. They work through an extended decision tree example about whether a customer is likely to wait for a table at a restaurant.
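To give a flavor of how a decision tree picks its first question, here is a minimal sketch in Python. It is not the course's code or data: the records, the feature names (`patrons`, `hungry`), and the helper functions are all invented for illustration. It scores each feature by information gain (the drop in label entropy after splitting on that feature) and picks the highest-scoring one, which is one common splitting criterion.

```python
from collections import Counter
from math import log2

# Hypothetical toy records, invented for illustration (not course data).
# Each row: (patrons, hungry, will_wait)
RECORDS = [
    ("some", True, True),
    ("full", True, False),
    ("some", False, True),
    ("full", True, True),
    ("full", False, False),
    ("none", False, False),
    ("some", True, True),
    ("none", True, False),
]
FEATURES = {"patrons": 0, "hungry": 1}  # feature name -> column index
LABEL = 2  # column index of the will_wait label


def entropy(rows):
    """Shannon entropy (in bits) of the will_wait label over the rows."""
    counts = Counter(row[LABEL] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, feature_index):
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature_index], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


def best_feature(rows):
    """Name of the feature whose split yields the largest information gain."""
    return max(FEATURES, key=lambda name: information_gain(rows, FEATURES[name]))


if __name__ == "__main__":
    for name, index in FEATURES.items():
        print(f"{name}: gain = {information_gain(RECORDS, index):.3f}")
    print("split first on:", best_feature(RECORDS))
```

On this toy data, splitting on `patrons` separates the outcomes far better than splitting on `hungry`, so it would become the root of the tree. A full tree repeats the same choice within each resulting group.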

Given the breadth of topics covered, you get only a superficial treatment of each technique and approach. Still, I felt the broad discussion helped me better understand the options, and it should guide how I apply these techniques to future problems. These topics deserve more study and practice before they can be applied reliably.

Module assessments are 60% of the grade, and you have two tries to answer. I found the modules particularly helpful because each comes with a detailed step-by-step PDF guide that walks you through Azure ML and the Python or R scripts.


The final challenge is 40% of the grade. It took me 6 hours to complete. It is not overly difficult because the instructions guide you a bit through the project. It is an opportunity to synthesize and apply what you’ve learned in the course. Your goal is to design a predictive model to determine the arrival time of airline flights. You are provided with a data set of historic flights that includes airline carrier, route, time of day, day of the week, etc. Your score is based primarily on how accurately your model predicts the arrival time of 25 flights.
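For readers who want a starting point before reaching for Azure ML, a simple baseline is worth having. The sketch below is my own illustration, not the course's solution. It assumes the target is arrival delay in minutes, uses a handful of made-up flights, predicts the average historic delay for each carrier and day-of-week pair, and scores the result with mean absolute error. The course may define the target and the scoring differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historic flights, invented for illustration:
# (carrier, day_of_week, arrival_delay_minutes)
HISTORY = [
    ("AA", 1, 10.0),
    ("AA", 1, 20.0),
    ("AA", 5, 30.0),
    ("DL", 1, 0.0),
    ("DL", 5, 10.0),
    ("DL", 5, 20.0),
]


def fit_baseline(rows):
    """Return (mean delay per (carrier, day), overall mean delay)."""
    by_key = defaultdict(list)
    for carrier, day, delay in rows:
        by_key[(carrier, day)].append(delay)
    overall = mean(delay for _, _, delay in rows)
    group_means = {key: mean(delays) for key, delays in by_key.items()}
    return group_means, overall


def predict(model, carrier, day):
    """Predict delay from the group mean, falling back to the overall mean."""
    group_means, overall = model
    return group_means.get((carrier, day), overall)


def mean_absolute_error(model, rows):
    """Average absolute gap between predicted and actual delay, in minutes."""
    return mean(abs(predict(model, c, d) - delay) for c, d, delay in rows)


if __name__ == "__main__":
    model = fit_baseline(HISTORY)
    holdout = [("AA", 1, 25.0), ("DL", 5, 5.0)]  # made-up unseen flights
    print("AA on day 1:", predict(model, "AA", 1))
    print("holdout MAE:", mean_absolute_error(model, holdout))
```

Any model you build for the challenge should beat a baseline like this on held-out flights. If it does not, the extra features or the more complex algorithm are not yet earning their keep.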

This is an altogether informative and enjoyable course, and I highly recommend it.

Jonathan Bartleson
Author
