
Microsoft Professional Program in Data Science

MPPDS - This article is part of a series.
Part 1: This Article

Microsoft recently announced a 10-part online course entitled Microsoft Professional Program in Data Science. Over the last couple of months, I’ve been working through this sequence and wanted to share with others what the experience has been like. The topics are a pretty direct “hit” for me as I’ve wanted to shore up my skills on the analysis side of things to complement skills in SQL Server.


Curriculum for Microsoft Professional Program in Data Science

The curriculum is delivered through edX.org and consists of 9 classes plus a capstone project as the 10th element. The courses can be audited for free. If you complete all 10, you’re eligible for a new badge of sorts known as a “Microsoft Professional Program Certificate in Data Science”. Earning the certificate requires paying for the individual classes. Program details are here: https://academy.microsoft.com/en-us/professional-program/data-science/

Course Review

Listed below are summaries of the individual classes. For each class, I’ve tracked how many hours it took to complete and described the content and how it was presented.

  • DAT101 - Data Science Orientation
  • DAT201x - Querying with Transact-SQL
  • DAT207x - Analyzing and Visualizing Data with PowerBI
  • DAT222x - Essential Statistics for Data Analysis using Excel
  • DAT204 - Intro to R for Data Science
  • DAT203 - Data Science Essentials
  • DAT203.2 - Principles of Machine Learning
  • DAT209 - Programming with R for Data Science
  • DAT213 - Analyzing Big Data with Microsoft R Server
  • DAT102 - Microsoft Professional Program - Capstone

Key Takeaways

All told, this 10-course sequence has required a total of about 370 hours to complete.

Was it Worthwhile?

Absolutely, it was worthwhile. Here is why:

  • The coursework gives you hands-on experience with a variety of data science tools and techniques.
  • The courses force you to study fundamental data science topics you otherwise might not.
  • The quality of the training is very good and is laid out in a logical sequence. The R and machine-learning courses were particularly well done (e.g., DAT203, DAT203.2, DAT204, DAT209, DAT213).
  • The process will help you identify the areas of the data science field in which you have aptitude and interest.
  • On completion, you can start applying these skills on behalf of your organization.
  • You will realize how much more there is to learn in this field!

One of the key takeaways for me has been the beauty of the R language, and how comparatively frustrating AzureML is to use. For some reason, I don’t mind the GUI of something like SQL Server Integration Services, but I found the AzureML web interface to be very cumbersome. To its credit, this course sequence will allow you to experiment with a variety of different tools and learn which you prefer.

Advice for Maximizing Experience with Classes

For someone just starting this series, I would recommend the following:

  • Download the videos to your network. They are the sort of thing you may benefit from down the road.
  • Use VLC Media Player and program your arrow keys to speed up / slow down video (see screenshot). Many of the videos can be played back at an accelerated rate to save you time.
  • Know that there is too much content being delivered to permanently recall everything…
  • …So take detailed notes with screenshots when appropriate. I’ve populated a couple dozen pages in our Atlassian Confluence Wiki. This type of written reference will be useful to you long after the course is done.
  • Audit each course up to the point you pass. All of the courses allow this. Wait to pay until you know you’ve passed. There is no penalty for approaching it this way.

These data science techniques offer great potential for many organizations. I’ve been really pleased with the quality of the instruction. I hope the notes above are helpful to others interested in learning these topics.

Capstone_VLC_SpeedUp.png
Capstone_ConfluenceWiki.png

Jonathan Bartleson
Author
