
MPP


DAT102 Data Science Capstone

·1169 words·6 mins
With both pride and relief, I get to share news of completing the Capstone course for the Microsoft Professional Program in Data Science. Over the past six months, I’ve been systematically working through the nine courses leading up to this 10th and final course in the series. You can look on this index page to see observations for the other courses, including some general advice for those considering taking them. The blog entry you are reading now is specific to the Capstone course. It explains the hoops you need to jump through, the time you will likely need to complete it, and a variety of other observations about the Capstone.

Microsoft Professional Program in Data Science

·633 words·3 mins
Microsoft recently announced a 10-part online course entitled Microsoft Professional Program in Data Science. Over the last couple of months, I’ve been working through this sequence and wanted to share with others what the experience has been like. The topics are a pretty direct “hit” for me, as I’ve wanted to shore up my skills on the analysis side of things to complement my skills in SQL Server.

Curriculum for Microsoft Professional Program in Data Science

The curriculum is provided via edX.org and consists of 9 classes, with the 10th element being a capstone project. The courses can be audited for free. If you complete all 10, you’re eligible for a new badge of sorts known as a “Microsoft Professional Program Certificate in Data Science”. Earning the certificate requires paying for the individual classes. Program details are here: https://academy.microsoft.com/en-us/professional-program/data-science/

DAT213 - Analyzing Big Data with Microsoft R Server

·773 words·4 mins
This course teaches exploratory data analysis skills using the Microsoft R Server implementation known as RevoScaleR. This product is in most ways functionally equivalent to the open source CRAN-R. RevoScaleR offers three significant benefits over its open source brother: the ability to run analyses in parallel across different servers, the ability to “chunk” data for evaluation and bypass the in-memory limitation of R, and the ability to read more natively from data sources like SQL Server, Hadoop, and Spark. This course explains these benefits and allows a new user to become familiar with the RevoScaleR tool.

Analyzing Big Data with Microsoft R Server

The course is divided into 4 segments:

DAT209 - Programming R

·373 words·2 mins
Programming R for Data Science is taught by Anders Stockmarr (on the faculty of the Technical University of Denmark). For US audiences, his accent requires some getting used to. He places emphasis on unexpected syllables and has a unique way of pronouncing many things. I found it helpful to use headphones and to adjust the playback speed of the recordings. It is worth making the effort to understand Dr. Stockmarr because he has put together a course with a lot of substance, using a tight script and backed up by supporting exercises.

Programming R Course Highlights

I genuinely enjoyed this course; it goes a lot deeper than the introductory course in R taken earlier in the MPP sequence. For the first course, I used RStudio to experiment. With this course, I wanted to use the Visual Studio version of R to work the exercises and labs. The R Tools for Visual Studio (https://www.visualstudio.com/vs/rtvs/) required some fiddling to get installed, but were stable and had the nice IDE features I’ve become used to with VS. Becoming familiar with R Tools for Visual Studio at this point will prepare you for taking DAT213 “Analyzing Big Data with Microsoft R Server”, which is the logical follow-on course in the MPP Data Science sequence.

DAT203.2 - Principles of Machine Learning

·427 words·3 mins
Principles of Machine Learning (DAT203.2) is the 7th in a series of 10 courses that form the Microsoft Professional Program in Data Science. It proves that the further you get into this 10-course sequence, the more enjoyable the classes become. As with Data Science Essentials, this class is co-led by Cynthia Rudin and Steve Elston.

Principles of Machine Learning

The lectures comprise 60 videos spanning 8 hours of lecture time. Watching them and working the exercises reveals the true practical value of the data science tools. This course forces you to genuinely harness the Azure Machine Learning environment with Python or R scripts. All told, this course required about 30 hours to complete.

DAT203-1 - Data Science Essentials

·392 words·2 mins
Data Science Essentials (DAT203) marks the point where we have enough foundation that we can start forming a bigger picture of data science. To that end, the course provides this definition: “Data Science is the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge, and formulate actionable results.” Cynthia Rudin and Steve Elston are co-presenters in this entertaining, informative and well-organized course. Both are really effective instructors, with quite different teaching styles. Cynthia covers the more theoretical topics, including a fair number of concepts relating to statistics. She is an entertaining presenter. A lot of personality and practical examples come through in her presentations. She provides specific data science example projects from her university lab, one of which concerns predicting manhole fires in Manhattan.

DAT204 - Intro to R for Data Science

·433 words·3 mins
As a developer, I’m drawn to terse/concise languages that are purpose-built for an objective. Regular expressions are a prime example. There is something beautiful about expressing things in few words (something I try to do in blogging with only partial success!). In this context, I was eagerly anticipating “Intro to R for Data Science.” This course (and this language) did not disappoint. Before going further, I should note something: within the Microsoft Professional Program for Data Science, it is at the student’s discretion whether to take a Python or an R track. Your decision will be shaped by whether you have prior familiarity with one of those environments, and whether you want to reinforce what you already know or venture into a new tool. Your choice of this class will logically dictate the 2nd “advanced” course required later in the MPP track.

DAT222x - Essential Statistics for Data Analysis using Excel

·545 words·3 mins
Call me a nerd, but statistics are fascinating and useful. I’d had quite a bit of coursework years ago in school, and was looking forward to “Essential Statistics for Data Analysis using Excel” as a refresher course. Unfortunately, the experience of this edX course might be tag-lined “Sadistics.” Completing this was a painful experience. I hope the notes here will make the experience a bit more tolerable for others.

Essential Statistics for Data Analysis using Excel

There is a huge amount of content being presented. In terms of coverage, it is worthy of a full-semester college course in statistics.

DAT207x - Analyzing and Visualizing Data with PowerBI

·230 words·2 mins
The course “Analyzing and Visualizing Data with PowerBI” is devoted to showing the capabilities of this Microsoft tool. For those who have worked previously with Excel, Microsoft Access or SQL Server Reporting Services – the video demonstration of PowerBI capabilities will cause you to repeatedly think “wow – that is slick.” As an example, pictured below is one of the dashboards created as part of the course.

Analyzing and Visualizing Data with PowerBI

The course is composed of approximately 120 videos whose duration varies from 1-5 minutes. There are 4 different people presenting and the video content is quite good. The pace of content is well measured and the videos nicely support the lab materials. This is a really enjoyable course outlining capabilities of an innovative tool.

DAT201 - Querying with Transact-SQL

·643 words·4 mins
Following the breezy orientation course, the Microsoft Professional Program for Data Science curriculum digs into the Microsoft dialect of SQL known as Transact-SQL. This course briefly addresses updating data, stored procedures, transactions and error handling – but the bulk of the course concerns extracting data from SQL Server. This is the 2nd in a 10-part online course sequence for which I’m documenting my experience for others. The course title is “DAT201x: Querying with Transact-SQL”.

Querying with Transact-SQL

Transact-SQL gives you a lot of dexterity for pulling out data, and the video series covers these topics comprehensively if not very deeply. It includes all the standard features of SELECT (WHERE, GROUP BY and HAVING clauses) as well as topics like joins, INTERSECT/EXCEPT, correlated sub-queries, common table expressions, grouping sets, and ROLLUP/CUBE.
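To give a flavor of a few of these topics, here is a minimal sketch that combines GROUP BY/HAVING, a common table expression and a correlated sub-query. It is an illustration only, not course material: it uses Python’s built-in sqlite3 module rather than SQL Server, so the dialect is SQLite’s and may differ in places from Transact-SQL, and the `orders` table, its columns and the sample rows are made up.

```python
import sqlite3


def summarize_orders(rows):
    """Return (customer, total, largest_order) for each customer.

    `rows` is a list of (customer, amount) tuples. The table and
    column names are hypothetical, chosen only for this example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

    query = """
        -- Common table expression: one row per customer with a total.
        WITH totals AS (
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            HAVING SUM(amount) > 0
        )
        SELECT t.customer,
               t.total,
               -- Correlated sub-query: re-evaluated for each outer row.
               (SELECT MAX(o.amount)
                FROM orders AS o
                WHERE o.customer = t.customer) AS largest_order
        FROM totals AS t
        ORDER BY t.customer
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [("ann", 10.0), ("ann", 25.0), ("bob", 5.0)]
for row in summarize_orders(sample):
    print(row)
```

Grouping sets and ROLLUP/CUBE are left out because SQLite does not support them; those would need to be tried against SQL Server itself.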

DAT101 - Data Science Orientation

·377 words·2 mins
“Data Science Orientation” is the first class in the Microsoft Professional Program (MPP) for Data Science. This class is a warm-up exercise for the larger program. It outlines the 10-part certificate process and introduces you to five current data scientists who answer questions about their careers. Are you wondering what skills and personality traits will help you succeed as a data scientist? The professional interviews offer practical insight on those topics. The enthusiasm these people have for their careers is contagious. Hearing them speak is just the sort of encouragement one needs while embarking on the MPP courses.

Videos in the Data Science Orientation

The video sequence then walks you through the fundamentals of statistics. Topics covered include variable types, populations versus samples, descriptive statistics, variance, correlation, T-tests and ANOVA. The authors use a fictional example of a lemonade stand to explain the topics. This part of the video sequence was just “ok”. If you’ve had prior statistics coursework, the treatment here may help jog your memory on these topics. I found the presentation of these technical topics to be a bit “clouded” in delivery. As part of the class, they provide a PDF study guide. It gives you an accurate sense of the statistics topics covered in this class. It emphasizes the calculation of measures more than their practical uses. The class assessment consists of a dozen multiple choice questions and an Excel-based lab with questions.
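For readers who want to try a few of the measures mentioned above outside of Excel, here is a minimal Python sketch of the mean, the sample and population variance, and the Pearson correlation. The lemonade-stand numbers are invented for illustration and are not taken from the course; the `pearson_correlation` helper is written out by hand here so the calculation is visible.

```python
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of products of deviations from the means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations for each variable.
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return co_deviation / (spread_x * spread_y)


# Hypothetical data: daily high temperature and cups of lemonade sold.
temperature_f = [70, 75, 80, 85, 90]
cups_sold = [20, 24, 31, 35, 42]

print("mean cups sold:     ", statistics.mean(cups_sold))
print("sample variance:    ", statistics.variance(cups_sold))    # divides by n - 1
print("population variance:", statistics.pvariance(cups_sold))   # divides by n
print("correlation:        ", pearson_correlation(temperature_f, cups_sold))
```

The two variance calls illustrate the populations-versus-samples distinction: the same data gives a larger value when treated as a sample than when treated as the whole population.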