A business associate just spent two weeks in turmoil trying to recover data that was permanently lost. Their backup process was flawed. I've got a gnawing sense that this could happen to us too. What can I do to ensure my backups are occurring when they should and that they will be usable if I actually need them?
With both pride and relief, I get to share news of completing the Capstone course for the Microsoft Professional Program in Data Science. Over the past six months, I've been systematically working through the nine courses leading up to this 10th and final course in the series. You can look on this index page to see observations for the other courses.
Our company acquired a data file containing over 15,000 rows and 300 columns. We are trying to identify patterns in the data. Where do we begin evaluating such a large dataset? Would using R be helpful?
A database analyst's experience with the Microsoft Professional Program in Data Science. This page is an index of 10 blog entries describing coursework included in this new edX.org curriculum aligned with Microsoft data science tools. This entry concerns the overview course.
This course teaches exploratory data analysis skills using the Microsoft R Server implementation known as RevoScaleR. In most ways this product is functionally equivalent to open-source CRAN R. RevoScaleR offers three significant benefits over its open-source sibling: the ability to run analyses in parallel across different servers, the ability to "chunk" data for evaluation and so bypass the in-memory limitation of R, and the ability to read more natively from data sources like SQL Server, Hadoop, and Spark.
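To make the "chunking" idea concrete, here is a minimal sketch in plain Python. It is not RevoScaleR code and does not use its API. The function name `chunked_mean`, the column name `x`, and the tiny in-memory sample are all made up for illustration. The point is only that a summary statistic can be built up one block of rows at a time, so the full dataset never has to fit in memory.

```python
import csv
import io
from itertools import islice


def chunked_mean(rows, column, chunk_size=1000):
    """Mean of one numeric column, holding at most chunk_size rows in memory.

    `rows` is any iterable of CSV text lines (an open file, for example).
    This is a hypothetical helper for illustration, not a RevoScaleR function.
    """
    reader = csv.DictReader(rows)
    total, count = 0.0, 0
    while True:
        # Pull the next block of rows; earlier blocks are already discarded.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical sample standing in for a file too large to load at once.
sample = io.StringIO("x\n1\n2\n3\n4\n")
print(chunked_mean(sample, "x", chunk_size=2))
```

Only running totals survive between chunks, which is why the approach scales to files larger than memory. Statistics that cannot be updated incrementally this way need a different strategy.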
Programming R for Data Science is taught by Anders Stockmarr (on the faculty of the Technical University of Denmark). For US audiences, his accent requires some getting used to. He places emphasis on unexpected syllables and has a unique way of pronouncing many things. I found it helpful to use headphones and to adjust the playback speed of the recordings. It is worth making the effort to understand Dr.
My goal is to help others measure, report and make sense of the data that drives their organization. Topics include: database tools, visualization techniques, open data sources and anything else that helps you apply data science.