DataBaselines Blog

Using R to Evaluate College Scorecard

Our company acquired a data file containing over 15,000 rows and 300 columns. We are trying to identify patterns in the data. Where do we begin evaluating such a large data set?

Rotary Member Directory

Our organization produces member directories every year. They are tedious to create and prone to typos and inaccuracies. How can I use a database to create a member directory or product catalog?

Microsoft Professional Program for Data Science

A database analyst's experience with the Microsoft Professional Program in Data Science. This is the first of 10 blog entries describing the coursework included in this new curriculum, which is aligned with Microsoft data science tools. This entry concerns the overview course.

MPP Data Science - DAT213 - Analyzing Big Data with Microsoft R Server

This course teaches exploratory data analysis skills using the Microsoft R Server implementation known as RevoScaleR. This product is in most ways functionally equivalent to the open source CRAN-R. RevoScaleR offers three significant benefits over its open source sibling: the ability to run analyses in parallel across different servers, the ability to "chunk" data for evaluation and so bypass the in-memory limitation of R, and the ability to read more natively from data sources like SQL Server, Hadoop and Spark.

MPP Data Science - DAT209 - Programming with R for Data Science

Programming with R for Data Science is taught by Anders Stockmarr (on the faculty of the Technical University of Denmark). For US audiences, his accent takes some getting used to. He places emphasis on unexpected syllables and has a unique way of pronouncing many things. I found it helpful to use headphones and to adjust the playback speed of the recordings. It is worth making the effort to understand Dr.

MPP Data Science - DAT203 - Data Science Essentials

Data Science Essentials (DAT203) marks the point where we have enough of a foundation to start forming a bigger picture of data science. To that end, the course provides this definition:

Data Science is the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge, and formulate actionable results.