Skip to main content

Data Wrangler

Data Wrangler, In These Parts?

Organizations today have data spread over multiple locations and formats. A common example is a Salesforce-type CRM application in the cloud with an inventory system on-premises as an Oracle database. You chose these applications because they are the best in their class. But from a business-planning standpoint, you now have isolated silos of data. You want to be able to leverage cross-system views of this data. You need to somehow corral it into a common space. In cowboy-technical terms, you need the help of a data wrangler.

Sharing data with business partners represents a similar type of challenge. These types of transactions vary by industry. But from medical professions to manufacturing industries -- the unmistakable trend is towards sharing of electronic records. These sorts of behind-the-scenes activities strengthen your relationship with business partners. Done right, they are set-it-and-forget-it sorts of services. But who on-staff can figure out the data formats and ensure that data connections with partners are secure?

There are many other occasions that you need to wrestle data from one format into another. These are all examples of the data wrangler role and are areas that we can help you. 

A data wrangler discussion with us will almost always include some reference to a powerful tool by Microsoft. It's called SQL Server Integration Services (SSIS). This is included with any license of SQL Server. The tool can connect to nearly any type of data source. It offers great flexibility for quickly transforming data and defining import/export logic. You can schedule these jobs to occur on a recurring basis. Simply put, SSIS is fantastic. It's the primary tool we depend upon for doing any data manipulation. 

The paragraphs below describe typical services in this category:

Data Wrangling Variations

These are data transformations that need to be run on a recurring basis. They may be part of a data warehouse load or represent partner data sources that are published nightly. These often interact with a remote SFTP server or are imported into a local database. SQL Server Integration Services (SSIS) is used for nearly all of these data wrangling tasks. SSIS readily supports munging of data from various destinations. It can publish them to SQL Server or other destinations. Daily "ETL" tasks can also involve the creation of data byproducts such as Excel or text-file extracts. These extracts can sometimes make it easier for employees to work with company data.
We are frequently asked to perform one-time data migrations from two different vendor systems. This typically occurs when a customer is transitioning from one software product to another. These projects require becoming familiar with the specifications of the giving and receiving data systems. They require data standardization and clean up as part of the process. Typically, we can do these types of migrations at lower costs than the other vendors involved.
These are one time or ongoing cleanups of data. They include address standardization and the use of reference data sources (Dun and Bradstreet data sources for example.) Another example is de-duplication which involves the identification and processing of duplicate customers for example. There are a variety of helpful tools in this area, including SQL Server Data Quality Services, Master Data Services, regular expressions or SQL scripts.
We can assist with ongoing data capture through hardware device integration such as barcode readers, Scantron devices, biometric sensors, or RFID hardware. This also includes web service, reference services, and SOAP type integrations.
These are typically health-care specific transactions in which medical records are electronically distributed in an HL7 file format. This sort of project places a high value on security and fault-tolerant messaging between remote servers. Mirth Connect is a powerful open-source toolbox that greatly simplifies the management of these machine-to-machine communications. The features and open-source licensing of this product make it a compelling choice for any organization tasked with managing these information exchanges.

Different Sort of Cattle Drive

In all of these cases, we are harnessing the computer to perform tasks that are tedious and demoralizing for your staff. This type of automation frees your staff for higher purposes. The staff-time-savings is often the clearest business use-case for recovering our expense. Data accuracy is another frequent side-benefit resulting from the automation of these processes. For a detailed example, read this data quality scenario.

What data do you wish you could harness more effectively? Call us and let us know how we can help. And we promise, no more cowboy puns.