Part IV: The Dimensionally-Conformed Spreadsheet
In Parts I, II and III of this series, Building a High Value Low Cost BI Solution, I addressed some of the practical matters of project definition, database terminology and the the evolutionary method for migrating your data to a data warehouse while also receiving the immediate benefit of a useful visual reporting system. In this section I'll describe how you can organize data in Excel to create a Star Schema structure that will facilitate analysis with Tableau.
The structure is composed of a FACT table which contains numeric data which you apply math toward. For example, the FACT table for a sales reporting system would include invoice information like quantity sold and the extended price of the billing. The DIMENSION tables contain information that you use to sort and filter your sales data. Information like customer names, addresses, sales channel, salesman assigned, product manager, and product information.
The Spreadsheet Layout
Getting Excel to look act like a database is not difficult. Create fact table tab and your various dimension tables in other worksheet tabs. Tableau will be the tool that “joins” the FACT table tab to the DIMENSION table tabs when you create your visualization.
The FACT Table Example
Let's assume we're going to build a sales reporting tool. I want to make this tool robust so I'm going to pull out of my ERP system very granular sales data. In fact, I want to get every invoice for the past month and include every line item on each invoice. I want to capture the most "atomic" data I can because that will make my data very rich and enable information consumers to ask any question and get an answer. So, this file includes every line item on every invoice billed in the month.
Here's a screen shot of the FACT table data:
Pay close attention to the columns highlighted in yellow. Those columns contain Key Records that are used to connect the FACT table tab to the DIMENSION tables which are the red-colored work book tabs.
In this particular example the DIMENSION tables include customer address records, product classification records, customer sales channel information, company ship from locations and a unique fiscal date calendar in DIMDate.
Each yellow column header in the FACT table has a corresponding (identical) column heading in each one of the DIMENSION tables workbook tabs. This makes it very easy for Tableau to identify the records for joining the FACT and DIMENSION tables.
DIMShipto Example Table
Why this structure provides value
It may not be apparent yet exactly why I’m going to the trouble to structure the information this way. Why shouldn’t you just put all the information on one big workbook page? Well, there are a few good reasons.
1. Replicates how a database works
This structure closely replicates how the join trains actions would work in a real data warehouse. Data quality issues that result in NULL VALUES will be identified and can be dealt with quickly. Tableau can be used to create lists of problem records which can be distributed to the appropriate staff or resolution.
2. Create Unique hierarchies that may not exist in your host ERP system
The primary source system for the extract may not include all the sorting and filtering hierarchies that you would like to have available. In this particular example I created sales channel data and ship from location data (in plain English) which did not exist in the ERP system from which the sales factual data was extracted from. So, you can actually enhance the quality and accessibility of information this way.
3. May be more space efficient
Depending on your particular case, this method may be more space efficient, resulting in smaller sized files than would normally be the case with a single large workbook tab including everything.
4. Facilitate migration to an actual database
Should you have plans to implement an enterprise-class data warehouse this structure closely replicates how data will be stored when you migrate. The weekly or monthly process that you use to update your dimensionally-conformed spreadsheet will actually help you define the ETL (Engineering, Transform & Load) logic that the data architect will need to implement. This ETL programming will automate much of the data quality assurance in the warehouse. By working with the data in spreadsheets first you will greatly enhance your understanding of the quality issues. In many cases you can implement process changes that will eliminate the issue.
In addition, the structure will make it much easier for the data architect to specify the table layout and field contents of the actual data warehouse because you will be able to provide detailed records that define much of the conceptual design. Slight modifications will be recommended by the architect to take advantage of the power of the database but the basic structure design will be very similar.
When I first discovered Tableau Software my primary objective was to implement a data warehouse. I actually used dimensionally-conformed spreadsheets as the data source because it provided me with a way to get very visual reports our quickly, that were of high quality, that were flexible and were based on very granular data that had been cleaned and validated against the general ledger numbers that everyone saw in monthly financial statements.
Ultimately when I migrated to a full MS SQL Server 2005 database, my costs were 10% of the quoted costs for doing a data warehouse the traditional way. The key contributing factors:
-
I knew my data and had a clear understanding of data quality issues that would need to be addressed when updating the data warehouse
-
I had a very well documented dimensional structure for the data architect to review
-
I new exactly what the end user report requirements were
-
There was no learning curve for the information consumers when the actual data warehouse was rolled out. The structure the same. The only thing the users noticed was a slightly different login process and a massively increased amount of data.
In Part V of this Series, Connecting Tableau to Your Dimensionally-Conformed Spreadsheet, I’ll demonstrate how to connect Tableau to your spreadsheet FACT tab, DIMENSION tabs and how to change the default JOIN function in Tableau to perform a LEFT OUTER JOIN. This method will allow you to generate reports and identify data quality issues.
Full Disclosure: InterWorks Inc is a reseller of Tableau Software.