Your business is growing,
and so are your questions.
Now you’re at a point where you want to know things like:
What is my advertising return on investment by channel?
Are my customer support initiatives improving my customer lifetime value (CLV)?
Are customers acquired through inbound or outbound marketing more likely to churn?
Why you need a data warehouse
To answer these, you need to analyze data from multiple sources. You thought that your business intelligence tool would “connect” all of your data sources and allow you to visualize them on a single dashboard. But as your questions begin to get more complex, reality is setting in. Pulling your separate data streams into one dashboard isn’t the same as being able to do real cross-domain analysis.
What is “Cross-Domain Analysis”?
When you want to calculate advertising ROI by channel, you need to join data from each of your individual ad platforms with data from your transactions database. The ad platforms know how much money you spent, and your transactions database knows how much each customer bought.
To measure the impact of customer support initiatives on CLV you need to join data from your customer support platform and your transactions database. Your customer support platform knows who’s filing tickets and, again, your transactions database knows how much each customer bought.
Cross-domain analysis is simply the act of analyzing data from multiple “domains”, or sources. It can be more challenging than analyzing data from a single data source, but it’s also where you’ll find the insights that will catapult your business out ahead of the competition.
How do I “join” this data?
When you join data, you’re taking data from two different sources and mashing it together so that you can analyze it all at once. The most important part of this process is establishing a key relationship.
A key is a single column that exists in both sources. If you’re relating data from ad networks to your transactional database, the key is probably campaign name. If you’re relating data from your customer support platform to your transactional data, the key is probably customer ID or email address. In both cases, this is data that both systems know, and you use that to mash the data sets together and analyze them.
Joining this data together isn’t easy. Exporting the data to a spreadsheet and hacking something together is definitely possible, and many people do it. But the more you find yourself doing this, the more time-consuming (and soul-sucking) this becomes.
The plot thickens
You’ll also find that there are a bunch of other details you’ll need to take into account. Different systems report data in different timezones, and you’ll need to correct for this. You may also need to perform currency conversions.
To further complicate things, you’re not the only one working with this data. Other people in your organization are also running similar analyses. This duplicate work is highly inefficient and error-prone. Don’t be surprised when you end up in a meeting looking at three different “monthly revenue” numbers.
When you find yourself here, you know it’s time for a data warehouse.
Enter the data warehouse
A data warehouse is a single central location unifying your data. Building your analytics around a data warehouse gives you a powerful, centralized, and fast source of data.
To build a data warehouse, you first need to copy the raw data from each of your data sources, cleanse, and optimize it. The process of getting data into a data warehouse is called ETL: Extract, Transform, Load. Here’s the SparkNotes version:
1. Extract the data from the source system
One of the primary challenges is that you’ll need to avoid copying all the data every time you sync. Some of your data sources could have millions or billions of records, and copying every record during every sync isn’t a good idea. This optimization is when things really get complicated.
2. Transform the data
In this step, data is cleansed, denormalized, and pre-calculated so that it’s ready for analysis. Cleansing the data means that you resolve any inconsistencies (i.e., links that have been tagged in different ways, or order statuses that have been tracked differently over time). Denormalizing optimizes the data warehouse’s ability to read data, and pre-calculations include the calculations that you know you’ll need frequently (i.e., total revenue per client and number of orders per client will be critical to calculating CLV).
3. Load the transformed data into the warehouse
Once your data is loaded into your warehouse, it’s officially ready for analysis. You can query it directly via SQL, or you can work with it in a business intelligence tool. Before you copied all of your data into a central location, you could only look at separate pools of data. With a data warehouse, you now have a deep well of information to draw from.
This sounds like an ambitious undertaking, and it is. Keep in mind, when you’re first starting out, you absolutely do not need a data warehouse. However, as you grow and become more data-driven, you will start pushing at the limits of the kinds of analysis you can do. Here are the signs that it’s time:
- You’re generating too much data for Excel to handle
- You’re downloading data exports from multiple systems and combining them together in Excel
- You’re wasting too much time updating existing reports
- You’re seeing discrepancies in reports on the same data, generated by different departments
- You’re delaying decisions while waiting for answers
Companies at different stages of growth need very different data infrastructure strategies. When it’s time to move on to something bigger and better, you’ll know.
Why having a data warehouse rocks
Having a data warehouse allows you to make smarter decisions, faster.Here are some other benefits:
- It can store and analyze a huge amount of data. Traditional database technologies are built to be really good at processing transactions–creating, reading, updating, and deleting records one at a time. An entirely new set of data warehousing technologies are now becoming available, including Amazon RedShift, Vertica, and Teradata, that is far better at analyzing massive datasets. Using these data stores as the foundation for your warehouse will result in speedy analysis that can scale to the moon.
- Generating reports won’t slow your system down. Running BI tools directly on your transactional systems can be a huge strain, and can even cause your website to grind to a halt. With a data warehouse, analysis happens completely independently of your transactional systems.
- No more conflicting reports. Having a single data repository for your organization means everybody is looking at the same data. No more duplicative work or conflicting results.
- Greater data security. With a single location, you can provide secure access only to the people who need to view specific data.
I'm ready. How can I get one?
If you’ve read this far, you’ve probably gotten the impression that data warehouses are expensive, time-consuming to build, and require constant ongoing maintenance as data needs change. You might have even been told this by vendors looking to scare you away from attempting to build a data warehouse.
All of this is true. Or at least, it used to be.
Here’s the good news. At RJMetrics, we build your data warehouse for you. We manage the entire ETL pipeline, customized for your data sources and your business rules. And we store the data in a massive warehouse powered by Amazon RedShift. We can take any data from any source, and analyze however much of it you have. And we can easily change any aspect of your warehouse with a few clicks, so you’ll never have to go running to your tech team to make updates.
We’re intimately familiar with the challenges involved in building and maintaining data warehouses. We really don’t think you should have to be.