What is data modeling?
In business intelligence and analytics, “data modeling” commonly refers to dimensional modeling, which has a specific set of techniques and best practices.
Many other methods and techniques get labeled as data modeling, some with conflicting recommendations. For this article, I’ll discuss data modeling as applied using dimensional modeling techniques.
What are the benefits?
Technology alone doesn’t grant insights, nor does it make the business capable of using data every day to find those insights. Data modeling can provide the missing link; it’s a part of the enablement pillar of a data-driven culture.
The short version is:
- Modeled data is easier to use
- Modeled data is more consistent
- Modeled data is closer to reality
Below, I elaborate on some of these benefits.
Structured data is easier to interpret
Organizations with an investment in big data tend to have lots of unstructured or semi-structured data available. It can be tempting to point client tools directly at a folder of JSON files – and sometimes, it makes sense to do so. However, these files may have an unfamiliar format (e.g. objects containing arrays of other objects).
For people who do their day-to-day work in Excel, this isn’t easy to work with. Taking that JSON document and converting it to a structured, tabular form makes it easier to use.
Core business entities are represented compactly
In dimensional modeling, we identify the entities that make up our business – customers, products, vendors – and create a dimension table for each entity. For the business, this means that all descriptive information for an entity can be found in one spot. For example, all customer-related attributes are located in the customer dimension, instead of being spread out across five tables.
Business users don’t need to figure out joins
By condensing all information related to an entity, we eliminate the need for business users to understand joins. There’s no need to figure out how to link to a lookup table to get your customer’s postal code; the information is already associated with that customer.
When relating dimensions to facts, joins are usually simple and obvious. This means that some client tools (or semantic layer tools, such as SSAS) can automatically define those joins.
Naming conventions make data easier to find
Most businesses need to integrate multiple systems for reporting. These systems often have different schemes for naming their data fields. For example, some will use acronym-heavy column names, while others use verbose, title-cased names. A user trying to combine two such systems on their own will have to do the hard work of figuring out what data resides in each field. A data model unifies systems using standard conventions.
Metrics can be analyzed from multiple perspectives
In the absence of a well-formed data model, developers or business users often create large, wide reporting tables. These tables contain all of the information they could need – including slicing/dicing attributes – for a single use case. However, this causes two problems:
- Adding other metrics or attributes to the table makes it unwieldy
- Slicing/dicing other metrics using the same attributes means reinventing the wheel for the other metrics
In a dimensional model, many facts can be analyzed using the same dimensions. The model allows flexibility.
Reporting and analytics start from a common base
Without an established model, different business units tend to create their own siloed models. These models diverge and often conflict. For example, Marketing and Sales may have different definitions of products or customers. Reconciling these differences is hard work, but enables a global view of your data.
History can be recorded
A key aspect of dimensional modeling is the concept of slowly changing dimensions (SCD). These allow you to capture the history of business entities over time. Because most applications only represent the current state, analytics-based entirely on Excel exports or ad hoc queries ignore historical context, such as the movement of a salesperson from one territory to another. Capturing history allows reporting to stay consistent as source data changes.
Whatever your investments are in technology, I’d recommend you take a serious look at modeling your data. Quantity of data alone doesn’t enable its use. To get full value from your data, you need to make it easy for your business to use and understand.