No IT Involvement
In this approach, IT doesn’t manage data sources at all – users import data into Power BI on an individual basis. This is fairly uncommon on an enterprise scale, but may occur where Power BI has been implemented departmentally. Data sources can include:
- SaaS applications with built-in connectors
- Manually extracted and updated files (CSV, Excel, etc.)
Typically, this approach simply represents a shift from manual Excel reports to somewhat less manual Power BI reports.
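To make the manual workflow concrete, here is a minimal sketch in Python's standard library of what such a report amounts to. The file contents and column names are invented for illustration; in practice the user re-downloads the extract by hand whenever the report needs refreshing.

```python
import csv
import io

# Hypothetical extract a user downloads from a SaaS app each week.
# Nothing updates this data automatically: the user must re-download
# the file and refresh the report themselves.
EXTRACT = """region,revenue
East,1200
West,950
East,300
"""

def summarize_revenue_by_region(extract_text):
    """Aggregate the extract the way a simple departmental report would."""
    totals = {}
    for row in csv.DictReader(io.StringIO(extract_text)):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["revenue"])
    return totals

totals = summarize_revenue_by_region(EXTRACT)
```

Every step here depends on one person remembering to rerun it, which is exactly the fragility described below.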
When this approach works:
- Users need to explore new data sources (business value has not been fully established),
- Users have a good understanding of the data source,
- Or the data isn’t mission-critical and has a limited audience (one team or department)
The advantages of this approach:
- The software vendor typically surfaces data in a form that makes sense to users (example: Salesforce)
- If files are being imported into Power BI, then users are typically very familiar with the specific structure of that file and the various assumptions underlying the data
The disadvantages include:
- Data is siloed, and a business entity may be represented differently from how the rest of the organization sees it
- Refreshing data requires manual user input, unless a personal gateway is installed
- Routine reporting processes develop points of failure – individuals on whom the entire process depends
Expose Database Objects to Users
In this approach, IT manages access to specific database views or tables, but doesn’t perform the additional work of modeling or labeling that data.
When this approach works:
- The database schema is labeled and comprehensible,
- The schema does not have many denormalized lookup tables (tables containing multiple sets of code-value pairs),
- Or documentation exists for unlabeled or ambiguous fields
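To illustrate what a denormalized lookup table of code-value pairs looks like, and why it complicates self-service use, here is a small hypothetical sketch in Python (the table and field names are invented):

```python
# A single "codes" table holding several unrelated code sets at once --
# the pattern that makes raw schemas hard for users to navigate.
codes = [
    {"domain": "ORDER_STATUS", "code": "01", "value": "Open"},
    {"domain": "ORDER_STATUS", "code": "02", "value": "Shipped"},
    {"domain": "REGION",       "code": "E",  "value": "East"},
    {"domain": "REGION",       "code": "W",  "value": "West"},
]

def split_lookup_table(rows):
    """Split one generic code-value table into per-domain lookups,
    which is effectively what a user must do before the data is usable."""
    lookups = {}
    for row in rows:
        lookups.setdefault(row["domain"], {})[row["code"]] = row["value"]
    return lookups

lookups = split_lookup_table(codes)
```

A schema with few of these generic tables spares users this extra unpacking step.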
The advantages of this approach:
- Users can choose between importing data into their model (and refreshing when necessary) or connecting live
- The data itself doesn’t need to be updated manually
- Data can be made accessible to Power BI rapidly by setting permissions
The disadvantages include:
- The structure of the raw data may be undocumented
- The structure of the raw data may not be intuitive
- Labels on raw data (if they exist) may not give any indication of fields’ contents
In other words, raw data is often difficult to use and interpret. It may require users to devote significant time and effort to cleansing, transforming, and understanding the data.
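As a concrete sketch of that effort, a user may have to map cryptic column names to business terms by hand before the data is usable. All names below are invented, and in practice the mapping has to come from documentation or tribal knowledge:

```python
# Raw extract with opaque column names -- a common situation when
# database objects are exposed without modeling or labeling.
raw_rows = [
    {"CUST_NM_TX": "Acme", "ORD_AMT_NR": "125.50", "ORD_DT": "2023-01-05"},
]

# Hypothetical mapping assembled from documentation of the schema.
FIELD_LABELS = {
    "CUST_NM_TX": "Customer Name",
    "ORD_AMT_NR": "Order Amount",
    "ORD_DT": "Order Date",
}

def relabel(rows, labels):
    """Rename raw fields to business-friendly labels and cast amounts."""
    cleaned = []
    for row in rows:
        new_row = {labels[k]: v for k, v in row.items()}
        new_row["Order Amount"] = float(new_row["Order Amount"])
        cleaned.append(new_row)
    return cleaned

cleaned = relabel(raw_rows, FIELD_LABELS)
```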
Applications often offer an export-to-Excel option, and users may revert to using those Excel exports as a data source. This saves them the effort of transforming the data, but requires them to update their reports manually.
Model Data for Users
In this approach, IT ensures that data exposed to users is comprehensible and well labeled. This often takes the form of a denormalized star schema, but may include reporting views defined for specific use cases. With Power BI, this may take the form of an IT-managed dataset made available through the Power BI service.
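A toy sketch of what denormalizing into a star-schema dimension means, with invented table names: two normalized source tables are flattened into a single customer dimension that users can slice by.

```python
# Normalized source tables (names and columns are hypothetical).
customers = [
    {"customer_id": 1, "name": "Acme",   "region_id": 10},
    {"customer_id": 2, "name": "Globex", "region_id": 20},
]
regions = {10: "East", 20: "West"}

def build_customer_dimension(customers, regions):
    """Flatten normalized tables into one denormalized row per customer,
    the shape a star schema exposes to report users."""
    return [
        {
            "customer_id": c["customer_id"],
            "name": c["name"],
            "region": regions[c["region_id"]],
        }
        for c in customers
    ]

dim_customer = build_customer_dimension(customers, regions)
```

In a real warehouse this flattening happens upstream (in ETL or views), so users never see the join logic.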
When this approach works:
- The data has known business value,
- IT can build on the foundation of an existing, user-created Power BI model of the data,
- Or new data can leverage existing conformed dimensions in a data warehouse
The advantages of this approach:
- Business entities can be represented as dimension/lookup tables, which can be used for slicing multiple tables
- Business logic can be implemented upstream, creating a single source of truth
- Performance may be better, especially when the data source is heavily normalized
The disadvantages include:
- Users need to rely on IT to model data
- Changes to the model come from the business but need to be implemented by IT
These disadvantages can be mitigated by embedding developers in business units or taking an Agile approach to make smaller, incremental updates.
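To make the single-source-of-truth advantage concrete: business logic such as a margin calculation can be defined once upstream rather than re-derived in each user's report. A hypothetical sketch:

```python
def gross_margin(revenue, cost):
    """One canonical margin definition, implemented upstream.
    If every report calls this (rather than re-deriving the formula),
    the whole organization agrees on what 'margin' means."""
    if revenue == 0:
        return 0.0
    return (revenue - cost) / revenue

# Two different "reports" reuse the same definition and therefore agree.
report_a = gross_margin(1000, 600)
report_b = gross_margin(1000, 600)
```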
Ideally, your organization should combine all three approaches. A data source can be added organically or as a planned project, and it can move from one approach to another over time. For example, a business unit may download data and import it into Power BI. After its business value has been established, IT can automate extraction of the data from a source database or via an API. Eventually, the business may identify value in transforming the data, labeling it, and integrating it with other data sources in an enterprise model.
Additional data sources can go through this same process and “bubble up” from user-created models to centralized, IT-managed models. Power BI is especially well suited to this conversion process. Even if IT doesn’t have first-hand experience with the source data, any modeling work done by users on imported data will be recorded in Power BI’s query editor or in DAX formulas. With some effort, that work can be translated and reimplemented as SQL.
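As a sketch of that translation process (all names invented): a user's query-editor steps typically reduce to a short pipeline of filter, rename, and aggregate operations, each of which a developer can re-implement mechanically in SQL or in code.

```python
# A user's recorded transformation steps, mirrored in code.
# STAT, REG, and AMT are hypothetical raw column names.
rows = [
    {"STAT": "A", "REG": "East", "AMT": 100},
    {"STAT": "C", "REG": "East", "AMT": 999},  # cancelled, filtered out
    {"STAT": "A", "REG": "West", "AMT": 250},
]

def reimplement_user_steps(rows):
    """Mirror the recorded steps: keep active rows, rename fields,
    then total the amount per region."""
    active = [r for r in rows if r["STAT"] == "A"]           # step 1: filter
    renamed = [{"region": r["REG"], "amount": r["AMT"]}       # step 2: rename
               for r in active]
    totals = {}                                               # step 3: group
    for r in renamed:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals

totals_by_region = reimplement_user_steps(rows)
```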
The matrix below represents the various approaches and when they should be used, based on characteristics of the data in question:
The rise of self-service BI tools does not guarantee that users will actually be able to use data made available to them. The level of IT involvement in managing data available in Power BI may vary by data source:
- When raw data is difficult to understand, self-service BI initiatives can benefit from greater IT involvement, with developers working with the business to apply classical modeling techniques (e.g. Kimball methods) before exposing data to users
- When raw data is well labeled and well structured (e.g. consistently normalized or denormalized, or surfaced in pre-constructed views), Power BI’s modeling tools make it fairly easy for savvy users to use that data
- And when new data sets enter the picture, the ability to import data into a local model and combine it with other datasets gives users greater exploratory power
Finally, Power BI’s query editor is an extremely useful tool in leveraging work done by power users; developers can see every step applied in transforming the data and every formula used in calculations.