Pareto Data Platform

What is it?

The Pareto Data Platform is an architecture blueprint for a data platform that covers 80% of the needs of a medium-sized company with just 20% of the complexity

Assumptions

  • The main use case for the data platform is Business Intelligence
  • The company already uses Microsoft services, especially Microsoft Entra ID
  • The major data sources run on-premises

Components

The Pareto Data Platform is composed of:

  • Azure Data Factory as ETL Tool
  • Azure Data Lake as Persistent Staging Area
  • Azure Synapse Dedicated SQL Pool as Data Warehouse
  • Microsoft Power BI as Business Intelligence Tool

Azure Data Factory as ETL Tool

Azure Data Factory is a cloud-based data integration service that enables you to automate data movement and transformation

Combined with a self-hosted integration runtime, we will have an enterprise-grade ETL tool capable of extracting data from applications and databases running on-premises

Azure Data Lake as Persistent Staging Area

Azure Data Lake extends Azure Blob Storage with a hierarchical namespace and fine-grained access control (role-based access control plus POSIX-style access control lists) while remaining extremely cost-effective

Using such a simple API and plain file-based storage will help us stay vendor-neutral and keep the option of migrating to another cloud provider in the future

Storing the data in the Azure Data Lake before moving it to the data warehouse will create an additional backup layer
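
As an illustration of how plain file-based staging can double as a backup layer, here is a minimal sketch of a date-partitioned folder convention. The container name `staging`, the `year=/month=/day=` layout, the Parquet format and the function name `staging_path` are all assumptions made for this example; the blueprint does not prescribe them.

```python
from datetime import date
from pathlib import PurePosixPath


def staging_path(source: str, table: str, load_date: date,
                 file_format: str = "parquet") -> str:
    """Build a date-partitioned path for one extract (hypothetical layout)."""
    return str(
        PurePosixPath("staging")          # assumed container / root folder
        / source                          # e.g. the on-premises system name
        / table                           # one folder per extracted table
        / f"year={load_date:%Y}"
        / f"month={load_date:%m}"
        / f"day={load_date:%d}"
        / f"{table}.{file_format}"
    )


print(staging_path("erp", "customers", date(2024, 3, 1)))
# staging/erp/customers/year=2024/month=03/day=01/customers.parquet
```

Under this assumed convention every load lands in its own dated folder, so earlier extracts are never overwritten and the data warehouse could be reloaded from the files if needed.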

Azure Synapse Dedicated SQL Pool as Data Warehouse

Azure Synapse Dedicated SQL Pool is a fully managed elastic data warehouse that provides a scalable and cost-effective solution for storing and analyzing large volumes of data

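The reason to use a Dedicated SQL Pool instead of the Serverless alternative is its higher performance and improved cost transparency: compute is provisioned up front and billed for the time it runs, not for the amount of data each query processes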

The Serverless SQL Pool might be a better choice for smaller workloads or sporadic data analysis, but we assume uninterrupted usage by our BI users during working hours
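
The trade-off can be made concrete with a rough break-even calculation. The sketch below assumes a dedicated pool billed per hour while it is running (and paused outside working hours) and a serverless pool billed per terabyte of data processed. All rates and volumes are placeholder numbers chosen for easy arithmetic, not actual Azure prices, and storage costs are ignored.

```python
def dedicated_monthly_cost(price_per_hour: float, hours_per_day: float,
                           days_per_month: int = 22) -> float:
    """Compute cost of a dedicated pool that only runs during working hours."""
    return price_per_hour * hours_per_day * days_per_month


def serverless_monthly_cost(price_per_tb: float, tb_processed_per_day: float,
                            days_per_month: int = 22) -> float:
    """Compute cost of a serverless pool billed by data processed."""
    return price_per_tb * tb_processed_per_day * days_per_month


def break_even_tb_per_day(price_per_hour: float, hours_per_day: float,
                          price_per_tb: float) -> float:
    """Daily data volume above which the dedicated pool becomes cheaper."""
    return price_per_hour * hours_per_day / price_per_tb


# Placeholder rates (assumptions, not real prices)
PRICE_PER_HOUR = 2.0   # dedicated pool, per running hour
PRICE_PER_TB = 5.0     # serverless pool, per TB processed
HOURS_PER_DAY = 10.0   # working hours with BI users online

print(dedicated_monthly_cost(PRICE_PER_HOUR, HOURS_PER_DAY))               # 440.0
print(serverless_monthly_cost(PRICE_PER_TB, 1.0))                          # 110.0
print(break_even_tb_per_day(PRICE_PER_HOUR, HOURS_PER_DAY, PRICE_PER_TB))  # 4.0
```

With these placeholder numbers the dedicated pool costs the same every month regardless of query volume, while the serverless bill grows with the data processed. Sustained daily usage pushes the comparison toward the dedicated option.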

Microsoft Power BI as Business Intelligence Tool

Microsoft Power BI can be used either as a self-service BI tool for exploratory data analysis or as a centralized BI tool to share standardized KPIs and reports across the company

Empowering our BI users will improve the return on investment in our data platform: they can leverage the full power of the existing data, and the analysis and design of new reports will speed up

With centralized semantic models, dashboards and reports, we will make sure that we have a single source of truth for our BI users

The Data Flow
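
Based on the components described so far, the flow can be sketched as an ordered list of stages. This is only an illustration: the class name `Stage` and the wording of each role are assumptions made for this example, and Azure Data Factory is shown as a stage even though it moves data between stores instead of holding it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    component: str
    role: str


# Order follows the component descriptions: extract, stage, load, report
DATA_FLOW = [
    Stage("On-premises sources", "Operational applications and databases"),
    Stage("Azure Data Factory", "ETL tool"),
    Stage("Azure Data Lake", "Persistent staging area"),
    Stage("Azure Synapse Dedicated SQL Pool", "Data warehouse"),
    Stage("Microsoft Power BI", "Business intelligence tool"),
]


def describe(flow: list[Stage]) -> str:
    """Render the flow as a single arrow-separated line."""
    return " -> ".join(stage.component for stage in flow)


print(describe(DATA_FLOW))
```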
