Hammad Tariq
Problem
With the currently implemented mechanism the university faces some of the major concerning issues including Data Security, Operational speed, operational cost, data integrity and platform specific limitations. A detailed overview of the root cause of each of the problems is given by
Data Security
It is not secure to directly access the master database for reporting.
Operational Speed
Since the reporting has to be done with raw data, it takes a lot of time to load the data and present it in the dashboards
Data Integrity
The data coming from the master database is raw data and it has to be manipulated while generating the reports that makes the process complex and hectic.
Qlikview Limitations
Qlikview has limitations when it comes to integration with other tools and systems that makes it difficult for the BI engineers to perform other operations
Solution
We decide to help the university bring up a new architecture for their reporting mechanism that is more secure and reliable in terms of security, integrity and reusability. The stakeholders decided to switch to Power BI for their dynamic dashboard and day to day report generation. We planned to achieve this by using data marts to handle the reporting data instead directly using the master database for day to day reporting operations
Strategy
To implement the new architecture, we decide to distribute the operations into steps to make it easy for everyone to understand and provide efficient support and troubleshooting
The first step is to create data marts to handle the data at different stages. One is a staging database and a reporting database to handle the operations and reduce the usage of the master database.
The data marts are setup in MySQL server. The staging database pulls raw data from the master database and the reporting database is used to hold refined data tables that are used for reporting
Then, we setup a data pipeline that pulls raw data from the staging database, perform quality checks, handle unwanted and missing data points and create views for the refined tables in the reporting database
The reporting database is then utilized to build a dynamic dashboard on Power BI with extensive filters and sorting algorithms
Tools and Technologies
ETL Tools
We used SQLMesh, and Dagster to build the ETL pipeline. The pipeline pulls raw data from the staging database, perform quality checks, handle missing data points and create views for the refined tables in the reporting database
Data Storage
After transforming the data for the reports, we utilized MySQL server to store the data in a staging database. The staging database is then used to pull data and create views in an enterprise database that is directly linked to the dashboards
Visualization
We built dynamic dashboards in Power BI with source being the enterprise database in MySQL server that contains views for specific data tables. Power BI uses these views to visualize the data on charts and dashboards dynamically
Results
Security
With the current approach, the university has secured the master database from unnecessary access and excessive read/write operations.
Performance
The dashboards on Power BI are enriched with cleaned data instead of raw data thus takes less time to load the results dynamically as the amount of data is reduced
Cost Efficiency
The ETL operations have been standardized and the pipeline is orchestrated that requires minimum human interference optimizing operational costs by 35%
Summary
The project tasks were achieved by using a different set of tools and techniques that allows secure and reliable data processing without disturbing the day to day reporting, optimizing the workflow for speedy progress and optimizing the operational costs. The break down of the process into modules makes it easy for BI engineers to troubleshoot and monitor the operations