Data Harmonization & Azure Data Lake
Volume of data
1 TB
streaming and batch data
Business Impact
75%
cost reduction
Outcome
200x
performance improvement
Problem statement
The client ingests data from more than 200 sources, including Oracle Fusion and other API sources. Existing reports take more than 3 hours to load their data. The existing solution could not scale because the mappings across sources had been created manually. Consumption costs exceed $40k per month. The customer wanted a scalable, low-cost model that would reduce costs and provide a better system design.
Challenges
- Different datasets require different ingestion and ETL techniques for streaming and historical data.
- Managing data pipelines for more than 200 sources in an automated way.
- Understanding the complexity of designing data models and data marts for performance improvements.
- Reducing the cost of consumption to improve overall business adoption.
![](https://jp0ddc.a2cdn1.secureserver.net/wp-content/uploads/2021/05/Setup-Analytics-bro-300x300.png)
Technologies used
![](https://jp0ddc.a2cdn1.secureserver.net/wp-content/uploads/2021/05/data-harmonization-tech-used.png?time=1722019912)
Solution
Antz helped the customer by building automated data pipeline management that drives data ingestion through metadata management. The team created a logical data model on top of the data lake and built a Delta Lakehouse. Aggregated data model layers reduced the time needed to retrieve reports.
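To illustrate what metadata-driven ingestion can look like, here is a minimal Python sketch. It is not the implementation delivered in this project: the `SourceConfig` fields, the handler functions, and the sample source names are all hypothetical. The idea shown is that each source is described by a metadata record, and a single generic runner picks the batch or streaming path from that record, so adding a source means adding a row rather than writing a new pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SourceConfig:
    """One record of a hypothetical metadata table describing a source."""
    name: str          # logical source name
    kind: str          # e.g. "oracle_fusion" or "api" (illustrative values)
    mode: str          # "batch" or "streaming"
    target_table: str  # destination table in the data lake


def ingest_batch(cfg: SourceConfig) -> str:
    # Placeholder: a real pipeline would copy historical data here.
    return f"batch load {cfg.name} -> {cfg.target_table}"


def ingest_streaming(cfg: SourceConfig) -> str:
    # Placeholder: a real pipeline would start a streaming job here.
    return f"stream {cfg.name} -> {cfg.target_table}"


# Dispatch table: the ingestion mode in the metadata selects the handler.
HANDLERS: Dict[str, Callable[[SourceConfig], str]] = {
    "batch": ingest_batch,
    "streaming": ingest_streaming,
}


def run_pipelines(sources: List[SourceConfig]) -> List[str]:
    """Run every source through the handler named by its metadata."""
    results = []
    for cfg in sources:
        handler = HANDLERS.get(cfg.mode)
        if handler is None:
            raise ValueError(f"unknown ingestion mode: {cfg.mode!r}")
        results.append(handler(cfg))
    return results


# Hypothetical sample metadata; real deployments would load this from a table.
SOURCES = [
    SourceConfig("gl_ledger", "oracle_fusion", "batch", "raw.gl_ledger"),
    SourceConfig("order_events", "api", "streaming", "raw.order_events"),
]

if __name__ == "__main__":
    for line in run_pipelines(SOURCES):
        print(line)
```

With this shape, onboarding one more of the 200+ sources is a change to the metadata list only; the runner and handlers stay untouched.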
Result
![](https://jp0ddc.a2cdn1.secureserver.net/wp-content/uploads/2021/05/Data-report-pana-300x300.png)
- Automated data ingestion pipeline management.
- 200x improvement in data retrieval performance.
- Reduced the cost of consumption from $40,000 per month to $5,000 per month.