Data Harmonization

Data harmonization & Azure Data Lake

Volume of data

1 TB

streaming and batch data

Business Impact

75%

cost reduction

Outcome

200x

performance improvement

Problem statement

The client had more than 200 sources across Oracle Fusion and other APIs. Existing reports took more than 3 hours to load, and the existing solution could not scale because the mappings across sources had been created manually. The cost of consumption exceeded $40k per month. The customer wanted a scalable, low-cost model to reduce spend and build a better system design.

Challenges

  • Different datasets and ETL techniques for streaming versus historical ingestion (see the sketch after this list). 
  • Managing data pipelines across more than 200 sources in an automated way.  
  • Understanding the complexity of designing data models and marts for performance improvements.  
  • Reducing the cost of consumption to improve overall business adoption.
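
For illustration, here is a minimal sketch of how historical batch files and newly arriving files can land in a single Delta table, so both arrival modes feed the same downstream models. It assumes Spark Structured Streaming with Delta Lake on Azure Data Lake Storage; the paths, formats, and table layout are hypothetical, not the client's actual design.

```python
# Minimal sketch: one Delta table fed by both a batch backfill and a
# continuous stream. Assumes the Delta Lake package is on the classpath;
# all storage paths below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unified-ingestion").getOrCreate()

BRONZE_PATH = "abfss://lake@account.dfs.core.windows.net/bronze/orders"  # hypothetical

# One-off backfill of historical files into the Delta table.
historical = spark.read.format("parquet").load(
    "abfss://lake@account.dfs.core.windows.net/landing/orders/history/"
)
historical.write.format("delta").mode("append").save(BRONZE_PATH)

# Continuous ingestion of newly arriving files into the same table,
# so reports and models read one source regardless of arrival mode.
stream = (
    spark.readStream.format("parquet")
    .schema(historical.schema)  # file-source streams need an explicit schema
    .load("abfss://lake@account.dfs.core.windows.net/landing/orders/incoming/")
)
(
    stream.writeStream.format("delta")
    .option("checkpointLocation", BRONZE_PATH + "/_checkpoints")
    .outputMode("append")
    .start(BRONZE_PATH)
)
```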

Technologies used

Azure Data Lake, Delta Lake

Solution

Antz helped the customer build automated data pipeline management that drives ingestion through metadata. Antz created a logical data model on top of the data lake, built a Delta Lakehouse, and reduced report retrieval time using aggregated data model layers.
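As an illustration of the metadata-driven approach, the sketch below loops over a control table describing each source and lands it in a bronze Delta table, then pre-aggregates a gold layer so reports query a small rollup instead of raw detail. The control table name, its columns, and the aggregate SQL are hypothetical assumptions for illustration, not the production framework.

```python
# Minimal sketch of metadata-driven ingestion plus an aggregated model
# layer. Table and column names (ops.ingestion_metadata, bronze.orders,
# gold.daily_revenue) are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("metadata-driven-ingestion").getOrCreate()

# Each row of the control table describes one of the ~200 sources:
# where to read, what format, and which bronze table to land in.
sources = spark.table("ops.ingestion_metadata").collect()

for src in sources:
    df = spark.read.format(src["source_format"]).load(src["source_path"])
    (
        df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")  # tolerate drifting source schemas
        .saveAsTable(src["target_table"])
    )

# Aggregated model layer: pre-compute the rollups the reports query.
spark.sql("""
    CREATE OR REPLACE TABLE gold.daily_revenue USING DELTA AS
    SELECT order_date, region, SUM(amount) AS revenue
    FROM bronze.orders
    GROUP BY order_date, region
""")
```

Because new sources only require a new row in the control table rather than a hand-built pipeline, this pattern is what lets ingestion scale past a few hundred sources without manual mapping work.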

Result

  • Automated data ingestion pipeline management. 
  • 200x improvement in data retrieval performance.  
  • Reduced the cost of consumption from $40,000 per month to $5,000 per month.

Have a similar requirement? Let's talk.