Big Data Cluster Architecture

Ashish Sharma

Business Analyst
Data Analyst
Data Engineer
Power BI
Python
Bikanervala
◦ Objective – To develop an efficient big data solution.
◦ Challenge – To handle a large data volume (3+ TB) spread across multiple data sources.
◦ Approach – To deploy a cluster architecture that consolidates the data into a single source.
◦ Solution – Implemented Hadoop (HDFS) for storage, Spark (PySpark) for processing, and Airflow for scheduling on a Linux-based cluster, with a compressed data warehouse serving as the single data source. The full stack is open source.
◦ Result – ETL processing time reduced by a factor of 10.
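
The flow described above (extract from several sources, consolidate into one store, write it compressed, and run the steps in dependency order) can be sketched in a few lines. This is only an illustrative stand-in, not the project's code: it uses Python's standard library instead of the actual stack, with `graphlib.TopologicalSorter` playing the role of the Airflow scheduler and gzip-compressed JSON lines playing the role of the compressed warehouse on HDFS. The source names, sample rows, and every function name here are hypothetical.

```python
import gzip
import io
import json
from graphlib import TopologicalSorter

# Hypothetical in-memory "sources"; the real project read from multiple systems.
SOURCES = {
    "sales": [{"id": 1, "amount": 120.0}, {"id": 2, "amount": 80.5}],
    "inventory": [{"id": 1, "stock": 40}],
}


def extract(name):
    """Return rows from one source, tagged with where they came from."""
    return [{"source": name, **row} for row in SOURCES[name]]


def consolidate(batches):
    """Merge per-source batches into one list: the 'single data source'."""
    return [row for batch in batches for row in batch]


def load_compressed(rows):
    """Serialize rows as gzip-compressed JSON lines and return the bytes."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for row in rows:
            gz.write((json.dumps(row, sort_keys=True) + "\n").encode("utf-8"))
    return buffer.getvalue()


def run_pipeline():
    """Run tasks in dependency order, as a scheduler would."""
    # Each key lists the tasks it depends on (assumed task names).
    dag = {
        "extract_sales": set(),
        "extract_inventory": set(),
        "consolidate": {"extract_sales", "extract_inventory"},
        "load": {"consolidate"},
    }
    results = {}
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        if task.startswith("extract_"):
            results[task] = extract(task.removeprefix("extract_"))
        elif task == "consolidate":
            results[task] = consolidate(
                [results["extract_sales"], results["extract_inventory"]]
            )
        elif task == "load":
            results[task] = load_compressed(results["consolidate"])
    return order, results


def read_back(blob):
    """Decompress the warehouse blob back into rows."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

Calling `run_pipeline()` returns the execution order and the per-task results, and `read_back(results["load"])` recovers the consolidated rows. In a cluster setup of the kind described, each task would instead be a scheduled job and the load step would write compressed files to distributed storage.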