• Responsible for building scalable distributed data
solutions using Spark.
• Ingested log files from source servers into HDFS data
lakes using Sqoop.
• Developed Sqoop jobs to ingest customer and product data into HDFS data lakes.
• Developed Spark Streaming applications to ingest transactional data from Kafka topics into Cassandra tables in near real time (sketch 1 below).
• Developed a Spark application to flatten transactional data by joining it with various dimension tables and persist the results to Cassandra tables (sketch 2 below).
• Involved in developing a framework for metadata management on HDFS data lakes.
• Worked on various Hive optimizations such as partitioning, bucketing, vectorization, and indexing, and on choosing the right type of Hive join, such as Bucket Map Join and Sort-Merge-Bucket (SMB) join (sketch 3 below).
• Worked with various file formats such as CSV, JSON, ORC, Avro, and Parquet (sketch 4 below).
• Developed HQL scripts to create external tables and analyze incoming and intermediate data for analytics applications in Hive (sketch 5 below).
• Optimized Spark jobs using techniques such as broadcast joins, executor tuning, and persisting reused datasets (sketch 6 below).
• Responsible for developing custom UDFs, UDAFs, and UDTFs in Hive (sketch 7 below).
• Analyzed tweet JSON data using the Hive SerDe API to deserialize it into a readable, queryable format (sketch 8 below).
• Orchestrated Hadoop and Spark jobs using Oozie workflows to manage job dependencies and run multiple jobs in sequence for data processing.
• Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
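
Sketch 1: a minimal Structured Streaming sketch of the Kafka-to-Cassandra ingestion, assuming hypothetical broker, topic, keyspace, and table names, that the spark-cassandra-connector is on the classpath, and that payload parsing is simplified to raw strings.

    import org.apache.spark.sql.{DataFrame, SparkSession}

    object TxStream {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("tx-kafka-to-cassandra")
          .getOrCreate()

        // Read the transaction topic as an unbounded streaming DataFrame.
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
          .option("subscribe", "transactions")               // hypothetical topic
          .load()

        // Kafka keys/values arrive as bytes; cast to strings (real parsing
        // of the transaction payload would happen here).
        val txns = raw.selectExpr("CAST(key AS STRING) AS key",
                                  "CAST(value AS STRING) AS value")

        // Write each micro-batch to Cassandra via the spark-cassandra-connector;
        // assumes the target table has matching columns.
        val query = txns.writeStream
          .foreachBatch { (batch: DataFrame, _: Long) =>
            batch.write
              .format("org.apache.spark.sql.cassandra")
              .options(Map("keyspace" -> "sales", "table" -> "transactions")) // hypothetical
              .mode("append")
              .save()
          }
          .option("checkpointLocation", "hdfs:///checkpoints/tx") // needed for fault tolerance
          .start()

        query.awaitTermination()
      }
    }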
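
Sketch 2: a sketch of the flattening job, with hypothetical table and column names; the dimension tables are broadcast on the assumption that they are small.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder().appName("flatten-tx").enableHiveSupport().getOrCreate()

    // Large fact table plus small dimension tables (hypothetical names).
    val tx       = spark.table("raw.transactions")
    val customer = spark.table("dim.customer")
    val product  = spark.table("dim.product")

    // Broadcasting the small dimensions makes the joins map-side (no shuffle
    // of the large fact table), producing one flat row per transaction.
    val flat = tx
      .join(broadcast(customer), Seq("customer_id"))
      .join(broadcast(product), Seq("product_id"))
      .select("tx_id", "tx_ts", "customer_name", "product_name", "amount")

    // Persist the flattened result to a Cassandra table.
    flat.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sales", "table" -> "tx_flat")) // hypothetical
      .mode("append")
      .save()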
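
Sketch 3: the Hive-side bucketed-join settings plus Spark's analogous bucketing API, with hypothetical table names; the Hive SET statements appear as comments because they belong in the Hive session, not in Spark.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("bucketing-demo")
      .enableHiveSupport()
      .getOrCreate()

    // Hive session settings that enable bucketed joins (run in the Hive CLI):
    //   SET hive.optimize.bucketmapjoin = true;              -- Bucket Map Join
    //   SET hive.optimize.bucketmapjoin.sortedmerge = true;  -- SMB join
    //   SET hive.auto.convert.sortmerge.join = true;

    // Co-bucketing and sorting both sides of a frequent join on the join key
    // lets the engine skip the shuffle at read time.
    spark.table("raw.orders")                   // hypothetical source
      .write
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .saveAsTable("analytics.orders_bucketed") // hypothetical target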
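
Sketch 4: reading and converting between the listed file formats in Spark, with hypothetical HDFS paths; Avro support is assumed to come from the external spark-avro package.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("formats").getOrCreate()

    // Each format has a built-in reader (Avro ships separately as spark-avro).
    val csv  = spark.read.option("header", "true").csv("hdfs:///landing/products.csv")
    val json = spark.read.json("hdfs:///landing/events.json")
    val orc  = spark.read.orc("hdfs:///warehouse/tx_orc")
    val avro = spark.read.format("avro").load("hdfs:///landing/customers.avro")

    // Columnar formats such as Parquet and ORC suit analytical scans, so
    // row-oriented landing data is typically converted on write.
    csv.write.mode("overwrite").parquet("hdfs:///warehouse/products_parquet")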
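
Sketch 5: an external-table DDL of the kind the HQL scripts would contain, run here through spark.sql with Hive support; database, columns, and location are hypothetical.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("external-tables")
      .enableHiveSupport() // lets spark.sql run HiveQL DDL against the metastore
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS staging")

    // External: dropping the table leaves the underlying HDFS files in place.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions (
        tx_id STRING,
        customer_id STRING,
        amount DOUBLE
      )
      PARTITIONED BY (tx_date STRING)
      STORED AS ORC
      LOCATION 'hdfs:///data/landing/transactions'
    """)

    // Register partitions already present on disk, then sanity-check one day.
    spark.sql("MSCK REPAIR TABLE staging.transactions")
    spark.sql("SELECT COUNT(*) FROM staging.transactions WHERE tx_date = '2024-01-01'").show()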
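
Sketch 6: the broadcast and persistence techniques in one place, with illustrative executor settings in a comment; table names and relative sizes are assumptions.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast
    import org.apache.spark.storage.StorageLevel

    // Executor tuning happens at submit time, e.g. (illustrative values only):
    //   spark-submit --num-executors 10 --executor-cores 4 --executor-memory 8g ...

    val spark = SparkSession.builder().appName("opt-demo").enableHiveSupport().getOrCreate()

    val tx  = spark.table("raw.transactions") // large fact table (hypothetical)
    val dim = spark.table("dim.product")      // small dimension table

    // Broadcast the small side so the join avoids shuffling the large table.
    val joined = tx.join(broadcast(dim), Seq("product_id"))

    // Persist a result reused by several actions so it is computed once.
    joined.persist(StorageLevel.MEMORY_AND_DISK)

    println(joined.count()) // first action materializes the cache
    joined.write.mode("overwrite").parquet("hdfs:///warehouse/tx_enriched")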
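
Sketch 7: a minimal Hive UDF in the simple one-value-in/one-value-out flavor (UDAFs and UDTFs extend different base classes); the masking logic and function name are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hive instantiates the class and calls evaluate() once per row.
    class MaskCard extends UDF {
      def evaluate(card: Text): Text = {
        if (card == null) return null
        val s = card.toString
        // Keep only the last four characters visible.
        new Text("*" * math.max(0, s.length - 4) + s.takeRight(4))
      }
    }

    // Registered from the Hive CLI along these lines:
    //   ADD JAR hdfs:///udfs/mask-card.jar;
    //   CREATE TEMPORARY FUNCTION mask_card AS 'MaskCard';
    //   SELECT mask_card(card_number) FROM payments;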
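
Sketch 8: a JSON SerDe table over raw tweets, using the HCatalog JsonSerDe as one commonly available implementation; columns, path, and database are hypothetical, and the hive-hcatalog-core jar is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("tweets-serde")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS social")

    // The SerDe deserializes each JSON line into typed, queryable columns.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS social.tweets (
        id BIGINT,
        created_at STRING,
        text STRING,
        `user` STRUCT<screen_name: STRING, followers_count: INT>
      )
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
      STORED AS TEXTFILE
      LOCATION 'hdfs:///data/raw/tweets'
    """)

    spark.sql("SELECT `user`.screen_name, text FROM social.tweets LIMIT 10").show()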