EDP Migration

Kadiyala Ajay

Data Engineer
Python
SQL
Responsibilities:
• Responsible for building scalable distributed data solutions using Spark.
• Ingested log files from source servers into HDFS data lakes using Sqoop.
• Developed Sqoop Jobs to ingest customer and product data into HDFS data lakes.
• Developed Spark Streaming applications to ingest transactional data from Kafka topics into Cassandra tables in near real time (a sketch follows this list).
• Developed a Spark application to flatten the incoming transactional data using various dimension tables and persist it to Cassandra tables (sketched below).
• Involved in developing a framework for metadata management on HDFS data lakes.
• Worked on various Hive optimizations such as partitioning, bucketing, vectorization, and indexing, and on choosing the right type of Hive join, such as Bucket Map Join and SMB join.
• Worked with various file formats such as CSV, JSON, ORC, Avro, and Parquet.
• Developed HQL scripts to create external tables and analyze incoming and intermediate data for analytics applications in Hive (see the DDL sketch below).
• Optimized Spark jobs using techniques such as broadcasting, executor tuning, and persisting (see the tuning sketch below).
• Responsible for developing custom UDFs, UDAFs, and UDTFs in Hive (a sample UDF is sketched below).
• Analyzed tweets JSON data using the Hive SerDe API to deserialize it and convert it into a readable format (see the SerDe sketch below).
• Orchestrated Hadoop and Spark jobs using Oozie workflows to define job dependencies and run multiple jobs in sequence for processing data.
• Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
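
Illustrative sketches for a few of the items above follow; all are in Scala and use placeholder names. First, a minimal sketch of the Kafka-to-Cassandra ingestion, written here with Spark Structured Streaming and the DataStax spark-cassandra-connector; the broker address, topic, keyspace, table, schema, and checkpoint path are assumptions, and the original job may have used the older DStream API instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object TxnStreamToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("txn-kafka-to-cassandra")
      .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
      .getOrCreate()

    // Assumed message layout; the real transaction schema will differ.
    val txnSchema = new StructType()
      .add("txn_id", StringType)
      .add("customer_id", StringType)
      .add("product_id", StringType)
      .add("amount", DoubleType)
      .add("txn_ts", TimestampType)

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder brokers
      .option("subscribe", "transactions")               // placeholder topic
      .option("startingOffsets", "latest")
      .load()

    // Kafka values arrive as bytes; parse the JSON payload into columns.
    val txns = raw
      .selectExpr("CAST(value AS STRING) AS json")
      .select(from_json(col("json"), txnSchema).as("t"))
      .select("t.*")

    // Append each micro-batch to Cassandra through the DataStax connector.
    val query = txns.writeStream
      .option("checkpointLocation", "hdfs:///chk/txn-stream") // placeholder path
      .foreachBatch { (batch: DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "sales")     // placeholder keyspace
          .option("table", "transactions") // placeholder table
          .mode("append")
          .save()
      }
      .start()

    query.awaitTermination()
  }
}
```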
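
Next, a sketch of the flattening job: the large transactional data is joined with small dimension tables using broadcast joins, and the result is persisted to Cassandra. The HDFS paths, join keys, keyspace, and table names are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object FlattenTransactions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("flatten-transactions").getOrCreate()

    // Placeholder HDFS locations for the fact and dimension data.
    val txns      = spark.read.parquet("hdfs:///lake/transactions")
    val customers = spark.read.parquet("hdfs:///lake/dim_customer")
    val products  = spark.read.parquet("hdfs:///lake/dim_product")

    // Broadcasting the small dimension tables avoids shuffling the large fact table.
    val flattened = txns
      .join(broadcast(customers), Seq("customer_id"), "left")
      .join(broadcast(products), Seq("product_id"), "left")

    flattened.write
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "sales")          // placeholder keyspace
      .option("table", "transactions_flat") // placeholder table
      .mode("append")
      .save()

    spark.stop()
  }
}
```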
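
For the external-table work, a sketch of the kind of DDL the HQL scripts would contain, issued here through a Hive-enabled SparkSession only so the sketches stay in one language; in the project these statements would more likely live in an HQL script run by Hive itself. The database, columns, and HDFS location are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object CreateExternalTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("create-external-tables")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS sales") // placeholder database

    // External table over raw ORC files already landed on HDFS.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions_raw (
        txn_id      STRING,
        customer_id STRING,
        product_id  STRING,
        amount      DOUBLE
      )
      PARTITIONED BY (txn_date STRING)
      STORED AS ORC
      LOCATION 'hdfs:///lake/raw/transactions'
    """)

    // Pick up partitions that already exist under the table location.
    spark.sql("MSCK REPAIR TABLE sales.transactions_raw")

    spark.stop()
  }
}
```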
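
A sketch of the Spark tuning techniques mentioned above: a broadcast join, persisting a DataFrame that feeds several aggregations, and executor tuning applied at submit time (shown in a comment). Paths, columns, and numbers are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

object OptimizedAggregates {
  def main(args: Array[String]): Unit = {
    // Executor tuning is typically applied at submit time, e.g.:
    //   spark-submit --num-executors 10 --executor-cores 4 --executor-memory 8g ...
    val spark = SparkSession.builder()
      .appName("optimized-aggregates")
      .config("spark.sql.shuffle.partitions", "200") // illustrative value
      .getOrCreate()

    val txns   = spark.read.parquet("hdfs:///lake/transactions_flat") // placeholder path
    val stores = spark.read.parquet("hdfs:///lake/dim_store")         // placeholder path

    // Broadcast the small dimension table so the join is done map-side.
    val joined = txns.join(broadcast(stores), Seq("store_id"))

    // Persist the joined data because it is reused by more than one aggregation.
    joined.persist(StorageLevel.MEMORY_AND_DISK)

    joined.groupBy("store_id").count()
      .write.mode("overwrite").parquet("hdfs:///lake/agg_by_store")
    joined.groupBy("txn_date").count()
      .write.mode("overwrite").parquet("hdfs:///lake/agg_by_day")

    joined.unpersist()
    spark.stop()
  }
}
```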
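
A minimal Hive UDF written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API, as one example of the custom UDF work; the package, class name, masking logic, and registration commands are hypothetical.

```scala
package com.example.hive

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Masks all but the last four characters of an identifier.
// After packaging into a jar, it would be registered in Hive with:
//   ADD JAR hdfs:///jars/mask-udf.jar;                               -- placeholder path
//   CREATE TEMPORARY FUNCTION mask_id AS 'com.example.hive.MaskId';
class MaskId extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val visible = 4
    val masked =
      if (s.length <= visible) s
      else "*" * (s.length - visible) + s.takeRight(visible)
    new Text(masked)
  }
}
```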
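
Finally, a sketch of the tweets table backed by a JSON SerDe. The DDL is standard Hive syntax, assuming the commonly used org.apache.hive.hcatalog.data.JsonSerDe (which needs the hive-hcatalog-core jar on the classpath); it is issued through a Hive-enabled SparkSession here only to keep the sketches in one language, and the column list covers just a few illustrative tweet fields.

```scala
import org.apache.spark.sql.SparkSession

object TweetsSerDeTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("tweets-serde-table")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS social") // placeholder database

    // The SerDe maps JSON keys to columns, so column names mirror the tweet payload.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS social.tweets_raw (
        id         BIGINT,
        created_at STRING,
        text       STRING,
        `user`     STRUCT<screen_name: STRING, followers_count: INT>
      )
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
      STORED AS TEXTFILE
      LOCATION 'hdfs:///lake/raw/tweets'
    """)

    // Flatten the nested user struct into readable columns.
    spark.sql("""
      SELECT id, created_at, `user`.screen_name AS screen_name, text
      FROM social.tweets_raw
    """).show(20, truncate = false)

    spark.stop()
  }
}
```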
 
Tech stack: Scala, Hadoop, Spark, Spark SQL, Spark Streaming, Hive, Cassandra, MySQL, HDFS, Apache Kafka.