• Imported data from various sources, performed transformations using Hive,
MapReduce, and Apache Spark, and loaded the results into HDFS.
• Extracted data from Oracle Database into HDFS using Sqoop.
• Loaded data from web servers and Teradata using Sqoop and the Spark Streaming API.
• Used the Spark Streaming API to stream data from various sources. Optimized
existing Scala code and improved cluster performance.
• Experience tuning Spark application parameters such as batch interval, level
of parallelism, and memory to improve processing time and efficiency.