Datalake & Real-Time Analytics for Media Streaming Platform

Gautam Gupte

Brief description:
Margo Networks (a subsidiary of Zee Entertainment) developed Sugarbox, a hyperlocal infotainment platform that lets users stream content to their mobile devices. Real-time log and usage data from edge servers is processed with the Hortonworks Hadoop ecosystem and Spark Streaming, with exactly-once guarantees. Insights delivered include billing, content consumption analysis, footfall trends, and partner performance and operational metrics. Two additional epics, the migration to AWS and campaign performance measurement through tweet sentiment analysis, are described in detail below.

Detailed description:

Margo Networks Pvt Ltd, a subsidiary of Zee Entertainment Ltd, is a media technology company.

Project Description:

Margo's flagship product, Sugarbox, is an infotainment server deployed in a hyperlocal loop at various client locations. Consumers within range of the local loop can watch content on their mobile devices. At the backend, consumption logs and machine/server logs are processed in real time. Hortonworks' Hadoop ecosystem, together with Spark Streaming, provides exactly-once processing for billing and for other analytical use cases such as footfall analytics, content analytics, and deployment-partner analytics.
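A common way to get exactly-once output from a Kafka-fed Spark Streaming job is to write each micro-batch's results and its consumed offsets in one database transaction, so a batch replayed after a failure is detected and skipped. The project does not say how it achieved exactly-once billing; the sketch below only illustrates that pattern without Spark. It assumes a relational sink (SQLite stands in for the project's MySQL) and a hypothetical record layout of (topic, partition, offset, device_id, bytes_streamed).

```python
import sqlite3
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (topic, partition, offset, device_id, bytes_streamed)
Record = Tuple[str, int, int, str, Optional[int]]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the billing table and the offset table in the same database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS billing ("
        " device_id TEXT PRIMARY KEY, total_bytes INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kafka_offsets ("
        " topic TEXT, part INTEGER, next_offset INTEGER NOT NULL,"
        " PRIMARY KEY (topic, part))"
    )
    conn.commit()


def committed_offset(conn: sqlite3.Connection, topic: str, part: int) -> int:
    """Next offset to consume for a topic partition (0 if never committed)."""
    row = conn.execute(
        "SELECT next_offset FROM kafka_offsets WHERE topic = ? AND part = ?",
        (topic, part),
    ).fetchone()
    return row[0] if row else 0


def process_batch(conn: sqlite3.Connection, batch: Iterable[Record]) -> None:
    """Apply one micro-batch; billing totals and offsets commit or roll back together."""
    with conn:  # commits on success, rolls back if any statement raises
        for topic, part, offset, device_id, nbytes in batch:
            # A replayed record is already reflected in the committed totals: skip it.
            if offset < committed_offset(conn, topic, part):
                continue
            conn.execute(
                "INSERT INTO billing (device_id, total_bytes) VALUES (?, ?) "
                "ON CONFLICT(device_id) DO UPDATE "
                "SET total_bytes = total_bytes + excluded.total_bytes",
                (device_id, nbytes),
            )
            conn.execute(
                "INSERT INTO kafka_offsets (topic, part, next_offset) VALUES (?, ?, ?) "
                "ON CONFLICT(topic, part) DO UPDATE SET next_offset = excluded.next_offset",
                (topic, part, offset + 1),
            )


def billing_totals(conn: sqlite3.Connection) -> Dict[str, int]:
    """Current billed bytes per device."""
    rows = conn.execute("SELECT device_id, total_bytes FROM billing ORDER BY device_id")
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    batch = [("usage", 0, 0, "dev-a", 100), ("usage", 0, 1, "dev-b", 50), ("usage", 0, 2, "dev-a", 25)]
    process_batch(conn, batch)
    process_batch(conn, batch)  # replay after a simulated restart changes nothing
    print(billing_totals(conn))  # {'dev-a': 125, 'dev-b': 50}
```

The upsert syntax needs SQLite 3.24 or later; on MySQL the equivalent is `INSERT ... ON DUPLICATE KEY UPDATE`. The important design choice is that offsets live in the same store as the results. If offsets were committed to Kafka separately, a crash between the two writes could double-bill or drop usage.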

Technical Environment:

HDP 3.1.0, Spark Streaming 2.3, Kafka 1.1, Hadoop 3.1.1, Hive 3.1.0, Sqoop 1.4.7, Oozie 4.3.0, MySQL 5.5+ with Galera Cluster, Apache Superset

Project Epic Description: Measuring Campaign Performance Based on Tweet Sentiments

Periodic campaigns to spread awareness and improve usage needed a metric for measuring their effectiveness. A solution based on Tweepy and Amazon Comprehend was developed to capture tweets and run sentiment analysis on them. The objective was to measure campaign performance and to give third-party campaigners direct user feedback.
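The scoring step can be sketched as follows, assuming the tweet texts for one campaign have already been collected with Tweepy. It sends them to Amazon Comprehend's `batch_detect_sentiment` (through a Boto3 client) in chunks of at most 25 documents, the per-request limit of that API, and reduces the labels to counts plus a net-sentiment score. The net-sentiment formula and the keyword stub client are hypothetical illustrations, not the project's actual metric.

```python
from collections import Counter
from typing import Dict, List, Sequence

BATCH_LIMIT = 25  # BatchDetectSentiment accepts at most 25 documents per request
LABELS = ("POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED")


def campaign_sentiment(client, tweets: Sequence[str], language: str = "en") -> Dict[str, float]:
    """Score tweet texts with Amazon Comprehend and summarize them for one campaign.

    `client` is assumed to be boto3.client("comprehend"); any object exposing the
    same batch_detect_sentiment(TextList=..., LanguageCode=...) method also works.
    """
    counts: Counter = Counter()
    for start in range(0, len(tweets), BATCH_LIMIT):
        chunk = list(tweets[start:start + BATCH_LIMIT])
        response = client.batch_detect_sentiment(TextList=chunk, LanguageCode=language)
        for result in response["ResultList"]:
            counts[result["Sentiment"]] += 1
        counts["ERROR"] += len(response["ErrorList"])
    scored = sum(counts[label] for label in LABELS)
    # Hypothetical campaign metric: net sentiment in [-1, 1] over the scored tweets.
    net = (counts["POSITIVE"] - counts["NEGATIVE"]) / scored if scored else 0.0
    summary: Dict[str, float] = {label.lower(): counts[label] for label in LABELS}
    summary.update(scored=scored, errors=counts["ERROR"], net_sentiment=net)
    return summary


class KeywordStubClient:
    """Hypothetical offline stand-in for the Comprehend client, for the demo only."""

    def __init__(self) -> None:
        self.calls = 0

    def batch_detect_sentiment(self, TextList: List[str], LanguageCode: str) -> dict:
        self.calls += 1
        results = []
        for index, text in enumerate(TextList):
            if "love" in text:
                label = "POSITIVE"
            elif "hate" in text:
                label = "NEGATIVE"
            else:
                label = "NEUTRAL"
            results.append({"Index": index, "Sentiment": label})
        return {"ResultList": results, "ErrorList": []}


if __name__ == "__main__":
    stub = KeywordStubClient()
    tweets = ["love the free movies"] * 20 + ["hate the buffering"] * 5 + ["at the station"] * 5
    print(campaign_sentiment(stub, tweets)["net_sentiment"], stub.calls)  # 0.5 2
```

Passing the client in as a parameter keeps the aggregation testable without AWS credentials. In production it would be `boto3.client("comprehend")`, and per-document failures reported in `ErrorList` are counted rather than silently dropped.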

Technical Environment:

Tweepy 3.7, Boto3 1.12.0, Amazon Comprehend (API version 2017-11-27)

Project Epic Description: Migration to AWS

Given the anticipated growth in business and data volume, the existing data-center-based infrastructure was found to be inadequate. Because of capex constraints, a cloud-based solution on AWS was adopted. With minimal codebase changes, a solution using EMR for the streaming and batch Spark applications, S3 for storage, DMS for data migration, and MySQL on EC2 was designed and implemented using rehost and replatform approaches. Historical and ongoing data was transferred using custom scripts and DMS. After the migration, costs were optimized and resources were resized.
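The custom transfer scripts are not described. A minimal sketch of one plausible planning step follows: HDFS warehouse files are mapped to S3 URIs that keep the Hive `db/table/key=value` layout, so external tables on EMR and Athena can point at the new location. A date watermark then separates the one-off historical load from later incremental ("ongoing") copies. The warehouse path, bucket name, and `dt` partition key are hypothetical placeholders.

```python
import posixpath
from typing import Iterable, List, Optional, Tuple

# Hypothetical locations: the actual warehouse path and bucket are not stated in the source.
HDFS_WAREHOUSE = "/warehouse/tablespace/external/hive"
S3_PREFIX = "s3://example-datalake/warehouse"

Copy = Tuple[str, str]  # (HDFS source path, S3 destination URI)


def plan_copies(hdfs_files: Iterable[str]) -> List[Copy]:
    """Map each HDFS file to an S3 URI that keeps its db/table/partition layout."""
    plan = []
    for path in hdfs_files:
        rel = posixpath.relpath(path, HDFS_WAREHOUSE)
        if rel == ".." or rel.startswith("../"):
            raise ValueError(f"{path} is outside {HDFS_WAREHOUSE}")
        plan.append((path, f"{S3_PREFIX}/{rel}"))
    return plan


def partition_value(path: str, key: str) -> Optional[str]:
    """Return the value of a Hive-style `key=value` path segment, if present."""
    for segment in path.split("/"):
        if segment.startswith(key + "="):
            return segment.split("=", 1)[1]
    return None


def incremental(plan: List[Copy], key: str, watermark: str) -> List[Copy]:
    """Keep copies whose partition value is later than the watermark.

    Assumes ISO-8601 dates (e.g. dt=2020-06-01), which sort correctly as strings.
    """
    selected = []
    for src, dst in plan:
        value = partition_value(src, key)
        if value is not None and value > watermark:
            selected.append((src, dst))
    return selected
```

The historical load would copy the full plan once. Later runs would copy only `incremental(plan, "dt", last_copied_date)`. Each pair could then be handed to a copy tool such as `hadoop distcp`, and the URI scheme (`s3://` on EMR, `s3a://` from Hadoop clusters outside EMR) should match whichever tool performs the copy.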

Technical Environment:

EMR 5.30, EC2, S3, DMS, Athena

Role:

Leading the analytics function in its entirety
Owning the complete analytics function and end-to-end solutions
Developing and collaborating on the BI vision and roadmap
Aligning business use cases with strategic goals and managing delivery
Sharing insights based on comparative data trends

Posted Jun 11, 2025

Led data management and analytics for Sugarbox, delivering insights on billing and content consumption using Hadoop and Spark.
