Stock Data Extraction using Apache Kafka, Cassandra & Confluent
This project demonstrates how to extract and stream real-time stock market data using Apache Kafka, process it with Python, and persist it in Apache Cassandra. It leverages Confluent Platform to simplify Kafka setup and management.
Tech Stack
Python
Apache Kafka (for real-time data streaming)
Confluent Platform (for easier Kafka management)
Apache Cassandra (NoSQL database for storing stock data)
kafka-python (Kafka client library for Python)
JSON (data format)
Project Structure
├── kafka_producer.py   # Sends stock data to Kafka topic
├── kafka_consumer.py   # Consumes stock data and inserts into Cassandra
└── README.md           # Project documentation
How It Works
1. Producer
Reads stock data extracted from the Polygon.io API
Publishes each record to Kafka topic stock_prices
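The producer step can be sketched roughly as follows. This is a minimal illustration, not the contents of kafka_producer.py. The record fields (such as `symbol`), the broker address `localhost:9092`, and the helper names `serialize_record`, `publish_records`, and `make_producer` are all assumptions made for the example. Only the topic name `stock_prices` comes from this README.

```python
import json
from typing import Any, Dict, Iterable


def serialize_record(record: Dict[str, Any]) -> bytes:
    """Encode one stock record as UTF-8 JSON bytes for Kafka."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def publish_records(producer: Any, topic: str,
                    records: Iterable[Dict[str, Any]]) -> int:
    """Send each record to `topic`, flush, and return the number sent.

    `producer` is any object exposing kafka-python's send()/flush() methods.
    Each record is assumed (hypothetically) to carry a "symbol" field.
    """
    sent = 0
    for record in records:
        # Keying by symbol routes a ticker's records to the same partition.
        key = str(record["symbol"]).encode("utf-8")
        producer.send(topic, key=key, value=serialize_record(record))
        sent += 1
    producer.flush()  # block until buffered messages have been sent
    return sent


def make_producer(bootstrap_servers: str = "localhost:9092") -> Any:
    """Build a kafka-python producer; the broker address is an assumption."""
    # Imported lazily so the helpers above can be used without a broker.
    from kafka import KafkaProducer
    return KafkaProducer(bootstrap_servers=bootstrap_servers)
```

With a broker running, something like `publish_records(make_producer(), "stock_prices", records)` would publish a batch. Keeping serialization separate from the send loop makes it possible to check the message format without Kafka.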
2. Kafka (via Confluent Platform)
Acts as the message broker between producer and consumer
3. Consumer
Subscribes to stock_prices topic
Parses stock records and inserts them into Apache Cassandra
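A consumer along these lines might look like the sketch below. It is an illustration under stated assumptions rather than the actual kafka_consumer.py. The keyspace `stocks`, the table and column names, the consumer group `stock-consumer`, the host addresses, and the JSON field names (`symbol`, `timestamp` in epoch milliseconds, `open`, `high`, `low`, `close`, `volume`) are all hypothetical. Only the topic name `stock_prices` comes from this README.

```python
import json
from datetime import datetime, timezone
from typing import Any, Tuple

# Hypothetical table layout; adjust to the real Cassandra schema.
INSERT_CQL = (
    "INSERT INTO stock_prices (symbol, ts, open, high, low, close, volume) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)


def parse_record(raw: bytes) -> Tuple[Any, ...]:
    """Decode one JSON message into a tuple matching INSERT_CQL's columns."""
    record = json.loads(raw.decode("utf-8"))
    # Assumes the timestamp is epoch milliseconds.
    ts = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)
    return (
        record["symbol"],
        ts,
        float(record["open"]),
        float(record["high"]),
        float(record["low"]),
        float(record["close"]),
        int(record["volume"]),
    )


def run_consumer(bootstrap_servers: str = "localhost:9092",
                 cassandra_hosts: Tuple[str, ...] = ("127.0.0.1",),
                 keyspace: str = "stocks") -> None:
    """Consume stock_prices and insert each record into Cassandra."""
    # Imported lazily so parse_record can be used without either service.
    from kafka import KafkaConsumer
    from cassandra.cluster import Cluster

    cluster = Cluster(list(cassandra_hosts))
    session = cluster.connect(keyspace)
    insert = session.prepare(INSERT_CQL)  # prepared once, reused per message
    consumer = KafkaConsumer(
        "stock_prices",
        bootstrap_servers=bootstrap_servers,
        group_id="stock-consumer",
        auto_offset_reset="earliest",
    )
    try:
        for message in consumer:  # blocks, yielding messages as they arrive
            session.execute(insert, parse_record(message.value))
    finally:
        consumer.close()
        cluster.shutdown()
```

Preparing the insert statement once avoids re-parsing the CQL for every message. `parse_record` is kept as a pure function so the mapping from message to row can be checked on its own.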
In short, this is a real-time ETL pipeline built with Python, Kafka (Confluent), and Cassandra: it streams stock data from the Polygon.io API into Kafka, processes it, and stores it in Cassandra.