DataHub Catalog Implementation

James Bohrman

Data Analyst

Data Engineer

GitHub

Project: DataHub Deployment and Metadata Automation
Scope:
Implement and deploy DataHub on Google Cloud Platform (GCP)
Develop a metadata intake and event automation system
Create a notification system for metadata events
Set up infrastructure, testing, and documentation
Description:
The deliverable for this project was an automated deployment of DataHub, a metadata platform, on GCP, instrumented for metadata intake and event processing. The solution includes a multi-tenant setup, automated metadata event handling, and a notification system for users. The project uses Google Kubernetes Engine (GKE), DataHub's Python SDK with its REST emitter, n8n for workflow automation, and Twilio for notifications.
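As a rough illustration of the metadata intake path, the sketch below maps one intake-form submission to a DataHub dataset URN and emits its description through the Python SDK's REST emitter. The form fields, the tenant-to-endpoint mapping, and the helper names are hypothetical and not taken from the project. Only the URN layout and the `DatahubRestEmitter`, `MetadataChangeProposalWrapper`, and `DatasetPropertiesClass` classes come from DataHub itself.

```python
from dataclasses import dataclass

# Hypothetical mapping from tenant name to that tenant's DataHub GMS endpoint.
TENANT_GMS = {
    "tenant-a": "http://datahub-gms.tenant-a.svc:8080",
    "tenant-b": "http://datahub-gms.tenant-b.svc:8080",
}


@dataclass
class IntakeSubmission:
    """One row from the intake form (field names are assumptions)."""
    tenant: str
    platform: str      # e.g. "bigquery"
    name: str          # fully qualified dataset name
    description: str
    env: str = "PROD"


def dataset_urn(sub: IntakeSubmission) -> str:
    """Build a dataset URN in DataHub's standard layout."""
    return (
        f"urn:li:dataset:(urn:li:dataPlatform:{sub.platform},"
        f"{sub.name},{sub.env})"
    )


def emit_submission(sub: IntakeSubmission) -> None:
    """Send the submission's description to the tenant's DataHub instance."""
    # Imported lazily so the URN helper works without the SDK installed.
    # Requires: pip install 'acryl-datahub[datahub-rest]'
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server=TENANT_GMS[sub.tenant])
    mcp = MetadataChangeProposalWrapper(
        entityUrn=dataset_urn(sub),
        aspect=DatasetPropertiesClass(description=sub.description),
    )
    emitter.emit(mcp)
```

Keeping one endpoint per tenant is only one way to model multi-tenancy. A shared instance with per-tenant domains or platform instances would need a different mapping.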
Key Components:
GCP Infrastructure Setup
Configure GCP project, Kubernetes cluster, and CI/CD pipeline
DataHub Deployment
Deploy multi-tenant DataHub using Helm charts on GKE
Metadata Intake and Event Automation
Develop a multi-tenant intake form and implement the REST emitter for metadata events
Event Reaction and Notification System
Set up n8n workflows and Twilio integration for event processing and notification
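The event reaction step can be sketched as a small client that hands a metadata event to an n8n webhook, with the n8n workflow then passing the message on to Twilio. This is a minimal sketch under stated assumptions: the webhook URL, the event fields (`tenant`, `change`, `urn`), and the payload shape are hypothetical, and the Twilio call itself is assumed to live inside the n8n workflow rather than in this code.

```python
import json
import urllib.request

# Hypothetical URL of an n8n Webhook node that starts the notification workflow.
N8N_WEBHOOK_URL = "https://n8n.example.com/webhook/metadata-events"


def format_sms(event: dict) -> str:
    """Render a metadata event as text that fits one 160-character SMS segment."""
    text = f"[{event['tenant']}] {event['change']} on {event['urn']}"
    return text if len(text) <= 160 else text[:157] + "..."


def build_webhook_request(event: dict, to_number: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the n8n webhook."""
    payload = {"to": to_number, "body": format_sms(event), "event": event}
    return urllib.request.Request(
        N8N_WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(event: dict, to_number: str) -> int:
    """Send the event to n8n and return the HTTP status code."""
    request = build_webhook_request(event, to_number)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Separating request construction from sending keeps the formatting logic testable without a running n8n instance.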
Outcomes:
Fully functional DataHub deployment on GCP
Automated metadata intake and event processing system
User-friendly notification system for metadata events
Comprehensive documentation and knowledge transfer to the client's team
This project implements a robust, scalable solution for metadata management and automation, enhancing data governance and visibility across the organization.
Posted Aug 21, 2024

In this project, I implemented a solution for a client using DataHub, an open-source data catalog.
