DataHub Catalog Implementation

James Bohrman

Data Analyst
Data Engineer
GitHub
Project: Datahub Deployment and Metadata Automation
Scope:
Implement and deploy Datahub on Google Cloud Platform (GCP)
Develop a metadata intake and event automation system
Create a notification system for metadata events
Set up infrastructure, testing, and documentation
Description:
The deliverable for this project was an automated deployment of Datahub, a metadata platform, on GCP, instrumented for metadata intake and event processing. The solution includes a multi-tenant setup, automated metadata event handling, and a notification system for users. The project uses Google Kubernetes Engine (GKE), Datahub's Python SDK with its REST emitter, n8n for workflow automation, and Twilio for notifications.
Key Components:
GCP Infrastructure Setup: Configure GCP project, Kubernetes cluster, and CI/CD pipeline
Datahub Deployment: Deploy multi-tenant Datahub using Helm charts on GCP Kubernetes
Metadata Intake and Event Automation: Develop multi-tenant intake form and implement REST Emitter for metadata events
Event Reaction and Notification System: Set up n8n workflows and Twilio integration for event processing and notification
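To illustrate the metadata intake component, here is a minimal sketch of how one intake-form submission might be turned into a Datahub metadata event and sent through the Python SDK's REST emitter. It is not the project's actual code. The `IntakeSubmission` fields, the tenant-prefixed dataset naming, the `tenant` custom property, and the `gms_url` parameter are all assumptions made for the example; a real multi-tenant setup may separate tenants differently (for example, one Datahub instance per tenant).

```python
from dataclasses import dataclass


@dataclass
class IntakeSubmission:
    """Hypothetical shape of one intake-form submission."""
    tenant: str
    platform: str
    dataset: str
    description: str


def dataset_urn(sub: IntakeSubmission, env: str = "PROD") -> str:
    # Datahub dataset URNs follow
    # urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>).
    # Prefixing the name with the tenant is an assumption of this sketch.
    return (
        f"urn:li:dataset:(urn:li:dataPlatform:{sub.platform},"
        f"{sub.tenant}.{sub.dataset},{env})"
    )


def emit_submission(sub: IntakeSubmission, gms_url: str) -> None:
    # Imported here so the helpers above run without the SDK installed
    # (the SDK is published as the acryl-datahub package).
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server=gms_url)
    proposal = MetadataChangeProposalWrapper(
        entityUrn=dataset_urn(sub),
        aspect=DatasetPropertiesClass(
            description=sub.description,
            customProperties={"tenant": sub.tenant},  # assumed tagging scheme
        ),
    )
    # Sends the proposal to the Datahub metadata service over REST.
    emitter.emit(proposal)
```

Keeping URN construction in its own function means the tenant naming rule can be tested without a running Datahub instance; `emit_submission` would be called with the address of the deployed metadata service.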
Outcomes:
Fully functional Datahub deployment on GCP
Automated metadata intake and event processing system
User-friendly notification system for metadata events
Comprehensive documentation and knowledge transfer to the client's team
This project implements a robust, scalable solution for metadata management and automation, enhancing data governance and visibility across the organization.