DataHub Catalog Implementation

James Bohrman

Data Analyst
Data Engineer

Project: DataHub Deployment and Metadata Automation

Scope:

  • Implement and deploy DataHub on Google Cloud Platform (GCP)
  • Develop a metadata intake and event automation system
  • Create a notification system for metadata events
  • Set up infrastructure, testing, and documentation

Description:

The deliverable for this project was an automated deployment of DataHub, a metadata platform, on GCP, instrumented for metadata intake and event processing. The solution includes a multi-tenant setup, automated handling of metadata events, and a notification system for users. It uses Google Kubernetes Engine (GKE), DataHub's Python SDK with its REST emitter, n8n for workflow automation, and Twilio for notifications.
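Every entity that the SDK's REST emitter writes is identified by a URN, so an intake form first has to turn a submission into one. A minimal stdlib-only sketch of that step, assuming a hypothetical `IntakeSubmission` form record and a hypothetical "tenant prefix" naming convention for multi-tenancy (the project's actual scheme is not described here); `make_dataset_urn` below reproduces the URN format produced by the SDK helper of the same name in `datahub.emitter.mce_builder`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntakeSubmission:
    """Hypothetical intake-form record; field names are illustrative only."""
    tenant: str
    platform: str          # e.g. "bigquery", "postgres"
    dataset_name: str
    env: str = "PROD"
    description: str = ""


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Same URN format as datahub.emitter.mce_builder.make_dataset_urn."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def submission_to_urn(sub: IntakeSubmission) -> str:
    """Validate required fields, then namespace the dataset name by tenant.

    The "<tenant>.<dataset>" prefix is a hypothetical multi-tenant convention.
    """
    for field_name in ("tenant", "platform", "dataset_name"):
        if not getattr(sub, field_name).strip():
            raise ValueError(f"intake form field '{field_name}' is required")
    qualified_name = f"{sub.tenant.strip()}.{sub.dataset_name.strip()}"
    return make_dataset_urn(sub.platform.strip().lower(), qualified_name, sub.env.upper())


if __name__ == "__main__":
    sub = IntakeSubmission(tenant="acme", platform="BigQuery", dataset_name="sales.orders")
    print(submission_to_urn(sub))
    # → urn:li:dataset:(urn:li:dataPlatform:bigquery,acme.sales.orders,PROD)
```

With the `acryl-datahub` package installed, the resulting URN would then be wrapped in a `MetadataChangeProposalWrapper(entityUrn=urn, aspect=DatasetPropertiesClass(description=...))` and sent with `DatahubRestEmitter(gms_server=...).emit(...)`; those calls are left out so the sketch runs without DataHub installed.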

Key Components:

  • GCP Infrastructure Setup: configure the GCP project, Kubernetes cluster, and CI/CD pipeline
  • DataHub Deployment: deploy multi-tenant DataHub on GCP Kubernetes using Helm charts
  • Metadata Intake and Event Automation: develop a multi-tenant intake form and implement the REST emitter for metadata events
  • Event Reaction and Notification System: set up n8n workflows and Twilio integration for event processing and notifications
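The project implemented the event-to-notification path as n8n workflows. The same logic can be sketched in plain Python: filter an incoming change event, format a short message, and prepare a POST to Twilio's Messages REST endpoint. The event field names (`category`, `operation`, `entityUrn`) are an assumption modeled on DataHub's entity change events, and the set of categories that triggers a text message is a hypothetical choice:

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical selection of change categories that should trigger an SMS.
NOTIFY_CATEGORIES = {"OWNER", "TAG", "GLOSSARY_TERM", "DOCUMENTATION"}


def should_notify(event: dict) -> bool:
    """Keep only events whose (assumed) 'category' field is in the notify set."""
    return event.get("category") in NOTIFY_CATEGORIES


def format_message(event: dict) -> str:
    """Render a one-line SMS body from the assumed event fields."""
    return (f"DataHub: {event.get('operation', 'CHANGE')} {event.get('category', '?')} "
            f"on {event.get('entityUrn', 'unknown entity')}")


def build_twilio_sms_request(account_sid: str, auth_token: str, from_number: str,
                             to_number: str, body: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to Twilio's Messages endpoint."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Messages.json"
    data = urllib.parse.urlencode({"To": to_number, "From": from_number, "Body": body}).encode()
    # Twilio uses HTTP Basic auth with the account SID and auth token.
    token = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


if __name__ == "__main__":
    event = {"category": "TAG", "operation": "ADD",
             "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:bigquery,acme.sales.orders,PROD)"}
    if should_notify(event):
        req = build_twilio_sms_request("AC_PLACEHOLDER", "TOKEN_PLACEHOLDER",
                                       "+15550000000", "+15551111111", format_message(event))
        print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send the message. In n8n, these steps correspond roughly to a Webhook trigger, a filter step, and the Twilio node, which keeps credentials and retries out of custom code.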

Outcomes:

  • Fully functional Datahub deployment on GCP
  • Automated metadata intake and event processing system
  • User-friendly notification system for metadata events
  • Comprehensive documentation and knowledge transfer to the client's team



The result is a scalable platform for metadata management and automation that improves data governance and visibility across the client's organization.

