## How It Works

The `run.sh` entry point prepares the environment (running `poetry lock` and `poetry install` if needed) and then starts the `scheduler.py` script.

The `scheduler.py` script will:

1. Iterate through each pipeline directory (`pipeline_*_to_postgres`).
2. Execute each pipeline's `run.sh`, prefixing its log output with `[pipeline_name]` and `[system]` (e.g., `[pipeline_xero_to_postgres]`, `[xero]`).
3. Run the dbt transformations defined in the dbt project.
4. Wait for `PIPELINE_INTERVAL_SECONDS` (from the `.env` file) before starting the next sequence.

### Singer Data Flow

Each tap emits SCHEMA messages describing its streams, then RECORD messages conforming to the schema. State is updated via STATE messages emitted periodically and at the end of a run, saving progress for the next run. The `target-postgres` component reads the SCHEMA, RECORD, and STATE messages and handles writing the data to the appropriate PostgreSQL tables. See `pyproject.toml` for dependency details.
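To make the message flow concrete, here is a minimal, hypothetical sketch of the JSON lines a Singer tap writes to stdout (the stream and field names are illustrative, not this project's actual streams):

```python
import json

def emit(message: dict) -> None:
    # Singer taps communicate with the target as JSON lines on stdout.
    print(json.dumps(message))

# 1. Describe the stream so the target can create the matching table.
emit({
    "type": "SCHEMA",
    "stream": "tasks",  # illustrative stream name
    "schema": {"properties": {
        "id": {"type": "string"},
        "updated_at": {"type": "string", "format": "date-time"},
    }},
    "key_properties": ["id"],
})

# 2. Emit one RECORD message per row, conforming to the schema above.
emit({"type": "RECORD", "stream": "tasks",
      "record": {"id": "42", "updated_at": "2025-01-01T00:00:00Z"}})

# 3. Periodically, and at the end of the run, emit STATE so the next
#    run can resume from where this one left off.
emit({"type": "STATE", "value": {"tasks": {"updated_at": "2025-01-01T00:00:00Z"}}})
```

Piped into `target-postgres`, these three message types are all the target needs to create tables, insert rows, and persist bookmarks.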
## Deployment

The project is designed to run as a Render Background Worker with a Persistent Disk.

## Configuration

Create a `.env` file in the project root directory and fill in the necessary credentials:

- `DB_HOST`, `DB_PORT`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD` for the target PostgreSQL database.
- API credentials (`HUBSPOT_API_KEY`, `XERO_CLIENT_ID`, `XERO_CLIENT_SECRET`, `XERO_REFRESH_TOKEN`, `WRIKE_PERMANENT_TOKEN`). These are referenced by the individual `config.yml` files.
- `PIPELINE_INTERVAL_SECONDS` to control the wait time between full sequence runs (defaults to 300 seconds / 5 minutes).

Each tap is also configured via a `config.yml` file inside each `pipelines/pipeline_*/` directory. These files define the mapping of environment variables to tap settings and the stream selection under the `select:` key.
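For example, a hedged sketch of how a runner might resolve environment variables in such a file and read the `select:` list (the key and stream names here are illustrative, not this project's actual schema):

```python
import os
import yaml  # PyYAML

# Illustrative config.yml content; the real files live in pipelines/pipeline_*/.
SAMPLE_CONFIG = """
api_key: ${WRIKE_PERMANENT_TOKEN}
select:
  - tasks
  - folders
"""

def load_config(text: str) -> dict:
    # Substitute ${VAR} placeholders with values from the environment (.env),
    # then parse the resulting YAML.
    return yaml.safe_load(os.path.expandvars(text))

config = load_config(SAMPLE_CONFIG)
print(config.get("select", []))  # e.g. ['tasks', 'folders']
```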
## Installation

1. Ensure Poetry version 2.1.2 or higher is installed (check with `poetry --version`).
2. Confirm the pipeline directories are named `pipelines/pipeline_hubspot_to_postgres`, `pipelines/pipeline_wrike_to_postgres`, and `pipelines/pipeline_xero_to_postgres` (rename using `mv old-name new_name` if needed).
3. Verify or create the `__init__.py` files that Python's import system and Poetry's script generation rely on.
4. Run `poetry install` to install everything declared in `pyproject.toml`. This also creates the necessary `tap-*` command-line executables.
5. Start the orchestrator by running `scheduler.py`.

## Project Structure

### Root Directory (`pipeline-multi-platform/`)

- `pyproject.toml`: Defines project dependencies (shared libs, taps via scripts), metadata, and scripts for the entire project.
- `poetry.lock`: Locks dependency versions for reproducibility.
- `.env`: Stores secrets and configuration (DB credentials, API keys). Do not commit this file.
- `scheduler.py`: The main script that orchestrates sequential pipeline runs.
- `run.sh`: The entry-point script for setup and execution.
- `pipelines/`: Contains the code for the individual pipelines.
- `dbt/`: Contains the dbt project.

### Pipeline Directories (`pipelines/pipeline_[system]_to_postgres/`)

- `config.yml`: Configures the specific tap (env var mapping, stream selection).
- `runner/`: Contains `runner/__init__.py` with the `main()` entry point, which generates the configs and executes the `tap | target` command for this specific pipeline (see the sketch after this list).
- `tap_name/internal.py`: Contains the core Singer tap logic.
- `fetch.py`, `utility.py`: Handle the API syncing logic.
- `schemas/`: Defines the schemas that will be loaded into Postgres for each data endpoint.
- `run.sh`: A simple execution script inside each pipeline, executed by the scheduler.
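A hedged sketch of what such a `main()` entry point might look like; the executable name and the target config filename are assumptions for illustration (the real `tap-*` commands are the scripts that `poetry install` generates):

```python
import subprocess
import sys
from pathlib import Path

def main() -> int:
    """Run `tap | target` for one pipeline: wire the tap's stdout into
    target-postgres's stdin and report failure if either side fails."""
    pipeline_dir = Path(__file__).resolve().parent.parent
    tap = subprocess.Popen(
        ["tap-wrike", "--config", str(pipeline_dir / "config.yml")],  # assumed name
        stdout=subprocess.PIPE,
    )
    target = subprocess.Popen(
        ["target-postgres", "--config", str(pipeline_dir / "target_config.json")],  # assumed file
        stdin=tap.stdout,
    )
    tap.stdout.close()  # let the tap see a broken pipe if the target exits early
    target_rc = target.wait()
    tap_rc = tap.wait()
    return tap_rc or target_rc

if __name__ == "__main__":
    sys.exit(main())
```

Running the two processes as a real OS pipe, rather than buffering the tap's output, keeps memory flat no matter how many RECORD messages a sync produces.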
### dbt Project (`dbt/`)

- `dbt_project.yml`: Defines the dbt project configuration, including the custom schemas the models build into (`novellidbt.wrike`, `novellidbt.hubspot`, etc.).
- `profiles.yml`: Contains the connection settings dbt uses to reach the target database (e.g., Postgres). The `profile` value in `dbt_project.yml` must match a profile defined here.
- `models/`: Contains the SQL transformation logic, organized by platform:
  - `wrike/`: SQL models for Wrike (e.g., `task_duration.sql`)
  - `hubspot/`: SQL models for HubSpot
  - `xero/`: SQL models for Xero
- `macros/`: Currently used to generate the custom schema names.
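To close the loop, a minimal sketch of how the scheduler's dbt step might invoke these transformations after the pipelines finish (the directory layout is an assumption; `--project-dir` and `--profiles-dir` are standard dbt CLI flags):

```python
import subprocess
from pathlib import Path

DBT_DIR = Path(__file__).resolve().parent / "dbt"  # assumed location of the dbt project

def run_dbt() -> None:
    # Run every model against the target defined in profiles.yml;
    # check=True surfaces transformation failures to the scheduler.
    subprocess.run(
        ["dbt", "run", "--project-dir", str(DBT_DIR), "--profiles-dir", str(DBT_DIR)],
        check=True,
    )

if __name__ == "__main__":
    run_dbt()
```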
This project orchestrates multiple Singer data pipelines into PostgreSQL using a central scheduler.