Keep the lockfile and environment in sync (run `poetry lock` and `poetry install` if needed).

Run the `scheduler.py` script. The `scheduler.py` script will:

- Execute each pipeline (`pipeline_*_to_postgres`) in sequence, identifying each run by its `[pipeline_name]` and `[system]` (e.g., `[pipeline_xero_to_postgres]`, `[xero]`).
- Run the dbt transformations defined in the dbt project.
- Wait `PIPELINE_INTERVAL_SECONDS` (from the `.env` file) before starting the next sequence.

A minimal sketch of this loop is shown below.
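For orientation, here is a minimal, illustrative sketch of that loop, assuming each pipeline is launched through its own `run.sh` (the pipeline paths and the dbt invocation are assumptions, not the project's exact code):

```python
# Illustrative sketch of scheduler.py's main loop -- not the project's
# exact implementation. Paths and the dbt invocation are assumptions.
import os
import subprocess
import time

PIPELINES = [
    "pipelines/pipeline_hubspot_to_postgres",
    "pipelines/pipeline_wrike_to_postgres",
    "pipelines/pipeline_xero_to_postgres",
]

def run_sequence() -> None:
    for pipeline_dir in PIPELINES:
        name = os.path.basename(pipeline_dir)  # e.g. pipeline_xero_to_postgres
        print(f"[{name}] starting")
        # Each pipeline ships a run.sh that the scheduler executes.
        subprocess.run(["bash", "run.sh"], cwd=pipeline_dir, check=False)
    # After all loads finish, run the dbt transformations.
    # profiles.yml lives in dbt/, so run from that directory.
    subprocess.run(["dbt", "run"], cwd="dbt", check=False)

if __name__ == "__main__":
    interval = int(os.getenv("PIPELINE_INTERVAL_SECONDS", "300"))  # default: 5 minutes
    while True:
        run_sequence()
        time.sleep(interval)  # wait before starting the next sequence
```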
Within each pipeline run, the tap's final steps are:

6️⃣ Records are emitted → Emits `RECORD` messages conforming to the schema.
7️⃣ State is updated → Emits `STATE` messages periodically and at the end, saving progress for the next run.

The `target-postgres` component reads the `SCHEMA`, `RECORD`, and `STATE` messages and handles writing the data to the appropriate PostgreSQL tables. The snippet below illustrates what these messages look like.
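For illustration, Singer messages are newline-delimited JSON on stdout; a toy tap emitting the three message types could look like this (the stream and field names are invented for the example):

```python
# Toy illustration of the Singer message flow -- stream "tasks" and its
# fields are invented; they are not this project's real streams.
import json
import sys

def emit(message: dict) -> None:
    # Singer messages are newline-delimited JSON on stdout.
    sys.stdout.write(json.dumps(message) + "\n")

# SCHEMA: declares the stream's JSON Schema and primary key.
emit({
    "type": "SCHEMA",
    "stream": "tasks",
    "schema": {"properties": {"id": {"type": "string"},
                              "title": {"type": ["string", "null"]}}},
    "key_properties": ["id"],
})
# RECORD: one data row conforming to the schema above.
emit({"type": "RECORD", "stream": "tasks",
      "record": {"id": "42", "title": "Write docs"}})
# STATE: saves sync progress so the next run can resume from here.
emit({"type": "STATE",
      "value": {"bookmarks": {"tasks": {"updated_at": "2025-05-01T00:00:00Z"}}}})
```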
See `pyproject.toml` for dependency details.

Deployment: the project runs as a Render Background Worker with a Persistent Disk.
Create a `.env` file in the project root directory. Fill in the necessary credentials:

- `DB_HOST`, `DB_PORT`, `DB_DATABASE`, `DB_USERNAME`, `DB_PASSWORD` for the target PostgreSQL database.
- API keys and tokens (e.g., `HUBSPOT_API_KEY`, `XERO_CLIENT_ID`, `XERO_CLIENT_SECRET`, `XERO_REFRESH_TOKEN`, `WRIKE_PERMANENT_TOKEN`). These are referenced by the individual `config.yml` files.
- `PIPELINE_INTERVAL_SECONDS` to control the wait time between full sequence runs (defaults to 300 seconds / 5 minutes).

A sample skeleton is shown below.
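For reference, a `.env` skeleton might look like this (placeholder values only; the variable names all come from the list above):

```
# Target PostgreSQL database
DB_HOST=localhost
DB_PORT=5432
DB_DATABASE=warehouse
DB_USERNAME=postgres
DB_PASSWORD=change-me

# Source API credentials, referenced by the per-pipeline config.yml files
HUBSPOT_API_KEY=your-key
XERO_CLIENT_ID=your-id
XERO_CLIENT_SECRET=your-secret
XERO_REFRESH_TOKEN=your-token
WRIKE_PERMANENT_TOKEN=your-token

# Seconds to wait between full sequence runs (default 300)
PIPELINE_INTERVAL_SECONDS=300
```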
Review the `config.yml` file inside each `pipelines/pipeline_*/` directory. These files define:

- The mapping of environment variables into the tap's configuration.
- Stream selection via the `select:` key.

For example:
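(A hypothetical sketch: only the `select:` key is attested above; the remaining keys and layout vary by tap.)

```yaml
# Hypothetical config.yml sketch -- the real files depend on each tap.
# Credentials come from the .env file rather than being stored here.
config:
  access_token: ${WRIKE_PERMANENT_TOKEN}   # env var mapping (assumed key name)
  start_date: "2024-01-01T00:00:00Z"
select:                                    # stream selection via the select: key
  - tasks
  - folders
```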
Run `poetry install`. This installs all dependencies defined in `pyproject.toml` and will also create the necessary `tap-*` command-line executables. Then start the orchestrator with `python scheduler.py`.
Note that the pipeline packages need `__init__.py` files for Python's import system and Poetry's script generation. Verify or create them in the following directories: `pipelines/pipeline_hubspot_to_postgres`, `pipelines/pipeline_wrike_to_postgres`, `pipelines/pipeline_xero_to_postgres`. (Rename using `mv old-name new_name` if needed.)

The project requires Poetry 2.1.2 or higher; check your version with `poetry --version`.

Project layout (root directory `pipeline-multi-platform/`):
- `pyproject.toml`: Defines project dependencies (shared libs, taps via scripts), metadata, and scripts for the entire project.
- `poetry.lock`: Locks dependency versions for reproducibility.
- `.env`: Stores secrets and configuration (DB credentials, API keys). Do not commit this file.
- `scheduler.py`: The main script that orchestrates sequential pipeline runs.
- `run.sh`: The entry-point script for setup and execution.
- `pipelines/`: Contains the code for the individual pipelines.
- `dbt/`: Contains the dbt project.

Each pipeline directory (`pipelines/pipeline_[system]_to_postgres/`):
- `config.yml`: Configures the specific tap (env var mapping, stream selection).
- `runner/`: Contains `runner/__init__.py` with the `main()` entry point. It also generates configs and executes the `tap | target` command for this specific pipeline (see the sketch after this list).
- `tap_name/internal.py`: Contains the core Singer tap logic.
- `tap_name/fetch.py`, `tap_name/utility.py`: Handle the API syncing logic.
- `tap_name/schemas/`: Define the schemas that will be loaded into Postgres for each data endpoint.
- `run.sh`: A simple execution script inside each pipeline, executed by the scheduler.
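As a rough sketch, such a runner's `main()` might write the tap and target configs from environment variables and then launch the pipe. The tap name, config keys, and file names below are assumptions, not the project's actual ones:

```python
# Hypothetical sketch of a pipeline runner's main() -- the real runner
# renders its configs from config.yml; keys and names here are assumed.
import json
import os
import subprocess

def main() -> None:
    # Assumed mapping of .env values into a tap config file.
    tap_config = {"permanent_token": os.environ["WRIKE_PERMANENT_TOKEN"]}
    with open("tap_config.json", "w") as f:
        json.dump(tap_config, f)

    # Assumed target-postgres connection settings, also from .env.
    target_config = {
        "host": os.environ["DB_HOST"],
        "port": int(os.environ["DB_PORT"]),
        "database": os.environ["DB_DATABASE"],
        "user": os.environ["DB_USERNAME"],
        "password": os.environ["DB_PASSWORD"],
    }
    with open("target_config.json", "w") as f:
        json.dump(target_config, f)

    # Launch the Singer pipe: the tap's stdout feeds the target's stdin.
    subprocess.run(
        "tap-wrike --config tap_config.json"
        " | target-postgres --config target_config.json",
        shell=True,
        check=True,
    )

if __name__ == "__main__":
    main()
```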
The dbt project (`dbt/`):

- `dbt_project.yml`: Defines the dbt project configuration, including the per-platform model configs (`novellidbt.wrike`, `novellidbt.hubspot`, etc.).
- `profiles.yml`: Contains connection settings for dbt to connect to the target database (e.g., Postgres). The `profile` value in `dbt_project.yml` must match a profile defined here.
- `models/`: Contains the SQL transformation logic, organized by platform:
  - `wrike/`: SQL models for Wrike (e.g., `task_duration.sql`; a sketch follows after this list)
  - `hubspot/`: SQL models for HubSpot
  - `xero/`: SQL models for Xero
- `macros/`: Currently used to print the custom schema name.
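To illustrate the per-platform models, a file like `models/wrike/task_duration.sql` could compute a simple duration. This is a hypothetical sketch; the source relation and column names are assumptions, not the real model:

```sql
-- Hypothetical sketch of models/wrike/task_duration.sql.
-- The source relation and column names are assumptions.
select
    id as task_id,
    title,
    completed_date::timestamp - created_date::timestamp as task_duration
from {{ source('wrike', 'tasks') }}
where completed_date is not null
```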
Posted May 5, 2025

Orchestrated multiple Singer data pipelines into PostgreSQL using a central scheduler.