Automated Worship Schedule PDF Scraper to CSV Converter by James Crowley

Data Analyst

Data Scraper

pandas

Python

Professional Services

Worship Schedule PDF Scraper -> an Excel Spreadsheet

Project Overview

I developed a Python script to automate the extraction of worship service information from PDF order-of-service files. The goal was to take multiple PDF bulletins and turn them into a structured CSV format that could be reused for scheduling, archiving, or analysis. This tool saves hours of manual copying and ensures consistent formatting of key worship elements.

Technologies Used

Python

pdfplumber (for PDF text extraction)

Regex (for parsing structured sections)

Pandas (for tabular data handling)

CSV Export
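As a rough illustration of how pdfplumber fits into this stack, the sketch below reads every page of each bulletin and joins the text. The function names (join_pages, extract_bulletin_text, extract_folder) and the folder layout are assumptions made for this example, not names from the original script.

```python
from pathlib import Path


def join_pages(page_texts):
    """Join per-page text, treating pages with no extractable text as empty."""
    return "\n".join(text or "" for text in page_texts)


def extract_bulletin_text(pdf_path):
    """Return the full text of one bulletin PDF."""
    import pdfplumber  # third-party dependency: pip install pdfplumber

    with pdfplumber.open(str(pdf_path)) as pdf:
        # extract_text() can return None or "" for image-only pages
        return join_pages(page.extract_text() for page in pdf.pages)


def extract_folder(folder):
    """Map each PDF filename in a folder (hypothetical layout) to its text."""
    pdf_paths = sorted(Path(folder).glob("*.pdf"))
    return {path.name: extract_bulletin_text(path) for path in pdf_paths}
```

Keeping the page-joining logic separate from the pdfplumber call makes it easy to test the text handling without a real PDF on hand.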

Key Features

Automatic PDF Text Extraction – Reads all text from multiple bulletins.

Section Parsing – Captures key service elements such as Call to Worship, Reflection, Hymns, Prayers, Words of Assurance, Affirmation of Faith, and Benediction.

Hymn Cleaning – A custom cleaning function removes prefixes, numbers, and extra formatting to give clean hymn titles.

Scripture Reading Extraction – Identifies Old and New Testament readings by reference.

Doxology Handling – Hybrid approach to capture both hymn-style and scripture-style doxologies.

CSV Output – Exports all parsed data into a clean CSV file with consistent column ordering.
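The scripture reading feature listed above could be approached along these lines. This is a minimal sketch that assumes each reading is labelled "Old Testament Reading:" or "New Testament Reading:" and followed by a reference such as "Isaiah 40:1-11"; the label wording, the REFERENCE pattern, and the column names are assumptions, and real bulletins may need looser patterns.

```python
import re

# Assumed reference shape: optional book number, book name, chapter,
# and an optional verse or verse range, e.g. "1 Peter 1:3-9".
REFERENCE = r"((?:[1-3]\s)?[A-Z][A-Za-z ]+?\s\d+(?::\d+(?:[-–]\d+)?)?)"


def extract_readings(text):
    """Return the Old and New Testament references found in bulletin text."""
    readings = {}
    for column, label in (("old_testament", "Old"), ("new_testament", "New")):
        pattern = label + r" Testament(?: Reading)?\s*:?\s*" + REFERENCE
        match = re.search(pattern, text)
        # An empty string keeps the CSV column present when a reading is missing
        readings[column] = match.group(1) if match else ""
    return readings
```

Returning an empty string rather than raising keeps one missing reading from stopping a whole batch of bulletins.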

Challenges

Inconsistent Bulletin Formatting – Bulletins varied slightly in structure, so the regex patterns had to be flexible yet accurate. I solved this through iterative refinement and testing on multiple files. Some PDFs also had sidebars, which made the text harder to navigate.
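One possible way to deal with sidebars is to crop each page to its main column before extracting text. The write-up does not say how the sidebars were actually handled, so the sketch below is only an assumption: it supposes the sidebar takes up a fixed fraction of the page width on one side. The names main_column_bbox and extract_without_sidebar and the 0.25 default are hypothetical. It relies on pdfplumber's page.crop(), which takes an (x0, top, x1, bottom) box.

```python
def main_column_bbox(page_width, page_height, sidebar_fraction=0.25, sidebar_side="right"):
    """Compute an (x0, top, x1, bottom) box that excludes an assumed sidebar."""
    sidebar_width = page_width * sidebar_fraction
    if sidebar_side == "right":
        return (0, 0, page_width - sidebar_width, page_height)
    if sidebar_side == "left":
        return (sidebar_width, 0, page_width, page_height)
    raise ValueError("sidebar_side must be 'left' or 'right'")


def extract_without_sidebar(page, **bbox_options):
    """Extract text from a pdfplumber page after cropping away the sidebar."""
    bbox = main_column_bbox(page.width, page.height, **bbox_options)
    # crop() returns a new page-like object limited to the bounding box
    return page.crop(bbox).extract_text() or ""
```

A fixed fraction is the simplest option; bulletins whose sidebar width changes from week to week would need the box tuned per file.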

Hymn Title Noise – Hymn and psalm entries often contained extra phrases (like “Standing” or “In singing we”). I built a custom cleaning function to normalize and standardize titles.
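A cleaning function of this kind might look like the sketch below. The noise patterns are assumptions based on the examples mentioned above (labels, hymn numbers, "Standing", "In singing we"); the real list would come from inspecting actual bulletins, and clean_hymn_title is a hypothetical name.

```python
import re

# Assumed noise patterns, applied in order.
NOISE_PATTERNS = [
    r"^\s*Hymn(?:\s+of\s+\w+)?\s*[:\-–]\s*",                # label, e.g. "Hymn of Praise:"
    r"^\s*(?:No\.?\s*|#)?\d+(?=[\s.:\-–])\s*[.:\-–]?\s*",   # leading hymn number
    r"\(\s*Standing\s*\)|\bStanding\s*$",                   # posture cue (parenthesised or trailing)
    r"\bIn singing we\b.*$",                                # rubric running to end of line
    r'["“”*]',                                              # quotes and emphasis markers
]


def clean_hymn_title(raw):
    """Strip labels, numbers, and rubrics from a raw hymn line."""
    title = raw
    for pattern in NOISE_PATTERNS:
        title = re.sub(pattern, "", title, flags=re.IGNORECASE)
    # Collapse whitespace and trim leftover separators
    return re.sub(r"\s+", " ", title).strip(" -–,;:")
```

The posture pattern deliberately ignores a bare leading "Standing" so that titles such as "Standing on the Promises" survive, and the number pattern requires a separator after the digits so "10,000 Reasons" is left alone.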

Complex Layouts in PDFs – Some bulletins split content across columns or embedded non-standard characters. Normalizing the text with Python's unicodedata module and parsing it carefully line by line resolved these issues.
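The unicodedata step mentioned above could be sketched as follows. The choice of the NFKC form and the removal of invisible format characters are assumptions about what "normalization" covered here; normalize_line is a hypothetical name.

```python
import unicodedata


def normalize_line(line):
    """Fold PDF-specific characters into plain text for regex matching."""
    # NFKC folds compatibility characters: ligatures such as U+FB01 become
    # "fi", and non-breaking spaces become ordinary spaces.
    text = unicodedata.normalize("NFKC", line)
    # Drop control and format characters (Unicode category C*), such as
    # soft hyphens and zero-width spaces, but keep real whitespace.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    # Collapse runs of whitespace left over from column layouts
    return " ".join(text.split())
```

Running this before any regex matching means a heading typeset with a ligature or a non-breaking space still matches its plain-text pattern.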

Future Direction

Add Spotify/YouTube embedding support by linking parsed hymns/psalms to recordings.

Extend parsing to handle Postludes and other optional service elements.

Improve error handling for even more variation in bulletin formatting.

Create a web-based interface where staff could upload PDFs and instantly get back a CSV or Excel file.

Design & Development Process

The process began with gathering several bulletin samples and identifying repeating sections. From there, I created regex patterns for each element (e.g., Call to Worship, Reflection, Hymn of Praise). I used iterative development: extract → test → refine, until the parser consistently returned usable data. Debug logging was included to track hymn matches and cleaning.
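The pattern-per-section idea with debug logging might be sketched like this. It assumes each section starts with its heading at the beginning of a line and runs until the next known heading; the heading list comes from the elements named in this write-up, while parse_sections, HEADING_RE, and the logger name are hypothetical.

```python
import logging
import re

logger = logging.getLogger("bulletin_parser")  # hypothetical logger name

SECTION_HEADINGS = [
    "Call to Worship", "Reflection", "Hymn of Praise",
    "Words of Assurance", "Affirmation of Faith", "Benediction",
]

# One alternation over all headings, matched at the start of a line
HEADING_RE = re.compile(
    r"^(%s)\b[ \t]*:?[ \t]*(.*)$" % "|".join(map(re.escape, SECTION_HEADINGS)),
    re.IGNORECASE | re.MULTILINE,
)


def parse_sections(text):
    """Split bulletin text into a dict keyed by section heading."""
    canonical = {heading.lower(): heading for heading in SECTION_HEADINGS}
    matches = list(HEADING_RE.finditer(text))
    found = {}
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        # Body = rest of the heading line plus everything up to the next heading
        body = " ".join((current.group(2) + text[current.end():end]).split())
        name = canonical[current.group(1).lower()]
        logger.debug("matched %s: %r", name, body)
        found[name] = body
    # Always return every heading so downstream columns stay consistent
    return {heading: found.get(heading, "") for heading in SECTION_HEADINGS}
```

Calling logging.basicConfig(level=logging.DEBUG) before parsing prints each match, which makes the extract, test, refine loop quicker to run against new bulletin samples.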

Outcome and Result

The final script successfully processes multiple bulletins at once, extracts all key sections, and saves them into a single CSV file. This makes managing worship planning significantly easier. I was especially happy with the hymn cleaning and doxology extraction, which were the most challenging parts.
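Combining the parsed bulletins into one file with a fixed column order could look like the sketch below. To keep the example dependency-free, the standard library csv module stands in for pandas here; with pandas, pd.DataFrame(rows, columns=COLUMNS).to_csv(path, index=False) would play a similar role. The column names and schedule_to_csv are hypothetical.

```python
import csv
import io

# Hypothetical column order; one row per bulletin
COLUMNS = [
    "file", "call_to_worship", "hymn_of_praise",
    "old_testament", "new_testament", "benediction",
]


def schedule_to_csv(rows, columns=COLUMNS):
    """Render a list of per-bulletin dicts as CSV text with fixed columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # missing sections become empty cells
        extrasaction="ignore",  # unexpected keys are dropped, not errors
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned text can be written out with open(path, "w", newline="", encoding="utf-8"), giving every bulletin the same columns in the same order even when a section is missing.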

Overall, this project demonstrated how automation can save repetitive manual work and ensure consistency. It’s a strong base for future enhancements like web deployment or direct integration with worship scheduling software.

Posted Sep 29, 2025

Automated PDF to CSV conversion for worship schedules using Python.

Timeline

Sep 14, 2025 - Sep 20, 2025