Worship Schedule PDF Scraper -> an Excel Spreadsheet
Project Overview
I developed a Python script to automate the extraction of worship service information from PDF order-of-service files. The goal was to take multiple PDF bulletins and turn them into a structured CSV format that could be reused for scheduling, archiving, or analysis. This tool saves hours of manual copying and ensures consistent formatting of key worship elements.
Technologies Used
Python
pdfplumber (for PDF text extraction)
Regex (for parsing structured sections)
Pandas (for tabular data handling)
CSV Export
Key Features
Automatic PDF Text Extraction – Reads all text from multiple bulletins.
Section Parsing – Captures key service elements such as Call to Worship, Reflection, Hymns, Prayers, Words of Assurance, Affirmation of Faith, and Benediction.
Hymn Cleaning – A custom cleaning function removes prefixes, numbers, and extra formatting to give clean hymn titles.
Scripture Reading Extraction – Identifies Old and New Testament readings by reference.
Doxology Handling – Hybrid approach to capture both hymn-style and scripture-style doxologies.
CSV Output – Exports all parsed data into a clean CSV file with consistent column ordering.
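As a rough illustration of the consistent column ordering mentioned in the CSV Output feature, here is a minimal sketch. It uses Python's standard csv module rather than Pandas so that it runs without extra dependencies. The column names and the rows_to_csv helper are hypothetical; the real script's columns are not shown in this write-up.

```python
import csv
import io

# Hypothetical column order; the actual script's column names may differ.
COLUMNS = [
    "file",
    "call_to_worship",
    "hymn_of_praise",
    "old_testament",
    "new_testament",
    "benediction",
]

def rows_to_csv(rows, columns=COLUMNS):
    """Serialize parsed bulletins (a list of dicts) to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # sections missing from a bulletin become empty cells
        extrasaction="ignore",  # keys outside the agreed columns are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

Because the column list is fixed up front, every bulletin produces the same columns in the same order, even when a section is missing or the parser returns keys in a different order.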
Challenges
Inconsistent Bulletin Formatting
Bulletins varied slightly in structure, so the regex patterns had to be flexible yet accurate. I got there through iterative refinement and testing on multiple files. Sidebars in the PDFs also made the extracted text harder to navigate.
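One way a heading pattern can tolerate this kind of variation is sketched below. It is not the project's actual pattern: the extract_section helper and the sample headings are assumptions for illustration. The pattern ignores case, accepts any spacing between heading words, and makes the trailing colon optional.

```python
import re

def _flexible(heading):
    # Allow any run of whitespace between the words of a heading.
    return r"\s+".join(re.escape(word) for word in heading.split())

def extract_section(text, heading, next_headings):
    """Return the text under `heading`, up to the next known heading or the end of the text.

    `next_headings` must be a non-empty list of headings that can follow this section.
    """
    stop = "|".join(_flexible(h) for h in next_headings)
    pattern = re.compile(
        rf"^[ \t]*{_flexible(heading)}[ \t]*:?[ \t]*(?P<body>.*?)(?=^[ \t]*(?:{stop})\b|\Z)",
        re.IGNORECASE | re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(text)
    return match.group("body").strip() if match else None
```

A known limitation of this sketch is that a body line which happens to start with one of the stop headings will end the section early, which is the kind of case that iterative testing on real bulletins would surface.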
Hymn Title Noise
Hymn and psalm entries often contained extra phrases (like “Standing” or “In singing we”). A custom cleaning function was built to normalize and standardize titles.
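A cleaning function along these lines might look like the sketch below. The specific noise rules are assumptions based on the examples mentioned here (labels, hymnal numbers, "Standing", "In singing we"); the project's real function is not shown and likely handles more cases.

```python
import re

def clean_hymn_title(raw):
    """Strip labels, hymnal numbers, and rubric phrases from a raw hymn line."""
    title = raw
    # Drop a leading label such as "Hymn:" or "Hymn of Praise -".
    # The delimiter is required so a title like "Hymn of Promise" is left alone.
    title = re.sub(r"^\s*hymn(?:\s+of\s+\w+)?\s*[:\-]\s*", "", title, flags=re.IGNORECASE)
    # Drop hymnal numbers such as "#100", "No. 100", or a bare "100".
    title = re.sub(r"(?:#|\bno\.?\s*)?\b\d+\b", "", title, flags=re.IGNORECASE)
    # Drop posture cues, with or without parentheses.
    title = re.sub(r"\(?\bstanding\b\)?", "", title, flags=re.IGNORECASE)
    # Drop trailing rubric text that starts with "In singing we".
    title = re.sub(r"\bin singing we\b.*$", "", title, flags=re.IGNORECASE)
    # Remove straight and curly double quotes around the title.
    for quote in ('"', "\u201c", "\u201d"):
        title = title.replace(quote, "")
    # Collapse whitespace and trim leftover punctuation at the edges.
    title = re.sub(r"\s+", " ", title)
    return title.strip(" -:;,.*")
```

Note that this sketch removes every standalone number, so it would need adjusting for psalm entries where the number is part of the title.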
Complex Layouts in PDFs
Some bulletins split content across columns or embedded non-standard characters. Using unicodedata normalization and careful line parsing helped solve these issues.
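The normalization step could look roughly like this sketch. The normalize_text helper and the exact set of characters it handles are assumptions; only the use of unicodedata comes from the project itself.

```python
import re
import unicodedata

# Typographic punctuation that NFKC leaves unchanged, mapped to ASCII equivalents.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})

def normalize_text(raw):
    """Normalize extracted PDF text so regex patterns see predictable characters."""
    # NFKC folds compatibility characters: ligatures become plain letters
    # and non-breaking spaces become ordinary spaces.
    text = unicodedata.normalize("NFKC", raw)
    text = text.translate(PUNCTUATION_MAP)
    # Drop soft hyphens and zero-width characters that PDFs sometimes embed.
    text = re.sub("[\u00ad\u200b\u200c\u200d\ufeff]", "", text)
    # Collapse spaces and tabs within each line, and drop blank lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Running this before any regex matching means the patterns only need to account for plain ASCII quotes, dashes, and single spaces.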
Future Direction
Add Spotify/YouTube embedding support by linking parsed hymns/psalms to recordings.
Extend parsing to handle Postludes and other optional service elements.
Improve error handling for even more variation in bulletin formatting.
Create a web-based interface where staff could upload PDFs and instantly get back a CSV or Excel file.
Design & Development Process
The process began with gathering several bulletin samples and identifying repeating sections. From there, I created regex patterns for each element (e.g., Call to Worship, Reflection, Hymn of Praise). I used iterative development: extract → test → refine, until the parser consistently returned usable data. Debug logging was included to track hymn matches and cleaning.
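The extract, test, refine loop can be pictured with a small sketch like the one below. The pattern table, the logger name, and the parse_elements helper are hypothetical stand-ins, and the single-line patterns are simplified compared with what real bulletins would need.

```python
import logging
import re

logger = logging.getLogger("bulletin_parser")  # hypothetical logger name

FLAGS = re.IGNORECASE | re.MULTILINE

# Hypothetical single-line patterns, one per service element.
ELEMENT_PATTERNS = {
    "call_to_worship": re.compile(r"^Call to Worship[ \t]*:?[ \t]*(.+)$", FLAGS),
    "reflection": re.compile(r"^Reflection[ \t]*:?[ \t]*(.+)$", FLAGS),
    "hymn_of_praise": re.compile(r"^Hymn of Praise[ \t]*:?[ \t]*(.+)$", FLAGS),
}

def parse_elements(text):
    """Apply each element pattern and log what matched, to guide refinement."""
    parsed = {}
    for name, pattern in ELEMENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            parsed[name] = match.group(1).strip()
            logger.debug("%s matched: %r", name, parsed[name])
        else:
            parsed[name] = ""
            logger.debug("%s not found", name)
    return parsed
```

Calling logging.basicConfig(level=logging.DEBUG) before parsing prints one line per element, which makes it easy to spot which pattern needs another round of refinement for a given bulletin.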
Outcome and Result
The final script successfully processes multiple bulletins at once, extracts all key sections, and saves them into a single CSV file. This makes managing worship planning significantly easier. I was especially happy with the hymn cleaning and doxology extraction, which were the most challenging parts.
Overall, this project demonstrated how automation can save repetitive manual work and ensure consistency. It’s a strong base for future enhancements like web deployment or direct integration with worship scheduling software.
Posted Sep 29, 2025
Automated PDF to CSV conversion for worship schedules using Python.