An open-source pipeline that turns oncology research papers into structured, auditable biomarker...
The network for creativity
Join 1.25M professional creatives like you
Connect with clients, get discovered, and run your business 100% commission-free
Creatives on Contra have earned over $150M and we are just getting started
An open-source pipeline that turns oncology research papers into structured, auditable biomarker data. Drop in a PDF and get back a 4-sheet workbook covering study details, the biomarker catalog, statistical results, and author inferences, with every row tagged with a verbatim source quote and the section it came from.
The problem
Pharma analysts and translational researchers spend hours hand-extracting biomarkers, p-values, hazard ratios, and cohort details from each paper. Existing tools either miss the statistical layer or hide the source, which makes their output unauditable.
How it works
Local PDF parsing (pymupdf4llm + pdfplumber) — no Azure dependency, no cloud upload of your PDFs.
A study classifier detects disease, study type, and biomarker type, then composes a 4-level prompt (core + disease addon + study-type addon + biomarker-type addon).
Four specialized LLM agents run in parallel: Study Details, Biomarker Details, Results, Inferences.
Outputs: Excel (4 sheets), flat CSV (one row per finding, pivot-ready), and full JSON. A corpus-wide registry shows biomarker frequency across papers.
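The parallel agents and the flat, one-row-per-finding output described above could look roughly like the sketch below. It is not the project's code: the agent function is a stub that returns a placeholder instead of calling a model, and the names run_agent, extract, to_flat_csv and the column names are assumptions made for illustration.

```python
import asyncio
import csv
import io

# The four agents named in the post, in a fixed order.
AGENT_NAMES = ["study_details", "biomarker_details", "results", "inferences"]
# Hypothetical column set; the real CSV schema is not shown in the post.
COLUMNS = ["agent", "finding", "source_quote", "section"]


async def run_agent(name: str, paper_text: str) -> list[dict]:
    """Stand-in for one LLM agent; a real agent would call a model here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [{
        "agent": name,
        "finding": f"placeholder finding from {name}",
        "source_quote": paper_text[:40],  # verbatim slice of the input text
        "section": "Results",
    }]


async def extract(paper_text: str) -> list[dict]:
    # Fan out: the four agents run concurrently; gather keeps AGENT_NAMES order.
    per_agent = await asyncio.gather(
        *(run_agent(name, paper_text) for name in AGENT_NAMES)
    )
    # Flatten to one row per finding, which is what makes the CSV pivot-ready.
    return [row for rows in per_agent for row in rows]


def to_flat_csv(rows: list[dict]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = asyncio.run(extract("Example paper text used only for this sketch."))
csv_text = to_flat_csv(rows)
```

Carrying source_quote and section on every row is what lets a reviewer trace any extracted value back to the paper without opening a separate file.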
Disease coverage today: breast, lung, liver, colorectal, gastric, pancreatic, thyroid. Study types: longitudinal clinical, survival oncology, diagnostic, methylation, immune infiltration.
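As a rough illustration of how the classifier output might drive the 4-level prompt, here is a minimal sketch. The add-on texts are placeholders, the "gene expression" biomarker type is an invented example, and StudyProfile and compose_prompt are hypothetical names; the post does not show the real prompt text or code.

```python
from dataclasses import dataclass

# Placeholder prompt fragments; the real wording is not shown in the post.
CORE_PROMPT = "[core extraction instructions]"
DISEASE_ADDONS = {
    "breast": "[breast-specific instructions]",
    "lung": "[lung-specific instructions]",
}
STUDY_TYPE_ADDONS = {
    "survival oncology": "[survival-study instructions]",
    "diagnostic": "[diagnostic-study instructions]",
}
# Biomarker types are not listed in the post; this key is an assumption.
BIOMARKER_TYPE_ADDONS = {
    "gene expression": "[gene-expression instructions]",
}


@dataclass(frozen=True)
class StudyProfile:
    """What the study classifier is described as detecting."""
    disease: str
    study_type: str
    biomarker_type: str


def compose_prompt(profile: StudyProfile) -> str:
    # Layer order follows the post: core, disease, study type, biomarker type.
    layers = [
        CORE_PROMPT,
        DISEASE_ADDONS.get(profile.disease, ""),
        STUDY_TYPE_ADDONS.get(profile.study_type, ""),
        BIOMARKER_TYPE_ADDONS.get(profile.biomarker_type, ""),
    ]
    # Unknown categories contribute nothing, so the core prompt always survives.
    return "\n\n".join(layer for layer in layers if layer)


prompt = compose_prompt(StudyProfile("breast", "survival oncology", "gene expression"))
```

Falling back to the core prompt alone when a category is unrecognized is one possible design choice; a real pipeline might instead reject papers outside its supported diseases and study types.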