AI clinical chat QA scoring model

Linden Jensen-Page

AI Agent Developer

ML Engineer

AI Developer

LangChain

OpenAI

Python

Artificial Intelligence

Summary

For a healthcare technology company, I built an AI QA Application for automated patient/clinician chat assessments to meet regulatory requirements around service quality. It autonomously evaluated 1000s of daily chats for user experience and clinical accuracy. The solution was built with Python, OpenAI API, Langchain, Pydantic, Google BigQuery, and deployed on GCP using Docker and Cloud Run. The service dramatically reduced the time taken for human reviews with the aim of replacing them altogether, saving the operations team days of work per week . Few-shot learning was used for prompt optimisation to minimise hallucinations.

Details

AI QA model was used to assess clinical and general quality of chats between health care practitioners and patients for a London based healthcare tech scale up

It was designed to process 1000s of daily chat transcripts to flag issues. The model was built around a detailed multi-step scoring rubric that performed well at scale

Few-shot training was used on labelled data to optimise the prompt to minimise hallucinations and maximise consistency of the content and structure of the outputs

Tech stack

OpenAI API, GPT-4o

Python, Langchain, Pydantic

Google Cloud Platform (GCP), Cloud Run, BigQuery

Docker

Message Bird (access to chats via API)

Like this project

Posted Feb 10, 2025

AI application for healthcare technology company to automate patient/clinician chat QA, ensuring regulatory compliance and reducing human work by days per week

Likes

Views

Tags