For a healthcare technology company, I built an AI QA Application for automated patient/clinician chat assessments to meet regulatory requirements around service quality. It autonomously evaluated 1000s of daily chats for user experience and clinical accuracy. The solution was built with Python, OpenAI API, Langchain, Pydantic, Google BigQuery, and deployed on GCP using Docker and Cloud Run. The service dramatically reduced the time taken for human reviews with the aim of replacing them altogether, saving the operations team days of work per week . Few-shot learning was used for prompt optimisation to minimise hallucinations.