Researching Safe Medical AI

Teaming up with Apart Labs Studio to reduce medical AI hallucinations by analysing an AI's internal processes.

Problem

AI systems are increasingly used in medical decision-making, but their tendency to hallucinate (confidently generating false information) poses significant risks. We teamed up with the research organisation 'Apart Labs Studio' to investigate a new method of reducing hallucinations using a Sparse Auto-Encoder (SAE) service provided by the start-up Goodfire, which allowed us to inspect the inner thought processes of the AI. This was non-commercial work, contributed as part of a common effort to improve AI safety.


Solution

We sourced freely available test data from Hugging Face, creating 5,000 hallucination tests for medical applications. Using Goodfire's Sparse Auto-Encoder (SAE), we identified neural features (the model's inner processing circuits) associated with hallucinated responses to those medical questions.
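One simple way to look for features associated with hallucinated responses is to compare how strongly each feature fires on hallucinated versus correct answers. The sketch below is illustrative only: it assumes the SAE activations have already been exported as one row of numbers per response, the function name `rank_features_by_gap` is hypothetical, and the toy numbers are invented. It does not call Goodfire's API and is not necessarily the selection method used in this project.

```python
from statistics import mean


def rank_features_by_gap(activations, is_hallucination, top_k=3):
    """Rank features by mean activation on hallucinated responses minus
    mean activation on correct responses, largest gap first."""
    halluc = [row for row, flag in zip(activations, is_hallucination) if flag]
    correct = [row for row, flag in zip(activations, is_hallucination) if not flag]
    if not halluc or not correct:
        raise ValueError("need at least one example of each class")
    n_features = len(activations[0])
    gaps = [
        mean(row[j] for row in halluc) - mean(row[j] for row in correct)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: gaps[j], reverse=True)
    return [(j, gaps[j]) for j in ranked[:top_k]]


# Invented toy data: 4 responses x 3 features.
# Feature 1 fires mostly on the hallucinated responses.
toy_activations = [
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.2],
    [0.2, 0.0, 0.3],
]
toy_labels = [True, True, False, False]

top_features = rank_features_by_gap(toy_activations, toy_labels, top_k=2)
print(top_features)
```

A mean difference is only a first-pass filter. Features that rank highly would still need checking against independent data before being trusted.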


We built a number of machine learning classifiers to detect potential hallucinations. We were then in a position to open up the model and use those features to steer it away from hallucinating, so that it instead refuses to answer when unsure.
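As a rough illustration of the classifier step, the sketch below trains the three algorithm families named in the recipe (SVM, Decision Tree, Logistic Regression) using scikit-learn. Everything about the data is an assumption: the inputs are synthetic stand-ins for SAE feature activations, the labels are randomly generated, and the hyperparameters are arbitrary defaults rather than the project's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for SAE feature activations (assumption, not real data).
rng = np.random.default_rng(0)
n_samples, n_features = 400, 20
y = rng.integers(0, 2, size=n_samples)        # 1 = hallucinated, 0 = correct
X = rng.normal(size=(n_samples, n_features))
X[:, 0] += 3.0 * y                            # feature 0 carries the signal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

classifiers = {
    "SVM": SVC(kernel="linear"),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model and record accuracy on the held-out split.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: held-out accuracy {scores[name]:.2f}")
```

Comparing several simple model families on the same held-out split is a cheap way to check whether the signal in the features is robust or an artefact of one algorithm.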


The approach successfully reduced hallucination rates, but highlighted the complexity of knowing when to give advice and when to refuse. Overall the method was powerful, but the Goodfire SAE would require refinement for this use case.
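The tension between giving advice and refusing can be made concrete by tracking two numbers side by side: how often the model hallucinates, and how often it refuses a question it could have answered. The sketch below is a hypothetical way to compute both. The function names and the toy outcomes are invented for illustration and are not results from this project.

```python
def hallucination_rate(outcomes):
    """Fraction of all responses that were hallucinated."""
    return sum(resp == "hallucinated" for _, resp in outcomes) / len(outcomes)


def over_refusal_rate(outcomes):
    """Fraction of answerable questions that the model refused."""
    answerable = [resp for ok, resp in outcomes if ok]
    if not answerable:
        return 0.0
    return sum(resp == "refused" for resp in answerable) / len(answerable)


# Each outcome is (question_was_answerable, response_type). Invented toy data.
baseline = [
    (True, "correct"),
    (True, "correct"),
    (True, "hallucinated"),
    (False, "hallucinated"),
]
steered = [
    (True, "correct"),
    (True, "refused"),
    (True, "correct"),
    (False, "refused"),
]

for label, outcomes in [("baseline", baseline), ("steered", steered)]:
    print(
        label,
        f"hallucination={hallucination_rate(outcomes):.2f}",
        f"over_refusal={over_refusal_rate(outcomes):.2f}",
    )
```

Reporting both rates together guards against a steering setting that looks safe only because the model has stopped answering.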


Recipe


  • MedHALT FCT medical hallucination dataset from Hugging Face

  • Llama-3.1-8B-Instruct as base model

  • Goodfire's SAE API for feature extraction

  • Human Disease Ontology dataset for validation of the hallucination features found

  • Three classification algorithms (SVM, Decision Tree, Logistic Regression)

  • Goodfire's feature steering tools, targeting key features which affect the model's willingness to respond to questions for which it has insufficient information
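To give a feel for what feature steering means in general, the sketch below nudges an activation vector along a single feature direction and shows how the feature's readout changes. This is a generic illustration of the idea, not Goodfire's API: the function names, the three-dimensional vectors, and the "refusal" direction are all hypothetical, and a real model has thousands of dimensions.

```python
def feature_readout(activation, feature_direction):
    """How strongly the activation expresses the feature (dot product)."""
    return sum(a * d for a, d in zip(activation, feature_direction))


def steer_activation(activation, feature_direction, strength):
    """Return a copy of the activation shifted by strength * feature_direction."""
    if len(activation) != len(feature_direction):
        raise ValueError("activation and direction must have the same length")
    return [a + strength * d for a, d in zip(activation, feature_direction)]


# Invented toy vectors: the hypothetical "refuse when unsure" feature
# lies along the second axis.
activation = [0.5, -1.0, 2.0]
refusal_direction = [0.0, 1.0, 0.0]

steered = steer_activation(activation, refusal_direction, strength=1.5)

print("before:", feature_readout(activation, refusal_direction))
print("after: ", feature_readout(steered, refusal_direction))
```

The strength value is the knob that needs tuning: too low and the model keeps guessing, too high and it refuses questions it could answer.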


© 2024 Agentico Ltd. All Rights Reserved.

Registered

Agentico Ltd, Reg'd in England & Wales 15428063

ICO Reg'n ZB657122

Location

34-35 Butcher Row, Shrewsbury, Shropshire, SY1 1UW, United Kingdom

Phone

01952 928189
