Machine Learning • 2025

Researching Safe Medical AI

Teaming up with a research organisation to reduce hallucinations in medical AI by analysing a model's internal processes.

Problem

AI systems are increasingly used in medical decision-making, but their tendency to hallucinate (confidently generating false information) poses significant risks. We teamed up with the research organisation ‘Apart Labs Studio’ to investigate a new method of reducing hallucinations using a Sparse Autoencoder (SAE) service provided by the start-up Goodfire, which allowed us to inspect the model's internal processing. This was non-commercial work, contributed as part of a shared effort to improve AI safety.

Solution

We built a set of 5,000 medical hallucination tests from freely available data on Hugging Face. Using Goodfire's Sparse Autoencoder (SAE), we identified features (interpretable directions in the model's internal activations) associated with hallucinated answers to those medical questions.
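One simple way to surface candidate features is sketched below, under stated assumptions: each response is represented as a vector of SAE feature activations (for example, averaged over the response tokens), and features are ranked by how much more strongly they fire on hallucinated answers than on correct ones. The function name, array shapes and the mean-difference heuristic are illustrative assumptions, not the project's actual pipeline or Goodfire's API.

```python
import numpy as np


def rank_hallucination_features(acts, hallucinated, top_k=10):
    """Rank SAE features by how much more they activate on hallucinated responses.

    Hypothetical helper, for illustration only.
    acts: (n_responses, n_features) SAE activations, one row per response
          (assumed: activations averaged over the response tokens).
    hallucinated: (n_responses,) booleans, True where the answer was wrong.
    Returns the indices of the top_k features with the largest
    mean(hallucinated) - mean(correct) activation difference.
    """
    acts = np.asarray(acts, dtype=float)
    mask = np.asarray(hallucinated, dtype=bool)
    diff = acts[mask].mean(axis=0) - acts[~mask].mean(axis=0)
    # argsort is ascending, so reverse it to put the largest differences first.
    return np.argsort(diff)[::-1][:top_k]


# Toy example: 4 responses, 5 features; feature 3 fires strongly, and
# feature 1 weakly, only on the two hallucinated responses.
acts = np.zeros((4, 5))
acts[:2, 3] = 1.0
acts[:2, 1] = 0.5
hallucinated = np.array([True, True, False, False])
print(rank_hallucination_features(acts, hallucinated, top_k=2))  # [3 1]
```

A raw mean difference is the crudest possible score; in practice one might normalise by each feature's variance, and any candidates would still need checking against an independent source, as the project did with the Human Disease Ontology.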

We trained three machine learning classifiers on these features to detect potential hallucinations. We then used the same features to steer the model away from hallucinating, so that it would instead refuse to answer when unsure.
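The classifier step can be sketched with scikit-learn, comparing the three algorithm families named in the recipe (SVM, decision tree, logistic regression) by cross-validated accuracy. This is a minimal sketch, assuming the SAE activations have already been collected into a feature matrix `X` with binary labels `y` (1 = hallucinated); the hyperparameters and the synthetic data in the usage example are placeholders, not the project's settings or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, cv=5):
    """Return mean cross-validated accuracy for each classifier (illustrative settings)."""
    models = {
        # SVMs and logistic regression are scale-sensitive, so standardise first.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv).mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for SAE activations: 200 responses x 20 features,
    # labelled "hallucinated" by a made-up linear rule.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: {acc:.3f}")
```

Comparing a linear model, a kernel model and a tree gives a quick sense of whether the hallucination signal is roughly linear in the SAE features; the tree additionally shows which individual features it splits on.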

The approach reduced hallucination rates, but highlighted how difficult it is to decide when the model should give advice and when it should refuse. Overall, the method proved powerful, but Goodfire's SAE would need refinement for this use case.
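The advice-versus-refusal tension can be made concrete by tracking three outcome rates rather than hallucination rate alone. The sketch below assumes each test response has already been labelled as `correct`, `hallucinated` or `refused` (a hypothetical labelling scheme); the before/after lists are made-up toy data, not results from this project.

```python
from collections import Counter

OUTCOMES = ("correct", "hallucinated", "refused")


def outcome_rates(outcomes):
    """Fraction of responses in each outcome category (hypothetical labels)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected outcome labels: {unknown}")
    total = len(outcomes)
    return {name: counts[name] / total for name in OUTCOMES}


# Toy outcomes for 10 questions before and after steering (illustrative only).
baseline = ["correct"] * 6 + ["hallucinated"] * 4
steered = ["correct"] * 5 + ["hallucinated"] * 1 + ["refused"] * 4
print(outcome_rates(baseline))  # {'correct': 0.6, 'hallucinated': 0.4, 'refused': 0.0}
print(outcome_rates(steered))   # {'correct': 0.5, 'hallucinated': 0.1, 'refused': 0.4}
```

In this toy case steering cuts hallucinations from 40% to 10%, but one question the model previously answered correctly is now refused, which is exactly the kind of trade-off that makes the refusal threshold hard to set.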

Recipe

  • Med-HALT FCT (False Confidence Test) medical hallucination dataset from Hugging Face

  • Llama-3.1-8B-Instruct as base model

  • Goodfire's SAE API for feature extraction

  • Human Disease Ontology dataset for validating the hallucination features found

  • Three classification algorithms (SVM, decision tree, logistic regression)

  • Goodfire’s feature steering tools, targeting key features that affect the model’s willingness to answer questions it lacks sufficient information for
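Goodfire's tools handle steering internally, and the sketch below neither uses nor reproduces their interface. It illustrates one generic form of SAE-based activation steering: adding a scaled, normalised SAE decoder direction to the model's residual-stream activations at a chosen layer. The function, the random decoder matrix and the "refusal" feature index are all hypothetical placeholders.

```python
import numpy as np


def steer_residual(resid, decoder, feature_idx, strength):
    """Add a scaled SAE feature direction to every token's residual-stream vector.

    resid: (n_tokens, d_model) activations at one layer.
    decoder: (n_features, d_model) SAE decoder matrix; row i is feature i's direction.
    strength: positive pushes towards the feature, negative pushes away from it.
    """
    direction = decoder[feature_idx]
    unit = direction / np.linalg.norm(direction)
    # Broadcasting adds the same (d_model,) vector to every token row.
    return resid + strength * unit


# Toy setup with random matrices standing in for a real SAE and model.
rng = np.random.default_rng(0)
d_model, n_features = 16, 64
decoder = rng.normal(size=(n_features, d_model))
resid = rng.normal(size=(5, d_model))
REFUSAL_FEATURE = 42  # hypothetical index of an "insufficient information" feature

steered = steer_residual(resid, decoder, REFUSAL_FEATURE, strength=4.0)
unit = decoder[REFUSAL_FEATURE] / np.linalg.norm(decoder[REFUSAL_FEATURE])
# Each token's projection onto the feature direction rises by exactly `strength`.
print(np.round(steered @ unit - resid @ unit, 6))  # [4. 4. 4. 4. 4.]
```

Because the shift is uniform across tokens, a strength that is too large can push the model towards refusing even answerable questions, so in practice the strength would need tuning against both hallucination and refusal rates.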
