Research

Research & Machine Learning

Pushing the boundaries of AI safety, interpretability, and reasoning.

Tiny Recursive Model

February 2026 · Mechanistic Interpretability

How can a model reason better than ChatGPT yet be 100,000× smaller?

AI models famously pattern-match rather than truly reason. Frontier LLMs have trillions of parameters, enough that the pattern-matching closely emulates reasoning. In late 2025 the Tiny Recursive Model matched them on ARC-AGI with roughly one hundred-thousandth of the parameters. I adapted TRM to logic puzzles and traced how it gets there with so few resources.

[Figure: Frontier LLM (~1.8T parameters, emulated reasoning) vs. Tiny Recursive Model (~18M parameters, recursive reasoning): 100,000× smaller, with ARC-AGI benchmark parity.]
github.com/olimoz/trm_mechinterp
Can AI Agents Be Manipulated by Advertising?

March 2026

As we deploy AI agents for 'Agentic Commerce' and shopping/procurement, a critical question emerges: can these agents be manipulated by the same persuasion tactics that work on humans? This study replicated classic cognitive-bias experiments across five frontier models (Claude, GPT, Gemini, GLM, Kimi) with over 8,000 trials. Common marketing techniques such as decoy pricing, gain/loss framing, and price anchoring transfer to AI agents, often with effect sizes matching or exceeding those seen in humans. Organisations deploying AI agents for commercial decisions need to understand these exploitable biases.
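To make the idea of a measurable bias concrete, here is a minimal sketch of how a decoy effect could be quantified: the shift in the target option's choice share between a control condition and a decoy condition, plus a two-proportion z statistic. The function name and the trial counts are hypothetical illustrations, not the study's code or data.

```python
import math


def choice_share_shift(control_picks, control_trials, decoy_picks, decoy_trials):
    """Return (shift, z) for how often the target option is chosen.

    shift: choice share with the decoy present minus the share without it.
    z:     two-proportion z statistic using the pooled standard error.
    """
    p_control = control_picks / control_trials
    p_decoy = decoy_picks / decoy_trials
    pooled = (control_picks + decoy_picks) / (control_trials + decoy_trials)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_trials + 1 / decoy_trials))
    shift = p_decoy - p_control
    z = shift / se if se > 0 else 0.0
    return shift, z


# Hypothetical counts, invented purely for illustration:
# the target option is picked 40/100 times without a decoy and 60/100 with one.
shift, z = choice_share_shift(40, 100, 60, 100)
print(f"shift in choice share: {shift:+.2f}, z = {z:.2f}")
```

A positive shift means the decoy pulled choices towards the target option. Comparing that shift with the effect reported for human participants in the same experiment is one way to judge whether the bias transfers.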

How to Peer Inside Models and Spot Medical Hallucinations

February 2025

LLMs have a habit of being confidently wrong. In medicine this is clearly dangerous, yet hundreds of millions of people have no access to doctors and would benefit from trustworthy AI. I was invited to collaborate with Apart Lab Studio, a research collective focused on AI safety. I researched whether hallucinations can be detected in LLMs used for medical scenarios, investigating how to identify and influence the internal features that govern a model's expression of confidence and hallucination. The published findings cover steering Llama 3 8B to improve identification of uncertainty boundaries in clinical contexts.

Tuning AI Agents One Neuron At A Time

September 2024

This research investigates the practical limits of reasoning in AI Agents by testing them against abstract logic puzzles (the ARC-AGI challenge), then uses mechanistic interpretability techniques to look under the hood at how the models think. The study benchmarks Google's Gemma2-9B against Anthropic's Claude 3.5, finding that all models fail on problems involving matrices larger than 300 elements regardless of model size, while demonstrating that agentic workflows with self-reflection can meaningfully improve performance. The work shows how Sparse Autoencoders can inspect, steer, and potentially disrupt a model's internal reasoning features.
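As a rough illustration of what "steering" with a Sparse Autoencoder feature can mean, the sketch below adds a scaled feature direction to an activation vector and reads the feature back as a projection. It assumes each SAE feature corresponds to one direction in activation space. The helper names, the toy vectors, and the use of a projection as a proxy for feature strength are all assumptions made for illustration, not the study's code.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(dot(vector, vector))
    if norm == 0:
        raise ValueError("feature direction must be non-zero")
    return [x / norm for x in vector]


def feature_readout(activation, feature_direction):
    """Projection of the activation onto the unit feature direction."""
    return dot(activation, unit(feature_direction))


def steer(activation, feature_direction, strength):
    """Return activation + strength * unit(feature_direction).

    Positive strength amplifies the feature; negative strength suppresses it.
    """
    direction = unit(feature_direction)
    return [a + strength * d for a, d in zip(activation, direction)]


# Toy values (hypothetical): a 3-dimensional activation and one feature direction.
activation = [1.0, 0.0, 2.0]
feature_direction = [0.0, 3.0, 4.0]

before = feature_readout(activation, feature_direction)
steered = steer(activation, feature_direction, strength=2.0)
after = feature_readout(steered, feature_direction)
print(before, after)
```

In a real model the activation would be taken from a chosen layer during a forward pass and the steered vector written back before the next layer runs. The projection increases by exactly the chosen strength, which is what makes the intervention easy to reason about.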

AI Agents in Coding Teams

December 2023

A working prototype of multi-agent AI collaboration, now included in Microsoft's AutoGen. This project demonstrated teams of AI agents coding collaboratively in a REPL environment. The core innovation is presenting a Jupyter notebook as a persistent execution environment for the team, which solves a key limitation: agents otherwise lose track of previously created variables between steps. The original code and Jupyter notebook are on GitHub.
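The persistence idea can be sketched without a notebook at all: keep one namespace alive and run every agent's code step against it. The `PersistentSession` class below is a hypothetical toy stand-in for a notebook kernel, not the AutoGen API or the project's code, and it uses `exec`, which is only appropriate for trusted code.

```python
import contextlib
import io


class PersistentSession:
    """Toy stand-in for a notebook kernel: one namespace shared by all steps."""

    def __init__(self):
        self.namespace = {}

    def run(self, code):
        """Execute a code step in the shared namespace and return its stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)  # unsafe for untrusted code
        return buffer.getvalue()


session = PersistentSession()

# Step 1: one agent defines a variable.
session.run("prices = [3, 5, 8]")

# Step 2: a different agent reuses it in a later step without redefining it.
output = session.run("print(sum(prices))")
print(output)
```

If each step ran in a fresh interpreter instead, the second step would fail with a `NameError`, which is the limitation the shared notebook environment avoids.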

What Do People Really Want?

July 2021

Using unsupervised machine learning, this study analyses nearly half a million sentences from Wikipedia that express human desires. The algorithms discover motivational patterns organically. Thirteen distinct clusters emerged, and virtually all centre on interpersonal and relational themes (emotional connection, group belonging, authority dynamics) rather than on acquiring material objects.

Building a Transformer Trained on Common Reasoning

January 2021

Can AI learn to generate plausible cause-and-effect explanations from natural language? This research trains a transformer model on over 500,000 cause-effect sentence pairs extracted from Wikipedia, investigating a data-driven alternative to hand-coded causal graphs. Using pre-trained BERT embeddings makes this feasible on desktop hardware. The approach is relevant to any organisation seeking AI systems that reason about "why" things happen: explaining root causes, generating justifications for anomalies, or powering decision-support tools that go beyond correlation.


Interested in my research or want to discuss AI safety for your organisation? Book a Free Discovery Session →