
Want to Steer AI's Thinking?

olivermorris83

Updated: Oct 28, 2024

AI has been an unknowable black box, and that's been a problem. But this summer Google DeepMind quietly released GemmaScope, a set of tools for inspecting and controlling the individual 'features' (think of them as brain cells) within a large language model. Agentico has been exploring the possibilities.


We can now change how an AI thinks on the fly; we can even control who it thinks it is. This is strange but powerful stuff.


Image by Fabian Kragenings from Pixabay

But Why?


As AI gets more powerful, challenging human abilities, humans must remain comfortably in control. This sounds obvious, but the UK government is sufficiently concerned about the risks that the AI Safety Institute was established in 2023.


Do not be alarmed: the UK government already has organisations for food safety, vehicle safety, air safety and so on. We trust that our sandwiches, new cars and flights are safe.


AI threats are of course different, but the fundamental need for safety inspection and control tools is analogous. AI safety and evaluations are poised to be substantial industries.


Safety Inspection, Flux1.1 Pro

Commercial Benefits of AI as an Engine


If we understand AI as a mechanistic engine, rather than an unknowable mind, then we can expose its processes and tune them. This allows us to control it as we would an engine, to optimise settings for the task at hand, or shut the engine down when it enters states that could lead to failure.


The science behind this approach is called 'mechanistic interpretability'. The tongue twister is already ten years old and is rising into the mainstream.


Agentico has already used it to tweak the reasoning abilities of an LLM (Google's Gemma-2-9B). The work received the runner-up prize in the BlueDot Impact research competition for Q2 2024.


Although it is still a research tool, it offers real value for businesses seeking to diagnose and fix specific problems in small, low-cost AI systems without having to retrain the entire model.



Photo by Daniel Lloyd Blunk-Fernández, Pixabay

People Built AI, Don't They Understand It?


No, even though we humans engineered ChatGPT and its cousins, these machines have learnt their own rules of thumb as we fed them vast volumes of data. We do not understand the role of each rule, each cog in the machine.


In fact, such is the complexity of AI that the analogy of cogs in a machine is misleading. There are nine billion moving parts, aka parameters, in the model we worked on (Gemma-2-9B). There are hundreds of billions in tools like ChatGPT.


The complexity of these models is more accurately compared with the leaves on a tree in a forest of thousands of trees. Our prompts are like gusts of wind over the canopy: the leaves flutter and patterns form. Understanding AI means knowing how each tree's swaying and fluttering leaves affect the forest as a whole.

Gusts of thought on an autumn canopy. Flux 1.1 Pro

Recently, Anthropic demonstrated how they had mapped the role of each node, every tree in their AI forest, and how it reacts to the text we input. This gave Anthropic the power to gently steer their model's output.


For example, they discovered a feature which reacts to text about the Golden Gate Bridge. They artificially turned up the activity of that feature, and the chatbot became obsessed with the Golden Gate Bridge: all subjects of conversation led back to the bridge, and it even talked as if it were the bridge.


This is called 'steering', and you can try it for yourself at Neuronpedia.org.
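
For the technically curious, the sketch below shows roughly what steering looks like in code: a chosen feature direction is added to the model's internal activations at one layer while it generates text. The layer index, the steering strength and the feature_direction vector (which would normally be a feature's decoder direction from a sparse autoencoder such as GemmaScope) are illustrative placeholders, not the values Anthropic or Neuronpedia actually use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-9b"   # the model discussed in this article
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

layer_idx = 20          # hypothetical layer to steer at
steering_scale = 8.0    # hypothetical strength of the nudge
# Stand-in for an SAE feature's decoder direction (e.g. from GemmaScope).
feature_direction = torch.randn(model.config.hidden_size)
feature_direction = feature_direction / feature_direction.norm()

def steer(module, inputs, output):
    # Nudge every token's residual-stream activation towards the feature.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + steering_scale * feature_direction.to(hidden.dtype).to(hidden.device)
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

handle = model.model.layers[layer_idx].register_forward_hook(steer)

prompt = "Tell me about your favourite place."
ids = tokenizer(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=60)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()   # stop steering once we're done
```

In practice the scale has to be tuned: too small and nothing changes, too large and the output degrades into nonsense. Gentle nudges are the point.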


Model Flux1.1Pro's concept of 'Zeitgeist'

How Does this Affect Me?


Even if you don't want to get involved in engineering an AI model, the models behind ChatGPT and its rivals are 'zeitgeists', having read the majority of the internet. They have read more about humanity than any human could ever hope to.


This is the zeitgeist which then infiltrates the responses the chatbot gives you. This generation of models is a more detailed product of our culture than you may imagine.


With 'mechanistic interpretability', the secret details of our times are on display.


For example, there is a feature which responds to 'civil law'. That feature switches off almost completely when you enter 'Putin'. It is fascinating that the model simply learnt that Putin is the opposite of responsibilities under civil law.
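
To make the idea concrete, here is a minimal sketch of how such a comparison could be run: pass each prompt through the model, feed one layer's activations through a sparse autoencoder's encoder, and read off a single feature. The SAE weights, the layer and the feature index are stand-ins for illustration; the real values would be loaded from a trained SAE such as GemmaScope.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2-9b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

layer_idx = 20        # hypothetical layer whose residual stream we read
feature_idx = 12345   # hypothetical index of the 'civil law' feature
d_model, d_sae = model.config.hidden_size, 16384

# Stand-ins for a trained sparse autoencoder's parameters (e.g. GemmaScope).
W_enc = torch.randn(d_model, d_sae) * 0.02
b_enc = torch.zeros(d_sae)
b_dec = torch.zeros(d_model)

def feature_activation(prompt):
    ids = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    residual = out.hidden_states[layer_idx].float()    # [1, seq_len, d_model]
    pre_acts = (residual - b_dec) @ W_enc + b_enc      # SAE encoder
    acts = torch.relu(pre_acts)                        # sparse feature activations
    return acts[0, :, feature_idx].max().item()        # strongest activation in the prompt

print(feature_activation("Responsibilities under civil law"))
print(feature_activation("Putin"))
```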


There are many other behaviours which some may call biases, but they are simply our zeitgeist looking back at us.


It's Easy To Have a Go...


Neuronpedia is an easy-to-use website where you too can try steering a model or investigate what the machine has learnt about us.


Looking for knowledgeable, experienced specialists in AI and AI Agents?


This is AI with humans in mind.

Oliver, Agentico.ai
