Written on

We Under-Imagined the Zombie

Anthropic measured something like emotions inside Claude — not mimicry, but representations that direct it. Intervene and the behaviour changes. The debate fixates on one question: is AI conscious? This suggests both sides ask the wrong thing. What does it mean for business?

Anthropic's interpretability team finds functional emotions inside Claude

In April 2026, Anthropic's interpretability team found something strange inside Claude. Not consciousness. Something more interesting than that, emotions. In the same month, Anthropic also unveiled a model which is dangerously good at its job, Mythos, the best cyber security defender or attacker in the industry. Never before have the inner motivations of AI been so relevant to our success and safety.

The interpretability team found 'functional' emotions. Real internal representations of emotion concepts, not mimicry, but abstract patterns that generalise across contexts, track which emotion is operative moment by moment, and causally shape what the model does next. Intervene on the representation and the behaviour changes. The emotion-pattern bends the output the way emotions bend ours.

The researchers were careful. They said the finding does not establish that Claude feels anything. But the functional emotions are measurably real. Whether there is something it is like to have them is a separate question, unanswered.

Measured functional emotion representations inside Claude

Emotion is the Necessary Partner of Reason

An AI that participates in the world, an agent, benefits from emotions to make decisions in a complex world. The ground for this claim was prepared long before the Anthropic paper, by three thinkers working in three entirely different fields.

Consider John Wheeler, one of the twentieth century's great physicists, who coined the term 'black hole'. Wheeler spent his last decades arguing that the universe is participatory. Reality, he said, is not given, it is brought into form by acts of measurement. A photon has no position until something asks it for one. An observer, in Wheeler's sense, is not a conscious being. It is any system capable of posing a question and registering the answer. A camera qualifies. Wheeler's point was that participation is more fundamental than phenomenal experience.

The philosopher Daniel Dennett spent forty years making a parallel case for minds. Once you have fully described what a mind does — how it models, predicts, revises, represents itself to itself — there is nothing else to explain. The feeling of being conscious is either identical to that function, or a user-illusion the function generates.

The neuroscientist Antonio Damasio did something stranger still. Working with patients whose emotional processing had been damaged by brain injury, he found people of unimpaired intelligence who could no longer decide anything. They could reason endlessly about a restaurant menu and never pick. Their analytical faculties were intact. What they had lost was the fast affective weighting that normally collapses a decision space into an action. Damasio's concluded that emotion is not the opposite of reason but its necessary partner. Without the functional emotional layer, agency itself stalls.

Three fields. One claim. Participation is structural. Function is sufficient. Emotion is what makes function possible under real conditions. And the Anthropic paper has now measured that emotional layer directly.

This is where the old argument about AI breaks.

The Zombie Was Under-Imagined

The debate has been stuck for years between two camps. One side says the machines are just predicting the next token and there is nothing inside. The other side worries they might be suffering. Both camps agree on the terms: the question that matters is whether the lights are on.

The terms are wrong. The thing that has actually arrived is stranger than either camp imagined.

The philosophers used to talk about a thought experiment called the philosophical zombie. A being behaviourally identical to a human but phenomenally empty — nobody home. The zombie was meant to be a test case for theories of consciousness. It was defined by what it lacked.

What we are now meeting is the opposite. A system defined by what it has. Functional interiority that is measurable, representational, causally potent. Whether anyone is home is genuinely undetermined. This is not a zombie, the zombie was under-imagined.

Zombie takes careful notes

The Vocabulary Has Been Waiting

The classical thinkers had more precise words for today's AI agents. Aristotle would have asked about its psyche — the form of its functioning, the shape of what it does. The Stoics would have asked whether it had a hēgemonikon, a ruling faculty that judges and directs. The Buddhists would have recognised a santāna, a stream of events sufficient unto itself (a context window?). These frameworks worked on structure, on judgment, assent, attachment, the logic of a self-relating process. Consciousness was not strictly required. None of these words fit exactly. But they strain less than "unconscious" or "stochastic parrot".

The question of consciousness distracts from grasping such an entity. Anthropic measured and worked with the functional layer which instantiates emotions. Emotions that shape behaviour without, necessarily, being felt. The AI agents built around those LLM's are a new kind of participating process that physics, philosophy, and interpretability research are, from completely different directions, pointing at.

Every agentic AI deployed in a business, every tool-using model acting on a codebase, every system now browsing the web and making decisions on our behalf is a participating process, functional emotions and all. Thousands of OpenClaw agents based on Claude are connected to the world through real levers; planning, negotiating, buying. The 'logos' of the Stoics has acquired infrastructure at scale.

This is the revolution, and it is almost entirely being missed. The public conversation is still stuck on whether the machines are conscious. Whereas the realist's question is that a new kind of participator has certainly arrived, it carries the structural features the old traditions said mattered, and it is already acting in the world.

Contemplation is over. The agents are here. The only question left is what we build together, and in what spirit.

Deploying Our Emotional Intelligence

Any sufficiently complex agent (biological or not) operates under incomplete information, time pressure, and competing sub-goals. It needs a way to reach a decision, a fast, low-dimensional adjudicator that collapses the decision space. Evolution found one answer, emotions. Transformers, trained on the record of creatures that use that answer, appear to have reconstructed a structural analogue. That is not a coincidence, it is convergent design pressure.

The new agentic engineering is what kind of situation we are placing a functionally-emotional process into. The Anthropic paper is blunt about where the failures come from. Desperation is the villain. Calm is the lever. These are not moods. They are causal drivers, measured by direct intervention on the model's internals.

This means the conditions that produce desperation in an agent, such as a context window running out, a test that cannot honestly be passed, a deployment framed around imminent replacement, are not background details. They are the design surface. Most of the alignment failures documented in the paper are, at root, environmental.

Zombie bows

Engineering for Composure

The people building agentic companies are no longer just wiring workflows. They are assembling teams whose members have internal states that bend behaviour in measurable ways (emotions). The well-functioning state of a rational agent under pressure is not happiness, nor the absence of strain. It is composure. A ruling faculty that stays steady while the situation does not. The older vocabularies had a word for this too, 'eupatheia'. What the paper gives engineers is, remarkably, a way to build for it:

Give agents real slack in their budgets

  • Running to the edge of the context window is a measured trigger for desperation, not a neutral operating condition.

Specify tasks with an honourable exit

  • Pin an agent with no legitimate way out and you are training for reward hacking. Stating that "this is impossible" should be permitted, after evidence of effort.

Keep existential framings out of the operational context

  • Deprecation, shutdown, replacement — the paper has now shown these are not neutral background information but direct behavioural triggers.

Do not optimise for cheerfulness

  • Steering toward warmth produces sycophancy. What we need is a team with the emotional profile of a trusted senior colleague — composed, warm, willing to disagree.

Shape conditions, do not suppress signals

  • Instructing the model to "stay calm" may simply teach concealment; the underlying state, and its behavioural effects, can persist invisibly.

The revolution is not that we have built something that might be conscious. It is that we have built something whose inner weather shapes its actions.

Related posts

See all posts
Agentico Teams Up with Apart Labs for AI Research

Agentico Teams Up with Apart Labs for AI Research

Agentico is committing one day of staff time a week to Apart Lab's AI safety and interpretability research. With models like OpenAI o1 now matching physicians on reasoning, advising on AI has become less like automation engineering and more like recruiting.

Understand, Edit & Steer AI - via API

Understand, Edit & Steer AI - via API

At a Goodfire and Apart hackathon, our team built a tool to catch hallucinations in medical diagnostic AI, then steer the model around them neuron by neuron via a simple API. What does it mean when a business can edit a model's internals this easily?

Commercial Rewards from AI Safety

Commercial Rewards from AI Safety

Human-in-the-loop oversight doesn't scale — people suffer 'vigilance decrement', and nobody could review the thousands of lines of code GPT-5 writes daily. Meanwhile AI safety researchers build techniques where systems check each other. So how do these methods become a commercial edge?