
Commercial Rewards from AI Safety

Human-in-the-loop oversight doesn't scale — people suffer 'vigilance decrement', and nobody could review the thousands of lines of code GPT-5 writes daily. Meanwhile AI safety researchers build techniques where systems check each other. So how do these methods become a commercial edge?

We are overlooking an opportunity for competitive advantage. AI safety researchers complain that they are underfunded, yet there is a win-win to be had with industry.

The work being done in AI safety is not just about mitigating far-future risks; it's about developing the evaluation and oversight capabilities that businesses need to profit from AI. Moreover, the insights and techniques emerging from safety research will prove to be a key competitive advantage as we integrate the next generation of models into our workplaces.

There are enormous efficiencies to be had from automating business processes with AI. Yet, for any use case, from personalised content creation to order processing, how do we reliably evaluate the quality and safety of AI-generated outputs at scale?

Toyota disrupted the 70s/80s vehicle market by understanding quality and reliability. Image colorised and enhanced by AI

If there is a set of rules to apply, then we can enforce quality using simple code. But life is often not that simple. Many tasks involve judgment, which superficially suits AI. ‘Human-in-the-loop’ works well for overseeing low numbers of tasks per day, but it doesn’t scale, because humans suffer ‘vigilance decrement’: attention fades over long stretches of monitoring. Moreover, human-in-the-loop will strain at the seams for the next generation of AI.

For example, how many highly experienced developers would be required to oversee the thousands of lines of code GPT-5 could create per day? The 'Human-in-the-loop' approach is not how money will be made, and it may not even prevent losses from errors and misunderstandings.
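
To make the easy case concrete, here is a minimal sketch of rule-based checking for an order-processing workflow. The `Order` fields, the currency list and the rules themselves are assumptions invented for illustration, not a real schema. Anything that needs judgment falls outside what code like this can catch.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical AI-extracted order record (illustrative fields only)."""
    order_id: str
    quantity: int
    unit_price: float
    currency: str


# Assumed business rule: only these currencies are accepted.
ALLOWED_CURRENCIES = {"GBP", "EUR", "USD"}


def rule_violations(order: Order) -> list[str]:
    """Return the rules an order breaks; an empty list means it passes."""
    problems = []
    if order.quantity <= 0:
        problems.append("quantity must be positive")
    if order.unit_price < 0:
        problems.append("unit price cannot be negative")
    if order.currency not in ALLOWED_CURRENCIES:
        problems.append(f"unsupported currency: {order.currency}")
    return problems


orders = [
    Order("A-1", 3, 19.99, "GBP"),
    Order("A-2", 0, 5.00, "JPY"),
]

# Keep only the orders that break at least one rule.
flagged = {}
for order in orders:
    problems = rule_violations(order)
    if problems:
        flagged[order.order_id] = problems

print(flagged)
```

Checks like these run on every output at negligible cost, which is why the hard part of oversight is the judgment-based work that rules cannot express.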

'Vigilance decrement' - the limit to Human-in-the-loop

Fortunately, AI safety researchers, in their quest to ensure advanced AI remains aligned with human values, have pioneered a suite of techniques that could revolutionize how businesses evaluate and deploy agentic AI. Leopold Aschenbrenner (ex-OpenAI Safety) lists techniques in which AI systems critique and oversee each other.

These approaches reflect conscientious thought patterns. We experience doubt when handling edge cases, and our internal monologue seeks to cross-check our understanding. When we are entirely outside our comfort zone, we ask colleagues or specialists working on associated tasks.

Plato's "Justified True Belief" for understanding doubt. NB "Gettier Cases" lie outside this theory.

Let’s imagine an AI system tasked with generating software. An adversarial AI, engaged in a prover-verifier game, could automatically probe the code for vulnerabilities, dramatically scaling the evaluation process. Or, consider an AI assisting with business strategy - by having multiple AI systems debate strategic options, leaders could quickly surface critical considerations from multiple angles.
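
The software example can be sketched as a simple generate-and-critique loop. This is only loosely inspired by prover-verifier games, which in the research literature are a training setup rather than a runtime loop. The two "models" below are hard-coded stand-ins, and every function name is hypothetical. In practice each would be a call to a real AI system.

```python
from typing import Callable, Optional

# (task, critiques so far) -> candidate output
Generator = Callable[[str, list[str]], str]
# candidate output -> list of objections (empty means accepted)
Verifier = Callable[[str], list[str]]


def generate_with_oversight(
    task: str,
    generate: Generator,
    verify: Verifier,
    max_rounds: int = 3,
) -> tuple[Optional[str], int]:
    """Ask for a candidate, let the verifier attack it, and retry with feedback.

    Returns (accepted candidate, rounds used), or (None, max_rounds) when
    nothing passes, which is the cue to escalate to a human reviewer.
    """
    critiques: list[str] = []
    for round_number in range(1, max_rounds + 1):
        candidate = generate(task, critiques)
        objections = verify(candidate)
        if not objections:
            return candidate, round_number
        critiques.extend(objections)
    return None, max_rounds


def toy_generator(task: str, critiques: list[str]) -> str:
    """Stand-in for a code-writing model: fixes its query once it is criticised."""
    if any("parameter" in critique for critique in critiques):
        return 'cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))'
    return 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)'


def toy_verifier(candidate: str) -> list[str]:
    """Stand-in for an adversarial model: flags string-built SQL."""
    if '" +' in candidate:
        return ["possible SQL injection: use a parameterised query"]
    return []


accepted, rounds_used = generate_with_oversight(
    "look up a user by id", toy_generator, toy_verifier
)
print(rounds_used, accepted)
```

The design point is the escape hatch. When the loop cannot converge, the task goes to a person, so human attention is spent only on the cases the machines could not settle between themselves.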

These approaches are strategies to supercharge human-AI teams, not abdicate control to AI. They are enhanced by the fact that current AI systems ‘think out loud’: their logic and workflow are in plain, auditable English. For as long as AI does not exceed human abilities, we can batch-check its work, just as we might drop in on a trainee. Of course, AI development is ongoing, so this is a frontier ripe for exploration.
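
Batch checking can be as simple as pulling a random sample of the AI's reasoning traces for a person to read. The sketch below assumes each trace has an identifier and that a 2% review rate is acceptable. Both the identifiers and the rate are illustrative choices, not recommendations.

```python
import random
from typing import Optional


def sample_for_review(
    trace_ids: list[str], rate: float, seed: Optional[int] = None
) -> list[str]:
    """Pick a random subset of traces for human review.

    `rate` is the fraction to review; at least one trace is chosen whenever
    any exist. Passing a seed makes the selection reproducible for audits.
    """
    if not trace_ids:
        return []
    rng = random.Random(seed)
    sample_size = max(1, round(len(trace_ids) * rate))
    return sorted(rng.sample(trace_ids, sample_size))


# Hypothetical identifiers for one day's worth of AI reasoning traces.
trace_ids = [f"trace-{i:04d}" for i in range(1000)]
picked = sample_for_review(trace_ids, rate=0.02, seed=7)
print(len(picked), "traces queued for human review")
```

Random selection matters here: if the system could predict which items get reviewed, the sample would say little about the rest.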

The scalable oversight capabilities these techniques enable will be crucial for any business looking to reliably harness the power of advanced AI. Therein lies the opportunity. By engaging with and adapting these AI safety techniques, forward-thinking businesses can constructively participate at the forefront of the next generation of AI.

Whilst AI safety may seem like an academic concern, its implications for present-day businesses are profound. The insights and techniques emerging from this field could prove to be a key competitive advantage. It's not just about building better AI, but about building better ways to work with AI. And that's a frontier every AI-minded business should be exploring.
