Commercial Rewards from AI Safety

We are overlooking an opportunity for competitive advantage. AI researchers complain that they are underfunded, yet there is a win-win to be had with industry.

The work being done in AI safety is not just about mitigating far-future risks; it's about developing the evaluation and oversight capabilities that businesses need to profit from AI. Moreover, the insights and techniques emerging from safety research will prove to be a key competitive advantage as we integrate the next generation of models into our workplaces.

There are enormous efficiencies to be had from automating business processes with AI. Yet, for any use case, from personalised content creation to order processing, how do we reliably evaluate the quality and safety of AI-generated outputs at scale?

Toyota disrupted the 70s/80s vehicle market by understanding quality and reliability. Image colorised and enhanced by AI

If there is a set of rules to apply, then we can enforce quality using simple code. But life is rarely that simple: many tasks involve judgment, which superficially suits AI. ‘Human-in-the-loop’ works well for overseeing a small number of tasks per day, but it doesn’t scale, because human attention degrades over long monitoring sessions (‘vigilance decrement’). Moreover, human-in-the-loop will strain at the seams with the next generation of AI.

For example, how many highly experienced developers would be required to oversee the thousands of lines of code GPT-5 could create per day? The 'Human-in-the-loop' approach is not how money will be made, and it may not even prevent losses from errors and misunderstandings.

'Vigilance decrement' - the limit to Human-in-the-loop

Fortunately, AI safety researchers, in their quest to ensure advanced AI remains aligned with human values, have pioneered a suite of techniques that could revolutionise how businesses evaluate and deploy agentic AI. Leopold Aschenbrenner (ex-OpenAI Safety) lists the following techniques, in which AI systems critique and oversee each other:

Debate (Dario Amodei, now CEO of Anthropic)
Market making (derived from "How to Get Truth from a Liar")
Recursive reward modelling (Jan Leike, ex-OpenAI, now Anthropic)
Prover-verifier games (Uni. Toronto, Vector Institute for AI)

These approaches reflect conscientious thought patterns. We experience doubt when handling edge cases, and our internal monologue seeks to cross-check our understanding. When we are entirely outside our comfort zone, we ask colleagues and specialists working on associated tasks.

Plato's "Justified True Belief" for understanding doubt. NB "Gettier Cases" lie outside this theory.

Let’s imagine an AI system tasked with generating software. An adversarial AI, engaged in a prover-verifier game, could automatically probe the code for vulnerabilities, dramatically scaling the evaluation process. Or, consider an AI assisting with business strategy - by having multiple AI systems debate strategic options, leaders could quickly surface critical considerations from multiple angles.
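To make the code-generation scenario concrete, here is a minimal sketch of the idea, not a real prover-verifier game. The "prover" is a hand-written stand-in for AI-generated code with a deliberate bug, and the "verifier" is a simple random-input tester rather than a trained adversarial model. All function names (`prover_sort`, `spec_holds`, `adversarial_verifier`) are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Optional


def prover_sort(xs: List[int]) -> List[int]:
    """Stand-in for AI-generated code (assumption): a sort that
    deliberately drops duplicate values, which is a bug."""
    return sorted(set(xs))


def spec_holds(inp: List[int], out: List[int]) -> bool:
    """The specification the verifier enforces: the output must be
    the input's elements in ascending order, duplicates included."""
    return out == sorted(inp)


def adversarial_verifier(
    candidate: Callable[[List[int]], List[int]],
    trials: int = 1000,
    seed: int = 0,
) -> Optional[List[int]]:
    """Probe the candidate with many small random inputs.

    Returns the first input that violates the specification,
    or None if no counterexample was found within `trials`.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        # Small value range on purpose, so duplicates are common.
        inp = [rng.randint(0, 5) for _ in range(rng.randint(0, 6))]
        if not spec_holds(inp, candidate(list(inp))):
            return inp
    return None


if __name__ == "__main__":
    counterexample = adversarial_verifier(prover_sort)
    if counterexample is None:
        print("No counterexample found; escalate to a human spot check.")
    else:
        print("Verifier found a failing input:", counterexample)
```

The point of the design is the division of labour: the verifier screens every output automatically, and a human only needs to look at the counterexamples it surfaces. In a real deployment both roles would presumably be played by models, and the specification would be far harder to write down than it is here.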

These approaches are strategies to supercharge human-AI teams, not to abdicate control to AI. They are enhanced by the fact that current AI systems ‘think out loud’: their logic and workflow are in plain, auditable English. For as long as AI does not exceed human abilities, we can batch check its work, just as we might drop in on a trainee. Of course, AI development is ongoing, so this is a frontier ripe for exploration.
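Batch checking of this kind can be sketched as a random audit: instead of reviewing every task, a human reviews a small, unpredictable sample of the AI's written reasoning. The function name `sample_for_review` and the 5% default rate are assumptions made for illustration, not recommendations.

```python
import random
from typing import List, Optional


def sample_for_review(
    task_ids: List[int],
    rate: float = 0.05,
    seed: Optional[int] = None,
) -> List[int]:
    """Pick a random subset of completed tasks for human review.

    `rate` is the fraction of tasks to audit (an assumed default).
    At least one task is chosen whenever any tasks exist.
    """
    if not task_ids:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(task_ids) * rate))
    k = min(k, len(task_ids))  # never ask for more than exist
    return sorted(rng.sample(task_ids, k))


if __name__ == "__main__":
    completed = list(range(200))  # hypothetical task identifiers
    to_review = sample_for_review(completed, rate=0.05, seed=1)
    print(f"Reviewing {len(to_review)} of {len(completed)} tasks:", to_review)
```

Because the sample is random, the reviewed tasks give an unbiased picture of overall quality, and the reviewer's workload stays fixed as a fraction of volume rather than growing with every task.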

The scalable oversight capabilities these techniques enable will be crucial for any business looking to reliably harness the power of advanced AI. Therein lies the opportunity. By engaging with and adapting these AI safety techniques, forward-thinking businesses can constructively participate at the forefront of the next generation of AI.

Whilst AI safety may seem like an academic concern, its implications for present-day businesses are profound. The insights and techniques emerging from this field could prove to be a key competitive advantage. It's not just about building better AI, but about building better ways to work with AI. And that's a frontier every AI-minded business should be exploring.
