What is the Vision for Agentic AI?

olivermorris83
Aug 1, 2024
7 min read

Updated: Aug 5, 2024

When we grant LLM’s, like ChatGPT, the agency to use data and tools for business or personal tasks then we get 'agentic' AI. These agents can work in collaboration with us as a co-pilot, or fully autonomously as an agent. They can work alone, or in teams with other agents, either in rigidly defined processes or in open conversations.

Such agentic AI now attracts a meaningful slice of the world’s AI research effort and huge investment from business. Meta has spent billions to “.. introduce AI agents to billions of people in ways that will be useful and meaningful”.

Yet, no vision has been shared for the world we are being driven towards. Let's distil this vision from key players' public statements. See end of document for full list of those players.

What Business Problem Does Agentic AI Address?

Sierra AI, led by the ex CEO of Salesforce and now OpenAI board member, Bret Taylor, makes the pithiest bid for agent's usefulness in business. Paraphrased below:

What if your most knowledgeable technician was available to advise sales staff without forgoing billable time?
What if staff could spend less time on documenting past work and more on developing client relationships?
What if customers could rely on your service specialists on site at all times ?

The potential benefits go on, Agentic AI promises to augment people with new skills and automate laborious paperwork. Let's cut to results of the review:

Source: Gary Butterfield, Sheffield Hallam University

A Vision for Agentic AI

For Customers & Staff

Embody & Engage

AI agents as the primary digital manifestation and interface for brands [Sierra.ai]
This extends websites in 1995, social media in 2005 and mobile in 2015 [Sierra.ai]
Corporate AI to foster relationships with customer's personalised AI [Meta]

Augment & Automate

Agents add value by augmenting human talent with digital skills [Microsoft]
Workflow automation with agents and agencies (multi-agent teams) [LangGraph]
Staff liberated to invest in client relationships, innovation and strategy [Zendesk]

For Operations & Regulators

Progress & Protect

Agents continuously learn from user interactions and other agents [CrewAI]
Agent operations; Monitoring safety and effectiveness [AgentOps]
Balance of helpful and empowered AI vs harmless but constrained [Imbue]

Trust & Transparency

Preserving brand trust with layered AI supervision and deterministic rules [Voiceflow]
Transparent decision-making processes with detailed logging and auditing [LangSmith]
User data privileges and privacy reflected in permissions of the AI agent [MS CoPilot]

For Developers & Analysts

Compose & Dispose

Agents compose and dispose their own teams for each task [AgentZero]
Spin out specialist ‘agents as a service’ publicly via marketplaces [MetaGPT]
Integration of agent teams via ‘Internet of Agents’ [Tsinghua / Tencent]

Flexible & Fast

Agents follow rules for rigid workflows, but devise solutions for open tasks [Sema4.ai]
LLM's map language to rigid workflows, code executes them repeatably [DeepMind]
Dynamic selection of LLM, tool or custom code as required (aka Compound AI) [BAIR]

See end of this document for the list of participants in the market which were reviewed for this vision.

Are the Foundations of Agentic AI Reliable?

Agentic AI has pedigree, it goes back at least as far as ‘Society of Mind’ by Marvin Minsky in 1986. Minsky was co-founder of the AI lab at MIT.

Minsky reminds us that the human mind is not a single, unified entity, but rather numerous specialisations, from a sense of balance to reason and emotional insight. Complex problems, such as the questions above, are solved through the cooperation and competition of these specialisms, which operate as ‘agents’.

Minsky emphasizes the importance of analogies and metaphors in human cognition and problem-solving. ChatGPT is our analogue for an agent, and in fact, asking ChatGPT to ‘reason by analogy’ is a useful prompting technique.

LLM’s such as ChatGPT are granted ‘agency’ when we augment them with access to business data, tools for processing that data into something useful and the ability to reflect on their work.

The tasks we give such an agent may be simple, such as booking a meeting, or multi-step, such as drafting a report using management information, or even complex, such as discussing product requirements with many users then synthesising user stories and tests.

Specialist agents can partially automate tasks in collaboration with people, as done by Microsoft Office Co-Pilot. Or, they can fully automate workflows in collaboration with other specialist agents, such as with Microsoft Autogen.

Don’t Pave the Cow Paths

The above vision requires us to employ the advantages of AI, whilst being wary of the limits. This is different to replacing people with AI, which is rarely feasible let alone desirable. Direct replacement tempts us into ‘paving the cow paths’ whereby we employ new technology in habitual processes, rather than establishing processes around the new technology’s advantages.

For example, when electric motors first replaced massive steam engines in factories, direct replacement of one centralised motor with another led to few gains. It wasn’t until factories were re-organised for distributed power tools that companies raised productivity.

AI distributes intelligence to the edges, just as electricity distributed power a century ago. Working practices change accordingly. Source: TranquilDuo Pixabay & Vance Osterhour, Unsplash.

Agents are the distribution of helpful intelligence to user’s fingertips, just as electricity delivers muscle power to their hands. Be careful, there is an unavoidable trade off between AI which is helpful and that which is harmless. As with power tools, helpful tools can do harm, harmless tools are helpless. AI safety researchers have been struggling with this dilemma for years.

What else can AI do that people can’t? Consider these advantages in the context of the business problems:

Mass Personalized Interaction
- An AI agent can engage in thousands of individualized conversations simultaneously
Scalable Empathy
- AI agents can provide emotionally intelligent interactions with unlimited patience,they simply have more time for patients than doctors do.
Generalist Expertise
- AI agents combine multiple specialist skills in one entity, streamlining teamwork
Iterative Intuition+Rules Loop
- AI agents enable cycling between intuitive conversation and the following of rules, which is computing’s traditional expertise
Cross-Modal Understanding
- Multimodal AI blends different types of data (text, images, etc.) in ways humans cannot, for example, it can visualise the output of code
Vast Simultaneous Context
- AI’s working memory is many books worth of information, it can summarise and extract themes from customer comments at scale.

Who Do the Agents Work For?

Agents can be employed by individuals and by companies. So, there are four interactions which can be mediated and automated by agents on our behalf:

Individual – Company … individual as a customer
Company – Individual … individual as staff, consultant or subject specialist
Company – Company … companies in partnership, eg a supplier and client
Individual – Individual … individuals as colleagues or partners

Be aware of the corporate impact of personal technologies. Consider how quickly Facebook’s momentum led to adoption in professional settings. In fact, ‘Bring Your Own AI ’ is already a feature at many companies who do not approve of ChatGPT or Claude. Staff bypass this constraint by using their phones, such is the advantage of the AI.

A Dose of Skepticism

A vision is by definition not yet a reality. There are reasonable doubts to be addressed:

Can I Trust It?

Business needs safe reliable AI just as airlines need safe and reliable aircraft. I assume you’ve already used ChatGPT or Claude, so you know what it can do for you, and what it can’t.

A common refrain is that it requires the user to be a subject matter expert to know whether the AI output is accurate or an hallucination. It can be amazing, but when we inquire about subjects in which we are not experts then we must cross reference outputs, which is time consuming.

The vision of Agentic AI is that it blends probabilistic tools, like chatGPT, for handling common sense queries, and rule based tools, such as software code or policy documents, for handling regulated situations.

We already employ agents to write code, testing and correcting their own output. Even before ChatGPT, chatbots used rule books to ensure they adhere to company policies in conversations with clients. Since ChatGPT, many chatbots automatically switch between a fluid conversational AI and a set of approved responses, according to the context. Voiceflow discusses such a hybrid agent.

What Can AI Not Currently Do?

There are many limitations, but here are three that commonly face me as an agent developer:

Intuition vs Logic

Now we turn to ‘hallucination’. In the words of a recent scientific paper “LLM’s are in context semantic reasoners, not symbolic reasoners”. In other words, LLMs are not the reasoning engines of science fiction, instead, their style is fast and intuitive. As such, their reasoning is valid only within the constraints of the patterns of words they have seen during training. When the words or patterns are not familiar then LLMs will often confabulate false yet convincing outputs.

Compare this to our own intuition, quickly applying rules of thumb, versus our slower deliberate thoughts. Otherwise known as ‘thinking fast’ versus ‘thinking slow’, or common sense versus formal reasoning. LLM’s do not currently ‘think slow’.

There are ways to bridge the gap, but this is where the skill of the AI engineer comes in.

Interpolation not Extrapolation

All neural networks, including LLM’s, learn to interpolate between the data they are trained on. LLM’s and image generators are impressive because of the sheer breadth of their training.

This allows them to convincingly blend unrelated but pre-existing styles, such as holiday snaps with Picasso. But, they could not invent Picasso’s style from scratch, they cannot extrapolate into entirely new insights and styles.

Context Window and Cost

LLM’s, including AI Agents, are ‘stateless’. Every time we make a request, the entire conversation history must be sent to the LLM to provide a context for that request. As the conversation gets longer, the amount of data grows. Each ‘token’, or part of a word, costs money and time. Complex agentic workflows can consume enormous numbers of tokens.

This is also a skill of the AI engineer, to build workflows which are economical for businesses and performant for users.

OpenAI’s recent GPT-4o-mini is cheap and very fast, it is clearly targeting agentic workflows.

A Work In Progress

This document is a first attempt at the vision enabled by Agentic AI. I’d be delighted to incorporate your comments, do comment below !

List of Participants in the Agentic AI Space

The below table lists the companies whose public statements were reviewed in order to derive the vision statement.

Single Agents for Enterprise	Workflow Automation for Agents	Multi Agent Frameworks for Developers
- MS Co-Pilot Studio - Google Vertex AI Agent Builder - Amazon Bedrock Agents Builder - OpenAI GPT Store - Nvidia Visual Agents - BAIR (Berkeley AI Research) - Sierra.ai - Sema4.ai - MultiOn -Imbue	- Taskade.com - FlowiseAI.com - Adept.ai - Emergence.ai - n8n.io - MindStudio.ai - Relevance.ai - Stack.ai - Dust.tt - AgentVerse.ai -Cassidy.ai	- AutoGen - LangGraph - CrewAI - MetaGPT -AgentZero(Github)
Agent Ops & Security	Single Agents for Consumers	Chatbot Providers Developing into Agents
- AgentOps - Lakera.ai - Langsmith - AgentDojo	- Meta AI studio - Open Interpreter - OpenAI GPT Store	- Botpress - Voiceflow - Intercom - Zendesk -Aurelio.ai