
AI Agents Suit Organisations Better Than Chatbots

Task automation is the most obvious use case for AI. However, the ‘AI agents’ which automate tasks are commonly eclipsed by chatbot services such as ChatGPT, despite relying on the same underlying models, such as GPT-3.5 and GPT-4.


“Organizations are often reluctant to empower their workers to use AI, but agents fit more naturally into the existing structure of organizations. Because you assign them tasks, like humans, they can work as ‘AI contract workers’ who are delegated jobs. Agents also represent the first break away from the chatbot and copilot models for interacting with AI.”

Dr Ethan Mollick, Wharton School of Business, April 2024.

We all understand chatbots, and their shortcomings in office environments are well known, not least the unmanaged ‘black market’ IT they encourage. AI agents instead offer structured, workflow-managed, auditable automation. To better understand the opportunity, let’s look a little closer at what AI agents are.



Agentic AI is Productive AI


ChatGPT and other chatbot services are wonderful tools, allowing us to quickly get to the nub of an issue. However, we cannot hand them a task and expect it to be done.

Andrew Ng, founder of Google Brain and former Chief Scientist at Baidu, said in March 2024: “The set of tasks that AI can do will expand dramatically because of agentic workflows.”

Sounds good, so what is an agentic workflow?


Well, we can assign Large Language Models (LLMs), like the ones behind ChatGPT, to act as agents capable of understanding and executing tasks on our behalf. We prompt the agent with our objective, just as we would in a conversation. It can then either use its reasoning powers to devise a solution, or step through a prescribed workflow in managed steps. The prescribed workflow would be almost the same as any you would present to a person.


To be more specific, Microsoft Copilot is an AI agent, albeit one with a low level of ‘agency’. Nevertheless, it has the main components of an AI agent:


  • LLM

    • To understand and discuss the task in English

  • Data

    • Access to business data, within the user’s permissions

  • Tools

    • Access to coding tools, functions or an API. Just as we benefit from access to calculators, an LLM works best as a ‘reasoning engine’ which can call lower-level code to reliably process data on its behalf

  • Environment

    • This would be Excel or PowerPoint for Microsoft Copilot, but it does not need to be Microsoft. It could be a CRM system, an ERP, a coding environment, or whatever the source and sink of the processed data should be.
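
To make those four components concrete, here is a minimal, framework-free Python sketch. It is illustrative only: ‘call_llm’ and the two tool functions are hypothetical stand-ins for a real model API and real business systems.

    # A minimal, framework-free sketch of the four agent components.
    # 'call_llm' and both tool functions are hypothetical stand-ins.

    def call_llm(prompt: str) -> str:
        """Stub: a real agent would call GPT-4 or a similar model here."""
        return "[model response to: " + prompt[:40] + "...]"

    def fetch_crm_records(account: str) -> list:
        # Data: business data, within the user's permissions
        return [account + ": last order 2024-03-01"]

    def plot_chart(values: list) -> str:
        # Tools: lower-level code the LLM can call on our behalf
        return "chart.png"

    TOOLS = {"fetch_crm_records": fetch_crm_records, "plot_chart": plot_chart}

    def run_agent(task: str) -> str:
        # LLM: understands and discusses the task in English
        plan = call_llm("Plan the steps to complete this task: " + task)
        # Environment: stdout stands in here for Excel, a CRM or an ERP
        print("Plan:", plan)
        records = TOOLS["fetch_crm_records"]("Acme Ltd")
        return call_llm("Summarise these records: " + str(records))

    print(run_agent("Summarise recent activity for Acme Ltd"))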


We then give the agent a prescribed workflow, a sequence of steps to complete with the tools at hand. For example, for a marketing AI agent...


  1. Search the internet for these competitors [names here]

  2. Summarise the information

  3. What could be better about the summary? Make those improvements.

  4. Compare the competitors on these dimensions [names here]

  5. Plot the comparison on a spider graph
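
One way to encode such a prescribed workflow is as an ordered list of prompt steps, each fed to the agent in turn with the previous output as context. The sketch below is a simplification: ‘call_llm’ is a stub for a real model call, and the step wording is illustrative.

    # Sketch: a prescribed workflow as an ordered list of prompt steps.
    # Each step's output becomes context for the next step.

    WORKFLOW = [
        "Search the internet for these competitors: {competitors}",
        "Summarise the information found.",
        "What could be better about the summary? Make those improvements.",
        "Compare the competitors on these dimensions: {dimensions}",
        "Plot the comparison on a spider graph.",
    ]

    def call_llm(prompt: str, context: str = "") -> str:
        return "[response to: " + prompt[:50] + "]"  # stub for a real model call

    def run_workflow(competitors: str, dimensions: str) -> str:
        context = ""
        for step in WORKFLOW:
            prompt = step.format(competitors=competitors, dimensions=dimensions)
            context = call_llm(prompt, context)  # chain outputs step by step
        return context

    print(run_workflow("AcmeCo, Globex", "price, quality, support"))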


The agents complete the free-text tasks, e.g. summarisation, themselves, just as you would in ChatGPT. They ‘call’ tools such as code or Google to complete tasks like plotting charts, making fair comparisons or searching the internet. They are told which tools they have available and how to ‘call’ a tool.
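
For instance, tool availability is typically advertised to the model as a schema. The sketch below assumes the OpenAI Python client’s function-calling interface (openai 1.x); the ‘plot_spider_graph’ tool itself is hypothetical.

    # Sketch: advertising a tool to the model via a JSON schema, assuming
    # the OpenAI Python client's function-calling interface (openai>=1.0).
    from openai import OpenAI

    client = OpenAI()  # requires OPENAI_API_KEY in the environment

    tools = [{
        "type": "function",
        "function": {
            "name": "plot_spider_graph",  # hypothetical tool
            "description": "Plot a spider (radar) chart comparing competitors.",
            "parameters": {
                "type": "object",
                "properties": {
                    "labels": {"type": "array", "items": {"type": "string"}},
                    "values": {"type": "array", "items": {"type": "number"}},
                },
                "required": ["labels", "values"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Compare AcmeCo and Globex, then plot the result."}],
        tools=tools,  # the model decides when to 'call' the tool
    )
    print(response.choices[0].message.tool_calls)  # arguments for our code to run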


Most current uses of AI agents demand that strict prescribed processes be followed. But the future will belong to AI agents collaborating in teams.




Agents Work Well in Teams, Like People


The simplest team is one where one agent acts as a critic of the other, encouraging the acting agent to reflect on its plan before enacting it. This is really just a prompting strategy, increasing the quality of outputs.
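
A minimal sketch of that actor-and-critic loop, with ‘call_llm’ again standing in for a real model call:

    # Sketch: the simplest two-agent team, an actor and a critic.
    # In practice both roles can share one underlying model.

    def call_llm(prompt: str) -> str:
        return "[response to: " + prompt[:60] + "]"  # stub

    def actor_critic(task: str, rounds: int = 2) -> str:
        draft = call_llm("Complete this task: " + task)
        for _ in range(rounds):
            critique = call_llm("Criticise this draft and list improvements:\n" + draft)
            draft = call_llm("Revise the draft using this critique:\n" + critique
                             + "\nDraft:\n" + draft)
        return draft

    print(actor_critic("Summarise our competitors' pricing"))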


Agents have an important future in more complex teams, because teamwork enables agents from different businesses or departments to collaborate on objectives, each with their own specialist data and tools. Frameworks like Microsoft Research’s AutoGen, CrewAI or LangGraph enable us to structure the workflows for teams, or for agents working in strict process flows.

These workflows can optionally have a ‘human in the loop’ to participate in the agents’ conversation, or to approve the final output. But the agents are also capable of conversing among themselves to get the job done.
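
As a hedged illustration, here is roughly what a two-agent conversation with an optional human in the loop looks like, assuming the classic AutoGen (pyautogen v0.2) API; the configuration values are placeholders, not a working recipe.

    # Sketch: two AutoGen agents with an optional human in the loop,
    # assuming the classic pyautogen (v0.2) API; config values are placeholders.
    from autogen import AssistantAgent, UserProxyAgent

    llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}

    assistant = AssistantAgent("assistant", llm_config=llm_config)
    user_proxy = UserProxyAgent(
        "user_proxy",
        human_input_mode="TERMINATE",  # ask the human only before finishing;
        code_execution_config=False,   # "NEVER" lets the agents converse alone
    )

    user_proxy.initiate_chat(
        assistant,
        message="Compare AcmeCo and Globex on price, quality and support.",
    )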


Large Language Models, like ChatGPT, are productive ‘Reasoning Engines’, within limits.

There are currently limits to what agents can do for us. They can automate tasks, but they cannot automate entire jobs, at least not yet. This is a limitation of the reasoning style of Large Language Models (LLMs).


Currently, LLMs are useful for reasoning only within the confines of their training: they learn rules of thumb, not generalised theories.


They do not simulate the situation facing them in order to plan actions; instead they are ‘gut players’ on autopilot. If you’ve ever been deep in thought whilst driving, then you have experienced being under your own autopilot. It is effective, but it has limits.

This insight about LLMs is better summarised by the title of a research paper…

“Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners”.


Intuition is the more common name for ‘in-context reasoning’. It’s those countless rules of thumb, i.e. experience, that are hard-won and allow us to work quickly. But they can let us down when taken out of context. The Nobel prize-winning psychologist Daniel Kahneman might say LLMs exhibit System 1 (thinking fast, intuition), not System 2 (thinking slow, reasoning).


In our own lives, we do not generalise those intuitive learnings into a universal theory; usually we don’t need to. Therefore, these rules can lead us astray when we attempt to apply them outside the context in which they were learnt. The same is true of LLMs, whose only ‘experience’ of the world is through text.




Good Prompting Is a Genuine Skill


The best ‘reasoning’ from agents is achieved when we ask them to do the right thing. Sounds obvious: words matter. But is instructing ChatGPT actually a skill? I was derisive at first, until I discovered what an enormous difference a well-crafted prompt can make.

The internal network of an LLM resembles the rises and hollows of a golf green. A well-crafted prompt can shape the green in our favour. The prompt does the landscaping, establishing the best context for the ball to reach the hole.


Having no prompting strategy is like putting on a rough green. The ball frustratingly rolls around the hole but not into it.


Build a prompting strategy into your AI conversations and projects. It allows you to capture the range of performance afforded by the prompt itself, as opposed to the data or the model’s underlying ability. Experimenting with prompts is easier than building new data pipelines or models.
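
To make that concrete, compare a bare prompt with a structured one for the same task; the wording below is illustrative rather than a recipe.

    # Illustrative only: a bare prompt versus a structured prompt for one task.

    bare_prompt = "Compare our competitors."

    structured_prompt = """You are a market analyst for a UK retailer.

    Task: compare AcmeCo and Globex on price, quality and support.
    Steps:
    1. List what you know about each competitor.
    2. Score each dimension from 1 to 5, with one sentence of justification.
    3. Present the result as a table.

    If you are unsure of a fact, say so rather than guessing."""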



Today’s AI is the Worst You Will Ever Use


The above advice is valid for the time being. AI researchers are keenly aware of the shortcomings of their Large Language Models, and new models are on the way from all the major companies. When the next generation arrives, probably this year, I’ll have to rewrite this article!


Meanwhile, experiment today knowing there will be greater brains only months down the line.
