
Building Teams from AI Agents

You become the manager: you set the objective. Three frameworks — AutoGen, MetaGPT and ChatDev — make this real, with agents planning, writing code in Docker and fixing their errors. But LLMs hallucinate. So which tasks can you trust them with?

In the first blog we considered teams of AI agents and how they could reproduce the increase in productivity that Adam Smith observed in specialised teams of workers.

In the second blog we looked at the four key steps to autonomous agents (Agentic AI) that are required to deliver on that promise.

Now we have the tools, let's look at how to manage teams of agents. We will focus on software development with agent teams, but many other applications are possible.

We, the human user, become the software development manager. We specify the objective, who we want on our team and the collaboration methodology (waterfall / agile / etc). The three leading tools are:

Microsoft AutoGen

“a framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks…seamlessly allowing human participation”

MetaGPT

“provides the entire process of a software company along with carefully orchestrated standard operating procedures”

ChatDev

“a virtual software company that operates through various intelligent agents holding different roles”.

The first task is to establish the team. All the frameworks give you control over what kind of team you need, and even let you employ a team of teams.

Each agent is an LLM instance that adopts a role such as CTO, Programmer or Reviewer. Each role has a description: a prompt setting out its responsibilities. Teams typically share a single team memory of progress and prompts. Agents are capable of planning, using diagrams, before they set down code, as per this example from MetaGPT:

Most diagrams and plans can easily be written by an LLM in code format and presented visually, as above, by tools such as PlantUML or GraphViz.
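To illustrate what "a diagram in code format" can look like, here is a minimal sketch that turns a plan into Graphviz DOT text. The `plan_to_dot` helper, the role names and the hand-off labels are all hypothetical, chosen only for illustration; they are not taken from any of the frameworks.

```python
def plan_to_dot(name, steps):
    """Render (source, target, label) hand-offs as Graphviz DOT text."""
    lines = [f'digraph "{name}" {{', "  rankdir=LR;"]
    for source, target, label in steps:
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical plan of the kind an agent might write down before coding.
plan = [
    ("Product Manager", "Architect", "requirements"),
    ("Architect", "Programmer", "design"),
    ("Programmer", "Tester", "code"),
    ("Tester", "Programmer", "bug reports"),
]

dot_text = plan_to_dot("team_plan", plan)
print(dot_text)
```

Saving the printed text as plan.dot and running `dot -Tpng plan.dot -o plan.png` with Graphviz installed should produce the picture. Because the diagram is just text, an LLM can write it as easily as it writes code.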

Having made their plan, the agents set to work. In the example below, note how they test their code and then collaborate to correct errors (in red):

These team management frameworks can accommodate a human in the loop, as if we were an agent like any other. Hence teams can iteratively make proposals for our feedback and approval.
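To make roles, shared team memory and the human in the loop concrete, here is a minimal sketch in plain Python. It does not use the API of AutoGen, MetaGPT or ChatDev. The `Agent` class, `stub_llm` and `make_human` are hypothetical stand-ins, and the human's reply is scripted so the example runs unattended.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    name: str
    role_prompt: str  # description of the role's responsibilities
    respond: Callable[[str, List[str]], str]  # (role_prompt, memory) -> message

    def act(self, memory):
        return self.respond(self.role_prompt, memory)


def stub_llm(role_prompt, memory):
    """Stand-in for a real LLM call; reports the role and how much context it saw."""
    return f"[{role_prompt}] reply after reading {len(memory)} message(s)"


def make_human(replies):
    """A human participant, scripted here; a real one could use input() instead."""
    queue = iter(replies)

    def respond(role_prompt, memory):
        return next(queue)

    return respond


def run_team(agents, objective, rounds=1):
    """Each agent speaks in turn and appends its message to the shared memory."""
    memory = [f"Manager: {objective}"]
    for _ in range(rounds):
        for agent in agents:
            memory.append(f"{agent.name}: {agent.act(memory)}")
    return memory


team = [
    Agent("CTO", "Propose a design", stub_llm),
    Agent("Programmer", "Write the code", stub_llm),
    Agent("Reviewer", "Review the code", stub_llm),
    Agent("Human", "Approve or request changes", make_human(["Approved"])),
]

transcript = run_team(team, "Build a to-do list app")
for line in transcript:
    print(line)
```

The human is simply another `Agent` whose `respond` function asks a person rather than a model, which is the sense in which we take part "as if we were an agent like any other".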

Conversational Frameworks

We are free to configure any conversational pattern so that the team is optimised for the task. Critically, an agent can be an environment, managing the rules and state of a simulation or game in which the other agents act. For example, an agent can adopt the role of a chess board on which other agents compete.
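As a rough sketch of an agent acting as an environment, the example below uses a much simpler game than chess: a Nim variant in which players take stones from a pile and whoever takes the last stone wins. The `NimEnvironment` class and the two player policies are hypothetical; in a real team the players would be LLM agents rather than one-line functions.

```python
class NimEnvironment:
    """Environment 'agent': owns the game state and enforces the rules."""

    def __init__(self, stones=7, max_take=3):
        self.stones = stones
        self.max_take = max_take
        self.winner = None

    def observe(self):
        """What every player is allowed to see before moving."""
        return {"stones": self.stones, "max_take": self.max_take}

    def step(self, player, take):
        """Validate a move, update the state and record a winner if the pile is empty."""
        if self.winner is not None:
            raise ValueError("game is over")
        if not 1 <= take <= min(self.max_take, self.stones):
            raise ValueError(f"illegal move by {player}: take {take}")
        self.stones -= take
        if self.stones == 0:
            self.winner = player  # taking the last stone wins
        return self.observe()


def greedy_player(obs):
    """Always takes as many stones as the rules allow."""
    return min(obs["max_take"], obs["stones"])


def cautious_player(obs):
    """Always takes a single stone."""
    return 1


def play(env, players):
    """Alternate turns until the environment declares a winner."""
    names = list(players)
    turn = 0
    while env.winner is None:
        name = names[turn % len(names)]
        env.step(name, players[name](env.observe()))
        turn += 1
    return env.winner


env = NimEnvironment(stones=7)
winner = play(env, {"greedy": greedy_player, "cautious": cautious_player})
print(winner)
```

The players never touch the state directly. They only see what `observe` returns and can only act through `step`, so the environment is the single place where the rules are enforced.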

A New Recruitment Industry?

OpenAI are close to opening an ‘AgentStore’, a recruitment site for specialised agents. These agents can then be employed in teams, and the more they are employed, the more they learn.

“If It Ain’t Tested, It’s Broken”

Hallucination and untrustworthiness are correctly cited as serious problems with the generative pretrained transformer (GPT) architecture of today’s LLMs. Meta's Yann LeCun has been particularly vocal about this, and alternatives are under research.

There are many techniques to uncover factual errors or inconsistencies, but all are fallible: there is no algorithm for truth. All of the frameworks feature self-reflection and a tester agent. AutoGen and ChatDev teams execute their own code in Docker, review the errors and rectify them. Where user feedback is required, the LLMs can adopt the personas of various users and review their own application; see RecAgent.
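The execute, review and rectify cycle can be sketched as a simple loop. This is an illustration under stated assumptions, not how any of the frameworks is implemented: the code runs in a plain Python subprocess rather than Docker (a subprocess is not a sandbox), and `stub_fixer` is a hypothetical stand-in for an LLM that is shown the error message and asked for a repair.

```python
import subprocess
import sys


def run_code(source, timeout=10):
    """Run Python source in a separate interpreter process and capture the result.

    Assumption: a plain subprocess is used for brevity; it gives no isolation.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


def stub_fixer(source, error_text):
    """Stand-in for an LLM asked to repair the code given the error text."""
    if "NameError" in error_text:
        return source.replace("totl", "total")
    return source


def write_test_fix(source, fixer, max_attempts=3):
    """Execute the code; on failure, hand the error to the fixer and retry."""
    history = []
    for attempt in range(1, max_attempts + 1):
        returncode, stdout, stderr = run_code(source)
        history.append((attempt, returncode))
        if returncode == 0:
            return source, stdout, history
        source = fixer(source, stderr)
    raise RuntimeError("no passing version within the attempt limit")


# Deliberately buggy program: 'totl' is a typo for 'total'.
buggy = "total = sum([1, 2, 3])\nassert totl == 6\nprint(total)\n"
fixed, output, history = write_test_fix(buggy, stub_fixer)
print(history)
print(output)
```

The attempt limit matters: a fixer that keeps returning broken code would otherwise loop forever, so the loop gives up and raises instead, which is the point at which a human manager would be asked to step in.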

These developments are encouraging, but they assume we present precise SMART objectives rather than vague aspirations. As with ChatGPT, our dialogue is only as enlightening as our question permits, echoing Deep Thought’s answer to life, the universe and everything in Douglas Adams’ The Hitchhiker’s Guide to the Galaxy.

LLM teams can extend beyond coding to advise on any subject: business strategy, agriculture, logistics, law, accounting, medicine and so on. But LLMs are fallible. As with self-driving vehicles, any mission-critical application would need hard evidence of performance that is consistently better than that of humans. To provide it, the teams will need simulation or scenario-testing environments in which to trial their proposals. Agent teams will be trusted only in applications where such trials are possible.

However, not all tasks are so critical. For example, teams can be configured to create content, with writers, illustrators, marketers and SEO specialists co-operating to create and promote material on behalf of resource-strapped businesses.

Want to Know More?

For much more detail and a theoretical basis of agentic AI and teams of agents, see:


This article was first published on Medium on 27 November 2023: How Professional Can Agentic AI Teams Get?
