Where's the Frontier in Agentic AI?

UC Berkeley (in the San Francisco Bay Area) held its second Agentic AI Summit in August. The Bay Area's most connected players (Google, OpenAI, etc.) and frontier researchers (Princeton, Berkeley, etc.) gave two days of presentations on where they see the near future of agentic AI.

So what can we learn from them? What is the near future of agentic AI?

There are 18 hours of video to catch up on, so below are my highlights. My favourites are Chi Wang's MassGen, in which agents achieve state-of-the-art intelligence as a democratic team, and the Linux Foundation's AGNTCY, a set of standards for the 'Internet of Agents'.

Sherwin Wu, Head of Engineering OpenAI API

OpenAI's priorities for agents via API...

Richard Socher, Founder You.com

You.com is moving into the 'Context as a Service' space. It is important that he mentions 'simulations': evaluation of agent performance can often be done well in simulation, which Google demonstrated earlier this year with its AI co-scientist.

Michele Catasta, President at Replit

Agents are increasingly able to complete tasks over long time horizons. Think of how DeepResearch works for 10 minutes at a time, whereas ChatGPT 3.5 struggled to retain a conversation. This trend will continue: agents will work on tasks requiring hours of focus, but at multiples of the speed at which humans would execute them.

However, to ensure their success we must provide the right information for the long task, i.e. 'context management'.

Arvind Jain, Founder Glean.com

Businesses are drowning in options for implementing AI, but a strategy for success is harder to come by. I see this everywhere, although it is not surprising for such a new technology.

May Habib, Co Founder & CEO Writer.com

Org charts become dynamic when we have people and agents collaborate on tasks.

This matters when rethinking how organisations will function once they are collaborations between people and AI.

In many ways the orchestration graph is already a reality in companies: it's how people talk with each other. Reporting lines in org charts never captured the real conversation, but agentic workflows must explicitly replicate that flexibility.

Sergey Levine, CoFounder Physical Intelligence, Assoc.Prof UC Berkeley

Sergey talked about training agents for specific jobs using 'reinforcement learning', a technique common in machine learning. However, it requires lots of training examples, and we rarely have such clean data to teach agents how to do an office job. We usually have only the final result, a document or product, not a record of the process that got us there. But we do have proxy data: conversations, emails, etc. Sergey then gave a mechanism for agents to learn from such proxy data and outperform people, despite the data being relatively disorganised. The race is on to collect that data for professional jobs.

Ranjan Sinha, IBM Fellow, CTO, IBM Research AI

Ranjan proposed the NLIP schema, a single message standard for all multi-modal LLMs to use. This would make it much easier for agents running on different LLMs to collaborate.

Chi Wang, Founder of AutoGen/AG2, Senior Researcher at DeepMind

Chi launched MassGen, a multi-agent collaboration tool which takes inspiration from Google’s DeepThink, which recently won IMO Gold. “MassGen is a cutting-edge multi-agent system that leverages the power of collaborative AI to solve complex tasks.”

It also draws on Chi’s earlier article 'Myth of Reasoning', where he argues that reasoning is a messy process which is only formalised and structured in hindsight.

Essentially it's a framework for a mixture of LLMs to discuss their common objective and conduct a series of votes to decide the best solution. The idea is that pooling opinions from multiple unrelated LLMs achieves state-of-the-art intelligence.
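To make the 'propose, then vote' idea concrete, here is a minimal sketch of a single voting round. It is not MassGen's actual API: the function name collaborate, the proposer/voter interfaces and the tie-break rule are all assumptions made for illustration, and real agents would be LLM calls rather than plain functions.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical interfaces (not from MassGen):
#   a proposer maps a task to a candidate answer,
#   a voter maps (task, candidate answers) to the answer it prefers.
Proposer = Callable[[str], str]
Voter = Callable[[str, List[str]], str]


def collaborate(task: str,
                proposers: Dict[str, Proposer],
                voters: Dict[str, Voter]) -> str:
    """Run one propose-then-vote round and return the winning answer."""
    # Step 1: every agent proposes an answer independently.
    candidates = sorted({propose(task) for propose in proposers.values()})
    if not candidates:
        raise ValueError("no proposals were made")

    # Step 2: every agent votes; ballots for unknown answers are ignored.
    ballots = [vote(task, candidates) for vote in voters.values()]
    tally = Counter(b for b in ballots if b in candidates)
    if not tally:
        raise ValueError("no valid votes were cast")

    # Step 3: plurality wins; ties are broken alphabetically (an assumption).
    best = max(tally.values())
    return min(answer for answer, n in tally.items() if n == best)
```

In practice the discussion would run over several rounds, with agents revising their proposals after seeing the others; the sketch keeps only the voting step.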

Papi Menon, CPO Outshift (Cisco), AGNTCY (Linux Foundation)

Outshift and a sister organisation, AGNTCY (a Linux Foundation project), are working towards standards for the ‘Internet of Agents’, encompassing existing agent collaboration standards such as MCP and A2A.

Those are only the beginning for AI agents to collaborate across the entire internet: “The AGNTCY project provides the complete infrastructure stack for agent collaboration—discovery, identity, messaging, and observability that works across any vendor or framework. It is the foundational layer that lets specialized agents find each other, verify capabilities, and work together on complex problems.”

Jay Rodge, Developer Advocate NVIDIA

Jay continued the 'Internet of Agents' / multi-agent collaboration theme with tools for managing and observing teams of agents. This is especially tricky where each agent was created with a different agent framework, such as LangChain or LlamaIndex. Good to see NVIDIA liberating agents from their framework, so they can easily collaborate with any other agent.

Nandi Subhrangshu, Senior Staff Scientist for Applied AI at Amazon

Much of the conference was about evaluations of agents. How do they perform when given ‘Standard Operating Procedures’ (SOPs) to execute? After all, this is how most agents in an enterprise environment will be required to work, yet the SOPs are written for humans.

A simple example is a shipping company classifying whether a product is ‘dangerous goods’ or not. It sounds like there are straightforward rules for such a task, but it turns out that people use fairly nuanced logic to accomplish it.

Nandi presented a method for companies to create benchmarks and test their agents against real industrial SOPs.
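As a rough sketch of what such a benchmark boils down to, the snippet below scores an agent against labelled cases for the dangerous-goods example. It is not the method presented in the talk: the SopCase type, the evaluate function, the keyword-based stand-in agent and the three cases are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SopCase:
    """One benchmark item: a product description and the label a human expert gave it."""
    description: str
    is_dangerous: bool


def evaluate(agent: Callable[[str], bool], cases: List[SopCase]) -> float:
    """Return the fraction of cases where the agent agrees with the expert label."""
    if not cases:
        raise ValueError("benchmark has no cases")
    correct = sum(agent(case.description) == case.is_dangerous for case in cases)
    return correct / len(cases)


def keyword_agent(description: str) -> bool:
    """Stand-in for a real agent: flags a product only if it mentions 'flammable'."""
    return "flammable" in description.lower()


# Made-up cases; a real benchmark would be drawn from the company's own SOPs.
CASES = [
    SopCase("Flammable aerosol spray", True),
    SopCase("Cotton t-shirt", False),
    SopCase("Laptop with lithium battery", True),
]

accuracy = evaluate(keyword_agent, CASES)
```

The naive rule misses the lithium battery case, which is the kind of nuance a benchmark built from real SOPs is meant to surface.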

Xifeng Yan, Professor, University of California, Santa Barbara

Xifeng argues it is too hard for non-technical users to create their own agents. So he presented a simple standard called ADL (Agent Declarative Language) as part of the MICA project.

It uses a declarative language that looks like a list of bullets, whereas software developers currently use a programming language, Python, which is much less accessible.

We all like simple! This is exactly the kind of standard that could get traction.

Sam Rodriques, CEO & CoFounder of FutureHouse

Sam presented a new, more reliable form of agentic deep research (like ChatGPT's research tool), based on a recent open-source scientific paper called PaperQA. He demonstrated how it was used to create a knowledge base on gene function by reviewing a million papers, and it was more accurate than people at this task.

Claimed there is money lying around to be picked up by such research agents simply because the vast research in many subjects has not been synthesised into digestible form. Only the recent invention of agents allows such volumes of knowledge to be organised, and we have only just begun the process.

Bo Pan, Dept. Computer Science Emory University

Can LLMs become faster at reasoning through recurrent exposure to relevant tasks, and if so, how can it be achieved? This belongs to a wider theme at the summit, to develop agents that can learn, but do so simply.

Emory's paper shows that agents with a relatively simple mechanism for remembering what they have learned, ‘in-context learning’, see greater improvements in both efficiency and accuracy than agents trained with supervised fine-tuning. They call their framework ‘SpeedUpLLM’.
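One simple way to picture this kind of memory is a store of previously solved tasks, with the most similar ones placed ahead of each new task in the prompt. The sketch below is not the SpeedUpLLM implementation: the SolvedTaskMemory class, its word-overlap similarity and the prompt layout are assumptions chosen to keep the example small.

```python
from typing import List, Tuple


class SolvedTaskMemory:
    """Hypothetical memory of solved tasks, reused as in-context examples."""

    def __init__(self, max_examples: int = 3) -> None:
        self.max_examples = max_examples
        self.examples: List[Tuple[str, str]] = []

    def remember(self, task: str, solution: str) -> None:
        """Store a finished task with its solution."""
        self.examples.append((task, solution))

    @staticmethod
    def _overlap(a: str, b: str) -> int:
        """Crude similarity: number of shared lowercase words (an assumption)."""
        return len(set(a.lower().split()) & set(b.lower().split()))

    def build_prompt(self, new_task: str) -> str:
        """Put the most similar remembered examples before the new task."""
        ranked = sorted(self.examples,
                        key=lambda ex: self._overlap(ex[0], new_task),
                        reverse=True)
        parts = [f"Task: {task}\nSolution: {solution}"
                 for task, solution in ranked[:self.max_examples]]
        parts.append(f"Task: {new_task}\nSolution:")
        return "\n\n".join(parts)
```

The resulting string would be sent to the LLM as-is, so the model's weights never change; all of the 'learning' lives in which examples are kept and retrieved.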

Aashutosh Nema, Data Science Consultant, Dell Technologies

In “From Hype to Impact”, Aashutosh discussed Dell’s learnings from scaling up agentic solutions, with a good summary of common challenges to adoption.

Michelle Tabart, Principal UX Researcher at Salesforce

Salesforce nicely summarised the concern that agentic projects intended to save time are often tackling poorly defined business problems, where the process is unclear or the data is difficult to find or trust. So where projects are failing to achieve returns, it is common to find that the real human effort needed for the task is also not well understood.

Full videos of Berkeley's Agentic AI Summit - August 2nd 2025

Watch recordings: https://rdi.berkeley.edu/events/agentic-ai-summit
