
How Did Google's AI Agent Achieve Months of Work in 2 Days?

Updated: Mar 6


Recent months have seen OpenAI, Perplexity and Google launch "deep research" agents that, given an objective, will search and analyze hundreds of sources to produce evidenced findings in minutes. These agents employ the compute and search resources of hyperscalers to produce a service of remarkable value compared with carrying out these tasks by hand.


In February 2025, Google launched their AI Co-Scientist, a further leap in the scale of compute and data available to agentic AI. The AI Co-Scientist is a team of agents collaborating to answer a query over hours or days. In one example it accomplished in two days what human researchers took months to discover.


This demonstrates that agentic AI is approaching expert levels of contribution to real work. Google published a paper on the AI Co-Scientist which provides our first glimpse into how these "super agents" create substantive value, with fascinating lessons for business.



Source: OpenAI's DeepResearch

What is 'Substantive Value' in Agentic AI?


Before diving deeper, let's clarify what we mean by "substantive value" in agentic AI. Over the past couple of years agentic AI has been employed for process improvements:


  • Increased efficiency

    • Automating routine tasks and streamlining workflows

  • Improved customer experience

    • Personalizing interactions and fulfilling customer needs


With the AI Co-Scientist we are now seeing agentic AI with use cases in:


  • Enhanced decision-making

    • Providing data-driven insights and predictions

  • Novel insights and discoveries

    • Enabling breakthroughs in science and technology


We use 'substantive' to mean value that goes beyond the gains of process improvements to create outcomes that meaningfully advance knowledge, solve complex problems, or generate novel insights that would typically require high-level human expertise.


Google's Co-Scientist demonstrates such substantive value. In the research paper, its conclusions on bacterial gene transfer mechanisms matched previously unpublished findings by PhD-level researchers.


The Anatomy of Value Creation in Agentic AI


The AI Co-Scientist is a multi-agent system, a powerful approach to orchestrating LLMs which we have discussed many times on this blog.

A Case Study in Agent Architecture


Let's dive a little deeper into how the AI Co-Scientist works. The AI Co-Scientist is an advanced implementation of multi-agent architecture, with specialized agents handling different aspects of the research process.


The table below lists each agent and its tasks. It's a long table! (Click to expand.)


The important thing to note is that each agent effectively has a job description. The instructions behind each subtask, not detailed here, comprise substantial prompts.
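To make the "job description" idea concrete, here is a minimal sketch of how such a roster might be represented in code. The `AgentSpec` class, its fields, and the placeholder prompt are hypothetical names chosen for illustration; only the Reflection and Ranking agents are taken from this post, and the real prompts are far longer than anything shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """A 'job description' for one agent in a multi-agent team (hypothetical)."""
    name: str
    responsibility: str
    # Placeholder only: the real prompts are substantial and not reproduced here.
    system_prompt: str = "<detailed instructions go here>"


# Only the Reflection and Ranking agents are named in this post;
# the one-line responsibilities paraphrase the text.
TEAM = [
    AgentSpec("Reflection", "Cross-check findings against external sources"),
    AgentSpec("Ranking", "Run tournaments between competing hypotheses"),
]


def describe(team: list[AgentSpec]) -> str:
    """Render the team as a plain-text roster, one agent per line."""
    return "\n".join(f"{a.name} Agent: {a.responsibility}" for a in team)


if __name__ == "__main__":
    print(describe(TEAM))
```

Keeping each role as data rather than burying it in code makes it easy to review and revise the job descriptions independently of the orchestration logic.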



Recurring Patterns


Analyzing how these agents co-operate reveals a handful of recurring patterns:


  1. Iterative Refinement

    1. Multiple agents build upon each other's work through progressive stages, enabling deeper exploration but consuming significant compute.


  2. Rigorous Verification

    1. The Reflection Agent cross-checks findings with external sources to ensure accuracy, requiring specialist data access and advanced search capabilities.


  3. True Agency

    1. Agents autonomously coordinate their efforts, independently determining which tasks need attention and how to approach them.

    2. We don't need to micromanage or anticipate every possible situation in advance. Instead, we permit intelligent tools to resolve issues, and verify the outcomes afterwards.


  4. Competitive Evaluation

    1. The Ranking Agent runs "tournaments" between competing hypotheses, scoring them on logical strength to identify the most robust insights.
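The competitive evaluation pattern can be sketched in a few lines. This post does not say how tournament results are turned into a ranking, so the example below assumes an Elo-style rating update, a common choice for pairwise comparisons. The `judge` callback, the starting rating of 1200, and the K-factor of 32 are all illustrative assumptions; in a real system the judge would be an LLM-driven comparison, not a toy function.

```python
import itertools


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_tournament(hypotheses, judge, k=32.0, start=1200.0):
    """Round-robin tournament: every pair of hypotheses is compared once.

    `judge(a, b)` is a hypothetical callback that returns the winner (a or b).
    Returns the hypotheses sorted from highest to lowest rating.
    """
    ratings = {h: start for h in hypotheses}
    for a, b in itertools.combinations(hypotheses, 2):
        exp_a = expected_score(ratings[a], ratings[b])
        score_a = 1.0 if judge(a, b) == a else 0.0
        # Zero-sum update: whatever A gains, B loses.
        delta = k * (score_a - exp_a)
        ratings[a] += delta
        ratings[b] -= delta
    return sorted(ratings, key=ratings.get, reverse=True)


if __name__ == "__main__":
    # Toy judge: pretend the longer statement is the "logically stronger" one.
    def toy_judge(a, b):
        return a if len(a) >= len(b) else b

    print(run_tournament(["H1", "H2 longer", "H3 longest yet"], toy_judge))
```

A rating scheme like this lets new hypotheses join the pool at any time and be compared against a subset of rivals, rather than requiring a full re-ranking on every round.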


The Resources for Substantive Value


The patterns above rely on the following chargeable resources, all of them varieties of compute and search. Together they represent the additional investment required to achieve substantive value:


  • Compute for LLM Inference

    • Immediate LLM responses, used in hypothesis generation, simulated debates, etc.

    • The cost of processing the text entered to the LLM and the final response out


  • Search over web + data repositories

    • Employed in literature exploration, web searches, data extraction, similarity analysis


  • Compute for LLM Reasoning

    • Employed in planning, deep verification, hypothesis tournaments

    • Often called 'test-time' compute, these are the 'thinking' tokens used by the LLM


  • Compute for Evaluation & Simulation

    • Not compute for LLMs, but for traditional analytics. Utilized in tournament results, cluster mapping, and meta-review critiques. Used by the LLM as required.
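As a rough illustration of how these four resources add up, the sketch below totals the cost of a single run. Every name and number in it is an assumption: the `Usage` fields, the `PRICES` table, and the unit rates are invented placeholders, not any vendor's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Resource consumption for one run, split across the four categories."""
    input_tokens: int = 0          # text sent to the LLM
    output_tokens: int = 0         # final responses from the LLM
    reasoning_tokens: int = 0      # 'thinking' (test-time) tokens
    search_calls: int = 0          # web and data repository queries
    eval_cpu_seconds: float = 0.0  # traditional analytics compute


# All prices are invented placeholders, not real rates.
PRICES = {
    "input_per_1k": 0.001,
    "output_per_1k": 0.004,
    "reasoning_per_1k": 0.004,
    "per_search": 0.01,
    "per_cpu_second": 0.0001,
}


def cost_breakdown(u: Usage, p: dict = PRICES) -> dict:
    """Return the cost of each resource category plus the total."""
    parts = {
        "inference": (u.input_tokens * p["input_per_1k"]
                      + u.output_tokens * p["output_per_1k"]) / 1000,
        "search": u.search_calls * p["per_search"],
        "reasoning": u.reasoning_tokens * p["reasoning_per_1k"] / 1000,
        "evaluation": u.eval_cpu_seconds * p["per_cpu_second"],
    }
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    run = Usage(input_tokens=1000, output_tokens=1000,
                reasoning_tokens=2000, search_calls=10, eval_cpu_seconds=100)
    for name, cost in cost_breakdown(run).items():
        print(f"{name}: {cost:.4f}")
```

Tracking the categories separately, rather than as one bill, shows which pattern (for example, long reasoning chains or heavy search) is driving the spend.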


Limitations


Before we get carried away, be aware that Google concede the AI Co-Scientist has limitations:


  1. Information Access Gaps

    • Limited to open-access literature, missing paywalled content

  2. Technical Shortcomings

    • Poor interpretation of visual scientific data and integration with specialized tools

  3. Evaluation Problems

    • Preliminary assessment methods producing outputs below publication standards

  4. Inherited LLM Flaws

    • Factual errors and biases from underlying information sources may not be challenged


And they propose future enhancements:


  • Implement reinforcement learning for better hypothesis generation

  • Integrate images and databases beyond text

  • Develop lab automation integration

  • Create better interfaces for human-AI collaboration


AI Co-Scientist Reimagined for Business


Google's approach could work for many tasks in industry. The limiting factor is likely to be the availability of data to anchor the agent team, to test their hypotheses and ensure they remain on task for long durations. Larger businesses are likely to have this data; smaller businesses may initially be at a disadvantage.


For example, let's consider marketing content creation, already a common use case for AI, but ripe for true AI collaborators. A multi-agent system could research market trends, analyze competitor messaging, draft multiple content variations, and test them through simulated audience reactions. This would allow marketing teams to focus on strategic decisions rather than execution details, while producing more effective, data-driven content.
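The "draft several variations, then test them" step of such a system could look something like the sketch below. It is only an outline under stated assumptions: `best_variation`, `draft`, and `simulate_reaction` are hypothetical names, and in practice the two callbacks would be agents (for example LLM calls) rather than the toy functions used here.

```python
from typing import Callable


def best_variation(
    brief: str,
    draft: Callable[[str, int], list[str]],
    simulate_reaction: Callable[[str], float],
    n_variations: int = 3,
) -> tuple[str, float]:
    """Draft several variations of a brief and keep the best-scoring one.

    `draft` and `simulate_reaction` are hypothetical stand-ins for agents;
    any callables with these shapes will work.
    """
    variations = draft(brief, n_variations)
    scored = [(v, simulate_reaction(v)) for v in variations]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy stand-ins: number the drafts and score each by its final digit.
    def toy_draft(brief: str, n: int) -> list[str]:
        return [f"{brief} v{i}" for i in range(n)]

    def toy_reaction(text: str) -> float:
        return float(text[-1])

    print(best_variation("Spring launch", toy_draft, toy_reaction))
```

Passing the agents in as callables keeps the selection logic testable with cheap stand-ins before any real model calls are wired in.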


Other tasks may also be appropriate, such as project management or HR recruitment and onboarding.


The Transformation of Knowledge Work


What effect would a workplace occupied by such agents have on people?


Even the current generation of AI systems is already reshaping how knowledge workers operate, before agent teams like the AI Co-Scientist are taken into account. A recent Microsoft study found that office work is shifting in three fundamental ways:


  • From information gathering to information verification

  • From problem-solving to AI response integration

  • From task execution to task stewardship


There are two analogies for workers employing such agents:

  • The overseer of a self-driving vehicle, whose skills may atrophy over time

  • The team manager, for whom competition with other managers drives new critical-thinking skills


This subject deserves a blog post of its own. For now we simply note that Microsoft and Google are funding research into these obvious problems.


The Future of Substantive AI Value


The AI Co-Scientist offers a glimpse into how truly valuable AI systems will operate. It favours companies which have, or can acquire, repositories of high quality data relevant to the tasks they wish to automate.


Rather than providing quick, superficial responses, substantive AI combines detailed prompts, iterative refinement, fact-checking, self-orchestration, and competitive evaluation to produce insights that would take humans significantly longer to develop.


As these systems continue to evolve, we can expect them to handle increasingly complex research tasks while human knowledge workers shift toward verification, integration, and stewardship roles. The key challenge for organizations will be designing workflows that drive competition and engage people.


The future belongs not just to organizations that adopt AI, but to those that deliberately architect systems to generate substantive value through thoughtful investment in inference compute, test-time reasoning, knowledge search, and compute for evaluation.

 
 
 
