How Did Google's AI Agent Achieve Months of Work in 2 Days?
- olivermorris83
- Mar 5
- 5 min read
Updated: Mar 6
Recent months have seen OpenAI, Perplexity and Google launch "deep research" agents that, given an objective, will search and analyze hundreds of sources to produce evidenced findings in minutes. These agents employ the compute and search resources of hyperscalers to produce a service of remarkable value compared to carrying out these tasks by hand.
In February 2025, Google launched their AI Co-Scientist, a further leap in the scale of compute and data available to agentic AI. The AI Co-Scientist is a team of agents collaborating to answer a query over hours or days. In one example it accomplished in two days what took human researchers months to discover.
This demonstrates that agentic AI is approaching expert levels of contribution to real work. Google published a paper on the AI Co-Scientist which provides our first glimpse into how these "super agents" create substantive value, with fascinating lessons for business.

What is 'Substantive Value' in Agentic AI?
Before diving deeper, let's clarify what we mean by "substantive value" in agentic AI. Over the past couple of years agentic AI has been employed for process improvements:
Increased efficiency
Automating routine tasks and streamlining workflows
Improved customer experience
Personalizing interactions and fulfilling customer needs
With the AI Co-Scientist we are now seeing agentic AI with use cases in:
Enhanced decision-making
Providing data-driven insights and predictions
Novel insights and discoveries
Enabling breakthroughs in science and technology
We use 'substantive' to mean value that goes beyond the gains of process improvements to create outcomes that meaningfully advance knowledge, solve complex problems, or generate novel insights that would typically require high-level human expertise.
Google's Co-Scientist demonstrates such substantive value. In the research paper, its conclusions on bacterial gene transfer mechanisms matched previously unpublished findings by PhD-level researchers.
The Anatomy of Value Creation in Agentic AI
The AI Co-Scientist is a multi-agent system, a powerful approach to orchestrating LLMs which we have discussed many times on this blog:

A Case Study in Agent Architecture
Let's dive a little deeper into how the AI Co-Scientist works. The AI Co-Scientist is an advanced implementation of multi-agent architecture, with specialized agents handling different aspects of the research process.
The table below lists each agent and its tasks. It's a long table!
The important thing to note is how each agent effectively has a job description. The instructions behind each subtask, not detailed here, comprise substantial prompts.
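As a rough sketch of this "job description" idea, each agent can be represented as a role prompt combined with its current task before being sent to an LLM. The role names and prompt text below are illustrative assumptions, not taken from Google's paper:

```python
# A minimal sketch of agents as "job descriptions": each agent is a role
# prompt plus a task, composed and dispatched to an LLM.
# All names and prompt wording here are illustrative, not Google's.

AGENT_PROMPTS = {
    "generation": "You generate novel hypotheses for the research goal...",
    "reflection": "You critically review a hypothesis against the literature...",
    "ranking": "You compare two hypotheses and pick the stronger one...",
}

def run_agent(role: str, task: str, llm=None) -> str:
    """Compose the agent's role prompt with its current task and call the LLM."""
    prompt = f"{AGENT_PROMPTS[role]}\n\nTask: {task}"
    if llm is None:
        # Stand-in for a real LLM API call, so the sketch runs as-is.
        return f"[{role}] response to: {task}"
    return llm(prompt)
```

In a production system `llm` would be a client for an actual model endpoint; the point is simply that "hiring" a new agent amounts to writing a new role prompt.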

Recurring Patterns
Analyzing how these agents co-operate reveals a handful of recurring patterns:
Iterative Refinement
Multiple agents build upon each other's work through progressive stages, enabling deeper exploration but consuming significant compute.
Rigorous Verification
The Reflection Agent cross-checks findings with external sources to ensure accuracy, requiring specialist data access and advanced search capabilities.
True Agency
Agents autonomously coordinate their efforts, independently determining which tasks need attention and how to approach them.
We don't need to micromanage or anticipate every possible situation in advance; we permit intelligent tools to resolve issues and verify the outcomes afterwards.
Competitive Evaluation
The Ranking Agent runs "tournaments" between competing hypotheses, scoring them on logical strength to identify the most robust insights.
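A tournament of this kind can be sketched with a standard Elo rating scheme. In the real system each pairwise "match" is judged by an LLM-simulated debate; here the judge is stubbed with a plain comparison function, and all names are illustrative:

```python
import itertools

def elo_update(r_a, r_b, a_wins, k=32):
    """Standard Elo rating update for one pairwise match."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * ((1.0 if a_wins else 0.0) - expected_a)
    return r_a + delta, r_b - delta

def tournament(hypotheses, judge, rounds=3):
    """Rank hypotheses via round-robin matches scored by a pairwise judge.

    `judge(a, b)` returns True if hypothesis a beats hypothesis b; in the
    AI Co-Scientist this role is played by an LLM debate, not a function.
    """
    ratings = {h: 1200.0 for h in hypotheses}
    for _ in range(rounds):
        for a, b in itertools.combinations(hypotheses, 2):
            ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], judge(a, b))
    return sorted(ratings, key=ratings.get, reverse=True)
```

With a toy judge that simply prefers the more detailed hypothesis, the tournament converges on the strongest candidate. Note that every match is itself an LLM call in the real system, which is one reason this pattern consumes so much compute.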
The Resources for Substantive Value
The patterns above rely on the following chargeable resources, all varieties of compute and search, which represent the increased investment required to achieve substantive value:
Compute for LLM Inference
Immediate LLM responses, used in hypothesis generation, simulated debates, etc.
The cost of processing the text entered into the LLM and of the final response out
Search over web + data repositories
Employed in literature exploration, web searches, data extraction, similarity analysis
Compute for LLM Reasoning
Employed in planning, deep verification, hypothesis tournaments
Often called 'test-time' compute, this covers the 'thinking' tokens used by the LLM
Compute for Evaluation & Simulation
Not compute for LLMs, but for traditional analytics. Utilized in tournament results, cluster mapping, and meta-review critiques. Used by the LLM as required.
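To make the investment concrete, the four resource types can be combined into a back-of-envelope cost model. Every price and volume below is a hypothetical assumption for illustration, not a figure from Google or any provider:

```python
# Hypothetical cost model across the four chargeable resource types.
# Quantities and unit prices are illustrative assumptions only.

COSTS = {
    "inference_tokens": (5_000_000, 2e-6),   # (tokens, $ per token) for immediate responses
    "reasoning_tokens": (20_000_000, 6e-6),  # 'test-time' thinking tokens
    "search_queries":   (800, 0.005),        # (queries, $ per query) web + repositories
    "eval_compute_hrs": (12, 0.90),          # (hours, $ per hour) traditional analytics
}

def estimate_cost(costs=COSTS):
    """Sum quantity x unit price over every resource type."""
    return sum(quantity * unit_price for quantity, unit_price in costs.values())
```

The striking feature of such a model is that reasoning ('test-time') tokens tend to dominate once agents debate and verify at length, which is exactly where the substantive value comes from.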
Limitations
Before we get carried away, be aware that Google concede the AI Co-Scientist has limitations:
Information Access Gaps
Limited to open-access literature, missing paywalled content
Technical Shortcomings
Poor interpretation of visual scientific data and integration with specialized tools
Evaluation Problems
Preliminary assessment methods producing outputs below publication standards
Inherited LLM Flaws
Factual errors and biases from underlying information sources may not be challenged
And they propose future enhancements:
Implement reinforcement learning for better hypothesis generation
Integrate images and databases beyond text
Develop lab automation integration
Create better interfaces for human-AI collaboration
AI Co-Scientist Reimagined for Business
Google's approach could work for many tasks in industry. The limiting factor is likely to be the availability of data to anchor the agent team, test their hypotheses, and keep them on task over long durations. Larger businesses are likely to have this data; smaller businesses may initially be at a disadvantage.
For example, let's consider marketing content creation, already a common use case for AI, but ripe for true AI collaborators. A multi-agent system could research market trends, analyze competitor messaging, draft multiple content variations, and test them through simulated audience reactions. This would allow marketing teams to focus on strategic decisions rather than execution details, while producing more effective, data-driven content.
Other tasks may also be appropriate: project management, HR recruitment and onboarding, and so on.
The Transformation of Knowledge Work
What effect would a workplace occupied by such agents have on people?
The current generation of AI systems is already reshaping how knowledge workers operate, even before agent teams like the AI Co-Scientist arrive. A recent Microsoft study found that office work is shifting in three fundamental ways:
From information gathering to information verification
From problem-solving to AI response integration
From task execution to task stewardship
There are two analogies for workers employing such agents:
The overseer of a self-driving vehicle, whose skills may atrophy over time
The team manager, for whom competition with other managers drives new critical-thinking skills
This subject deserves a blog post of its own. For now we simply note that Microsoft and Google are funding research into these obvious problems.
The Future of Substantive AI Value
The AI Co-Scientist offers a glimpse into how truly valuable AI systems will operate. It favours companies which have, or can acquire, repositories of high quality data relevant to the tasks they wish to automate.
Rather than providing quick, superficial responses, substantive AI combines detailed prompts, iterative refinement, fact-checking, self-orchestration, and competitive evaluation to produce insights that would take humans significantly longer to develop.
As these systems continue to evolve, we can expect them to handle increasingly complex research tasks while human knowledge workers shift toward verification, integration, and stewardship roles. The key challenge for organizations will be designing workflows that drive competition and engage people.
The future belongs not just to organizations that adopt AI, but to those that deliberately architect systems to generate substantive value through thoughtful investment in inference compute, test time reasoning, knowledge search, and compute for evaluation.