
Agentico Teams Up with Apart Labs for AI Research

Agentico is committing one day of staff time a week to Apart Lab's AI safety and interpretability research. With models like OpenAI o1 now matching physicians on reasoning, advising on AI has become less like automation engineering and more like recruiting.


We're excited to announce that Agentico is contributing one day of staff time per week to Apart Lab's program of AI safety and interpretability research.

Why engage in research now? Because advising companies on intelligent AI demands more than automation engineering. The latest models, such as OpenAI o1, are so capable that we are effectively recruiters for AI, facing the same questions as when hiring people:

  • What does the agent know about the limits of its own knowledge? Is its confidence well placed?

  • What is the agent thinking below the surface? Is it aligned with us or quietly disruptive?

  • How does the agent prioritise information? Are there hidden biases or blind spots?

These will be unavoidable questions for every business as agents are increasingly entrusted with carrying out tasks. Ever more intelligent agents will get more done, yet they will raise questions about whether they are aligned with our interests. This is especially true when an agent assists with consequential tasks, even those overseen by a human: medicine, banking, accounting, law and so on.

Some may eschew agents altogether, but over the coming years this will become ethically problematic and commercially awkward.

For example, agents already diagnose patients as accurately as many doctors [1,2]. Human doctors remain essential, but who wouldn't want their harried A&E doctor to have the option of a second opinion from a proven AI, one with none of the doctor's competing duties?

There are enormous benefits to be had, but benefits always come with risks. We have both the duty and the opportunity to monitor, and hence control, those risks.

1. OpenAI o1: "Superhuman Performance of a LLM on the Reasoning Tasks of a Physician". Harvard Medical School, Dec 2024.

2. OpenAI GPT-4o: "LLM Influence on Diagnostic Reasoning, A Randomized Clinical Trial". University of Virginia Health System, Nov 2024.
