THE BRIEFING
Brick by brick, the house of AI × bio is being built - sometimes literally.
☑️ Medra opens what it calls America's largest autonomous lab in San Francisco on April 21.
☑️ NVIDIA and partners pushed 1.7 million protein complex predictions into the AlphaFold Database because the computing infrastructure finally caught up.
☑️ Anthropic's (supposedly) scary-good Claude Mythos matched the best human scientists on a real biological design challenge at Dyno Therapeutics.
☑️ And Genentech's CompBioBench offers something the field doesn’t have enough of: a way to rigorously measure whether AI agents can actually do computational biology, with ground truth answers in a domain where ground truth is hard to come by.
But infrastructure only matters if it produces results. Our lead story is one: an Oxford AI that reads the fat around your heart in a routine CT scan and predicts heart failure five years before symptoms appear.
Let's dive in.
AD
The Future of AI in Marketing. Your Shortcut to Smarter, Faster Marketing.
Unlock a focused set of AI strategies built to streamline your work and maximize impact. This guide delivers the practical tactics and tools marketers need to start seeing results right away:
7 high-impact AI strategies to accelerate your marketing performance
Practical use cases for content creation, lead gen, and personalization
Expert insights into how top marketers are using AI today
A framework to evaluate and implement AI tools efficiently
Stay ahead of the curve with these top strategies AI helped develop for marketers, built for real-world results.
NEWS
AI reads heart fat to predict failure five years early
An AI tool developed at the University of Oxford can predict a patient's risk of heart failure at least five years before symptoms appear - using routine cardiac CT scans that are already being performed. The system, developed by a team led by Charalambos Antoniades, professor of cardiovascular medicine at Oxford, detects textural changes in the fat surrounding the heart that are invisible to the human eye, then outputs a risk score with no human input.
Trained on 59,000 patients and validated on a further 13,000 across nine NHS trusts with up to a decade of follow-up, it predicted heart failure with 86% accuracy. Patients in the highest-risk group were 20 times more likely to develop the condition than those in the lowest. Notably, the score has almost no correlation with body weight - this is not just detecting obesity, but a subtler remodeling of heart fat that signals early cardiac distress. The findings have been published in the Journal of the American College of Cardiology (JACC).
Why it matters: Around 350,000 cardiac CT scans are performed each year in the UK to check for blocked arteries. Adding this tool to existing radiology workflows would turn each one into a heart failure early-warning system with no extra imaging and no extra cost. The team is seeking NHS regulatory approval and is also working to adapt the system for any chest CT scan - not just cardiac ones.
Did you know? The tool uses software from Caristo Diagnostics, an Oxford spinout co-founded by Antoniades.
NEWS
Claude Mythos matches the best human biologists on a real sequence design challenge

Dario Amodei in a video released by Anthropic, accompanying the launch of Claude Mythos.
Dyno Therapeutics, the Watertown-based company using AI to solve gene therapy's delivery problem - engineering the viral capsids (protein shells) that carry therapeutic genes into specific tissues - gave Anthropic's Claude Mythos Preview its machine learning hiring challenge. The task: analyze a dataset of synthetic RNA sequences, build models to predict their function, and design new ones. Dyno has used this open-ended take-home to evaluate scientists since 2019. Fifty-seven people have attempted it, many from top PhD programs, and many went on to leadership roles at companies including EvolutionaryScale and Generate Biomedicines.
Mythos Preview exceeded the 75th percentile of those candidates on both prediction and design, and hit the 90th percentile on prediction alone. But the headline finding, in Dyno's own words:
“While still not surpassing the very top human performer in every dimension, the best-performing models are indistinguishable from the best-performing human candidate on the combination of criteria. That is a striking result. For a well-defined, standard biological design problem, models are no longer just assisting the work. They are beginning to reason at a level that routinely outperforms the best human scientists out there.”
Anthropic's system card frames this capability as more than a benchmark win - it treats sequence-to-function modeling as an early indicator of broader biological design capability, and flags that such capability can propagate risk across downstream threat pathways. Dyno endorses this framing, writing that progress “has to be paired with serious evaluation, uncontaminated benchmarks, and grounded ways of measuring what models can actually do.”
Why it matters: This is a general-purpose AI - not one trained on biology specifically - performing at expert level on a real biological design task. And Dyno's blog pushes past the benchmark itself. The argument: a workflow that used to require multiple expert scientists can increasingly be compressed into a single agentic process. The human role could shift from executing each step to defining the right problem, choosing objectives, and deciding what should be built and validated in the real world.
Did you know? Dyno's AI-designed capsids are already in commercial use. On April 8, Astellas licensed a Dyno-engineered capsid for skeletal muscle gene therapy - the second such license after Roche took one for neurological diseases in 2025. Dyno's Psi-Phi protein design tools, which BAIO noted in Issue 10, are available through Claude Code and on Hugging Face.
NEWS
Genentech benchmark shows AI agents can do real computational biology

Agents navigating a biological maze in search of the ground truth, as imagined by Nano Banana 2.
The Dyno result above tested one well-defined design challenge. But how do AI agents handle the messy, varied work computational biologists actually do every day? Researchers at Genentech and Roche built CompBioBench to find out - 100 tasks spanning genomics, transcriptomics, single-cell analysis, human genetics, and machine learning, each with a single verifiable answer.
The tasks aren't textbook exercises. They include detecting contaminating species in RNA sequencing data, spotting sample swaps in axolotl tissue labels, inferring ancestry from personal genetic variants, and optimizing mRNA sequences using published machine learning models. Agents started with a bare-minimum computing environment and only the input files - they had to find, download, and install everything else themselves.
The top scores: OpenAI's Codex CLI hit 83%, Anthropic's Claude Code hit 81%. Without agentic capabilities - just asking the raw language model - accuracy dropped to around 5%. On the hardest problems, Claude Code actually led, 69% to 59%.
The failure patterns described in the preprint are just as revealing as the scores. Agents don't necessarily fail because they lack knowledge - they fail because they get sidetracked by plausible-looking approaches and stop prematurely. In one task detailed in the paper, short DNA sequences could plausibly match multiple locations in a synthetic genome because the chromosomes were deliberately designed to look similar. The correct solution required writing custom code to account for that ambiguity. Claude Code solved it once in three attempts - in the two failures, it assigned each read to its most likely location, got a plausible-looking answer, and stopped. The authors call this “brittleness”: not an inability to solve the problem, but premature stopping after a superficially plausible analysis.
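The multi-mapping trap is easy to sketch. A minimal illustration of the idea - withholding assignment when the best hit barely beats the runner-up, rather than greedily taking the top score - using entirely hypothetical data and function names (this is not CompBioBench code):

```python
# Toy illustration of the multi-mapping pitfall: a greedy "take the best
# alignment score" call looks decisive even when the evidence is ambiguous.
# All reads, chromosomes, and thresholds here are hypothetical.

def assign_reads(alignments, min_margin=5):
    """Assign each read to a chromosome, or mark it ambiguous.

    alignments: {read_id: [(chromosome, score), ...]}
    A read is only confidently assigned when its best hit beats the
    runner-up by at least `min_margin`; otherwise it goes into the
    ambiguous bucket instead of being silently guessed.
    """
    assigned, ambiguous = {}, []
    for read_id, hits in alignments.items():
        hits = sorted(hits, key=lambda h: h[1], reverse=True)
        if len(hits) == 1 or hits[0][1] - hits[1][1] >= min_margin:
            assigned[read_id] = hits[0][0]
        else:
            ambiguous.append(read_id)  # near-identical chromosomes tie
    return assigned, ambiguous

alignments = {
    "read_1": [("chrA", 60), ("chrB", 59)],   # effectively a tie
    "read_2": [("chrA", 60), ("chrB", 40)],   # clear winner
    "read_3": [("chrB", 55)],                 # unique hit
}
assigned, ambiguous = assign_reads(alignments)
print(assigned)   # {'read_2': 'chrA', 'read_3': 'chrB'}
print(ambiguous)  # ['read_1']
```

The greedy failure mode described in the paper is equivalent to setting `min_margin=0`: every read gets an answer, and the ambiguity that mattered disappears from view.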
Why it matters: The scores aren't really the point. Biology has lacked what coding and mathematics have had for years: benchmarks rigorous enough to tell us whether AI systems can genuinely reason about the underlying science. The core difficulty is that real biological analysis is noisy and open to interpretation, making it hard to construct problems with unambiguous answers. CompBioBench solves this through synthetic and augmented data - mixing species reads in specific proportions, swapping tissue labels, scrambling metadata - to create problems that require multi-step reasoning but have a single ground truth.
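Ground truth by construction is the key trick. A minimal sketch of the idea behind one task type - spiking reads from a contaminant species into a sample at a known fraction, so "what is the contamination rate?" has exactly one right answer. All numbers and labels are hypothetical, not taken from the benchmark:

```python
import random

# Hypothetical sketch of synthetic ground-truth construction: plant a
# known contamination rate in the data, then score any analysis against it.

random.seed(0)

def make_contaminated_sample(n_reads=10_000, contamination=0.03):
    """Return per-read species labels with a planted contamination rate."""
    labels = ["mouse" if random.random() < contamination else "human"
              for _ in range(n_reads)]
    random.shuffle(labels)   # hide the structure, keep the answer
    return labels, contamination

reads, true_rate = make_contaminated_sample()

# An agent's multi-step analysis can now be checked against the planted
# answer; here the "analysis" is just counting the contaminant reads.
observed = reads.count("mouse") / len(reads)
assert abs(observed - true_rate) < 0.01
```

The same move - plant the answer, then measure recovery - covers the label swaps and scrambled metadata too.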
NEWS
Medra opens what it calls America's largest autonomous lab

Credit: Medra
Medra, a San Francisco AI and robotics startup, opens its first autonomous laboratory on April 21 - a 38,000-square-foot facility in the city's SoMa district that the company says will be the largest of its kind in the country. The lab, called ML001, will house general-purpose robots that execute experiments end-to-end, interfacing with standard lab instruments while scientists direct workflows in plain English. A companion AI system interprets results and adjusts protocols, creating what Medra calls a closed loop: design experiments, run them, learn from the results, redesign.
In an X post promoting a teaser video for the lab opening, CEO Michelle Lee, who holds a PhD from the Stanford AI Lab, said: “The next industrial revolution isn’t software. It’s science.”
Medra raised a $52 million Series A in December 2025 led by Human Capital, with Lux Capital, Menlo Ventures, and others participating. The company has attracted pharma partners including Genentech. Patrick Hsu, co-founder of Arc Institute, endorsed the approach at the time of the raise, noting that AI models now “generate predictions far faster than they can be validated experimentally,” and that most lab automation is too rigid to keep up.
Why it matters: One recurring bottleneck in AI-driven biology is that someone still has to walk to a bench and actually do the lab work. Predictions are scaling exponentially; physical experiments aren't. Medra is trying to close that gap.
Did you know? Medra is hiring.
NEWS
NVIDIA's GPU muscle powers 1.7 million protein complex predictions into the AlphaFold Database

Credit: NVIDIA
AlphaFold changed biology by predicting the shapes of individual proteins. But proteins rarely work alone. They pair up, forming complexes that carry out most biological functions - and predicting how two proteins fit together is a much harder problem than predicting either one in isolation. A collaboration between NVIDIA, EMBL-EBI, Google DeepMind, and Seoul National University has now added 1.7 million high-confidence predicted protein complexes to the AlphaFold Database, with millions more coming. It is the largest open dataset of protein complex predictions available.
NVIDIA recently published a detailed technical blog describing how they built the pipeline - and why this hadn't been done before. Predicting the shape of a single protein is expensive. Predicting millions of protein pairs is a different order of problem entirely. The team ran the work on clusters of H100 GPUs, used accelerated tools for the most time-consuming steps, and split the pipeline into independent stages to keep GPUs busy rather than sitting idle. EMBL-EBI says the infrastructure overcame limitations that historically made calculations at this scale impractical.
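The stage-splitting idea is a classic producer/consumer pattern: a CPU-bound preparation stage feeds a bounded queue so the expensive compute stage never sits idle waiting for input. A generic sketch of that pattern - an illustration only, not NVIDIA's actual pipeline, with all names hypothetical and the "GPU" stage simulated:

```python
import queue
import threading

# Producer/consumer sketch of pipeline stage decoupling: preprocessing
# runs ahead of (simulated) inference, buffered by a bounded queue.

tasks = queue.Queue(maxsize=8)   # bounded: backpressure on the producer
results = []

def preprocess(pairs):
    """CPU stage: prepare inputs for each protein pair, then signal done."""
    for pair in pairs:
        tasks.put(f"features({pair})")
    tasks.put(None)              # sentinel: no more work

def gpu_stage():
    """Consumer stage: drain the queue; stands in for batched inference."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.append(f"structure<{item}>")

pairs = [f"pair_{i}" for i in range(5)]
producer = threading.Thread(target=preprocess, args=(pairs,))
consumer = threading.Thread(target=gpu_stage)
producer.start(); consumer.start()
producer.join(); consumer.join()
print(len(results))  # 5
```

At real scale the stages run on separate machines or job arrays rather than threads, but the payoff is the same: each stage moves at its own pace, and the accelerator is never the one waiting.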
Why it matters: The AlphaFold Database already has 3.4 million users in 190 countries. This update gives them a resource that didn't exist before - predicted structures for the interfaces where most biological action actually happens.
Did you know? NVIDIA's technical blog doubles as a step-by-step guide for teams wanting to run protein complex predictions at scale, covering pipeline design, GPU orchestration, and accuracy validation. The partnership has calculated 30 million complexes total - the 1.7 million in the database are the high-confidence subset.
THE EDGE
Paperclip lets AI agents search, filter, and read across 8 million-plus scientific papers - directly from the command line. What makes it different from other literature tools: results stick around between queries. An agent can pull 50 papers on a topic, then search within that same set for a specific term, narrowing down without losing context. It can also scan figures using a vision model and run structured queries on metadata like author, date, or journal. Built by Generative Expert Labs (GXL), it installs in one line or plugs into Claude Code as a tool the agent can call.
ON OUR RADAR
Until next time,
Peter at BAIO



