THE BRIEFING
Some issues are about one clean theme. This one is more like the field suddenly turning up the volume.
Biohub put $500 million behind virtual biology - the open data layer needed to build predictive models of the cell. Lilly signed a deal worth up to $2.25 billion for AI-designed gene insertion. And the FDA, alongside Massive Bio’s OpenAI collaboration, took aim at the clinical trial bottleneck.
Those are big signals.
But the smaller-looking stories (relatively speaking) matter too. ASI’s Alexandria shows what literature AI agents need to do before scientists can trust them with papers. GeneBench shows why the agents still fail like novices when the data gets messy.
Let’s dive in.
NEWS
Biohub puts $500M behind virtual biology

Credit: Biohub
Biohub launched the Virtual Biology Initiative, committing $500 million over five years to build the open data foundation for predictive models of the human cell. The split: $100 million for external research and $400 million for Biohub’s own data generation, imaging, engineering, and infrastructure work.
Biohub is saying the quiet part out loud: biology needs its very own version of scaling laws. AlphaFold was built on decades of shared sequence and structure data. A useful virtual cell - one that predicts how cells behave in health, disease, and after interventions - will need far larger, richer datasets, generated in years rather than decades.
“Accurate digital representations of the cell could reveal the mechanisms that are responsible for disease, and show how to reverse them,” Alex Rives, Head of Science at Biohub, wrote on X.
And look at the partner list: Allen Institute, Arc Institute, Broad Institute, Wellcome Sanger Institute, Human Cell Atlas, Human Protein Atlas, NVIDIA, and Renaissance Philanthropy. Biohub says the data it generates will be open and freely available.
This lands just a few days after BAIO’s special report on the state of AI x Bio - where we argued that data diversity may be the hidden scaling law. Biohub is now making that thesis institutional: better models will not be enough. Biology needs multimodal data across molecules, cells, tissues, perturbations, space, and time.
Why it matters: Like the Human Genome Project, solving complex diseases through virtual biology may require shared infrastructure at global scale - open data, common tools, coordinated measurement, and serious funders willing to build for years. Biohub is saying the route to ending disease runs through predictive models of the cell - and it is putting real infrastructure behind that claim.
Did you know: CZI’s Virtual Cells Platform already hosts open models, datasets, tools, and benchmarks for virtual cell research. Biohub is hiring.
NEWS
Lilly bets $2.25B on AI-designed gene insertion

Credit: Profluent
Profluent and Eli Lilly signed a multi-program partnership worth up to $2.25 billion to design custom recombinases - enzymes that can insert or rearrange DNA at specific genomic sites. Profluent is an Emeryville, California startup built around foundation models for protein design - AI systems trained on protein sequences to help design new genome-editing tools. Last year, it published OpenCRISPR-1, an AI-generated genome editor, in Nature.
The target in the Lilly partnership is kilobase-scale DNA editing: edits large enough to insert long DNA sequences, sometimes entire genes. Many genetic diseases are not fixed by changing one letter. Some need the broken instruction replaced.
“Kilobase-scale DNA editing remains a holy grail in genetic medicine,” Profluent CEO Ali Madani said in a press release. “We believe only AI can create the designer recombinases needed to precisely target any location in the genome.”
Profluent brings the AI design know-how. Lilly brings the drug-development engine for turning it into a therapy. That is the promise. The hard part is making those designer enzymes specific, efficient, and safe enough for human therapies.
BAIO has tracked Lilly’s AI build-out since Issue 3, from its NVIDIA co-innovation lab to the Prompt-to-Drug framework with Insilico. Issue 12 added Lilly’s $2.75 billion Insilico deal. This is a different layer of the stack: not AI for finding drugs, but AI for designing the machinery that rewrites biology.
Why it matters: Today’s most established gene editors are powerful, but still best at small changes. If AI-designed recombinases can reliably insert large DNA sequences at precise sites, gene editing could reach diseases that need whole genes restored or replaced - not just single letters corrected.
Did you know: Profluent is hiring.
NEWS
For this literature agent, images matter too

Alexandria is looking for clues. Credit: ASI
Scientific papers are not just prose. In biology, the evidence often lives in figures, tables, microscopy images, charts, and multi-panel layouts. Alexandria, a new literature AI agent from startup Applied Scientific Intelligence, built with NVIDIA, is designed around that problem.
Many systems treat a figure like one flat image. Alexandria treats it more like scientific evidence: something the agent can search for, zoom into, and link back to the exact paper and page.
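ASI has not published Alexandria’s internals, but the core idea - a figure that carries its own provenance - is easy to picture. A minimal sketch in Python, with every field and function name hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of "figure as searchable evidence". ASI has not
# published Alexandria's schema; all names here are illustrative only.

@dataclass
class FigureEvidence:
    paper_doi: str      # provenance: the exact paper the figure came from
    page: int           # ...and the exact page
    panel: str          # multi-panel figures indexed per panel, e.g. "Fig 2b"
    caption: str        # caption text, searchable alongside the image
    image_path: str     # full-resolution crop the agent can zoom into
    embedding: list[float] = field(default_factory=list)  # for similarity search

def cite(ev: FigureEvidence) -> str:
    """Link a retrieved panel back to its source, so claims stay checkable."""
    return f"{ev.paper_doi}, p.{ev.page}, panel {ev.panel}"
```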
“Most AI models see biology as a single signal. But cells speak in RNA, proteins, metabolites, structure, and clinical outcomes - all at once,” ASI CEO Aakaash Meduri told BAIO.
“We want to build the missing knowledge layer that finally brings every mode of biological truth together.”
ASI says Alexandria scores 62.5% on FigQA2, a benchmark for scientific figure understanding, ahead of Edison’s 58.1% and o3 Deep Research at 29.4%.
But the more interesting result may be where it still fails. In 30 of the 33 cases Alexandria got wrong on every run, the agent actually found the right paper, passage, and figure. The system knew where to look. The vision-language model (VLM), however, still misread the image. That suggests that as VLMs get better at reading crowded scientific plots, Alexandria should get better with them.
Why it matters: Literature agents are becoming part of the research workflow, but biology papers are built from evidence that is visual, scattered, and easy to misread. Alexandria’s failure analysis shows two bottlenecks coming apart: finding the right evidence and understanding it. The first part is starting to look solvable. The second now seems to depend on how fast vision-language models improve.
Did you know: ASI is taking early-access signups for Alexandria.
NEWS
AI agents still struggle with scientific judgment
OpenAI and Herasight released GeneBench, a new benchmark that asks AI agents to do realistic genomics analysis instead of answering isolated questions or working through carefully prepared exercises.
The benchmark includes 103 problems across 10 domains, from statistical genetics and functional genomics to cancer genomics, proteomics, clinical genetics, and epigenomics. Each task starts with messy data and minimal guidance. The agent has to clean the data, spot problems, choose the right analysis, revise when the first plan breaks, and produce a final answer that could inform a scientific or translational decision.
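GeneBench’s actual harness is not public in this form, but the loop it describes is worth making concrete. A hedged sketch, with all interfaces invented for illustration:

```python
# Hypothetical sketch of a GeneBench-style task cycle. The agent and task
# interfaces below are illustrative, not the benchmark's real harness.

def run_task(agent, task):
    data = task.load_raw_data()                 # messy input: QC problems left in on purpose
    plan = agent.propose_analysis(task.prompt, data)
    result = plan.execute(data)
    for _ in range(task.max_revisions):
        issues = agent.inspect(result)          # e.g. batch effects, measurement error
        if not issues:
            break
        plan = agent.revise(plan, issues)       # the reframing step models often skip
        result = plan.execute(data)
    return task.grade(agent.finalize(result))   # pass/fail against a verifiable answer
```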
The scores are sobering. GPT-5.5 reaches a 25.0% pass rate at high reasoning settings. GPT-5.5 Pro reaches 33.2%. Gemini 3.1 Pro, the strongest external baseline reported, reaches 11.2%.
The interesting failure mode is almost human. The authors compare it to the difference between experts and novices: experts use experience to reframe the problem as new evidence appears; novices may notice the same clues but fail to integrate them into the bigger picture. GeneBench sees something similar in agents. Models often detect measurement error or other warning signs, then keep going down the wrong analysis path anyway.
BAIO covered Genentech and Roche’s CompBioBench in Issue 16, where agents tackled 100 computational biology tasks with verifiable answers. GeneBench feels like a close cousin: another attempt to move beyond biology trivia and ask whether agents can do the messy work of real analysis. The shared lesson is how they fail - by seeing clues, choosing plausible-but-wrong paths, and not revising when the evidence says they should.
Why it matters: Scientific agents will improve only if the field can measure the right thing. Benchmarks that reward clean answers to clean questions can make models look more capable than they are.
Did you know: OpenAI is hiring.
NEWS
AI comes for the clinical trial bottleneck

Marty Makary, FDA Commissioner, announces the real-time clinical trials pilot.
Clinical trials remain the mother of all bottlenecks: slow, expensive, paperwork-heavy, and still weirdly manual.
Two announcements this week point at that same pressure point. Massive Bio is working with OpenAI to turn trial rules into structured patient-matching systems, so cancer patients can be screened against open studies faster and at larger scale. The FDA is launching a real-time clinical trials pilot, where regulators can monitor aggregated trial data while studies are still running instead of waiting for the final submission package.
The common idea is simple: trials should be less like PDFs passed between humans and more like systems that machines can read, query, and update. That matters because trial failure is not only about bad drugs. Patients fail to enroll. Patient selection is imperfect. Signals arrive late. Protocol complexity eats time.
The AI angle is the plumbing: make trial rules and trial data easier for machines to handle, then use that to match patients, run studies, and monitor results faster.
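Neither Massive Bio nor the FDA has published a schema, but the idea is easy to sketch: eligibility criteria as data instead of PDF prose. A toy example, with all fields and thresholds invented:

```python
from dataclasses import dataclass

# Toy sketch: trial eligibility as structured rules. Field names and
# thresholds are invented for illustration; no published schema exists here.

@dataclass
class Patient:
    age: int
    diagnosis: str
    ecog: int            # performance status, 0 (fully active) to 5
    prior_lines: int     # prior lines of therapy

@dataclass
class TrialCriteria:
    diagnoses: set[str]
    min_age: int
    max_ecog: int
    max_prior_lines: int

    def matches(self, p: Patient) -> bool:
        return (p.diagnosis in self.diagnoses
                and p.age >= self.min_age
                and p.ecog <= self.max_ecog
                and p.prior_lines <= self.max_prior_lines)

# Once criteria are data, one patient can be screened against every open study at once.
trial = TrialCriteria({"NSCLC"}, min_age=18, max_ecog=2, max_prior_lines=2)
print(trial.matches(Patient(age=61, diagnosis="NSCLC", ecog=1, prior_lines=1)))  # True
```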
Why it matters: The standard comeback to the AI x bio revolution is often: “nice molecules, but clinical trials still take years, cost fortunes, and decide what actually reaches patients.” These announcements do not make that bottleneck disappear. But they suggest OpenAI and the FDA are both asking the long-overdue question: can AI speed up not just drug discovery, but also the trials?
Did you know: The FDA’s real-time clinical trial pilot is open for public comment until May 29.
THE EDGE
NVIDIA released Nemotron 3 Nano Omni, an open multimodal model for agents that need to read text, images, audio, video, documents, charts, and interfaces.
ASI used it in Alexandria (see our story above) for fast figure inspection. Try it through Hugging Face, OpenRouter, or build.nvidia.com.
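If you go the OpenRouter route, the API is OpenAI-compatible. A minimal sketch - the model slug below is a guess, so verify the real ID on openrouter.ai first:

```python
from openai import OpenAI

# Minimal sketch of calling the model through OpenRouter's OpenAI-compatible
# endpoint. The model slug is hypothetical; check openrouter.ai for the real one.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)
response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-omni",  # hypothetical slug - verify before use
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does panel B of this figure show?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/figure2.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```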
Until next time,
Peter at BAIO

