
THE BRIEFING

Verification, the dictionary will tell you, is the act of checking whether something is true, accurate or real.

In AI × bio, that little trio comes at a premium. A model can sound right, rank well, or generate something beautiful on screen. But biology has a habit of asking the rude follow-up question: does it actually work?

This issue is full of attempts to answer that question properly.

☑️ Tacit Labs wants realistic evaluations for drug-discovery agents before the clinical trial gives the final answer.

☑️ Radical Numerics is building genome AI for both design and defense, because a model that writes useful DNA also needs to recognize dangerous biological paraphrases.

☑️ Boltz released new design models, but also a protocol for deciding when an AI-designed protein binder has really been confirmed.

☑️ And Noetik says an ordinary pathology slide can help identify which patients are most likely to respond to an immunotherapy combination.

In this issue we also have GenBio’s argument that a virtual cell should be a world model: a simulator with an internal cell state, not just a model that predicts one readout at a time.

Let’s dive in.

NEWS
Radical Numerics raises $50M for genome AI with a defense layer


Radical Numerics emerged from stealth with a $50 million seed round and previewed Omnii, a genome language model built for health and biodefense.

Its CEO and co-founder, Eric Nguyen, helped build Evo and Evo 2 with the rest of Radical’s founding team. BAIO covered Evo 2 in Issue 5, when the DNA model was published in Nature. Think of a genome language model as an LLM for DNA: it learns sequence grammar, then uses it to predict or design new sequences.
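To make the "LLM for DNA" analogy concrete, here is a deliberately tiny stand-in: a k-th order Markov model over nucleotides instead of a transformer. It is a hypothetical illustration, not how Evo 2 or Omnii work internally, and the function names and toy training sequence are our own assumptions. It does show the same two uses: scoring how plausible a sequence looks (predict) and sampling new sequence one base at a time (design).

```python
import math
import random
from collections import defaultdict

BASES = "ACGT"


def train_markov(seqs, k=2):
    """Learn the 'grammar': how often each base follows each k-letter context."""
    counts = defaultdict(lambda: {b: 1 for b in BASES})  # add-one smoothing
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1
    return counts, k


def next_base_probs(model, context):
    """P(next base | previous k bases); unseen contexts fall back to uniform."""
    counts, _ = model
    c = counts.get(context, {b: 1 for b in BASES})
    total = sum(c.values())
    return {b: c[b] / total for b in BASES}


def log_likelihood(model, seq):
    """Predict: how plausible the model finds a sequence (higher = more 'grammatical')."""
    _, k = model
    return sum(math.log(next_base_probs(model, seq[i - k:i])[seq[i]])
               for i in range(k, len(seq)))


def generate(model, prefix, length, rng):
    """Design: extend a prefix by sampling one base at a time."""
    _, k = model
    seq = prefix
    while len(seq) < length:
        probs = next_base_probs(model, seq[-k:])
        seq += rng.choices(BASES, weights=[probs[b] for b in BASES])[0]
    return seq


model = train_markov(["ATGATGATGATGATGATGATG"], k=2)
print(log_likelihood(model, "ATGATGATG") > log_likelihood(model, "AAAAAAAAA"))  # True
print(generate(model, "AT", 12, random.Random(0)))
```

A real genome language model replaces the count table with a neural network conditioned on far longer context, but for autoregressive models like Evo the predict-then-sample loop has the same shape.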

Spectacularly, scientists used Evo 2 to generate complete bacteriophage genomes - viruses that infect bacteria - and 16 of 285 designs killed their target bacteria. Despite that, Nguyen told Fortune the academic work “wasn’t being picked up” the way the team expected, so they decided they had to “show the recipe.”

That recipe is, in a way, the company story. In an AGI House interview, Nguyen said the lesson from Evo was that “DNA was not enough.” Biology has many languages - DNA, RNA, proteins, epigenomics, metabolomics - and Radical wants one system that can learn across them. He also emphasized that science, health and bio are “too important to be a side project” for frontier AI labs built around chatbots, code and image generation.

Compared with earlier genome language models, Radical says Omnii adds three things: more scale, biological information beyond raw DNA, and post-training so the model can be used directly. In this preview, that means DNA plus evolutionary conservation data, a 2 million base-pair context window, and the ability to predict, design and interpret without training a separate model on top. It is not yet the all-omics system Nguyen describes, just the first step.

One early health application is variant interpretation in noncoding DNA. Most of the genome does not make proteins. It helps control when, where and how strongly genes are used, which can make disease variants hard to interpret: there may be no changed protein to inspect. Radical claims Omnii beats specialist tools on noncoding variant-effect benchmarks and recovers experimentally functional Alzheimer’s-linked variants in microglia, the brain’s immune cells.
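Genome language models such as Evo 2 often score a variant zero-shot by comparing how likely the model finds the reference and the alternate sequence. A much older baseline for noncoding variants works in the same spirit with a position weight matrix (PWM): if a variant weakens a transcription-factor binding motif, the motif score drops. The sketch below uses that classical technique, not Omnii's method, and the motif counts and sequences are made-up toy data.

```python
import math

BASES = "ACGT"

# Toy motif for an imaginary transcription factor: base counts at each of
# 4 positions across 10 aligned binding sites (made-up data, consensus ACGT).
MOTIF_COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Per-position log2(P(base | motif) / P(base | background))."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_site_score(pwm, seq):
    """Strongest motif match anywhere in the sequence."""
    w = len(pwm)
    return max(sum(pwm[j][seq[i + j]] for j in range(w))
               for i in range(len(seq) - w + 1))


def variant_effect(pwm, ref_seq, pos, alt):
    """Score change from a single-nucleotide variant (negative = weaker binding site)."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_site_score(pwm, alt_seq) - best_site_score(pwm, ref_seq)


pwm = log_odds_pwm(MOTIF_COUNTS)
ref = "TTACGTTT"                          # contains the consensus site ACGT
print(variant_effect(pwm, ref, 3, "A"))   # C>A inside the site: negative
print(variant_effect(pwm, ref, 0, "G"))   # change outside the site: 0.0
```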

The model can also generate RNA aptamers - short RNA molecules designed to bind a target - in a proof-of-concept design task. Beyond design, the company is exploring Omnii for pancreatic and multi-cancer detection with a diagnostics partner.

Intriguingly, Radical is making a sort of “dual-use-native” argument: the design lab and the defense lab now have to be built together. If genome models become good enough to design useful biological sequences, defense has to be part of the stack from the start.

That is where Omnii’s detection work comes in. Radical points out that today’s screening mostly asks whether a sequence looks like something already known. But biology allows paraphrases: sequences that keep a risky function while changing the letters enough to evade ordinary matching.
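Why can a changed spelling slip past a letter-matching screen? A minimal sketch, using the simplest DNA-level paraphrase: synonymous codon recoding, which changes the letters but not the encoded protein. The ungapped identity measure, the 80% threshold and both sequences are illustrative assumptions; real screening pipelines use alignment and typically also compare translated proteins, which would catch this particular trick. Radical's worry is the harder case, where the protein sequence itself drifts while the function is kept.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, and so on to GGG.
CODON_BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(CODON_BASES)
    for j, b in enumerate(CODON_BASES)
    for k, c in enumerate(CODON_BASES)
}


def translate(dna):
    """Translate an in-frame DNA sequence into a protein string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(x, y):
    """Fraction of matching positions (equal-length, ungapped comparison)."""
    assert len(x) == len(y)
    return sum(a == b for a, b in zip(x, y)) / len(x)


def flags_as_known(query, reference, threshold=0.8):
    """Toy screen: flag a query that closely matches a known sequence of concern."""
    return identity(query, reference) >= threshold


reference = "ATGGCTAAAGGTCTGGAATTCCGT"   # toy ORF standing in for a "known" sequence
paraphrase = "ATGGCGAAGGGCTTAGAGTTTCGA"  # synonymous codons: same protein, new letters

print(translate(reference), translate(paraphrase))  # MAKGLEFR MAKGLEFR
print(round(identity(reference, paraphrase), 3))    # 0.667
print(flags_as_known(paraphrase, reference))        # False
```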

Their cleanest example is natural, not engineered. Dengue and chikungunya envelope glycoproteins share only 19% sequence identity, yet converge on a similar fold. Radical’s worry is that generative models could produce the adversarial version: sequences that preserve dangerous function while drifting away from known references. The company says Omnii can flag pathogen-associated paraphrases and is being piloted with a U.S. national lab for pathogen detection.

Why it matters: This is generative genomics becoming a company. Radical is betting that genome AI moves from reading DNA to designing and interpreting biological systems - and that the safety layer has to be built in.

Did you know? Radical Numerics is accepting early-access requests for both Omnii for health and Omnii for defense, and is hiring.

NEWS
Tacit Labs wants to give AI agents for drug discovery a long-horizon verifier loop


Here’s something biology lacks: fast, dense feedback loops for AI agents. Tacit Labs, a new applied research lab, wants to build the verifier layer between toy benchmarks and clinical trials.

Code has compilers and tests. Math has formal proof systems. Drug development has clinical trials - the strongest verifier, but one that takes around 10 years, costs about $1 billion, and still fails most programs that reach the clinic.

According to Tacit, agentic drug discovery needs intermediate tests that are faster than trials but closer to real drug development than toy benchmarks. It needs proper long-horizon evaluations.

Those evaluations would test whether agents can make linked decisions across the chain that leads to a drug - target, molecule, assay, biomarker, patient group and development path. Tacit says it will combine human task design with biological foundation models, raw experimental readouts and lab infrastructure. “We bring,” Tacit writes, “these simulated evaluations as close to reality as possible.”
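What separates a long-horizon evaluation from a one-shot benchmark can be sketched as a loop in which every decision is judged in light of the earlier ones. This is a hypothetical harness, not Tacit's design: the stage names come from the chain above, while `run_episode`, the agent and verifier signatures and the all-or-nothing scoring are our assumptions. In practice the verifiers would be the things Tacit lists (foundation models, raw experimental readouts, lab infrastructure) rather than toy lambdas.

```python
from dataclasses import dataclass
from typing import Callable

# Stage names follow the decision chain described above; everything else is hypothetical.
STAGES = ["target", "molecule", "assay", "biomarker", "patient_group", "development_path"]

Agent = Callable[[str, dict], str]       # (stage, earlier decisions) -> decision
Verifier = Callable[[str, dict], bool]   # (decision, earlier decisions) -> passed?


@dataclass
class StepResult:
    stage: str
    decision: str
    passed: bool


def run_episode(agent: Agent, verifiers: dict[str, Verifier]) -> list[StepResult]:
    """One long-horizon episode: each decision is checked against the earlier ones."""
    context: dict[str, str] = {}
    results = []
    for stage in STAGES:
        decision = agent(stage, context)
        passed = verifiers[stage](decision, context)
        results.append(StepResult(stage, decision, passed))
        context[stage] = decision  # later stages are judged against this choice
    return results


def episode_success(results: list[StepResult]) -> bool:
    """The chain only counts as a success if every linked decision holds up."""
    return all(r.passed for r in results)


def toy_agent(stage: str, context: dict) -> str:
    return f"{stage}-choice"


# Toy demo: everything passes except an assay check that depends on the molecule choice.
verifiers = {stage: (lambda d, ctx: True) for stage in STAGES}
verifiers["assay"] = lambda d, ctx: ctx.get("molecule") == "small-molecule"
results = run_episode(toy_agent, verifiers)
print([r.passed for r in results])  # [True, True, False, True, True, True]
print(episode_success(results))     # False
```

All-or-nothing scoring is a deliberate choice here: a good assay plan can't rescue the wrong target. It also shows why long horizons are hard. If each of the six decisions were independently right 90% of the time, the whole chain would succeed only about 53% of the time (0.9^6 ≈ 0.53).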

Tacit just introduced itself with a paper co-authored with OpenAI. LifeSciBench is an early example of the direction: a benchmark designed to test whether models can handle realistic life-science work, not just biology trivia.

It asks models to reason through messy evidence: figures, documents, sequences, structures, experiment design, translational risk and uncertainty. GPT-Rosalind led the benchmark, but passed only 36.1% of tasks.

Tacit’s longer vision is much bigger. In its launch post, the company imagines one-person biotechs: a person specifies a disease area or therapeutic hypothesis, then spins up agents to explore possible paths.

Why it matters: If AI agents are going to do more than solve isolated biology tasks, they need feedback before the clinical trial tells everyone they were wrong. Long-horizon evaluations could become the rehearsal space.

Did you know? Tacit Labs is hiring.

NEWS
Noetik predicts immunotherapy responders from routine slides


A pathology foundation model can identify patients more likely to respond to a specific immunotherapy combination from routine pretreatment biopsy slides, according to a new white paper from Noetik.

The South San Francisco startup builds AI models for cancer biology and drug discovery. Its model, TARIO-2, was trained on paired ordinary pathology slides and spatial transcriptomics data - maps showing where genes are active inside tumor tissue. At inference time, it only needs the kind of slide hospitals already collect, then infers a richer molecular picture of the tumor microenvironment.

In the retrospective white paper, Noetik tested it on 113 patients from a phase 1b trial of BOT+BAL: botensilimab plus balstilimab, two immune checkpoint antibodies. The strongest result was in a hard-to-treat form of metastatic colorectal cancer. TARIO-2 ranked patients by how likely they looked to benefit. Among the top 30%, 64% responded to the drug combination; among the rest, only 9% did. The selected group also lived longer: too few had died by the data cutoff to calculate a median survival time, while the median in the remaining patients was 13.3 months.
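The top-30% comparison is a simple stratification step: rank patients by the model's score, cut the list, and compare response rates on each side. A minimal sketch with made-up scores and outcomes for ten hypothetical patients; it is not Noetik's data or code, and the function names are assumptions.

```python
def response_rate(responded, idx):
    """Share of the selected patients who responded."""
    return sum(responded[i] for i in idx) / len(idx)


def stratify(scores, responded, top_fraction=0.3):
    """Rank patients by predicted benefit, then compare response in the top slice vs the rest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_top = round(top_fraction * len(scores))
    top, rest = order[:n_top], order[n_top:]
    return response_rate(responded, top), response_rate(responded, rest)


# Ten made-up patients: model score and whether they responded (1) or not (0).
scores = [0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
responded = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
top_rate, rest_rate = stratify(scores, responded)
print(f"top 30%: {top_rate:.0%} responded, rest: {rest_rate:.0%}")  # top 30%: 67% responded, rest: 14%
```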

You might recall we’ve covered similar stories before, like Microsoft’s GigaTIME, which uses pathology slides to approximate expensive tumor-immune imaging, and a Technion-led team that estimated the result of a breast-cancer genomic test from a slide.

Why it matters: If this holds up prospectively, ordinary biopsy slides could help trial teams choose patients more likely to benefit from a drug, rather than testing it in a broad group where only a small subset can respond. For now, though, the evidence is still preliminary: retrospective, company-led, and based on 113 treated patients.

Did you know? Noetik is hiring.

NEWS
Boltz goes fast, cheap and very wide

Boltz released two design pipelines and an API for running its models in drug-discovery workflows.

Boltz is behind Boltz-2, the open-source structure-and-affinity model BAIO has mentioned several times before. The release adds BoltzMol-1 for small-molecule hit discovery and BoltzProt-1 for protein binder design.

With its API, Boltz is aiming for speed, low cost and “the widest distribution of frontier biomolecular models yet through partnerships spanning nearly every major platform and agent in the field”. Predictions start at $0.025, and the API comes with Python and JavaScript SDKs plus support for Claude Code, Codex and Gemini CLI. Boltz is also plugging the models into platforms many scientists already use, including Benchling, Phylo, Amazon Bio Discovery, Tamarind and Cultivarium.

As for the new models, Boltz says the early lab results are promising. BoltzMol-1 found confirmed hits for six of ten difficult protein targets after testing only a few dozen candidates for each. BoltzProt-1 improved on BoltzGen, the team’s earlier binder-design model, in tests where it designed nanobodies - small antibody-like proteins - and measured how many bound the intended target.

Alongside the model release, Boltz published an evaluation protocol for AI-designed binders. It is meant to standardize how labs report whether a designed protein actually binds its intended target. The protocol draws a hard line between an early “screening hit” and a “confirmed binder”: messy assay signals are not enough to call something a confirmed binder. Boltz says teams should report the raw binding curves, sequences, assay setup, valency, and controls needed for others to judge what happened.
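The difference between a one-point signal and a confirmed binder can be made concrete with a titration curve. The sketch below fits a simple 1:1 binding model to raw curve data and only returns "confirmed binder" when the report is complete and the curve is clean. The required field names mirror the items Boltz lists, but the 1:1 model, the R² and fold-over-control thresholds and the two labels are our assumptions, not the criteria in Boltz's protocol.

```python
REQUIRED_FIELDS = {"sequence", "raw_curve", "assay", "valency", "negative_control"}


def fit_one_to_one(conc, signal):
    """Fit signal = bmax * c / (kd + c): grid search over kd, least-squares bmax."""
    best = None
    for step in range(-30, 31):                 # kd from 1e-3 to 1e3 (units of conc)
        kd = 10 ** (step / 10)
        shape = [c / (kd + c) for c in conc]
        bmax = sum(s * y for s, y in zip(shape, signal)) / sum(s * s for s in shape)
        sse = sum((bmax * s - y) ** 2 for s, y in zip(shape, signal))
        if best is None or sse < best[2]:
            best = (kd, bmax, sse)
    kd, bmax, sse = best
    mean = sum(signal) / len(signal)
    sst = sum((y - mean) ** 2 for y in signal)
    r2 = 1 - sse / sst if sst > 0 else 0.0
    return kd, bmax, r2


def classify(report, min_r2=0.95, min_fold_over_control=3.0):
    """Hypothetical rule: 'confirmed binder' needs a full report and a clean titration."""
    if not REQUIRED_FIELDS.issubset(report):
        return "screening hit"
    conc, signal = report["raw_curve"]
    _, bmax, r2 = fit_one_to_one(conc, signal)
    clean = r2 >= min_r2 and bmax >= min_fold_over_control * report["negative_control"]
    return "confirmed binder" if clean else "screening hit"


conc = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]        # toy titration series
report = {
    "sequence": "ACDEFGHIK",                        # placeholder, not a real binder
    "raw_curve": (conc, [100 * c / (1 + c) for c in conc]),
    "assay": "toy titration",
    "valency": "monovalent",
    "negative_control": 5.0,
}
print(classify(report))                            # confirmed binder
messy = dict(report, raw_curve=(conc, [5, 40, 3, 60, 2, 50, 4, 55]))
print(classify(messy))                             # screening hit
```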

Why it matters: Boltz is making biomolecular design faster, cheaper and more widely available. But the evaluation protocol is a reminder that speed and distribution can’t come at the cost of evidence.

Did you know? Boltz is hosting launch events for the new models, API and integrations in Boston (June 18th), San Francisco (June 22nd) and London (July 2nd).

5 THINGS I LEARNED
Virtual cells as world models - to usher in the “industrial age of biology”


The term virtual cell is used pretty loosely these days. Right now, almost any model that predicts how a cell responds to a perturbation can get pulled under the label. Eric Xing and Le Song - GenBio AI’s chief scientist and CTO - have had it with that. It’s time to raise the bar.

In a new GenBio paper, they argue that a real virtual cell should be closer to a world model for biology: a simulator you can act on, roll forward, and use to explore possible cellular futures. The vision is daring: the paper says this could eventually let us bring “biology closer to an industrial age of predictive design and programmable systems.”

And GenBio is not watching this debate from the sidelines. The company is building AIDO, an “AI-driven digital organism”, and the paper reads like a blueprint for what that system is supposed to become.

Here are 5 things I learned:

1 Prediction is only the first step.

Most virtual-cell models today answer a narrow question: if I knock out this gene, what happens to gene expression? Xing and Song argue that this is useful, but too limited. A real virtual cell should let you apply an action - a gene edit, a drug, an environmental change, or a sequence of interventions - and simulate how the cell changes across time.

2 A world model has to keep a cell state.

RNA-seq is not the cell. It is one readout from the cell. A virtual cell world model needs an internal state that can explain many readouts at once: gene activity, protein structure, localization, interactions, function and morphology. If those outputs disagree with each other, you do not have a coherent simulated cell.

3 The missing data.

The authors split the data challenge into three parts. First, you need multimodal data to learn what cellular states look like. Second, you need perturbation-response data to learn how interventions move a cell from one state to another. Third, you need temporal data to learn how trajectories unfold. Biology has many single-modality snapshots, but far less paired, multimodal and time-resolved data showing cells changing under controlled interventions.

4 AI may help build the models.

The paper points to VCHarness, which BAIO covered in Issue 17, as one possible path: use AI agents to assemble, test and improve virtual-cell models from existing foundation models. The reason is practical. Manually building separate models for every modality and dataset does not scale. But the authors also draw a line: stitching models together is not enough if their outputs are not linked by one coherent biological state.

5 The two axes.

The authors use two axes to show how virtual cells could grow beyond a single generic cell model. One axis is diversity: virtual cell banks that represent cell types, developmental stages, disease states, genotypes and eventually individualized cellular states. The other axis is composition: virtual cells connected into larger systems - tissues, organs and eventually digital organisms. The reason is simple: biology does not happen inside one isolated cell. A useful virtual cell has to scale across biological variation and up into the systems where cells actually live.
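The gap between a one-shot perturbation predictor and a world model can be sketched in a few lines. This toy linear model is our own illustration, not GenBio's AIDO or anything from the paper: the two-number state, the linear dynamics and the "expression" and "morphology" decoders are assumptions. What it does show is the structure from points 1 and 2: interventions applied over time, and several readouts decoded from one shared internal state, so they cannot drift out of sync.

```python
from dataclasses import dataclass

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class ToyCellWorldModel:
    """Hypothetical linear world model: one latent cell state, many readouts."""
    transition: Matrix           # how the state evolves on its own between steps
    action_effect: Matrix        # how an intervention (e.g. an edit or a drug) pushes the state
    readouts: dict[str, Matrix]  # decoders from the shared state to each modality

    def step(self, state: Vector, action: Vector) -> Vector:
        drift = matvec(self.transition, state)
        push = matvec(self.action_effect, action)
        return [d + p for d, p in zip(drift, push)]

    def observe(self, state: Vector) -> dict[str, Vector]:
        # Every modality is decoded from the same state, so outputs stay consistent.
        return {name: matvec(m, state) for name, m in self.readouts.items()}

    def rollout(self, state: Vector, actions: list[Vector]) -> list[dict[str, Vector]]:
        """Apply a sequence of interventions and record every readout at each step."""
        trajectory = []
        for action in actions:
            state = self.step(state, action)
            trajectory.append(self.observe(state))
        return trajectory


cell = ToyCellWorldModel(
    transition=[[0.5, 0.0], [0.0, 0.5]],
    action_effect=[[1.0, 0.0], [0.0, 1.0]],
    readouts={"expression": [[1.0, 0.0], [0.0, 1.0]], "morphology": [[1.0, 1.0]]},
)
for t, obs in enumerate(cell.rollout([0.0, 0.0], [[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]]), 1):
    print(t, obs)
```

A real virtual cell would learn these maps from the multimodal, perturbation and time-resolved data described in point 3, with a nonlinear model and a far richer state.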

THE EDGE

NVIDIA published BioNeMo Recipes for fine-tuning biology foundation models with LoRA - training small adapters instead of the whole model. The examples cover ESM2-3B for protein secondary structure and Evo2-1B for splice-site classification. Useful if you want to adapt a protein or DNA model without rebuilding the full training stack. Code and docs are available in BioNeMo Recipes.
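For readers new to LoRA, the idea fits in a few lines: keep the pretrained weight frozen and learn a low-rank correction on top of it. This is a generic, framework-free sketch of the technique, not BioNeMo Recipes code; the class name, initialization scale and `alpha` default are assumptions. Following the original LoRA paper, `A` starts small and random and `B` starts at zero, so fine-tuning begins from the unchanged base model. The training loop itself is omitted.

```python
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """A frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=16.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                                   # frozen during fine-tuning
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]          # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))             # only A and B would be trained
        return [b + self.scale * u for b, u in zip(base, update)]


def trainable_fraction(d_out, d_in, rank):
    """Share of this layer's parameters that LoRA trains: rank * (d_in + d_out) / (d_in * d_out)."""
    return rank * (d_in + d_out) / (d_in * d_out)


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1)
print(layer.forward([3.0, 4.0]))               # [3.0, 4.0]: B starts at zero
print(trainable_fraction(1024, 1024, rank=8))  # 0.015625
```

For an illustrative 1024×1024 layer at rank 8, only about 1.6% of the parameters are trained, which is why adapters can be fine-tuned without rebuilding the full training stack.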


Until next time,
Peter at BAIO
