Steven Salzberg says AI in biology is starting to look like pseudoscience. I’m seeing something else.

Steven Salzberg is a Bloomberg Distinguished Professor at Johns Hopkins with an h-index of 167 (read: world class), genuine contributions to genomics, and a long track record of calling out real pseudoscience - anti-vaxxers, homeopathy - and gain-of-function recklessness.

His recent piece argues that AI foundation models for biology are biologically implausible, largely unfalsifiable, and built backwards - solutions in search of problems. He compares the dynamic to homeopathy and acupuncture.

I’ve been following the AI x biology field for a long time, but since starting BAIO six weeks ago my view has become far more detailed. Across 13 issues, I’ve covered over 60 news stories.

And from where I’m standing, the landscape looks rather different from the one Salzberg describes.

So here's where I think he's right, where the evidence has moved past him, and where the truth is uncomfortable for both sides.

Where Salzberg has a point

The field has a hype problem, and the best people in it know it. A Nature Biotechnology review we covered in Issue 10 is candid that foundation models for single-cell data do not consistently beat simpler baselines on perturbation prediction. When a YC-backed startup released 10,000 AI-designed DNA sequences, Stanford's Anshul Kundaje - one of the field's top computational genomicists - called it “more hype than a serious endeavor” (Issue 8).

Salzberg is likely also right that some groups are building ever-larger models and then looking for problems to solve. And he's probably right that Nature has been too eager to publish extravagant claims.

But let's take a look at what Salzberg is arguing against, and what the evidence actually shows. More often than not, they're not the same thing.

“The notion that you can use DNA sequence alone to predict how genes will behave is biologically implausible.”

Salzberg frames this as an indictment of the whole field. But the models producing the most validated results right now - clinical proteomics, cardiac imaging, perturbed single-cell data, protein design - don't rely on DNA sequence alone.

ProtAIDe-Dx, published in Nature Medicine recently, uses 7,595 protein measurements from blood plasma. X-Cell, Xaira's virtual cell model, is trained on 25.6 million cells that were each experimentally perturbed across seven cellular contexts. EchoJEPA was trained on 18 million echocardiograms. Tripso gives each biological program its own separate representation precisely because compressing a cell into one summary loses the context Salzberg correctly says matters.

“These claims are largely unfalsifiable.”

This was perhaps a reasonable criticism a few years ago. It's harder to sustain now. Evo 2 went from preprint to Nature with experimental proof - AI-designed bacteriophage genomes that killed target bacteria when synthesized and tested in the lab (which we covered in Issue 5). JURA Bio manufactured roughly 10 quadrillion designed sequences and screened 20.9 billion antibody interactions (Issue 10). Latent-Y produced lab-confirmed binders against six of nine targets (Issue 12). Generate Biomedicines took an AI-designed antibody into Phase 3 with 1,600 patients (Issue 4).

These are falsifiable claims that were tested. Sometimes they held up. Sometimes they didn't - Warpspeed's clinical trial forecasting system missed one of five predictions (Issue 13), and Latent-Y's agent designed antibodies that latched onto the inside of a protein where a real antibody could never reach, caught only because a human biologist was in the loop (Issue 12). The failures are reported alongside the successes.

Yes, Salzberg could fairly respond that we're citing the winners. BAIO covers the stories that clear an editorial bar. We don't cover the hundreds of foundation model papers that claim everything and validate nothing. The denominator matters, and it's likely large. But “some claims in the field are unfalsifiable” is a different argument from “the field's claims are largely unfalsifiable.” The latter is no longer true.

“Deep learning scientists are doing this backwards: they have a solution in search of a problem.”

Some surely are. But story after story in BAIO shows researchers starting with specific biological problems and reaching for AI because nothing else works at the required scale. The Lund team behind ProtAIDe-Dx started with a clinical problem - misdiagnosis rates of 25-50% in dementia clinics - and built a model to address it. An MIT team started with the specific problem that codon optimization tools for biologic drug manufacturing are mediocre, trained an LLM on yeast biology, and outperformed all four commercial alternatives (Issue 3).

Salzberg's framing fits a certain kind of AI paper, sure. But it doesn't fit the work where the biology came first.

“None of their predictions have been independently verified.”

Insilico Medicine's rentosertib published Phase 2a results in Nature Medicine. Perimeter's Claire passed a pivotal trial and received FDA approval for use during breast cancer surgery (Issue 6). A deep learning pathology model was validated against one of the largest breast cancer randomized trials ever conducted and published in The Lancet Oncology (Issue 11). ProtAIDe-Dx was validated on a fully held-out external cohort of 1,786 patients. We could keep going.

So: these claims have been externally validated on held-out cohorts, tested in clinical trials, scrutinized by FDA review, and published in peer-reviewed journals. Full independent replication by outside teams remains rare - but as Faisal Mahmood argued in Nature Medicine, that's a benchmarking crisis across all of biomedical machine learning, not a problem unique to these models.

“None of the papers describing DNA models offer any insights into human biology or genetics.”

BioReason-Pro didn't just label protein function - it showed its reasoning, and experts preferred its annotations over curated database entries 79% of the time (Issue 10). Tripso discovered a previously unknown immune cell program linked to eczema, confirmed in tissue (Issue 13). Are these the revolution that some foundation model papers promise? No. Are they “no insights”? Also no. But we’re still very early.

The uncomfortable middle

The honest version of Salzberg's argument is that the ratio of validated results to grandiose claims is likely still too low, that Nature probably is guilty of publishing hype too readily, and that some AI groups don't understand the biology well enough to know when their models are fooling them.

The honest version of the counter-argument is that the field is producing FDA-approved devices, Phase 3 clinical data, wet-lab validated protein designs, and genuine biological discoveries at a pace that has accelerated noticeably even in the few months BAIO has existed.

The real question isn't whether AI produces genuine biological results. It does. The question is how reliably, at what scale, and for which problems. That's a scientific question worth arguing about.

Comparing this to homeopathy and calling it pseudoscience? That’s not skepticism. It's just lazy.
