THE BRIEFING

Is May 19 the new AI-scientist Day? Nature seems to think so. Yesterday, as I write this, the journal published a cluster of papers that gave the field one of those “something just changed forever” moments: Google’s Co-Scientist for hypothesis generation, FutureHouse’s research agent for lab-in-the-loop discovery, and Google’s ERA for scientific software search.

Different pieces of the scientific process are being turned into software: hypotheses, experiment choice, data analysis, code, and R&D workflow. For life-science researchers, that means a wave of cool new tools - and questions about what the human role becomes when discovery increasingly comes from machines.

In this issue we also have whole-body disease mapping, a faster open DNA model from Hugging Face, a tougher benchmark for virtual cells, and a practical RNA-seq discovery tool.

Let’s dive in.

NEWS
Google turns Co-Scientist into a science platform

Credit: Google

An AI system from Google just generated scientific hypotheses that survived wet-lab testing.

Co-Scientist, published in Nature, is built on Gemini and aimed at a harder task than summarizing papers: coming up with testable ideas. A scientist writes a goal in plain English. Agents search the literature, propose hypotheses, critique them like peer reviewers, rank them in a tournament, and refine the strongest leads.
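To make the tournament step concrete, here is a minimal sketch of ranking hypotheses by pairwise comparison. It assumes an Elo-style rating update, which is our illustrative choice rather than a detail given above, and `toy_judge` with its hidden `quality` scores is a hypothetical stand-in for the reviewing agents.

```python
import itertools


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_tournament(hypotheses, judge, k=32.0, start=1200.0):
    """Play every pair once and return hypotheses sorted best-first."""
    ratings = {h: start for h in hypotheses}
    for a, b in itertools.combinations(hypotheses, 2):
        winner = judge(a, b)  # assumed to return either a or b
        score_a = 1.0 if winner == a else 0.0
        exp_a = expected_score(ratings[a], ratings[b])
        ratings[a] += k * (score_a - exp_a)
        ratings[b] += k * ((1.0 - score_a) - (1.0 - exp_a))
    return sorted(ratings, key=ratings.get, reverse=True)


# Hypothetical stand-in for the critique agents: a hidden quality score.
quality = {"H1": 0.2, "H2": 0.9, "H3": 0.5}


def toy_judge(a, b):
    return a if quality[a] >= quality[b] else b


ranking = run_tournament(list(quality), toy_judge)
print(ranking)
```

In a real system the judge would be a model-driven debate or review rather than a lookup, and the strongest leads would go back for another round of refinement.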

The Google researchers call it a “structured scientific thinking engine.” Vivek Natarajan, one of the paper’s senior authors, framed the lineage plainly: Co-Scientist builds on the self-play and self-improvement principles behind DeepMind systems like AlphaGo, but redirects them from games to science.

And Co-Scientist has already been put to work:

☑️ In acute myeloid leukemia, Co-Scientist proposed repurposed drugs and drug combinations that were tested in cell lines. Some inhibited cancer-cell viability; some combinations showed synergy, though results varied by cell type.

☑️ In liver fibrosis, it proposed epigenetic targets - gene-control mechanisms - and drugs aimed at them. Two showed anti-fibrotic activity in human liver organoids without causing cellular toxicity.

☑️ In antimicrobial resistance, Co-Scientist was asked how bacteria move dangerous genetic cargo, including antibiotic-resistance genes, between species. In two days, it proposed that these DNA elements can use different phage tails - the docking gear bacterial viruses use to enter cells - matching a mechanism another team had found but not yet published.

Google DeepMind’s blog shows a wider push too: ALS, cellular aging, infectious disease, and more.

This is also becoming a product. Google is rolling Co-Scientist into Gemini for Science, a new collection of experimental science tools. The package includes Hypothesis Generation, powered by Co-Scientist; Computational Discovery, which searches for better models and algorithms by generating and scoring code variations; Literature Insights for working through papers; and Science Skills for research agents. An enterprise version is already in private preview with Daiichi Sankyo, Bayer Crop Science, and US National Labs.

Co-Scientist comes with caveats. The full code is not public. Access is controlled. Cell-line and organoid results are early validation, not clinical proof. Co-Scientist can inherit hallucinations and unreliable papers, and it may miss paywalled studies or negative results that never made it into the literature.

Why it matters: AI systems that generate hypotheses are not new. What is different here is the package: Google has put a Gemini-based hypothesis engine through peer review in Nature, shown wet-lab follow-up across cancer, fibrosis, and antimicrobial resistance, and is now turning the system into a Gemini for Science tool researchers can request access to. This is the AI scientist idea moving from demo and discourse toward platform.

Did you know? Researchers can register for experimental access to Co-Scientist through Google Labs.

NEWS
FutureHouse takes its AI scientist into pharma

Credit: Edison Scientific

After Google’s Co-Scientist (see above), this FutureHouse story is not just another agent paper. It is the same week’s answer to a different question: can an AI scientist close the discovery loop, then survive inside pharma?

As with the flurry of Google announcements on May 19, there are two separate news items to untangle here.

One: FutureHouse, a San Francisco lab building AI scientists, published Robin in Nature. Robin is a research agent for biology: it reads the literature, proposes experiments and drug candidates, then analyzes the data after humans run the lab work.

Two: On the same day, Edison Scientific, FutureHouse’s commercial spinout, announced that pharma company Incyte will deploy Kosmos, its AI scientist platform for R&D, across discovery and development.

So: Robin is the paper proof. Kosmos is the production layer.

In the Nature paper, Robin was given dry age-related macular degeneration, a major cause of irreversible sight loss. It chose a strategy that sounds almost janitorial: help retinal cells clear worn-out material in the retina. In the first round, it pointed to a class of drugs that can relax the cell’s internal scaffolding, making that cleanup easier. After humans ran the experiment, Robin analyzed the data and proposed a second round.

That led to ripasudil, a glaucoma drug approved in Japan, which made retinal cells better at swallowing waste material in a lab test. The effect also held up in primary human retinal cells.

Robin also surfaced KL001, a compound tied to the cell’s internal clock, which the authors say had never previously been proposed as a way to boost this cleanup process.

The paper also reports that Robin processed 551 papers in about 30 minutes. How long would it take you to process that many papers? About 540 hours, according to the researchers.
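A quick back-of-the-envelope check on those figures, using only the numbers quoted above and treating "about 30 minutes" as exactly half an hour (an assumption for the sake of the arithmetic):

```python
papers = 551
agent_hours = 0.5    # "about 30 minutes", rounded for illustration
human_hours = 540    # the researchers' estimate for a human reader

speedup = human_hours / agent_hours
human_minutes_per_paper = human_hours * 60 / papers
agent_seconds_per_paper = agent_hours * 3600 / papers

print(f"speedup: about {speedup:.0f}x")
print(f"human: about {human_minutes_per_paper:.0f} minutes per paper")
print(f"agent: about {agent_seconds_per_paper:.1f} seconds per paper")
```

That works out to roughly a 1,080-fold difference: just under an hour per paper for a person, against a few seconds for the agent.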

Over to Kosmos. It is a code-writing agent that can read papers and patents, interpret figures, run GPU-heavy tasks like AlphaFold or molecular docking, coordinate subagents, and work inside secure company environments with little or no data leaving. Edison says Incyte will embed Kosmos first in target discovery, target validation, and translational biology - where experimental, clinical and biomarker data need to be pulled together for go/no-go decisions. The company also says Kosmos has been validated by reproducing key stages of clinical development for a checkpoint inhibitor, from lead identification to a Phase II proof-of-concept study.

BAIO readers have seen the early pieces of this stack before. In Issue 3, we covered FutureHouse/Edison’s PaperQA3 inside Kosmos, which gave Edison Literature the ability to read figures and tables across 150M+ papers, patents and clinical trials.

There are boundaries here. Robin did not run the wet lab. The dry AMD results are still in vitro. And the Incyte deal is a deployment, not proof that Kosmos improves a live drug pipeline yet.

Why it matters: Co-Scientist turns hypothesis generation into a Google science tool. Robin and Kosmos push the idea toward a fuller R&D loop: literature, experiment choice, data analysis, revised hypothesis, and deployment inside pharma.

Did you know? Robin sample trajectories and code are available on GitHub. So is Finch, the data-analysis agent behind the paper. Edison also says its Literature, Analysis and Precedent agents remain available through Edison Playground with 10 free credits per month. Edison Scientific is hiring.

NEWS
Google ushers in a new ERA for scientific software

ERA is a tree-search system. Credit: Google

First came Co-Scientist for hypotheses (this issue’s top story). Then Robin for lab-in-the-loop biology (see above). But Google also published a third Nature paper that targets yet another bottleneck: the (until now) slow act of writing the scientific software itself.

ERA, short for Empirical Research Assistance, is Google’s attempt to automate a specific kind of scientific grunt work: building custom software for problems where success can be scored. Give it a dataset, a metric, and some research ideas. It generates code for that task, runs the code in a sandbox, scores the result, then mutates the better versions and tries again. In other words: where researchers might normally explore one or a few hand-built approaches, ERA can try hundreds or thousands of code variations and keep the ones that score best.
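The generate, score, mutate and repeat loop can be sketched in a few lines. This is a toy under stated assumptions, not ERA's implementation: the "candidates" here are just two numbers (slope and intercept of a line) instead of generated programs, `mutate` is a random jitter standing in for a model rewriting code, and the pool size and round count are arbitrary.

```python
import random


def score(candidate, data):
    """Higher is better: negative mean squared error of y = a*x + b."""
    a, b = candidate
    return -sum((a * x + b - y) ** 2 for x, y in data) / len(data)


def mutate(candidate, rng, step=0.5):
    """Stand-in for 'rewrite the code': jitter each parameter slightly."""
    return tuple(p + rng.uniform(-step, step) for p in candidate)


def search(data, rounds=200, children=4, keep=5, seed=0):
    """Expand surviving candidates, score the children, keep the best few."""
    rng = random.Random(seed)
    start = (0.0, 0.0)
    pool = [(start, score(start, data))]
    for _round in range(rounds):
        parent, _parent_score = rng.choice(pool)
        for _child in range(children):
            child = mutate(parent, rng)
            pool.append((child, score(child, data)))
        pool.sort(key=lambda item: item[1], reverse=True)
        pool = pool[:keep]
    return pool[0]


# Toy "dataset plus metric": points on the line y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(10)]
best, best_score = search(data)
print(best, best_score)
```

The design point is the same one the article makes: the loop only works because every candidate can be scored automatically, so better versions can be kept and expanded without a human in the middle.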

The clearest biology result comes from single-cell analysis. Researchers often need to combine datasets from different labs without letting lab-specific noise - what scientists call batch effects - drown out real biology. ERA generated 40 analysis methods that beat all published methods on the OpenProblems benchmark, a public benchmark for single-cell data analysis. Its top method improved the best published score by 14% by recombining two existing approaches - preserving biological signal while removing technical noise.

ERA also generated 14 COVID-19 forecasting strategies that outperformed CovidHub Ensemble - the CDC-coordinated benchmark model built by combining forecasts from expert teams - and produced expert-level software for geospatial analysis, zebrafish brain-activity prediction, difficult numerical integrals, and time-series forecasting.

We covered a related direction in Issue 24, when AlphaEvolve improved specific pieces of computational biology software, including genomics and molecular simulation code. ERA generalizes that pattern. Instead of optimizing one known problem, it takes a scored scientific task, searches through many possible code solutions, and turns the best-performing ones into new analysis methods.

ERA is strongest when the task has a score. It is not the same as explaining a disease mechanism or proving a biological theory. But modern biology has plenty of scored problems: cleaner single-cell datasets, better forecasts, faster simulations, and data-analysis workflows that can be tested against a benchmark.

Why it matters: Co-Scientist asks which hypothesis to test. Robin asks how to move from experiment to analysis to the next hypothesis. ERA asks whether the software layer of science can be searched and improved automatically. Together, that makes this feel less like three separate AI scientist announcements and more like a taxonomy forming in real time.

Did you know? Google released ERA code on GitHub, plus the best candidate solutions and a browser for inspecting the tree-search runs.

NEWS
MouseMapper points toward whole-body virtual biology

Credit: helmholtz-munich.de

Tissue clearing and light-sheet microscopy can make an intact mouse transparent and image nerves, immune cells and organs in 3D. The bottleneck is analysis: the scans are terabyte-scale, and tracing every nerve branch, immune cluster and organ boundary by hand does not scale.

A Nature paper from Ali Ertürk’s group at Helmholtz Munich, the Ludwig Maximilians University Munich (LMU), and collaborating institutions introduces MouseMapper, a foundation-model-based deep-learning framework for that job. It has three modules: nerves, immune cells, and 31 organs and tissues.

The test case was diet-induced obesity. MouseMapper showed body-wide inflammation and found an unexpected nerve signal: obese mice had structural changes in a facial sensory nerve involved in whisker sensation. The mice also responded less to whisker stimulation. Proteomics found altered pathways linked to nerve structure, remodeling and inflammation, with overlapping signals in human post-mortem tissue.

Why it matters: MouseMapper points toward whole-body virtual biology: AI maps that spot system-level disease changes without researchers deciding in advance where to look. “Our goal is to create a comprehensive framework for understanding how diseases affect the body as an interconnected system,” says Ali Ertürk in an article published on the Helmholtz Munich website. “Our long-term vision is to build truly realistic digital twins of mice in health and disease: cell-level atlases that we can query, perturb and screen in silico computationally.”

Did you know? The whole-mouse-body maps are online, and the code is on GitHub.

NEWS
Hugging Face rethinks the DNA model recipe

Credit: Hugging Face

Hugging Face released Carbon, an open family of DNA language models. Hugging Face? Yes. It is not a biotech company but an open AI infrastructure company, and this is exactly the kind of thing it exists to make easier to run, inspect and build on.

Carbon is built around the premise that DNA is not just English with four letters. The models use a standard transformer architecture, but change the recipe around it. Instead of reading DNA one base at a time, Carbon reads it in six-letter chunks. That cuts sequence length by 6x, making long DNA cheaper to process.

The obvious worry: biology often turns on single-letter changes. Carbon tries to recover that detail with a training objective that gives the model nucleotide-level feedback inside each six-letter chunk.
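To see why six-letter chunks shorten the sequence, here is a minimal sketch of non-overlapping chunking. The function name `chunk_dna` and the choice to pad a short final chunk with `N` are assumptions for illustration; Carbon's actual tokenizer may handle edges and vocabulary differently.

```python
def chunk_dna(sequence, k=6, pad="N"):
    """Split a DNA string into non-overlapping k-letter chunks.

    A trailing partial chunk is padded with `pad` (an illustrative
    choice, not necessarily what Carbon does).
    """
    seq = sequence.upper()
    remainder = len(seq) % k
    if remainder:
        seq += pad * (k - remainder)
    return [seq[i:i + k] for i in range(0, len(seq), k)]


dna = "ACGT" * 6          # 24 bases
tokens = chunk_dna(dna)   # 4 chunks instead of 24 single-base tokens
print(len(dna), len(tokens), tokens[0])

# A single-letter change alters the whole chunk that contains it,
# which is why per-nucleotide feedback inside each chunk matters.
variant = chunk_dna("ACGTAA" + dna[6:])
print(tokens[0], variant[0])
```

Since attention cost in a standard transformer grows with sequence length, a 6x shorter input is where much of the potential saving comes from.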

In Hugging Face’s training-free tests, Carbon-3B was competitive with Arc’s Evo2-7B despite having less than half the parameters. Carbon-8B improved on Carbon-3B across every task. In the report’s speed plot, Carbon sits far to the right of Evo2: comparable or better benchmark performance, and over 150x faster under that test setup.

Important to note: there is no wet-lab validation here. This is a technical report and model release.

Why it matters: Genomic foundation models will not become everyday tools if every useful run requires frontier-lab compute. Carbon is a bet that smarter DNA-specific design can move the field’s price-performance curve.

Did you know? Hugging Face has released the models, training data, training code and evaluation suite.

NEWS
Genentech gives virtual cells a harder test

Credit: ChatGPT

A virtual cell should not just tell you what changed inside a cell. It should help predict which perturbations are worth testing next.

That is the point of AssayBench, a new Genentech benchmark for AI models. It uses public CRISPR screens - experiments where researchers switch off genes and measure what happens - and asks a practical question: given this experiment, which 100 genes are most likely to be hits?

Other virtual-cell benchmarks focus on molecular readouts, such as predicting gene expression. But many drug-discovery screens ask messier functional questions: did cells survive, stop growing, resist infection, turn on a measurable signal, or move material to the right place?

AssayBench works like this: take a real CRISPR screen, hide the answer, then ask an AI model to guess the 100 genes the experiment found most important. The benchmark checks that ranked list against the real screen results.
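One simple way to check a ranked list against real screen results is to count how many of the top picks are true hits. The sketch below assumes that kind of top-k overlap score; the benchmark's actual metric is not described here, and the gene lists are made up for illustration.

```python
def top_k_overlap(predicted_ranking, true_hits, k=100):
    """Fraction of the model's top-k genes that are real screen hits."""
    top = predicted_ranking[:k]
    hits = set(true_hits)
    return sum(1 for gene in top if gene in hits) / k


# Hypothetical example with k=3 to keep it readable.
predicted = ["TP53", "KRAS", "MYC", "EGFR"]
screen_hits = {"KRAS", "MYC", "EGFR"}
overlap = top_k_overlap(predicted, screen_hits, k=3)
print(f"{overlap:.2f}")
```

Scoring a technical repeat of the same screen with the same function gives the kind of experimental ceiling that model scores can be compared against.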

A surprise, perhaps, is who did best. General frontier models beat biology-specific systems like Biomni and C2S, and also beat a custom model trained specifically to predict gene hits from the AssayBench training screens. Gemini 3 Pro was the strongest single model, while a small ensemble of frontier LLMs did best overall. But even Gemini 3 Pro was far below the experimental ceiling: a technical repeat of the same screen was almost twice as predictive.

Why it matters: The point of a virtual cell is to say something useful about real biology before the experiment is run. That means the field needs tests that move toward real experimental decisions, not just cleaner molecular predictions. AssayBench asks one of those questions directly: can the model help choose which perturbations are worth testing next?

Did you know? AssayBench is open on GitHub.

THE EDGE

TransXplorer is a free web tool for turning RNA-seq data into possible biomarkers and drug-target ideas. Upload raw or processed data, and it helps compare groups, surface affected pathways, check cancer datasets, and rank possible drug targets.