THE BRIEFING
Two months in, BAIO is already reaching readers at Stanford, Genentech, OpenAI, Benchling, Regeneron, MIT and even Nature. Thank you to everyone who has signed up, forwarded an issue or told a colleague.
The twice-weekly issues are still the best way to stay current. But the specials are where we zoom out. New readers can catch up here:
And today? Physical AI enters the wet lab, ARPA-H wants an AI research engine, donor lungs get digital twins, OpenBind releases sorely needed binding data, and TxConformal warns how many AI-picked molecules are likely to fail.
Let’s dive in.
NEWS
Physical AI enters the wet lab

A still from the Genesis AI GENE-26.5 demo. Credit: Genesis AI
On BAIO’s landing page, a robotic hand holds a test tube. That is partly because AI is easy to visualize as a robot. But it is also because we think a lot of lab work may be done this way in the not-too-distant future.
Genesis AI seems likely to agree.
The Paris- and Palo Alto-based physical AI startup just showed a robot that can pick up a pipette, attach a tip, transfer liquid into a tube, eject the tip, seal the tube, open a centrifuge and place the tube inside. “This requires millimeter-level precision, tool use, fine-motor coordination (e.g., screwing on a 1 cm cap), and dexterous in-hand re-grasping to reposition the pipette for hanging it back onto the rack,” Genesis writes in a technical blog post.
The system is called GENE-26.5. In the blog post, Genesis describes it as its first robotic foundation model system for dexterous manipulation. The company says the same model with shared weights handles real-world tasks across cooking, lab automation, smoothie-making, wire harnessing and multi-object grasping.
Most wet-lab automation still depends on machines built for narrow, structured tasks. Genesis is showing a robot hand using ordinary lab tools built for humans. If that generalizes, the autonomous lab does not have to be rebuilt around automation. The robot adapts to the lab.
Genesis is also making a bet about how robots should learn. Instead of starting with a generic gripper, the company built a five-fingered hand with human-like proportions, then trained it from human demonstrations: glove data that captures finger motion, egocentric video from the person doing the task, and third-person video from the room. The wager is that if the robot hand is shaped enough like a human hand, human data becomes much more useful.
And then, almost simultaneously, AINORO, a wet-lab robotics startup, announced a partnership with the Stanford-Princeton AI4Science Catalyst team behind LabOS, exploring a workflow where AI can reason, build protocols and execute experiments through dexterous robots. AI4Science Catalyst also launched LabOS² and called it “an early look at autonomous cell culture, as a long-horizon physical AI workflow for biomed”.
Don’t forget: all of this is still very much demo-stage. Genesis published a company blog post, not a peer-reviewed paper. Wet labs care about repeatability, sterility, calibration, and boring reliability. A robot that can pipette once on video is not yet a lab worker.
Why it matters: Faster discovery is not just about better AI models. New medicines move only as fast as the slowest part of the loop: design, build, test, learn. AI is already speeding up design. But biology still has to be built and tested in the physical world. If dexterous robots can handle more of that lab work, experiments can scale with the models that propose them - and therapies can move faster from idea to evidence.
Did you know? Genesis AI raised $105 million in one of France’s largest seed rounds, with backers including Eric Schmidt and Khosla Ventures. The company was co-founded by former Mistral researcher Theophile Gervet. Genesis AI is hiring.
NEWS
ARPA-H wants an AI research engine

Well, would you look at that - this is how ARPA-H illustrates IGoR. Great minds think alike (see our story above this one). Credit: ARPA-H
ARPA-H (the Advanced Research Projects Agency for Health) wants to build a more efficient engine for biomedical research: AI proposes the next experiment, qualified labs run it, the data flows back, and the disease model improves.
The program is called IGoR, short for Intelligent Generator of Research. ARPA-H says the goal is to generate hypotheses, design and conduct experiments, and refine biological models at least ten times faster than traditional research approaches.
ARPA-H is not announcing a finished platform. It is defining the architecture it wants funded teams to build. The program calls for four pieces that have to work together: mechanistic disease models, an AI system that finds knowledge gaps and designs experiments, standardized protocols so different labs can run those experiments reproducibly, and a marketplace of qualified labs that return gold-standard data. That sounds like design-build-test-learn for biology, but with the institutional plumbing included.
Caveat: this is a program launch, not a result. ARPA-H is funding the architecture it wants to see. We do not yet know which teams will build it, what disease areas will be first, or whether the pieces will actually interoperate.
Why it matters: ARPA-H is a well-financed US health agency built to fund high-risk biomedical programs. If it is putting its weight behind AI-orchestrated research loops, that is a strong signal.
Did you know? ARPA-H is currently accepting proposals from teams that want to build the IGoR system.
NEWS
Donor lungs now have digital twins

An EVLP system. Credit: UHN
It is now possible to model a donor lung as a computational twin while it is being assessed for transplant. In a new Nature Biotechnology paper, researchers at University Health Network (UHN) in Toronto and the University of Toronto describe multimodal AI models trained on 951 donor lungs assessed on the Ex Vivo Lung Perfusion (EVLP) system. EVLP was pioneered by UHN’s Toronto Lung Transplant Program.
EVLP is a physical system that maintains an isolated donor lung outside the body at body temperature, while a ventilator lets it breathe and a perfusion solution flows through it. This lets transplant teams measure how well the lung exchanges gas before deciding whether it is suitable for transplant.
While lungs were maintained this way, the researchers collected breathing mechanics, blood gases, biochemistry, imaging and molecular data. The digital twin is the AI model trained on that EVLP data. Given measurements from a physical lung, it forecasts how that lung’s function will change over the next hours.
With this combination of EVLP and digital twin, researchers can give a real lung a treatment, then ask the digital twin what probably would have happened to that same lung without treatment. In the paper, the team tested alteplase, a clot-dissolving drug, on lungs during EVLP. The digital twin acted as the no-treatment comparison.
It’s a setup that takes care of the missing counterfactual. If a donor lung improves after treatment, researchers need to know whether the drug caused the improvement - or whether that lung would have improved anyway. Usually they compare treated lungs with separate untreated lungs, but donor organs differ too much for that to be clean in small studies. Here, the digital twin acts as the untreated version of the same lung.
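The counterfactual logic can be sketched in a few lines of Python. This is only an illustration of the idea, not the UHN team's code: the function name `treatment_effects` and all numbers are hypothetical, and the outcome is an unspecified lung-function score in arbitrary units. For each lung, the estimated effect is the measured outcome after treatment minus the twin's no-treatment forecast for that same lung.

```python
from statistics import mean


def treatment_effects(observed_treated, twin_untreated):
    """Per-lung effect: measured outcome after treatment minus the
    digital twin's forecast for the same lung without treatment."""
    if len(observed_treated) != len(twin_untreated):
        raise ValueError("each treated lung needs exactly one twin forecast")
    return [obs - twin for obs, twin in zip(observed_treated, twin_untreated)]


# Made-up values for illustration only (arbitrary units of lung function).
observed = [410.0, 385.0, 450.0]        # measured on the real, treated lungs
twin_forecast = [380.0, 390.0, 400.0]   # twin's no-treatment prediction

effects = treatment_effects(observed, twin_forecast)
average_effect = mean(effects)
print(effects, average_effect)
```

Because every difference is taken within one lung, donor-to-donor variation cancels out of the comparison. The estimate is then only as good as the twin's forecast.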
Why it matters: Bo Wang, the chief AI scientist at UHN whom BAIO readers have met through EchoJEPA, BioReason-Pro and X-Cell, calls this “a step toward truly precise preclinical evaluation” - smaller, faster and more informative experiments. His larger framing of where this is heading is also familiar: virtual cells, then virtual organs, then virtual patients.
Did you know? The code is on GitHub, trained models are on Hugging Face, and there is a Streamlit demo.
NEWS
Drug-design AI gets (some of) the binding data it needs
AI has become very good at predicting the shape of proteins. But drug discovery needs a harder question answered: will a molecule actually stick to its target, and how tightly?
That is where the data is still thin. Public datasets often contain either a structure or a binding measurement, but not both in a clean, consistent format. OpenBind, a UK-led open-science consortium, just released its first attempt to close that gap.
The first dataset focuses on one viral enzyme. OpenBind tested hundreds of small molecules against it, solved 3D structures showing how many of them attached, and measured how tightly they bound. In total: 925 protein-molecule structures from 699 compounds, with binding-strength measurements for 601 compounds.
The useful part is that this is newly generated experimental data, not scraped leftovers from old databases. That makes it a cleaner test for AI models trying to learn the link between molecular shape and binding strength.
OpenBind also used the dataset to test current AI methods. The result was a useful reality check. Predicting where a molecule sits on a protein was easier than predicting how tightly it binds. And one crude shortcut was still hard to beat: molecular weight. Bigger molecules often make more contact with a protein, so “heavier molecule = stronger binder” can look surprisingly competitive with more sophisticated models.
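To make the molecular-weight shortcut concrete, here is a minimal sketch of how such a baseline can be compared with a model. It assumes the comparison is done by Spearman rank correlation against measured affinity, which is a common choice but not necessarily the metric OpenBind used. All values are invented toy numbers, not OpenBind data, and the variable names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy numbers for illustration only.
mol_weight = [180, 220, 260, 310, 350, 400]      # baseline "score"
affinity = [4.1, 4.8, 4.5, 5.6, 5.2, 6.0]        # measured, higher = tighter
model_score = [4.0, 4.3, 5.0, 5.1, 5.9, 5.5]     # hypothetical model output

print("molecular weight baseline:", spearman(mol_weight, affinity))
print("hypothetical model:", spearman(model_score, affinity))
```

A model only demonstrates that it has learned something about binding if it ranks compounds clearly better than this kind of one-line baseline.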
Why it matters: A lot of AI drug discovery depends on ranking molecules before anyone runs the experiment. But models can only learn what the data lets them learn. OpenBind is building the kind of open, paired structure-and-affinity data the field needs to know whether those rankings are real.
Did you know? The dataset is available through Zenodo and Fragalysis, with benchmark code on GitHub and experimental protocols on protocols.io.
NEWS
AI drug discovery gets a false-positive warning
AI drug discovery often turns huge candidate pools into shortlists for wet-lab testing. TxConformal asks a practical question before anyone starts pipetting: how many of those candidates are likely to be dead ends?
The new bioRxiv preprint uses conformal prediction - a statistical way to put uncertainty around model outputs - to make AI-ranked lists more useful. Instead of only saying “test these 100 molecules,” it estimates how many are likely to fail.
In an antibacterial screen against Acinetobacter baumannii, the team selected 100 molecules for synthesis and testing. Before the experiment, TxConformal predicted that about 80 of them would fail, with a likely range of 70 to 95. The lab result: 91 failed. Only nine were active.
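For readers new to conformal prediction, here is a minimal sketch of the generic split-conformal recipe for a numeric prediction. It is not the TxConformal method, whose details are not covered here, and the calibration numbers are invented. The idea is to measure how wrong the model was on held-out data with known answers, then use a quantile of those errors as the margin around each new prediction.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the score to use
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def prediction_interval(prediction, margin):
    """Symmetric interval around a new model prediction."""
    return prediction - margin, prediction + margin


# Invented calibration set: (model prediction, measured value).
calibration = [
    (0.2, 0.5), (0.8, 0.6), (0.4, 0.1), (0.9, 1.0), (0.3, 0.35),
    (0.6, 0.2), (0.7, 0.75), (0.1, 0.3), (0.5, 0.55),
]

# Nonconformity score: absolute error on each calibration example.
scores = [abs(measured - predicted) for predicted, measured in calibration]

margin = conformal_quantile(scores, alpha=0.2)  # target: about 80% coverage
low, high = prediction_interval(0.5, margin)
print(margin, low, high)
```

If calibration and new examples are exchangeable, intervals built this way contain the true value at least 80% of the time on average. That kind of guarantee is what lets a method attach an error estimate to a ranked list.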
Why it matters: A shortlist is more useful when scientists know the expected error rate. TxConformal makes the risk of trusting the AI model more explicit.
Did you know? The code is available on GitHub.
THE EDGE
xBind is a free webserver that predicts which parts of a protein are likely to bind another protein, DNA or RNA. Give it a protein sequence or structure, choose the interaction type, and xBind returns residue-level binding probabilities, interactive 3D views and downloadable reports.
ON OUR RADAR
Until next time,
Peter at BAIO


