THE BRIEFING
AI models are getting good at biology, and the pace is accelerating. Anthropic's mid-tier model Sonnet almost tripled its score on computational biology benchmarks in six months, nearly matching the flagship, Opus. Eli Lilly is spending billions to build the infrastructure around that shift.
And a wave of startups - from virtual assay companies to immunology-specific foundation models - is turning the new capabilities into products.
Let's dive in.
NEWS
Anthropic's cheap mid-tier model closes the gap on biology benchmarks

Anthropic released Claude Sonnet 4.6, and its life sciences benchmarks tell a clear story: the gap between mid-tier and frontier models is closing fast. On BioPipelineBench - Anthropic's internal test for bioinformatics workflows like sequence analysis, metagenome assembly, and chromatin profiling - Sonnet 4.6 scored 52.1%, nearly matching Opus 4.6 at 53.1%. Sonnet 4.5 scored 19.3% on the same test six months ago.
The pattern repeats across Anthropic's other life sciences evaluations. Structural biology: Sonnet 4.6 scores 85.3% vs. Opus's 88.3%, up from Sonnet 4.5's 70.9%. Organic chemistry: 48.4% vs. 53.9%, up from 31.2%.
The biggest remaining gap is in biological reasoning tasks that require interleaving computation with biology - BioMysteryBench: 50.4% vs. Opus's 61.5%.
Sonnet costs $3/$15 per million input/output tokens, roughly one-fifth the price of Opus. For labs running bioinformatics pipelines or screening structural biology data, the cost difference at scale is significant.
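As a back-of-the-envelope sketch of what that ratio means at pipeline scale - assuming Opus list pricing of $15/$75 per million input/output tokens (consistent with the one-fifth ratio above) and a hypothetical workload:

```python
# Back-of-the-envelope token cost comparison (USD per million tokens).
# Sonnet pricing is from the announcement above; Opus pricing assumes the
# "roughly one-fifth" ratio ($15/$75) - check current list prices.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 15.00, "output": 75.00},
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single run, given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical bioinformatics workload: 10,000 pipeline runs, each with
# 50k input tokens (sequences, logs, prompts) and 5k output tokens.
runs, in_tok, out_tok = 10_000, 50_000, 5_000
for model in PRICES:
    print(f"{model}: ${runs * run_cost(model, in_tok, out_tok):,.0f}")
# sonnet: $2,250  vs  opus: $11,250 - a 5x difference at scale.
```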
Why it matters: Six months ago, a mid-tier model scored 19% on computational biology pipelines. Now it scores 52% - nearly matching the flagship. The practical biology tasks most researchers need are moving to commodity pricing.
Did you know? Anthropic currently has hundreds of open positions.
NEWS
Lilly bets $1 billion on AI drug discovery - and publishes the blueprint

The pharma world is moving fast on AI, and Eli Lilly is a prime example. In January, Lilly and NVIDIA announced a $1 billion co-innovation lab in the SF Bay Area. The five-year commitment co-locates Lilly drug hunters with NVIDIA engineers, running on NVIDIA's BioNeMo platform and Vera Rubin architecture. The setup: agentic wet labs tightly linked to computational dry labs in what they call a “scientist-in-the-loop” framework.
Now Lilly and Insilico Medicine - which signed a $100M-plus research and licensing deal in November 2025 - have co-authored a paper in ACS Central Science. It lays out a vision for a “Prompt-to-Drug” pipeline, where a plain-language prompt could kick off a largely automated workflow from hypothesis through design and synthesis, with scientists in the loop and downstream development planning as an aspirational end state. Insilico has nominated 20+ preclinical candidates using its platform, typically in 12–18 months versus 3–6 years in more traditional workflows.
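The paper describes the pipeline at a conceptual level. As a purely illustrative sketch of the staging-with-human-gates idea - the stage names and functions below are hypothetical, not from the paper:

```python
# Illustrative skeleton of a "Prompt-to-Drug"-style staged workflow with
# scientist-in-the-loop gates. Stage names and stubs are hypothetical;
# the ACS Central Science paper describes the concept, not this code.
from dataclasses import dataclass, field

@dataclass
class Program:
    prompt: str                      # plain-language starting point
    artifacts: dict = field(default_factory=dict)

def scientist_approves(stage: str, artifact) -> bool:
    """Human checkpoint: in a real system this is a review step, not code."""
    print(f"[review] {stage}: {artifact}")
    return True

def run_pipeline(program: Program) -> Program:
    # Each stage would wrap models/agents (target ID, generative chemistry,
    # synthesis planning); here they are stubs showing the control flow.
    stages = {
        "hypothesis": lambda p: f"candidate target for: {p.prompt}",
        "design": lambda p: ["molecule_1", "molecule_2"],
        "synthesis_plan": lambda p: {"molecule_1": ["step A", "step B"]},
    }
    for name, stage in stages.items():
        artifact = stage(program)
        if not scientist_approves(name, artifact):  # human gate between stages
            break
        program.artifacts[name] = artifact
    return program

run_pipeline(Program(prompt="oral small molecule for fibrotic disease"))
```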
Why it matters: A billion-dollar lab is a bet on scale. The published Prompt-to-Drug framework, and the partnership with Insilico, is a bet on method. Add Lilly's existing $1.7 billion deal with Isomorphic Labs and its own DGX SuperPOD - one of the most powerful AI supercomputers in biopharma - and the picture is clear: Lilly is placing billion-dollar bets across infrastructure, partnerships, and methodology simultaneously.
Did you know? NVIDIA's expanded BioNeMo platform - the backbone of the new lab - now includes open models for RNA structure prediction, molecular synthesis, and toxicity prediction. It's available as an open development platform.
NEWS
MIT trains a language model on yeast genes to improve biologic production
MIT chemical engineers built a large language model called Pichia-CLM to optimize how industrial yeast manufactures protein drugs. The team trained the encoder-decoder model on the genes encoding all native proteins of the yeast Komagataella phaffii, learning the organism's codon usage patterns - the subtle grammar of which three-letter DNA sequences it prefers for each amino acid.
The model then redesigned codon sequences for six therapeutic proteins, including human growth hormone and trastuzumab, a monoclonal antibody used to treat cancer. Production increased up to threefold compared to native sequences. For five of six proteins, the MIT model outperformed all four commercial codon optimization tools tested. The model also independently learned biological principles it wasn't taught - like avoiding DNA sequences that inhibit gene expression.
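For intuition: codon optimization in its simplest form is a lookup - pick each amino acid's most-used codon in the host. Here's a minimal sketch of that frequency-based baseline (the usage weights are illustrative, not real K. phaffii frequencies; the codon-to-amino-acid assignments follow the standard genetic code):

```python
# Naive frequency-based codon optimization: for each amino acid, pick the
# codon the host uses most often. Usage weights are toy values, NOT real
# K. phaffii frequencies.
TOY_CODON_USAGE = {
    "M": {"ATG": 1.00},                                         # methionine
    "G": {"GGT": 0.44, "GGC": 0.14, "GGA": 0.32, "GGG": 0.10},  # glycine
    "S": {"TCT": 0.29, "TCC": 0.20, "TCA": 0.19,
          "TCG": 0.09, "AGT": 0.15, "AGC": 0.08},               # serine
    "K": {"AAA": 0.47, "AAG": 0.53},                            # lysine
}

def naive_optimize(protein: str) -> str:
    """Translate an amino-acid string into DNA using each most-used codon."""
    return "".join(max(TOY_CODON_USAGE[aa], key=TOY_CODON_USAGE[aa].get)
                   for aa in protein)

print(naive_optimize("MGSK"))  # ATGGGTTCTAAG
```

A context-aware model like Pichia-CLM learns patterns this lookup can't capture - codon choices that depend on the surrounding sequence, not just per-amino-acid frequency.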
Why it matters: The MIT researchers note that gene extraction, modification, and integration can account for 15-20% of the total cost of commercializing biologic drugs. An LLM that reliably outperforms existing tools could cut development time across the industry. The team has made code and model parameters publicly available on GitHub.
Did you know? K. phaffii (formerly Pichia pastoris) already produces dozens of commercial products including insulin, hepatitis B vaccines, and a monoclonal antibody used to treat chronic migraines.
NEWS
Turbine raises $25M to virtualize biology experiments for pharma

Turbine has run virtual biology experiments across more than 30 drug discovery programs, including work for MSD, AstraZeneca, and Bayer. The Budapest company builds computational copies of biological assays - trained on proprietary perturbation datasets - that let pharma teams test millions of conditions at computational speed. Now it has raised a $25 million Series B to scale up, led by Interactive Venture Partners with participation from Beiersdorf and existing investors MSD Global Health Innovation, Accel, and Mercia.
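In spirit, a virtual assay is a surrogate: learn from measured perturbation-response pairs, then screen far more conditions in silico than any wet lab could run. A minimal sketch of that idea (synthetic data and a generic regressor - illustrative only, not Turbine's actual simulation technology):

```python
# Toy "virtual assay": fit a surrogate model on measured perturbation
# responses, then score a much larger virtual condition library in silico.
# Synthetic data, generic regressor - not Turbine's mechanistic approach.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Pretend each condition is a feature vector (e.g., compound features + dose).
X_measured = rng.normal(size=(500, 32))          # 500 wet-lab experiments
y_measured = X_measured[:, :4].sum(axis=1) + rng.normal(scale=0.3, size=500)

surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(X_measured, y_measured)

# Screen a virtual library far larger than the wet-lab campaign, then send
# only the top-scoring conditions back to the bench for validation.
X_virtual = rng.normal(size=(100_000, 32))
scores = surrogate.predict(X_virtual)
top_hits = np.argsort(scores)[-20:]
print("conditions to validate in the wet lab:", top_hits)
```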
Alongside the funding, Turbine announced its first immunology partnership with an unnamed top-10 pharma company. The collaboration will train virtual assays on the partner's proprietary immune datasets to model immune cell behavior and identify drug combinations.
Why it matters: Turbine's pitch is straightforward - shift early discovery from broad wet-lab screening to targeted validation guided by in silico assays, cutting cost and time. The 30+ programs suggest it's working. The immunology partnership and Beiersdorf's investment hint that virtual assays are finding demand beyond Turbine's original oncology focus.
Did you know? Turbine is hiring across 15 roles in Budapest, including machine learning engineers, computational biologists, and lab scientists.
NEWS
Scienta Lab launches EVA - a foundation model built for immunology drug discovery

Paris-based Scienta Lab introduced EVA, a cross-species, multimodal foundation model purpose-built for drug discovery in immunology and inflammation. The model harmonizes transcriptomics data across species, platforms, and resolutions, and integrates histology data to produce unified patient-level representations.
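As a rough illustration of what “unified patient-level representations” can mean in practice - a generic late-fusion sketch, not Scienta's actual architecture - one common pattern is to embed each modality separately and pool into one vector per patient:

```python
# Generic late-fusion sketch: embed each modality, pool, concatenate into a
# single patient-level vector. Purely illustrative; EVA's architecture is
# not public at this level of detail.
import numpy as np

rng = np.random.default_rng(1)

def embed_transcriptomics(expr: np.ndarray) -> np.ndarray:
    """Stand-in encoder: project a gene-expression profile to 64 dims."""
    W = rng.normal(size=(expr.shape[0], 64))
    return expr @ W

def embed_histology(tiles: np.ndarray) -> np.ndarray:
    """Stand-in encoder: embed each slide tile, then mean-pool to 64 dims."""
    W = rng.normal(size=(tiles.shape[1], 64))
    return (tiles @ W).mean(axis=0)

expr = rng.normal(size=20_000)          # one bulk RNA-seq profile
tiles = rng.normal(size=(300, 512))     # 300 pre-extracted slide-tile features

patient_vec = np.concatenate([embed_transcriptomics(expr),
                              embed_histology(tiles)])
print(patient_vec.shape)  # (128,) - one vector feeding downstream tasks
```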
Scienta trained EVA on ImmunAtlas - about 545,000 human-and-mouse gene readouts plus 4,000 pathology slides - and tested it on 39 tasks covering discovery, preclinical translation, and clinical response prediction.
The accompanying paper reports performance gains of up to roughly 2× on some of those tasks.
Why it matters: Immunology and inflammation drugs often fail late because biology that looks promising in models doesn’t translate to patients. EVA is designed specifically to bridge that gap - linking molecular readouts and tissue pathology across species - with the goal of predicting treatment response before expensive trials.
Did you know? Scienta Lab is hiring. There’s also an open version of EVA for transcriptomics.
THE EDGE
FutureHouse/Edison Scientific released PaperQA3 (within the Kosmos platform), a major upgrade to the algorithm behind Edison Literature. The key advance is multimodal figure and table understanding at scale: Edison Literature can now read figures and tables from 150M+ full-text research papers, patents, and clinical trials. Instead of embedding or loading all figures upfront, the agent decides when visuals are needed and pulls only the relevant figures, avoiding distraction from unrelated images.
If you work with scientific literature at scale, this could be worth trying. The underlying algorithm is open source: install it from the paper-qa repo on GitHub to ask research questions and synthesize answers across papers. For large-scale full-text plus figure/table understanding, the commercial version runs inside Edison Literature.
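A minimal sketch of the open-source route, based on the paper-qa README at the time of writing (the API has changed between major versions, so check the repo):

```python
# pip install paper-qa
# Ask a question over a local folder of PDFs with paper-qa; the response
# object contains the synthesized answer plus the citations it drew on.
from paperqa import Settings, ask

answer_response = ask(
    "How do codon usage patterns affect recombinant protein yield?",
    settings=Settings(paper_directory="my_papers"),  # folder of local PDFs
)
print(answer_response)
```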
ON OUR RADAR
Until next time,
Peter at BAIO


