
Sell data to pharma, rather than partnering on drugs
These companies focus on building a technical process that generates uniquely valuable datasets. They sell the data itself to pharma, rather than partnering on drugs.
Companies whose datasets unlock entirely new target spaces and/or provide existentially important data on currently difficult-to-drug targets can garner $30M+ upfront deals, as Tempus and Fauna have. But those are the largest deals ever for the space.
Examples: Tempus, UK Biobank, deCode Genetics, 23andMe, Fauna, Basecamp, Ochre, Flatiron Health, Foundation Medicine
At a high level, these companies can be thought of in terms of their dataset’s value per datapoint versus the size of the dataset. Some have created tiny datasets that are nonetheless very difficult to replicate and potentially hold unique biological insights into the hottest target space.
Others have pushed for scale. Epic has the full EHR on 300M+ patients, and Tempus has 350+ petabytes of data from 1M+ sequenced tumors and 9M patients’ clinical records.
Those that push for larger-scale datasets are either integration plays that connect to existing records (e.g. Epic, Tempus) or technology platforms that massively increase the throughput of data collection for some hard-to-gather data (e.g. Olden Labs, BiomeSense).

Examples of smaller, high-value datasets we at Compound are interested in include ones on the skin microbiome and muscle biopsy biobanks.
If you can do a diagnostics -> biobank business model (like Tempus and Caris), that’s significantly more sustainable because you cover the cost of your data + have high value data and services contracts once your dataset gets to a sufficient size.
With that said, it has generally been a difficult business model in biotech/pharma because the customer is:
Difficulty of selling data to biotech/pharma as opposed to tech or finance
Consider the Bloomberg business model and the respects in which it differs.
§ Time value of information: the bond or currency trader making multi-hundred million-dollar trades lives or dies based on moment-to-moment information. Last time I checked, the drug discoverer can wait a week (a month? six months?) for the latest genetic sequence or proteomic profile.
§ Value of that information: for the trader, that information, and its currency, is the key to making the investment which is the money-making activity. For the drug discoverer, the information is just the start of a multi-year, multi-disciplinary journey lasting a decade or more to the money-making activity (= discovery, development, approval, and marketing of the drug). Hence, a subscription model, not a royalty/profit-share model of payment, is the only one that will sell.
§ Number of high-paying customers: a Bloomberg terminal subscription costs $30,000/year per user (not per firm). In 2016, there were 325,000 terminals in use. How many pharma and biotech subscribers, in a best case, would there be at what user fee?
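The customer-count question above is simple arithmetic, and a minimal sketch makes the gap concrete. The Bloomberg figures are the ones quoted in the text; the pharma scenarios (seat counts and per-seat fees) are hypothetical placeholders chosen only for illustration, not estimates from any source.

```python
import math

# Figures quoted in the text above.
BLOOMBERG_PRICE_PER_SEAT = 30_000  # USD per user per year
BLOOMBERG_SEATS_2016 = 325_000     # terminals in use in 2016


def annual_subscription_revenue(seats: int, price_per_seat: int) -> int:
    """Annual revenue for a per-seat subscription product."""
    return seats * price_per_seat


def seats_needed(target_revenue: int, price_per_seat: int) -> int:
    """Smallest number of seats that reaches target_revenue at a given fee."""
    return math.ceil(target_revenue / price_per_seat)


bloomberg_revenue = annual_subscription_revenue(
    BLOOMBERG_SEATS_2016, BLOOMBERG_PRICE_PER_SEAT
)
print(f"Bloomberg terminals (2016): ${bloomberg_revenue / 1e9:.2f}B per year")

# HYPOTHETICAL scenarios for a pharma/biotech data subscription:
# (number of paying users, annual fee per user in USD).
hypothetical_scenarios = [
    (500, 30_000),
    (5_000, 30_000),
    (5_000, 100_000),
]

for seats, price in hypothetical_scenarios:
    revenue = annual_subscription_revenue(seats, price)
    share = revenue / bloomberg_revenue
    print(
        f"{seats:>6,} seats x ${price:>7,}/yr = ${revenue / 1e6:,.0f}M "
        f"({share:.2%} of Bloomberg terminal revenue)"
    )
```

Swapping in your own seat count and fee assumptions shows how far a data subscription sits from a Bloomberg-scale business; `seats_needed` answers the inverse question of how many users a given revenue target requires at a given fee.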
This industry dynamic of pharma paying pennies for preclinical discovery tools, let alone data, may change if and only if pharma starts to view in silico discovery as an existential threat. We’re currently seeing this play out with our portfolio company Wayve. Just within the last year, the legacy car OEMs have internalized that AVs are here and are existential to their very survival. Negotiations have shifted from contracts for tiny margins won via hand-to-hand combat over 2-year sales cycles to gargantuan deals over 3-month sales cycles.
This may also suggest a longer-term, radical reshuffling of the industry’s profit pools, with the marketing of approved drugs going from collecting effectively all of the industry’s profits to being relatively commoditized. We at Compound broadly presume the commoditization of drug discovery, but expect it to play out incrementally over decades.
It’s possible that the data contracts will scale substantially in the coming years if big tech / frontier AI labs start actually getting serious about curing cancer.
For illustrative purposes, Google invested ~$20B into Waymo’s R&D to ~solve~ autonomous driving. How much might it invest into Isomorphic to also ~solve~ drug discovery?
Best of all, those customers should be far, far less price sensitive than what bio startups are used to.
The tricky part is that it’s been largely all talk so far, and a bio startup employing a data-as-a-service business model is betting it’ll happen in the next couple of years.
A signal for when the frontier labs get serious about curing cancer may be Twist’s annual revenue growth re-inflecting upwards from ~25% to, say, 40% y/y.
Though not quite convincing yet as it's one datapoint and they're guiding towards just 13-15% growth this year, Twist made the following commentary in Q4’25:
Recently, this AI-driven discovery fueled significant growth for Twist. In fiscal 2025, orders from customers working on AI discovery projects grew more than $25 million versus fiscal 2024. And a customer pursuing AI-enabled discovery delivered our single largest purchase order to date.
We’re watching other early signals that the necessary dry powder is being accumulated. OpenAI’s new philanthropic foundation has a $25B budget to “fund work to accelerate health breakthroughs so everyone can benefit from faster diagnostics, better treatments, and cures.” Meanwhile the US announced Genesis and the UK announced:
Typical price-points:

We have further insights that we’d love to share directly. If you’re thinking about this model please reach us at [email protected]!
Compound thesis on Untitled
https://www.dennisgong.com/blog/TempusIPO/?utm_source=chatgpt.com
Tempus prospectus
https://www.aibiodesign.com/p/selling-ai-products-services-to-big
The global market for biobanks was estimated at $42B in 2020
https://shawndimantha.substack.com/p/the-techbio-idea-maze-to-be-or-not
https://www.michaeldempsey.me/blog/2025/10/03/sequencing-vs-equal-odds-applied-research/
Largest Deals in the Space
| Buyer | Startup | Deal type | Headline terms | Notes on dataset |
|---|---|---|---|---|
| GSK | Ochre Bio | Data license (multi‑year) | Up to $37.5M | Human liver disease atlases/perturbations; “foundational” dataset. (PMLiVE, Business Wire) |
| AstraZeneca + Pathos | Tempus | Data licensing + model build | $200M fees | De‑identified multimodal oncology RWD for a foundation model. (Tempus Investors) |
| Recursion | Tempus | Data access (preferred) | Up to $160M / 5 yrs | Large oncology cohort licensing; explicit “Licensed Data” terms. (Tempus, SEC) |
| Roche/Genentech | Recursion | Discovery collab w/ dataset access | $150M upfront; up to $12B | 10‑K: precedent for selling access to Recursion’s dataset. (SEC, GEN, Fintel) |
| Boehringer Ingelheim | Ochre Bio | Discovery collab | $35M upfront; up to ~$1B | Builds on Ochre’s human liver datasets for target ID. (Financial Times) |
| Eli Lilly | Fauna Bio | Target discovery collab | Up to $494M (incl. equity, milestones, royalties) | Uses Fauna’s cross‑species genomics dataset + AI. (Ropes & Gray, PR Newswire) |
| Novo Nordisk | Fauna Bio | Research collab | Upfront + research support (undisclosed) | Access to hibernation biology datasets for obesity. (BioSpace) |
| AstraZeneca | Immunai | AI/data collab | $18M | Leverages single‑cell immune atlas to inform trials. (Reuters) |
| Roche (M&A) | Flatiron Health | Acquisition | $1.9B | Lock‑in of oncology EHR/RWD datasets & products. (BioPharma Dive) |
| GSK | Relation | Target discovery collab | $45M upfront (incl. $15M equity), up to ~$200M/target in tiered royalties | Relation to run observational studies to create two proprietary functional disease for target ID |
| Amgen | deCode | Acquisition | $415M cash acquisition | The genetically homogenous population of Iceland plus the Icelandic national database of EHRs |
| GSK | 23andMe | Data collab | $350M as an equity investment in parent company | 5-yr exclusive discovery collaboration using 23andMe database; 50/50 on certain R&D programs |
| Regeneron | TriNetX | Data collab / investment | $200M as equity investment | Exclusive opportunity to connect RGC’s internal genomic and proteomic data to TriNetX’s 300M de-identified, anonymous EHRs |

Biobanks
Large-scale, largely public efforts like UK Biobank price their data access at cost recovery. National and institutional actors increasingly codify cost recovery rather than profit, e.g., NHS SDE principles, UK Biobank pricing, and many institutional fee schedules.
Storing, handling, and sharing specimens is the biobank’s core business. If specimens are left unused, the biobank fails to fulfil its mission. Many studies acknowledged that large numbers of underutilised specimens were a major problem for the financial sustainability of biobanks (Campos et al. 2015; Lin et al. 2019). A global survey of 276 biobanks (Henderson et al. 2019b, p. 217) indicated that in over half of the institutions, the utilisation rate was 10% or lower, and the actual annual utilisation rates of samples were by 2.5 to 5 times lower than the target. Henderson et al. (2019b, p. 217) argued that underutilisation ‘breaks the trust between the scientists/biobanks and the donors’ and is a threat to the social sustainability of biobanks.
https://www.mdpi.com/2076-0760/11/7/288
We found that researchers placed the greatest relative importance on the quality of specimens (26%), followed by the characterization of specimens (21%). Researchers with prior experience purchasing biological samples also valued access to key endemic in-country sites (11.6%) and low handling fees (5.5%) in biobanks.
https://www.nature.com/articles/s41598-023-49394-6?utm_source=chatgpt.com
[From 2016]: Nowadays IMS automatically receives petabytes (10^15 bytes or more) of data from the computerized records held by pharmacies, insurance companies and other medical organizations—including federal and many state health departments. Three quarters of all retail pharmacies in the U.S. send some portion of their electronic records to IMS. All told, the company says it has assembled half a billion dossiers on individual patients globally.
IMS and other data brokers are not restricted by medical privacy rules in the U.S., because their records are designed to be anonymous—containing only year of birth, gender, partial zip code and doctor's name. HIPAA for instance governs only the transfer of medical information that is tied directly to an individual's identity.
Even anonymized data command meaningful prices. Every year, for example, Pfizer spends $12 million to buy health data from a variety of sources, including IMS, according to Marc Berger, who oversees the analysis of anonymized patient data at Pfizer.
https://www.scientificamerican.com/article/how-data-brokers-make-money-off-your-medical-records/
Earlier this week, health data company Truveta, which normally traffics data like patient immunizations, social determinants of health, lab tests, and pharmacy and insurance claims, announced that it will be starting a new Truveta Genome Project to create a massive database of genetic information from 10 million patients over the next five years to pair with their health record data. A crop of companies including Avandra, Gradient Health, Segmed, and Protege offers de-identified patient images to companies and researchers.
As of ~2016, many biobanks hadn’t even formalized a business model at all. Almost all relied on non-profit funding.

https://link.springer.com/article/10.1186/s12967-019-02187-w/figures/7
UK Biobank’s pricing structure (Non-profit):


UK Biobank provides its annual financials:
https://www.ukbiobank.ac.uk/wp-content/uploads/2025/01/Report-and-consolidated-financial-statements-2023.pdf, https://www.ukbiobank.ac.uk/wp-content/uploads/2025/01/Report-and-consolidated-financial-statements-2020.pdf, https://www.ukbiobank.ac.uk/wp-content/uploads/2025/01/Report-and-consolidated-financial-statements-2022.pdf, https://www.ukbiobank.ac.uk/about-us/our-funding/

https://ibb.iarc.who.int/access-policy/iarc-biobank-price-list-external-april2025.pdf

Other pricing lists: