Illumina Launches Billion Cell Atlas to Power AI Drug Discovery

Illumina has unveiled what it calls the world’s largest genome-wide genetic perturbation dataset, a major initiative designed to accelerate artificial intelligence–driven drug discovery across the global pharmaceutical industry. The new resource, named the Illumina Billion Cell Atlas, represents the first phase of a broader effort to build a five-billion-cell atlas over the next three years, which the company says will become the most comprehensive map of human disease biology ever created.

The Atlas is being developed under an alliance framework with AstraZeneca, Merck, and Eli Lilly and Company as founding participants. Work is already under way on a curated set of disease-relevant human cell lines, and the resulting data are intended to support drug target validation, large-scale AI model training, and deeper investigation into disease mechanisms that have historically been difficult to study.

According to Illumina Chief Executive Officer Jacob Thaysen, the project is designed to unlock a new level of scale for AI in drug discovery. He said the Atlas will provide an unprecedented resource for training next-generation AI models in precision medicine, enabling researchers to better map the biological pathways underlying some of the world’s most complex and devastating diseases.

The Atlas will capture how one billion individual cells respond to genetic perturbations introduced through CRISPR technology across more than 200 disease-relevant cell lines. These lines span a wide range of therapeutic areas, including immune disorders, cancer, cardiometabolic disease, neurological conditions, and rare genetic diseases. By systematically switching genes on and off across key cell types, researchers can directly observe how genetic changes affect cellular behavior.

Pharmaceutical partners plan to use the data to improve drug discovery and development decisions. Merck, for example, will use the Atlas to train proprietary AI and machine learning foundation models and to build virtual cell models aimed at improving disease indication prediction. AstraZeneca highlighted the value of translating genetic signals into mechanistic biology that can directly inform drug development, while Eli Lilly emphasized the importance of large-scale, diverse biological datasets as the foundation for the next generation of AI-driven discovery.

The Billion Cell Atlas is the first data product from Illumina’s newly established BioInsight business. Enabled by Illumina’s Single Cell 3’ RNA platform, the initiative is expected to generate around 20 petabytes of single-cell transcriptomic data annually. The data will be processed using Illumina’s DRAGEN pipeline and hosted on the Illumina Connected Analytics cloud platform.

Illumina plans to continue expanding multi-billion-cell atlases with partners over time, building toward its long-term five-billion-cell vision.
