One Million New Species Discovered

Basecamp Research has announced the discovery of over one million previously unknown species as part of their new BaseData™ dataset.

This discovery stems from years of intensive global biodiscovery efforts and has culminated in BaseData™, now the world’s largest and most diverse dataset of biological protein sequences. Built specifically to power the next generation of AI in biology, this data platform is redefining the possibilities of modern life science.


Currently containing a staggering 9.8 billion protein sequences, BaseData™ eclipses the size of all publicly available biological sequence repositories combined. More than just a scientific milestone, this development aims to shatter a long-standing barrier in the life sciences: the “data wall” that has stalled the progress of AI models in biology. 


AI is already transforming biology, as shown by BaseFold, an AI-powered tool also developed by Basecamp Research. BaseFold draws on the company's trademark data diversity and predicts protein structure with better accuracy than AlphaFold2. Particularly adept at predicting the structures of complex proteins, it has become an invaluable resource in AI-driven discovery.


The field of generative biology promises to revolutionize everything from medicine to climate science. But to thrive, these AI models need vast, diverse, and high-quality datasets.


Public biological databases, though widely accessed (logging over 100 million hits daily), were originally built for academic research, not AI applications. Critically, 70% of all publicly available protein data comes from just ten species, making these datasets highly redundant and narrow in scope.


This lack of diversity has constrained AI’s ability to generalize, innovate, and make meaningful predictions in the life sciences. In short, models have been trying to understand the richness of nature using an incomplete picture. This is the data wall.


Basecamp Research took a bold approach to solving this problem. Through partnerships with over 125 communities across 26 countries, the company collected genetic samples from some of the most remote and extreme environments on Earth, all under a scalable and ethical framework aligned with the UN’s Nagoya Protocol.


The result is BaseData™, a purpose-built, clean, and redundancy-free dataset more than ten times larger than any public alternative. It's already proving essential for training next-generation biological foundation models, the same type of models driving breakthroughs in drug discovery, enzyme engineering, and synthetic biology.


The discoveries within BaseData™ are as extraordinary as they are diverse, representing entirely new microbial species, not just genetic variants.


Examples include a new species of Burkholderia found on a WWII shipwreck, capable of extracting heavy metals from seawater, with powerful potential for bioremediation and pollution control. Another is a thermophilic archaeon from the Sulfolobaceae family, isolated from acidic volcanic hot springs, whose heat-stable proteins could enhance drug delivery systems and extend the shelf life of biological therapeutics. A third is a unique species of Candidatus Eremiobacterota discovered in Antarctic soil, able to metabolize hydrogen and extract water from the air, an adaptation with potential for next-gen drug delivery or space-based life support systems.


These breakthroughs are more than academic curiosities. They could become the biological building blocks for new antibiotics, climate-resilient crops, and biosensors for disease detection.


Foundation AI models, the same class of models behind ChatGPT and image generation tools, are being rapidly adapted to biology. But their success depends on training data that captures the full breadth of nature’s complexity.


By making BaseData™ available to researchers and institutions, Basecamp is laying the groundwork for a new era of biological discovery, one driven by ethically sourced, commercially scalable, and scientifically rigorous data.


As the life sciences race to solve global challenges like antibiotic resistance, rare diseases, and environmental collapse, the discovery of one million new species offers more than just hope: it offers a new foundation for biological innovation.


Learn more at www.basecamp-research.com.

Author

BioFocus Newsroom
