KAUST releases largest catalog of ocean DNA

A new study by KAUST and collaborators in Spain provides the world's most comprehensive database yet for understanding microbial distribution and function in the ocean. Using the KAUST Metagenomic Analysis Platform (KMAP), a KAUST invention from 2021, the scientists analyzed massive amounts of sequencing data to release Global Ocean Gene Catalog 1.0. The catalog is the world's largest open-source catalog of marine microbes. It comprises 317 million unique gene clusters and matches microbial class with gene function, geographic location and habitat type.

The ocean's microbes represent the earliest lifeforms on Earth and have evolved the capacity to metabolize compounds that affect the cycles of elements like nitrogen, sulfur and carbon, which control ocean productivity and affect climate stability. Their beginnings at the bottom of the sea have made them a fascinating subject of study, not only for the evolution of life but also for biotechnology. For example, enzymes sourced from bacteria living around hydrothermal vents have been used to support the polymerase chain reaction, the technique underlying the tests used for COVID-19 detection. Moreover, the use of marine genetic resources in industrial processes yields an estimated $6 billion annually, a figure that doubles every six years as more genes in ocean microbes are found.

"Scientists can access the catalog remotely to investigate how different ocean ecosystems work, track the impact of pollution and global warming, and search for biotechnology applications such as new antibiotics or new ways to break down plastics," said KAUST Ibn Sina Distinguished Professor Carlos Duarte, who led the project. "The acceleration of AI we are currently experiencing is likely to play a major role in identifying genes of biotechnology interest contained within the massive catalog we are releasing." 

The catalog reveals major differences in microbial activity between open oceans and ocean floors and uncovers a surprising number of fungi contributing to the genomic diversity of the mesopelagic ocean (depths of 200 meters to 1000 meters). It also provides an extraordinary wealth of information on benthic microbes. These organisms live on the seafloor and are far less studied than their open-ocean cousins, pelagic microbes; both types of microbes and more are exhaustively surveyed in the catalog.

The ability to collect and analyze such a wealth of new ocean data comes from major developments in DNA sequencing technology and computational power. These advances allowed the scientists to sequence 2102 ocean samples taken from different depths and locations around the world and, using Shaheen II, KAUST's previous supercomputer, analyze the sequences to identify the more than 317 million gene groups.

Considering the dynamics of oceans and ocean life, the catalog provides an invaluable database for understanding how marine ecosystems are adapting to an environment that is constantly changing because of natural and anthropogenic causes.

"Our analysis highlights the need to continue sampling the oceans, focusing on areas that are under-studied, such as the deep sea and ocean floor," said KAUST Ph.D. student and co-author of the study Elisa Liaolo. 

"Impressive as it is, the 317 million gene groups documented in the Ocean Gene Catalog 1.0 likely represents the tip of the iceberg of the massive library of functional capacities the long evolutionary history of life in the ocean has accrued," added Duarte. "Further projects focused on sampling and massive sequencing of understudied habitats in the ocean, including organisms such as corals and seagrass, not included in the study, which are known to host large numbers of microbial species, will likely reveal many times the number of genes included in this initial gene catalog."   

The study, which is published in Frontiers in Science, further cements KAUST's position as a leader in marine biology and strengthens its alignment with Saudi Arabia's national priorities.