Synthetic data can accurately track environmental disasters

  • KAUST scientists develop synthetic datasets in the absence of real data to predict how oil spills spread in the ocean 

King Abdullah University of Science and Technology (KAUST) and SARsatX, a Saudi company that specializes in Earth observation technologies and aims to integrate AI into environmental monitoring, have developed computer-generated data to train deep learning models for oil spill predictions. Validating the use of synthetic data is especially critical for environmental disaster monitoring, where early detection and response can significantly reduce the risk of environmental harm.

Matthew McCabe, dean of the KAUST Biological and Environmental Science and Engineering Division, co-founder of SARsatX, and coauthor of the study, said: “One of the biggest challenges in environmental applications of AI is the lack of quality training data. Our solution was to use deep learning to create synthetic data from a very small sample of real data and train the predictive AI models on this.” 

McCabe and colleagues used a deep learning method known as Generative Adversarial Networks (GANs) to create new data that mimics a training set. Oil spills are typically detected using synthetic aperture radar (SAR) imagery, but in these images, oil slicks resemble calm ocean surfaces or natural organic films, making them difficult to distinguish.  

Using SARsatX knowledge and operational expertise in SAR-based environmental monitoring, the researchers began with just 17 SAR images to generate a synthetic dataset of more than 2,000 images. These images were used to train a second deep learning model known as Multi-Attention Network (MANet), designed to extract and classify subtle patterns in complex imagery.

The researchers demonstrated that when trained exclusively with the GAN-generated synthetic dataset, the MANet model could correctly identify about 75% of the area covered by oil, closely matching the accuracies of similar methods trained with much larger collections of images. This finding shows that AI models can be developed and validated without requiring large volumes of real-world spill imagery. 
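
To make the "about 75% of the area covered by oil" figure concrete, the short Python sketch below shows one way such an area score could be computed from a ground-truth mask and a predicted mask. The article does not say which metric the study reported, so this is an illustration under assumptions rather than the researchers' evaluation code: the function name `oil_area_scores`, the toy masks, and the choice of pixel-level recall and intersection-over-union (IoU) are all hypothetical.

```python
def oil_area_scores(truth, pred):
    """Compare two same-sized binary masks (1 = oil, 0 = water).

    Returns (recall, iou):
      recall = share of true oil pixels that the model also marked as oil
      iou    = overlap of the two oil areas divided by their union
    Both metrics are assumptions; the article does not name the one used.
    """
    tp = fp = fn = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if t and p:
                tp += 1      # oil correctly identified
            elif p:
                fp += 1      # water mistaken for oil
            elif t:
                fn += 1      # oil that was missed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return recall, iou


# Toy 4x4 example (made-up values, not data from the study).
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]

recall, iou = oil_area_scores(truth, pred)
print(f"recall = {recall:.2f}, IoU = {iou:.2f}")
```

In this toy case the prediction covers three of the four true oil pixels and adds one false alarm, so recall is higher than IoU. The two numbers answer slightly different questions, which is why it matters which one a reported accuracy refers to.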

This approach could significantly enhance marine protection efforts by enabling faster, more reliable monitoring of spills while reducing logistical and environmental challenges associated with data collection.  

Peter Schulte, another coauthor and head of engineering at SARsatX, said: “Using one deep learning method to create data and another one to interpret it, this study demonstrated that AI can learn effectively from synthetic examples. This approach shows that AI models for environmental applications can be trained without waiting for real disasters to occur.”