Evolution of AI-based change detection for low-to-medium resolution satellite imagery and design strategies for user-customized generalized platforms

Heein Yang; Daehee Kim

doi:10.23386/joss.2026.3.1.002

JOURNAL OF SPACE SECURITY. 30 June 2026. 13-21
https://doi.org/10.23386/joss.2026.3.1.002

Heein Yang¹*

Daehee Kim¹

¹SAR Image Technology Team, CONTEC, Daejeon 30474, Republic of Korea

*Corresponding Author

License (open-access, https://creativecommons.org/licenses/by-nc/4.0/):

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

With the rapid advancement of Earth observation satellite technology and the widespread operation of small satellite constellations, the volume of optical image data capable of monitoring wide areas on a regular and continuous basis is growing explosively. The accumulation of this time-series data has highlighted the importance of change detection technology, which identifies and interprets changes in the Earth’s surface in a rapidly changing global environment driven by climate change, large-scale natural disasters, urbanization, and international conflict. Change detection based on low- to mid-resolution optical satellite imagery offers significant value for wide-area and long-term monitoring. However, it presents technical challenges that differ from those of high-resolution change detection, which focuses on detecting the presence and changes of individual targets: pixel characteristics differ across acquisition times, images are imperfectly co-registered, and the atmosphere, clouds, and seasonality cause radiometric mismatches. This paper assesses the current state of change detection technology based on low- to mid-resolution satellite imagery and provides an in-depth analysis of the evolution of deep learning architectures. This study quantitatively and qualitatively examines how the evolution of deep learning models has contributed to improved change detection performance, starting with traditional statistical techniques and moving on to Convolutional Neural Networks (CNNs), UNet-based Siamese networks, and the Vision Transformer (ViT), which captures global context. Finally, we present the State Space Model (SSM)-based Mamba architecture, which has recently led change detection benchmarks by capturing spatial features effectively while maintaining linear computational complexity. However, despite these advances in model architecture, a model-centric approach alone cannot fully satisfy the complex and diverse requirements of real-world applications.
This study emphasizes that while improving the performance of the AI model itself is crucial, designing a monitoring system tailored to the end user’s needs (analysis objectives, target objects, and resolution) is a critical factor in determining practical success or failure. The paper concludes by arguing for a ‘next-generation generalized change detection platform’ that accommodates various foundation models and heterogeneous sensor data (hyperspectral, SAR, etc.) and provides general-purpose functions by integrating intelligent preprocessing such as cloud and haze removal, and by outlining a specific development direction for such a platform.

Keywords

Satellite change detection

Multiresolution imagery

Siamese networks

Transformers

Semi-supervised learning

Imbalance

Super-resolution

Attention

Deformable convolution

MAIN

1. INTRODUCTION
1.1. Background and Necessity of Change Detection Research
1.2. Research Purpose and Key Contributions
2. CHANGE DETECTION TARGETS AND DATASET COMPOSITION METHODS BY SATELLITE IMAGE RESOLUTION
2.1. Analysis of the Types and Characteristics of Changes that can be Analyzed by Resolution
2.2. Key Constraints and Refinement Strategies for Change Detection Dataset Construction
3. DEVELOPMENTS IN DEEP LEARNING MODELS AND MODERN ARCHITECTURES
3.1. Introduction and Overcoming Limitations of CNNs, UNet, and Siamese Networks
3.2. The Emergence of the Vision Transformer (ViT) and Global Context Capturing
3.3. The Emergence of the Mamba Architecture and the Shift in Change Detection Paradigm
4. THE IMPORTANCE OF TOP-DOWN MONITORING DESIGN BASED ON USER REQUIREMENTS
4.1. Limitations of Reliance on SOTA Models and the Practical Disparity
4.2. Customized Design Strategy Focused on Analysis Purpose, Target, and Resolution
4.3. Overcoming Class Imbalance and a Semi-Supervised Learning Framework
5. DEVELOPMENT DIRECTION OF A NEXT-GENERATION GENERALIZED CHANGE DETECTION PLATFORM
5.1. Convergence of Vision Foundation Models
5.2. Integration of Heterogeneous Modalities and Internalization of Intelligent Assistive Technologies
5.3. Intelligent Change Detection Framework
6. CONCLUSION

1. INTRODUCTION

1.1. Background and Necessity of Change Detection Research

In modern society, Earth observation satellites provide irreplaceable capabilities for wide-area, periodic monitoring of the entire Earth. The commercialization of reusable launch vehicles, which has reduced the cost of space entry, and the remarkable advancements in CubeSats and small satellite technology have led to a growing trend of operating satellite constellations, not only by government agencies but also by various private companies. Consequently, the demand for change detection technology, which observes the same object or phenomenon at different points in time to identify differences in status and their implications, is growing exponentially. Change detection, which identifies meaningful changes in images observed over the same area at different points in time, has diverse public and industrial applications, including urban expansion/redevelopment management, forest and agricultural monitoring, water resource and coastline changes, disaster damage assessment, and security surveillance.

In particular, with the recent increase in extreme weather events such as torrential rains, heat waves, droughts, and wildfires due to climate change, real-time monitoring of flood and inundated areas and assessment of wildfire damage have become increasingly important. Furthermore, the instability of international affairs, such as the Ukraine War and the Israel-Hamas conflict, is further exacerbating the demand for precision change detection based on time-series optical imagery.

Traditional visual interpretation methods, while capable of recognizing complex changes based on the contextual understanding of highly trained interpreters, suffer from high visual fatigue, are unsuitable for processing large amounts of imagery, and have clear limitations in objectivity and scalability due to subjectivity among experts.

Consequently, the introduction of automated change detection technologies based on artificial intelligence (AI) deep learning that can consistently and rapidly process large-scale satellite imagery has become a pressing need. The common practical value of change detection boils down to:

(i) location of change (where),

(ii) type of change (what to what),

(iii) magnitude of change (how much),

(iv) confidence (how certain is it), and

(v) decision linkage (what to do),

suggesting that change detection should be defined as a system of requirements definition → data quality management → model selection/training → operation/feedback, rather than a model performance competition.

1.2. Research Purpose and Key Contributions

The purpose of this paper is to track the evolution of deep learning models and demonstrate the importance of tailored monitoring design for end users, based on comprehensive research and analysis conducted to maximize the effectiveness of change detection technology based on low- to mid-resolution satellite imagery. The main academic and practical contributions of this study are summarized as follows:

First, it systematically establishes standards for structuring satellite imagery data by resolution and purpose. Based on the spatial resolution of satellite imagery (low, medium, and high), it classifies the recognizable objects and the types of change suitable for analysis. Furthermore, it identifies environmental and technical factors that must be controlled when constructing training data, such as imaging geometry, weather conditions, seasonal variability, and image alignment quality, thereby presenting guidelines for data design tailored to end users.

Second, it provides an in-depth analysis of the evolution of deep learning-based change detection architectures and performance enhancement techniques. It traces the evolution from CNN-based multi-layer fusion, through the Transformer, which captures global context, to Mamba, the latest leading model, which models spatiotemporal features with linear computational complexity. It also critically examines the limitations of various performance-enhancing modules (Attention, Deformable Convolution, etc.) and of validation metrics.

Third, it points out the limitations of model-centric approaches and proposes a blueprint for demand-tailored monitoring design and a next-generation generalization framework. Beyond the superiority of any deep learning architecture, it argues that top-down design tailored to the target characteristics and monitoring objectives determines practical success. It also outlines the future direction for a generalized change detection platform that combines heterogeneous modalities and foundation models.

2. CHANGE DETECTION TARGETS AND DATASET COMPOSITION METHODS BY SATELLITE IMAGE RESOLUTION

The performance of a change detection algorithm depends strongly on the spatial, spectral, and temporal characteristics of the input satellite imagery. Therefore, prior to algorithm development, a thorough analysis is required to determine which image resolution is appropriate for which analysis and what constraints should be considered when constructing the dataset for model training.

2.1. Analysis of the Types and Characteristics of Changes that can be Analyzed by Resolution

Currently, approximately 45.5% of deep learning-based change detection research focuses on building change detection. This is because large-scale, precisely labeled public datasets, such as LEVIR-CD [1], WHU-CD [2], S2Looking [3], and SYSU-CD [4], are primarily focused on high-resolution urban buildings. The level of detail in a satellite image varies with its spatial resolution.

As shown in Fig. 1, fine-grained object boundaries are clearly distinguishable in sub-meter imagery, whereas medium and low-resolution imagery primarily retain coarse land-cover patterns rather than precise object-level geometry.

https://cdn.apub.kr/journalsite/sites/JOSS/2026-003-01/N0670030102/images/Figure_joss_2026_31_13_F1.jpg

FIG. 1.

The impact of spatial resolution on the level of observable structural details within the same geographic area

However, the actual monitoring needs of governments and public institutions are focused on wide-area time-series monitoring at low to mid-resolution, such as monitoring nationwide deforestation, flooding, and desertification. Satellite images from Landsat with a resolution of 30 m or Sentinel-2 with a resolution of 10 m have difficulty identifying the details of individual buildings, but their rich multispectral bands and long observation history spanning decades provide overwhelming technical utility in identifying situational changes in land use on a global scale that go beyond the simple existence of objects. The resolution-dependent characteristics of change detection targets and their associated dataset construction constraints are summarized in Table 1.

TABLE 1.

Resolution-dependent change detection characteristics in optical satellite imagery. As spatial resolution increases from 30 m to sub-meter levels, detectable change types shift from large-scale land-cover transitions to object-level structural modifications, accompanied by distinct technical limitations and dataset construction considerations.

Category	Spatial Resolution	Representative Satellite and Sensor	Major ROI and the Purpose of Change Detection	Technical Limitation and Characteristics
Low-resolution	10~30 m	Landsat-8/9, Sentinel-2	Changes in agricultural land cultivation patterns, large-scale deforestation, changes in water bodies (submerged areas), and macroscopic land cover changes.	Even a building 90 meters long spans only about 3 pixels at 30 m resolution, making it impossible to identify the shapes of individual buildings or the structure of small objects (vehicles, detailed roads). Detection therefore relies on changes in macroscopic spectral characteristics.
Mid-resolution	3~10 m	PlanetScope (Dove), SPOT-6	Large-scale infrastructure construction and apartment complex development, changes in offshore structures and port density, and changes in medium-sized farmland boundaries.	While the ability to extract clear shapes of narrow alleys or individual vehicles remains limited, it provides sufficient resolution for detecting the presence of buildings and changes in the composition of large-scale road networks.
High-resolution	0.5~1 m	KOMPSAT-3A, GeoEye-1, WorldView-3	New construction/collapse of individual buildings, identification of detailed road network lanes, tracking of small military installations and vehicle movement, and precise analysis of urban infrastructure.	While it offers precise identification capabilities, it incurs significant data acquisition and processing costs for wide-area monitoring. Furthermore, its low revisit frequency makes it difficult to build long-term time-series data.

2.2. Key Constraints and Refinement Strategies for Change Detection Dataset Construction

To achieve effective change detection results using low- to mid-resolution imagery, dataset construction requirements must be closely controlled prior to algorithm design. The major external factors hindering change detection performance and countermeasures investigated in this study are as follows.

First, imaging geometry and registration errors. Satellite images project terrain slopes and the positions of tall buildings differently depending on the orbit and attitude at the time of capture, especially the off-nadir angle. This geometric pixel mismatch is a primary cause of false positives, leading deep learning models to report structural changes even when nothing has changed on the ground. Since pixel-based change detection algorithms rely on numerical pixel differences, the time-series images must first be co-registered to within one Ground Sampling Distance (GSD) (e.g., within 0.5 m for 0.5 m resolution images).
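The one-GSD tolerance can be checked before any model is run. The following is a minimal sketch, assuming a purely global integer-pixel translation between the two dates and using phase correlation, which is one common registration check and not necessarily the method used in this study. The function names `estimate_shift` and `within_one_gsd` are hypothetical.

```python
import numpy as np

def estimate_shift(reference, target):
    """Estimate the integer (row, col) translation of `target` relative to
    `reference` by phase correlation (assumes a global, cyclic shift)."""
    f_ref = np.fft.fft2(reference)
    f_tgt = np.fft.fft2(target)
    cross_power = f_tgt * np.conj(f_ref)
    # Keep only the phase so that the inverse transform peaks at the shift.
    cross_power = cross_power / (np.abs(cross_power) + 1e-12)
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    rows, cols = correlation.shape
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    dy = int(peak[0]) if peak[0] <= rows // 2 else int(peak[0]) - rows
    dx = int(peak[1]) if peak[1] <= cols // 2 else int(peak[1]) - cols
    return dy, dx

def within_one_gsd(shift_px, gsd_m):
    """True if the residual offset, converted to meters, is at most one GSD."""
    offset_m = float(np.hypot(shift_px[0], shift_px[1])) * gsd_m
    return offset_m <= gsd_m
```

A pair whose estimated offset exceeds the tolerance would be sent back to registration rather than forwarded to the change detection model. Real imagery also needs sub-pixel and locally varying corrections, which this sketch does not attempt.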

Second, spectral distortion due to seasonal and meteorological changes. Dense vegetation in summer and snow-covered ground in winter create significant differences in the color, texture, and spectral characteristics of images of the same location. To prevent deep learning models from mistaking seasonal color variations for permanent changes in land cover, data augmentation that simulates seasonal variability during training (e.g., brightness/contrast transformation, RGB shift, snow/shadow effect synthesis) is essential.
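As an illustration of the radiometric part of such augmentation, the sketch below perturbs brightness, contrast, and per-channel offsets independently for each acquisition date while leaving the change label untouched. The function names and the parameter ranges are assumptions chosen for illustration; snow and shadow synthesis are not covered.

```python
import numpy as np

def augment_radiometry(image, rng, brightness=0.2, contrast=0.2, channel_shift=0.05):
    """Randomly perturb an (H, W, C) float image with values in [0, 1]."""
    gain = 1.0 + rng.uniform(-contrast, contrast)      # contrast change
    bias = rng.uniform(-brightness, brightness)        # brightness change
    shift = rng.uniform(-channel_shift, channel_shift, size=image.shape[-1])
    mean = image.mean(axis=(0, 1), keepdims=True)
    out = (image - mean) * gain + mean + bias + shift  # per-channel shift broadcasts
    return np.clip(out, 0.0, 1.0)

def augment_pair(pre, post, rng):
    """Perturb each date independently so that the model sees radiometric
    differences that do not correspond to a labeled change."""
    return augment_radiometry(pre, rng), augment_radiometry(post, rng)
```

Because the two dates are perturbed independently, the label mask for the pair stays valid, and the network is pushed to ignore global color differences between acquisitions.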

Third, the exclusion of clouds and haze. Due to the nature of optical imaging, atmospheric conditions are the most critical constraint affecting image quality.

Roughly 30-60% of available imagery is obscured by clouds or haze, which causes not only irrecoverable information loss but also radiometric inconsistency between images. Recently, techniques that go beyond simple masking are emerging as core preprocessing elements in change detection pipelines: CNN-based semantic segmentation that automatically identifies thick clouds, thin clouds, and even cloud shadows, and dehazing that corrects image discoloration caused by atmospheric scattering.

3. DEVELOPMENTS IN DEEP LEARNING MODELS AND MODERN ARCHITECTURES

Past change detection using satellite imagery primarily relied on statistical techniques that calculated the differences in pixel values between two time periods. Representative techniques include image differencing, image ratioing (e.g., utilizing optical indices like NDVI), principal component analysis (PCA), multivariate alteration detection (MAD), and its complementary technique, iteratively re-weighted MAD (IR-MAD). MAD, in particular, analyzes linearly combined components based on canonical correlation analysis, demonstrating robustness to radiance and atmospheric correction errors. However, it exhibited significant limitations in detecting complex, nonlinear changes, such as seasonal vegetation changes. To overcome these statistical limitations and automatically learn complex spatial patterns and nonlinear relationships within imagery, deep learning architectures were widely adopted. This advancement has led to the development of convolutional neural networks (CNNs), Transformers, and, more recently, the Mamba model.
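To make the simplest of these statistical techniques concrete, the sketch below differences NDVI between two dates and flags pixels whose difference deviates from the scene mean by more than k standard deviations. The threshold rule and the value k = 2 are illustrative assumptions, not values taken from this study.

```python
import numpy as np

def ndvi(red, nir, eps=1e-6):
    """Normalized Difference Vegetation Index from red and near-infrared bands."""
    return (nir - red) / (nir + red + eps)

def ndvi_change_mask(red_t1, nir_t1, red_t2, nir_t2, k=2.0):
    """Flag pixels whose NDVI difference is more than k standard deviations
    away from the mean difference of the scene."""
    diff = ndvi(red_t2, nir_t2) - ndvi(red_t1, nir_t1)
    return np.abs(diff - diff.mean()) > k * diff.std()
```

A scene-wide seasonal shift moves the mean of the difference image and is therefore partly absorbed, but spatially varying seasonal effects are not, which is the kind of nonlinear change these methods handle poorly.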

3.1. Introduction and Overcoming Limitations of CNNs, UNet, and Siamese Networks

In the field of remote sensing, deep learning-based change detection began in earnest with the combination of Fully Convolutional Networks (FCNs) [5] and Siamese architectures. Daudt et al. (2018) proposed the FC-EF (Early Fusion) architecture, which processes the input images in a single pass, and the FC-Siam-Conc (Concatenation) and FC-Siam-Diff (Difference) architectures, which extract features from the two acquisition dates using two parallel encoders that share weights [6]. Table 2 presents the quantitative performance comparison of FC-EF and Siamese-based variants evaluated on standard change detection benchmarks.

TABLE 2.

F1-score comparison of FC-based architectures evaluated on OSCD [7] and the Air Change dataset (Szada, Tiszadob) [8]

Dataset	FC-EF	FC-Siam-Conc	FC-Siam-Diff
OSCD [7]	48.89	45.20	48.86
OSCD [7]	56.91	51.36	57.92
Szada [8]	51.40	50.41	52.66
Tiszadob [8]	93.40	82.65	77.78

Among these, the Siamese network architecture maps image features from the two acquisition dates to a latent space of the same dimension. This narrows the domain gap caused by lighting or atmospheric changes, a common source of false positives, and improves the stability of change detection performance.
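The difference between the three fusion strategies can be sketched with a toy encoder. The example below replaces the convolutional encoder with a single 1x1 convolution (a matrix product over the channel axis) followed by ReLU, which is an assumption made to keep the sketch short; in the actual FC-Siam networks the fusion is applied to multi-scale features inside a U-shaped network. All function names are hypothetical.

```python
import numpy as np

def encode(image, weights):
    """Toy encoder: 1x1 convolution (H, W, C_in) @ (C_in, C_out), then ReLU."""
    return np.maximum(image @ weights, 0.0)

def fc_ef(img_t1, img_t2, weights_ef):
    """Early fusion: stack both dates along channels, then encode once."""
    return encode(np.concatenate([img_t1, img_t2], axis=-1), weights_ef)

def fc_siam_conc(img_t1, img_t2, shared_weights):
    """Siamese concatenation: same weights for both dates, features stacked."""
    f1 = encode(img_t1, shared_weights)
    f2 = encode(img_t2, shared_weights)
    return np.concatenate([f1, f2], axis=-1)

def fc_siam_diff(img_t1, img_t2, shared_weights):
    """Siamese difference: same weights for both dates, absolute difference."""
    return np.abs(encode(img_t1, shared_weights) - encode(img_t2, shared_weights))
```

Because both branches share weights, identical inputs always produce identical features, so the difference variant returns exactly zero where nothing has changed. This is the property that makes the shared latent space useful for suppressing false positives.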

To address the issue of spatial resolution loss that occurs with increasing network depth, the UNet and UNet++ architectures, which reduce the semantic gap between the encoder and decoder, have been widely adopted. In particular, the improved UNet++ precisely fuses multi-layer feature information through the Multiple Side-Output Fusion (MSOF) module, contributing to the precise extraction of boundaries for thin road networks and small objects. Depth-wise separable convolution techniques were employed to reduce the number of model parameters and improve computational efficiency.
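The parameter saving from depth-wise separable convolution follows from simple counting. The sketch below compares the weight counts of a standard convolution and its depth-wise separable factorization, ignoring biases; the layer sizes in the usage note are illustrative assumptions.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depth-wise k x k convolution followed by a 1x1
    point-wise convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise
```

For a 3x3 layer with 64 input and 128 output channels, the standard convolution has 73,728 weights and the separable version has 8,768, a reduction of roughly 8.4 times.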

Furthermore, there are examples of spatial reconstruction capabilities further enhanced by applying memory-efficient Pixel Shuffle to prevent loss of location information during resolution restoration.
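Pixel Shuffle restores resolution by rearranging channels into space instead of interpolating, so no values are invented or discarded. The following is a minimal NumPy sketch of the rearrangement, assuming the channel ordering used by common deep learning frameworks, where output pixel (h*r + i, w*r + j) of channel c is taken from input channel c*r*r + i*r + j.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C * r * r, H, W) array into (C, H * r, W * r)."""
    c_rr, h, w = x.shape
    if c_rr % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)       # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)     # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)
```

Each low-resolution location contributes an r x r block of output pixels, one from each of its r*r channels, which is why the location information is preserved during upsampling.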

3.2. The Emergence of the Vision Transformer (ViT) and Global Context Capturing

CNN-based models are known to have an inherent weakness in capturing the global spatial context across an image due to their limited receptive fields. To overcome this, the Transformer architecture [9], a breakthrough in natural language processing (NLP), has been introduced for change detection.

A representative example is the Bi-temporal Image Transformer (BIT) model proposed by Chen et al. (2021) [10]. The BIT model compresses features extracted by the CNN backbone into a very small number of “semantic tokens.” The BIT then globally models the spatiotemporal relationships between tokens using the Transformer encoder’s self-attention mechanism. The BIT decoder then converts these features back into pixel-level features while preventing loss of fine-grained information, effectively recognizing semantic changes occurring over large areas.

However, when dealing with data with a high pixel count, such as high-resolution satellite imagery or extensive low- to mid-resolution images, the Transformer model suffers from a critical drawback: computational load (FLOPs) and memory requirements grow with the square of the input sequence length (the number of image patches). Maintaining high-resolution patches is essential for extracting precise boundaries in pixel-level change detection, but this quadratic complexity of the Transformer model has resulted in a hardware resource bottleneck, hindering practical application.
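The scaling can be made concrete by counting tokens and pairwise attention scores. The sketch below assumes square images split into non-overlapping square patches and counts only the entries of one attention score matrix, which is a simplification of the full FLOP count.

```python
def num_tokens(image_size, patch_size):
    """Number of non-overlapping patches in a square image."""
    return (image_size // patch_size) ** 2

def attention_pairs(tokens):
    """Entries in one self-attention score matrix (tokens x tokens)."""
    return tokens * tokens

def linear_scan_steps(tokens):
    """Steps in a single sequential scan over the same tokens."""
    return tokens
```

With 16-pixel patches, doubling the image side from 256 to 512 pixels raises the token count from 256 to 1,024 (4 times) and the number of attention scores from 65,536 to 1,048,576 (16 times), whereas a single linear scan over the tokens grows only 4 times.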

3.3. The Emergence of the Mamba Architecture and the Shift in Change Detection Paradigm

Since 2024, the Mamba architecture, based on the State Space Model (SSM), has been introduced to remote sensing change detection. It overcomes the quadratic complexity of the Transformer by processing long sequences with linear computational complexity, and it has led state-of-the-art (SOTA) performance [11]. The overall architecture of the ChangeMamba framework is illustrated in Fig. 2.

https://cdn.apub.kr/journalsite/sites/JOSS/2026-003-01/N0670030102/images/Figure_joss_2026_31_13_F2.jpg

FIG. 2.

Overall architecture of the ChangeMamba framework. The framework employs a VMamba-based encoder for efficient global spatial modeling with linear complexity and task-specific decoders for (a) binary change detection (MambaBCD), (b) semantic change detection (MambaSCD), and (c) building damage assessment (MambaBDA).

The Mamba model, which builds on the linear time-invariant (LTI) system paradigm, maps a one-dimensional input sequence to an output response via a hidden state. It also incorporates a selective scan mechanism that dynamically adjusts parameters depending on the input. To apply this architecture to 2D spatial data such as satellite imagery, the Visual Mamba (VMamba) [12] architecture is utilized. VMamba uses a 2D Selective Scan (SS2D) technique that unfolds the input 2D feature tokens into sequences along four directions (top left-bottom right, bottom right-top left, top right-bottom left, bottom left-top right), thereby capturing the global spatial context between pixels at linear cost. The internal structure of the VSS block and the SS2D mechanism are illustrated in Fig. 3.

https://cdn.apub.kr/journalsite/sites/JOSS/2026-003-01/N0670030102/images/Figure_joss_2026_31_13_F3.jpg

FIG. 3.

Visual Mamba’s VSS block and SS2D module. SS2D transforms 2D feature maps into structured sequences for state-space modeling, enabling efficient global context aggregation with linear complexity.
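The two ideas behind SS2D, a linear-time recurrence over a sequence and the unfolding of a 2D map into several sequences, can be sketched in a few lines. This is a toy under stated assumptions: the recurrence uses fixed scalar parameters (so it is the time-invariant case, without Mamba's input-dependent selection), and the four scan orders are taken to be row-major, column-major, and their reverses. The function names are hypothetical.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Scalar state-space recurrence h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.
    One pass over the sequence, so the cost is linear in its length."""
    h = 0.0
    y = np.empty(len(x), dtype=float)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t
        y[t] = c * h
    return y

def cross_scan(feature_map):
    """Unfold an (H, W) map into four 1D sequences that traverse it in
    different directions (assumed orders, see the lead-in)."""
    rows = feature_map.reshape(-1)     # left-to-right, top-to-bottom
    cols = feature_map.T.reshape(-1)   # top-to-bottom, left-to-right
    return [rows, cols, rows[::-1], cols[::-1]]

def scan_2d(feature_map, a=0.5, b=1.0, c=1.0):
    """Run the same recurrence along each scan order; returns four sequences."""
    return [ssm_scan(seq, a, b, c) for seq in cross_scan(feature_map)]
```

Each scan lets a pixel accumulate context only from pixels earlier in that order, so combining four complementary orders is what gives every pixel access to the whole map while each pass remains linear in the number of tokens.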

Recent research has modified this Mamba architecture to suit the characteristics of change detection, achieving significant performance improvements over existing approaches.

ChangeMamba [13]: The Visual State Space (VSS) block is applied in the encoder to extract global spatial information with linear complexity, and the Spatio-Temporal State Space (STSS) block is combined with the decoder to dynamically rearrange and model the features of the two time points. The model reported higher F1-scores than existing CNN-based (SiamCRNN, HRSCD) and Transformer-based (BIT, SwinSUNet [14]) models in binary change detection on the SYSU-CD dataset (MambaBCD), multi-class land-cover change detection on the SECOND dataset (MambaSCD), and building damage assessment on the xBD dataset (MambaBDA).

CDMamba [15]: This model was proposed to overcome a weakness of the original Mamba, which captures global information but can miss local details in dense prediction tasks. It introduces the Scaled Residual Convolutional Mamba (SRCM) module, which combines the local feature extraction capabilities of CNNs with Mamba, and the Adaptive Global Local Guided Fusion (AGLGF) block, which fuses change features using the features of the other time point as a guide. As a result, it achieved the highest performance among the compared models in detecting irregular or very subtle building change areas, recovering detailed contours missed by other ViT or pure Mamba models.

FA-Mamba (Feature-Augmented Mamba) [15]: Building on Mamba’s relatively low memory consumption, FA-Mamba adds a dual time-phase semantic augmented attention module in the decoding stage to strengthen feature map fusion between the two time points. Even in a computationally limited environment, it reaches an F1-score of 87.48 on LEVIR-CD, a large-scale optical and aerial imaging benchmark, and 82.71 on SYSU-CD.

As this comparative analysis shows, deep learning models have evolved from simple filter-based operations (CNN) to global context recognition (Transformer), and then to an architecture (Mamba) that combines high computational efficiency with spatial modeling capabilities, contributing to substantial improvements in subpixel-level boundary extraction and time-series change detection performance.

4. THE IMPORTANCE OF TOP-DOWN MONITORING DESIGN BASED ON USER REQUIREMENTS

Innovative advances in modern deep learning architectures such as Mamba have been a powerful driving force in improving change detection performance. This study nevertheless emphasizes that design, meaning the planning of data and pipelines according to end-user requirements (which targets to monitor and in what manner), is a much more crucial factor in determining practical success or failure than the structural advancement of the deep learning models themselves.

4.1. Limitations of Reliance on SOTA Models and the Practical Disparity

In academia, SOTA models are often published with results on standardized, high-resolution public datasets like LEVIR-CD [1] and WHU-CD [2] to demonstrate objective performance. However, an experimental study by Corley et al. (2024) showed that the apparent superiority of even the most complex state-of-the-art architectures often stems from careful hyperparameter tuning, the choice of optimizer, and learning-rate scheduler settings [16]. The comparative evaluation of representative change detection models on the LEVIR-CD and WHU-CD datasets under controlled training conditions is presented in Tables 3 and 4.

Under fair conditions with fully controlled variables, even a basic U-Net Siamese architecture with a well-constructed backbone can achieve performance comparable to or even superior to SOTA models like BIT or ChangeFormer. This suggests that performance is directly related to the quality of the design, which involves understanding the essential characteristics of the data to be learned and how to refine and train it, rather than blindly increasing the complexity of the algorithmic structure.

TABLE 3.

Performance comparison of change detection networks on the LEVIR-CD dataset under controlled training conditions

Network	Backbone	Precision (%)	Recall (%)	F1 (%)
FC-EF	-	86.91	80.17	83.40
FC-Siam-Conc	-	91.99	76.77	83.69
FC-Siam-Diff	-	89.53	83.31	86.31
DTCDSCN	-	88.53	86.83	87.67
STANet	SE-ResNet-34	83.81	91.00	87.26
CDNet	ResNet-18	91.60	86.50	89.00
BIT	ResNet-18	89.24	89.37	89.31
ChangerEx	ResNet-18	92.97	90.61	91.77
U-Net	EfficientNet-b4	92.69	87.16	89.25
U-Net	ResNet-50	92.97	89.78	90.38
U-Net-Siam-Conc	ResNet-50	92.87	89.48	90.41
U-Net-Siam-Diff	ResNet-50	93.21	89.50	90.46

TABLE 4.

Performance comparison of change detection networks on the WHU-CD dataset under controlled training conditions

Network	Backbone	Precision (%)	Recall (%)	F1 (%)
ChangeFormer	MiT-b1	82.60	78.57	77.75
Tiny-CD	EfficientNet-b4	80.15	77.56	78.53
BIT	ResNet-18	78.58	82.13	77.68
U-Net	ResNet-50	88.65	83.08	84.17
U-Net-Siam-Conc	ResNet-50	83.69	86.56	82.75
U-Net-Siam-Diff	ResNet-50	88.56	85.63	84.01
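The precision, recall, and F1 columns in Tables 3 and 4 follow the standard pixel-level definitions. The sketch below computes them from binary masks, treating "change" as the positive class; the function names are hypothetical.

```python
import numpy as np

def confusion_counts(pred, truth):
    """True positives, false positives, and false negatives for binary masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, fn

def precision_recall_f1(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1

def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)
```

As a consistency check, the FC-EF row of Table 3 (precision 86.91, recall 80.17) gives an F1 of 83.40 under this formula, matching the tabulated value.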

4.2. Customized Design Strategy Focused on Analysis Purpose, Target, and Resolution

Change detection monitoring systems require a top-down approach, tailored to end-user requirements.

As Table 5 illustrates, different detection objectives call for different evaluation metrics and monitoring approaches. For example, in a deforestation site where changes occur relatively slowly, a bi-temporal approach that simply compares images from two time periods may be ineffective, and a time-series-based network that uses many accumulated images should be applied to analyze the accumulated change. Conversely, in a site hit by an unexpected earthquake, there is no time to wait for periodic time-series data, so a bi-temporal approach or single-temporal post-event change detection is clearly advantageous. In practical settings, a hybrid change detection strategy that combines these two approaches is essential: first, flagging areas of suspected change with a time-series network, and then deploying a precise bi-temporal (or single-temporal) network on those areas.

TABLE 5.

Customized monitoring design strategy according to user scenarios, target objects, and satellite image resolution

User Scenario	Target Objects and Resolution	Key Evaluation Metrics Strategy	Optimal Monitoring Approach
Natural Disaster/War Damage Assessment	Wide-area flooding, bombed buildings (mixed low, medium, and high resolution)	Maximizing Recall: Even if a few false positives occur, not missing a single true positive is crucial for lifesaving and recovery policies.	Bi-temporal change detection that compares only the two images acquired before and after a disaster. Speed is paramount, and anomaly detection through the Attention module is crucial.
Illegal Building Monitoring/Urban Expansion Analysis	Individual buildings and detailed roads in urban areas (high resolution preferred)	Maximizing Precision: To prevent incorrect administrative actions or policy distortions due to false positives, increasing the proportion of actual changes among predicted changes is crucial.	Applying Deformable Convolution, specialized for edge extraction, in parallel with Object-Based Change Detection (OBCD) postprocessing.
Agricultural Crop Monitoring/ Forest Ecosystem Monitoring	Macroscopic land cover and vegetation index patterns (mainly low- to mid-resolution)	F1-score and time-series trend indicators: Balanced accuracy is required, as the model must learn long-term trends and seasonal cycles.	Time-series change detection using dozens of images. 3D spatio-temporal modeling, such as HiSTENet, that dynamically adjusts short- and long-term dependencies is essential.

4.3. Overcoming Class Imbalance and a Semi-Supervised Learning Framework

Another key aspect of designing a practical change detection model is how to overcome the inherent limitations of the data during training. Within satellite imagery, changed areas are extremely sparse relative to unchanged areas (typically a severe imbalance of 1:20 or more). If the network is trained with a standard loss function that ignores this imbalance, the model may fall into the trap of predicting every pixel as “no change,” which still yields high overall accuracy [17]. To address this, designing the objective function with Unchanged Area Loss (UAL), which enforces consistent latent features between the two time points in areas without changes, or Focal Loss, which up-weights sparse classes, is as important a performance determinant as modifying the model structure.
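The accuracy trap and the effect of Focal Loss can be shown on a toy example at the 1:20 ratio mentioned above. The sketch below is a plain NumPy version of binary focal loss; the values alpha = 0.75 and gamma = 2.0 are illustrative assumptions rather than settings from this study, and Unchanged Area Loss is not covered.

```python
import numpy as np

def accuracy(pred, target):
    """Fraction of pixels classified correctly."""
    return float(np.mean(np.asarray(pred) == np.asarray(target)))

def binary_focal_loss(prob_change, target, alpha=0.75, gamma=2.0, eps=1e-7):
    """Mean binary focal loss. `prob_change` is the predicted probability of
    the change class; `target` is 1 for change and 0 for no change."""
    p = np.clip(np.asarray(prob_change, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target)
    p_t = np.where(target == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)  # weight the rare class more
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With one changed pixel among twenty unchanged ones, a model that always answers "no change" reaches about 95% accuracy while detecting nothing. Under focal loss, that same confident miss on the single changed pixel dominates the objective, because well-classified unchanged pixels are down-weighted by the factor (1 - p_t)^gamma.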

Furthermore, to address the scarcity of costly ground truth labels, a semi-supervised learning-based framework is proposed. The overall semi-supervised learning framework adopted in this study is illustrated in Fig. 4. It uses a seed model trained on a very small set of ground truth data to incrementally generate pseudo-labels for massive unlabeled time-series images [18]. Beyond a research setting, this approach supports a robust, user-customizable operational design in which the model continues to evolve by incorporating user feedback during field operations [16].

https://cdn.apub.kr/journalsite/sites/JOSS/2026-003-01/N0670030102/images/Figure_joss_2026_31_13_F4.jpg

FIG. 4.

Semi-supervised learning-based change detection framework. The framework consists of four main stages: (1) construction of an initial supervised baseline model using a small ground truth (GT) dataset, (2) generation and filtering-based refinement of pseudo ground truth (pGT) from large-scale unlabeled time-series imagery, (3) integration of GT and pGT for class imbalance-aware retraining, and (4) quantitative performance evaluation using pixel-, object-, and boundary-level metrics. The process is iteratively refined based on evaluation results, enabling continuous model improvement and adaptation to user-specific operational requirements.
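
Stages (2) and (3) of the framework can be sketched in a few lines. The example below is a simplified illustration, not the implementation used in this study: it assumes that pseudo ground truth is filtered with two confidence thresholds and that pixels of uncertain confidence are excluded from retraining. The names `make_pseudo_labels` and `merge_labels`, the `IGNORE` value, and the thresholds 0.9 and 0.1 are hypothetical.

```python
IGNORE = -1  # assumed convention: pixels with this label are excluded from the loss


def make_pseudo_labels(probs, hi=0.9, lo=0.1):
    """Stage 2: keep only confident seed-model predictions as pGT."""
    labels = []
    for p in probs:
        if p >= hi:
            labels.append(1)        # confident change
        elif p <= lo:
            labels.append(0)        # confident no change
        else:
            labels.append(IGNORE)   # too uncertain to trust
    return labels


def merge_labels(gt, pgt):
    """Stage 3: combine GT and pGT; a GT label always overrides a pseudo-label."""
    return [g if g != IGNORE else p for g, p in zip(gt, pgt)]


# Change probabilities predicted by the seed model for six pixels (toy values).
seed_probs = [0.97, 0.55, 0.03, 0.92, 0.08, 0.40]
pgt = make_pseudo_labels(seed_probs)

# Sparse GT: only the second and fourth pixels were labeled by an analyst.
gt = [IGNORE, 1, IGNORE, 0, IGNORE, IGNORE]
train_labels = merge_labels(gt, pgt)

print("pGT:            ", pgt)
print("training labels:", train_labels)
```

In this toy case the analyst label corrects a confident but wrong pseudo-label at the fourth pixel, which is the behavior the iterative refinement loop relies on.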

5. DEVELOPMENT DIRECTION OF A NEXT-GENERATION GENERALIZED CHANGE DETECTION PLATFORM

The explosive growth of diverse satellite sensor data and the continued emergence of new deep learning architectures such as Mamba have enhanced the performance of change detection technology. However, the majority of these models are task-specific. In practice, monitoring requirements are diversifying across industries, so single-purpose change detection models are of limited use in real applications.

For deep learning-based change detection technology to achieve ultimate social and economic impact, it is essential to build a next-generation generalized change detection platform that integrates diverse modalities and continuously accommodates new intelligent modules as plug-ins, providing universal change analysis capabilities.
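
One way to read the plug-in requirement is as a module registry behind a common interface. The following sketch is purely illustrative and assumes a design that this paper does not specify: each module is a function that takes and returns a shared state dictionary, and new modules are added by registration rather than by modifying the pipeline. All names (`register`, `run_pipeline`, the module names, and the state keys) are hypothetical, and the module bodies are placeholders.

```python
from typing import Callable, Dict, List

Module = Callable[[dict], dict]
_REGISTRY: Dict[str, Module] = {}


def register(name: str) -> Callable[[Module], Module]:
    """Decorator that adds a processing module to the platform registry."""
    def decorator(fn: Module) -> Module:
        _REGISTRY[name] = fn
        return fn
    return decorator


@register("cloud_mask")
def cloud_mask(state: dict) -> dict:
    # Placeholder: a real module would estimate a per-pixel cloud mask here.
    return {**state, "steps": state.get("steps", []) + ["cloud_mask"]}


@register("change_detection")
def change_detection(state: dict) -> dict:
    # Placeholder: a real module would run a change detection network here.
    return {**state, "steps": state.get("steps", []) + ["change_detection"]}


def run_pipeline(module_names: List[str], state: dict) -> dict:
    """Apply the registered modules in the requested order."""
    for name in module_names:
        if name not in _REGISTRY:
            raise KeyError(f"unknown module: {name}")
        state = _REGISTRY[name](state)
    return state


result = run_pipeline(["cloud_mask", "change_detection"], {"scene_id": "demo"})
print(result["steps"])
```

Under this design, adding a super-resolution or hyperspectral fusion module would only require registering one more function and listing its name in the pipeline configuration.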

5.1. Convergence of Vision Foundation Models

The AI paradigm is rapidly shifting from transfer learning based on limited labeled data to foundation models that are pretrained on vast amounts of data, much of it unlabeled and learned through self-supervision. In the field of remote sensing, general-purpose vision foundation models such as the Segment Anything Model (SAM) are being adopted, and specialized foundation models that learn general features of the Earth’s surface at scale are emerging.

Generalized platforms should build on the powerful general knowledge of these foundation models. For example, by incorporating SAM knowledge into change detection, as in the Time-Traveling Pixels (TTP) technique, issues such as domain shift and inter-sensor resolution and spectral heterogeneity can be mitigated through knowledge distillation. The goal should be a large-scale architecture that can perform high-precision detection in completely new terrains and environments, either zero-shot or after only fine-tuning or prompt-based learning on a small amount of data.

5.2. Integration of Heterogeneous Modalities and Internalization of Intelligent Assistive Technologies

With the explosive demand for satellites, satellites carrying diverse sensors are being operated in constellations. Integrating multi-sensor satellite imagery with diverse physical characteristics is becoming a core capability of generalized change detection algorithms.

Integrating Super-Resolution (SR) Technology: To overcome the alignment errors and spatial sampling limitations that arise when cross-analyzing time-series images with different orbits and sensor resolutions, the platform must incorporate an end-to-end pipeline that integrates a super-resolution module, such as SR-CDNet, that reconstructs low-resolution input images to high resolution, and a change detection module.

Hyperspectral Fusion: To identify subtle spectral changes in materials (e.g., signs of vegetation disease, soil contamination, camouflage formation, etc.) that are difficult to identify with conventional optical imagery (RGB/Multispectral) due to lack of morphological differences, the platform must support multimodal detection capabilities that combine the high spatial resolution of optical imaging with the material classification precision of hyperspectral sensors (decision fusion or feature fusion).

Cloud and Haze Removal: To address the most fundamental atmospheric challenge of optical satellite imagery, the platform must implement a standard preprocessing step that performs cloud masking and dehazing immediately upon image ingestion, using uncertainty-aware learning structures. This helps secure data availability and radiometric consistency automatically.
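
The role of cloud masking in securing radiometric consistency can be illustrated with a deliberately simple stand-in. The sketch below does not implement learned dehazing or uncertainty-aware models; instead it substitutes a classical relative radiometric normalization (mean and standard deviation matching) computed only over pixels that are cloud-free at both dates. Plain Python lists stand in for single-band raster arrays, and the function names and toy values are assumptions.

```python
from statistics import mean, pstdev


def joint_valid_mask(cloud_t1, cloud_t2):
    """A pixel is usable only if it is cloud-free at both dates (1 = cloud)."""
    return [c1 == 0 and c2 == 0 for c1, c2 in zip(cloud_t1, cloud_t2)]


def match_mean_std(src, ref, valid):
    """Linearly rescale src so that its mean and standard deviation over the
    valid pixels match those of ref (simple relative normalization)."""
    s = [v for v, ok in zip(src, valid) if ok]
    r = [v for v, ok in zip(ref, valid) if ok]
    if not s:
        return list(src)  # nothing to calibrate against
    s_std = pstdev(s)
    gain = pstdev(r) / s_std if s_std > 0 else 1.0
    offset = mean(r) - gain * mean(s)
    return [gain * v + offset for v in src]


# Toy single-band pair: the last pixel of the first image is cloud-covered.
band_t1 = [10, 20, 30, 40, 200]
band_t2 = [25, 45, 65, 85, 50]
cloud_t1 = [0, 0, 0, 0, 1]
cloud_t2 = [0, 0, 0, 0, 0]

valid = joint_valid_mask(cloud_t1, cloud_t2)
band_t1_norm = match_mean_std(band_t1, band_t2, valid)

# After normalization the cloud-free pixels are directly comparable; the
# cloudy pixel should stay excluded from change analysis via the valid mask.
print(valid)
print(band_t1_norm)
```

Excluding the cloudy pixel from the statistics matters here: including the value 200 would distort the gain and offset and introduce spurious differences over the clear pixels.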

5.3. Intelligent Change Detection Framework

Future generalized platforms should go beyond fragmented analysis tools and provide a seamless, cyclical workflow across the entire process: data preprocessing, AI-based semantic change analysis, indicator objectification and postprocessing, and user display and interactive feedback.

End users do not simply require a collection of changed pixels from the platform. A generalized platform should use a multi-task learning architecture, such as ChangeMamba’s MambaSCD, not only to identify where changes have occurred (localization) but also to derive their semantic context (semantic change detection), that is, the from-to class of each change, for example from forest to farmland or from farmland to urban facilities.
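
The from-to output can be sketched as a combination of a binary change mask with the land-cover classes of the two dates. The example below is a simplified illustration of that output format only; it does not reproduce MambaSCD, and the function name `from_to_map`, the class names, and the toy pixels are assumptions.

```python
from collections import Counter


def from_to_map(lc_t1, lc_t2, change_mask, unchanged="unchanged"):
    """Label each pixel flagged as changed with its 'from->to' class transition."""
    return [
        f"{a}->{b}" if m == 1 else unchanged
        for a, b, m in zip(lc_t1, lc_t2, change_mask)
    ]


# Toy outputs for five pixels: per-date land-cover classes and a binary change mask.
lc_t1 = ["forest", "forest", "farmland", "farmland", "urban"]
lc_t2 = ["farmland", "forest", "urban", "farmland", "urban"]
change_mask = [1, 0, 1, 0, 0]

transitions = from_to_map(lc_t1, lc_t2, change_mask)

# Aggregate the transitions into a summary an end user can act on.
summary = Counter(t for t in transitions if t != "unchanged")

print(transitions)
print(dict(summary))
```

A summary of this kind, rather than a raw mask of changed pixels, is what allows a user to distinguish, for instance, deforestation from urban expansion.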

Furthermore, during the display stage, a decision-making interface should be provided that allows staff to visually assess the reliability of changed areas on a map viewer and that integrates directly with action workflows, such as illegal construction crackdowns or disaster alerts. This interface allows field practitioners to correct false positives with a single click; this feedback is collected in real time and incorporated into the retraining data of the semi-supervised learning framework, thereby establishing a self-evolving loop. As a result, the platform can evolve into an intelligent system that adapts increasingly well to its domain over time.
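
The feedback loop described above can be sketched as a small data-handling step. The example assumes, hypothetically, that each user correction identifies a pixel and its corrected label, and that human-verified pixels receive a higher sample weight than pseudo-labeled ones during retraining. The `Correction` class, the function `apply_feedback`, and the weight values are illustrative and are not part of the framework specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Correction:
    """One user edit from the map viewer (hypothetical record format)."""
    pixel: int  # index of the corrected pixel in the flattened label mask
    label: int  # corrected label: 1 = change, 0 = no change


def apply_feedback(
    labels: List[int],
    weights: List[float],
    corrections: List[Correction],
    trusted_weight: float = 1.0,
) -> Tuple[List[int], List[float]]:
    """Overwrite labels with user corrections and mark them as trusted."""
    labels, weights = list(labels), list(weights)  # leave the inputs untouched
    for c in corrections:
        labels[c.pixel] = c.label
        weights[c.pixel] = trusted_weight
    return labels, weights


# Pseudo-labels from the current model, each with a reduced weight (assumed 0.5).
pseudo_labels = [1, 0, 1, 0]
sample_weights = [0.5, 0.5, 0.5, 0.5]

# A practitioner marks the third pixel as a false positive.
feedback = [Correction(pixel=2, label=0)]
new_labels, new_weights = apply_feedback(pseudo_labels, sample_weights, feedback)

print(new_labels)
print(new_weights)
```

The updated labels and weights would then feed the next retraining round, which closes the loop between field operation and model improvement.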

6. CONCLUSION

This study assessed the current status of change detection technology, which leverages the exponentially accumulating low- to high-resolution optical image data to monitor dynamic changes on the Earth’s surface accurately and meaningfully, and examined its future direction.

Deep learning models have overcome the limitations of earlier statistical models. In particular, the Transformer, which overcomes the local feature extraction limitations of CNNs, and modern SSM-based architectures such as Mamba, which improve computational efficiency and spatiotemporal context capture, have contributed significantly to raising change detection accuracy. These advances in AI models have enabled subpixel-level boundary extraction and time-series-based, fine-grained change detection.

However, advances in deep learning technology alone cannot solve all the complex and diverse problems of the real world. This study showed that, alongside advancing AI performance, clearly defining the end user’s analytical objectives, the characteristics of the target object, and the available image resolution is crucial for the practical success of the technology. This requires a “customer-tailored monitoring design” that strategically integrates data acquisition, loss function design, and bi-temporal and time-series analysis techniques. Even the most powerful Mamba model will yield meaningless results if it lacks pre- and post-design measures to address image alignment quality, cloud-induced radiometric distortion, and extreme data class imbalance.

As satellite observation resources diversify and the scope of change monitoring expands to encompass national security, disaster response, and climate change monitoring, a new paradigm is required that moves beyond the individual development of single-task deep learning models. Future research and industry development in change detection technology should focus on intelligently integrating the general-purpose inference capabilities of large Vision Foundation Models (VFMs) with heterogeneous modality fusion technologies such as super-resolution, hyperspectral, and SAR data within a single pipeline. Building a next-generation Generalized Change Detection Platform that responds flexibly to any geographic environment and to the ever-changing demands of users, and that provides an intelligent decision-making interface that evolves through user feedback, will be a practical and crucial milestone in translating the remarkable advancements in deep learning technology into social and economic value.

Acknowledgements

This study was supported by the Space Challenge Project of NRF (National Research Foundation of Korea).

References

H. Chen and Z. Shi, A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sensing. 12, 2020, Article 1662. DOI: 10.3390/rs12101662.

S. Ji, S. Wei, and M. Lu, Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery dataset. IEEE Transactions on Geoscience and Remote Sensing. 57, 2019, pp. 574-586. DOI: 10.1109/TGRS.2018.2858817.

L. Shen, Y. Lu, H. Chen, H. Wei, D. Xie, Y. Yue, R. Chen, S. Lv, and B. Jiang, S2Looking: A satellite side-looking dataset for building change detection. Remote Sensing. 13(24), 2021, Article 5094. DOI: 10.3390/rs13245094.

L. Shen, Y. Lu, H. Chen, H. Wei, D. Xie, J. Yue, R. Chen, S. Lv, and B. Jiang, Building change detection for remote sensing images using a dual-task constrained deep Siamese convolutional network model. IEEE Geoscience and Remote Sensing Letters. 17(5), 2021, pp. 811-815. DOI: 10.1109/LGRS.2020.2988032.

J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7-12 June, 2015, pp. 3431-3440. DOI: 10.1109/CVPR.2015.7298965.

R.C. Daudt, B. Le Saux, and A. Boulch, Fully convolutional siamese networks for change detection, 2018 IEEE International Conference on Image Processing (ICIP), 7-10 October, 2018, pp. 4063-4067. DOI: 10.1109/ICIP.2018.8451652.

R. C. Daudt, B. Le Saux, A. Boulch, and Y. Gousseau, Urban change detection for multispectral earth observation using convolutional neural networks, IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 22-27 July, 2018, pp. 2115-2118. DOI: 10.1109/IGARSS.2018.8518015.

C. Benedek and T. Szirányi, Change detection in optical aerial images by a multilayer conditional mixed Markov model. IEEE Transactions on Geoscience and Remote Sensing. 47, 2009, pp. 3416-3430. DOI: 10.1109/TGRS.2009.2022633.

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need. Advances in neural information processing systems. 30, 2017, Article 30.

H. Chen and Z. Shi, Remote sensing image change detection with transformers. IEEE Transactions on Geoscience and Remote Sensing. 60, 2021, pp. 1-14. DOI: 10.1109/TGRS.2021.3095166.

A. Gu and T. Dao, Mamba: Linear-time sequence modeling with selective state spaces. 2023. arXiv: 2312.00752.

Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, J. Jiao, and Y. Liu, VMamba: Visual state space model. 2024. arXiv: 2401.10166. DOI: 10.52202/079017-3273.

H. Chen, J. Song, C. Han, J. Xia, and N. Yokoya, ChangeMamba: Remote Sensing Change Detection With Spatiotemporal State Space Model. IEEE Transactions on Geoscience and Remote Sensing. 62, 2024, pp. 1-20. DOI: 10.1109/TGRS.2024.3417253.

Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, Swin Transformer: Hierarchical vision transformer using shifted windows, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 10-17 October, 2021, pp. 10012-10022. DOI: 10.1109/ICCV48922.2021.00986.

Q. Sun, Y. Liu, L. Li, H. Sun, L. Zhao, and C. Tian, FA-Mamba: a Mamba-based approach for remote sensing change detection, Proceedings of the SPIE, 2025, Volume 13697. DOI: 10.1117/12.3076336.

I. Corley, C. Robinson, and A. Ortiz, A change detection reality check. 2024. arXiv: 2402.06994.

Q. Shi, M. Liu, S. Li, X. Liu, F. Wang, and L. Zhang, A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection. IEEE Transactions on Geoscience and Remote Sensing. 60, 2021, pp. 1-16. DOI: 10.1109/TGRS.2021.3085870.

L. Zhao, L. Wan, L. Ma, and Y. Zhang, HiSTENet: History-Integrated Spatial-Temporal Information Extraction Network for Time Series Remote Sensing Image Change Detection. Remote Sensing. 17(5), 2025, Article 792. DOI: 10.3390/rs17050792.

JOURNAL OF SPACE SECURITY ISSN:3058-5759(Print) 한국우주안보학회지
