A comprehensive survey on AI-based infrared satellite image analysis: From radiative principles to multi-task detection

Dongik Lee; Bomin Kim; Hyeonji Choi; Sungho Kim

doi:10.23386/joss.2026.3.1.001


JOURNAL OF SPACE SECURITY. 30 June 2026. 1-12
https://doi.org/10.23386/joss.2026.3.1.001

Dongik Lee¹

Bomin Kim¹

Hyeonji Choi¹

Sungho Kim¹^*

¹Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea

^{*Corresponding Author}

License (open-access, https://creativecommons.org/licenses/by-nc/4.0/):

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

As national security threats diversify and the demand for persistent surveillance grows, infrared (IR) satellite imagery has become increasingly important because it enables day-and-night observation and provides thermal information complementary to visible imagery. The rapid advancement of deep learning has further shifted IR image analysis from conventional methods to data-driven intelligent detection. However, existing surveys are mostly limited to specific tasks, lacking a comprehensive review of the full AI-based IR image analysis pipeline. This paper presents an integrated survey spanning from the physical principles of IR radiation and preprocessing to AI-based detection and applications. Radiative transfer fundamentals and band-dependent sensing characteristics are first reviewed, followed by an examination of how physics-based atmospheric correction and deep learning-based restoration methods jointly enhance image reliability. For detection, we classify technologies into four tasks: object detection, infrared small target detection (IRSTD), change detection, and anomaly detection, and systematically compare key algorithms, benchmark datasets, and performance for each. Through this comprehensive analysis, we identify key open challenges and future directions for advancing satellite-based IR detection toward operational deployment.

Keywords

Infrared

Infrared search and track

Radiation

Target detection

Deep learning

MAIN

1. INTRODUCTION
2. THEORETICAL BACKGROUND
2.1. Radiation
2.2. Characteristics across infrared spectral bands
3. PREPROCESSING TECHNIQUES
3.1. Atmospheric Correction
3.2. Preprocessing Methods
4. AI-DRIVEN INFRARED IMAGE ANALYSIS
4.1. Object Detection
4.2. Infrared Small Target Detection (IRSTD)
4.3. Change Detection
4.4. Abnormal/Anomaly Detection
5. APPLICATIONS
6. CONCLUSION AND FUTURE WORKS

1. INTRODUCTION

Space-based surveillance systems provide persistent wide-area observation for defense early warning, disaster response, and environmental monitoring. Conventional electro-optical (EO) sensors, however, rely on reflected sunlight and are therefore limited to daytime operation under clear atmospheric conditions. Infrared (IR) sensors partially overcome these constraints by detecting self-emitted thermal radiation, enabling day-and-night observation and providing thermal information complementary to visible imagery. For example, ballistic missile plumes produce strong mid-wave infrared (MWIR) emission, while thermal events such as wildfires and volcanic activity exhibit distinct signatures in the MWIR and long-wave infrared (LWIR) regions, which are largely undetectable by visible-light sensors.

Despite these advantages, IR satellite imagery poses inherent challenges: low spatial resolution often reduces targets to a few pixels, single-channel grayscale acquisition lacks texture and color cues, thermal contrast fluctuates with diurnal and seasonal cycles, and atmospheric absorption degrades the signal-to-noise ratio (SNR). Traditional filtering and threshold-based methods struggle to achieve simultaneously high detection probability and low false alarm rate under such conditions, relying heavily on manual parameter tuning.

The rapid advancement of deep learning has shifted the paradigm of IR image analysis from handcrafted features to data-driven intelligent detection. Significant improvements have been reported across object detection, infrared small target detection (IRSTD), change detection, and anomaly detection, with architectures increasingly incorporating global context modeling and multi-scale feature extraction. Yet existing surveys remain largely confined to individual tasks, and a comprehensive review covering the full pipeline from physical principles through preprocessing to multi-task AI-based detection is lacking. Although this survey focuses on infrared satellite imagery, we selectively discuss multispectral and hyperspectral remote-sensing studies only when they provide transferable methodological insights for IR preprocessing, change detection, or anomaly detection in areas where IR-specific literature remains limited.

This paper presents an integrated survey addressing this gap. The overall structure and organization of this paper are summarized in Fig. 1. Section 2 reviews the physical principles of IR radiation. Section 3 discusses atmospheric correction and deep learning-based preprocessing. Section 4 systematically compares AI-based detection methods across four tasks with key algorithms, benchmarks, and performance. Section 5 covers application domains, and Section 6 identifies open challenges and future directions.

https://cdn.apub.kr/journalsite/sites/JOSS/2026-003-01/N0670030101/images/Figure_joss_2026_31_1_F1.jpg

FIG. 1.

Systematic mind-mapping of infrared detection technologies: Physics, Preprocessing, and Multi-task AI.

2. THEORETICAL BACKGROUND

2.1. Radiation

All objects at temperatures above 0 K emit radiant energy as electromagnetic waves, owing to thermally driven vibrations and transitions between energy levels. The spectral radiance emitted by a black body at absolute temperature $T$ and wavelength $λ$ is described by Planck’s radiation law, Eq. (1):

(1)

E_{b}(λ, T) = \frac{2 h c^{2}}{λ^{5}} \cdot \frac{1}{\exp\left(\frac{h c}{λ k_{B} T}\right) - 1}

where $E_{b}(λ, T)$ is the spectral radiance $[\mathrm{W \cdot m^{-2} \cdot sr^{-1} \cdot μm^{-1}}]$ of the black body, $k_{B}$ is the Boltzmann constant (1.38 × 10⁻²³ J/K), $h$ is the Planck constant (6.626 × 10⁻³⁴ J·s), and $c$ is the speed of light (2.998 × 10⁸ m/s).

Since most natural objects are not ideal black bodies, the emissivity ($ε$) must be taken into account. It is defined as the ratio of the radiant energy emitted by an actual surface to that emitted by a black body at the same temperature and wavelength. Emissivity ranges from 0 to 1 depending on the material, surface condition, and wavelength, and the spectral radiance of a real object is expressed as $ε E_{b}(λ, T)$. Meanwhile, the radiant energy emitted from the Earth’s surface must pass through the atmosphere before reaching a satellite sensor, during which absorption and re-emission by atmospheric constituents ($H_{2}O$, $CO_{2}$, etc.) occur. Wavelength regions where atmospheric absorption is relatively low, allowing radiant energy to transmit effectively, are referred to as atmospheric windows. In the infrared spectrum, the major atmospheric windows are located in the SWIR (2.1-2.5 µm), MWIR (3-5 µm), and LWIR (8-14 µm) bands. Since infrared remote sensing can only acquire valid surface information through these atmospheric windows, the selection of observation bands according to the detection objective is of critical importance.

The wavelength $λ_{max}$ at which black-body emission peaks for a given temperature $T$ follows Wien’s displacement law, Eq. (2):

(2)

λ_{max} \cdot T = 2.898 \times 10^{-3} \ \mathrm{m \cdot K}

Accordingly, the radiation peak of the Earth’s surface (~300 K) is located at approximately 9.7 µm, making the 8-14 µm LWIR atmospheric window optimal for land surface temperature (LST) observation, whereas fires (~800 K) exhibit a peak near 3.6 µm, falling within the 3-5 µm MWIR atmospheric window [1]. These band-dependent observation characteristics provide the physical basis for the sensor and band selection in each detection task discussed in Section 4.
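The peak wavelengths quoted above can be reproduced with a short numerical sketch of Eqs. (1) and (2). The constants and window limits are those given in the text; the function names (`planck_radiance`, `wien_peak_um`, `window_of`) are illustrative and not part of any cited tool.

```python
import math

H = 6.626e-34      # Planck constant [J s]
C = 2.998e8        # speed of light [m/s]
K_B = 1.38e-23     # Boltzmann constant [J/K]
WIEN_B = 2.898e-3  # Wien displacement constant [m K]

# Atmospheric windows as listed in Section 2.1 [um]
WINDOWS_UM = {"SWIR": (2.1, 2.5), "MWIR": (3.0, 5.0), "LWIR": (8.0, 14.0)}


def planck_radiance(wavelength_um, temp_k):
    """Black-body spectral radiance of Eq. (1) in W m^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6  # um -> m
    exponent = H * C / (lam * K_B * temp_k)
    per_metre = (2.0 * H * C ** 2 / lam ** 5) / (math.exp(exponent) - 1.0)
    return per_metre * 1e-6  # per metre of wavelength -> per micrometre


def wien_peak_um(temp_k):
    """Peak emission wavelength of Eq. (2) in micrometres."""
    return WIEN_B / temp_k * 1e6


def window_of(wavelength_um):
    """Name of the atmospheric window containing the wavelength, or None."""
    for name, (low, high) in WINDOWS_UM.items():
        if low <= wavelength_um <= high:
            return name
    return None


if __name__ == "__main__":
    for label, temp in (("Earth surface", 300.0), ("fire", 800.0)):
        peak = wien_peak_um(temp)
        print(f"{label}: T={temp:.0f} K, peak={peak:.2f} um, window={window_of(peak)}")
```

Under these constants, the 300 K case falls in the LWIR window and the 800 K case in the MWIR window, consistent with the band selection argument above.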

The top of atmosphere (TOA) radiance measured by a satellite sensor is a composite signal comprising surface emission, upwelling path radiance from the atmosphere itself, and reflected downwelling radiance from the surface, which is modeled by the radiative transfer equation (RTE). Consequently, atmospheric correction to remove atmospheric effects is essential for accurate extraction of surface thermal information, as discussed in Section 3.
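As a rough illustration of the three TOA components named above, the sketch below uses the single-channel form of the thermal RTE that is common in the literature, $L_{TOA} = τ [ε E_{b}(λ, T) + (1 - ε) L_{down}] + L_{up}$. This specific form, the symbol names, and all numerical inputs are assumptions made for the example; the source does not give the equation explicitly.

```python
import math

H, C, K_B = 6.626e-34, 2.998e8, 1.38e-23  # constants from Section 2.1


def planck_radiance(wavelength_um, temp_k):
    """Black-body spectral radiance, Eq. (1), in W m^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6
    return (2.0 * H * C ** 2 / lam ** 5) / (math.exp(H * C / (lam * K_B * temp_k)) - 1.0) * 1e-6


def brightness_temperature(wavelength_um, radiance):
    """Inverse of Eq. (1): temperature [K] of a black body emitting `radiance`."""
    lam = wavelength_um * 1e-6
    per_metre = radiance * 1e6
    return H * C / (lam * K_B * math.log(1.0 + 2.0 * H * C ** 2 / (lam ** 5 * per_metre)))


def toa_radiance(wavelength_um, surface_temp_k, emissivity, transmittance,
                 upwelling, downwelling):
    """Forward model: surface emission + reflected downwelling, attenuated, + path radiance."""
    surface_emission = emissivity * planck_radiance(wavelength_um, surface_temp_k)
    reflected = (1.0 - emissivity) * downwelling
    return transmittance * (surface_emission + reflected) + upwelling


def surface_blackbody_radiance(l_toa, emissivity, transmittance, upwelling, downwelling):
    """Invert the forward model for E_b(lambda, T) given the atmospheric terms."""
    at_surface = (l_toa - upwelling) / transmittance
    return (at_surface - (1.0 - emissivity) * downwelling) / emissivity


if __name__ == "__main__":
    # Hypothetical atmosphere and surface, chosen only to exercise the functions.
    l_toa = toa_radiance(10.0, 300.0, emissivity=0.95, transmittance=0.8,
                         upwelling=1.2, downwelling=2.0)
    e_b = surface_blackbody_radiance(l_toa, 0.95, 0.8, 1.2, 2.0)
    print(round(brightness_temperature(10.0, e_b), 3))
```

The inversion step is what atmospheric correction amounts to in this simplified setting: once transmittance and path radiances are known, the surface temperature follows from the inverse Planck function.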

2.2. Characteristics across infrared spectral bands

A comparative overview of radiative mechanisms, key image characteristics, and atmospheric influences across infrared spectral bands is provided in Table 1. This comparison highlights the fundamental differences in sensing principles and practical applications among the spectral bands.

TABLE 1.

Physical characteristics and application relevance across infrared spectral bands

Spectral Band	Wavelength Range (µm)	Radiative Mechanism	Key Image Characteristics	Atmospheric Effects
NIR	0.7-1.0	Reflected solar radiation	High sensitivity to surface reflectance; water bodies appear dark due to absorption	Reduced scattering compared to visible; water vapor absorption in specific sub-bands
SWIR	1.0-2.5	Reflected solar radiation	Material discrimination; improved haze penetration; moisture sensitivity	Lower scattering sensitivity than NIR; pronounced water vapor absorption features
MWIR	3-5	Thermal emission	High-temperature target sensitivity; strong thermal contrast; nighttime imaging capability	Generally robust to scattering; affected by water vapor absorption lines
LWIR	8-14	Thermal emission	Ambient thermal contrast; independent of illumination conditions; suitable for long-term thermal observation	Operates within a major atmospheric window; residual absorption by ozone and water vapor

3. PREPROCESSING TECHNIQUES

Preprocessing is a key step for improving the reliability of satellite image analysis by mitigating atmospheric, structural, and resolution-related degradations. The preprocessing techniques reviewed in this survey are organized into two main categories: physics-based correction and deep learning-based preprocessing (Fig. 2). Physics-based correction mainly focuses on atmospheric and radiometric normalization, whereas deep learning-based preprocessing addresses data-driven restoration tasks such as cloud removal, dehazing, and super-resolution.

https://cdn.apub.kr/journalsite/sites/JOSS/2026-003-01/N0670030101/images/Figure_joss_2026_31_1_F2.jpg

FIG. 2.

Taxonomy of Image Preprocessing Methods.

3.1. Atmospheric Correction

Atmospheric correction is a crucial preprocessing step that retrieves surface reflectance by removing atmospheric effects (scattering, absorption, and path radiance) from TOA measurements. Surface reflectance represents the intrinsic spectral properties of the Earth’s surface and is essential for quantitative analyses such as land cover classification, vegetation indices, change detection, and multi-sensor fusion. Without it, atmospheric and geometric variations degrade data reliability and comparability.

Atmospheric correction is especially important in the reflective region (NIR-SWIR), where sensor signals include not only surface reflectance but also atmospheric effects such as molecular and aerosol scattering, downward solar scattering, and surface-atmosphere multiple reflections. As a result, image brightness cannot be directly interpreted as true surface reflectance, and atmospheric variations must be removed to obtain consistent and physically meaningful reflectance values. A typical approach is inverse estimation based on a radiative transfer model (RTM), which retrieves surface reflectance from TOA signals by physically modeling radiative transfer along the sun-atmosphere-surface-sensor path [2].

This process incorporates sensor spectral response, solar-sensor geometry, atmospheric state variables, and effects such as molecular and aerosol scattering, gas absorption, and multiple scattering. The surface reflectance $ρ_{t}$ can be expressed as in Eq. (3):

(3)

ρ_{t} = \frac{ρ^{*} - ρ_{a} - ρ_{e}}{T_{s} \cdot T_{ν} + s \cdot (ρ^{*} - ρ_{a} - ρ_{e})}

In this equation, $ρ^{*}$ denotes the Top-of-Atmosphere (TOA) reflectance measured by the satellite sensor. The terms $ρ_{a}$ and $ρ_{e}$ represent atmospheric path reflectance (from molecular and aerosol scattering) and adjacency reflectance from surrounding surfaces, respectively. $T_{s}$ and $T_{ν}$ indicate downward and upward atmospheric transmittance, while $s$ is the atmospheric spherical albedo accounting for multiple scattering. This formulation removes atmospheric and adjacency effects and compensates for transmittance losses to retrieve physically consistent surface reflectance.
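Eq. (3) can be checked numerically with a minimal sketch. The inversion follows the equation term by term; the forward model used for the round-trip check is obtained by solving Eq. (3) for $ρ^{*}$, and all input values are hypothetical.

```python
def surface_reflectance(rho_toa, rho_path, rho_adj, t_down, t_up, s_albedo):
    """Eq. (3): retrieve surface reflectance rho_t from TOA reflectance rho*."""
    corrected = rho_toa - rho_path - rho_adj          # rho* - rho_a - rho_e
    return corrected / (t_down * t_up + s_albedo * corrected)


def toa_reflectance(rho_surface, rho_path, rho_adj, t_down, t_up, s_albedo):
    """Eq. (3) solved for rho*, used here only to verify the inversion."""
    coupled = t_down * t_up * rho_surface / (1.0 - s_albedo * rho_surface)
    return rho_path + rho_adj + coupled


if __name__ == "__main__":
    # Hypothetical atmospheric state, not taken from any sensor or RTM run.
    params = dict(rho_path=0.05, rho_adj=0.01, t_down=0.90, t_up=0.92, s_albedo=0.10)
    rho_star = toa_reflectance(0.30, **params)
    print(round(rho_star, 4), round(surface_reflectance(rho_star, **params), 4))
```

In an operational setting, the path reflectance, transmittances, and spherical albedo would come from an RTM run or a look-up table rather than fixed constants.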

In practice, RTM-based tools such as 6S improve computational efficiency by using look-up tables (LUTs) or approximations. Although physics-based atmospheric correction is widely adopted for its interpretability and robustness, limitations such as resolution constraints, spatiotemporal gaps, and noise remain. Deep learning-based preprocessing methods are therefore increasingly studied to compensate for these limitations.

3.2. Preprocessing Methods

Recently, deep learning-based preprocessing methods have emerged to address residual degradations that remain after physics-based correction. Although RTM-based atmospheric correction ensures radiometric consistency, it does not fully resolve quality losses such as cloud occlusion, haze-induced contrast reduction, or limited spatial resolution. For infrared (IR) satellite imagery in particular, pixel values encode temperature and radiance, so such degradation directly affects the accuracy of target detection and surface temperature estimation. Therefore, satellite preprocessing is increasingly treated as a data-driven restoration problem (Fig. 3).

https://cdn.apub.kr/journalsite/sites/JOSS/2026-003-01/N0670030101/images/Figure_joss_2026_31_1_F3.jpg

FIG. 3.

Representative visual examples of preprocessing tasks. Cloud removal and super-resolution are shown with infrared remote-sensing-related examples, while dehazing is included as a general restoration example to illustrate atmospheric degradation handling.

In the existing literature, deep learning-based preprocessing methods can be categorized into three major domains: cloud removal, dehazing, and super-resolution (SR). Although each domain addresses a distinct degradation mechanism, they share the common objective of enhancing structural restoration and improving the reliability of satellite image analysis.

Cloud removal aims to reconstruct surface information obscured by thin or thick clouds. Early approaches were primarily GAN-based, including unpaired translation models to reduce reliance on paired datasets [3,4]. Recent methods incorporate spatial consistency constraints and diffusion-based frameworks to improve stability and detail preservation [5,6].

In infrared satellite imagery, clouds block or distort the radiant energy emitted from the Earth’s surface, significantly reducing the accuracy of surface temperature estimates and heat-based anomaly detection. Existing cloud removal techniques were developed primarily for multispectral or RGB imagery and are difficult to apply directly to single-channel IR imagery. In particular, the ground and clouds often exhibit similar radiative characteristics in IR images, which makes the two regions even harder to distinguish. To address these issues, deep learning-based cloud removal techniques specialized for IR images have recently been proposed, with representative approaches combining multi-scale feature fusion, Transformer-based global information extraction, and attention mechanisms. MRF-Net [7] combines CNNs and Transformers to exploit local features and global contextual information simultaneously, thereby removing thin clouds effectively and improving performance by maintaining consistency between blocks. Nevertheless, IR-based cloud removal still faces several technical limitations, including data scarcity, block boundary discontinuities during large-scale image processing, and difficulty in preserving radiative properties. Consequently, an integrated approach combining physics-based information with deep learning is emerging as a key research direction.

Dehazing methods address contrast attenuation and edge blurring caused by atmospheric scattering. Deep learning approaches either estimate intermediate physical representations or learn end-to-end mappings using multi-scale and attention mechanisms [8,9,10,11]. Although they outperform traditional model-driven methods in complex scenes, performance is sensitive to discrepancies between synthetic and real atmospheric conditions, particularly in multispectral or hyperspectral imagery.
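One common choice for the "intermediate physical representations" mentioned above is the pair (transmission map, atmospheric light) of the atmospheric scattering model $I = J t + A (1 - t)$, where $I$ is the hazy observation and $J$ the haze-free scene. The sketch below assumes this model, which the source does not name explicitly, and uses hypothetical values on a one-dimensional row of pixels.

```python
def hazy_from_clear(clear, transmission, airlight):
    """Forward scattering model: I = J * t + A * (1 - t), applied per pixel."""
    return [j * t + airlight * (1.0 - t) for j, t in zip(clear, transmission)]


def clear_from_hazy(hazy, transmission, airlight, t_min=0.1):
    """Invert the model; t is floored at t_min to avoid amplifying noise."""
    return [(i - airlight) / max(t, t_min) + airlight
            for i, t in zip(hazy, transmission)]


if __name__ == "__main__":
    scene = [0.20, 0.45, 0.80]          # hypothetical haze-free intensities
    t_map = [0.90, 0.60, 0.30]          # hypothetical per-pixel transmission
    observed = hazy_from_clear(scene, t_map, airlight=0.95)
    print([round(v, 3) for v in clear_from_hazy(observed, t_map, airlight=0.95)])
```

In a learned pipeline, a network would estimate the transmission map and atmospheric light from the hazy input before this inversion is applied; end-to-end methods skip the explicit model and regress the restored image directly.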

Because of its longer wavelength compared to visible light, infrared (IR) radiation penetrates fog and haze relatively well; however, image quality can still degrade due to atmospheric absorption and scattering. Furthermore, since IR imagery often inherently features low contrast and limited texture information, dehazing must balance the preservation of thermal signals against the restoration of structural information, going beyond simple visual restoration. IR-specific dehazing techniques utilizing CNNs and state-space models have therefore been proposed, and approaches that enhance restoration performance by leveraging complementary information through visible-infrared fusion are gaining attention [12,13]. However, these methods still face challenges such as the difficulty of obtaining aligned multimodal data, domain mismatches between real atmospheric conditions and synthetic data, and increased computational complexity. Consequently, there is a growing need for research into lightweight, physics-based fusion models suitable for IR imaging environments.

Super-resolution (SR) seeks to overcome sensor resolution limits and recover fine structural details. Modern SR models have evolved from CNN-based architectures to GAN-, Transformer-, and diffusion-assisted frameworks, emphasizing perceptual sharpness and spectral fidelity [14,15,16,17]. Despite significant advances, SR remains constrained by limited aligned training data, high computational cost, and the risk of hallucinated details that may affect quantitative analysis. In particular, due to the characteristics of their sensors and wavelengths, infrared images often exhibit lower spatial resolution, insufficient high-frequency information, and blurred edges compared to visible light images, making the reconstruction of fine structures and edges an even greater challenge during the super-resolution process [18]. Furthermore, since IR images consist of single-channel thermal radiation data without color information, applying existing visible-light-based SR models directly may result in degraded performance [19]. Consequently, recent studies have proposed lightweight CNN-based models that account for the characteristics of IR images, Transformer-based global feature learning, and architectures utilizing channel separation and attention mechanisms [20]. However, issues such as information loss during the high-resolution reconstruction process, computational cost, and the lack of generalization to real-world environments remain major limitations.

Deep learning-based preprocessing complements physics-based calibration by prioritizing structural restoration over radiometric normalization. In particular, for infrared (IR) satellite imagery, this preprocessing goes beyond simple image enhancement to directly contribute to the preservation of radiative energy-based information and improvements in target detection and temperature estimation accuracy. Nevertheless, common challenges such as robustness under severe image degradation, domain generalization, and operational efficiency persist, and these issues are further exacerbated in IR imagery due to its single-channel nature and limited data. Consequently, research directions are needed that include the construction of realistic datasets, multi-source conditioning, lightweight architecture design, and integration with physics-based models.

4. AI-DRIVEN INFRARED IMAGE ANALYSIS

Unlike optical imagery, infrared satellite imagery provides temperature-driven signatures with limited texture, lower spatial resolution, and time-varying thermal contrast, which makes AI-based analysis strongly task-dependent. In this survey, we organize AI-based infrared detection into four representative tasks (Fig. 4)—object detection, infrared small target detection (IRSTD), change detection, and anomaly detection—because they cover the major operational objectives of satellite IR analysis: instance-level localization, few-pixel target discovery, temporal scene comparison, and deviation-from-normality analysis. Accordingly, this section focuses on representative algorithms, benchmark datasets, and reported performance indicators for each task, while application-level use cases are discussed separately in Section 5.

https://cdn.apub.kr/journalsite/sites/JOSS/2026-003-01/N0670030101/images/Figure_joss_2026_31_1_F4.jpg

FIG. 4.

Categorization of infrared satellite detection tasks and representative applications.

4.1. Object Detection

Object detection in infrared satellite imagery aims at instance-level localization of semantically meaningful targets such as ships, aircraft, and active fires. Unlike infrared small target detection (IRSTD), which mainly focuses on point-like targets occupying only a few pixels, object detection addresses targets with identifiable categories and operational semantics. Because infrared sensing captures self-emitted thermal radiation, it supports day-and-night observation and can complement visible-light sensing under low-illumination conditions. Nevertheless, this task remains challenging due to low spatial resolution, weak texture cues, scene-dependent thermal contrast, and the presence of thermally bright background objects that can easily trigger false alarms.

4.1.1. Traditional Methods

Before the widespread adoption of deep learning, object detection in satellite thermal infrared imagery was dominated by thresholding- and context-based radiometric analysis. Representative examples include classical active-fire detection pipelines for MODIS, ASTER, and VIIRS, which exploit thermal anomalies and local background statistics to identify anomalously hot pixels while suppressing false alarms [21,22,23]. These methods are computationally efficient and physically interpretable, making them useful for hotspot screening and operational baseline systems. However, they are typically task-specific, rely on manually designed thresholds or heuristic tests, and are sensitive to heterogeneous backgrounds, atmospheric perturbations, and sensor-dependent radiometric variation. As a result, traditional approaches remain effective for thermal anomaly screening but are less suitable for generalized multi-class object detection in complex satellite infrared scenes.
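The contextual logic shared by these pipelines can be sketched as a test of one pixel against its local background statistics. The thresholds below (`k`, `min_delta`) are placeholders chosen for illustration and are not the values used by the MODIS, ASTER, or VIIRS products.

```python
import statistics


def is_hot_pixel(window, k=3.0, min_delta=4.0):
    """Contextual test on the centre pixel of a square window of brightness temperatures [K].

    The centre is flagged when it exceeds the background mean by more than
    max(k * background std, min_delta). Both thresholds are illustrative.
    """
    n = len(window)
    mid = n // 2
    centre = window[mid][mid]
    background = [window[r][c] for r in range(n) for c in range(n) if (r, c) != (mid, mid)]
    mean = statistics.mean(background)
    std = statistics.pstdev(background)
    return centre > mean + max(k * std, min_delta)


if __name__ == "__main__":
    window = [[300.0] * 5 for _ in range(5)]
    window[2][2] = 340.0  # hypothetical fire pixel
    print(is_hot_pixel(window))
```

Operational algorithms add further tests (for example, band differences, cloud and water masks, and sun-glint rejection), which is one reason they remain task-specific.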

4.1.2. Deep Learning Methods

Recent studies have increasingly adopted deep learning-based object detectors to overcome the limitations of handcrafted pipelines. In the broader computer-vision literature, modern detection architectures have evolved from proposal-based two-stage frameworks such as Faster R-CNN [24], to one-stage detectors such as YOLO and RetinaNet [25,26], and more recently to transformer-based end-to-end models such as DETR and Deformable DETR [27,28]. These detector families introduced key design strategies—including multi-scale representation learning, class-imbalance handling, and global context modeling—that are also highly relevant to infrared satellite imagery. However, their direct transfer to satellite IR data is nontrivial because targets are often extremely small, texture-poor, and thermally unstable across sensors, bands, and imaging conditions.

In maritime monitoring, Li et al. [29] proposed an optimized YOLOv5s-based model with a Squeeze-and-Excitation attention mechanism to improve ship detection under complex thermal backgrounds. Related efforts such as TISD [30] further emphasized the importance of real spaceborne thermal infrared datasets for all-day ship detection. In aviation monitoring, Li et al. [31] introduced TIFAD.v1, the first global space-based thermal infrared aircraft dataset, and demonstrated the feasibility of detecting small aircraft targets from airframe and exhaust-plume thermal signatures. In disaster response, de Almeida Pereira et al. [32] constructed a large-scale Landsat-8 active-fire dataset and showed that deep learning-based detectors can substantially outperform conventional threshold-based products such as MODIS and VIIRS in wildfire hotspot detection. Collectively, these studies show that deep learning improves robustness to cluttered backgrounds and target diversity, but performance still depends heavily on satellite-specific training data and remains sensitive to cross-sensor domain shift.

4.2. Infrared Small Target Detection (IRSTD)

Small target detection in infrared search and track (IRST) systems is a critical technology that determines early warning and tracking performance. SPIE describes a small target as having a contrast ratio of less than 15%, an SNR of less than 1.5, and a target size of less than 0.15% of the whole image [33]. Such targets consist of only a few pixels, lacking shape and texture information, and are easily buried in background clutter.
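To make the three criteria concrete, the sketch below encodes them as a simple check and computes the pixel budget they imply for a given image size. The function names are illustrative, and treating all three conditions as jointly required is an assumption about how the definition is applied.

```python
def max_small_target_pixels(height, width, area_fraction=0.0015):
    """Largest whole-pixel area below 0.15% of the image."""
    return int(height * width * area_fraction)


def is_small_target(target_pixels, height, width, contrast_ratio, snr):
    """Apply the criteria quoted above: area < 0.15%, contrast < 15%, SNR < 1.5."""
    return (target_pixels < height * width * 0.0015
            and contrast_ratio < 0.15
            and snr < 1.5)


if __name__ == "__main__":
    # A 256 x 256 frame has 65,536 pixels, so 0.15% is about 98 pixels.
    print(max_small_target_pixels(256, 256))
    print(is_small_target(9, 256, 256, contrast_ratio=0.10, snr=1.2))
```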

4.2.1. Traditional Methods

Pre-deep learning IRSTD methods are categorized into three groups. First, filter-based methods such as HPF [34], Max-Mean/Max-Median filters [35], and Top-Hat transforms [36] suppress low-frequency background components to enhance high-frequency target signals. Second, local contrast-based methods such as LCM [37] and MPCM [38] compute brightness contrast between a central pixel and surrounding regions to extract target candidates. Third, low-rank/sparse decomposition-based methods such as the IPI model [39] employ RPCA to separate the background (low-rank component) from the target (sparse component). While these traditional methods can be applied without training data, they suffer from increased false alarm rates in complex backgrounds and rely heavily on manual parameter tuning.
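As an example of the filter-based family, the following is a minimal white Top-Hat transform (image minus its morphological opening) written in plain Python for readability. It uses a square structuring element with clamped borders; it is a sketch of the general technique, not the specific variant of [36].

```python
def _neighbourhood(img, r, c, radius):
    """Values in the (2*radius+1)^2 window around (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]


def _morph(img, radius, reducer):
    """Erosion (reducer=min) or dilation (reducer=max) with a square element."""
    return [[reducer(_neighbourhood(img, r, c, radius)) for c in range(len(img[0]))]
            for r in range(len(img))]


def white_top_hat(img, radius=1):
    """Image minus its opening: keeps bright structures smaller than the element."""
    opened = _morph(_morph(img, radius, min), radius, max)
    return [[img[r][c] - opened[r][c] for c in range(len(img[0]))]
            for r in range(len(img))]


if __name__ == "__main__":
    image = [[10] * 9 for _ in range(9)]
    for r in range(1, 4):
        for c in range(1, 4):
            image[r][c] = 30        # extended warm structure (3 x 3)
    image[6][6] = 50                # single-pixel target
    for row in white_top_hat(image):
        print(row)
```

With a 3 × 3 element, the extended structure survives the opening and is cancelled by the subtraction, while the single-pixel target remains. The same mechanism explains the sensitivity to parameter choice: the element size must be matched by hand to the expected target size.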

4.2.2. Deep Learning Methods

The introduction of deep learning has enabled IRSTD to overcome the limitations of handcrafted feature-based methods and achieve robust detection performance in complex backgrounds.

RISTD-Net [40] established the foundation for deep learning-based IRSTD research by effectively separating small targets from background clutter through multi-scale convolution kernels and an encoder-decoder architecture.

4.2.3. Single-frame Detection Methods

Subsequent research focused on improving single-frame small target detection performance. Notably, ALCNet [41] modularized local contrast measures as network layers and preserved small target features through bottom-up attention mechanisms, improving detection accuracy. However, early studies primarily addressed land or aerial platform environments and did not sufficiently consider the unique characteristics of satellite-based observation conditions.

MTU-Net [42] was explicitly designed for satellite platform environments. By incorporating Vision Transformers, it enhances representational capability for extremely small targets in satellite imagery, and its detection feasibility in maritime environments was validated on the NUDT-SIRST-Sea dataset, which contains actual satellite images.

Table 2 indicates that the single-frame methods exhibit a clear trade-off between detection sensitivity and false-alarm suppression. ResU-Net achieves the lowest false alarm rate (7.92), but its detection probability (46.05) is the lowest in the table and its IoU (60.18) trails that of MTU-Net (64.14). This suggests that ResU-Net behaves more conservatively, suppressing clutter effectively but missing many extremely small or dim ship targets. In contrast, MTU-Net achieves the highest detection probability (85.44) and IoU (64.14), indicating that multilevel ViT-CNN feature extraction is more effective for satellite infrared imagery, where long-range contextual cues are crucial for distinguishing true targets from maritime clutter and confusing bright structures.

TABLE 2.

Detection accuracy and background suppression performance of different methods on the NUDT-SIRST-Sea dataset. $P_{d}$ (×10⁻²) denotes the detection probability, $F_{a}$ (×10⁻⁶) denotes the false alarm rate, and IoU indicates the intersection-over-union metric.

Model	$P_{d}$ ↑	$F_{a}$ ↓	IoU ↑
ACM [43]	70.46	21.31	47.57
ALCNet [41]	58.65	9.13	48.9
DNANet [44]	61.60	17.19	42.17
ResU-Net [45]	46.05	7.92	60.18
MTU-Net [42]	85.44	11.72	64.14
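For reference, the three metrics in Table 2 can be sketched as follows. This is a simplified version: a target counts as detected if any predicted pixel overlaps it, whereas benchmark protocols typically match predicted and true targets by centroid distance. The function returns raw fractions, so $P_{d}$ and IoU would be multiplied by 10² and $F_{a}$ by 10⁶ to match the table's scaling.

```python
def irstd_metrics(pred_pixels, gt_targets, image_shape):
    """Return (P_d, F_a, IoU) for one image.

    pred_pixels: set of (row, col) predicted as target.
    gt_targets:  list of sets of (row, col), one set per true target.
    image_shape: (height, width).
    """
    gt_pixels = set().union(*gt_targets)
    # Target-level detection probability (overlap-based simplification).
    detected = sum(1 for target in gt_targets if target & pred_pixels)
    p_d = detected / len(gt_targets) if gt_targets else 0.0
    # Pixel-level false alarm rate: predicted pixels outside every true target.
    f_a = len(pred_pixels - gt_pixels) / (image_shape[0] * image_shape[1])
    # Pixel-level intersection over union.
    union = pred_pixels | gt_pixels
    iou = len(pred_pixels & gt_pixels) / len(union) if union else 1.0
    return p_d, f_a, iou


if __name__ == "__main__":
    truth = [{(1, 1), (1, 2)}, {(5, 5)}]      # two hypothetical targets
    prediction = {(1, 1), (8, 8)}             # one hit, one false alarm
    print(irstd_metrics(prediction, truth, (10, 10)))
```

The example makes the trade-off in Table 2 tangible: lowering the detection threshold adds pixels to `prediction`, which can raise $P_{d}$ while also raising $F_{a}$.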

4.2.4. Multi-frame Detection Methods

To overcome single-frame limitations, multi-frame IRSTD (MIRST) techniques that leverage temporal continuity have gained significant attention. The RFR framework [46] learns spatio-temporal dependencies across frames to compensate for registration errors caused by satellite platform motion; its authors also released the IRSatVideo-LEO dataset, which contains approximately 90,000 frames.

Currently, public satellite-based IRSTD datasets remain limited to NUDT-SIRST-Sea and IRSatVideo-LEO, posing challenges for model generalization and real-world deployment. Addressing data scarcity through domain adaptation and acquiring diverse satellite observation data remain critical priorities for advancing satellite-based small target detection capabilities.

Table 3 further shows that temporal modeling improves satellite video-based IRSTD, but the gain is metric-dependent. ResUNet_RFR achieves the highest $P_{d}$ (91.58) and AUC (91.59), indicating that recurrent feature refinement effectively exploits long-term temporal dependency and compensates for satellite motion. However, it also has the highest false alarm rate (18.58), whereas STDMANet yields the lowest (4.10), implying stronger false-alarm ($F_{a}$) suppression under the current setting. Therefore, the main strength of RFR is not that it is uniformly best on every metric, but that it provides stronger detection sensitivity and stable overall discriminability in satellite video conditions.

TABLE 3.

Detection performance comparison of different methods on the IRSatVideo-LEO dataset. $P_{d}$ and $F_{a}$ are defined as in Table 2, and AUC denotes the area under the curve.

Model	$P_{d}$ ↑	$F_{a}$ ↓	AUC ↑
STDMANet [47]	89.96	4.10	90.13
DNANet_DTUM [44]	86.88	13.68	89.02
ResUNet_RFR [46]	91.58	18.58	91.59

4.3. Change Detection

This section reviews representative change detection methods in remote sensing, categorizing them into traditional model-based approaches and deep learning-based methods according to their underlying principles and methodological evolution.

4.3.1. Traditional Methods

Traditional change detection methods aim to identify changes by analyzing radiometric or spectral differences between multitemporal remote sensing images based on predefined mathematical and statistical models.

Representative approaches such as Image Differencing and Image Ratioing are among the most basic techniques, as they directly compare pixel intensity values between two acquisition times. Although these methods are simple to implement, they suffer from high sensitivity to illumination variations and noise. To alleviate these limitations, Change Vector Analysis (CVA) [48] was introduced by extending multiband information into a vector space, enabling simultaneous analysis of both the magnitude and direction of change, thereby allowing joint interpretation of change intensity and type. Principal Component Analysis (PCA) [49]-based methods emphasize change-related information by exploiting the correlation structure within images. However, they have inherent limitations in direct inter-temporal comparison. To address this issue, Multivariate Alteration Detection (MAD) [50] and Iteratively Reweighted MAD (IR-MAD) [51] utilize correlation structures to suppress no-change components, thereby performing radiometric normalization and change detection simultaneously. In addition, Markov Random Fields (MRF) [52] and Conditional Random Fields (CRF) [53] are employed as probabilistic graph-based post-processing techniques to incorporate spatial continuity and contextual information into change detection results. Nevertheless, these traditional methods remain sensitive to environmental variations and image misregistration, and they exhibit limited capability in modeling complex object-level changes or nonlinear patterns in high-resolution imagery. Consequently, recent studies have increasingly expanded toward using traditional methods as baselines and integrating them with machine learning and deep learning-based approaches.
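The three simplest operators above, differencing, ratioing, and CVA, can be sketched per pixel as follows. The band values and the change threshold are hypothetical, and CVA direction is represented here as a unit vector rather than an angle.

```python
import math


def image_difference(before, after):
    """Per-pixel difference for single-band images stored as flat lists."""
    return [a - b for b, a in zip(before, after)]


def image_ratio(before, after, eps=1e-6):
    """Per-pixel ratio; eps guards against division by zero."""
    return [a / (b + eps) for b, a in zip(before, after)]


def change_vector(before_px, after_px):
    """CVA for one multiband pixel: change magnitude and unit direction vector."""
    delta = [a - b for b, a in zip(before_px, after_px)]
    magnitude = math.sqrt(sum(d * d for d in delta))
    if magnitude == 0.0:
        return 0.0, [0.0] * len(delta)
    return magnitude, [d / magnitude for d in delta]


if __name__ == "__main__":
    magnitude, direction = change_vector((10.0, 20.0), (13.0, 24.0))
    changed = magnitude > 4.0  # hypothetical threshold
    print(magnitude, direction, changed)
```

The magnitude indicates how strongly a pixel changed, while the direction indicates which bands drove the change, which is what allows CVA to separate change types that plain differencing merges.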

4.3.2. Deep Learning Methods

Deep learning-based change detection methods have evolved toward end-to-end frameworks that take multitemporal remote sensing images as input and directly learn change patterns. Representative fully convolutional approaches, namely FC-EF, FC-Siam-conc, and FC-Siam-diff [54], adopt U-Net-based architectures: FC-EF concatenates the bi-temporal images at the input (early fusion), whereas the two Siamese variants encode each image with shared weights and fuse the resulting features by concatenation or differencing. These architectures exhibit different levels of change sensitivity and robustness to misregistration errors depending on the fusion strategy. Subsequently, SNUNet-CD [55] improved the detection of subtle changes and the reconstruction of change boundaries by introducing dense skip connections and attention mechanisms. STANet [56] and BiT [57] leverage attention mechanisms and transformer-based architectures to incorporate global contextual information, enabling stable detection across multiple spatial scales, from large-scale changes to localized variations. More recently, DDPM-CD [58] and SMDNet [59] have used diffusion models for feature extraction or boundary refinement, and methods that integrate self-supervised pretraining based on SeCo [60] have been introduced to further improve generalization.

Furthermore, recent studies have shifted toward more practical frameworks that address data scarcity and domain shift. Weakly supervised approaches, such as the Dual U-Net-based weak temporal supervision framework [61], enable change detection without explicit change labels, improving scalability and generalization. The focus has also expanded beyond conventional bi-temporal analysis to satellite image time series (SITS)-based techniques [62], which incorporate temporal attention mechanisms to capture long-term dependencies across multiple timestamps. In addition, DeepLabV3+ [63]-based models have demonstrated strong performance on high-resolution satellite imagery, particularly in identifying specific change types such as construction. Despite their improved performance, these models remain vulnerable to spatial and temporal domain variations, which can significantly degrade detection accuracy.

Recent advances in thermal infrared satellite image analysis have also demonstrated the effectiveness of frequency-domain enhancement and transformer-based architectures for fine-grained feature extraction. For instance, the MFcontrail [64] framework integrates a MaxViT encoder, a frequency-aware fusion decoder, and an edge-aware loss function to segment thin and elongated structures in thermal infrared imagery. On the Landsat-8 dataset (Table 4), MFcontrail (full) achieves an IoU of 55.94% and an F1-score of 71.74%, surpassing established models such as PSPNet (36.30% IoU, 53.26% F1) and DeepLabV3+ (51.87% IoU, 68.31% F1). These developments reflect a transition toward approaches that integrate weak supervision, temporal reasoning, and advanced feature representation, although key challenges for real-world deployment persist.

TABLE 4.

Performance comparison of different models on the Landsat-8 dataset

Model (backbone)	Precision [%]	IoU [%]	F1 [%]
PSPNet (ResNet50)	80.44	36.30	53.26
FPN (ResNet50)	75.45	51.05	67.59
SegFormer (MiT-B5)	75.75	52.48	68.83
DeepLabV3+ (ResNet152d-SE)	77.16	51.87	68.31
MFcontrail (ResNet50)	75.28	54.38	70.45
MFcontrail (full)	76.48	55.94	71.74
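As a reading aid for Table 4: when IoU and F1 are computed for the same class from the same true-positive, false-positive, and false-negative counts, IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), which gives F1 = 2 IoU / (1 + IoU). The sketch below checks the tabulated pairs against this identity; that [64] computes both metrics in this way is an assumption, and the helper name `f1_from_iou` is illustrative.

```python
def f1_from_iou(iou):
    """F1 (Dice) implied by IoU when both use the same TP, FP, and FN counts."""
    return 2.0 * iou / (1.0 + iou)


# (IoU [%], reported F1 [%]) pairs copied from Table 4.
TABLE_4 = {
    "PSPNet (ResNet50)": (36.30, 53.26),
    "FPN (ResNet50)": (51.05, 67.59),
    "SegFormer (MiT-B5)": (52.48, 68.83),
    "DeepLabV3+ (ResNet152d-SE)": (51.87, 68.31),
    "MFcontrail (ResNet50)": (54.38, 70.45),
    "MFcontrail (full)": (55.94, 71.74),
}

for model, (iou_pct, f1_pct) in TABLE_4.items():
    derived = 100.0 * f1_from_iou(iou_pct / 100.0)
    print(f"{model}: reported F1 = {f1_pct:.2f}, derived F1 = {derived:.2f}")
```

The derived and reported values agree to within rounding, so the IoU and F1 columns carry the same ranking information, while precision varies independently: PSPNet has the highest precision but the lowest IoU.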

4.4. Abnormal/Anomaly Detection

This section reviews anomaly detection methods in remote sensing and categorizes them into traditional methods based on background modeling and deep learning-based approaches that learn normality and deviation patterns in an end-to-end manner.

4.4.1. Traditional Methods

Traditional anomaly detection methods first model the background characteristics of an image and then identify pixels that deviate significantly from the modeled background as anomalies. Reed-Xiaoli (RX) [65]-based methods assume that the background follows a multivariate Gaussian distribution and use the Mahalanobis distance as the detection statistic. Several variants exist, including Global RX (GRX), which uses global statistics; Local RX (LRX), which uses local statistics; and Kernel RX (KRX), which incorporates nonlinear mappings. The Adaptive Coherence/Cosine Estimator (ACE) [66] whitens the data using the background covariance matrix and then computes the cosine similarity between the whitened data and a target spectrum, providing robustness to illumination variations while preserving the Constant False Alarm Rate (CFAR) property. Meanwhile, Low-Rank and Sparse Representation (LRASR) [67]-based methods decompose the data into a low-rank background component and a sparse anomaly component, enabling effective separation of anomalous signals even in complex backgrounds. Although these traditional methods are structurally simple and highly interpretable, they struggle with complex backgrounds and can be computationally expensive.
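The global RX statistic described above can be sketched in a few lines of NumPy. The sketch assumes an (H, W, B) image cube and estimates the background mean and covariance from the whole scene; the function name `global_rx` and the synthetic data are illustrative and not taken from [65].

```python
import numpy as np


def global_rx(cube):
    """Global RX scores for an (H, W, B) cube.

    Each score is the squared Mahalanobis distance between a pixel
    spectrum and the scene-wide mean, using the scene-wide covariance.
    """
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    cov_inv = np.linalg.pinv(cov)      # pseudo-inverse tolerates a singular covariance
    scores = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return scores.reshape(h, w)


# Synthetic four-band background with one implanted anomalous pixel.
rng = np.random.default_rng(0)
cube = rng.normal(0.0, 1.0, size=(20, 20, 4))
cube[5, 7, :] += 10.0

scores = global_rx(cube)
row, col = np.unravel_index(np.argmax(scores), scores.shape)
print("highest RX score at:", (int(row), int(col)))
```

A local variant would replace the scene-wide mean and covariance with statistics from a sliding window around each pixel, which improves adaptation to heterogeneous backgrounds at a higher computational cost.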

4.4.2. Deep Learning Methods

Deep learning-based anomaly detection methods have evolved toward learning the distribution or structural characteristics of normal data and identifying patterns that deviate from normality. Early approaches primarily relied on autoencoder-based reconstruction schemes, in which a large reconstruction error indicates an anomaly. A known weakness of these schemes is that the autoencoder may also reconstruct anomalous patterns well; memory-augmented autoencoders, such as MemAE [68], mitigate this by reconstructing inputs only from stored representations of normal data. Subsequently, methods such as AE-IT [69] integrated low-rank and sparse decomposition concepts into autoencoder architectures, enabling more explicit separation of background and anomalous components. Meanwhile, DROCC [70] introduced a discriminative approach that directly learns the manifold boundary formed by normal data, demonstrating effective anomaly discrimination without relying on reconstruction errors. More recently, transformer-based models such as 3DTR [71] and SDENet [72] have been proposed, which leverage self-attention mechanisms to model global spatial-spectral context and stably detect subtle anomalies even in complex backgrounds. Furthermore, diffusion model-based approaches such as DBD [73] combine probabilistic background generation with low-rank representations, thereby bridging traditional statistical methods and deep learning-based models.
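The reconstruction-error principle behind autoencoder-based detectors can be illustrated without a neural network. The sketch below deliberately swaps in PCA, which acts as a linear autoencoder, in place of the learned models cited above; it is not an implementation of MemAE or AE-IT, and all names and data are illustrative assumptions.

```python
import numpy as np


def reconstruction_scores(normal_train, samples, n_components):
    """Anomaly score = squared reconstruction error under a linear autoencoder (PCA)."""
    mean = normal_train.mean(axis=0)
    # Rows of vt are the principal directions of the normal training data.
    _, _, vt = np.linalg.svd(normal_train - mean, full_matrices=False)
    basis = vt[:n_components]                       # (k, d) encoder / decoder weights
    centered = samples - mean
    reconstructed = (centered @ basis.T) @ basis    # encode, then decode
    return np.sum((centered - reconstructed) ** 2, axis=1)


# Normal spectra lie near a 2-D subspace of a 5-D space; the anomaly lies off it.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(5, 5)))        # orthonormal columns
subspace, off_axis = q[:, :2].T, q[:, 2]
normal_train = rng.normal(size=(200, 2)) @ subspace + 0.01 * rng.normal(size=(200, 5))
normal_test = rng.normal(size=(20, 2)) @ subspace + 0.01 * rng.normal(size=(20, 5))
anomaly = 2.0 * off_axis

scores = reconstruction_scores(
    normal_train, np.vstack([normal_test, anomaly]), n_components=2
)
print("max normal score:", scores[:-1].max(), "anomaly score:", scores[-1])
```

Because the model is fitted only to normal data, samples that do not lie near the learned subspace are reconstructed poorly and receive high scores. Nonlinear autoencoders follow the same logic with a learned, more expressive encoder and decoder.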

Beyond these general frameworks, recent studies have demonstrated the applicability of deep learning to thermal anomaly detection in real-world satellite infrared imagery. Unlike conventional remote sensing imagery, thermal infrared data present unique challenges: subtle anomalies often exhibit temperature differences of less than 1°C above the background, while solar irradiance, seasonal variation, and atmospheric effects introduce confounding thermal signals that are difficult to disentangle from genuine anomalous activity. A supervised U-Net-based semantic segmentation model was trained on approximately 1,500 labeled ASTER thermal infrared images from multiple volcanoes [74]. By leveraging spatial pattern learning rather than pixel-wise intensity thresholds and incorporating Focal Loss to address severe class imbalance, this method achieved a macro F1-score of approximately 0.93 and demonstrated strong generalization to previously unseen volcanic regions. To improve data efficiency, an alternative image-level classification framework utilizing transfer learning has been proposed [75], in which a pre-trained SqueezeNet was fine-tuned and integrated within an ensemble classification scheme using only about 200 Sentinel-2 and Landsat-8 infrared images. This approach achieved an overall accuracy of 98.3%, but it does not provide pixel-wise spatial localization of anomalies. Together, these studies highlight the trade-off between spatial granularity and data efficiency in satellite thermal infrared anomaly detection and underscore the advantages of spatial feature learning over intensity-based approaches in complex infrared backgrounds.
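To show why a focal loss helps when anomalous pixels are rare, the following minimal NumPy sketch implements the binary form of the focal loss of Lin et al. listed in the references, FL = -alpha_t (1 - p_t)^gamma log(p_t). The hyperparameters used in [74] are not stated here, so the values of `gamma` and `alpha` and the synthetic predictions below are assumptions for illustration only.

```python
import numpy as np


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    labels = np.asarray(labels)
    p_t = np.where(labels == 1, probs, 1.0 - probs)        # probability of the true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)    # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


# 990 confidently correct background pixels and 10 poorly predicted anomaly pixels.
probs = np.concatenate([np.full(990, 0.05), np.full(10, 0.30)])
labels = np.concatenate([np.zeros(990, dtype=int), np.ones(10, dtype=int)])

cross_entropy_like = binary_focal_loss(probs, labels, gamma=0.0, alpha=0.5)
focal = binary_focal_loss(probs, labels, gamma=2.0, alpha=0.5)
print("gamma=0:", cross_entropy_like, "gamma=2:", focal)
```

With `gamma=0` the expression reduces to a weighted cross-entropy, in which the many easy background pixels dominate the average. Increasing `gamma` shrinks the contribution of well-classified pixels, so the few hard anomaly pixels account for a larger share of the loss.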

5. APPLICATIONS

Infrared, multispectral, and hyperspectral remote sensing-based detection technologies are employed in practice across a wide range of application domains, including military and security, disaster and environmental monitoring, and transportation and mobile object surveillance. In infrared imagery, there is a strong demand for early identification of small thermal anomalies or targets with low contrast against complex backgrounds. This capability plays a critical role in applications such as early wildfire detection, industrial facility overheating monitoring, and long-range surveillance and reconnaissance missions. In these operational environments, stable detection under complex backgrounds and low-contrast conditions is essential, making the simultaneous achievement of high detection performance and reliability a key challenge.

In the military and security domain, space-based infrared sensor systems for missile early warning and tracking represent a prominent application. Systems such as the Space-Based Infrared System (SBIRS), the Space Tracking and Surveillance System (STSS), and the Hypersonic and Ballistic Tracking Space Sensor (HBTSS) enable real-time detection of launch signatures associated with ballistic missiles and hypersonic threats and are integrated with interception systems to significantly enhance defensive capabilities. In this context, infrared image-based anomaly detection and temporal change analysis serve as core technologies for the early identification of launch events [76,77,78].

Infrared and spectral remote sensing-based automated detection techniques are also widely applied in disaster and environmental monitoring. Infrared imagery is particularly advantageous for long-term surveillance, as it enables observation of thermally distinctive phenomena such as wildfires, energy and industrial facility accidents, and marine thermal anomalies under both day and night conditions and across diverse weather environments. Applications such as wildfire monitoring using satellite thermal infrared imagery and the surveillance of thermal effluents from nuclear power plants have been reported as representative use cases demonstrating the practical feasibility of these technologies [79,80,81].

Furthermore, infrared and spectral remote sensing imagery has been utilized in applications such as nuclear facility monitoring, where it serves as an indirect means of estimating the operational status of reactors and reprocessing facilities. Recurrent thermal signatures and discharge patterns observed over specific periods can be analyzed to infer facility activity and operational conditions [82].

In the field of transportation and mobile object surveillance, aircraft and vessel detection using thermal infrared sensors constitutes a major application area. Because of limited spatial resolution, aircraft and ships often appear as small thermal signatures spanning only a few pixels, yet deep learning-based detection models have been shown to identify such targets reliably. Notably, the ability of infrared imagery to support detection across day/night cycles and in cloudy conditions allows it to compensate for the limitations of conventional visible-spectrum surveillance systems [29,31].

Overall, infrared and spectral image-based detection technologies have demonstrated their effectiveness through real-world operational deployments across diverse application domains. Looking ahead, further advancements are expected toward onboard processing and real-time surveillance systems, driven by improvements in sensor resolution and the adoption of lightweight deep learning models.

6. CONCLUSION AND FUTURE WORK

This paper presented an integrated survey of AI-based infrared (IR) satellite image analysis, covering the full pipeline from physical principles and atmospheric correction to deep learning-based preprocessing and multi-task detection. By reviewing radiative foundations, band-dependent sensing characteristics, and radiative transfer modeling, we clarified the physical basis of IR remote sensing. We then examined how physics-based correction and data-driven restoration jointly enhance image reliability. Across object detection, infrared small target detection, change detection, and anomaly detection, we observed a clear methodological evolution from CNN-based architectures to Transformer-based models, reflecting the increasing need for global context modeling and computational efficiency.

Despite significant progress, challenges remain, including limited satellite-specific IR datasets, domain generalization issues, and the demand for lightweight onboard inference. Beyond improving detection accuracy, future research should focus on integrating IR detection results into large-scale AI-driven intelligence systems. Platforms such as those developed by Palantir Technologies exemplify how satellite-derived detection outputs can be fused with geospatial data, auxiliary intelligence sources, and predictive analytics to support real-time decision-making. Accordingly, the next stage of IR satellite intelligence lies in combining physically grounded sensing, robust AI detection, and system-level data integration to enable reliable, operational, and scalable surveillance capabilities.

In concrete terms, research priorities include constructing standardized satellite-specific IR benchmarks, developing physics-aware deep learning models that incorporate radiative constraints, and enabling lightweight, real-time onboard inference. In addition, multi-modal fusion strategies integrating IR with other geospatial and spectral sources will be essential to improve robustness and operational reliability, facilitating the transition from algorithm-level advancements to deployable intelligence systems.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. IRIS RS-2023-00219725).

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (IRIS RS-2023-00240109).

This research was supported by the 2025 KASS Commissioned Research Project.

References

Z.L. Li, B.H. Tang, H. Wu, H. Ren, G. Yan, Z. Wan, I.F. Trigo, and J.A. Sobrino, Satellite-derived land surface temperature: Current status and perspectives. Remote Sensing of Environment. 131, 2013, pp. 14-37.

10.1016/j.rse.2012.12.008

E.F. Vermote, D. Tanré, J.L. Deuzé, M. Herman, and J.J. Morcrette, Second Simulation of the Satellite Signal in the Solar Spectrum, 6S: An Overview. IEEE Transactions on Geoscience and Remote Sensing. 35(3), 1997, pp. 675-686.

10.1109/36.581987

P. Singh and N. Komodakis, Cloud-GAN: Cloud Removal for Sentinel-2 Imagery Using a Cyclic Consistent Generative Adversarial Network. IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, 22-27 July, 2018, pp. 1772-1775.

10.1109/IGARSS.2018.8519033

K. Enomoto, K. Sakurada, W. Wang, H. Fukui, M. Matsuoka, R. Nakamura, and N. Kawaguchi, Filmy Cloud Removal on Satellite Imagery with Multispectral Conditional Generative Adversarial Nets. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), July, 2017, pp. 1533-1541.

10.1109/CVPRW.2017.197

M. Li, Q. Xu, J. Guo, and W. Li, DecloudNet: Cross-Patch Consistency is a Nontrivial Problem for Thin Cloud Removal from Wide-Swath Multispectral Images. IEEE Transactions on Geoscience and Remote Sensing. 62, 2024, 5407614.

10.1109/TGRS.2024.3427788

X. Zou, K. Li, J. Xing, Y. Zhang, S. Wang, L. Jin, and P. Tao, DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images. IEEE Transactions on Geoscience and Remote Sensing. 62, 2024, pp. 1-14.

10.1109/TGRS.2024.3365806

Q. Xu, J. Chen, X. Yan, and W. Li, MRF-Net: An Infrared Remote Sensing Image Thin Cloud Removal Method with the Intra-Inter Coherent Constraint. IEEE Transactions on Geoscience and Remote Sensing. 62, 2024, pp. 1-19.

10.1109/TGRS.2024.3474711

B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, DehazeNet: An End-to-End System for Single Image Haze Removal. IEEE Transactions on Image Processing. 25(11), 2016, pp. 5187-5198.

10.1109/TIP.2016.2598681

X. Liu, Y. Ma, Z. Shi, and J. Chen, GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), October, 2019, pp. 7313-7322.

10.1109/ICCV.2019.00741

S. Zhao, L. Zhang, Y. Shen, and Y. Zhou, RefineDNet: A Weakly Supervised Refinement Framework for Single Image Dehazing. IEEE Transactions on Image Processing. 30, 2021, pp. 3391-3404.

10.1109/TIP.2021.3060873

B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng, AOD-Net: All-in-One Dehazing Network. 2017 IEEE International Conference on Computer Vision (ICCV), October, 2017, pp. 4780-4788.

10.1109/ICCV.2017.511

S.A. Hovhannisyan, Mamba-based Thermal Image Dehazing. Mathematical Problems of Computer Science. 62, 2024, pp. 126-144.

10.51408/1963-0126

M. Yu, T. Cui, H. Lu, and Y. Yue, VIFNet: An End-to-End Visible-Infrared Fusion Network for Image Dehazing. Neurocomputing. 599, 2024, 128105.

10.1016/j.neucom.2024.128105

K. He, Y. Cai, S. Peng, and M. Tan, A Diffusion Model-Assisted Multiscale Spectral Attention Network for Hyperspectral Image Super-Resolution. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 17, 2024, pp. 8612-8625.

10.1109/JSTARS.2024.3386702

Y. Tang, J. Li, L. Yue, X. Liu, Y. Li, Y. Xiao, and Q. Yuan, A CNN-Transformer Embedded Unfolding Network for Hyperspectral Image Super-Resolution. IEEE Transactions on Geoscience and Remote Sensing. 62, 2024, pp. 1-16.

10.1109/TGRS.2024.3431924

C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July, 2017, pp. 4681-4690.

10.1109/CVPR.2017.19

A. Sharifi and M.M. Safari, Enhancing the Spatial Resolution of Sentinel-2 Images Through Super-Resolution Using Transformer-Based Deep-Learning Models. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 18, 2025, pp. 1-17.

10.1109/JSTARS.2025.3526260

L. Hu, L. Hu, and M. Chen, Edge-Enhanced Infrared Image Super-Resolution Reconstruction Model under Transformer. Scientific Reports. 14, 2024, 15585.

10.1038/s41598-024-66302-8

X. Ai and W. Yang, Super-Resolution Reconstruction of Infrared Images Based on Convolutional Neural Network. 2024 2nd International Conference on Computer Network Technology and Electronic and Information Engineering (CNTEIE), 2024.

10.1109/CNTEIE66268.2024.00024

S. Liu, K. Yan, F. Qin, C. Wang, R. Ge, K. Zhang, J. Huang, Y. Peng, and J. Cao, Infrared Image Super-Resolution via Lightweight Information Split Network. Advanced Intelligent Computing Technology and Applications (ICIC 2024). 14869, 2024, pp. 293-304.

10.1007/978-981-97-5603-2_24

L. Giglio, J. Descloitres, C.O. Justice, and Y.J. Kaufman, An enhanced contextual fire detection algorithm for MODIS. Remote Sensing of Environment. 87(2-3), 2003, pp. 273-282.

10.1016/S0034-4257(03)00184-6

L. Giglio, I. Csiszar, Á. Restás, J.T. Morisette, W. Schroeder, D. Morton, and C.O. Justice, Active fire detection and characterization with the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER). Remote Sensing of Environment. 112(6), 2008, pp. 3055-3063.

10.1016/j.rse.2008.03.003

W. Schroeder, P. Oliva, L. Giglio, and I.A. Csiszar, The new VIIRS 375 m active fire detection data product: Algorithm description and initial assessment. Remote Sensing of Environment. 143, 2014, pp. 85-96.

10.1016/j.rse.2013.12.008

S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Advances in Neural Information Processing Systems 28 (NIPS 2015), 7-12 December, 2015.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June, 2016, pp. 779-788.

10.1109/CVPR.2016.91

T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 22-29 October, 2017, pp. 2980-2988.

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, End-to-End Object Detection with Transformers. European Conference on Computer Vision (ECCV), 23-28 August, 2020, pp. 213-229.

10.1007/978-3-030-58452-8_13

X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, Deformable DETR: Deformable Transformers for End-to-End Object Detection. International Conference on Learning Representations (ICLR), 3-7 May, 2021.

L. Li, L. Jiang, J. Zhang, S. Wang, and F. Chen, A Complete YOLO-Based Ship Detection Method for Thermal Infrared Remote Sensing Images under Complex Backgrounds. Remote Sensing. 14(7), 2022, 1534.

10.3390/rs14071534

L. Li, X. Zhou, W. Zhang, Y. Zhong, L. Gao, J. Yu, X. Li, and F. Chen, Thermal sentinel: Low-earth orbit infrared intelligent system for flying civil aircraft safety. Remote Sensing of Environment. 328, 2025, 114826.

10.1016/j.rse.2025.114826

G.H. de Almeida Pereira, A.M. Fusioka, B.T. Nassu, and R. Minetto, Active fire detection in Landsat-8 imagery: A large-scale dataset and a deep-learning study. ISPRS Journal of Photogrammetry and Remote Sensing. 178, 2021, pp. 171-186.

10.1016/j.isprsjprs.2021.06.002

P.B. Chapple, D.C. Bertilone, R.S. Caprari, S. Angeli, and G.N. Newsam, Target detection in infrared and SAR terrain images using a non-gaussian stochastic model. Proc. Targets Backgrounds, Characterization Representation, International Society for Optics and Photonics (SPIE), 5-7 April, 1999, pp. 122-132.

10.1117/12.352951

J. Peng and W. Zhou, Infrared background suppression for segmenting and detecting small target. Acta Electronica Sinica. 27(12), 1999, pp. 47-52.

S.D. Deshpande, M.H. Er, R. Venkateswarlu, and P. Chan, Max-mean and max-median filters for detection of small targets. Proc. SPIE 3809, Signal and Data Processing of Small Targets 1999, 4 October, 1999.

10.1117/12.364049

X. Bai and F. Zhou, Analysis of new top-hat transformation and the application for infrared dim small target detection. Pattern Recognition. 43(6), 2010, pp. 2145-2156.

10.1016/j.patcog.2009.12.023

W. Wang, Z. Li, and A. Siddique, Infrared maritime small-target detection based on fusion gray gradient clutter suppression. Remote Sensing. 16(7), 2024, 1255.

10.3390/rs16071255

Y. Wei, X. You, and H. Li, Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognition. 58, 2016, pp. 216-226.

10.1016/j.patcog.2016.04.002

C. Gao, D. Meng, Y. Yang, Y. Wang, X. Zhou, and A.G. Hauptmann, Infrared patch-image model for small target detection in a single image. IEEE Transactions on Image Processing. 22(12), 2013, pp. 4996-5009.

10.1109/TIP.2013.2281420

Q. Hou, Z. Wang, F. Tan, Y. Zhao, H. Zheng, and W. Zhang, RISTDnet: robust IR small target detection network. IEEE Geoscience and Remote Sensing Letters. 19, 2022, pp. 1-5.

10.1109/LGRS.2021.3050828

Y. Dai, Y. Wu, F. Zhou, and K. Barnard, Attentional local contrast networks for infrared small target detection. IEEE Transactions on Geoscience and Remote Sensing. 59(11), 2021, pp. 9813-9824.

10.1109/TGRS.2020.3044958

T. Wu, B. Li, Y. Luo, Y. Wang, C. Xiao, T. Liu, J. Yang, W. An, and Y. Guo, MTU-Net: Multilevel TransUNet for space-based infrared tiny ship detection. IEEE Transactions on Geoscience and Remote Sensing. 61, 2023, pp. 1-15.

10.1109/TGRS.2023.3235002

Y. Dai, Y. Wu, F. Zhou, and K. Barnard, Asymmetric contextual modulation for infrared small target detection. IEEE Winter Conference on Applications of Computer Vision (WACV 2021), January, 2021, pp. 949-958.

10.1109/WACV48630.2021.00099

B. Li, C. Xiao, L. Wang, Y. Wang, Z. Lin, M. Li, W. An, and Y. Guo, Dense nested attention network for infrared small target detection. IEEE Transactions on Image Processing. 32, 2023, pp. 1745-1758.

10.1109/TIP.2022.3199107

F.I. Diakogiannis, F. Waldner, P. Caccetta, and C. Wu, ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS Journal of Photogrammetry and Remote Sensing. 162, 2020, pp. 94-114.

10.1016/j.isprsjprs.2020.01.013

X. Ying, L. Liu, Z. Lin, Y. Shi, Y. Wang, R. Li, X. Cao, B. Li, S. Zhou, and W. An, Infrared small target detection in satellite videos: A new dataset and a novel recurrent feature refinement framework. IEEE Transactions on Geoscience and Remote Sensing. 63, 2025, 5002818.

10.1109/TGRS.2025.3542368

P. Yan, R. Hou, X. Duan, C. Yue, X. Wang, and X. Cao, STDMANet: Spatio-temporal differential multiscale attention network for small moving infrared target detection. IEEE Transactions on Geoscience and Remote Sensing. 61, 2023, pp. 1-16.

10.1109/TGRS.2023.3241311

E.F. Lambin and A.H. Strahler, Change-vector analysis in multitemporal space: A tool to detect and categorize land-cover change processes using high temporal-resolution satellite data. Remote Sensing of Environment. 48(2), 1994, pp. 231-244.

10.1016/0034-4257(94)90144-9

H. Hotelling, Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology. 24, 1933, pp. 417-441.

10.1037/h0071325

A.A. Nielsen and K. Conradsen, Multivariate alteration detection (MAD) in multispectral, bi-temporal image data: a new approach to change detection studies. Technical University of Denmark, Lyngby, 1997.

A.A. Nielsen, The regularized iteratively reweighted MAD method for change detection in multi- and hyperspectral data. IEEE Transactions on Image Processing. 16, 2007, pp. 463-478.

10.1109/TIP.2006.888195

L. Bruzzone and D.F. Prieto, Automatic analysis of the difference image for unsupervised change detection. IEEE Transactions on Geoscience and Remote Sensing. 38(3), 2000, pp. 1171-1182.

10.1109/36.843009

J. Lafferty, A. McCallum, and F. Pereira, Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 18th International Conference on Machine Learning (ICML 2001), 28 June-1 July, 2001.

R.C. Daudt, B. Le Saux, and A. Boulch, Fully convolutional Siamese networks for change detection. IEEE International Conference on Image Processing (ICIP), 7-10 October, 2018.

S. Fang, K. Li, J. Shao, and Z. Li, SNUNet-CD: A densely connected Siamese network for change detection of VHR images. IEEE Geoscience and Remote Sensing Letters. 19, 2022, pp. 1-5.

10.1109/LGRS.2021.3056416

H. Chen, Z. Qi, and Z. Shi, A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sensing. 12(10), 2020, 1662.

10.3390/rs12101662

H. Chen, Z. Qi, and Z. Shi, Remote sensing image change detection with transformers. IEEE Transactions on Geoscience and Remote Sensing. 60, 2021, pp. 1-14.

10.1109/TGRS.2021.3095166

W.G.C. Bandara, N.G. Nair, and V.M. Patel, DDPM-CD: Denoising diffusion probabilistic models as feature extractors for remote sensing change detection. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 3-8 January, 2024.

J. Jia, G. Lee, Z. Wang, L. Zhi, and Y. He, Siamese meets diffusion network: SMDNet for enhanced change detection in high-resolution RS imagery. arXiv preprint, 2024.

10.1109/JSTARS.2024.3384545

O. Mañas, A. Lacoste, X. Giró-i-Nieto, D. Vazquez, and P. Rodriguez, Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data. IEEE/CVF International Conference on Computer Vision (ICCV), 11-17 October, 2021, pp. 9414-9423.

10.1109/ICCV48922.2021.00928

X. Bou, E. Vincent, G. Facciolo, R. Grompone von Gioi, J.M. Morel, and T. Ehret, Remote sensing change detection via weak temporal supervision. arXiv preprint, 2026.

E. Vincent, J. Ponce, and M. Aubry, Satellite image time series semantic change detection: Novel architecture and analysis of domain shift. arXiv preprint, 2024.

C.W. Song and W. Wahyu, Urban change detection for high-resolution satellite images using DeepLabV3+. Proceedings of the KSRS Spring Meeting, 2021, pp. 441-442.

S. Shi, J. Wu, K. Yao, and Q. Meng, Deep learning-based contrail segmentation in thermal infrared satellite cloud images via frequency-domain enhancement. Remote Sensing. 17(18), 2025, 3145.

10.3390/rs17183145

I.S. Reed and X. Yu, Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution. IEEE Transactions on Acoustics, Speech, and Signal Processing. 38(10), 1990, pp. 1760-1768.

10.1109/29.60107

L.L. Scharf and L.T. McWhorter, Adaptive matched subspace detectors and adaptive coherence estimators. IEEE Transactions on Signal Processing. 45(5), 1997, pp. 1114-1127.

10.1109/ACSSC.1996.599116

P. Xiang, J. Song, H. Li, L. Gu, and H. Zhou, Hyperspectral anomaly detection with harmonic analysis and low-rank decomposition. Remote Sensing. 11(24), 2019, 3028.

10.3390/rs11243028

D. Gong, L. Liu, V. Le, B. Saha, M.R. Mansour, S. Venkatesh, and A. van den Hengel, Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. IEEE/CVF International Conference on Computer Vision (ICCV), 27 October-2 November, 2019, pp. 1705-1714.

10.1109/ICCV.2019.00179

S. Chen, X. Li, and Y. Yan, Hyperspectral anomaly detection with auto-encoder and independent target. Remote Sensing. 15(22), 2023, 5266.

10.3390/rs15225266

S. Goyal, A. Raghunathan, M. Jain, H.V. Simhadri, and P. Jain, DROCC: Deep robust one-class classification. 37th International Conference on Machine Learning (ICML), 13-18 July, 2020.

Z. Wu and B. Wang, Background reconstruction via 3D-transformer network for hyperspectral anomaly detection. Remote Sensing. 15(18), 2023, 4592.

10.3390/rs15184592

S. Liu, H. Guo, S. Gao, and W. Zhang, The spectrum difference enhanced network for hyperspectral anomaly detection. Remote Sensing. 16(23), 2024, 4518.

10.3390/rs16234518

Y. Wu, Y. Meng, and L. Sun, Diffusing background dictionary for hyperspectral anomaly detection. Asian Conference on Computer Vision (ACCV), 9-13 December, 2024.

10.1007/978-981-96-0917-8_3

C. Corradino, M.S. Ramsey, S. Pailot-Bonnétat, A.J.L. Harris, and C. Del Negro, Detection of subtle thermal anomalies: Deep learning applied to the ASTER global volcano dataset. IEEE Transactions on Geoscience and Remote Sensing. 61, 2023, pp. 1-15.

10.1109/TGRS.2023.3241085

E. Amato, C. Corradino, F. Torrisi, and C. Del Negro, A deep convolutional neural network for detecting volcanic thermal anomalies from satellite images. Remote Sensing. 15(15), 2023, 3718.

10.3390/rs15153718

C.J. Waters, Space warfighter heritage: The first space-based infrared system geosynchronous element launched into space. Space Operations Command News, 2025.

H. Roe, Aegis missile defense system intercepts target in test. U.S. Pacific Fleet Press Release, 2013.

N. Jones-Bonbrest, MDA and Navy accomplish next step in hypersonic missile defense. Missile Defense Agency News (DVIDS), 2025.

K.H. Lee, G.C. Kim, D. Lee, J.S. Ha, and H.S. Kim, Simultaneous Monitoring of Wildfire and Smoke Using Infrared Channel Observations from a Geostationary Satellite. Journal of the Korean Society for Atmospheric Environment. 40(3), 2024, pp. 337-348.

10.5572/KOSAE.2024.40.3.337

NOAA, NOAA unveils powerful convergence of AI and science with revolutionary next-generation fire system technology [Online], 2025. Available at: https://www.noaa.gov/news-release/noaa-unveils-powerful-convergence-of-ai-and-science-with-revolutionary-next-generation-fire-system [Accessed 27/03/2026].

X. Wang, X. Su, L. Wang, X. Wang, Q. Meng, and J. Xu, Quantifying thermal discharges from nuclear power plants: A remote sensing analysis of environmental function zones. Applied Sciences. 15(2), 2025, pp. 738.

10.3390/app15020738

Beyond Parallel, Enhancing understanding of Yongbyon through thermal imagery (Part 1) [Online], n.d. Available at: https://beyondparallel.csis.org [Accessed 27/03/2026].

JOURNAL OF SPACE SECURITY ISSN:3058-5759(Print) 한국우주안보학회지

FIG. 1.

Systematic mind-mapping of infrared detection technologies: Physics, Preprocessing, and Multi-task AI.

TABLE 1.

Physical characteristics and application relevance across infrared spectral bands

FIG. 2.

Taxonomy of Image Preprocessing Methods.

FIG. 3.

Representative visual examples of preprocessing tasks. Cloud removal and super-resolution are shown with infrared remote-sensing-related examples, while dehazing is included as a general restoration example to illustrate atmospheric degradation handling.

FIG. 4.

Categorization of infrared satellite detection tasks and representative applications.

TABLE 2.

Detection accuracy and background suppression performance of different methods on the NUDT-SIRST dataset. Pd (×10⁻²) denotes the detection probability, Fa (×10⁻⁶) denotes the false alarm rate, and IoU indicates the intersection-over-union metric

TABLE 3.

Detection performance comparison of different methods on the IRSatVideo-LEO dataset. Metrics are defined as in Table 2

TABLE 4.

Performance comparison of different models on Landsat 8 datasets
