1. INTRODUCTION
2. THEORETICAL BACKGROUND
2.1. Radiation
2.2. Characteristics across infrared spectral bands
3. PREPROCESSING TECHNIQUES
3.1. Atmospheric Correction
3.2. Preprocessing Methods
4. AI-DRIVEN INFRARED IMAGE ANALYSIS
4.1. Object Detection
4.2. Infrared Small Target Detection (IRSTD)
4.3. Change Detection
4.4. Abnormal/Anomaly Detection
5. APPLICATIONS
6. CONCLUSION AND FUTURE WORKS
1. INTRODUCTION
Space-based surveillance systems provide persistent wide-area observation for defense early warning, disaster response, and environmental monitoring. Conventional electro-optical (EO) sensors, however, rely on reflected sunlight and are therefore limited to daytime operation under clear atmospheric conditions. Infrared (IR) sensors partially overcome these constraints by detecting self-emitted thermal radiation, enabling day-and-night observation and providing thermal information complementary to visible imagery. For example, ballistic missile plumes produce strong MWIR emission, while thermal events such as wildfires and volcanic activity exhibit distinct signatures in the MWIR and LWIR regions, which are largely undetectable by visible-light sensors.
Despite these advantages, IR satellite imagery poses inherent challenges: low spatial resolution often reduces targets to a few pixels, single-channel grayscale acquisition lacks texture and color cues, thermal contrast fluctuates with diurnal and seasonal cycles, and atmospheric absorption degrades the signal-to-noise ratio (SNR). Traditional filtering and threshold-based methods struggle to achieve simultaneously high detection probability and low false alarm rate under such conditions, relying heavily on manual parameter tuning.
The rapid advancement of deep learning has shifted the paradigm of IR image analysis from handcrafted features to data-driven intelligent detection. Significant improvements have been reported across object detection, infrared small target detection (IRSTD), change detection, and anomaly detection, with architectures increasingly incorporating global context modeling and multi-scale feature extraction. Yet existing surveys remain largely confined to individual tasks, and a comprehensive review covering the full pipeline from physical principles through preprocessing to multi-task AI-based detection is lacking. Although this survey focuses on infrared satellite imagery, we selectively discuss multispectral and hyperspectral remote-sensing studies only when they provide transferable methodological insights for IR preprocessing, change detection, or anomaly detection in areas where IR-specific literature remains limited.
This paper presents an integrated survey addressing this gap. The overall structure and organization of this paper are summarized in Fig. 1. Section 2 reviews the physical principles of IR radiation. Section 3 discusses atmospheric correction and deep learning-based preprocessing. Section 4 systematically compares AI-based detection methods across four tasks with key algorithms, benchmarks, and performance. Section 5 covers application domains, and Section 6 identifies open challenges and future directions.
2. THEORETICAL BACKGROUND
2.1. Radiation
All objects at temperatures above 0 K emit radiant energy in the form of electromagnetic waves, due to the thermal vibration of their constituent particles and transitions between energy levels. The spectral radiance emitted by a black body at absolute temperature $T$ and wavelength $\lambda$ is described by Planck's radiation law, Eq. (1):
$$B_\lambda(T) = \frac{2hc^2}{\lambda^5}\,\frac{1}{\exp\left(\frac{hc}{\lambda k T}\right) - 1} \tag{1}$$
where $B_\lambda(T)$ is the spectral radiance of the black body, $k$ is the Boltzmann constant ($1.38 \times 10^{-23}$ J/K), $h$ is the Planck constant ($6.626 \times 10^{-34}$ J·s), and $c$ is the speed of light ($2.998 \times 10^{8}$ m/s).
Since most natural objects are not ideal black bodies, the emissivity ($\varepsilon$) must be taken into account, defined as the ratio of the radiant energy emitted by an actual surface to that emitted by a black body at the same temperature and wavelength. Emissivity ranges from 0 to 1 depending on the material, surface condition, and wavelength, and the spectral radiance of a real object is expressed as $L_\lambda(T) = \varepsilon_\lambda B_\lambda(T)$, where $B_\lambda(T)$ is the black-body spectral radiance given by Planck's law. Meanwhile, the radiant energy emitted from the Earth's surface must pass through the atmosphere before reaching a satellite sensor, during which absorption and re-emission by atmospheric constituents occur. Wavelength regions where atmospheric absorption is relatively low, allowing radiant energy to be transmitted effectively, are referred to as atmospheric windows. In the infrared spectrum, the major atmospheric windows are located in the SWIR (2.1-2.5 µm), MWIR (3-5 µm), and LWIR (8-14 µm) bands. Since infrared remote sensing can only acquire valid surface information through these atmospheric windows, the selection of observation bands according to the detection objective is of critical importance.
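The relations above can be checked numerically. The following is a minimal sketch, not a calibrated radiometric tool: it evaluates Planck's law with the constants quoted in the text, scales the result by an emissivity, and inverts the law to obtain a brightness temperature. The function names and the choice of per-micrometre radiance units are assumptions made for this illustration.

```python
import math

H = 6.626e-34   # Planck constant, J*s
C = 2.998e8     # speed of light, m/s
K_B = 1.38e-23  # Boltzmann constant, J/K


def planck_radiance(wavelength_um: float, temperature_k: float) -> float:
    """Black-body spectral radiance in W m^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6  # micrometres -> metres
    exponent = H * C / (lam * K_B * temperature_k)
    radiance_per_m = 2.0 * H * C**2 / (lam**5 * math.expm1(exponent))
    return radiance_per_m * 1e-6  # per metre -> per micrometre


def graybody_radiance(wavelength_um: float, temperature_k: float,
                      emissivity: float) -> float:
    """Radiance of a real surface: emissivity times the black-body radiance."""
    return emissivity * planck_radiance(wavelength_um, temperature_k)


def brightness_temperature(wavelength_um: float, radiance: float) -> float:
    """Temperature of the black body that would emit `radiance` (Planck inverse)."""
    lam = wavelength_um * 1e-6
    radiance_per_m = radiance * 1e6  # per micrometre -> per metre
    return H * C / (lam * K_B * math.log1p(2.0 * H * C**2 / (lam**5 * radiance_per_m)))
```

Because emissivity is at most 1, the brightness temperature of a real surface never exceeds its physical temperature, which is why emissivity must be known or estimated before a temperature can be retrieved from measured radiance.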
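The wavelength of maximum emission, $\lambda_{\max}$, shifts toward shorter wavelengths as temperature increases, as described by Wien's displacement law, Eq. (2):
$$\lambda_{\max} = \frac{b}{T} \tag{2}$$
where $b \approx 2898$ µm·K is Wien's displacement constant and $T$ is the absolute temperature.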
Accordingly, the radiation peak of the Earth’s surface (~300 K) is located at approximately 9.7 µm, making the 8-14 µm LWIR atmospheric window optimal for land surface temperature (LST) observation, whereas fires (~800 K) exhibit a peak near 3.6 µm, falling within the 3-5 µm MWIR atmospheric window [1]. These band-dependent observation characteristics provide the physical basis for the sensor and band selection in each detection task discussed in Section 4.
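The two peak wavelengths quoted above can be reproduced with a few lines of Python. This is an illustrative sketch: the window limits are taken from the atmospheric windows listed in Section 2.1, and the helper names are assumptions introduced here.

```python
WIEN_B_UM_K = 2897.8  # Wien displacement constant, um*K

# Atmospheric windows as listed in the text (micrometres).
ATMOSPHERIC_WINDOWS_UM = {
    "SWIR": (2.1, 2.5),
    "MWIR": (3.0, 5.0),
    "LWIR": (8.0, 14.0),
}


def wien_peak_um(temperature_k: float) -> float:
    """Wavelength of maximum black-body emission, in micrometres."""
    return WIEN_B_UM_K / temperature_k


def window_for(wavelength_um: float):
    """Name of the atmospheric window containing the wavelength, or None."""
    for name, (low, high) in ATMOSPHERIC_WINDOWS_UM.items():
        if low <= wavelength_um <= high:
            return name
    return None


for label, temp in (("land surface", 300.0), ("fire", 800.0)):
    peak = wien_peak_um(temp)
    print(f"{label}: {temp:.0f} K -> peak {peak:.2f} um ({window_for(peak)})")
# land surface: 300 K -> peak 9.66 um (LWIR)
# fire: 800 K -> peak 3.62 um (MWIR)
```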
The top of atmosphere (TOA) radiance measured by a satellite sensor is a composite signal comprising surface emission, upwelling path radiance from the atmosphere itself, and reflected downwelling radiance from the surface, which is modeled by the radiative transfer equation (RTE). Consequently, atmospheric correction to remove atmospheric effects is essential for accurate extraction of surface thermal information, as discussed in Section 3.
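For reference, the three contributions described above are commonly combined in the following simplified single-channel form of the RTE. This is a sketch under stated assumptions (a Lambertian surface, channel-averaged quantities, and negligible scattering in the thermal infrared); the symbols $\tau_\lambda$ (atmospheric transmittance), $T_s$ (surface temperature), $L_\lambda^{\uparrow}$ (upwelling path radiance), and $L_\lambda^{\downarrow}$ (downwelling atmospheric radiance) are notation introduced here for illustration.

```latex
L_\lambda^{\mathrm{TOA}}
  = \tau_\lambda \left[ \varepsilon_\lambda B_\lambda(T_s)
  + (1 - \varepsilon_\lambda)\, L_\lambda^{\downarrow} \right]
  + L_\lambda^{\uparrow}
```

The first bracketed term is the surface emission, the second is the downwelling radiance reflected by the surface, and the last term is the radiance emitted by the atmosphere toward the sensor.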
2.2. Characteristics across infrared spectral bands
A comparative overview of radiative mechanisms, key image characteristics, and atmospheric influences across infrared spectral bands is provided in Table 1. This comparison highlights the fundamental differences in sensing principles and practical applications among the spectral bands.
TABLE 1.
Physical characteristics and application relevance across infrared spectral bands
3. PREPROCESSING TECHNIQUES
Preprocessing is a key step for improving the reliability of satellite image analysis by mitigating atmospheric, structural, and resolution-related degradations. The preprocessing techniques reviewed in this survey are organized into two main categories: physics-based correction and deep learning-based preprocessing (Fig. 2). Physics-based correction mainly focuses on atmospheric and radiometric normalization, whereas deep learning-based preprocessing addresses data-driven restoration tasks such as cloud removal, dehazing, and super-resolution.
3.1. Atmospheric Correction
Atmospheric correction is a crucial preprocessing step that retrieves surface reflectance by removing atmospheric effects (scattering, absorption, and path radiance) from TOA measurements. Surface reflectance represents the intrinsic spectral properties of the Earth’s surface and is essential for quantitative analyses such as land cover classification, vegetation indices, change detection, and multi-sensor fusion. Without it, atmospheric and geometric variations degrade data reliability and comparability.
Atmospheric correction is especially important in the reflective region (NIR-SWIR), where sensor signals include not only surface reflectance but also atmospheric effects such as molecular and aerosol scattering, downward solar scattering, and surface-atmosphere multiple reflections. As a result, image brightness cannot be directly interpreted as true surface reflectance, and atmospheric variations must be removed to obtain consistent and physically meaningful reflectance values. A typical approach is inverse estimation based on a radiative transfer model (RTM), which retrieves surface reflectance from TOA signals by physically modeling radiative transfer along the sun-atmosphere-surface-sensor path [2].
This process incorporates the sensor spectral response, solar-sensor geometry, atmospheric state variables, and effects such as molecular and aerosol scattering, gas absorption, and multiple scattering. The surface reflectance can be expressed as Eq. (3):
$$\rho_s = \frac{\rho_{TOA} - \rho_{path} - \rho_{adj}}{T_{\downarrow} T_{\uparrow} + S\,(\rho_{TOA} - \rho_{path} - \rho_{adj})} \tag{3}$$
In this equation, $\rho_s$ is the retrieved surface reflectance and $\rho_{TOA}$ denotes the top-of-atmosphere (TOA) reflectance measured by the satellite sensor. The terms $\rho_{path}$ and $\rho_{adj}$ represent atmospheric path reflectance (from molecular and aerosol scattering) and adjacency reflectance from surrounding surfaces, respectively. $T_{\downarrow}$ and $T_{\uparrow}$ indicate downward and upward atmospheric transmittance, while $S$ is the atmospheric spherical albedo accounting for multiple scattering. This formulation removes atmospheric and adjacency effects and compensates for transmittance losses to retrieve physically consistent surface reflectance.
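The retrieval can be illustrated with a small forward/inverse pair. This is a minimal sketch rather than the 6S implementation: it assumes the widely used Lambertian coupling in which the surface term is scaled by the two-way transmittance and by 1 / (1 - S * rho_s), and it treats the adjacency reflectance as a purely additive term, which is a simplifying assumption. All function and parameter names are hypothetical.

```python
def toa_reflectance(rho_s: float, rho_path: float, rho_adj: float,
                    t_down: float, t_up: float, s_alb: float) -> float:
    """Forward model: TOA reflectance for a Lambertian surface (assumed form)."""
    coupled_surface = t_down * t_up * rho_s / (1.0 - s_alb * rho_s)
    return rho_path + rho_adj + coupled_surface


def surface_reflectance(rho_toa: float, rho_path: float, rho_adj: float,
                        t_down: float, t_up: float, s_alb: float) -> float:
    """Inverse model: remove path and adjacency terms, then undo the coupling."""
    excess = rho_toa - rho_path - rho_adj
    return excess / (t_down * t_up + s_alb * excess)


# Round trip with illustrative (not measured) atmospheric parameters.
params = dict(rho_path=0.04, rho_adj=0.01, t_down=0.90, t_up=0.92, s_alb=0.10)
rho_toa = toa_reflectance(0.30, **params)
print(round(surface_reflectance(rho_toa, **params), 6))
```

In operational tools the atmospheric parameters are not supplied by hand as above but are obtained from the radiative transfer model for the given geometry and atmospheric state.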
In practice, RTM-based tools such as 6S improve computational efficiency by using precomputed look-up tables (LUTs) or analytical approximations. Although physics-based atmospheric correction is widely adopted for its interpretability and robustness, limitations such as resolution constraints, spatiotemporal gaps, and noise remain. Therefore, deep learning-based preprocessing methods are increasingly studied to compensate for these limitations.
3.2. Preprocessing Methods
Recently, deep learning-based preprocessing methods have emerged to address residual degradations that remain after physics-based correction. Although RTM-based atmospheric correction ensures radiometric consistency, it does not fully resolve quality losses such as cloud occlusion, haze-induced contrast reduction, or limited spatial resolution. In particular, because infrared (IR) satellite imagery encodes temperature and radiance information derived from emitted radiant energy, such degradation in image quality directly affects the accuracy of target detection and surface temperature estimation. Therefore, satellite image preprocessing is increasingly treated as a data-driven restoration problem (Fig. 3).
In the existing literature, deep learning-based preprocessing methods can be categorized into three major domains - cloud removal, dehazing, and super-resolution (SR). Although each domain addresses a distinct degradation mechanism, they share the common objective of enhancing structural restoration and improving the reliability of satellite image analysis.
Cloud removal aims to reconstruct surface information obscured by thin or thick clouds. Early approaches were primarily GAN-based, including unpaired translation models to reduce reliance on paired datasets [3,4]. Recent methods incorporate spatial consistency constraints and diffusion-based frameworks to improve stability and detail preservation [5,6].
In infrared satellite imagery, clouds block or distort the radiant energy emitted from the Earth's surface, significantly reducing the accuracy of surface temperature estimation and thermal anomaly detection. Existing cloud removal techniques were developed primarily for multispectral or RGB imagery and are difficult to apply directly to single-channel IR imagery. Moreover, the ground and clouds often exhibit similar radiative characteristics in IR images, which makes the two regions harder to distinguish. To address these issues, deep learning-based cloud removal techniques specialized for IR images have recently been proposed; representative approaches combine multi-scale feature fusion, Transformer-based global information extraction, and attention mechanisms. MRF-Net [7] combines CNNs and Transformers to exploit local features and global contextual information simultaneously, removing thin clouds effectively while maintaining consistency between image blocks. Nevertheless, IR-based cloud removal still faces several technical limitations, including data scarcity, block-boundary discontinuities in large-scale image processing, and difficulty in preserving radiometric properties. Consequently, integrated approaches that combine physics-based information with deep learning are emerging as a key research direction.
Dehazing methods address contrast attenuation and edge blurring caused by atmospheric scattering. Deep learning approaches either estimate intermediate physical representations or learn end-to-end mappings using multi-scale and attention mechanisms [8,9,10,11]. Although they outperform traditional model-driven methods in complex scenes, performance is sensitive to discrepancies between synthetic and real atmospheric conditions, particularly in multispectral or hyperspectral imagery.
Because of its longer wavelength, infrared (IR) radiation penetrates fog and haze better than visible light; nevertheless, image quality can still degrade due to atmospheric absorption and scattering. Furthermore, since IR imagery often has inherently low contrast and limited texture, dehazing must balance the preservation of thermal signals against the restoration of structural information, going beyond simple visual restoration. Accordingly, IR-specific dehazing techniques based on CNNs and state-space models have recently been proposed, and approaches that improve restoration by exploiting complementary information through visible-infrared fusion are gaining attention [12,13]. However, these methods still face challenges such as the difficulty of obtaining aligned multimodal data, domain mismatch between synthetic and real atmospheric conditions, and increased computational complexity. There is therefore a growing need for lightweight, physics-based fusion models suited to IR imaging environments.
Super-resolution (SR) seeks to overcome sensor resolution limits and recover fine structural details. Modern SR models have evolved from CNN-based architectures to GAN-, Transformer-, and diffusion-assisted frameworks, emphasizing perceptual sharpness and spectral fidelity [14,15,16,17]. Despite significant advances, SR remains constrained by limited aligned training data, high computational cost, and the risk of hallucinated details that may affect quantitative analysis. Owing to sensor and wavelength characteristics, infrared images often exhibit lower spatial resolution, less high-frequency information, and more blurred edges than visible-light images, which makes the reconstruction of fine structures and edges even more challenging [18]. Furthermore, since IR images consist of single-channel thermal radiation data without color information, directly applying SR models designed for visible-light imagery may degrade performance [19]. Consequently, recent studies have proposed lightweight CNN-based models that account for the characteristics of IR images, Transformer-based global feature learning, and architectures that use channel separation and attention mechanisms [20]. However, information loss during high-resolution reconstruction, computational cost, and limited generalization to real-world environments remain major limitations.
Deep learning-based preprocessing complements physics-based calibration by prioritizing structural restoration over radiometric normalization. For infrared (IR) satellite imagery in particular, this preprocessing goes beyond simple image enhancement: it directly contributes to preserving radiometric information and to improving target detection and temperature estimation accuracy. Nevertheless, common challenges such as robustness under severe image degradation, domain generalization, and operational efficiency persist, and they are exacerbated in IR imagery by its single-channel nature and limited data. Future research should therefore address the construction of realistic datasets, multi-source conditioning, lightweight architecture design, and integration with physics-based models.
4. AI-DRIVEN INFRARED IMAGE ANALYSIS
Unlike optical imagery, infrared satellite imagery provides temperature-driven signatures with limited texture, lower spatial resolution, and time-varying thermal contrast, which makes AI-based analysis strongly task-dependent. In this survey, we organize AI-based infrared detection into four representative tasks (Fig. 4)—object detection, infrared small target detection (IRSTD), change detection, and anomaly detection—because they cover the major operational objectives of satellite IR analysis: instance-level localization, few-pixel target discovery, temporal scene comparison, and deviation-from-normality analysis. Accordingly, this section focuses on representative algorithms, benchmark datasets, and reported performance indicators for each task, while application-level use cases are discussed separately in Section 5.
4.1. Object Detection
Object detection in infrared satellite imagery aims at instance-level localization of semantically meaningful targets such as ships, aircraft, and active fires. Unlike infrared small target detection (IRSTD), which mainly focuses on point-like targets occupying only a few pixels, object detection addresses targets with identifiable categories and operational semantics. Because infrared sensing captures self-emitted thermal radiation, it supports day-and-night observation and can complement visible-light sensing under low-illumination conditions. Nevertheless, this task remains challenging due to low spatial resolution, weak texture cues, scene-dependent thermal contrast, and the presence of thermally bright background objects that can easily trigger false alarms.
4.1.1. Traditional Methods
Before the widespread adoption of deep learning, object detection in satellite thermal infrared imagery was dominated by thresholding- and context-based radiometric analysis. Representative examples include classical active-fire detection pipelines for MODIS, ASTER, and VIIRS, which exploit thermal anomalies and local background statistics to identify anomalously hot pixels while suppressing false alarms [21,22,23]. These methods are computationally efficient and physically interpretable, making them useful for hotspot screening and operational baseline systems. However, they are typically task-specific, rely on manually designed thresholds or heuristic tests, and are sensitive to heterogeneous backgrounds, atmospheric perturbations, and sensor-dependent radiometric variation. As a result, traditional approaches remain effective for thermal anomaly screening but are less suitable for generalized multi-class object detection in complex satellite infrared scenes.
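The contextual logic shared by these pipelines can be sketched as follows. This is a deliberately simplified illustration and not the operational MODIS, ASTER, or VIIRS algorithm: the absolute threshold, the sigma multiplier, the window size, and the function name are hypothetical values chosen for the example.

```python
from statistics import mean, pstdev


def contextual_hotspots(bt, abs_thresh_k=320.0, k_sigma=3.0,
                        half_window=2, min_background=8):
    """Flag pixels that are hot in absolute terms and relative to their surroundings.

    bt is a 2-D list of brightness temperatures in kelvin.
    """
    rows, cols = len(bt), len(bt[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            value = bt[r][c]
            if value < abs_thresh_k:  # absolute test: candidate pixels only
                continue
            # Background: neighbouring pixels that are not themselves candidates.
            background = [
                bt[i][j]
                for i in range(max(0, r - half_window), min(rows, r + half_window + 1))
                for j in range(max(0, c - half_window), min(cols, c + half_window + 1))
                if (i, j) != (r, c) and bt[i][j] < abs_thresh_k
            ]
            if len(background) < min_background:
                continue  # too few valid neighbours to characterise the background
            mu, sigma = mean(background), pstdev(background)
            # Contextual test; the 1 K floor avoids over-triggering on flat scenes.
            if value > mu + k_sigma * max(sigma, 1.0):
                hits.append((r, c))
    return hits
```

The manual choice of every threshold in this sketch illustrates why such methods are efficient and interpretable but difficult to generalize across sensors and backgrounds.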
4.1.2. Deep Learning Methods
Recent studies have increasingly adopted deep learning-based object detectors to overcome the limitations of handcrafted pipelines. In the broader computer-vision literature, modern detection architectures have evolved from proposal-based two-stage frameworks such as Faster R-CNN [24], to one-stage detectors such as YOLO and RetinaNet [25,26], and more recently to transformer-based end-to-end models such as DETR and Deformable DETR [27,28]. These detector families introduced key design strategies—including multi-scale representation learning, class-imbalance handling, and global context modeling—that are also highly relevant to infrared satellite imagery. However, their direct transfer to satellite IR data is nontrivial because targets are often extremely small, texture-poor, and thermally unstable across sensors, bands, and imaging conditions.
In maritime monitoring, Li et al. [29] proposed an optimized YOLOv5s-based model with a Squeeze-and-Excitation attention mechanism to improve ship detection under complex thermal backgrounds. Related efforts such as TISD [30] further emphasized the importance of real spaceborne thermal infrared datasets for all-day ship detection. In aviation monitoring, Li et al. [31] introduced TIFAD.v1, the first global space-based thermal infrared aircraft dataset, and demonstrated the feasibility of detecting small aircraft targets from airframe and exhaust-plume thermal signatures. In disaster response, de Almeida Pereira et al. [32] constructed a large-scale Landsat-8 active-fire dataset and showed that deep learning-based detectors can substantially outperform conventional threshold-based products such as MODIS and VIIRS in wildfire hotspot detection. Collectively, these studies show that deep learning improves robustness to cluttered backgrounds and target diversity, but performance still depends heavily on satellite-specific training data and remains sensitive to cross-sensor domain shift.
4.2. Infrared Small Target Detection (IRSTD)
Small target detection in infrared search and track (IRST) systems is a critical technology that determines early warning and tracking performance. SPIE describes a small target as having a contrast ratio of less than 15%, an SNR of less than 1.5, and a target size of less than 0.15% of the whole image [33]. Such targets consist of only a few pixels, lacking shape and texture information, and are easily buried in background clutter.
4.2.1. Traditional Methods
Pre-deep learning IRSTD methods are categorized into three groups. First, filter-based methods such as HPF [34], Max-Mean/Max-Median filters [35], and Top-Hat transforms [36] suppress low-frequency background components to enhance high-frequency target signals. Second, local contrast-based methods such as LCM [37] and MPCM [38] compute brightness contrast between a central pixel and surrounding regions to extract target candidates. Third, low-rank/sparse decomposition-based methods such as the IPI model [39] employ RPCA to separate the background (low-rank component) from the target (sparse component). While these traditional methods can be applied without training data, they suffer from increased false alarm rates in complex backgrounds and rely heavily on manual parameter tuning.
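As an illustration of the filter-based family, the following minimal sketch implements a white top-hat transform (the image minus its morphological opening) in pure Python. The square structuring element, its size, and the border handling (windows truncated at the image edge) are assumptions made for this example rather than settings from the cited works.

```python
def _morph(img, size, pick):
    """Apply `pick` (min for erosion, max for dilation) over a size x size window."""
    rows, cols = len(img), len(img[0])
    half = size // 2
    return [
        [
            pick(
                img[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def white_top_hat(img, size=3):
    """Image minus its opening: keeps bright structures smaller than the window."""
    opened = _morph(_morph(img, size, min), size, max)  # erosion, then dilation
    return [
        [img[r][c] - opened[r][c] for c in range(len(img[0]))]
        for r in range(len(img))
    ]
```

A slowly varying background survives the opening and is subtracted away, whereas a point-like target smaller than the structuring element is removed by the erosion and therefore remains in the difference image.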
4.2.2. Deep Learning Methods
The introduction of deep learning has enabled IRSTD to overcome the limitations of handcrafted feature-based methods and achieve robust detection performance in complex backgrounds.
RISTD-Net [40] established the foundation for deep learning-based IRSTD research by effectively separating small targets from background clutter through multi-scale convolution kernels and an encoder-decoder architecture.
4.2.3. Single-frame Detection Methods
Subsequent research focused on improving single-frame small target detection performance. Notably, ALCNet [41] modularized local contrast measures as network layers and preserved small target features through bottom-up attention mechanisms, improving detection accuracy. However, early studies primarily addressed land or aerial platform environments and did not sufficiently consider the unique characteristics of satellite-based observation conditions.
MTU-Net [42] was explicitly designed for satellite platform environments. By incorporating Vision Transformers, it enhances representational capability for extremely small targets in satellite imagery, and its detection feasibility in maritime environments was validated on the NUDT-SIRST-Sea dataset, which contains real satellite images.
Table 2 indicates that the single-frame methods exhibit a clear trade-off between detection sensitivity and false-alarm suppression. ResU-Net achieves the lowest false alarm rate (7.92), but its detection probability (46.05) is far lower than that of MTU-Net, and its IoU (60.18) is also lower. This suggests that ResU-Net behaves more conservatively, suppressing clutter effectively but missing many extremely small or dim ship targets. In contrast, MTU-Net achieves the highest detection probability (85.44) and IoU (64.14), indicating that multilevel ViT-CNN feature extraction is more effective for satellite infrared imagery, where long-range contextual cues are crucial for distinguishing true targets from maritime clutter and confusing bright structures.
TABLE 2.
Detection accuracy and background suppression performance of different methods on the NUDT-SIRST-Sea dataset. $P_d$ (×10⁻²) denotes the detection probability, $F_a$ (×10⁻⁶) denotes the false alarm rate, and IoU denotes the intersection-over-union metric
| Model | $P_d$ ↑ | $F_a$ ↓ | IoU ↑ |
| ACM [43] | 70.46 | 21.31 | 47.57 |
| ALCNet [41] | 58.65 | 9.13 | 48.9 |
| DNANet [44] | 61.60 | 17.19 | 42.17 |
| ResU-Net [45] | 46.05 | 7.92 | 60.18 |
| MTU-Net [42] | 85.44 | 11.72 | 64.14 |
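The three metrics reported in Table 2 can be computed as in the following minimal sketch. The matching rule used here (a target counts as detected if at least one of its pixels is predicted) is a simplifying assumption; benchmark protocols often use a centroid-distance criterion instead. The function names and the representation of targets as sets of pixel coordinates are likewise hypothetical.

```python
def mask_from_targets(targets, rows, cols):
    """Build a binary ground-truth mask from a list of target pixel sets."""
    mask = [[0] * cols for _ in range(rows)]
    for target in targets:
        for r, c in target:
            mask[r][c] = 1
    return mask


def pixel_iou(pred, truth):
    """Pixel-level intersection over union of two binary masks."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0


def false_alarm_rate(pred, truth):
    """Falsely predicted pixels divided by the total number of pixels."""
    false_pixels = total = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            false_pixels += 1 if (p and not t) else 0
            total += 1
    return false_pixels / total


def detection_probability(pred, targets):
    """Fraction of targets with at least one predicted pixel (assumed rule)."""
    if not targets:
        return 0.0
    detected = sum(1 for target in targets if any(pred[r][c] for r, c in target))
    return detected / len(targets)
```

Because the detection probability is counted per target while the false alarm rate and IoU are counted per pixel, a conservative model can score well on the latter two while still missing many targets.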
4.2.4. Multi-frame Detection Methods
To overcome single-frame limitations, multi-frame IRSTD (MIRST) techniques that leverage temporal continuity have gained significant attention. The RFR framework [46] learns spatio-temporal dependencies across frames to compensate for registration errors caused by satellite platform motion; the same work also released the IRSatVideo-LEO dataset, which contains approximately 90,000 frames.
Currently, public satellite-based IRSTD datasets remain limited to NUDT-SIRST-Sea and IRSatVideo-LEO, posing challenges for model generalization and real-world deployment. Addressing data scarcity through domain adaptation and acquiring diverse satellite observation data remain critical priorities for advancing satellite-based small target detection capabilities.
Table 3 further shows that temporal modeling improves satellite video-based IRSTD, but the gain is metric-dependent. ResUNet_RFR achieves the highest detection probability $P_d$ (91.58) and the highest AUC (91.59), indicating that recurrent feature refinement effectively exploits long-term temporal dependency and compensates for satellite motion. However, STDMANet yields the lowest false alarm rate $F_a$ (4.10), whereas ResUNet_RFR has the highest (18.58), implying that STDMANet suppresses false alarms more strongly under the current setting. Therefore, the main strength of RFR is not that it is uniformly best on every metric, but that it provides stronger detection sensitivity and stable overall discriminability in satellite video conditions.
TABLE 3.
Detection performance comparison of different methods on the IRSatVideo-LEO dataset. $P_d$ and $F_a$ are defined as in Table 2, and AUC denotes the area under the ROC curve
| Model | $P_d$ ↑ | $F_a$ ↓ | AUC ↑ |
| STDMANet [47] | 89.96 | 4.10 | 90.13 |
| DNANet_DTUM [44] | 86.88 | 13.68 | 89.02 |
| ResUNet_RFR [46] | 91.58 | 18.58 | 91.59 |
4.3. Change Detection
This section reviews representative change detection methods in remote sensing, categorizing them into traditional model-based approaches and deep learning-based methods according to their underlying principles and methodological evolution.
4.3.1. Traditional Methods
Traditional change detection methods aim to identify changes by analyzing radiometric or spectral differences between multitemporal remote sensing images based on predefined mathematical and statistical models.
Representative approaches such as Image Differencing and Image Ratioing are among the most basic techniques, as they directly compare pixel intensity values between two acquisition times. Although these methods are simple to implement, they suffer from high sensitivity to illumination variations and noise. To alleviate these limitations, Change Vector Analysis (CVA) [48] was introduced by extending multiband information into a vector space, enabling simultaneous analysis of both the magnitude and direction of change, thereby allowing joint interpretation of change intensity and type. Principal Component Analysis (PCA) [49]-based methods emphasize change-related information by exploiting the correlation structure within images. However, they have inherent limitations in direct inter-temporal comparison. To address this issue, Multivariate Alteration Detection (MAD) [50] and Iteratively Reweighted MAD (IR-MAD) [51] utilize correlation structures to suppress no-change components, thereby performing radiometric normalization and change detection simultaneously. In addition, Markov Random Fields (MRF) [52] and Conditional Random Fields (CRF) [53] are employed as probabilistic graph-based post-processing techniques to incorporate spatial continuity and contextual information into change detection results. Nevertheless, these traditional methods remain sensitive to environmental variations and image misregistration, and they exhibit limited capability in modeling complex object-level changes or nonlinear patterns in high-resolution imagery. Consequently, recent studies have increasingly expanded toward using traditional methods as baselines and integrating them with machine learning and deep learning-based approaches.
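The difference between simple differencing and CVA can be made concrete with a short sketch. The code below is an illustration for two-band pixels only; the band values, the threshold, and the function names are assumptions made for the example.

```python
import math


def change_vector(pixel_t1, pixel_t2):
    """Magnitude and direction (degrees in [0, 360)) of a two-band change vector."""
    d1 = pixel_t2[0] - pixel_t1[0]
    d2 = pixel_t2[1] - pixel_t1[1]
    magnitude = math.hypot(d1, d2)
    direction = math.degrees(math.atan2(d2, d1)) % 360.0
    return magnitude, direction


def change_mask(image_t1, image_t2, threshold):
    """Mark pixels whose change-vector magnitude exceeds the threshold."""
    return [
        [change_vector(p1, p2)[0] > threshold for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(image_t1, image_t2)
    ]


# Each pixel is a (band 1, band 2) tuple; the values are illustrative only.
before = [[(0.20, 0.30), (0.20, 0.30)]]
after = [[(0.50, 0.70), (0.21, 0.30)]]
print(change_mask(before, after, threshold=0.1))  # [[True, False]]
```

The magnitude plays the role of image differencing extended to several bands, while the direction indicates which bands increased or decreased and can therefore be used to separate change types.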
4.3.2. Deep Learning Methods
Deep learning-based change detection methods have evolved toward end-to-end learning frameworks that take multitemporal remote sensing images as input and directly learn change patterns. Representative fully convolutional approaches, namely the early-fusion FC-EF and the Siamese FC-Siam-conc and FC-Siam-diff [54], adopt U-Net-based architectures to extract change information through feature concatenation or differencing between bi-temporal images. These architectures exhibit different levels of change sensitivity and robustness to misregistration errors depending on the input fusion strategy and feature differencing scheme. Subsequently, SNUNet-CD [55] enhanced the detection of subtle changes and boundary reconstruction by introducing dense skip connections and attention mechanisms. Meanwhile, STANet [56] and BiT [57] leverage attention mechanisms and transformer-based architectures to incorporate global contextual information, enabling stable detection of changes across multiple spatial scales; in particular, these models can capture large-scale changes and localized variations simultaneously. More recently, approaches such as DDPM-CD [58] and SMDNet [59] have been proposed, which use diffusion models for feature extraction or boundary refinement. In addition, methods that integrate self-supervised pretraining based on SeCo [60] have been introduced to further improve generalization performance.
Furthermore, recent studies have shifted toward more practical frameworks that address limitations such as data scarcity and domain shift. Weakly supervised approaches, such as the weak temporal supervision framework based on a Dual U-Net [61], enable change detection without explicit change labels, improving scalability and generalization. Meanwhile, the focus has expanded beyond conventional bi-temporal analysis to satellite image time series (SITS)-based techniques [62], which incorporate temporal attention mechanisms to capture long-term dependencies across multiple timestamps. DeepLabV3+ [63]-based models have also demonstrated strong performance on high-resolution satellite imagery, particularly in identifying specific change types such as construction. Despite their improved performance, these models remain vulnerable to spatial and temporal domain variations, which can significantly degrade detection accuracy. In thermal infrared satellite image analysis, recent work has demonstrated the effectiveness of frequency-domain enhancement and transformer-based architectures for fine-grained feature extraction. For instance, the MFcontrail [64] framework integrates a MaxViT encoder, a frequency-aware fusion decoder, and an edge-aware loss function to segment thin and elongated structures in thermal infrared imagery. On the Landsat-8 dataset (Table 4), MFcontrail (full) achieves an IoU of 55.94% and an F1-score of 71.74%, surpassing established models such as PSPNet (36.30% IoU, 53.26% F1) and DeepLabV3+ (51.87% IoU, 68.31% F1). These developments reflect a transition toward more sophisticated approaches that integrate weak supervision, temporal reasoning, and advanced feature representation, although key challenges for real-world deployment persist.
TABLE 4.
Performance comparison of different models on the Landsat-8 dataset
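When IoU and F1 are computed for a single class from the same confusion counts, they are linked by F1 = 2·IoU / (1 + IoU), so the two columns carry the same ranking information. The sketch below applies this identity to the IoU values quoted in the text; the assumption that both metrics were derived from the same aggregated counts is ours, and the function name is hypothetical.

```python
def f1_from_iou(iou: float) -> float:
    """F1 (Dice) score implied by a single-class IoU (Jaccard) value."""
    return 2.0 * iou / (1.0 + iou)


# IoU and F1 values (in percent) as quoted in the text.
reported = {
    "PSPNet": (36.30, 53.26),
    "DeepLabV3+": (51.87, 68.31),
    "MFcontrail (full)": (55.94, 71.74),
}

for model, (iou_pct, f1_pct) in reported.items():
    implied = 100.0 * f1_from_iou(iou_pct / 100.0)
    print(f"{model}: reported F1 {f1_pct:.2f}, implied by IoU {implied:.2f}")
```

The implied values agree with the reported F1-scores to within rounding of the two-decimal IoU figures.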
4.4. Abnormal/Anomaly Detection
This section reviews anomaly detection methods in remote sensing and categorizes them into traditional methods based on background modeling and deep learning-based approaches that learn normality and deviation patterns in an end-to-end manner.
4.4.1. Traditional Methods
Traditional anomaly detection methods first model the background characteristics of an image and then identify pixels that deviate significantly from the modeled background as anomalies. Reed-Xiaoli (RX) [65]-based methods assume that the background follows a Gaussian distribution and detect anomalies using a detection statistic based on the Mahalanobis distance. RX has been extended into several variants, including Global RX (GRX), which uses global statistics; Local RX (LRX), which uses local statistics; and Kernel RX (KRX), which incorporates nonlinear mappings. The Adaptive Coherence/Cosine Estimator (ACE) [66] whitens the data using the background covariance matrix and then computes the cosine similarity between the whitened data and a target spectrum, providing robustness to illumination variations while preserving the Constant False Alarm Rate (CFAR) property. Low-Rank and Sparse Representation (LRASR) [67]-based methods model the background as a low-rank component and anomalies as a sparse component, enabling effective separation of anomalous signals even in complex backgrounds. Although these traditional methods are structurally simple and highly interpretable, they remain limited in handling complex backgrounds and can be computationally expensive.
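The RX and ACE statistics described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of [65] or [66]: the function names, the small ridge term added to the covariance for numerical stability, the subtraction of the scene mean from the target spectrum in ACE, and the synthetic four-band scene are all choices made here for demonstration.

```python
import numpy as np


def _background_stats(x, ridge):
    """Scene mean and inverse covariance estimated from all pixels (global statistics)."""
    mu = x.mean(axis=0)
    xc = x - mu
    cov = xc.T @ xc / (x.shape[0] - 1)
    cov += ridge * np.eye(x.shape[1])  # assumed regularizer, keeps the inverse stable
    return mu, np.linalg.inv(cov)


def global_rx(cube, ridge=1e-6):
    """Global RX: squared Mahalanobis distance of each pixel to the scene mean.

    cube has shape (H, W, B) with B spectral bands.
    """
    h, w, b = cube.shape
    x = cube.reshape(-1, b).astype(np.float64)
    mu, cov_inv = _background_stats(x, ridge)
    xc = x - mu
    scores = np.einsum("ij,jk,ik->i", xc, cov_inv, xc)
    return scores.reshape(h, w)


def ace(cube, target, ridge=1e-6):
    """ACE: squared cosine between whitened pixel and whitened target spectrum."""
    h, w, b = cube.shape
    x = cube.reshape(-1, b).astype(np.float64)
    mu, cov_inv = _background_stats(x, ridge)
    xc = x - mu
    s = np.asarray(target, dtype=np.float64) - mu  # mean-removed target (assumed convention)
    num = (xc @ cov_inv @ s) ** 2
    den = (s @ cov_inv @ s) * np.einsum("ij,jk,ik->i", xc, cov_inv, xc)
    return (num / np.maximum(den, 1e-12)).reshape(h, w)


# Synthetic scene (hypothetical): Gaussian background with one implanted anomaly.
rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 4))
cube[10, 12] += 8.0

rx_map = global_rx(cube)
rx_peak = np.unravel_index(np.argmax(rx_map), rx_map.shape)

# Using the implanted pixel itself as the target spectrum for ACE.
ace_map = ace(cube, target=cube[10, 12])
```

RX needs no target spectrum and responds to any statistically unusual pixel, whereas ACE scores similarity to a known signature and is bounded between 0 and 1. A local variant (LRX) would replace the scene-wide statistics with statistics from a sliding window around each pixel, at correspondingly higher computational cost.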
4.4.2. Deep Learning Methods
Deep learning-based anomaly detection methods learn the distribution or structural characteristics of normal data and identify patterns that deviate from normality. Early approaches relied primarily on autoencoder-based reconstruction, in which a model trained on normal data is expected to reconstruct anomalies poorly. In practice, autoencoders can also reconstruct anomalous patterns well; memory-augmented autoencoders such as MemAE [68] alleviate this issue by selectively reconstructing only normal representations. Subsequently, methods such as AE-IT [69] integrated low-rank and sparse decomposition concepts into autoencoder architectures, enabling more explicit separation of background and anomalous components. DROCC [70] introduced a discriminative approach that directly learns the boundary of the manifold formed by normal data, demonstrating effective anomaly discrimination without relying on reconstruction errors. More recently, transformer-based models such as 3DTR [71] and SDENet [72] have leveraged self-attention mechanisms to model global spatial-spectral context and to detect subtle anomalies stably even in complex background environments. Furthermore, diffusion model-based approaches such as DBD [73] combine probabilistic background generation with low-rank representations, thereby bridging traditional statistical methods and deep learning-based models.
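The reconstruction-based principle underlying these autoencoder methods can be illustrated with a deliberately simple stand-in. The sketch below uses a linear autoencoder, which is equivalent to PCA, fitted on normal samples only, and scores each sample by its squared reconstruction error. It is not MemAE, AE-IT, or any other cited model; the function names, the latent size, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def fit_linear_autoencoder(normal, k):
    """Fit a k-dimensional linear encoder/decoder (PCA basis) on normal samples.

    normal has shape (N, B). Returns the mean (B,) and components (k, B).
    """
    mu = normal.mean(axis=0)
    _, _, vt = np.linalg.svd(normal - mu, full_matrices=False)
    return mu, vt[:k]


def reconstruction_error(x, mu, comps):
    """Squared reconstruction error per sample; large values suggest anomalies."""
    z = (x - mu) @ comps.T   # encode into the k-dimensional latent space
    x_hat = z @ comps + mu   # decode back to the input space
    return np.sum((x - x_hat) ** 2, axis=1)


# Synthetic data (hypothetical): normal samples lie near a 2-D subspace of a 6-D space.
rng = np.random.default_rng(1)
basis = rng.normal(size=(2, 6))
train = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 6))
test_normal = rng.normal(size=(50, 2)) @ basis + 0.01 * rng.normal(size=(50, 6))
test_anomalous = 3.0 * rng.normal(size=(5, 6))  # not confined to the subspace

mu, comps = fit_linear_autoencoder(train, k=2)
err_normal = reconstruction_error(test_normal, mu, comps)
err_anomalous = reconstruction_error(test_anomalous, mu, comps)
```

In this toy setting the anomalous samples have a large component outside the learned subspace and therefore reconstruct poorly. A nonlinear autoencoder with enough capacity may reconstruct such samples well, which is the failure mode that memory-augmented designs aim to reduce.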
Beyond these general frameworks, recent studies have demonstrated the applicability of deep learning to thermal anomaly detection in real-world satellite infrared imagery. Unlike conventional remote sensing imagery, thermal infrared data presents unique challenges: subtle anomalies often exhibit temperature differences of less than 1°C above the background, while solar irradiance, seasonal variation, and atmospheric effects introduce confounding thermal signals that are difficult to disentangle from genuine anomalous activity. A supervised semantic segmentation approach based on the U-Net architecture has been employed on ASTER thermal infrared imagery [74], trained on approximately 1,500 labeled images from multiple volcanoes. By leveraging spatial pattern learning rather than pixel-wise intensity thresholds and incorporating Focal Loss to address severe class imbalance, this method achieved a Macro F1-score of approximately 0.93 and demonstrated strong generalization to previously unseen volcanic regions. To improve data efficiency, an alternative image-level classification framework utilizing transfer learning has been proposed [75], where a pre-trained SqueezeNet was fine-tuned and integrated within an ensemble classification scheme using only about 200 Sentinel-2 and Landsat 8 infrared images. This approach achieved an overall accuracy of 98.3%, but it does not provide pixel-wise spatial localization of anomalies. Together, these studies highlight the trade-off between spatial granularity and data efficiency in satellite thermal infrared anomaly detection and underscore the advantages of spatial feature learning over intensity-based approaches in complex infrared backgrounds.
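To make the role of Focal Loss in handling class imbalance concrete, the following sketch implements the standard binary focal loss per pixel. The hyperparameter values (gamma = 2, alpha = 0.25) are the commonly used defaults and are assumptions here, since the settings used in [74] are not stated in this survey; the function name and the example probabilities are likewise illustrative.

```python
import numpy as np


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25, eps=1e-7):
    """Per-pixel binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    prob is the predicted probability of the anomaly class; label is 0 or 1.
    gamma and alpha are assumed defaults, not values reported in [74].
    """
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    label = np.asarray(label, dtype=np.float64)
    p_t = np.where(label == 1, prob, 1.0 - prob)        # probability of the true class
    alpha_t = np.where(label == 1, alpha, 1.0 - alpha)  # class-dependent weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)


# Two hypothetical pixels: an easy background pixel and a hard (missed) anomaly pixel.
probs = np.array([0.05, 0.10])
labels = np.array([0, 1])
losses = binary_focal_loss(probs, labels)

# With gamma = 0 and alpha = 0.5 the loss reduces to half the cross-entropy.
half_ce = binary_focal_loss(0.10, 1, gamma=0.0, alpha=0.5)
```

The modulating factor (1 - p_t)^gamma shrinks the contribution of confidently classified background pixels, so the abundant easy pixels no longer dominate the gradient and the rare thermal-anomaly pixels receive more of the training signal.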
5. APPLICATIONS
Detection technologies based on infrared, multispectral, and hyperspectral remote sensing are in practical use across a wide range of application domains, including military and security, disaster and environmental monitoring, and transportation and mobile object surveillance. In infrared imagery, there is strong demand for early identification of small thermal anomalies or targets with low contrast against complex backgrounds. This capability plays a critical role in applications such as early wildfire detection, industrial facility overheating monitoring, and long-range surveillance and reconnaissance missions. Because these operational environments combine complex backgrounds with low-contrast conditions, achieving high detection performance and reliability simultaneously remains a key challenge.
In the military and security domain, space-based infrared sensor systems for missile early warning and tracking represent a prominent application. Systems such as SBIRS, STSS, and HBTSS enable real-time detection of launch signatures associated with ballistic missiles and hypersonic threats and are integrated with interception systems to significantly enhance defensive capabilities. In this context, infrared image-based anomaly detection and temporal change analysis serve as core technologies for the early identification of launch events [76,77,78].
Infrared and spectral remote sensing-based automated detection techniques are also widely applied in disaster and environmental monitoring. Infrared imagery is particularly advantageous for long-term surveillance, as it enables observation of thermally distinctive phenomena such as wildfires, energy and industrial facility accidents, and marine thermal anomalies under both day and night conditions and across diverse weather environments. Applications such as wildfire monitoring using satellite thermal infrared imagery and the surveillance of thermal effluents from nuclear power plants have been reported as representative use cases demonstrating the practical feasibility of these technologies [79,80,81].
Furthermore, infrared and spectral remote sensing imagery has been utilized in applications such as nuclear facility monitoring, where it serves as an indirect means of estimating the operational status of reactors and reprocessing facilities. Recurrent thermal signatures and discharge patterns observed over specific periods can be analyzed to infer facility activity and operational conditions [82].
In the field of transportation and mobile object surveillance, aircraft and vessel detection using thermal infrared sensors constitutes a major application area. Even under limited spatial resolution, aircraft and ships often appear as small-scale thermal signatures spanning only a few pixels, and it has been demonstrated that deep learning-based detection models can reliably identify such targets. Notably, the ability of infrared imagery to support detection under day/night cycles and cloudy conditions allows it to effectively complement the limitations of conventional visible-spectrum surveillance systems [29,31].
Overall, infrared and spectral image-based detection technologies have demonstrated their effectiveness through real-world operational deployments across diverse application domains. Looking ahead, further advancements are expected toward onboard processing and real-time surveillance systems, driven by improvements in sensor resolution and the adoption of lightweight deep learning models.
6. CONCLUSION AND FUTURE WORKS
This paper presented an integrated survey of AI-based infrared (IR) satellite image analysis, covering the full pipeline from physical principles and atmospheric correction to deep learning-based preprocessing and multi-task detection. By reviewing radiative foundations, band-dependent sensing characteristics, and radiative transfer modeling, we clarified the physical basis of IR remote sensing. We then examined how physics-based correction and data-driven restoration jointly enhance image reliability. Across object detection, infrared small target detection, change detection, and anomaly detection, we observed a clear methodological evolution from CNN-based architectures to Transformer-based models, reflecting the increasing need for global context modeling and computational efficiency.
Despite significant progress, challenges remain, including limited satellite-specific IR datasets, domain generalization issues, and the demand for lightweight onboard inference. Beyond improving detection accuracy, future research should focus on integrating IR detection results into large-scale AI-driven intelligence systems. Platforms such as those offered by Palantir Technologies exemplify how satellite-derived detection outputs can be fused with geospatial data, auxiliary intelligence sources, and predictive analytics to support real-time decision-making. Accordingly, the next stage of IR satellite intelligence lies in combining physically grounded sensing, robust AI detection, and system-level data integration to enable reliable, operational, and scalable surveillance capabilities.
At the algorithm level, future work should prioritize constructing standardized satellite-specific IR benchmarks, developing physics-aware deep learning models that incorporate radiative constraints, and enabling lightweight, real-time onboard inference. In addition, multi-modal fusion strategies that integrate IR with other geospatial and spectral sources will be essential to improve robustness and operational reliability, facilitating the transition from algorithm-level advancements to deployable intelligence systems.






