Spatial Focusing and Progressive Decoupling Detector for High-Aspect-Ratio Rotated Objects
| Published in | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, pp. 1-17 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 15.10.2025 |
| ISSN | 1939-1404, 2151-1535 |
| DOI | 10.1109/JSTARS.2025.3622338 | 
| Summary: | In recent years, remote sensing object detection has advanced significantly through in-depth exploration of convolutional neural network (CNN) and vision transformer (ViT) architectures. However, detecting rotated objects with high aspect ratios remains challenging. Current detection frameworks do not adequately address the anisotropic feature distribution of such objects: feature information is highly concentrated along one spatial dimension but sparse along the other, and the parameters representing the bounding box differ significantly in the features they depend on. To address this issue, we propose a Spatial Focusing and Progressive Decoupling Detector (SFPD-Det), which consists of three components: the Spatially Crosswise Convolution Module (SCCM), the Hierarchical Decoupling Network (HDN), and Dynamic Progressive Activation Masks (DPMs). The SCCM captures diverse spatial features with long-range dependencies by combining square convolutions with multi-branch orthogonal large strip convolutions, improving the model's adaptability to objects with varying aspect ratios. The HDN is composed of stacked ViT blocks and uses separate network branches to predict the position, angle, and size of bounding boxes in a cascaded manner. Furthermore, we propose DPMs, which combine the predicted parameters to embed mask information about potential object boundary regions into the HDN; these masks progressively guide self-attention to enhance critical features within the region of interest, enabling precise bounding box regression. Extensive experiments on four benchmark remote sensing datasets (DOTA, DIOR-R, HRSC2016, and UCAS-AOD) demonstrate that SFPD-Det achieves superior performance compared with state-of-the-art detectors. |
|---|---|
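The summary describes the SCCM as pairing square convolutions with multi-branch orthogonal strip convolutions. The sketch below is a minimal, hypothetical single-channel illustration of that general idea in plain Python, not the authors' implementation: the helper names (`conv2d_same`, `box_kernel`, `add`, `strip_block`), the uniform averaging kernels standing in for learned weights, and the branch sizes 7 and 11 are all assumptions. It shows why orthogonal strips suit high-aspect-ratio objects: on a thin horizontal object, a 1 × 11 strip gathers the whole object, while the orthogonal 11 × 1 strip sees only one pixel of it.

```python
# Hypothetical sketch: square + orthogonal strip convolutions on one channel.
# Not the SFPD-Det/SCCM implementation; kernel sizes and weights are assumptions.
from typing import List, Sequence

Grid = List[List[float]]


def conv2d_same(x: Grid, kernel: Grid) -> Grid:
    """Zero-padded 2D cross-correlation for odd kernel sizes; output matches input size."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - ph, j + v - pw
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        acc += x[r][c] * kernel[u][v]
            out[i][j] = acc
    return out


def box_kernel(rows: int, cols: int) -> Grid:
    """Uniform averaging kernel, standing in for learned weights (assumption)."""
    return [[1.0 / (rows * cols)] * cols for _ in range(rows)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def strip_block(x: Grid, square: int = 3, strips: Sequence[int] = (7, 11)) -> Grid:
    """Square-kernel context, then one horizontal and one vertical strip per branch size."""
    base = conv2d_same(x, box_kernel(square, square))
    out = base
    for k in strips:
        out = add(out, conv2d_same(base, box_kernel(1, k)))  # horizontal 1 x k strip
        out = add(out, conv2d_same(base, box_kernel(k, 1)))  # vertical k x 1 strip
    return out


if __name__ == "__main__":
    # A thin horizontal "object": 11 pixels long, 1 pixel tall, in a 15 x 15 map.
    x = [[0.0] * 15 for _ in range(15)]
    for c in range(2, 13):
        x[7][c] = 1.0
    horiz = conv2d_same(x, box_kernel(1, 11))[7][7]
    vert = conv2d_same(x, box_kernel(11, 1))[7][7]
    print(f"horizontal strip: {horiz:.3f}, vertical strip: {vert:.3f}")
    # prints "horizontal strip: 1.000, vertical strip: 0.091"
    fused = strip_block(x)
    print(len(fused), len(fused[0]))  # prints "15 15"
```

A 1 × k plus k × 1 pair uses 2k weights per channel, compared with k² for a k × k square kernel, which is one reason strip kernels are an inexpensive way to obtain long-range context along a single axis while the square branch keeps local two-dimensional context.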