83 / 2024-08-15 10:36:49
A Fine-Grained Semantic Alignment Method for Remote Sensing Text-Image Retrieval (AITC 2024 Abstract)
Remote Sensing, Text-Image Retrieval, Fine-Grained Alignment, Dataset
Abstract under review
Zhang Weihang / Aerospace Information Research Institute, Chinese Academy of Sciences
Chen Jialiang / Aerospace Information Research Institute, Chinese Academy of Sciences
Zhang Wenkai / Aerospace Information Research Institute, Chinese Academy of Sciences
Li Xinming / Aerospace Information University
Gao Xin / Aerospace Information Research Institute, Chinese Academy of Sciences
In recent years, with the rapid development of remote sensing (RS) technology, a vast amount of high-resolution RS imagery has emerged, which holds significant value for knowledge extraction. Cross-modal text-image retrieval methods have garnered increasing attention in RS image retrieval research. These methods involve both visual and language comprehension, enabling users to find images that best illustrate the topic of a text query or text descriptions that best explain the content of a visual query. Although some progress has been made in cross-modal text-image retrieval for RS, previous mainstream methods have struggled to obtain fine-grained semantically discriminative features. Furthermore, due to the repetitive and ambiguous nature of text descriptions in commonly used datasets, these methods may not be directly applicable to fine-grained retrieval of RS images and texts. To address this issue, we propose a fine-grained semantic alignment retrieval framework for RS text-image retrieval in this paper. First, the commonly used datasets RSICD and RSITMD contain coarse-grained and repetitive captions. To achieve fine-grained semantic alignment, we re-annotated RSICD and RSITMD at a fine-grained level. Figure 1 shows an example from the RSICD dataset. Unlike previous short-sentence annotation schemes, we recaption the geospatial elements in each RS image with five fine-grained descriptions. For the attributes of geospatial elements, we extended the annotations beyond categories to include information on state, color, quantity, etc. For the relationships between entities, we expanded the annotations to cover spatial and functional relationships. The recaptioned datasets FG-RSICD and FG-RSITMD contain 109K and 47K image-text pairs, respectively. The average caption length is 30.5 and 32.2, respectively, up from 11.5 and 11.3 previously. To ensure the accuracy and diversity of the dataset semantics, we employed large multimodal models for scoring and performed manual verification. Second, previous approaches are limited by data quality and model capability and remain suboptimal at fine-grained semantic alignment. Traditional methods primarily encode the entire image and use the resulting high-dimensional embedding vectors as its visual representation. Due to the lack of fine-grained visual representation, it is challenging to learn fine-grained visual-semantic correspondences. To learn fine-grained, semantically discriminative features from the recaptioned datasets, we supplement the global representation with instance-level feature information. Specifically, inspired by bottom-up attention, we segment RS images and extract regions of interest that contain essential targets. During training, we project the precomputed region-of-interest features into a common space and align them with the visual and semantic representations. Empirical results indicate that the re-annotated FG-RSICD and FG-RSITMD datasets capture fine-grained information and implicit knowledge within complex RS scenes. Compared to previous methods, our framework exhibits superior retrieval performance. In particular, on the RSICD dataset, our framework achieves a relative improvement of 8% in retrieval performance. The datasets and code will be open-sourced.
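To make the alignment step more concrete, below is a minimal PyTorch sketch of how precomputed region-of-interest features could be projected into a common embedding space together with global image and text features and aligned with a symmetric contrastive objective. The module names, feature dimensions, pooling choice, and InfoNCE-style loss are illustrative assumptions, not the framework's actual implementation.

# Minimal sketch (PyTorch) of projecting global, region, and text features into
# a shared space and aligning them contrastively. Dimensions, fusion by mean
# pooling, and the loss are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedAlignment(nn.Module):
    def __init__(self, img_dim=2048, roi_dim=2048, txt_dim=768, embed_dim=512, temperature=0.07):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)   # global image features
        self.roi_proj = nn.Linear(roi_dim, embed_dim)   # precomputed region-of-interest features
        self.txt_proj = nn.Linear(txt_dim, embed_dim)   # sentence features
        self.temperature = temperature

    def forward(self, img_feat, roi_feats, txt_feat):
        # img_feat: (B, img_dim), roi_feats: (B, R, roi_dim), txt_feat: (B, txt_dim)
        img = F.normalize(self.img_proj(img_feat), dim=-1)    # (B, D)
        roi = F.normalize(self.roi_proj(roi_feats), dim=-1)   # (B, R, D)
        txt = F.normalize(self.txt_proj(txt_feat), dim=-1)    # (B, D)

        # Fuse global and region-level visual cues (simple mean pooling here).
        vis = F.normalize(img + roi.mean(dim=1), dim=-1)      # (B, D)

        # Symmetric InfoNCE-style contrastive loss over the batch.
        logits = vis @ txt.t() / self.temperature              # (B, B)
        labels = torch.arange(logits.size(0), device=logits.device)
        return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

# Usage with random tensors standing in for precomputed features:
model = FineGrainedAlignment()
loss = model(torch.randn(4, 2048), torch.randn(4, 36, 2048), torch.randn(4, 768))

In practice the region features would come from a bottom-up-attention-style detector run offline, and the fusion and loss could be replaced by whatever the final framework specifies.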
Important Dates
  • Conference Dates: September 20–22, 2024
  • August 30, 2024: Initial Draft Submission Deadline
  • September 22, 2024: Registration Deadline
Hosted by
Shandong Provincial People's Government
Chinese Institute of Electronics
Organized by
Academic Divisions of the Chinese Academy of Sciences
Aerospace Information Research Institute, Chinese Academy of Sciences
Fudan University