SaliencyI2PLoc: saliency-guided image-point cloud localization
using contrastive learning

Under review


Yuhao Li1, Jianping Li2†, Zhen Dong1†, Yuan Wang3, Bisheng Yang1

1State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University   
2Nanyang Technological University      3Jiangxi Normal University     
Corresponding authors.   

SaliencyI2PLoc aims to locate an image's position within a given reference point cloud map.

Abstract



Image-to-point cloud global localization is crucial for robot navigation in GNSS-denied environments and has become increasingly important for multi-robot map fusion and urban asset management. The modality gap between images and point clouds poses significant challenges for cross-modality fusion. Current cross-modality global localization solutions either require modality unification, which leads to information loss, or rely on engineered training schemes to encode multi-modality features, which often lack feature alignment and relation consistency. To address these limitations, we propose SaliencyI2PLoc, a novel contrastive-learning-based architecture that fuses the saliency map into feature aggregation and maintains feature relation consistency across multi-manifold spaces. To alleviate the pre-processing burden of data mining, a contrastive learning framework is applied that efficiently achieves cross-modality feature mapping. A context-saliency-guided local feature aggregation module is designed, which fully leverages the contribution of stationary information in the scene to generate a more representative global feature. Furthermore, to enhance cross-modality feature alignment during contrastive learning, the consistency of relative relationships between samples in different manifold spaces is also taken into account. Experiments conducted on urban and highway scenario datasets demonstrate the effectiveness and robustness of our method. Specifically, our method achieves a Recall@1 of 78.92% and a Recall@1% of 97.59% on the urban scenario evaluation dataset, an improvement of approximately 33.11% and 22% over the baseline method. This demonstrates that our architecture efficiently fuses images and point clouds and represents a significant step forward in cross-modality global localization.
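At the core of the framework is a symmetric cross-modality contrastive objective. The sketch below is a minimal PyTorch illustration of such an InfoNCE-style loss, not the exact training code: it assumes each batch holds B co-located image/point-cloud pairs whose global descriptors have already been produced by the two encoders, and the names img_feat, pcd_feat, and temperature are illustrative.

import torch
import torch.nn.functional as F

def cross_modal_infonce(img_feat: torch.Tensor,
                        pcd_feat: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """img_feat, pcd_feat: (B, D) global descriptors of co-located pairs."""
    img_feat = F.normalize(img_feat, dim=-1)
    pcd_feat = F.normalize(pcd_feat, dim=-1)
    logits = img_feat @ pcd_feat.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(img_feat.size(0), device=img_feat.device)
    # Matching pairs lie on the diagonal; every other pair in the batch serves as a
    # negative, so no explicit hard-negative mining is required.
    loss_i2p = F.cross_entropy(logits, targets)           # image -> point cloud
    loss_p2i = F.cross_entropy(logits.t(), targets)       # point cloud -> image
    return 0.5 * (loss_i2p + loss_p2i)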

The pipeline of our methodology.


Results on the evaluation dataset built from KITTI-360
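For reference, the sketch below shows how the Recall@K and Recall@1% retrieval metrics reported in the abstract can be computed from query (image) and database (point cloud) descriptors; the function and variable names are illustrative, not the released evaluation code.

import numpy as np

def recall_at_k(query_feats, db_feats, gt_positives, ks=(1, 5, 10)):
    """query_feats: (Q, D), db_feats: (N, D), gt_positives: list of index arrays."""
    # Cosine similarity between every query and every database entry.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    ranking = np.argsort(-(q @ d.T), axis=1)               # best match first
    one_percent = max(int(round(0.01 * d.shape[0])), 1)    # Recall@1% threshold
    recalls = {}
    for k in list(ks) + [one_percent]:
        hits = sum(len(np.intersect1d(ranking[i, :k], gt_positives[i])) > 0
                   for i in range(len(gt_positives)))
        recalls[k] = hits / len(gt_positives)
    return recalls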



Visualization of Top-K results



Cluster assignments on urban scenario datasets


The VLAD cluster assignments of the query images and the Top-1 point clouds retrieved from the database. Auxiliary point clouds/images are listed for better visualization. The reference and Top-1 point clouds are colored by relative height, and whole point clouds are viewed from a bird's-eye view. Patches of the same color in the cluster assignment subplots are assigned to the same cluster.
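The per-patch colors correspond to NetVLAD-style soft cluster assignments. The sketch below is a minimal illustration of such an assignment and aggregation step, assuming local patch/point features from the backbone; the module name, shapes, and parameters are illustrative rather than the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftVLAD(nn.Module):
    def __init__(self, dim: int = 256, num_clusters: int = 64):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.assign = nn.Linear(dim, num_clusters)        # assignment logits per patch

    def forward(self, local_feats: torch.Tensor):
        """local_feats: (B, N, D) patch/point features from the backbone."""
        soft_assign = F.softmax(self.assign(local_feats), dim=-1)    # (B, N, K)
        # Residuals of each local feature to each centroid, weighted by assignment.
        residual = local_feats.unsqueeze(2) - self.centroids          # (B, N, K, D)
        vlad = (soft_assign.unsqueeze(-1) * residual).sum(dim=1)      # (B, K, D)
        vlad = F.normalize(vlad, dim=-1).flatten(1)                    # global descriptor
        # argmax over clusters gives the per-patch assignment shown in the figure.
        return vlad, soft_assign.argmax(dim=-1)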


Saliency maps


Visualization of the saliency maps of the query images. During training, the saliency maps shift toward the scene layout and stationary buildings.
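These saliency maps drive the saliency-guided local feature aggregation described in the abstract. The sketch below shows one simple way per-patch saliency scores could weight local features before pooling; it is an assumption-based illustration, not the paper's exact module.

import torch
import torch.nn.functional as F

def saliency_weighted_pool(local_feats: torch.Tensor,
                           saliency: torch.Tensor) -> torch.Tensor:
    """local_feats: (B, N, D) patch features; saliency: (B, N) raw scores."""
    weights = F.softmax(saliency, dim=-1).unsqueeze(-1)    # (B, N, 1), sums to 1 per sample
    # Patches on stationary structures (high saliency) dominate the pooled feature.
    global_feat = (weights * local_feats).sum(dim=1)        # (B, D)
    return F.normalize(global_feat, dim=-1)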


More results


Feature visualization by t-SNE. The data index is rendered from blue to red in descending order. $\blacktriangle$ and $\cdot$ represent features from the query and the database, respectively.
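The sketch below reproduces this t-SNE visualization setup under the assumption that query and database global descriptors are stored as NumPy arrays; it relies on scikit-learn's TSNE and matplotlib, with illustrative variable names.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(query_feats: np.ndarray, db_feats: np.ndarray) -> None:
    feats = np.concatenate([query_feats, db_feats], axis=0)
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(feats)
    n_q = len(query_feats)
    # Color by data index (blue -> red) to show the ordering along the trajectory.
    plt.scatter(emb[:n_q, 0], emb[:n_q, 1], c=np.arange(n_q),
                cmap="coolwarm", marker="^", label="query")
    plt.scatter(emb[n_q:, 0], emb[n_q:, 1], c=np.arange(len(db_feats)),
                cmap="coolwarm", marker=".", label="database")
    plt.legend()
    plt.show()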


Citation


                @article{li2024saliencyi2ploc,
                  title={SaliencyI2PLoc: saliency-guided image-point cloud localization using contrastive learning},
                  author={Yuhao Li and Jianping Li and Zhen Dong and Yuan Wang and Bisheng Yang},
                  journal={Under review},
                  year={2024}
                }