In this paper, we present an efficient visual SLAM system designed to tackle both short-term and long-term illumination challenges. Our system adopts a hybrid approach that combines deep learning techniques for feature detection and matching with traditional backend optimization methods. Specifically, we propose a unified CNN that simultaneously extracts keypoints and structural lines. These features are then associated, matched, triangulated, and optimized in a coupled manner. Additionally, we introduce a lightweight relocalization pipeline that reuses the built map, where keypoints, lines, and a structure graph are used to match the query frame with the map. To enhance the applicability of the proposed system to real-world robots, we deploy and accelerate the feature detection and matching networks using C++ and NVIDIA TensorRT. Extensive experiments conducted on various datasets demonstrate that our system outperforms other state-of-the-art visual SLAM systems in illumination-challenging environments. Efficiency evaluations show that our system can run at a rate of 73Hz on a PC and 40Hz on an embedded platform.
To improve the efficiency of learning-based feature detection, we propose PLNet, a unified model for both keypoint and line detection. It consists of a shared backbone, a keypoint module, and a line module. PLNet outputs keypoints, descriptors, and structural lines simultaneously at a speed of 79.4 Hz.
We propose a point-line-based stereo visual odometry to build the initial map. It is a hybrid system that combines a learning-based front-end with a traditional optimization backend. For each stereo image pair, we first employ the proposed PLNet to extract keypoints and line features. A GNN (LightGlue) then matches the keypoints. In parallel, we associate line features with keypoints and match the lines using the keypoint matching results. After that, we perform an initial pose estimation and reject outliers. Based on the results, we triangulate the 2D features of keyframes and insert them into the map. Finally, local bundle adjustment is performed to optimize points, lines, and keyframe poses. If an IMU is available, its measurements are processed using IMU preintegration and added to both the initial pose estimation and the local bundle adjustment.
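The point-line association and line matching steps described above can be illustrated with a minimal Python sketch. This is not AirSLAM's implementation: the function names, the `max_dist` pixel threshold, and the `min_shared` vote threshold are all hypothetical, and the text above does not specify how the association or the matching score is computed. The sketch assumes a keypoint belongs to a line when it lies within a few pixels of the segment, and that two lines match when enough of their associated keypoints are matched to each other.

```python
from collections import Counter
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment with endpoints a and b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def associate_points_to_lines(keypoints, lines, max_dist=3.0):
    """For each line, collect the indices of keypoints within max_dist pixels.

    max_dist is an assumed threshold, not a value taken from the source.
    """
    return [
        {i for i, kp in enumerate(keypoints)
         if point_segment_distance(kp, a, b) <= max_dist}
        for a, b in lines
    ]


def match_lines(left_assoc, right_assoc, point_matches, min_shared=2):
    """Match lines by voting with keypoint matches.

    point_matches maps a left keypoint index to its matched right keypoint
    index. Returns a dict: left line index -> right line index.
    min_shared is an assumed minimum number of shared matched keypoints.
    """
    # Reverse lookup: right keypoint index -> right lines associated with it.
    right_lines_of_point = {}
    for j, pts in enumerate(right_assoc):
        for r in pts:
            right_lines_of_point.setdefault(r, []).append(j)

    matches = {}
    for i, pts in enumerate(left_assoc):
        votes = Counter()
        for l in pts:
            r = point_matches.get(l)
            if r is None:
                continue  # this keypoint has no match in the right image
            for j in right_lines_of_point.get(r, []):
                votes[j] += 1
        if votes:
            best, score = votes.most_common(1)[0]
            if score >= min_shared:
                matches[i] = best
    return matches


# Toy stereo pair: the right image is the left image shifted by 2 pixels in x.
left_kps = [(0, 0), (5, 0.5), (10, 0), (5, 20)]
left_lines = [((0, 0), (10, 0))]
right_kps = [(2, 0), (7, 0.5), (12, 0), (7, 20)]
right_lines = [((0, 30), (10, 30)), ((2, 0), (12, 0))]
point_matches = {0: 0, 1: 1, 2: 2, 3: 3}  # stand-in for keypoint matcher output

left_assoc = associate_points_to_lines(left_kps, left_lines)
right_assoc = associate_points_to_lines(right_kps, right_lines)
matches = match_lines(left_assoc, right_assoc, point_matches)
print(matches)  # {0: 1}
```

In this toy example the left line shares three matched keypoints with the second right line and none with the first, so it is matched to index 1. The sketch does not enforce one-to-one line matches; a real system would likely add that check along with geometric verification.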
@article{xu2024airslam,
title = {{AirSLAM}: An Efficient and Illumination-Robust Point-Line Visual SLAM System},
author = {Xu, Kuan and Hao, Yuefan and Yuan, Shenghai and Wang, Chen and Xie, Lihua},
journal = {arXiv preprint arXiv:2408.03520},
year = {2024},
url = {https://arxiv.org/abs/2408.03520},
code = {https://github.com/sair-lab/AirSLAM},
}
@inproceedings{xu2023airvo,
title = {{AirVO}: An Illumination-Robust Point-Line Visual Odometry},
author = {Xu, Kuan and Hao, Yuefan and Yuan, Shenghai and Wang, Chen and Xie, Lihua},
booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2023},
url = {https://arxiv.org/abs/2212.07595},
code = {https://github.com/sair-lab/AirVO},
video = {https://youtu.be/YfOCLll_PfU},
}