Dense visual SLAM

Author

  • Richard Newcombe
Abstract

Visual SLAM systems aim to estimate the motion of a moving camera together with the geometric structure and appearance of the world being observed. To the extent that this is possible using only an image stream, the core problem that must be solved by any practical visual SLAM system is that of obtaining correspondence throughout the images captured. Modern visual SLAM pipelines commonly obtain correspondence by using sparse feature matching techniques and construct maps using a composition of point, line or other simple geometric primitives. The resulting sparse feature map representations provide sparsely furnished, incomplete reconstructions of the observed scene.

Related techniques from multiple view stereo (MVS) achieve high quality dense reconstruction by obtaining dense correspondences over calibrated image sequences. Despite the usefulness of the resulting dense models, these techniques have been of limited use in visual SLAM systems. The computational complexity of estimating dense surface geometry has been a practical barrier to its use in real-time SLAM. Furthermore, MVS algorithms have typically required a fixed length, calibrated image sequence to be available throughout the optimisation, a condition fundamentally at odds with the online nature of SLAM.

With the availability of massively-parallel commodity computing hardware, we demonstrate new algorithms that achieve high quality incremental dense reconstruction within online visual SLAM. The result is a live dense reconstruction (LDR) of scenes that makes possible numerous applications that can utilise online surface modelling, for instance: planning robot interactions with unknown objects, augmented reality with characters that interact with the scene, or providing enhanced data for object recognition.

The core of this thesis goes beyond LDR to demonstrate fully dense visual SLAM. We replace the sparse feature map representation with an incrementally updated, non-parametric, dense surface model. By enabling real-time dense depth map estimation through novel short baseline MVS, we can continuously update the scene model and further leverage its predictive capabilities to achieve robust camera pose estimation with direct whole image alignment. We demonstrate the capabilities of dense visual SLAM using a single moving passive camera, and also when real-time surface measurements are provided by a commodity depth camera. The results demonstrate state-of-the-art, pick-up-and-play 3D reconstruction and camera tracking systems useful in many real-world scenarios.

Acknowledgements

There are key individuals who have provided me with all the support and tools that a student who sets out on an adventure could want. Here, I wish to acknowledge those friends and colleagues who, by providing technical advice or much needed fortitude, helped bring this work to life.

Prof. Andrew Davison’s robot vision lab provides a unique research experience amongst computer vision labs in the world. First and foremost, I thank my supervisor Andy for giving me the chance to be part of that experience. His brilliant guidance and support of my growth as a researcher are well matched by his enthusiasm for my work. This is made most clear by his fostering the joy of giving live demonstrations of work in progress. His complete faith in my ability drove me on and gave me license to develop new ideas and build bridges to research areas that we knew little about. Under his guidance I’ve been given every possible opportunity to develop my research interests, and this thesis would not be possible without him.

My appreciation for Prof. Murray Shanahan’s insights and spirit began with our first conversation. Like ripples from a stone cast into a pond, the presence of his ideas and depth of knowledge instantly propagated through my mind. His enthusiasm and capacity to discuss any topic, old or new to him, and his ability to bring ideas together across the worlds of science and philosophy, showed me an openness to thought that I continue to try to emulate. I am grateful to Murray for securing a generous scholarship for me in the Department of Computing and for providing a home away from home in his cognitive robotics lab.

I am indebted to Prof. Owen Holland who introduced me to the world of research at the University of Essex. Owen showed me a first glimpse of the breadth of ideas in robotics, AI, cognition and beyond. I thank Owen for introducing me to the idea of continuing in academia for a doctoral degree and for introducing me to Murray.

I have learned much with many friends and colleagues at Imperial College, but there are three who have been instrumental. I thank Steven Lovegrove, Ankur Handa and Renato Salas-Moreno who travelled with me on countless trips into the unknown, sometimes to chase a small concept but more often than not in pursuit of the bigger picture we all wanted to see. They indulged me with months of exploration, collaboration and fun, leading us to understand ideas and techniques that were once out of reach. Together, we were able to learn much more.

Thank you Hauke Strasdatt, Luis Pizarro, Jan Jachnick, Andreas Fidjeland and members of the robot vision and cognitive robotics labs for brilliant discussions and for sharing the

Similar resources

Learning Deeply Supervised Visual Descriptors for Dense Monocular Reconstruction

Visual SLAM (Simultaneous Localization and Mapping) methods typically rely on handcrafted visual features or raw RGB values for establishing correspondences between images. These features, while suitable for sparse mapping, often lead to ambiguous matches at texture-less regions when performing dense reconstruction due to the aperture problem. In this work, we explore the use of learned feature...

LoopSmart: Smart Visual SLAM Through Surface Loop Closure

We present a visual simultaneous localization and mapping (SLAM) framework for closing surface loops. It combines sparse feature matching with dense surface alignment. Sparse feature matching is used for visual odometry and for global camera pose fine-tuning when dense loops are detected, while dense surface alignment closes large loops and resolves surface mismatches. T...

A Distributed Framework for Monocular Visual SLAM

In Distributed Simultaneous Localization and Mapping (SLAM), multiple agents generate a global map of the environment while each performing its local SLAM operation. One of the main challenges is to identify overlapping maps, especially when agents do not know their relative starting positions. In this paper we are introducing a distributed framework which uses an appearance based method to ide...

GPSlam: Marrying Sparse Geometric and Dense Probabilistic Visual Mapping

We propose a novel, hybrid SLAM system to construct a dense occupancy grid map based on sparse visual features and dense depth information. While previous approaches deemed the occupancy grid usable only in 2D mapping, and in combination with a probabilistic approach, we show that geometric SLAM can produce consistent, robust and dense occupancy information, and maintain it even during erroneou...

Large Scale Dense Visual Inertial SLAM

In this paper we present a novel large scale SLAM system that combines dense stereo vision with inertial tracking. The system divides space into a grid and efficiently allocates GPU memory only when there is surface information within a grid cell. A rolling grid approach allows the system to work for large scale outdoor SLAM. A dense visual inertial tracking pipeline incrementally localiz...

Learning monocular visual odometry with dense 3D mapping from dense 3D flow

This paper introduces a fully deep learning approach to monocular SLAM, which can perform simultaneous localization using a neural network for learning visual odometry (L-VO) and dense 3D mapping. Dense 2D flow and a depth image are generated from monocular images by sub-networks, which are then used by a 3D flow associated layer in the L-VO network to generate dense 3D flow. Given this 3D flow...

Publication date: 2012