One of the central problems in computer vision is the interpretation of the content of a single image. A particularly interesting example of this is the extraction of the underlying 3D structure apparent in an image, which is especially challenging due to the ambiguity introduced by having no depth information. Nevertheless, knowledge of the regular and predictable nature of the 3D world imposes constraints upon images, which can be used to recover basic structural information.
Our work is inspired by the human visual system, which appears to have little difficulty in interpreting complex scenes from only a single viewpoint. Humans are thought to rely heavily on learned prior knowledge for this. We therefore take a machine learning approach, learning the relationship between appearance and scene structure from training examples.
This thesis investigates this challenging area by focusing on the task of plane detection, which is important because planes are a ubiquitous feature of human-made environments. We develop a new plane detection method that learns from labelled training data and can both find planes and estimate their orientation. It operates on a single image, without relying on explicit geometric information or requiring depth data.
We first introduce a method to identify whether an individual image region is planar and, if so, to estimate its orientation with respect to the camera. The image region is described using basic feature descriptors and classified against training data. This forms the core of our plane detector: by applying it repeatedly to overlapping image regions, we estimate plane likelihood across the image, which is then used to segment the image into individual planar and non-planar regions. We evaluate both of these algorithms against known ground truth, obtaining good results, and compare them to prior work.
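The two stages described above can be illustrated with a minimal sketch. It is not the implementation developed in this thesis: the descriptor (mean and standard deviation of intensity), the k-nearest-neighbour classifier, the window size, the step and the 0.5 threshold are all assumptions chosen for brevity, and the training data are synthetic. The sketch classifies overlapping windows, averages the resulting planar probabilities per pixel to form a likelihood map, and thresholds that map as a stand-in for the segmentation step.

```python
import numpy as np

def describe(region):
    # Hypothetical stand-in descriptor: mean and standard deviation of intensity.
    return np.array([region.mean(), region.std()])

def classify_region(region, train_desc, train_planar, train_normals, k=3):
    """k-NN vote against training data.

    Returns (planar probability, unit normal or None).
    """
    dists = np.linalg.norm(train_desc - describe(region), axis=1)
    nearest = np.argsort(dists)[:k]
    p_planar = float(train_planar[nearest].mean())
    planar_nn = nearest[train_planar[nearest] == 1]
    if len(planar_nn) == 0:
        return p_planar, None
    normal = train_normals[planar_nn].mean(axis=0)
    length = np.linalg.norm(normal)
    return p_planar, (normal / length if length > 0 else None)

def plane_likelihood_map(image, train_desc, train_planar, train_normals,
                         win=16, step=8):
    """Average the planar probability of every window covering each pixel."""
    h, w = image.shape
    total = np.zeros((h, w))
    count = np.zeros((h, w))
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            p, _ = classify_region(image[y:y + win, x:x + win],
                                   train_desc, train_planar, train_normals)
            total[y:y + win, x:x + win] += p
            count[y:y + win, x:x + win] += 1
    return total / np.maximum(count, 1)

# Synthetic training set (assumed): smooth regions are labelled planar,
# noisy regions non-planar. Normals are only meaningful for planar rows.
train_desc = np.array([[0.80, 0.00], [0.70, 0.01], [0.90, 0.02],
                       [0.50, 0.29], [0.45, 0.30], [0.55, 0.28]])
train_planar = np.array([1, 1, 1, 0, 0, 0])
train_normals = np.array([[0.0, 0.0, 1.0]] * 3 + [[0.0, 0.0, 0.0]] * 3)

# Synthetic test image: smooth left half, noisy right half.
rng = np.random.default_rng(0)
image = np.full((32, 32), 0.8)
image[:, 16:] = rng.uniform(0.0, 1.0, size=(32, 16))

likelihood = plane_likelihood_map(image, train_desc, train_planar, train_normals)
mask = likelihood > 0.5  # crude stand-in for the segmentation step
_, left_normal = classify_region(image[:16, :16],
                                 train_desc, train_planar, train_normals)
```

In this toy setting the smooth left edge of the image receives a high plane likelihood and the noisy right half a low one. Windows that straddle the boundary dilute the likelihood near it, which is why a segmentation step over the whole map, rather than a per-window decision, is needed.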
We also demonstrate an application of this plane detection algorithm, showing how it is useful for visual odometry (localisation of a camera in an unknown environment). We enhance a planar visual odometry system to detect planes from a single frame, so that planes can be initialised quickly in appropriate locations without a search over the whole image. This enables rapid extraction of structured maps while exploring, and may increase accuracy over the baseline system.