Region Classification for the Interpretation of Video Sequences
Angus Clark, Region Classification for the Interpretation of Video Sequences. PhD thesis. Department of Computer Science, University of Bristol. September 1999. PDF, 16157 Kbytes.
This thesis concerns the continuing development of a classification system for the interpretation of images. The approach segments the image into a number of regions, describes each region in terms of a set of features, and finally passes the feature-description to a neural network that has been trained to label the region with an object-type. The work begins with a detailed investigation of the feature-set. A psychophysical study, aimed at revealing which visual cues a human subject might use, is presented. The neural network is then subjected to a similar analysis, and the results permit a number of comparisons between the machine vision system and its biological counterpart. Drawing on the psychophysical observations, the existing feature-set undergoes a series of refinements, with the properties of texture, colour, and shape coming under scrutiny. Incorporating the revised feature-set yields significant improvements in classification accuracy. Turning to the analysis of time-varying imagery, the work develops a number of temporal constraints that promote continuity over the sequence. It is shown how the concept of temporal coherence can be exploited to improve both computational efficiency and classification accuracy. A method employing a mesh of active contour primitives is developed, allowing an initial segmentation to evolve over the sequence. Mechanisms are incorporated to ease the transition between frames and to accommodate the topological discontinuities that can arise during the sequence. A set of features based on complex spatio-temporal Gabor filters is developed to encode both texture and motion variations in the image data. The work shows not only how the classification system can be applied to the analysis of image sequences, but also how the temporal dimension can be exploited to improve overall performance.
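The abstract names complex spatio-temporal Gabor filters as the basis of the combined texture and motion features but gives no parameterisation. The following is a minimal sketch of a generic filter of that kind, not the thesis's actual feature set: the function names (`gabor_kernel`, `energy`, `drifting_grating`), the Gaussian envelope with separate spatial and temporal widths, and the sign convention of the carrier are all assumptions made for illustration. Under these assumptions, a filter with spatial frequency (u, v) and temporal frequency w responds most strongly to patterns of that spatial frequency drifting along (u, v) at w / sqrt(u^2 + v^2) pixels per frame, so the magnitude of its response (a local "energy") reflects texture and motion together.

```python
import cmath
import math


def gabor_kernel(half, sigma, tau, u, v, w):
    """Hypothetical complex spatio-temporal Gabor kernel, indexed [t][y][x].

    Assumed form: a Gaussian envelope (spatial width sigma, temporal width
    tau) times the carrier exp(2*pi*i*(u*x + v*y - w*t)). The carrier's
    crests drift along (u, v) at w / sqrt(u**2 + v**2) pixels per frame.
    """
    r = range(-half, half + 1)
    return [[[math.exp(-(x * x + y * y) / (2 * sigma ** 2) - t * t / (2 * tau ** 2))
              * cmath.exp(2j * math.pi * (u * x + v * y - w * t))
              for x in r] for y in r] for t in r]


def energy(kernel, volume, ct, cy, cx):
    """Magnitude of the kernel/volume inner product centred at (ct, cy, cx)."""
    half = len(kernel) // 2
    acc = 0j
    for dt in range(-half, half + 1):
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                acc += (kernel[dt + half][dy + half][dx + half]
                        * volume[ct + dt][cy + dy][cx + dx])
    return abs(acc)


def drifting_grating(n, f, speed):
    """Synthetic n x n x n clip [t][y][x]: a vertical grating of spatial
    frequency f moving rightwards at `speed` pixels per frame."""
    return [[[math.cos(2 * math.pi * f * (x - speed * t))
              for x in range(n)] for y in range(n)] for t in range(n)]


if __name__ == "__main__":
    half, f = 4, 0.125
    clip = drifting_grating(2 * half + 1, f, speed=1.0)
    # Same spatial tuning, three temporal tunings: rightward, static, leftward.
    for w, label in ((f, "rightward"), (0.0, "static"), (-f, "leftward")):
        k = gabor_kernel(half, sigma=2.0, tau=2.0, u=f, v=0.0, w=w)
        print(label, round(energy(k, clip, half, half, half), 2))
```

On a grating drifting rightwards at one pixel per frame, the filter tuned to rightward motion (w = f) should give the largest energy, the static filter (w = 0) a smaller one, and the leftward-tuned filter (w = -f) very little, which is how a bank of such filters can separate motion as well as texture. How responses are pooled over each segmented region to form a feature vector is not stated in the abstract and is left open here.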