Department of Computer Science

Locating Text in Indoor Scenes

Document image processing deals with automated techniques for retrieving information from scanned document images, including digitisation of documents for fast searching and retrieval, automated data entry from form sheets, retrieval of information from maps, and separation of text from graphics. A limitation of document image processing, and of OCR in particular, is that analysing text requires a directly frontal image of the document, i.e. the image has to be fronto-parallel and skew-compensated. The automatic segmentation and recognition of text in arbitrary scenes, where the text may or may not be fronto-parallel to the viewing plane, is an area of computer vision that has not previously been researched extensively. The problem is to locate any text in an image and align it correctly to obtain a fronto-parallel view. At that stage the text is ready for higher-level interpretation by an OCR system or a human observer.

In this project we aim to locate and recover all regions of text in greylevel images of indoor scenes with fairly constant illumination. The text in these images may be small or large, and it may lie at an angle to the viewpoint that makes it difficult to read (within limits). Once we locate a region of text, we wish to prepare it for higher-level (text) recognition: we can either feed the region directly into an OCR system or display it for a human observer. To be suitable for such recognition, the output images should have perspective effects removed so that the text appears face-on. If the quality of the recovery is not good enough, this focussing of attention could allow an autonomous robot to move into a suitable position to take a better look, or a computer-controlled camera (wearable or otherwise) to zoom in on the text in order to read it. The advantage for these applications is that a lower-resolution camera can suffice. We hope to experiment with the latter option of controlling camera parameters.
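As an illustration of what removing perspective effects involves, the sketch below estimates the planar homography that maps the four corners of a text region onto an upright rectangle. It is a minimal example and not the method used in this project: it assumes the four corners have already been located, and the names `solve_linear`, `homography` and `apply_homography`, as well as the corner coordinates in the usage note, are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations act on both sides at once.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Return the 3x3 matrix H (with H[2][2] = 1) mapping src[i] to dst[i].

    Each correspondence (x, y) -> (u, v) gives two linear equations in the
    eight unknown entries of H, so four correspondences determine H.
    """
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map point p through H, dividing by the projective scale w."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For example, `homography([(10, 20), (200, 40), (180, 150), (30, 120)], [(0, 0), (300, 0), (300, 100), (0, 100)])` gives a matrix that sends the skewed quadrilateral to a 300 by 100 rectangle. To produce a rectified image one would typically estimate the mapping in the opposite direction (rectangle to quadrilateral) and sample the source image at the mapped position of every output pixel.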

Research into the retrieval of text from 3D scenes has applications in intelligent robots that gain information from text in their surroundings, replacement of the document/photograph scanner with a point-and-click camera, vehicle navigation and control (recognition of text on road signs and number plates), aids for the visually impaired, wearable computing tasks that benefit from knowledge of local text, and general automated tasks that require the ability to read where a scanner cannot be used.

Staff and Students

Majid Mirmehdi
Paul Clark (ex PhD student)
Celine Mancas Thillou (visiting PhD student, Summer 2005)
Eva Bertucci (MSc student, 02/03)

Publications

Partial Support

Hewlett-Packard Research Laboratories Europe, Bristol.
© 1995-2013 University of Bristol