Locating Text in Indoor Scenes
Document image processing deals with automated techniques for retrieving information from scanned document images: digitisation of documents for fast searching and retrieval, automated data entry from form sheets, retrieval of information from maps, and separation of text from graphics. The limitation of document image processing, and of OCR in particular, is that a directly frontal image of the document is required, i.e. the image has to be fronto-parallel and skew-compensated. The automatic segmentation and recognition of text in arbitrary scenes, where the text may or may not be fronto-parallel to the viewing plane, is an area of computer vision that has received little previous attention. The problem is to locate any text in an image and align it correctly to obtain a fronto-parallel view, at which point the text is ready for higher-level interpretation by an OCR system or a human observer.

In this project we aim to locate and recover all regions of text in greylevel images of indoor scenes under fairly constant illumination. The text in these images may be small or large, and it may lie at an angle to the viewpoint that makes it unreadable directly. Once we locate a region of text, we wish to prepare it for higher-level (text) recognition, either by feeding the region directly into an OCR system or by displaying it for a human observer. To be suitable for such recognition, the output images should have perspective effects removed so that the text appears face-on. If the quality of the recovery is not good enough, this focussing of attention could allow an autonomous robot to move into a suitable position to take a better look, or a computer-controlled camera (wearable or otherwise) to zoom in on the text in order to read it. The advantage this facility gives these applications is that the resolution of the camera can be kept to a minimum. We hope to experiment with the latter option of controlling camera parameters.
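As a rough illustration of the localisation step, the sketch below scores each block of a greylevel image by its local contrast and flags high-contrast blocks as candidate text. This is only an assumed stand-in measure, not the statistical measures developed in this project; the function name and threshold are hypothetical.

```python
import numpy as np

def text_saliency(gray, block=16, thresh=30.0):
    """Flag blocks of a greylevel image with high local contrast.

    Text regions tend to contain dense strokes and therefore high local
    intensity variation, so blocks whose standard deviation exceeds
    `thresh` are marked as candidate text. Illustrative only.
    """
    h, w = gray.shape
    h -= h % block                          # crop to a whole number of blocks
    w -= w % block
    blocks = gray[:h, :w].reshape(h // block, block, w // block, block)
    std = blocks.std(axis=(1, 3))           # per-block standard deviation
    return std > thresh                     # boolean candidate-text mask
```

In practice such a mask would only seed later grouping and verification stages; a single measure of this kind also fires on textured clutter.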
Research into the retrieval of text from 3D scenes has applications in intelligent robots that gain information from text in their surroundings; replacing the document or photograph scanner with a point-and-click camera; vehicle navigation and control (recognition of text on road signs and number plates); aids for the visually impaired; wearable computing tasks that benefit from knowledge of local text; and, generally, automated tasks that require the ability to read where a scanner cannot be used.
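Recovering a fronto-parallel view amounts to warping the text region by a plane-to-plane homography. The sketch below estimates that homography with the standard direct linear transform (DLT), assuming the four corners of the region are already known; the project's own papers instead derive the rectification from vanishing points in a single image, so this is a simplified stand-in.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Estimate the 3x3 homography mapping four src points to dst points.

    `src` are the image corners of a perspectively distorted text region;
    `dst` are the corners of the desired fronto-parallel rectangle.
    Solved by the DLT: each correspondence contributes two linear
    equations in the nine entries of H, and the solution is the null
    vector of the stacked system (last row of V^T from the SVD).
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                      # normalise so H[2,2] = 1
```

Applying the resulting H to every pixel of the region (with interpolation) yields the face-on view that can be passed to an OCR system.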
Staff and Students
Majid Mirmehdi
Paul Clark (ex PhD student)
Celine Mancas Thillou (visiting PhD student Summer 2005)
Eva Bertucci (MSc student 02/03)
Publications
- Paul Clark and Majid Mirmehdi. Location and recovery of text on oriented surfaces. In SPIE Conference on Document Recognition and Retrieval VII, pages 267-277. The International Society for Optical Engineering, January 2000.
- Paul Clark and Majid Mirmehdi. Finding text regions using localised measures. In Proceedings of the 11th British Machine Vision Conference, pages 675-684. BMVA Press, September 2000.
- Paul Clark and Majid Mirmehdi. Combining statistical measures to find image text regions. In Proceedings of the 15th International Conference on Pattern Recognition, pages 450-453. IEEE Computer Society, September 2000.
- Majid Mirmehdi, Paul Clark, and Justin Lam. Extracting low resolution text with an active camera for OCR. In Proceedings of the IX Spanish Symposium on Pattern Recognition and Image Processing, pages 43-48, May 2001.
- Paul Clark and Majid Mirmehdi. Estimating the orientation and recovery of text planes in a single image. In Proceedings of the 12th British Machine Vision Conference, pages 421-430. BMVA Press, September 2001.
- Paul Clark and Majid Mirmehdi. Recognising text in real scenes. International Journal on Document Analysis and Recognition, 4(4):243-257, August 2002.
- Paul Clark and Majid Mirmehdi. On the recovery of oriented documents from single images. In Proceedings of the 4th IEEE Advanced Concepts for Intelligent Vision Systems, pages 190-197, 2002.
- Paul Clark and Majid Mirmehdi. Rectifying perspective views of text in 3D scenes using vanishing points. Pattern Recognition, 36(11):2673-2686, November 2003.
- Majid Mirmehdi, Paul Clark, and Justin Lam. A non-contact method of capturing low-resolution text for OCR. Pattern Analysis and Applications, 6(1):12-22, April 2003.
- Eva Bertucci, Maurizio Pilu, and Majid Mirmehdi. Text selection by structured light marking for hand-held cameras. In IEEE/IAPR 7th International Conference on Document Analysis and Recognition, pages 555-559. IEEE CS Press, August 2003.
- Celine Mancas-Thillou and Majid Mirmehdi. Super-resolution text using the Teager filter. In First International Workshop on Camera-Based Document Analysis and Recognition, pages 10-16, August 2005.

