We present a system that automatically detects, recognises and tracks text in natural scenes in real time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. The main contribution of this work is a complete end-to-end scene text reading system based on text tracking, built on top of our previously developed techniques for scene text detection and orientation estimation. We use a set of Unscented Kalman Filters (UKFs) to maintain each text region's identity and to continuously track the homography that maps the text into a fronto-parallel view, which makes the system resilient to erratic camera motion and wide-baseline changes in orientation. The system is designed for continuous, unsupervised operation on a handheld or wearable device over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised to maximise use of the available processing power and achieve real-time operation. We demonstrate the performance of the system on sequences recorded in outdoor scenarios.