We describe initial work on a system for augmenting video sequences with 3-D graphics or animations so that the graphics appear to be present within the scene. Our aim is to do this in real time for sequences captured by uncalibrated `live' cameras, such as hand-held or wearable cameras. These sequences typically contain jitter and can have narrow baselines between frames over extended time intervals. The paper focuses on obtaining accurate estimates of the 3-D motion and position of the camera for these types of sequences. We present a method based on sparse feature tracking and the recursive structure-from-motion algorithm developed by Azarbayejani and Pentland. Our contribution is to report experiments demonstrating that the approach performs well for sequences with jitter and narrow baselines, and to discuss implementation issues relating to its use in a `live' real-time system.