This thesis develops methods that ultimately result in a self-contained Augmented Reality handheld system for assisting a user to interactively build 3D wireframe computer models of real objects. The approach is based on an in-situ (in-place) interactive image-based model building method that allows users to build or import models with which to perform augmentations in most places. Three main components are studied: visual competences such as 6D tracking and object detection methods, user interaction techniques, and evaluations for handheld devices.

Four visual tracking algorithms have been implemented, tested and discussed in the context of the existing literature. Two of them are region-based: tracking by general regression and sum-of-squared-differences (SSD) tracking. The other two are edge-based: contour tracking and 3D model-based tracking. Based on testing, the 3D model-based tracking algorithm was selected for the rest of the system.

Two types of input device for interacting with the system have been custom built and studied: a one-handed wand-like device (WnD) and a two-handed touchscreen (TS). Two methods for users to define a free 3D vertex from 2D images have been investigated. The user studies were carried out with the two hardware platforms (WnD and TS), two 3D selection techniques (the two-click (2Cs) and click-and-move (CnM) techniques) and two trackers (model-based and SLAM-based). Two groups of participants took part in the studies. The first group performed the experiment using the system that estimates the camera pose from the model-based tracker. The statistical results suggested that the WnD combined with 2Cs is the best system for 3D vertex selection, although overall participants had a marginal preference for the WnD combined with CnM over the WnD combined with 2Cs.
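Defining a free 3D vertex from two clicks amounts to triangulation: each click back-projects to a ray through the camera centre, and since the two rays rarely intersect exactly, the midpoint of their closest approach can be taken as the vertex. The sketch below illustrates this idea under a standard pinhole camera model with hypothetical intrinsics and poses; it is not the thesis' exact implementation.

```python
import numpy as np

def pixel_ray(K, R, t, uv):
    """Back-project pixel uv into a world-space ray (origin, unit direction).

    Camera model assumed: x_cam = R @ x_world + t, pixel ~ K @ x_cam.
    """
    d_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    origin = -R.T @ t                       # camera centre in world coordinates
    direction = R.T @ d_cam
    return origin, direction / np.linalg.norm(direction)

def triangulate_midpoint(ray1, ray2):
    """Midpoint of the closest approach between two (non-parallel) rays."""
    o1, d1 = ray1
    o2, d2 = ray2
    # Minimise |o1 + s*d1 - (o2 + u*d2)|^2 over scalars s, u (normal equations).
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(o2 - o1) @ d1, (o2 - o1) @ d2])
    s, u = np.linalg.solve(A, b)
    return 0.5 * ((o1 + s * d1) + (o2 + u * d2))
```

With an accurate pose from the tracker, the two clicked pixels fully determine the vertex; pose error between the two views is what degrades the selection accuracy measured in the user studies.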
The second study used the system that estimates the camera pose from the SLAM-based tracker. It aimed to consider cases where it is affordable to maintain a larger model (or map) of the immediate environment. Although the statistical results of this study are similar to those of the first, overall participants had a marginal preference for the WnD combined with 2Cs over the WnD combined with CnM.

An integrated in-situ modelling system named OutlinAR and its implementation are introduced. It consists of the visual tracking algorithms, the user interaction techniques and the hardware platforms. To make OutlinAR easier to use, a novel edge-based object detection technique used to re-initialise the tracker has been co-developed and integrated into the system. The detector works from a library of edgelet constellations recovered from images of the object in a training phase. During the detection stage, the detector finds the object by matching observed edgelet constellations to those in the library.

To demonstrate the potential of the work, the touchscreen version of OutlinAR has been ported to run on a mobile phone, demonstrating all the stages necessary for modelling and using wireframe models for Augmented Reality on the go.
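To give a flavour of constellation matching, the toy sketch below scores an observed set of edgelets against a library constellation, counting a library edgelet as matched when some observed edgelet lies nearby with a similar orientation. The edgelet representation (position plus orientation modulo pi) and the tolerances are illustrative assumptions, not the thesis' actual detector.

```python
import numpy as np

def edgelet_match_score(library, observed, pos_tol=3.0, ang_tol=np.deg2rad(10)):
    """Fraction of library edgelets matched by an observed edgelet.

    Each edgelet is a tuple (x, y, theta), theta being the edge
    orientation (defined modulo pi, since edges have no direction).
    Hypothetical scoring for illustration only.
    """
    matched = 0
    for lx, ly, lth in library:
        for ox, oy, oth in observed:
            dist = np.hypot(lx - ox, ly - oy)
            # Angular difference wrapped into [-pi/2, pi/2)
            dth = abs((lth - oth + np.pi / 2) % np.pi - np.pi / 2)
            if dist <= pos_tol and dth <= ang_tol:
                matched += 1
                break
    return matched / len(library)
```

A full detector would additionally search over the object's pose (or a discretised set of views) before scoring, accepting a detection when the best score exceeds a threshold; the score itself then seeds the tracker's re-initialisation.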