This paper presents a method for robustly estimating the gait phase of a walking human from a sequence of images using only low-level motion. We first learn statistical motion models of the trajectories we would expect to observe for each of the main limbs. We then extract a sparse cloud of motion features from an image sequence using a standard feature tracker. By comparing the motion of the tracked features to our models and integrating over all feature points, an HMM can be used to estimate the most likely sequence of phases. The method is then extended to be invariant to translation by using a particle filter to track the dominant foreground object. Experimental results show that the system extracts gait phase to a high level of accuracy and is robust to changes in the height of the walker, gait frequency and individual gait characteristics. The purpose of this work is to ask the question "How much information can we extract if we choose to throw away all appearance cues and rely only on motion?"
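To make the decoding step concrete, the following is a minimal sketch of how per-feature evidence might be pooled and passed to an HMM to recover the most likely phase sequence. It is illustrative only and is not the implementation used in this work: the number of phases (`K = 4`), the cyclic transition structure, the `stay` probability, and the synthetic per-feature log-likelihoods are all assumptions, and the standard Viterbi algorithm is used as the decoder. Summing log-likelihoods over features assumes the features are conditionally independent given the phase.

```python
import numpy as np

def viterbi(log_lik, log_trans, log_prior):
    """Most likely HMM state sequence, computed in log space.

    log_lik:   (T, K) per-frame log-likelihood of each phase
    log_trans: (K, K) log transition matrix, rows = from, columns = to
    log_prior: (K,)   log initial phase distribution
    """
    T, K = log_lik.shape
    score = log_prior + log_lik[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in phase i at t-1, then moving to j
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_lik[t]
    # Backtrack from the best final phase
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def cyclic_transitions(K, stay=0.6):
    """Assumed gait-cycle prior: stay in a phase or advance to the next."""
    A = np.zeros((K, K))
    for k in range(K):
        A[k, k] = stay
        A[k, (k + 1) % K] = 1.0 - stay
    return A

# Hypothetical setup: 4 phases, 5 frames per phase, 20 tracked features.
K, F = 4, 20
true_phase = np.repeat(np.arange(K), 5)
T = len(true_phase)

# Synthetic per-feature log-likelihoods: each feature weakly prefers
# the true phase (0.0) over the others (-0.1).
feat_ll = np.full((T, F, K), -0.1)
feat_ll[np.arange(T), :, true_phase] = 0.0

# Integrate over all feature points by summing log-likelihoods.
log_lik = feat_ll.sum(axis=1)

with np.errstate(divide="ignore"):  # log(0) = -inf for forbidden transitions
    log_trans = np.log(cyclic_transitions(K))
log_prior = np.log(np.full(K, 1.0 / K))

path = viterbi(log_lik, log_trans, log_prior)
print(path)
```

Each feature on its own is only weakly informative in this sketch; pooling the evidence over all features and constraining phase order through the transition matrix is what makes the decoded sequence well determined.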