In this paper, we present a novel approach to clip-based key frame extraction. Our framework allows both clips with subtle changes and clips containing rapid shot changes, fades and dissolves to be well approximated. We show that key frame video abstractions can be created by transforming each frame of a video sequence into an eigenspace and then clustering this space using Gaussian Mixture Models (GMMs). A Minimum Description Length (MDL) criterion is then used to determine the optimal number of GMM components for the clustering. The frame nearest to the centre of each GMM component is selected as a key frame. Unlike previous work, this technique relies on global video clip properties, and our results show that the extracted key frames give a very good representation of the overall clip content. We demonstrate the application of this technique on a database of $307$ clips of wildlife footage containing dissolves, shot changes, fades, pans, zooms and a wide range of animal behaviours.
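The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, uses PCA for the eigenspace projection, and substitutes the Bayesian Information Criterion (an MDL-style penalised-likelihood criterion) for model selection; the function name and parameters are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def extract_key_frames(frames, n_eigen=10, max_components=8, seed=0):
    """Sketch of eigenspace + GMM key frame extraction.

    frames: array of shape (n_frames, height, width) or (n_frames, d).
    Returns sorted indices of the selected key frames.
    """
    # Flatten each frame and project into a low-dimensional eigenspace.
    X = frames.reshape(len(frames), -1).astype(float)
    Z = PCA(n_components=min(n_eigen, len(frames) - 1)).fit_transform(X)

    # Fit GMMs of increasing size; keep the one minimising BIC,
    # standing in here for the MDL criterion used in the paper.
    best_bic, best_gmm = None, None
    for k in range(1, min(max_components, len(frames)) + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(Z)
        bic = gmm.bic(Z)
        if best_bic is None or bic < best_bic:
            best_bic, best_gmm = bic, gmm

    # Select the frame nearest to each component mean as a key frame.
    idx = [int(np.argmin(np.linalg.norm(Z - mu, axis=1)))
           for mu in best_gmm.means_]
    return sorted(set(idx))
```

On a clip rendered as an array of greyscale frames, the function returns one representative frame index per selected mixture component, so a clip spanning two visually distinct shots would typically yield two key frames.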