Genetic Programming (GP) is a computationally intensive technique which is also highly parallel in nature. In recent years, significant performance improvements have been achieved over a standard GP CPU-based approach by harnessing the parallel computational power of many-core graphics cards which have hundreds of processing cores. This enables both fitness cases and candidate solutions to be evaluated in parallel. However, this paper will demonstrate that by fully exploiting a multi-core CPU, similar performance gains can also be achieved. This paper will present a new GP model which demonstrates greater efficiency whilst also exploiting the cache memory. Furthermore, the model presented in this paper will utilise Streaming SIMD Extensions to gain further performance improvements. A parallel version of the GP model is also presented which optimises multiple thread execution and cache memory. The results presented will demonstrate that a multi-core CPU implementation of GP can yield performance levels that match and exceed those of the latest graphics card implementations of GP. Indeed, a performance gain of up to 420-fold over standard GP is demonstrated and a threefold gain over a graphics card implementation.