Technologies that enable trusted computing in embedded systems such as cell phones, PDAs, or set-top boxes have drawn much attention in recent years, especially since the Trusted Computing Group (TCG) announced the creation of security specifications for such devices. In June 2007, the TCG released the first specification of a Mobile Reference Architecture that builds on the concept of a Mobile Trust Module (MTM) to provide hardware-based security services such as device authentication, integrity measurement, and remote attestation. The MTM specification is closely tied to the specification of the Trusted Platform Module (TPM) for personal computers to ensure interoperability with the existing trusted computing framework. However, the fact that the MTM has much in common with the TPM poses a number of implementation problems and challenges. In this paper, we identify three specific problem areas in the MTM specification and discuss possible solutions. The first problem arises from the need to carefully balance divergent system-level design goals such as performance, area, and power consumption. A monolithic implementation of MTM functionality in a separate module may fail to yield the desired trade-off between security, cost, and performance. On the other hand, integrating TPM-like features directly into a processor core, similar to ARM's TrustZone, allows for a flexible yet cost-effective implementation of trust primitives. The second problem concerns the selection of the cryptographic algorithms a TPM or MTM must support. SHA-1 and RSA are poor choices for mobile devices: SHA-1 for reasons of security, and RSA for reasons of performance. Elliptic curve cryptography is a viable alternative to RSA and is well suited for resource-restricted systems. Finally, the third issue we address in this paper is the concrete implementation of cryptographic primitives. While cryptographic co-processors suffer from poor flexibility and large silicon area, a hardware/software co-design approach (e.g. in the form of custom instructions integrated into a processor core) allows algorithm agility to be achieved at low hardware cost.