Project Proposals - Raphaël Clifford

My personal research interests are in provably fast algorithms. I would like to hear from any students who have an interest in this area especially if they are interested in further research. You might be interested also in the M level unit Advanced Algorithms.

The list below is not exhaustive and suggestions for alternatives are welcome. As the descriptions are necessarily short, please email or come to see me to discuss any that take your interest. If you have done well in the Advanced Algorithms unit or COMS21102 then you may particularly enjoy some of the more algorithmic projects.

In general my preference is that all code produced is Open Source. I am happy to discuss why this is in person with anyone interested. I have marked projects that are more advanced with an asterisk ('*').

Open Source Projects

The best way to select an Open Source Project is to choose a piece of software that you use a lot yourself which has obvious missing functionality. You should then get in touch with the developers (via email, mailing lists, forums or even IRC) and ask them if they would be interested in someone implementing the missing part and discuss how difficult it might be. I have listed a small number of potential areas mostly related to instant messaging just as examples. For a wide ranging list of Open Source Projects undertaken by students around the world which might inspire you, take a look at Google Summer of Code. Another good way to go about it is to think of all the annoying things that your Windows friends say they can do that you can't in linux and fix one of them :)
    Add voice or video to your favourite IM client.
    Make a fully functioning ``whiteboarding" solution to allow remote collaboration.
    Make an extension for thunderbird that integrates properly with an external spam checker, for example spamassassin.
    Update an IM client to the latest protocol or protocol version.
    Interfacing between non-Windows machines and modern media players. See libmtp and TODO list
    Implement a cutting edge p2p system such as this one and integrate it with an existing open source p2p application.

Main Projects

    Bird Song Recognition*. We have thousands of recordings of exotic birdsong but labelling each one is a very laborious task. The goal of this project is to build an automatic bird song recognition system. Many tools exist for cleaning and processing audio and we will leverage those to create the model. For example, see the following screenshot from Bob Planque to see what can be done in a few lines of Matlab. The full list of bird songs can be found here. This project is particularly suited to those who are taking an MSc in Machine Learning or have done well in the COMS21200 unit or just anyone with a love birds.
    HMM methods for weighted strings. DNA and protein data is often probabilistic in that it is not known for sure which character appears at each position. This is especially true in derived data from bioinformatics. The goal of this project is to design and implement standard Hidden Markov Model matching algorithms on weighted strings.
    Content Based Image Retrieval*. The goal is to enable fast content based retrieval of massive data sets of images using recent improvements in algorithmics. In particular you will learn about a much discussed new technique called Locality Sensitive Hashing.
    External Memory and Cache-Oblivious Algorithms*. External memory algorithms allow you to process massive data sets when RAM is limited (for example on PDAs/phones etc.). Recent exciting theoretical advances have been made in this field but their practical utility is not yet known. You will work with an existing PhD student to perform the first implementations of the current state of the art.
    FM-index*. Until a few years ago it was always assumed that indexing data would necessarily add to the total storage requirements of a system. The FM-index allows you to both index and compress a data set simultaneously. However, very few practical implementations exist to date. The goal of this project is to implement and test an FM-index using state of the art algorithmic techniques.
    Academic's Assistant. Make a news ticker client to monitor paper publications that are available online. System should be interactive and specialise to the academic's field of interest.
    Automated Economic Forecaster. Professional traders and brokers receive forecasting and advice documents in electronic format on a regular basis. These documents are often long and written for human rather than computer understanding. The aim of this project is to produce an automated summary of these various different forms of information and a consensus view where appropriate.
    Automated Local Newspaper Generator. The Web is full of news sources and almost anything can be found if the user is interested. However, newspapers and similar news outlets still have strong appeal. Information that is relevant to where you are is also of great interest as restaurant reviews from LA may not be useful for people living in Dunstable. The goal of this project is to automatically create a properly set out local newspaper from information collected from the Web.
    Voice Controlled Computer Aided Cookery. Timing and sequencing are everything in cookery and more and more people have computers they can access in the kitchen. This project is to make a full computer aided cookery system, complete with recipe search, planning and real time cookery scheduling. The system should also be voice activated and give audible instructions to the chef.

Gaming

    Automatic Backgammon Robot Perform a study to see if the new range of online backgammon websites can resist attack by a world class backgammon application.

Music Projects

    Optical Music Recognition. Given an image of a musical score, transcribe the score to MIDI. 2nd supervisor: Majid Mirmehdi.
    Musical Segmentation. Use approximation repetition finding algorithms to ``segment" music into its logical parts. This will require an interest in algorithms and preferably some interest in music analysis.
    Automated Music Transcription*. Given an audio source, such as a music CD, the system should ideally produce an accurate MIDI representation of the input. This problem is very hard in its full form but the task will be to solve simplified special cases, such as a single piano playing under ideal recording conditions. The project will involve signal processing amongst other tools. This page and this paper might serve as useful starting points.

More Algorithm Research Projects

    An empirical comparison of approximate matching algorithms There have been a number of sophisticated approximation matching techniques proposed in recent years with impressive time complexities. However, their practical utility is unclear and largely untested. This project would be implement these algorithms and compare their performance on real data. This project requires someone with an interest in algorithms and a good confidence in C
    Approximate Counting in a Stream*. Given a limited amount of memory, the problem of counting the number of distinct elements in a stream, for example, becomes considerably more challenging. However, in a router or switch for example, this is the situation that arises. The goal of this project is to implement and compare existing approximate counting and sketching schemes and apply them to real world data.
    Nearest Neighbour Algorithms for Edit and Hamming Distance*. Similarly, some very advanced techniques have recently been proposed to extend the idea of nearest neighbour search to edit and hamming distances. The goal of this project is to implement some of these ideas and test then on large DNA data sets.
Home page