Bristol Algorithm Days - List of Abstracts

BAD'07

18th - 20th Feb 2007

Department of Computer Science
University of Bristol, Bristol, UK


Professor Muthu Muthukrishnan (Rutgers University)

Title: Algorithmic Theory of Sparse Approximation Problems

The sparse approximation problem is to find the best linear combination of at most B vectors from a pre-specified dictionary of N-dimensional vectors for a given input vector V, also of N dimensions. The classical versions arose in the late 1700s with orthonormal dictionaries; the modern versions have richer dictionaries, weighted norms, or study compressed sensing. This talk will focus on algorithmic results for such problems.
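A standard greedy heuristic for problems of this kind is matching pursuit: repeatedly pick the dictionary vector most correlated with the residual and subtract its projection. The sketch below is illustrative only (function name and toy dictionary are not from the talk), and assumes unit-norm dictionary vectors:

```python
# Matching pursuit: a greedy sketch of sparse approximation.
# Assumption: dictionary vectors are unit-norm.

def matching_pursuit(v, dictionary, B):
    """Approximate v by a combination of at most B dictionary vectors."""
    residual = list(v)
    coeffs = {}  # chosen dictionary index -> coefficient
    for _ in range(B):
        # Pick the dictionary vector most correlated with the residual.
        best_j, best_dot = None, 0.0
        for j, d in enumerate(dictionary):
            dot = sum(r * x for r, x in zip(residual, d))
            if abs(dot) > abs(best_dot):
                best_j, best_dot = j, dot
        if best_j is None:
            break  # residual is orthogonal to the whole dictionary
        coeffs[best_j] = coeffs.get(best_j, 0.0) + best_dot
        # Subtract the projection onto the chosen vector.
        residual = [r - best_dot * x
                    for r, x in zip(residual, dictionary[best_j])]
    return coeffs, residual
```

For an orthonormal dictionary this recovers the B largest coefficients exactly; for redundant dictionaries it is only a heuristic, which is where the algorithmic theory discussed in the talk comes in.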

Professor Michael Bender (Department of Computer Science, Stony Brook, New York)

Title: An Adaptive Packed-Memory Array

In this talk we show how to maintain a dynamic set of N elements in sorted order in an O(N)-sized array in memory or on disk. The idea is to intersperse O(N) gaps so that only a small number of elements need to be moved on an insert or delete.

Because the elements are stored physically in sorted order in memory or on disk, the structure supports extremely efficient data scans or range queries.

We show how to maintain the structure with a small (polylogarithmic) number of element moves even in the worst case. We then present an adaptive structure that performs even better for common insertion patterns. We give theoretical analysis and experimental results.

Joint work with Haodong Hu.
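The gapped-array idea can be illustrated with a toy sketch (hypothetical code, not the paper's structure: a real packed-memory array rebalances windows of the array to bound the number of moves per update):

```python
# Toy packed-memory array: sorted elements in an array with gaps (None
# slots), so an insert only shifts elements up to the nearest gap.

def pma_insert(arr, x):
    """Insert x into the gapped sorted array arr in place.
    Assumes arr contains at least one gap (None slot)."""
    # Insertion point: first non-gap element >= x.
    pos = next((i for i, v in enumerate(arr) if v is not None and v >= x),
               len(arr))
    # Prefer a gap at or after pos; shift the run [pos, gap) one slot right.
    gap = next((i for i in range(pos, len(arr)) if arr[i] is None), None)
    if gap is not None:
        for i in range(gap, pos, -1):
            arr[i] = arr[i - 1]
        arr[pos] = x
    else:
        # No gap to the right: shift the run (gap, pos) one slot left.
        gap = next(i for i in range(pos - 1, -1, -1) if arr[i] is None)
        for i in range(gap, pos - 1):
            arr[i] = arr[i + 1]
        arr[pos - 1] = x
```

Because the elements stay physically sorted, a range query is just a left-to-right scan that skips gaps; the paper's contribution is keeping the gaps distributed so each insert moves only polylogarithmically many elements.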

Dr Veli Mäkinen (Department of Computer Science, University of Helsinki)

Title: Implicit Compression Boosting with Applications to Self-Indexing

Compression boosting (Ferragina & Manzini, SODA 2004) is a new technique to enhance zeroth order entropy compressors' performance to k-th order entropy. It works by constructing the Burrows-Wheeler transform of the input text, finding an optimal partitioning of the transform, and then compressing each piece using an arbitrary zeroth order compressor. The technique has an application to text indexing: essentially, building a wavelet tree (Grossi & Gupta & Vitter, SODA 2003) for each piece in the partitioning yields a k-th order compressed full-text self-index providing efficient substring searches on the indexed text (Ferragina & Manzini & Mäkinen & Navarro, ACM TALG, to appear). In this talk, we show that using explicit compression boosting with wavelet trees is not necessary; our new analysis reveals that the size of the wavelet tree built for the complete Burrows-Wheeler transformed text is, in essence, the sum of the sizes of those built for the pieces in the optimal partitioning. Hence, the technique provides a way to do compression boosting implicitly, with a trivial linear time algorithm. In addition to having these consequences on compression and static full-text self-indexes, the analysis shows that a recent dynamic self-index (Mäkinen & Navarro, CPM 2006) in fact occupies space proportional to the k-th order entropy.

Joint work with Gonzalo Navarro.
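As a minimal illustration of the transform underlying these results, a naive Burrows-Wheeler transform can be written via sorted rotations (a sketch only; practical implementations derive the transform from a suffix array in linear time, and the sentinel convention here is an assumption):

```python
# Naive Burrows-Wheeler transform: sort all rotations of the text
# (with a sentinel appended) and read off the last column.

def bwt(text, sentinel="$"):
    """Return the Burrows-Wheeler transform of text + sentinel.
    Assumes the sentinel sorts before every character of text."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)
```

The transform groups symbols by context, which is exactly why partitioning it (explicitly or, per this talk, implicitly) reaches k-th order entropy.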

Dr David Manlove (Department of Computer Science, University of Glasgow)

Title: Popular matchings in the Weighted Capacitated House Allocation Problem

We consider the problem of finding a popular matching in the Weighted Capacitated House Allocation problem (WCHA). An instance of WCHA involves a set of agents and a set of houses. Each agent has a positive weight indicating his priority, and a preference list in which a subset of houses are ranked in strict order. Each house has a (positive integral) capacity that indicates the maximum number of agents who could be matched to it. A matching M of agents to houses is popular if there is no other matching M' such that the total weight of the agents who prefer their allocation in M' to that in M exceeds the total weight of the agents who prefer their allocation in M to that in M'. We describe an O(\sqrt{C}n_1+m) algorithm to determine if an instance of WCHA admits a popular matching, and if so, to find a largest such matching, where C is the total capacity of the houses, n_1 is the number of agents and m is the total length of the agents' preference lists.

Professor Richard Jozsa (Department of Computer Science, University of Bristol)

Title: An introduction to quantum computation and some principal achievements

Quantum computation represents a synthesis of principles from theoretical computer science and quantum physics. This union leads to new physically realisable modes of computation and for some computational tasks (such as integer factorisation) it provides algorithms that are exponentially more efficient than any known classical algorithms.

In this talk we will describe the computational model that underlies quantum computation, introducing also the relevant aspects of quantum theory. We will discuss its novel features (relative to classical computation) and describe some illustrative applications of quantum effects in information processing tasks. Finally we'll remark upon the relation of quantum computational complexity to NP, as currently understood.

Dr Aram Harrow (Department of Computer Science, University of Bristol)

Title: Random walks and quantum search

One of the first, and most general, problems that quantum computers can solve faster than classical computers is unstructured search, for which there is a quadratic quantum speedup. However, many other search problems, such as finding triangles in a graph, or determining whether a function is 1-1, can benefit from a more sophisticated treatment. This talk will describe a useful general method for quantum search which is based on a random walk and can be applied to the above problems as well as many others.
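The quadratic speedup for unstructured search can be seen in a small classical simulation of the Grover iteration (an illustrative sketch; the function and parameters are not from the talk): amplitudes are tracked as a plain list, the oracle flips the sign of the marked entry, and the diffusion step inverts every amplitude about the mean.

```python
import math

def grover_success_probability(n_items, marked):
    """Simulate Grover search over n_items with one marked item and
    return the probability of measuring the marked item at the end."""
    amps = [1.0 / math.sqrt(n_items)] * n_items   # uniform superposition
    iterations = int(round(math.pi / 4 * math.sqrt(n_items)))
    for _ in range(iterations):
        amps[marked] = -amps[marked]              # oracle: flip marked sign
        mean = sum(amps) / n_items                # diffusion:
        amps = [2 * mean - a for a in amps]       #   invert about the mean
    return amps[marked] ** 2
```

After about (pi/4) * sqrt(N) iterations the marked item is measured with probability close to 1, versus roughly 1/N for a single random guess; the random-walk framework of the talk generalises this beyond unstructured search.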

Saverio Caminiti (Computer Science Department, University of Rome "La Sapienza")

Title: Combinatorial Mappings between labelled trees and strings

Labelled trees are of interest in practical and theoretical areas of computer science, and the growing interest in developing efficient tools for manipulating XML trees has brought them back to the forefront of research.

An intriguing alternative to the usual representation of tree data structures in computer memories is based on coding labelled trees by means of node label strings.

This representation was first used by Prüfer in the proof of Cayley's theorem to show a one-to-one correspondence between unrooted labelled trees on n nodes and strings of length n-2. Since then, several other encodings have been introduced. We will consider two classes of encodings: the so-called Prüfer-like codes and Picciotto's codes.

Prüfer-like codes include those introduced by Neville, Moon, Deo, and Micikevičius. For codes belonging to this class we show how coding and decoding can be reduced to integer (radix) sorting. This approach readily yields optimal linear time algorithms.

Picciotto's codes are based on works by Orlin, Knuth, Joyal, Eğecioğlu, and Remmel. Optimal algorithms for these codes are obtained through a general scheme that defines bijective codes based on the transformation of a tree into a functional digraph. This general scheme also provides a theoretical instrument for understanding earlier experimental results. Indeed, it has been observed that Picciotto's codes satisfy, much better than Prüfer-like codes, two properties (locality and heritability) that are fundamental to the successful use of labelled tree codes in genetic algorithms.

These codes can also be exploited to efficiently generate random trees under several possible constraints.

Dr David Leslie (Department of Mathematics, University of Bristol)

Title: Theory of Multi-agent Learning Algorithms

Many problems are too large to be solved globally using a centralised optimisation algorithm. Other problems must be solved in a distributed manner due to constraints of the problem. In such circumstances a multi-agent approach can be taken, in which a collection of independent optimisers try to simultaneously optimise some aspect of the problem, hopefully resulting in globally desirable behaviour. I will introduce the ALADDIN project, which studies these methods of distributed and decentralised control.

My current approach to the problem stylises the multi-agent learning problem as a repeated normal form game. Approaches to multi-agent learning can then be studied theoretically in this much simpler framework. I will present variations of standard reinforcement learning algorithms for this simple environment, showing how to study them theoretically. This allows us to characterise environments in which coordination is likely, and those in which it is not.
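As one concrete instance of this framework (an illustrative baseline, not necessarily an algorithm from the talk), fictitious play in a 2x2 coordination game can be simulated as follows; the payoff matrix and initial belief counts are made up:

```python
# Fictitious play: each round, both players best-respond to the
# opponent's empirical action counts observed so far.

def fictitious_play(payoff, rounds):
    """Simulate a symmetric 2x2 game; payoff[a][b] is the payoff for
    playing a against b. Returns the joint-action history."""
    # counts[i][a]: times player i has seen the opponent play action a
    # (seeded with one fictitious observation each, an assumption).
    counts = [[1, 0], [0, 1]]
    history = []
    for _ in range(rounds):
        actions = []
        for i in range(2):
            # Expected payoff of each own action vs opponent frequencies.
            expected = [sum(payoff[a][b] * counts[i][b] for b in range(2))
                        for a in range(2)]
            actions.append(max(range(2), key=lambda a: expected[a]))
        history.append(tuple(actions))
        counts[0][actions[1]] += 1
        counts[1][actions[0]] += 1
    return history
```

In the pure coordination game with payoff matrix [[1, 0], [0, 1]], the players miscoordinate at first but quickly lock on to one action pair and stay there, the kind of convergence behaviour the theoretical analysis characterises.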

David Meredith (Department of Computer Science, Goldsmiths College)

Title: Point-set algorithms for pattern discovery and pattern matching in music

An algorithm that discovers the themes, motives and other perceptually significant repeated patterns in a musical work can be used, for example, in a music information retrieval system for indexing a collection of music documents so that it can be searched more rapidly. It can also be used in software tools for music analysis and composition and in a music transcription system or model of music cognition for discovering grouping structure, metrical structure and voice-leading structure. In most approaches to pattern discovery in music, the data is assumed to be in the form of strings. However, string-based methods become inefficient when one is interested in finding highly embellished occurrences of a query pattern or searching for polyphonic patterns in polyphonic music. These limitations can be avoided by representing the music as a set of points in a multidimensional Euclidean space. This point-set pattern matching approach allows the maximal repeated patterns in a passage of polyphonic music to be discovered in quadratic time and all occurrences of these patterns to be found in cubic time. More recently, Clifford et al. (2006) have shown that the best match for a query point set within a text point set of size n can be found in O(n log n) time by incorporating randomised projection, uniform hashing and FFT into the point-set pattern matching approach. Also, by using appropriate heuristics for selecting compact maximal repeated patterns with many non-overlapping occurrences, the point-set pattern discovery algorithms described here can be adapted for data compression. Moreover, the efficient encodings generated when this compression algorithm is run on music data seem to resemble the motivic-thematic analyses produced by human experts.
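The quadratic-time discovery step can be sketched with translation vectors: a set of points forms a repeated pattern under vector v exactly when v maps each of them onto another point of the set. This toy version (the (onset, pitch) encoding is an illustrative assumption, and it enumerates all ordered point pairs, hence the quadratic behaviour) groups points by translation vector:

```python
from collections import defaultdict

def maximal_translatable_patterns(points):
    """Map each translation vector to the set of points that it shifts
    onto another point of the set (the maximal pattern for that vector)."""
    point_set = set(points)
    patterns = defaultdict(set)
    for p in point_set:
        for q in point_set:
            if p != q:
                vec = (q[0] - p[0], q[1] - p[1])
                patterns[vec].add(p)
    return patterns
```

For a tiny two-note motif repeated four time units later, the vector (4, 0) picks out exactly the first occurrence of the motif; selecting compact, frequently occurring patterns from this table is where the heuristics mentioned above come in.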

Dr. Colin Campbell (Department of Engineering Mathematics, University of Bristol)

Title: Probabilistic Approaches to Unsupervised and Semi-Supervised Learning

We outline several novel approaches to unsupervised and semi-supervised learning. These methods are based on probabilistic graphical models and they maximise the probability of a model given the data. For semi-supervised learning some sample labels are known, whereas other samples are unlabelled. The main application has been cancer informatics, where these algorithms have delineated putative cancer subtypes which exhibit very distinct clinical outcomes. Apart from assigning patients to different subtypes, these algorithms can be used to highlight genes with distinct differential expression. Over-expressed genes within putative subtypes have been used as targets for expression knockdown via short interfering RNAs (siRNAs) by collaborating groups at the Institute of Cancer Research, London. For two cancer subtypes we have been able to demonstrate selective loss of viability of cancer cells relative to controls using targets found with these probabilistic graphical models.

Professor Iain Stewart (Department of Computer Science, University of Durham)

Title: Bounded pathwidth duality and uniform constraint satisfaction problems

We extend some recent results of Dalmau and we establish three new conditions on constraint satisfaction problems which are equivalent to the problem having bounded pathwidth duality. Our conditions relate to program schemes, games and finite automata. Using our condition relating to finite automata, we give an alternative, direct proof that all problems of bounded pathwidth duality are in NL and we use our proof technique to extend this result to show that the uniform constraint satisfaction problem CSP(all,bpd) is in NL, where "all" is the class of all finite structures and "bpd" is the class of finite structures C for which the problem CSP(C) has bounded pathwidth duality.

Dr Alexander Tiskin (Department of Computer Science, University of Warwick)

Title: Faster subsequence recognition in compressed strings

One of the most general and powerful string compression methods is compression by straight-line programs (SLP). Cegielski et al. considered subsequence recognition problems on SLP-compressed strings. For an SLP-compressed text of length m, and an uncompressed pattern of length n, they gave several algorithms for global and local subsequence recognition, running in time O(mn^2 log n). We improve on these results as follows. First, we mention a simple folklore algorithm for global subsequence recognition on an SLP-compressed text, running in time O(mn). For the more general semi-compressed LCS problem, we propose a new algorithm, running in time O(mn^{1.5}). We then extend this algorithm to several versions of local subsequence recognition, all running in the same asymptotic time O(mn^{1.5}).
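The folklore O(mn) global subsequence check can be sketched directly: for each grammar symbol, tabulate how far its expansion advances a greedy left-to-right scan of the pattern, composing the tables of the two children at each concatenation rule. The SLP encoding below (each rule is either a single character or a pair of indices of earlier rules) is an illustrative assumption:

```python
# Global subsequence recognition on an SLP-compressed text in O(mn):
# m rules, and a table of n+1 entries per rule.

def slp_subsequence(rules, pattern):
    """Return True iff pattern is a subsequence of the text generated
    by the last rule of the SLP."""
    n = len(pattern)
    tables = []  # tables[x][i]: pattern position after scanning rule x from i
    for rule in rules:
        if isinstance(rule, str):  # terminal character
            table = [i + 1 if i < n and pattern[i] == rule else i
                     for i in range(n + 1)]
        else:                      # concatenation of two earlier rules
            left, right = rule
            table = [tables[right][tables[left][i]] for i in range(n + 1)]
        tables.append(table)
    return tables[-1][0] == n
```

For instance, the SLP ['a', 'b', (0, 1), (2, 2)] generates "abab", which contains "aab" as a subsequence but not "bbb"; note the text is never decompressed, so its uncompressed length may be exponential in m.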

Dr Kostas Tsichlas (University of Patras, Greece)

Title: Turning Amortized to Worst-Case Data Structures: Techniques and Results

Many data structures have been designed with amortized complexity in mind. This means that a series of operations on such a structure has a small mean cost, but no guarantees are given for the worst-case cost of each individual operation. Turning these structures into ones that guarantee the cost of every single operation is quite intriguing and hard. There are many examples where such efforts must fail, in the sense that lower bounds have been proved (for example, for the union-find problem). However, there are many success stories as well. The aim of the talk is to present such techniques as well as some new results on search trees.
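A familiar toy example of this gap (illustrative, not from the talk) is the two-stack queue: every operation is O(1) amortized, but a single dequeue can trigger an O(n) reversal, which is exactly the kind of spike that de-amortization techniques, such as performing the reversal incrementally, are designed to remove:

```python
# Two-stack queue: O(1) amortized per operation, but dequeue has an
# O(n) worst case when the front stack must be refilled.

class AmortizedQueue:
    def __init__(self):
        self.front, self.back = [], []

    def enqueue(self, x):
        self.back.append(x)            # always O(1)

    def dequeue(self):
        """Remove and return the oldest element (queue must be non-empty)."""
        if not self.front:
            # Worst-case O(n) step: move everything over, reversed.
            while self.back:
                self.front.append(self.back.pop())
        return self.front.pop()
```

Each element is moved at most twice overall, which is why the amortized cost is constant even though individual dequeues are not.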

Dr Prudence Wong (Department of Computer Science, University of Liverpool)

Title: Energy Efficient Online Deadline Scheduling

We extend the study of online algorithms for energy-efficient deadline scheduling to the overloaded setting. Specifically, we consider a processor that can vary its speed between 0 and a maximum speed T to minimize its energy usage (whose rate is roughly a cubic function of the speed). As the speed is upper bounded, the system may be overloaded with jobs, and no scheduling algorithm can meet the deadlines of all jobs. An optimal schedule is expected to maximize the throughput, and furthermore, its energy usage should be the smallest among all schedules that achieve the maximum throughput. In designing a scheduling algorithm, one has to face the dilemma of selecting more jobs versus being conservative in energy usage. Even if we ignore energy usage, the best possible online algorithm is 4-competitive on throughput. On the other hand, existing work on energy-efficient scheduling focuses on minimizing the energy to complete all jobs on a processor with unbounded speed, giving several O(1)-competitive algorithms with respect to the energy usage. We present the first online algorithm for the more realistic setting where the processor speed is bounded and the system may be overloaded; the algorithm is O(1)-competitive on both throughput and energy usage. If the maximum speed of the online scheduler is relaxed slightly to (1+\epsilon)T for some \epsilon > 0, we can improve the competitive ratio on throughput to arbitrarily close to one, while remaining O(1)-competitive on energy usage.

This is joint work with HL Chan, WT Chan, TW Lam, LK Lee and KS Mak, and appears in SODA 2007.
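The cube-law trade-off at the heart of the model can be made concrete with a toy calculation (hypothetical numbers; the energy rate is taken as exactly the cube of the speed): work w done at constant speed s takes w/s time and therefore costs s^3 * (w/s) = w * s^2 energy, so running as slowly as deadlines allow is cheapest.

```python
# Cube-law energy cost of running at a constant speed.

def energy(work, speed):
    """Energy to finish `work` units at constant `speed`,
    with power consumption modelled as speed ** 3."""
    time = work / speed
    return speed ** 3 * time   # equals work * speed ** 2
```

A job of 4 units due in 4 time units costs 4 energy at speed 1, but 16 energy if rushed at speed 2: finishing early quadruples the bill, which is why the algorithm must balance throughput against conservative speeds.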

Dr Ely Porat (Department of Computer Science, Bar-Ilan University)

Title: Approximate String Matching with Address Bit Errors

A string S \in \Sigma^m can be viewed as a set of pairs S = \lbrace (\sigma_{i},i) : i \in \lbrace 0, \ldots, m-1 \rbrace \rbrace. We consider pattern matching problems arising from the setting where errors are introduced to the location component (i), rather than the more traditional setting, where errors are introduced to the content itself (\sigma_{i}). Specifically, we consider the case where bits of i may be erroneously flipped, either in a consistent or transient manner. We formally define the corresponding approximate pattern matching problems, and provide efficient algorithms for their resolution.
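A consistent address-bit error can be illustrated as a permutation of the string (a sketch under the assumption that the length is a power of two, so every flipped address stays in range; the function name is illustrative):

```python
# Consistent address-bit error: bit j of every location index is
# flipped, so the character written to address i lands at i XOR 2^j.

def flip_address_bit(s, j):
    """Return the string in which s[i] appears at position i ^ (1 << j).
    Assumes len(s) is a power of two greater than 2 ** j."""
    out = [None] * len(s)
    for i, ch in enumerate(s):
        out[i ^ (1 << j)] = ch
    return "".join(out)
```

Flipping bit 0 swaps adjacent pairs ("abcd" becomes "badc"), while flipping bit 1 swaps adjacent pairs of pairs ("abcd" becomes "cdab"); the pattern matching problems of the talk ask when a pattern matches the text up to such corruptions, consistent or transient.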