Dr. Sebastian Spiegler
Research
In spring 2011, I completed my Ph.D. in machine learning and natural language processing under the supervision of Professor Peter Flach at the Department of Computer Science, University of Bristol, UK. Until then, I was a member of the Intelligent Systems Laboratory and the Machine Learning Group there. My research focused on applying machine learning to the morphological analysis of complex agglutinating languages. The subtopics covered by my thesis are:
- A systematic framework that combines machine learning and morphological analysis. It covers the learning and deployment of a language model in supervised, partially supervised, or unsupervised learning setups.
- A novel set-based evaluation metric for unsupervised learning, called EMMA. It performs a hard assignment of predicted constituents to ground-truth constituents and thereby avoids certain limitations of previous approaches.
- Novel algorithms for morphological analysis, including PROMODES, a probabilistic model for word decomposition, as well as approaches to morpheme labelling, either as a post-processing step or through deductive-abductive parsing. The latter approach is called DEAP.
- The Ukwabelana Corpus, the first publicly available morphological corpus for Zulu, an indigenous language of South Africa. The corpus was developed in close collaboration with the linguist Dr. Andrew van der Spuy of the University of the Witwatersrand, Johannesburg.
Previous research
- The application of information retrieval and clustering algorithms to support knowledge management in Plataforma Lattes, an information system maintained by the Brazilian government. This research was carried out at the Stela Institute for knowledge engineering in Florianópolis, Brazil, between 2006 and 2007.
- The development and implementation of GFMerge, a genome analysis algorithm for gene prediction. The algorithm was used in the Dictyostelium discoideum Genome Project as part of a high-throughput computing architecture. This research was carried out at the Wellcome Trust Sanger Institute in Cambridge, UK, in summer 2003.
- The development and implementation of tools for post-processing and reanalysing genomic data in the Human Genome Project. This work was carried out at the Leibniz Institute for Age Research in Jena, Germany (formerly the Institute of Molecular Biotechnology), in summer 2001.
Previous education
- Computer science, business administration and economics at Technical University Ilmenau, Germany
- Computer science and information systems at Federal University of Santa Catarina, Florianópolis, Brazil
Ph.D. thesis:
- Sebastian Spiegler, Machine Learning For The Analysis Of Morphologically Complex Languages. Ph.D. thesis. Merchant Venturers School of Engineering, University of Bristol. April 2011.
Selected publications:
- Sebastian Spiegler, Christian Monson, EMMA: A Novel Evaluation Metric for Morphological Analysis. Proceedings of the 23rd International Conference on Computational Linguistics (COLING). August 2010.
- Sebastian Spiegler, Andrew van der Spuy, Peter A. Flach, Ukwabelana - An open-source morphological Zulu corpus. Proceedings of the 23rd International Conference on Computational Linguistics (COLING). August 2010.
- Sebastian Spiegler, Peter Flach, Enhanced word decomposition by calibrating the decision threshold of probabilistic models and using a model ensemble. 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010). July 2010.
- Bruno Golenia, Sebastian Spiegler, Peter Flach, Unsupervised Morpheme Discovery with Ungrade. Multilingual Information Access Evaluation. February 2010.
- Sebastian Spiegler, Bruno Golenia, Peter Flach, Unsupervised Word Decomposition with the Promodes Algorithm. In Multilingual Information Access Evaluation, Lecture Notes in Computer Science. February 2010.
- Peter Flach, Sebastian Spiegler, Bruno Golenia, Simon Price, John Guiver, Ralf Herbrich, Thore Graepel, Mohammed J. Zaki, Novel Tools To Streamline the Conference Review Process: Experiences from SIGKDD'09. SIGKDD Explorations, 11(2). ISSN 1931-0145, pp. 62–67. December 2009.
- Bruno Golenia, Sebastian Spiegler, Peter Flach, UNGRADE: UNsupervised GRAph DEcomposition. Working Notes for the CLEF 2009 Workshop, Corfu, Greece. September 2009.
- Sebastian Spiegler, Bruno Golenia, Peter Flach, Promodes: A probabilistic generative model for word decompositions. Working Notes for the CLEF 2009 Workshop, Corfu, Greece. September 2009.
- Sebastian Spiegler, Bruno Golenia, Ksenia Shalonova, Peter Flach, Roger Tucker, Learning the morphology of Zulu with different degrees of supervision. IEEE Spoken Language Technology Workshop (SLT 2008). ISBN 978-1-4244-3471-8, pp. 9–12. December 2008.
- Sebastian Spiegler, Comparative study of clustering algorithms on textual databases - Clustering of curricula vitae into competency-based groups to support knowledge management. VDM Verlag Dr. Mueller e.K. 2007.
Master's thesis (German Diploma):
- Sebastian Spiegler, Comparative study of clustering algorithms on textual databases. Faculty of Economic Sciences, Technical University Ilmenau, Germany. February 2007.
Tech reports:
- Sebastian Spiegler, EMMA: A Novel Evaluation Metric for Morphological Analysis - Experimental Results in Detail. CSTR-10-004, University of Bristol. July 2010.
- Sebastian Spiegler, Andrew van der Spuy, Peter A. Flach, Additional material for the Ukwabelana Zulu corpus. CSTR-10-003, University of Bristol. July 2010.

