Promodes: A probabilistic generative model for word decompositions

Sebastian Spiegler, Bruno Golenia, Peter Flach. Promodes: A probabilistic generative model for word decompositions. Working Notes for the CLEF 2009 Workshop, Corfu, Greece, September 2009.
For the Morpho Challenge 2009 we present Promodes, an algorithm for unsupervised morphological analysis based on a probabilistic generative model. The model treats segment boundaries as hidden variables and includes probabilities for letter transitions within segments. Promodes concentrates purely on segmenting words; its labeling method is simplistic, in that the morpheme labels are the segments themselves. The algorithm can be employed with different degrees of supervision. For the challenge, however, we demonstrate three unsupervised versions. The first applies a simple segmentation algorithm, based on letter succession probabilities in substrings, to a small subset of the data and then estimates the model parameters by maximum likelihood. The second version estimates its parameters through expectation maximization. Regardless of how the parameters were estimated, we used each model to decompose words from the original language data. The third method is a committee of unsupervised learners in which each learner corresponds to the second version, but with a different initialization of the expectation maximization. The solution is then found by majority vote, which decides at each word position whether to segment or not. In this paper, we describe the details of the probabilistic model, how its parameters are estimated, and how the most likely decomposition of an input word is found. We have tested Promodes on Arabic (vowelized and non-vowelized), English, Finnish, German and Turkish. All three methods achieved competitive results in the Morpho Challenge 2009.
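The committee step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `majority_vote` and `apply_boundaries`, the 0/1 encoding of boundary decisions, and the strict-majority tie rule are all assumptions made here for illustration. Each learner is assumed to output one flag per inner word position (1 = segment after this letter, 0 = do not segment).

```python
from typing import List, Sequence


def majority_vote(votes: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-position boundary decisions from several learners.

    votes[k][i] is 1 if learner k places a boundary after letter i, else 0.
    A boundary is kept only when strictly more than half of the learners
    vote for it (the tie rule is an assumption, not taken from the paper).
    """
    if not votes:
        raise ValueError("need at least one learner")
    n_positions = len(votes[0])
    if any(len(v) != n_positions for v in votes):
        raise ValueError("all learners must vote on the same positions")
    n_learners = len(votes)
    return [
        1 if 2 * sum(v[i] for v in votes) > n_learners else 0
        for i in range(n_positions)
    ]


def apply_boundaries(word: str, boundaries: Sequence[int]) -> List[str]:
    """Split a word using boundary flags for its len(word) - 1 inner positions."""
    if len(boundaries) != len(word) - 1:
        raise ValueError("expected one flag per inner position of the word")
    segments, start = [], 0
    for i, flag in enumerate(boundaries):
        if flag:
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])
    return segments


# Three hypothetical learners vote on the five inner positions of "walked".
committee_votes = [
    [0, 0, 0, 1, 0],  # walk|ed
    [0, 0, 0, 1, 0],  # walk|ed
    [0, 0, 1, 0, 0],  # wal|ked
]
combined = majority_vote(committee_votes)
segments = apply_boundaries("walked", combined)
```

With these hypothetical votes, two of three learners agree on a boundary after the fourth letter, so the committee segments the word as `walk` + `ed`. Because labels in Promodes are the segments themselves, the returned list doubles as the morpheme labeling.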