Department of
Computer Science
 

UNGRADE: UNsupervised GRAph DEcomposition

Bruno Golenia, Sebastian Spiegler, Peter Flach, UNGRADE: UNsupervised GRAph DEcomposition. Working Notes for the CLEF 2009 Workshop, Corfu, Greece. September 2009.

Abstract

This article presents UNGRADE (UNsupervised GRAph DEcomposition), an unsupervised, language-independent algorithm for word decomposition that can segment a word list in any language. UNGRADE assumes that each word consists of a sequence of prefixes, a stem and a sequence of suffixes, with no limit on the number of prefixes or suffixes. The algorithm works in three steps. First, a pseudo-stem is found for each word using a window based on Minimum Description Length. Second, prefix sequences and suffix sequences are found independently using a graph algorithm called graph-based unsupervised sequence segmentation. Finally, the morphemes from the previous steps are joined to produce a segmented word list. Because we focus purely on the segmentation of words, we employ a trivial labeling method: each morpheme is labeled with its own segment. UNGRADE is applied to five languages (English, German, Finnish, Turkish and Arabic), and results are reported against the gold standard for each language.
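The three-step structure described in the abstract can be illustrated with a minimal Python sketch. Only the overall shape (find a stem, split the affix sequences on either side independently, join and label) comes from the abstract. The stem finder and the affix splitter below are toy stand-ins based on small hand-written lists; they are not the Minimum Description Length window or the graph-based segmentation algorithm of the paper, and all names (`TOY_STEMS`, `toy_find_stem`, `toy_affix_splitter`, `segment_word`, `label_morphemes`) are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical toy lexicon; the paper derives pseudo-stems without supervision.
TOY_STEMS = ["help", "read"]


def toy_find_stem(word: str) -> Tuple[int, int]:
    """Stand-in for step 1: return [start, end) of the longest known stem."""
    for stem in sorted(TOY_STEMS, key=len, reverse=True):
        start = word.find(stem)
        if start >= 0:
            return start, start + len(stem)
    return 0, len(word)  # no known stem: treat the whole word as the stem


def toy_affix_splitter(affixes: List[str]) -> Callable[[str], List[str]]:
    """Stand-in for step 2: greedy longest-match split against a fixed list."""
    ordered = sorted((a for a in affixes if a), key=len, reverse=True)

    def split(sequence: str) -> List[str]:
        pieces = []
        while sequence:
            # If no affix matches, keep the remainder as a single piece.
            piece = next((a for a in ordered if sequence.startswith(a)), sequence)
            pieces.append(piece)
            sequence = sequence[len(piece):]
        return pieces

    return split


def segment_word(
    word: str,
    find_stem: Callable[[str], Tuple[int, int]],
    split_prefixes: Callable[[str], List[str]],
    split_suffixes: Callable[[str], List[str]],
) -> List[str]:
    """Step 3: join prefixes, stem and suffixes into one segmentation."""
    start, end = find_stem(word)
    morphemes = split_prefixes(word[:start]) + [word[start:end]] + split_suffixes(word[end:])
    assert "".join(morphemes) == word  # segmentation must cover the word exactly
    return morphemes


def label_morphemes(morphemes: List[str]) -> List[Tuple[str, str]]:
    """Trivial labeling from the abstract: each morpheme's label is its own segment."""
    return [(m, m) for m in morphemes]


split_pre = toy_affix_splitter(["un", "re"])
split_suf = toy_affix_splitter(["ful", "ness", "ing"])
segmented = segment_word("unhelpfulness", toy_find_stem, split_pre, split_suf)
```

With these toy lists, `segmented` is `["un", "help", "ful", "ness"]`. Passing the stem finder and the two affix splitters as parameters mirrors the abstract's statement that prefix and suffix sequences are handled independently, and lets the stand-ins be replaced without touching the joining step.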
