Filtering noisy instances and outliers

D. Gamberger and N. Lavrac. In H. Liu and H. Motoda, editors, Instance Selection and Construction for Data Mining, pages 375-394. Kluwer Academic Publishers, Boston/Dordrecht/London, February 2001.

Abstract

Instance selection methods aim to find a representative data subset that can replace the original dataset while still providing enough information to solve a given data mining task. If instance selection is done by sampling, the sample should preferably exclude noisy instances and outliers. This chapter presents methods for noise and outlier detection that can be incorporated into sampling as filters for data cleaning. It covers four filtering algorithms: a saturation filter, a classification filter, a combined classification-saturation filter, and a consensus saturation filter. The distinguishing feature of the novel consensus saturation filter is its high reliability, which is due to the multiple detection of outliers and/or noisy instances. A medical evaluation on the problem of coronary artery disease diagnosis shows that the detected instances are indeed noisy or non-typical class representatives.
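
As a rough illustration of the filtering idea, the sketch below shows a generic cross-validated classification filter and a simple consensus variant. It is not the chapter's algorithm: the abstract does not specify the procedures, so everything here is an assumption, including the function names, the 3-nearest-neighbour classifier, the modulo fold assignment, and the toy data. In particular, the consensus step here combines several runs of the classification filter, whereas the chapter's consensus filter is built on the saturation filter, which is not reproduced.

```python
from collections import Counter

def knn_label(train, x, k=3):
    """Majority label among the k training instances nearest to x."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda inst: sq_dist(inst[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def classification_filter(data, n_folds=3, k=3):
    """Indices of instances misclassified by a model trained on the other folds.

    Assumption: instance i belongs to fold i % n_folds (no shuffling).
    """
    suspects = []
    for fold in range(n_folds):
        train = [inst for i, inst in enumerate(data) if i % n_folds != fold]
        for i, (features, label) in enumerate(data):
            if i % n_folds == fold and knn_label(train, features, k) != label:
                suspects.append(i)
    return sorted(suspects)

def consensus_filter(data, fold_settings=(2, 3, 5), k=3):
    """Indices flagged in every run; requiring agreement lowers false alarms."""
    runs = [set(classification_filter(data, n, k)) for n in fold_settings]
    return sorted(set.intersection(*runs))

# Toy data (hypothetical): two well-separated clusters, one mislabeled instance.
data = [
    ((0.0, 0.0), 0), ((0.5, 0.2), 0),
    ((10.0, 10.0), 1), ((10.3, 9.8), 1),
    ((0.1, 0.6), 0), ((0.4, 0.4), 1),   # index 5 carries the wrong label
    ((9.7, 10.2), 1), ((10.1, 10.4), 1),
    ((0.3, 0.1), 0), ((0.2, 0.3), 0),
    ((9.9, 9.6), 1), ((10.4, 10.1), 1),
]

print(consensus_filter(data))  # -> [5]
```

An instance is reported only if every run flags it, which mirrors the abstract's point that multiple detection makes the result more reliable, at the cost of possibly missing some noisy instances.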

D Gamberger, gamber@faust.irb.hr,
N Lavrac, Nada.Lavrac@ijs.si. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2