
Filtering noisy instances and outliers
D. Gamberger and N. Lavrac.
In H. Liu and H. Motoda, editors, Instance Selection and Construction for
Data Mining, pages 375–394. Kluwer Academic Publishers,
Boston/Dordrecht/London, February 2001.
Abstract
Instance selection methods are aimed at finding a representative data subset
that can replace the original dataset but still provide enough information to
solve a given data mining task. If instance selection is done by sampling,
the sample should preferably exclude noisy instances and outliers. This
chapter presents methods for noise and outlier detection that can be
incorporated into sampling as filters for data cleaning. The chapter presents
the following filtering algorithms: a saturation filter, a classification
filter, a combined classification-saturation filter, and a consensus
saturation filter. The distinguishing feature of the novel consensus
saturation filter is its high reliability, which is due to the multiple
detection of outliers and/or noisy instances. A medical evaluation on the
problem of coronary artery disease diagnosis shows that the detected
instances are indeed noisy or non-typical class representatives.
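
The abstract names a classification filter without describing it. As a rough illustration only, the sketch below follows one common reading of the idea: each instance is held out in turn, a classifier is trained on the remaining folds, and instances it misclassifies are flagged as suspected noise. This is not the chapter's algorithm. The 3-nearest-neighbour classifier, the fold assignment, the function names, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def knn_label(train, x, k=3):
    """Majority label among the k training points closest to x.

    Uses squared Euclidean distance. `train` is a list of
    (features, label) pairs. Hypothetical helper for this sketch.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classification_filter(data, n_folds=3, k=3):
    """Return sorted indices of instances misclassified when held out.

    Instance i belongs to fold (i % n_folds); the classifier for a fold
    is built from all instances outside that fold.
    """
    suspects = []
    for fold in range(n_folds):
        train = [d for i, d in enumerate(data) if i % n_folds != fold]
        for i, (features, label) in enumerate(data):
            if i % n_folds != fold:
                continue
            if knn_label(train, features, k) != label:
                suspects.append(i)
    return sorted(suspects)


# Toy data (invented): two clusters, with index 3 labeled against its cluster.
data = [
    ((0.0, 0.0), "neg"),
    ((0.1, 0.2), "neg"),
    ((0.2, 0.1), "neg"),
    ((0.15, 0.15), "pos"),  # sits in the "neg" cluster
    ((1.0, 1.0), "pos"),
    ((0.9, 1.1), "pos"),
    ((1.1, 0.9), "pos"),
]

suspects = classification_filter(data)
cleaned = [d for i, d in enumerate(data) if i not in suspects]
print(suspects)
```

On this toy data the only flagged instance is the one whose label disagrees with its neighbours. A majority vote over three neighbours is used rather than a single nearest neighbour so that one mislabeled point does not cause its correctly labeled neighbours to be flagged as well.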
D. Gamberger, gamber@faust.irb.hr
N. Lavrac, Nada.Lavrac@ijs.si
Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2