Soft Discretization to Enhance the Continuous Decision Tree Induction
Y. Peng, P. Flach, Soft Discretization to Enhance the Continuous Decision Tree Induction. Integrating Aspects of Data Mining, Decision Support and Meta-Learning. Christophe Giraud-Carrier, Nada Lavrac, Steve Moyle (eds.), pp. 109–118. September 2001. No electronic version available.
Decision tree induction has been widely used to generate classifiers from training data through a process of recursively splitting the data space. When training on continuous-valued data, the associated attributes must be discretized in advance or during the learning process. The commonly used method is to partition the attribute range into two or more intervals using a single cut point or a set of cut points. One inherent disadvantage of these methods is that the use of sharp (crisp) cut points makes the induced decision trees sensitive to noise. To overcome this problem, this paper presents an alternative method, called soft discretization, based on fuzzy set theory. As opposed to a classical decision tree, which gives only one class as the end result, the soft discretization based decision tree associates a set of possibilities with several or all classes for an unknown object. As a result, even if the object is subject to uncertainty, the decision tree does not give a completely wrong result, but a set of possibility values. This approach has been successfully applied to an industrial problem: monitoring a typical machining process. Experimental results showed that soft discretization achieved better classification accuracy in both training and testing than a classical decision tree, which suggests that the robustness of decision trees could be improved by means of soft discretization.
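The contrast between a crisp and a soft cut point can be illustrated with a minimal sketch. It is not the authors' algorithm: the linear ramp membership function, the `width` parameter, the weighted blending of leaf possibilities, and the class names are all assumptions chosen for illustration.

```python
def soft_membership(x, cut, width):
    """Degree (0..1) to which x lies on the 'high' side of a cut point.

    Assumption: a linear ramp of the given width centred on the cut.
    With width <= 0 this reduces to a classical crisp cut point.
    """
    if width <= 0:
        return 1.0 if x >= cut else 0.0
    lower_edge = cut - width / 2
    degree = (x - lower_edge) / width
    return min(1.0, max(0.0, degree))


def classify_soft(x, cut, width, low_leaf, high_leaf):
    """Blend the class possibilities of the two branches of one soft split.

    low_leaf and high_leaf map class names to possibility values.
    Assumption: branches are combined by membership-weighted sum.
    """
    mu_high = soft_membership(x, cut, width)
    mu_low = 1.0 - mu_high
    classes = set(low_leaf) | set(high_leaf)
    return {
        c: mu_low * low_leaf.get(c, 0.0) + mu_high * high_leaf.get(c, 0.0)
        for c in classes
    }


# Hypothetical leaves for a single split on one continuous attribute.
low_leaf = {"normal": 1.0}
high_leaf = {"worn": 1.0}

# A value just below the cut still receives some possibility of "worn",
# so a small amount of noise does not flip the result completely.
result = classify_soft(4.9, cut=5.0, width=2.0,
                       low_leaf=low_leaf, high_leaf=high_leaf)
```

With `width=0` the same functions behave like a crisp split, assigning the full possibility to exactly one branch, which makes the sensitivity to noise near the cut point easy to compare.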