A First-order Representation for Knowledge Discovery and Bayesian Classification on Relational Data
Nicolas Lachiche, Peter Flach. In: PKDD2000 workshop on Data Mining, Decision Support, Meta-learning and ILP: Forum for Practical Problem Presentation and Prospective Solutions. Pavel Brazdil, Alipio Jorge (eds.), pp. 49–60. September 2000.
In this paper we consider different representations for relational learning problems, with the aim of making ILP methods more applicable to real-world problems. In the past, ILP has tended to concentrate on the term representation, with the flattened Datalog representation treated as a 'poor man's version'. Database-oriented representations, such as the relational data model or the Entity-Relationship model, have received relatively little emphasis, even though much of the available data is stored in multi-relational databases. Even if we do not actually interface our ILP systems with a DBMS, we need to understand the database representation well enough to convert it to an ILP representation. Such conversions, and the relations between different representations, are the subject of this paper. We consider four representations: the Entity-Relationship model; the relational model; a flattened, individual-centred representation based on so-called ISP declarations, which we use for our ILP systems Tertius and 1BC; and the term-based representation. We argue that the term-based representation lacks some of the flexibility and expressiveness provided by the other representations. For instance, it cannot deal with graphs without partly flattening the data (i.e., introducing identifiers). Furthermore, there is no easy way of switching to another individual without converting the data, let alone learning with different individual types. The flattened representation has clear advantages in these respects.
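
The contrast between the term-based and the flattened representation can be illustrated with a minimal sketch. It is not taken from the paper: the molecule data, the predicate names (`mol2atom`, `element`, `charge`, `bond`) and the helper functions are all hypothetical, and the flat facts only approximate what an individual-centred encoding might look like.

```python
# Illustrative sketch only: the data, predicate names, and helpers below are
# assumptions made for this example, not the paper's actual encoding.

# Term-based representation: the individual is one nested term.
# A molecule is (name, list of atoms); each atom is (element, charge).
term_molecule = ("m1", [("c", -0.1), ("o", 0.3), ("c", -0.1)])


def flatten(term):
    """Convert a nested term into flat facts by introducing identifiers."""
    mol_id, atoms = term
    facts = []
    for index, (element, charge) in enumerate(atoms, start=1):
        atom_id = f"{mol_id}_a{index}"  # identifier introduced by flattening
        facts.append(("mol2atom", mol_id, atom_id))  # structural link
        facts.append(("element", atom_id, element))  # property of the atom
        facts.append(("charge", atom_id, charge))    # property of the atom
    return facts


flat_facts = flatten(term_molecule)

# A cyclic bond structure (a graph) does not fit a finite tree-shaped term,
# but once atoms have identifiers it is simply a set of facts over them.
bonds = [
    ("bond", "m1_a1", "m1_a2"),
    ("bond", "m1_a2", "m1_a3"),
    ("bond", "m1_a3", "m1_a1"),  # closes the cycle
]


def atoms_as_individuals(facts):
    """Read the same facts with atoms, rather than molecules, as individuals."""
    return sorted({fact[1] for fact in facts if fact[0] == "element"})
```

In this sketch, switching the individual from molecule to atom requires no data conversion: `atoms_as_individuals(flat_facts)` reads the same facts from the atom's point of view, whereas the nested term would have to be restructured.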