Logo[ Bristol CS | Index | ML group | Peter Flach | Papers | Presentations ]

Program for the discovery of multivalued dependencies from relations

I.Savnik, P.A.Flach



Introduction

Dependencies between attributes of a database relation express the presence of structure in that relation. In particular, the existence of a multivalued dependency X->>Y in a relation r(R), where X and Y are non-overlapping subsets of R, denotes that for each possible value of attributes X, there exist no associations between the values of attributes from Y and Z=R-X-Y. As a consequence, the relation r(R) can be decomposed into relations r1(XY)=Project[XY](r) and r2(XZ)=Project[XZ](r) without loss of the information. The decomposition of r into r1 and r2 makes explicit the internal structure of relation r. Furthermore, the new representation requires less storage space than the complete relation r.


Algorithms

The discovery of multivalued dependencies from relations is viewed as a search in a hypothesis space defined in accordance with the generalisation relationship among the multivalued dependencies. Two algorithms for the discovery of multivalued dependencies from relations are defined. The top-down algorithm enumerates the hypotheses from the most general to more specific hypotheses which are checked on the input relation. The bottom-up algorithm first computes the set of invalid multivalued dependencies. Starting with the most general dependencies, the bottom-up algorithm iteratively refines the set of dependencies to conform with each particular invalid dependency.


Applications

Traditionally, database dependencies were considered to be part of the data model provided by the database designer. One reason for the discovery of multivalued dependencies from extensional data can be that the data model, or parts of it, has been lost or is no longer accurate, so that reverse engineering is required. Another reason may be that certain dependencies were not foreseen by the database designer, but do occur in practice. The discovered dependencies may be utilised for restructuring the database but also for query optimisation.

Furthermore, the discovery of multivalued dependencies (as well as functional dependencies) can assist in the process of discovery of other types of interesting patterns from relational databases by revealing the internal structure of datasets. The discovered multivalued dependencies can be utilised for planning/guiding the overall process of knowledge discovery, and for restructuring the relations.


Program MDEP

The program MDEP includes implementations of the top-down and bottom-up algorithms. The program is implemented in Sicstus Prolog. The MDEP distribution includes the source code and a short description of the program. If you have comments or suggestions related to MDEP, you can send an email to savnik@informatik.uni-freiburg.de.


Publications

  1. Discovery of multivalued dependencies from relations.
    I.Savnik, P.A.Flach.
    Intelligent Data Analysis Journal, Vol.4, No.3, 4(2000), pp.195-211.
    Also: Technical Report 135, Institut für Informatik, Universität Freiburg, Mar 2000.
    [Abstract] [Postscript]

  2. Database dependency discovery: a machine learning approach.
    P.A.Flach, I.Savnik.
    AI Communications, Vol.12, No.3, IOS Press, 1999, pp.139-160.
    [ Postscript]


Related work


P A Flach, Peter.Flach@bristol.ac.uk. Last modified on Tuesday 17 July 2001 at 18:20. © 2001 University of Bristol