PharmaDM - Software for Biotech and Pharma Research



Relational Data Mining

1. Usage of the original, natural data as input

Conventional data mining systems cannot fully exploit highly complex biological, chemical and clinical data. Any information fed into these systems must first be reduced to a list of descriptors (the one-vector-per-example approach) that can be stored as a single row in a table. This reduction step introduces a severe bottleneck in knowledge discovery.

This approach is problematic, especially if:

Deprived of relational information, conventional data mining techniques may miss key insights. PharmaDM avoids this harmful reduction step by tapping its relational data mining technology directly into the original database. Complex descriptors are generated automatically.
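To illustrate what the reduction step can lose, here is a minimal sketch in Python. The toy tables, the molecule ids and the helper names `flatten` and `bonded_pairs` are assumptions made up for this example and are not part of any PharmaDM product. Two molecules with the same atom counts collapse to the same descriptor vector, while the relational view (which atom is bonded to which) still tells them apart.

```python
# Hypothetical toy data: two molecules stored relationally as atom and bond rows.
# Hydrogens are left out to keep the example small.
atoms = [
    # (molecule_id, atom_id, element)
    ("m1", 1, "C"), ("m1", 2, "C"), ("m1", 3, "O"),
    ("m2", 1, "C"), ("m2", 2, "O"), ("m2", 3, "C"),
]
bonds = [
    # (molecule_id, atom_id_a, atom_id_b)
    ("m1", 1, 2), ("m1", 2, 3),  # C-C-O
    ("m2", 1, 2), ("m2", 2, 3),  # C-O-C
]


def flatten(molecule_id):
    """Reduce a molecule to one fixed-length vector: (carbon count, oxygen count)."""
    elements = [e for m, _, e in atoms if m == molecule_id]
    return (elements.count("C"), elements.count("O"))


def bonded_pairs(molecule_id):
    """Relational view: the sorted list of element pairs joined by a bond."""
    element_of = {a: e for m, a, e in atoms if m == molecule_id}
    return sorted(
        tuple(sorted((element_of[a], element_of[b])))
        for m, a, b in bonds
        if m == molecule_id
    )


# Both molecules flatten to the same row, but their bond relations differ.
print(flatten("m1"), flatten("m2"))
print(bonded_pairs("m1"), bonded_pairs("m2"))
```

The descriptor here is deliberately crude; richer hand-made descriptors narrow the gap, but each one has to be designed in advance, which is the bottleneck described above.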

2. Extraction of comprehensible knowledge from the data

The patterns found in classical data mining are typically hard to interpret and not readily applicable, such as a mathematical equation expressing a structure-activity relationship:
e.g. Activity = 1.7 + 2.2 × (Hydrophobicity) + 1.8 × (Has_3_Fused_Rings)
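As a small worked example, the equation can be evaluated in Python as follows. The function name is hypothetical, and treating Has_3_Fused_Rings as a 0/1 indicator is an assumption; the source does not say how that descriptor is encoded.

```python
def predicted_activity(hydrophobicity: float, has_3_fused_rings: bool) -> float:
    """Evaluate the example structure-activity equation from the text."""
    # Assumption: the fused-rings descriptor is a 0/1 indicator variable.
    indicator = 1.0 if has_3_fused_rings else 0.0
    return 1.7 + 2.2 * hydrophobicity + 1.8 * indicator


# A molecule with hydrophobicity 1.0 and three fused rings.
print(predicted_activity(1.0, True))
```

The result is a single number; the equation says nothing about which part of the molecule is responsible for it.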

Relational data mining is highly expressive, so even very complex patterns can emerge as logical rules in plain English, e.g.

    IF the molecule contains a benzene ring 
    with two substituents in meta position, 
    one of which is a methyl group
    THEN the molecule is active
    OR ELSE the molecule is inactive.
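A rule of this kind translates almost directly into a predicate over a relational description of the molecule. The sketch below is only an illustration: the `BenzeneRing` class, the position numbering 1 to 6 and the substituent names are assumptions, and the rule is read as "some pair of substituents is in meta position and one of the pair is a methyl group".

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class BenzeneRing:
    # Hypothetical representation: ring position (1-6) -> substituent name.
    substituents: dict[int, str] = field(default_factory=dict)


def is_meta(pos_a: int, pos_b: int) -> bool:
    """Two ring positions are meta when they are two steps apart around the ring."""
    return (pos_a - pos_b) % 6 in (2, 4)


def is_active(rings: list[BenzeneRing]) -> bool:
    """Apply the example rule; a molecule is given as its list of benzene rings."""
    for ring in rings:
        for (pos_a, sub_a), (pos_b, sub_b) in combinations(ring.substituents.items(), 2):
            if is_meta(pos_a, pos_b) and "methyl" in (sub_a, sub_b):
                return True  # THEN the molecule is active
    return False  # OR ELSE the molecule is inactive


meta_ring = BenzeneRing({1: "methyl", 3: "chloro"})
para_ring = BenzeneRing({1: "methyl", 4: "chloro"})
print(is_active([meta_ring]), is_active([para_ring]))
```

The condition depends on the relative position of two substructures, which is exactly the kind of relationship a single row of descriptors cannot express without a purpose-built column.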

Relationships between data (such as the position of substructures) can be easily expressed and, if applicable, represented graphically. The rules are comprehensible for scientists and can be readily applied in further research. With relational data mining, biochemists and biologists no longer need to master 'computer languages'.

3. Incremental knowledge enhancement through incorporation of background information

Because of the ability to mine complex relational databases directly, background knowledge can easily be incorporated in the simple form of extra tables in the database.

This means that first-order logic allows researchers' expertise, previously found patterns and knowledge from the public domain to be included in the data mining process. The output of one round of mining therefore becomes the input for the next round.

The knowledge base is thus constantly and incrementally enhanced.
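The data plumbing behind this loop can be sketched with Python's standard `sqlite3` module. All table names, column names and rows below are hypothetical, and the query merely stands in for a real mining step; the point is only that background knowledge arrives as an extra table and that a finding is written back as another table for the next round.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE molecule (id TEXT PRIMARY KEY, active INTEGER);
    CREATE TABLE contains_group (molecule_id TEXT, group_name TEXT);
    INSERT INTO molecule VALUES ('m1', 1), ('m2', 0);
    INSERT INTO contains_group VALUES ('m1', 'methyl'), ('m2', 'nitro');
""")


def add_background(conn, rows):
    """Add background knowledge as an extra table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS group_property (group_name TEXT, property TEXT)"
    )
    conn.executemany("INSERT INTO group_property VALUES (?, ?)", rows)


def properties_of_active(conn):
    """Stand-in for a mining step: properties of groups found in active molecules."""
    query = """
        SELECT DISTINCT gp.property
        FROM molecule m
        JOIN contains_group cg ON cg.molecule_id = m.id
        JOIN group_property gp ON gp.group_name = cg.group_name
        WHERE m.active = 1
        ORDER BY gp.property
    """
    return [row[0] for row in conn.execute(query)]


add_background(conn, [("methyl", "electron_donating"), ("nitro", "electron_withdrawing")])
found = properties_of_active(conn)

# The output of this round is stored as a table, ready to be joined in the next round.
conn.execute("CREATE TABLE discovered_pattern (property TEXT)")
conn.executemany("INSERT INTO discovered_pattern VALUES (?)", [(p,) for p in found])
print(found)
```

No existing table had to be altered to add the background knowledge, which is what makes this form of incremental enhancement cheap.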



©2000-2007 PharmaDM.