PharmaDM - Software for Biotech and Pharma Research

Decision Trees


A classification tree is a type of decision tree: a data mining model that classifies examples on the basis of their properties.





Figure 1 Example of a decision tree. Examples belonging to different classes have to be separated into pure nodes (nodes whose examples all belong to one class). This is done by applying queries to the data (shown in blue text).

In this example, peptides must be classified according to the MHC molecule they bind to. The peptide properties available for grouping examples of the same class are the peptide's primary structure and the properties of its amino acids.
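A minimal Python sketch of how such examples and queries could be represented, assuming a peptide is stored as its one-letter sequence plus a class label. The names `Peptide`, `residue_is`, `residue_is_hydrophobic` and `HYDROPHOBIC` are hypothetical, the hydrophobic residue set is one simplified grouping chosen for illustration, and the peptides and labels are invented toy data, not real binding measurements. One query family tests primary structure (a given residue at a given position); the other tests an amino acid property at a position.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a simplified grouping of hydrophobic residues (one-letter codes).
# Real property scales differ; this set is illustrative only.
HYDROPHOBIC = frozenset("AVILMFW")

@dataclass(frozen=True)
class Peptide:
    sequence: str       # primary structure as one-letter amino acid codes
    binding_class: str  # MHC molecule the peptide binds to (hypothetical labels)

# A query is a yes/no test on one example.
Query = Callable[[Peptide], bool]

def residue_is(position: int, amino_acid: str) -> Query:
    """Primary-structure query: the residue at a 1-based position equals amino_acid."""
    def query(p: Peptide) -> bool:
        return position <= len(p.sequence) and p.sequence[position - 1] == amino_acid
    return query

def residue_is_hydrophobic(position: int) -> Query:
    """Amino-acid-property query: the residue at a 1-based position is hydrophobic."""
    def query(p: Peptide) -> bool:
        return position <= len(p.sequence) and p.sequence[position - 1] in HYDROPHOBIC
    return query

# Invented toy rows standing in for a table of examples (not real binding data).
examples = [
    Peptide("ALAAAAAAL", "MHC_1"),
    Peptide("GLGGGGGGV", "MHC_1"),
    Peptide("KRAAAAAAK", "MHC_2"),
    Peptide("RKGGGGGGR", "MHC_2"),
]

q = residue_is_hydrophobic(9)
for p in examples:
    print(p.sequence, p.binding_class, "succeeds" if q(p) else "fails")
```

On these toy rows, the property query "residue 9 is hydrophobic" succeeds for both `MHC_1` peptides and fails for both `MHC_2` peptides. Building queries from small factory functions keeps each one a plain yes/no predicate, which is all the tree-building step needs.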





Figure 2 Example of a database used for building a decision tree

Queries on the peptide properties can be run against the examples. Each query splits them into peptides for which the query succeeds and peptides for which it fails. The goal is to divide the peptides into "pure" groups, i.e. tree nodes whose peptides all belong to the same binding class.
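The splitting step can be sketched in Python as follows. This is a minimal illustration, not PharmaDM's algorithm: the text does not say how queries are chosen, so this sketch swaps in a common greedy heuristic (pick the query whose two sides have the lowest weighted Gini impurity, as in CART-style trees) and recurses until every node is pure or no query separates the remaining examples. `Example`, `Query`, `split`, `is_pure`, `gini` and `build_tree` are hypothetical names, and the peptides, labels and queries are invented toy data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, NamedTuple, Tuple, Union

class Example(NamedTuple):
    sequence: str  # peptide primary structure
    label: str     # binding class (hypothetical labels)

class Query(NamedTuple):
    name: str
    test: Callable[[str], bool]  # True = query succeeds, False = query fails

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    query: Query
    if_success: "Union[Leaf, Node]"
    if_failure: "Union[Leaf, Node]"

def split(examples: List[Example], query: Query) -> Tuple[List[Example], List[Example]]:
    """Divide examples into those for which the query succeeds and those for which it fails."""
    succeeded = [e for e in examples if query.test(e.sequence)]
    failed = [e for e in examples if not query.test(e.sequence)]
    return succeeded, failed

def is_pure(examples: List[Example]) -> bool:
    """A node is pure when all of its examples share one class."""
    return len({e.label for e in examples}) <= 1

def gini(examples: List[Example]) -> float:
    """Gini impurity: 0.0 for a pure group, larger for more mixed groups."""
    n = len(examples)
    if n == 0:
        return 0.0
    counts = Counter(e.label for e in examples)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def build_tree(examples: List[Example], queries: List[Query]) -> Union[Leaf, Node]:
    """Greedily pick the query with the purest split, then recurse on both sides."""
    majority = Counter(e.label for e in examples).most_common(1)[0][0]
    # Only queries that put at least one example on each side are useful here.
    candidates = [q for q in queries if all(split(examples, q))]
    if is_pure(examples) or not candidates:
        return Leaf(majority)

    def weighted_impurity(q: Query) -> float:
        succeeded, failed = split(examples, q)
        return (len(succeeded) * gini(succeeded) + len(failed) * gini(failed)) / len(examples)

    best = min(candidates, key=weighted_impurity)
    succeeded, failed = split(examples, best)
    remaining = [q for q in queries if q is not best]
    return Node(best, build_tree(succeeded, remaining), build_tree(failed, remaining))

# Invented toy data (not real MHC binding data).
examples = [
    Example("ALAAAAAAL", "MHC_1"),
    Example("GLGGGGGGV", "MHC_1"),
    Example("KRAAAAAAK", "MHC_2"),
    Example("RKGGGGGGR", "MHC_2"),
]
queries = [
    Query("position 2 is L", lambda s: s[1] == "L"),
    Query("position 9 is K or R", lambda s: s[8] in "KR"),
    Query("position 1 is A", lambda s: s[0] == "A"),
]

tree = build_tree(examples, queries)
print(tree.query.name)
```

Here the first two queries both produce pure groups (weighted impurity 0), and `min` keeps the first one in list order, so the root query is "position 2 is L" with a pure `MHC_1` leaf on the success side and a pure `MHC_2` leaf on the failure side.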





Figure 3 Use of queries to divide examples into pure class nodes

A classification tree can be expressed as a set of rules stating which combination of peptide properties an example needs in order to belong to a certain class. A rule is read from the tree by combining all queries needed to reach a pure node, together with whether each one succeeded or failed, that is, by following one branch of the tree from the root to a leaf node. These rules can then be used to predict the class of new, unclassified examples.
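Rule extraction and prediction can be sketched in Python, assuming a binary tree whose internal nodes hold a named yes/no query and whose leaves hold a class label. The classes `Leaf`, `Node` and `Rule`, the functions `extract_rules` and `predict`, and the small tree below are hypothetical illustrations with invented labels. Each root-to-leaf path becomes one rule; a failed query appears as a negated condition. Because every sequence follows exactly one path, exactly one rule matches it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    name: str                     # human-readable query, e.g. "position 2 is L"
    test: Callable[[str], bool]   # True = query succeeds on a sequence
    if_success: "Union[Leaf, Node]"
    if_failure: "Union[Leaf, Node]"

# One condition on a branch: (query name, query test, required outcome).
Condition = Tuple[str, Callable[[str], bool], bool]

@dataclass
class Rule:
    conditions: List[Condition]
    label: str

    def matches(self, sequence: str) -> bool:
        return all(test(sequence) == outcome for _, test, outcome in self.conditions)

    def __str__(self) -> str:
        parts = [name if outcome else f"NOT ({name})" for name, _, outcome in self.conditions]
        return f"IF {' AND '.join(parts) or 'TRUE'} THEN {self.label}"

def extract_rules(tree: Union[Leaf, Node], path: Optional[List[Condition]] = None) -> List[Rule]:
    """Turn every root-to-leaf branch into one rule."""
    path = path or []
    if isinstance(tree, Leaf):
        return [Rule(list(path), tree.label)]
    return (extract_rules(tree.if_success, path + [(tree.name, tree.test, True)])
            + extract_rules(tree.if_failure, path + [(tree.name, tree.test, False)]))

def predict(rules: List[Rule], sequence: str) -> Optional[str]:
    """Return the class of the first rule whose conditions all hold."""
    for rule in rules:
        if rule.matches(sequence):
            return rule.label
    return None

# Hypothetical tree with invented class labels.
tree = Node("position 2 is L", lambda s: s[1] == "L",
            if_success=Leaf("MHC_1"),
            if_failure=Node("position 9 is K or R", lambda s: s[8] in "KR",
                            if_success=Leaf("MHC_2"),
                            if_failure=Leaf("MHC_3")))

rules = extract_rules(tree)
for rule in rules:
    print(rule)
# IF position 2 is L THEN MHC_1
# IF NOT (position 2 is L) AND position 9 is K or R THEN MHC_2
# IF NOT (position 2 is L) AND NOT (position 9 is K or R) THEN MHC_3
```

For a new peptide such as `KRAAAAAAK`, the first query fails (residue 2 is R) and the second succeeds (residue 9 is K), so `predict(rules, "KRAAAAAAK")` returns `"MHC_2"`, the same class reached by walking the tree directly.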





Figure 4 Example of a rule derived from a tree by combining queries along one branch of the tree.


©2000-2007 PharmaDM.