25 Nov 2005 - Meet PharmaDM - BioScope-IT kickoff meeting - 25 November 2005
Extracting explanations for experimental data from biological and chemical databases
Dr. Luc Dehaspe, PharmaDM NV, Leuven (Heverlee), Belgium
"Scientists use observations and reasoning to propose tentative explanations for natural phenomena, termed hypotheses. Predictions from these hypotheses are tested by various experiments [...]" [1]. The need for "observations" and "experiments" has inspired the development of ever more efficient and reliable measurement technology, in particular in molecular biology and biochemistry. The expression levels of large numbers of genes can now be measured simultaneously, and the biological activity of entire small-molecule libraries can be recorded in high-throughput mode.
I will focus on the need for "tentative explanations" of experimental data, i.e. the hypothesis formulation step. This step, too, urgently needs technological assistance, for two reasons. The first reason, inevitably cited in Data Mining propaganda, is that the amount of experimental data is "overwhelming". The second, largely ignored, reason is the high number of candidate hypotheses, which in turn grows with the amount of information available on the measured objects. Protein pathway information, once available, can be considered in hypotheses that explain differences in expression levels. Electron flow concepts, once understood, can contribute to an explanation of differences in biological activity observed in compound libraries. Relevant background knowledge tends to extend beyond a single researcher's area of expertise, which limits that researcher's capacity to formulate valid hypotheses.
I will present innovative Relational Data Mining technology that autonomously draws elements of background knowledge from biological and chemical databases and combines these into explanations of experimental data, i.e. hypotheses. Users of this hypothesis formulation system tap into the joint expertise of the biologists and chemists who contributed to the input databases. I will explain how this technology copes with missing background knowledge and experimental variation, and show how the generated hypotheses can be validated, visualized, and used by molecular biologists and biochemists.
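To make the idea of combining background knowledge into candidate hypotheses concrete, here is a minimal toy sketch in Python. It is not PharmaDM's technology: the gene names, pathway labels, relation names, and the functions `candidate_hypotheses`, `score`, and `rank_hypotheses` are all hypothetical and invented for illustration. Each background fact yields a candidate hypothesis of the form "genes meeting this condition are upregulated", which is then scored against the measurements. Genes without background facts are simply not covered by any hypothesis.

```python
from collections import defaultdict

# Hypothetical toy data, invented for illustration only.
# Experimental observations: gene -> was it measured as upregulated?
observations = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": True}

# Background knowledge as relational facts: (relation, gene, value).
background = [
    ("in_pathway", "g1", "P1"),
    ("in_pathway", "g2", "P1"),
    ("in_pathway", "g3", "P2"),
    ("in_pathway", "g4", "P2"),
    ("in_pathway", "g5", "P1"),
    ("localized_in", "g1", "nucleus"),
    ("localized_in", "g3", "nucleus"),
    ("localized_in", "g5", "membrane"),
]


def candidate_hypotheses(facts):
    """Group genes by every (relation, value) condition in the background facts."""
    covered = defaultdict(set)
    for relation, gene, value in facts:
        covered[(relation, value)].add(gene)
    return covered


def score(genes, obs):
    """Return (support, precision) for 'genes meeting the condition are upregulated'."""
    measured = [g for g in genes if g in obs]
    if not measured:
        return 0, 0.0
    hits = sum(obs[g] for g in measured)
    return len(measured), hits / len(measured)


def rank_hypotheses(facts, obs, min_support=2):
    """Score every candidate condition and sort by precision, then support."""
    ranked = []
    for condition, genes in candidate_hypotheses(facts).items():
        support, precision = score(genes, obs)
        if support >= min_support:
            ranked.append((condition, support, precision))
    return sorted(ranked, key=lambda r: (-r[2], -r[1], r[0]))


if __name__ == "__main__":
    for condition, support, precision in rank_hypotheses(background, observations):
        print(condition, support, precision)
```

Even in this toy setting, every additional relation in the background data adds candidate conditions, which illustrates why the number of hypotheses grows with the information available on the measured objects. A real relational data mining system would also combine several conditions per hypothesis and apply statistical validation, which this sketch omits.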
[1] "Scientific method." Wikipedia: The Free Encyclopedia. 17 November 2005
22 slides in PDF format