NLU | Data Analysis Software for Pharmaceutical Research and Biotech

Natural Language Understanding text mining technology

PharmaDM’s text mining method relies on natural language understanding (NLU) technology to bridge the gap between words and concepts, and between syntax and semantics. This technology makes it possible to search for relations between relevant concepts, for example:

<Protein> <is expressed> <location>
Monocyte chemoattractant protein (MCP)-4 expression in
the airways of patients with asthma […]

<Protein/Mol/Process (actor)> <influences> <Protein/Mol/Process (patient)>
Chemokines are chemotactic cytokines that play an important
role in recruiting leukocytes in allergic inflammation […]
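The two relation patterns above can be illustrated with a minimal sketch of template-based relation matching. It assumes a hypothetical term dictionary (`CONCEPTS`) standing in for a domain ontology and hand-written regex templates (`TEMPLATES`) with typed actor/patient slots. This is not PharmaDM's implementation, which also relies on syntactic parsing; it only shows how typed concept slots turn a sentence into a structured relation.

```python
import re
from dataclasses import dataclass

# Hypothetical term dictionary standing in for a domain ontology:
# surface form -> concept type. Not PharmaDM's actual ontology.
CONCEPTS = {
    "Monocyte chemoattractant protein (MCP)-4": "Protein",
    "chemokines": "Protein",
    "airways": "Location",
    "recruiting leukocytes": "Process",
}

# Hypothetical relation templates. Each {Type} slot is filled with an
# alternation of all known terms of that concept type.
TEMPLATES = {
    "is_expressed_in": r"(?P<actor>{Protein})\s+expression\s+in\s+(?:the\s+)?(?P<patient>{Location})",
    "influences": r"(?P<actor>{Protein}).*?\brole\s+in\s+(?P<patient>{Process})",
}


@dataclass
class Relation:
    name: str
    actor: str
    patient: str


def concept_pattern(concept_type: str) -> str:
    """Regex alternation matching any known term of the given type."""
    terms = sorted((t for t, ty in CONCEPTS.items() if ty == concept_type),
                   key=len, reverse=True)  # try longer terms first
    return "(?:" + "|".join(re.escape(t) for t in terms) + ")"


def extract_relations(sentence: str) -> list[Relation]:
    """Match every relation template against a single sentence."""
    slots = {ty: concept_pattern(ty) for ty in set(CONCEPTS.values())}
    found = []
    for name, template in TEMPLATES.items():
        pattern = template.format(**slots)
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            found.append(Relation(name, m.group("actor"), m.group("patient")))
    return found


r1 = extract_relations("Monocyte chemoattractant protein (MCP)-4 expression in "
                       "the airways of patients with asthma")
r2 = extract_relations("Chemokines are chemotactic cytokines that play an important "
                       "role in recruiting leukocytes in allergic inflammation")
print(r1)
print(r2)
```

Matching is case-insensitive, but the returned actor and patient keep the spelling found in the text. A regex template is brittle compared with parsing-based extraction; it is used here only because it keeps the example short.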

Unlike standard search engines, which rely on a ‘bag-of-words’ approach, our solutions(*) are built on a meta-query engine that expands high-level, concept-based queries (e.g. interactions with G protein-coupled receptors) into specific queries against individual sources (e.g. PubMed queries).
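Expanding a concept-level query into a source-specific query can be sketched as follows, assuming a hypothetical synonym dictionary (`ONTOLOGY`) and PubMed as the target source. Each concept becomes an OR over its synonyms restricted to title/abstract (`[tiab]`, a standard PubMed field tag), and the concepts are combined with AND. The URL builder targets NCBI's E-utilities `esearch` endpoint but only constructs the request; it does not send it. The synonym lists are illustrative, not a curated vocabulary.

```python
from urllib.parse import urlencode

# Hypothetical concept dictionary (assumption): concept -> synonyms to search.
ONTOLOGY = {
    "interaction": ["interaction", "binding"],
    "GPCR": ["G protein-coupled receptor", "GPCR", "seven-transmembrane receptor"],
}

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def expand_concept(concept: str) -> str:
    """OR together every synonym of one concept, restricted to title/abstract."""
    return "(" + " OR ".join(f'"{term}"[tiab]' for term in ONTOLOGY[concept]) + ")"


def to_pubmed_query(concepts: list[str]) -> str:
    """AND the expanded concepts into one PubMed boolean query."""
    return " AND ".join(expand_concept(c) for c in concepts)


def to_esearch_url(query: str) -> str:
    """Build (but do not send) an esearch request URL for PubMed."""
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": query})


query = to_pubmed_query(["interaction", "GPCR"])
print(query)
print(to_esearch_url(query))
```

A bag-of-words engine would match only the literal words typed by the user; here a single concept such as `GPCR` reaches documents that use any of its listed synonyms.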

(*) The approach taken by AlphaDMax™ can be represented schematically as follows:

Some key features are:

Retrieved documents are ranked and filtered according to semantic criteria
Sentences undergo shallow syntactic parsing before information extraction (IE)
Domain-specific knowledge is incorporated through an ontology for genomics and proteomics database annotation (XML)
The technique scales to large volumes of text and can be adapted to the proteomics application domain
The emphasis is on extracting relations of interest
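Ranking and filtering by semantic criteria can be illustrated with a minimal sketch that scores each document by the relations extracted from it rather than by keyword frequency. The relation names, the weights in `RELATION_WEIGHTS` and the `Document` type are hypothetical; the product's actual semantic criteria are not described here.

```python
from dataclasses import dataclass, field

# Hypothetical weights: how much each extracted relation type matters to the
# current query. Relation types not listed here score zero.
RELATION_WEIGHTS = {"influences": 2.0, "is_expressed_in": 1.0}


@dataclass
class Document:
    doc_id: str
    relations: list[str] = field(default_factory=list)  # relation names found by IE


def semantic_score(doc: Document, weights: dict[str, float]) -> float:
    """Sum the weights of all relations extracted from the document."""
    return sum(weights.get(name, 0.0) for name in doc.relations)


def rank_and_filter(docs: list[Document],
                    weights: dict[str, float] = RELATION_WEIGHTS) -> list[Document]:
    """Drop documents with no relation of interest, then sort by score."""
    kept = [(semantic_score(d, weights), d) for d in docs]
    kept = [(s, d) for s, d in kept if s > 0]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in kept]


docs = [
    Document("d1", ["is_expressed_in"]),
    Document("d2", []),
    Document("d3", ["influences", "is_expressed_in"]),
    Document("d4", ["co_occurs"]),
]
ranked = rank_and_filter(docs)
print([d.doc_id for d in ranked])
```

Documents that merely mention the right words without stating a relation of interest (here `d2` and `d4`) are filtered out rather than ranked low.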