BioMinT use cases
|
Version |
1.4.1 |
|
Date |
18/07/03 |
|
Authors |
Andre Vandecandelaere, Kristof Van Belleghem |
Overview:
Actors:
SwissProt annotator (primary), PRINTS annotator (primary), researcher (primary), information source (secondary), SwissProt database (secondary), PRINTS database (secondary), model builder (secondary), administrator (secondary)
Diagram:

High-level descriptions:
(I) Start up:
|
Use case: |
Start Up |
|
Actors: |
SwissProt annotator, PRINTS annotator, researcher, model builder (trainer) |
|
Purpose: |
Start up the BioMinT text mining tool taking account of the type of user. |
|
Overview: |
A user starts the BioMinT program and identifies himself/herself as an administrator, an annotator, a researcher or a model builder. If administrator, the user administration facilities are activated. If annotator, then the facilities for protein annotation and gathering of protein information are activated. If researcher, then only protein information gathering functions are activated. If model builder, then the facilities for building a model (training a component) are activated. |
|
Type: |
primary |
|
Cross References: |
Functions:
Use cases: |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case begins when a user (an administrator, an annotator, a researcher or a model builder) wants to use the BioMinT tool (i.e. to modify user rights, to annotate a protein or protein family, to gather information about proteins from text documents, or to build and/or incorporate a new model). |
|
|
2. The user opens the BioMinT application and enters his/her user details. |
3. The system checks the user details. If the login details are correct, the system grants the corresponding user rights and presents a menu of allowed tasks for this user (i.e. a subset of protein annotation, protein fingerprint annotation, research assistance, model incorporation, user administration). |
|
4. The user selects a task. |
5. The system evaluates the selected task and starts the corresponding use case. |
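The role check in steps 3 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the BioMinT specification: the role keys, the `ROLE_TASKS` and `USERS` tables and both function names are hypothetical, and credential checking is reduced to a dictionary lookup.

```python
# Hypothetical mapping from user role to the tasks offered in step 3.
# Annotators also get research assistance, as stated in the overview.
ROLE_TASKS = {
    "swissprot_annotator": ["protein annotation", "research assistance"],
    "prints_annotator": ["protein fingerprint annotation", "research assistance"],
    "researcher": ["research assistance"],
    "model_builder": ["model incorporation"],
    "administrator": ["user administration"],
}

# Hypothetical user store: user name -> (password, role).
# A real system would store salted password hashes, not plain text.
USERS = {
    "alice": ("secret", "swissprot_annotator"),
    "bob": ("hunter2", "researcher"),
}


def task_menu(user_name, password):
    """Step 3: return the list of allowed tasks, or None if the login fails."""
    record = USERS.get(user_name)
    if record is None or record[0] != password:
        return None
    return list(ROLE_TASKS[record[1]])


def start_task(menu, choice):
    """Step 5: accept a task only if it appears in the user's menu."""
    if choice not in menu:
        raise ValueError(f"task not allowed: {choice}")
    return choice
```

Keeping the role-to-task table as data rather than as branching code means that a new user type only requires a new table entry.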
(II) Protein annotation (SwissProt):
|
Use case: |
Annotate protein |
|
Actors: |
SwissProt annotator, Information source |
|
Purpose: |
Annotate a specified protein for entry in the SwissProt database. |
|
Overview: |
A SwissProt annotator specifies a protein of interest and annotates the protein. A SwissProt record for the protein is created containing information in the following fields: description (DE), gene name (GN), organism species (OS), organelle of origin (OG), tokens for plasmid, species, strain, tissue or transposon (RC), comments (CC), keywords (KW) and sequence features (FT). The annotation process is interactive and makes use of information from external resources. |
|
Type: |
primary |
|
Cross References: |
Functions:
Use cases: The SwissProt annotator must have completed the “Start Up” use case. Uses “Retrieve external information”. |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case begins when the SwissProt annotator has chosen to annotate a SwissProt entry. |
2. The system prompts the user to enter a new protein name, a gene name, or a TrEMBL accession number. |
|
3. The SwissProt annotator enters a new protein name, a gene name, or a TrEMBL accession number. |
4. The system retrieves the corresponding entry from SwissProt (ExPASy) or TrEMBL. If an entry is found, the system sends requests to gather additional information (synonyms, corresponding gene or protein) to various external sources (e.g. HUGO). Otherwise, the user is informed that no entry was found, and the use case ends here. |
|
5. For each external information request, the relevant external information source provides the requested information. |
6. The system produces a list of additional names and query modifiers (function, organism, mutations, disease, PTMs, splice variants, polymorphisms, ...) and presents them to the user in a menu. |
|
7. The SwissProt annotator selects an information source (typically PubMed, but possibly also other document repositories or information sources) and the elements (names, sequences, ...), modifiers and filters (period, maximum number, ... ) that need to be used to compose the query. |
8. Using the selected elements and modifiers, the system composes a query for the chosen information source and sends it to the source. |
|
9. The selected external information source provides the requested information. |
10. The system receives the information and presents it to the SwissProt annotator. If articles are returned by a document repository (PubMed, ...), they are sorted according to some relevance measure (content, journal type, research lab, ...), and a list of selected articles is presented to the SwissProt annotator. |
|
11. If the annotator is not satisfied with the set of returned documents, he/she returns to step 7 and adds a new query or refines the previous one. Otherwise the use case proceeds with step 12. |
|
|
12. If the returned information consists of free text (text fields, list of articles), the SwissProt annotator selects the text for information extraction and the type of information that needs to be extracted. |
13. The indicated information is extracted from the selected text in the format required by the information type. For each type of information, the system shows the extracted information together with its source. |
|
14. The SwissProt annotator validates the extracted information. If multiple pieces of information are available the SwissProt annotator enters a synthesis of that information. |
15. The system records the validated information in the appropriate fields of the Swiss-Prot entry. |
|
16. If the information obtained thus far is not sufficient to complete the novel protein entry, the SwissProt annotator repeats the cycle from step 3 downwards. If all the required information has been gathered, the SwissProt annotator validates the new protein entry. |
17. The system stores the new protein entry. If the entry was stored successfully, the system produces a success message. If not, the system produces an error message. |
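Step 8, where the system composes a query from the selected names, modifiers and filters, might look like the sketch below. The boolean syntax is generic and is not the query language of PubMed or any other repository. The function name `compose_query` and its parameters are assumptions made for illustration.

```python
def compose_query(names, modifiers=(), max_results=None, period=None):
    """Build a boolean query string plus a dict of filters.

    names       -- protein/gene names and synonyms, combined with OR
    modifiers   -- topic words (e.g. "function", "disease"), combined with AND
    max_results -- optional cap on the number of returned documents
    period      -- optional publication period, passed through unchanged
    """
    if not names:
        raise ValueError("at least one name is required")
    name_clause = " OR ".join(f'"{n}"' for n in names)
    query = f"({name_clause})"
    for modifier in modifiers:
        query += f' AND "{modifier}"'
    # Only filters the annotator actually set are passed on.
    filters = {}
    if max_results is not None:
        filters["max_results"] = max_results
    if period is not None:
        filters["period"] = period
    return query, filters
```

Synonyms are joined with OR because any of them may identify the protein, while each modifier narrows the result set and is therefore joined with AND.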
(III) Protein fingerprint annotation (PRINTS):
|
Use case: |
Annotate protein fingerprint |
|
Actors: |
PRINTS annotator, Information source |
|
Purpose: |
Annotate a specified protein for entry in the PRINTS protein fingerprint database. |
|
Overview: |
A PRINTS annotator selects a naked fingerprint for a protein family, super-family or protein domain of interest and annotates that family, super-family or domain. An annotated fingerprint is stored in the local file system. The annotation process is interactive and makes use of information from external resources. |
|
Type: |
primary |
|
Cross References: |
Functions:
Use cases: The PRINTS annotator must have completed the “Start Up” use case. Includes the Annotate Family, Annotate Super-family and Annotate Domain use cases. |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case starts when the PRINTS annotator has selected to annotate a PRINTS protein fingerprint. |
2. The system prompts the PRINTS annotator to enter a reference to a naked fingerprint (ffmt file name). |
|
3. The PRINTS annotator enters the name of a (naked) fingerprint. |
4. The system requests the specified naked fingerprint from the local file system.
|
|
5. The file system provides the requested (naked) fingerprint. |
6. Using the SwissProt references in the naked fingerprint, the system sends requests for the corresponding entries from SwissProt. |
|
7. For each request, SwissProt returns the appropriate protein record. |
8. The system analyses the received protein records and retrieves referenced articles from the appropriate information sources. |
|
9. Each information source sends the requested articles. |
10. The system analyses the protein records and articles. As long as it has insufficient information to propose a fingerprint type but sufficient information to compose new queries, it sends those queries to the relevant information sources and the use case returns to step 9. Otherwise the use case proceeds. |
|
|
11. If the system has sufficient information to make a reasonable proposal, it proposes a fingerprint type (family, super-family, domain), together with a motivation of the proposal. Otherwise, a default type (family) is proposed, with the motivation "default". |
|
12. The PRINTS annotator validates the proposed fingerprint type or specifies the type: a. If 'family' fingerprint, initiate 'Annotate family'. b. If 'super-family' fingerprint, initiate 'Annotate super-family'. c. If 'domain' fingerprint, initiate 'Annotate domain'. |
13. After completion of the relevant use case, the system proposes the annotation report. |
|
14. If the proposed annotation report is incomplete, the PRINTS annotator edits the report. The following additional requests for selected information may be sent: a. retrieval queries to external information sources, b. extraction queries for selected sets of text, c. instructions to fill in particular information in specified fields. |
15. For each additional request, the system retrieves or extracts the requested information, or records the specified information. A modified annotation report is proposed. |
|
16. The PRINTS annotator validates the proposed annotation report. |
17. The system stores the validated annotation (in the extended format with all available information). |
|
|
|
|
Alternative course of events |
|
|
3. When presenting the name of a fingerprint, the user can specify that the tool should run automatically. In that case, all interaction with the user for the remainder of the use case is suppressed, and an "accept" answer is assumed for all queries.
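The type proposal of step 11, the validation of step 12 and the automatic mode of the alternative course can be sketched together. The `evidence` structure and the rule "pick the type with the most supporting statements" are assumptions; the specification only requires a proposal with a motivation and a default of "family".

```python
VALID_TYPES = ("family", "super-family", "domain")


def propose_type(evidence):
    """Step 11: return (fingerprint_type, motivation).

    `evidence` is a hypothetical dict mapping a fingerprint type to a list
    of supporting statements. With no evidence, the default is proposed.
    """
    supported = {t: e for t, e in evidence.items() if t in VALID_TYPES and e}
    if not supported:
        return "family", "default"
    # Assumed rule: the type with the most supporting statements wins.
    best = max(supported, key=lambda t: len(supported[t]))
    return best, "; ".join(supported[best])


def resolve_type(proposal, annotator_choice=None, automatic=False):
    """Step 12: in automatic mode the proposal is accepted as is."""
    if automatic or annotator_choice is None:
        return proposal
    if annotator_choice not in VALID_TYPES:
        raise ValueError(f"unknown fingerprint type: {annotator_choice}")
    return annotator_choice
```

The resolved type then selects which of the three annotation use cases is initiated.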
|
|
Use case: |
Annotate family |
|
Actors: |
PRINTS annotator, Information source, PRINTS database |
|
Purpose: |
Annotate a specified protein family. |
|
Overview: |
|
|
Type: |
primary |
|
Cross References: |
Functions:
Use cases: initiated from 'Annotate protein fingerprint'. |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case starts when the PRINTS annotator has chosen to proceed with protein family annotation. The naked fingerprint and the SwissProt protein entries that have been retrieved earlier are used. |
2. The system identifies suitable external information sources for establishing an identifier (gc), a title (gt) and database links (gp). For each type of information, the system sends a request to the appropriate source (SwissProt, PubMed, etc). |
|
3. For each request, the external information source returns the requested information. |
4. The retrieved information is presented to the annotator together with proposals for an identifier (gc), a title (gt) and database links (gp). |
|
5. The annotator validates the proposals and/or specifies required information.
|
6. The system creates an identifier (gc), a title (gt) and database links (gp). |
|
|
7. The system checks the PRINTS database for a superfamily of this family, using an existing hierarchical sequence analyser. |
|
8. The PRINTS database returns candidate super-family names of the family, if any. |
9. If necessary, the system retrieves relevant SwissProt entries or literature articles from the corresponding information sources to help in determining the right superfamily. |
|
10. The addressed information sources return the requested entries. |
11. The system proposes a superfamily name if it can do so, and prompts the annotator to accept it or propose a name himself. |
|
12. The PRINTS annotator proposes or accepts a valid superfamily name. |
13. The system queries the PRINTS database for an entry with the given superfamily name. |
|
14. The PRINTS database returns the requested superfamily entry, if any. |
15. If there is a PRINTS super-family entry for this family, then annotation of the super-family is inherited (if considered sufficiently recent and complete), and the system proceeds with step 20, else the system requests relevant documents on the given superfamily from appropriate information sources (by default PubMed and OMIM, probably also InterPro). |
|
16. For each query, the addressed information source returns the requested documents. |
17. The system presents the retrieved documents to the annotator, ranked according to estimated relevance. |
|
18. If new documents are retrieved, the annotator selects the documents to be used for information extraction, or chooses to launch new queries (in this case, proceed from the "else" case in step 15). |
19. New super-family annotation (high-level function, structural information, disease, ...) is constructed from the selected documents and the available Swiss-Prot entries. Links to the used documents are established. (Optionally, the super-family information can be stored separately for inclusion in the corresponding superfamily fingerprint) |
|
|
20. Collection of new family information (tissue expression, history, cytological location, species distribution, ...) is prepared by querying appropriate external document repositories (PubMed, OMIM, InterPro). |
|
21. For each query, the addressed information source returns the requested documents. |
22. The system presents the retrieved documents to the annotator, ranked according to estimated relevance. |
|
23. If new documents are retrieved, the annotator selects the documents for information extraction. |
24. New family annotation (tissue expression, history, cytological location, species distribution, ...) is constructed from the received documents. Links to the documents are established. The constructed information is presented to the annotator. |
|
25. The annotator validates the proposed family information. |
26. The system records the validated information. |
|
|
27. The system proposes an (incomplete) technical paragraph. |
|
28. The PRINTS annotator completes the technical paragraph and submits it. |
29. The system inserts the completed technical paragraph in the entry. |
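The inheritance decision in step 15 depends on whether an existing super-family entry is "sufficiently recent and complete". The specification does not define either criterion, so the one-year age limit, the required fields and the entry layout below are purely illustrative assumptions.

```python
from datetime import date, timedelta

# Assumed criteria; the specification does not define them.
REQUIRED_FIELDS = ("function", "structure")
MAX_AGE = timedelta(days=365)


def can_inherit(entry, today):
    """Return True if a super-family entry may be inherited (step 15).

    `entry` is a hypothetical dict with a `last_updated` date and an
    `annotation` dict, or None when PRINTS returned no entry.
    """
    if entry is None:
        return False
    recent = today - entry["last_updated"] <= MAX_AGE
    complete = all(entry.get("annotation", {}).get(f) for f in REQUIRED_FIELDS)
    return recent and complete
```

When `can_inherit` returns False, the use case continues with the "else" branch of step 15 and requests new documents on the super-family.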
|
Use case: |
Annotate super-family |
|
Actors: |
PRINTS annotator, Information source, PRINTS database |
|
Purpose: |
Annotate a specified protein super-family. |
|
Overview: |
|
|
Type: |
primary |
|
Cross References: |
Functions:
Use cases: initiated from 'Annotate protein fingerprint'. |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case starts when the PRINTS annotator has chosen to proceed with protein super-family annotation. The naked fingerprint and the SwissProt protein entries that have been retrieved earlier are used. |
2. The system identifies suitable external information sources for establishing an identifier (gc), a title (gt) and database links (gp). For each type of information, the system sends a request to the appropriate source (SwissProt, PubMed, etc). |
|
3. For each request, the external information source returns the requested information. |
4. The retrieved information is presented to the annotator together with proposals for an identifier (gc), a title (gt) and database links (gp). |
|
5. The annotator validates the proposals and/or specifies required information. |
6. The system creates an identifier (gc), a title (gt) and database links (gp).
|
|
|
7. New super-family information (broad structure overview, high-level function, homology, disease, ...) is collected by sending appropriate queries to external document repositories (PubMed, OMIM, InterPro). |
|
8. For each query, the addressed information source returns the requested documents. |
9. The system presents the retrieved documents to the annotator, ranked according to estimated relevance. |
|
10. The annotator selects the documents that need to be used for information extraction, or chooses to launch new queries (in this case, return to step 7). |
11. New super-family annotation is constructed from the received documents and the available SwissProt entries. Links to the documents are established. The constructed information is presented to the annotator. |
|
12. The annotator validates the proposed super-family information. |
13. The system records the validated information. |
|
|
14. The system proposes an (incomplete) technical paragraph. |
|
15. The annotator completes the technical paragraph and submits it. |
16. The system inserts the completed technical paragraph. |
|
Use case: |
Annotate domain |
|
Actors: |
PRINTS annotator, Information source, PRINTS database |
|
Purpose: |
To annotate a specified protein domain. |
|
Overview: |
|
|
Type: |
primary |
|
Cross References: |
Functions:
Use cases: initiated from 'Annotate protein fingerprint'. |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case starts when the PRINTS annotator has chosen to proceed with protein domain annotation. |
2. The system identifies suitable external information sources for establishing an identifier (gc), a title (gt) and database links (gp). For each type of information, the system sends a request to the appropriate source (SwissProt, PubMed). |
|
3. For each request, the external information source returns the requested information. |
4. The retrieved information is presented to the annotator together with proposals for an identifier (gc), a title (gt) and database links (gp). |
|
5. The annotator validates the proposals and/or specifies required information. |
6. The system creates an identifier (gc), a title (gt) and database links (gp). |
|
|
7. New domain information (domain structure, domain function, protein range, commonly associated domains, physicochemical properties, ...) is collected by querying external document repositories (PubMed, OMIM, InterPro). |
|
8. For each query, the addressed information source returns the requested documents. |
9. The system presents the retrieved documents to the annotator, ranked according to estimated relevance. |
|
10. The annotator selects the documents that need to be used for information extraction, or chooses to launch new queries (in this case, return to step 7). |
11. New domain annotation is constructed from the received documents. Links to the documents are established. The constructed information is presented to the annotator. |
|
12. The annotator validates the proposed domain information. |
13. The system records the validated information. |
|
|
14. The system proposes an (incomplete) technical paragraph. |
|
15. The annotator completes the technical paragraph and submits it. |
16. The system inserts the completed technical paragraph. |
(IV) Gather information from literature (Biological researcher):
|
Use case: |
Gather information from literature |
|
Actors: |
Life science researcher |
|
Purpose: |
To allow life scientists to query free-style text repositories in search of integrated information on a specified subject. |
|
Overview: |
A life science researcher specifies a subject of interest. The system queries one or more free-style text repositories and extracts relational information from the relevant documents. The results are presented, possibly integrated with information from SwissProt and PRINTS. |
|
Type: |
primary |
|
Cross References: |
Functions:
Use cases: the Researcher must have completed “Start Up”. |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case begins when the Researcher has selected to gather information about a subject of interest. |
2. The system prompts the user to specify a particular retrieval and/or extraction request, and one or more document repositories to be consulted. |
|
3. The researcher specifies a particular retrieval and/or extraction request; if relevant, an external information source, and one or more preferred display modes of results (e.g. new ranking of documents according to relevance (default), summary of extracted relations, ...). |
4. If no retrieval request was specified, the system builds one based on the extraction request. The system turns the retrieval request into a query to the appropriate document repository and sends it to that repository. |
|
5. The document repository returns the requested documents. |
6. The system receives the returned documents. |
|
|
7. If an extraction request is specified, the system extracts the requested information from the retrieved set of documents, and displays the results in the preferred mode(s). If no request is specified, the alternative course 'Specify Extraction Request' is taken. |
|
8. If he/she wishes to, the researcher can do datamining (clustering, autocategorisation, relational datamining, ...) on the retrieval and extraction results, possibly in combination with earlier results or other data stored in a repository. |
9. The system performs the specified datamining task. If stored data or results need to be incorporated in the analysis, the system first retrieves these data/results from the relevant repository. |
|
10. If desired, the researcher can instruct the system to manage the retrieved documents by means of an available document management tool. |
11. The system launches the specified document management tool and sends the retrieved documents to it. A task completion message is produced. |
|
12. If desired, the researcher can instruct the system to store the results of extraction in a specified repository. |
13. The system sends the extraction results to the specified repository for storage. A task completion message is produced. |
|
14. If desired, the researcher can instruct the system to store the results of datamining in a specified repository. |
15. The system sends the datamining results to the specified repository for storage. A task completion message is produced. |
|
16. The researcher can either repeat from step 1 downwards or stop this use case. |
|
|
|
|
|
Alternative course of events |
|
|
|
|
|
|
7.1. The system displays the retrieved documents in the order in which they were obtained from the document repository. |
|
7.2. The researcher specifies an extraction request or leaves the alternative course by not specifying anything. |
|
|
|
7.3. If an extraction request is specified, the system extracts the requested information from the retrieved set of documents, and displays the results in the preferred mode(s). |
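The branching between steps 4 and 7 and the alternative course 7.1 can be summarised in a small planning function. This is a sketch under assumptions: the request representation (a keyword string for retrieval, a dict with `relation` and `terms` for extraction) and the rule for deriving a retrieval request from an extraction request are not defined in the specification.

```python
def plan_requests(retrieval=None, extraction=None):
    """Return (retrieval_request, next_action).

    If no retrieval request is given, one is derived from the extraction
    request (step 4). next_action is "extract" when an extraction request
    exists (step 7), otherwise "display_documents" (alternative course 7.1).
    """
    if retrieval is None and extraction is None:
        raise ValueError("a retrieval or extraction request is required")
    if retrieval is None:
        # Assumption: the terms of the extraction request double as keywords.
        retrieval = " ".join(extraction["terms"])
    action = "extract" if extraction is not None else "display_documents"
    return retrieval, action
```

For example, an extraction request for the relation between two proteins yields a keyword query on both names and proceeds directly to extraction.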
(V) Retrieve external information (Website, external database, ...):
|
Use case: |
Retrieve external information |
|
Actors: |
Information source (PubMed, HUGO, ...) |
|
Purpose: |
Provide the system with the information that is available in external information sources. |
|
Overview: |
|
|
Type: |
primary |
|
Cross References: |
Functions:
Use cases: Used by the “Annotate protein”, “Label protein family” and “Retrieve protein information” use cases. |
|
Typical course of events |
|
|
actor |
system response |
|
|
1. This use case starts when information is required from a source that does not reside within the system. The system composes a request for an appropriate external information source and sends it to that source. |
|
2. The information source processes the request and returns the corresponding information. |
|
|
|
3. The system receives the returned information. |
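Because several use cases rely on this one, a common interface for information sources is a natural design. The sketch below is an assumption about how that could be structured: `InformationSource`, `InMemorySource` and `retrieve` are hypothetical names, and the in-memory class merely stands in for a remote source such as PubMed or HUGO.

```python
from abc import ABC, abstractmethod


class InformationSource(ABC):
    """Hypothetical interface for an external information source."""

    @abstractmethod
    def fetch(self, request):
        """Step 2: process a request and return a list of results."""


class InMemorySource(InformationSource):
    """Stand-in used for illustration; a real source would be remote."""

    def __init__(self, records):
        self._records = records

    def fetch(self, request):
        # Case-insensitive substring match over the stored records.
        return [r for r in self._records if request.lower() in r.lower()]


def retrieve(source, request):
    """Steps 1 and 3: send the request and receive the returned information."""
    if not request:
        raise ValueError("empty request")
    return source.fetch(request)
```

With such an interface, the annotation use cases can address any source in the same way, and a stub like `InMemorySource` can replace the remote source during testing.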
(VI) Incorporate new model:
|
Use case: |
Incorporate new model |
|
Actors: |
Model Builder |
|
Purpose: |
Build (i.e. present new training data, choose a learning algorithm, train and evaluate) and/or incorporate a model. |
|
Overview: |
Given a collection of labeled text documents, the Model builder feeds the training data to the system for it to learn to retrieve and extract required information from new texts. Alternatively, if a model has been built off-line (e.g. a logic program, the parameters of a neural network, ...), it is incorporated in the system. |
|
Type: |
secondary |
|
Cross References: |
Functions:
Use cases: the Model Builder must have completed “Start Up”. |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case begins when the model builder has selected to incorporate a new model in the system. |
2. The system prompts the user to enter the component for which a model is to be built (information retrieval, information extraction, or NLU tasks), and the learning method to be used. |
|
3. The user enters the component for which the model will be built, and the desired learning method. |
4. The system prompts the user for specifications (e.g. file names) of a training data set, and (if relevant for the learning method) a background knowledge theory. |
|
5. The user provides a training data set and a background knowledge theory. |
6. The system uses the selected learning method to build a model for the training data. If the learning method requires an external application, the system prepares the input for this application and starts it. (*) |
|
7. If an external application is used to build the model, it eventually returns a pointer to this model, or an error message. |
8. If an error has occurred during training, the system sends an error message to the user and the use case ends. Otherwise, the system ensures it has a pointer to the new model and sends a message to the user that a new model has been built. The model is stored as a new "unvalidated" model. |
|
|
9. The system presents a menu where the user can select one or more validation methods, or accept or reject the model. |
|
10. The user selects the validation methods he/she wants to apply to the new model. If necessary, the user also specifies a test data set. Alternatively, the user decides to accept or reject the model and the use case proceeds from step 14 downward. |
11. The system performs the desired validations. If necessary, it delegates this to an external application. |
|
12. If an external application does the validation, it sends a report back to the system. |
13. The system builds a validation summary and presents it to the user. The use case then iterates from step 9 downward. |
|
|
14. If the model was accepted, the system incorporates it in the appropriate module. Otherwise the model is forgotten. The system sends a "stored" or "deleted" message to the user. |
(*) If training takes a long time, especially when an external application is used, the user need not wait for it to terminate. We suggest allowing the user to close the BioMinT tool in that case while the application continues running. When the user selects the same parameters again later, the system can check whether the corresponding model has already been (or is being) built. If the model is ready, the use case can proceed as described above.
A similar approach may be taken for validation of a model by external applications.
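The model life cycle of steps 8 and 14, together with the suggestion in the note to look up a model by its build parameters, can be sketched as a small registry. The class name, the status strings other than "unvalidated", and the choice of (component, method, training set) as the lookup key are assumptions.

```python
class ModelRegistry:
    """Tracks models by their build parameters (hypothetical design)."""

    def __init__(self):
        self._models = {}  # parameter key -> status string

    @staticmethod
    def _key(component, method, training_set):
        return (component, method, training_set)

    def request_build(self, component, method, training_set):
        """Start a build unless one exists; return the current status."""
        key = self._key(component, method, training_set)
        if key not in self._models:
            self._models[key] = "building"
        return self._models[key]

    def build_finished(self, component, method, training_set, ok=True):
        """Step 8: a successfully built model is stored as 'unvalidated'."""
        key = self._key(component, method, training_set)
        if ok:
            self._models[key] = "unvalidated"
        else:
            self._models.pop(key, None)

    def decide(self, component, method, training_set, accept):
        """Step 14: incorporate an accepted model, forget a rejected one."""
        key = self._key(component, method, training_set)
        if self._models.get(key) != "unvalidated":
            raise ValueError("no unvalidated model for these parameters")
        if accept:
            self._models[key] = "incorporated"
            return "stored"
        del self._models[key]
        return "deleted"
```

A user who returns after closing the tool calls `request_build` with the same parameters and learns whether the model is still building or is ready for validation.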
(VII) Administrate Users:
|
Use case: |
Administrate Users |
|
Actors: |
Administrator |
|
Purpose: |
Register or unregister users and grant or revoke user rights. |
|
Overview: |
An administrator registers a new user and grants the appropriate rights (annotation, research, model building, administration). Alternatively, the rights of an existing user are updated, or an existing user is removed. |
|
Type: |
secondary |
|
Cross References: |
Functions:
Use cases: |
|
Typical course of events |
|
|
actor |
system response |
|
1. This use case begins when an administrator has selected to add or remove users or modify user rights. |
2. The system presents a task menu (add user, remove user, modify rights). |
|
3. The administrator selects the appropriate task. |
4. The system prompts the administrator for task-dependent parameters (e.g. user name, password, desired rights, ...). |
|
5. The administrator enters the required parameters. |
6. The system updates its user information. |
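The three administration tasks of step 2 map onto a simple user registry. The sketch below assumes the four rights named in the overview; the class and method names are hypothetical, and password handling is left out.

```python
VALID_RIGHTS = {"annotation", "research", "model building", "administration"}


class UserAdmin:
    """Minimal in-memory user registry (illustrative only)."""

    def __init__(self):
        self._rights = {}  # user name -> set of rights

    def add_user(self, name, rights):
        if name in self._rights:
            raise ValueError(f"user already exists: {name}")
        self._rights[name] = self._checked(rights)

    def remove_user(self, name):
        if name not in self._rights:
            raise KeyError(name)
        del self._rights[name]

    def modify_rights(self, name, rights):
        if name not in self._rights:
            raise KeyError(name)
        self._rights[name] = self._checked(rights)

    def rights_of(self, name):
        # Return a copy so callers cannot change the stored set.
        return set(self._rights[name])

    @staticmethod
    def _checked(rights):
        """Reject any right that is not one of the four known rights."""
        unknown = set(rights) - VALID_RIGHTS
        if unknown:
            raise ValueError(f"unknown rights: {sorted(unknown)}")
        return set(rights)
```

Validating rights against a fixed set catches misspelt rights at registration time rather than at login.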