What does it do?
This service recognizes entities and relations in biomedical texts and returns documents in Semantic Web formats (OWL, RDFa). You can use Semantic Web software to easily query, integrate and visualize the relevant information.
In the current version, you can submit a PubMed identifier (PMID) or a PubMed text query. Entities and relations between entities in matching PubMed abstracts are extracted by the Whatizit text processing system hosted by the European Bioinformatics Institute (EBI). The recognition of protein interactions is based on 'tri-co-occurrence', where two protein/gene names are found in conjunction with a verb in a sentence, as well as on more refined natural language processing algorithms. The results are presented as HTML with embedded RDF (RDFa), a format that can be understood by Semantic Web software.
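The 'tri-co-occurrence' idea can be illustrated with a minimal Python sketch. This is not the Whatizit implementation: the `PROTEINS` and `VERBS` sets are hypothetical toy lexicons, the function name is invented for illustration, and the sketch makes the simplifying assumption that the verb must appear between the two protein names.

```python
import re

# Hypothetical toy lexicons (assumptions); the real service relies on
# Whatizit's own dictionaries and verb lists.
PROTEINS = {"p53", "MDM2", "BRCA1"}
VERBS = {"binds", "inhibits", "phosphorylates"}


def tri_cooccurrences(sentence):
    """Return (protein, verb, protein) triples found in one sentence.

    Simplifying assumption: the verb must occur between the two names.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    triples = []
    for i, first in enumerate(tokens):
        if first not in PROTEINS:
            continue
        for j in range(i + 1, len(tokens)):
            if tokens[j] not in VERBS:
                continue
            for k in range(j + 1, len(tokens)):
                if tokens[k] in PROTEINS:
                    triples.append((first, tokens[j], tokens[k]))
    return triples


print(tri_cooccurrences("MDM2 binds p53 in the nucleus."))
```

A sentence that mentions two proteins but no listed verb yields no triple, which is the point of requiring all three elements to co-occur.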
You can visualize the embedded RDFa by clicking on highlight RDF/OWL in the upper right corner of the result screen. Embedded RDFa will be highlighted in red; you can see the machine-readable statements when you move your mouse over a highlighted portion of the document. To remove highlights, reload the page.
The resulting RDFa makes use of several established Semantic Web resources, e.g., the Gene Ontology, Chemical Entities of Biological Interest (ChEBI), UniProt RDF and the Basic Formal Ontology (BFO).
Future plans
The current version is a prototype; further extensions are planned:
- Add output in RDF/XML and TURTLE syntax
- Enable free-text submissions
- Improve the ontological representation
- Provide resolvable purl.org URIs
Current developers
Matthias Samwald
Alan Ruttenberg
This project is an open-source development. If you want to join or use the code, please contact us.
Contact
Matthias Samwald: samwald (at) gmx.at
Alan Ruttenberg: alanruttenberg (at) gmail.com
Appendix: Metadata representation
Protein interactions
Protein interactions are described as processes that have proteins as their participants. The type of the process (e.g., binding process, acetylation process) is determined by the verb recognized by the Whatizit text processing service. The verbs recognized by Whatizit are listed in [ebimed poster]. Where possible, processes defined in the OWL version of the Gene Ontology were used. The mappings are described below.
The following verbs have a close match to processes described in the Gene Ontology.
| Verb | Gene Ontology process ID |
| --- | --- |
| dissociate | 0032984 |
| assemble | 0065003 |
| complex | 0065003 |
| regulate | 0065007 |
| inhibit | 0048519 |
| acetylate | 0006473 |
| acylate | 0043543 |
| amidate | 0001519 |
| brominate | 0018073 |
| biotinylate | 0009305 |
| carboxylate | 0018214 |
| farnesylate | 0018343 |
| formylate | 0018256 |
| hydrox[iy]late | 0018126 |
| methylate | 0032259 |
| myristo?ylate | 0018377 |
| palmito?ylate | 0018345 |
| phosphorylate | 0016310 |
| nitrosylate | 0017014 |
| sumoylate | 0016925 |
| ubiquitin(yl)?ate | 0016567 |
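Several verbs in this table are written as patterns (e.g., hydrox[iy]late, myristo?ylate). A minimal Python sketch of how such a lookup could work follows. It assumes the entries are meant as regular expressions over base-form verbs; the function name `go_process_for` is hypothetical, and the service's actual code may differ.

```python
import re

# Verb patterns and GO process IDs copied from the table above.
# Assumption: each verb entry is a regular expression.
VERB_TO_GO_PROCESS = {
    "dissociate": "0032984",
    "assemble": "0065003",
    "complex": "0065003",
    "regulate": "0065007",
    "inhibit": "0048519",
    "acetylate": "0006473",
    "acylate": "0043543",
    "amidate": "0001519",
    "brominate": "0018073",
    "biotinylate": "0009305",
    "carboxylate": "0018214",
    "farnesylate": "0018343",
    "formylate": "0018256",
    "hydrox[iy]late": "0018126",
    "methylate": "0032259",
    "myristo?ylate": "0018377",
    "palmito?ylate": "0018345",
    "phosphorylate": "0016310",
    "nitrosylate": "0017014",
    "sumoylate": "0016925",
    "ubiquitin(yl)?ate": "0016567",
}


def go_process_for(verb):
    """Return the GO process ID (as 'GO:nnnnnnn') for a base-form verb, or None."""
    for pattern, go_id in VERB_TO_GO_PROCESS.items():
        # fullmatch avoids partial hits such as 'acylate' inside 'acetylate'
        if re.fullmatch(pattern, verb):
            return "GO:" + go_id
    return None


print(go_process_for("ubiquitinylate"))
```

Inflected forms ("phosphorylates", "bound") are not handled here; a real pipeline would need to normalize the verb first.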
The following verbs could not be matched with a process in GO, but could be aligned to a process derived from associated functions defined in the GO Function ontology. These processes are defined in the OBO Interactomics ontology and are linked to the original GO Functions through a 'realization of' relation (defined in the Relation Ontology [RO]).
| Verb | Associated Gene Ontology Function ID |
| --- | --- |
| dimerize | 0046983 |
| bind | 0005488 |
The following verbs did not match any process or function in the Gene Ontology. They are represented as generic 'processual entities' (defined in BFO [BFO]).
| Verb |
| --- |
| contact |
| couple |
| link |
| interact |
| precipitate |
| cysteinylate |
| pyruvate |
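Taken together, the three groups of verbs in this appendix amount to a tiered fallback: GO process first, then a process realizing a GO function, then a generic processual entity. The Python sketch below illustrates that ordering only. The dictionaries are excerpts of the lists above, and the category labels and the function name `classify_verb` are hypothetical, not identifiers used by the service.

```python
# Excerpts of the three verb groups listed in this appendix (not the full tables).
GO_PROCESS = {"phosphorylate": "0016310", "inhibit": "0048519"}
GO_FUNCTION = {"dimerize": "0046983", "bind": "0005488"}
GENERIC = {"contact", "couple", "link", "interact",
           "precipitate", "cysteinylate", "pyruvate"}


def classify_verb(verb):
    """Return (category, GO ID or None); category labels are illustrative only."""
    if verb in GO_PROCESS:
        # Direct match to a Gene Ontology process
        return ("go_process", GO_PROCESS[verb])
    if verb in GO_FUNCTION:
        # Process linked to a GO function through 'realization of'
        return ("realization_of_go_function", GO_FUNCTION[verb])
    if verb in GENERIC:
        # No GO match: generic BFO processual entity
        return ("processual_entity", None)
    return ("unknown", None)


for v in ("phosphorylate", "bind", "interact"):
    print(v, classify_verb(v))
```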
References
[BFO] http://www.ifomis.uni-saarland.de/bfo/
[RO] http://obofoundry.org/ro/
[ebimed poster] Arregui, Gaudan, Kirsch, Rebholz-Schuhmann. EBIMed and Protein Corral: EBI's information retrieval and information extraction engines for realtime analysis of Medline abstracts. http://ismb2006.cbi.cnptia.embrapa.br/poster_abstract.php?id=K-22
[whatizit paper] Kirsch, Gaudan, Rebholz-Schuhmann. Distributed modules for text annotation and IE applied to the biomedical domain.
