About the Science Commons text annotation service

powered by EBI Whatizit

What does it do?

This service recognizes entities and relations in biomedical texts and returns documents in Semantic Web formats (OWL, RDFa). You can use Semantic Web software to query, integrate and visualize the relevant information.

In the current version, you can submit a PubMed identifier (PMID) or a PubMed text query. Entities and relations between entities in matching PubMed abstracts are extracted by the Whatizit text processing system hosted by the European Bioinformatics Institute (EBI). The recognition of protein interactions is based on 'tri-co-occurrence', where two protein/gene names are found in conjunction with a verb in a sentence, as well as on more refined natural language processing algorithms. The results are presented as HTML with embedded RDF (RDFa), a format that can be understood by Semantic Web software.
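The tri-co-occurrence idea can be illustrated with a minimal Python sketch. It is not the Whatizit implementation: the protein lexicon, the verb list and the naive sentence splitter are hypothetical stand-ins chosen only to show the principle of two protein names and a verb occurring in the same sentence.

```python
import re
from itertools import combinations

# Hypothetical toy lexicons; the real service relies on Whatizit's own resources.
PROTEIN_NAMES = {"BRCA1", "BARD1", "TP53", "MDM2"}
INTERACTION_VERBS = re.compile(r"\b(bind|interact|inhibit)(s|ed|ing)?\b", re.IGNORECASE)


def tri_co_occurrences(text):
    """Return (protein_a, verb, protein_b) triples found within single sentences."""
    triples = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        proteins = sorted({token for token in tokens if token in PROTEIN_NAMES})
        verb = INTERACTION_VERBS.search(sentence)
        # A candidate interaction needs two distinct proteins and one verb.
        if verb and len(proteins) >= 2:
            for protein_a, protein_b in combinations(proteins, 2):
                triples.append((protein_a, verb.group(1).lower(), protein_b))
    return triples
```

Calling tri_co_occurrences on an abstract yields one triple per protein pair in each sentence that also contains a listed verb. Sentences with only one protein name, or without a listed verb, contribute nothing.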

You can visualize the embedded RDFa by clicking on 'highlight RDF/OWL' in the upper right corner of the result screen. Embedded RDFa will be highlighted in red; you can see the machine-readable statements when you move your mouse over a highlighted portion of the document. To remove highlights, reload the page.
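The embedded annotations can also be inspected outside the browser. The following Python sketch uses only the standard library to list elements that carry RDFa attributes. It is not a full RDFa parser (it does not resolve prefixes or build triples), and the sample markup is hypothetical; the actual output of the service may be structured differently.

```python
from html.parser import HTMLParser

# Attribute names defined by the RDFa syntax.
RDFA_ATTRIBUTES = ("about", "typeof", "property", "rel", "resource", "content")


class RDFaAttributeCollector(HTMLParser):
    """Collect the RDFa attributes of every element that carries at least one."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        found = {name: value for name, value in attrs if name in RDFA_ATTRIBUTES}
        if found:
            self.annotations.append((tag, found))


def collect_rdfa(html):
    """Return a list of (tag, {attribute: value}) pairs for annotated elements."""
    collector = RDFaAttributeCollector()
    collector.feed(html)
    return collector.annotations


# Hypothetical markup, for illustration only.
SAMPLE = (
    '<p>The protein <span about="#p1" typeof="ex:Protein">BRCA1</span> '
    'binds <span about="#p2" typeof="ex:Protein">BARD1</span>.</p>'
)
```

Running collect_rdfa(SAMPLE) returns the two annotated span elements with their attributes. For real queries over the extracted statements, a dedicated RDFa-aware toolkit is the better choice.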

The resulting RDFa makes use of several established Semantic Web resources, e.g., the Gene Ontology, Chemical Entities of Biological Interest (ChEBI), UniProt RDF and the Basic Formal Ontology (BFO).

Future plans

The current version is a prototype. The following extensions are planned:

  • Add output in RDF/XML and TURTLE syntax
  • Enable free-text submissions
  • Improve the ontological representation
  • Provide resolvable purl.org URIs


Current developers

Matthias Samwald
Alan Ruttenberg

This project is developed as open source. If you want to join or use the code, please contact us.

Contact

Matthias Samwald: samwald (at) gmx.at
Alan Ruttenberg: alanruttenberg (at) gmail.com



Appendix: Metadata representation

Protein interactions

Protein interactions are described as processes that have proteins as their participants. The type of the process (e.g., binding process, acetylation process) is determined by the verb recognized by the Whatizit text processing service. The verbs recognized by Whatizit are listed in [ebimed poster]. Where possible, processes defined in the OWL version of the Gene Ontology were used. The mappings are described below.

The following verbs have a close match to processes described in the Gene Ontology.

Verb                 Gene Ontology process ID
dissociate           0032984
assemble             0065003
complex              0065003
regulate             0065007
inhibit              0048519
acetylate            0006473
acylate              0043543
amidate              0001519
brominate            0018073
biotinylate          0009305
carboxylate          0018214
farnesylate          0018343
formylate            0018256
hydrox[iy]late       0018126
methylate            0032259
myristo?ylate        0018377
palmito?ylate        0018345
phosphorylate        0016310
nitrosylate          0017014
sumoylate            0016925
ubiquitin(yl)?ate    0016567
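Some of the verb entries listed above (hydrox[iy]late, myristo?ylate, ubiquitin(yl)?ate) appear to be written as regular expressions that cover spelling variants. Under that assumption, a lookup from a recognized verb to its GO process ID could be sketched in Python as follows. The function name and the choice of a subset of entries are illustrative; this is not the code used by the service.

```python
import re

# Verb patterns and GO process IDs copied from the mapping above (a subset).
VERB_TO_GO_PROCESS = [
    (r"dissociate", "0032984"),
    (r"assemble", "0065003"),
    (r"hydrox[iy]late", "0018126"),
    (r"myristo?ylate", "0018377"),
    (r"palmito?ylate", "0018345"),
    (r"phosphorylate", "0016310"),
    (r"ubiquitin(yl)?ate", "0016567"),
]


def go_process_for_verb(verb):
    """Return the GO process ID whose pattern matches the whole verb, or None."""
    for pattern, go_id in VERB_TO_GO_PROCESS:
        if re.fullmatch(pattern, verb.lower()):
            return go_id
    return None
```

Because re.fullmatch is used, a pattern must cover the entire verb, so 'ubiquitinate' and 'ubiquitinylate' both resolve to 0016567 while an unlisted verb such as 'bind' yields None.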

The following verbs could not be matched with a process in GO, but could be aligned to a process derived from associated functions defined in the GO Function ontology. These processes are defined in the OBO Interactomics ontology, and are linked to the original GO Functions through a 'realization of' relation (defined in the Relation Ontology [RO]).

Verb                 Associated Gene Ontology Function ID
dimerize             0046983
bind                 0005488

The following verbs did not match any process or function in the Gene Ontology. They are represented as generic 'processual entities' (defined in BFO [BFO]).

Verb
contact
couple
link
interact
precipitate
cysteinylate
pyruvate
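The three groups of verbs amount to a fallback order: a GO process where one exists, otherwise a process that realizes a GO function, otherwise a generic BFO processual entity. A minimal Python sketch of that decision follows. The kind labels, the ProcessType type and the function name are hypothetical, and only a few GO process verbs are included for brevity.

```python
from typing import NamedTuple, Optional


class ProcessType(NamedTuple):
    kind: str              # hypothetical label for the representation used
    go_id: Optional[str]   # GO ID where one applies, otherwise None


# Subset of the verbs with a close GO process match.
GO_PROCESSES = {"phosphorylate": "0016310", "inhibit": "0048519", "methylate": "0032259"}
# Verbs aligned to a GO function through 'realization of'.
GO_FUNCTIONS = {"dimerize": "0046983", "bind": "0005488"}
# Verbs represented as generic BFO processual entities.
GENERIC_VERBS = {"contact", "couple", "link", "interact",
                 "precipitate", "cysteinylate", "pyruvate"}


def classify_verb(verb):
    """Return how a recognized verb is represented, or None if it is not listed."""
    verb = verb.lower()
    if verb in GO_PROCESSES:
        return ProcessType("go_process", GO_PROCESSES[verb])
    if verb in GO_FUNCTIONS:
        return ProcessType("realization_of_go_function", GO_FUNCTIONS[verb])
    if verb in GENERIC_VERBS:
        return ProcessType("bfo_processual_entity", None)
    return None
```

For example, classify_verb("bind") reports a realization of GO function 0005488, while classify_verb("interact") falls through to the generic processual entity with no GO ID.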

References

[BFO] http://www.ifomis.uni-saarland.de/bfo/
[RO] http://obofoundry.org/ro/
[ebimed poster] Arregui, Gaudan, Kirsch, Rebholz-Schuhmann. EBIMed and Protein Corral: EBI's information retrieval and information extraction engines for realtime analysis of Medline abstracts. http://ismb2006.cbi.cnptia.embrapa.br/poster_abstract.php?id=K-22
[whatizit paper] Kirsch, Gaudan, Rebholz-Schuhmann. Distributed modules for text annotation and IE applied to the biomedical domain.