Semantic Annotation of Scientific Articles. DC-2009, "Semantic Interoperability of Linked Data". Sudeshna Das 1,2 & Tim Clark 1,2. sudeshna_das@harvard.edu, [email_address]. 1 MIND, Massachusetts General Hospital; 2 Harvard Medical School.
With the advent of Web 2.0, social networking sites have become very common, and students especially spend increasing amounts of time on sites such as Facebook, Orkut and MySpace. Even within the science and education community, researchers are discussing findings and networking within scientific social communities. This is the home page of Alzforum, one of the oldest such communities. It has over 4000 registered Alzheimer's researchers networking to find a cure for this neurological disorder. Alzforum became very popular and is known as the CNN for AD (Alzheimer's disease) researchers. In fact, it became necessary to clone the site. However, Alzforum was developed 10 years ago and features were added over time, making it difficult to replicate the platform.
To meet these needs we developed SCF, the Science Collaboration Framework. SCF can be used to replicate Alzforum-like communities. It is based on Drupal, an open source CMS, and contains integrated Web 2.0 collaborative tools. One of our key contributions was to adapt Drupal to the Semantic Web, so that we can leverage existing linked data and ontologies/vocabularies. SCF sites are interoperable with other SCF or Semantic Web communities. Finally, SCF provides powerful “semantic search” capabilities.
Our pilot project was StemBook, an online review of stem cell biology for stem cell researchers. We then took advantage of features in StemBook and developed PDOnline, a site for Parkinson’s researchers. Alzforum has come full circle and is redeveloping its site on SCF. A site for neuropathic pain and other sites are in the planning stages. The idea is that every site contributes features to the SCF toolkit as well as reusing existing ones, and we hope to achieve asymptotic convergence.
To link and integrate the communities developed with SCF, we annotate their content with ontologies, controlled vocabularies and linked data. The articles and comments on a community site are tagged with resources that have stable URIs or with terms from controlled vocabularies. The tags have meaning, and other details such as provenance and status are also captured. The details of the semantic annotation ontology can be found at our website, swan.mindinformatics.org.
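As a rough illustration only (this is not the SWAN annotation ontology, whose actual terms are documented on the website), a tag that carries a stable URI plus provenance and status could be modelled like this. The field names, the example.org URIs and the status values are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    """One tag on a document. All field names here are hypothetical."""
    document_id: str       # the article or comment being tagged
    resource_uri: str      # stable URI of the tagged resource (placeholder URIs below)
    label: str             # human-readable name of the resource
    created_by: str        # provenance: who or what produced the tag
    created_on: date       # provenance: when the tag was produced
    status: str = "suggested"  # assumed values: "suggested" or "accepted"


def accepted(annotations):
    """Keep only the tags whose status is 'accepted'."""
    return [a for a in annotations if a.status == "accepted"]


example_annotations = [
    Annotation("doc-1", "http://example.org/gene/APP", "amyloid beta",
               "text-miner", date(2009, 10, 1)),
    Annotation("doc-1", "http://example.org/gene/BACE1", "BACE1",
               "editor", date(2009, 10, 2), status="accepted"),
]
```

Calling `accepted(example_annotations)` keeps only the BACE1 tag, since the other one still has the default "suggested" status.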
Suppose a document discusses the gene amyloid beta. We annotate the document with the gene resource “AB”, not just the term “AB”. The resource information is obtained from a SPARQL endpoint provided by Science Commons, which contains the gene synonyms and other information. Thus, a search using any of these terms returns the document.
Another example: a search for BACE1 returns documents annotated with beta secretase, beta-site APP cleaving enzyme and so on.
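The idea behind these two examples can be sketched in a few lines. This is a minimal sketch, not the SCF search code: the synonym table is hard-coded here (in SCF it comes from the SPARQL endpoint), and the example.org URIs, document ids and function names are hypothetical.

```python
# Hypothetical synonym table keyed by resource URI. In SCF this information
# is fetched from a SPARQL endpoint; it is hard-coded here for illustration.
SYNONYMS = {
    "http://example.org/gene/BACE1": {
        "BACE1", "beta secretase", "beta-site APP cleaving enzyme",
    },
    "http://example.org/gene/APP": {"AB", "amyloid beta"},
}

# Documents are tagged with resource URIs, not with plain strings.
DOC_ANNOTATIONS = {
    "doc-1": {"http://example.org/gene/BACE1"},
    "doc-2": {"http://example.org/gene/APP"},
    "doc-3": {"http://example.org/gene/BACE1", "http://example.org/gene/APP"},
}


def resolve(term):
    """Return the URIs of all resources that have `term` as a name or synonym."""
    wanted = term.casefold()
    return {
        uri
        for uri, names in SYNONYMS.items()
        if wanted in {name.casefold() for name in names}
    }


def search(term):
    """Return ids of documents annotated with any resource matching `term`."""
    uris = resolve(term)
    return sorted(doc for doc, tags in DOC_ANNOTATIONS.items() if tags & uris)
```

Because the lookup goes through the resource, `search("BACE1")` and `search("beta secretase")` return the same documents, even if a document never mentions the exact term that was typed.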
In principle, a search for BACE1 could also bring up the structure of BACE1. This feature has not yet been implemented.
Such searches are made possible by semantic annotation of site content, and semantic annotation is facilitated by semi-automatic text mining. Text-mining algorithms suggest terms for annotation, and the editors of the community sites then manually review them before attaching the annotations to the document. Currently we mine documents for gene names, Gene Ontology terms, tissue and cell types, etc.
Screenshot of the SCF annotation editor. The editor facilitates the manual review process. The terms identified by the algorithm are highlighted, and any term can be accepted, changed or deleted.
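The suggest-then-review workflow can be sketched roughly as follows. This is a toy stand-in, assuming simple dictionary matching instead of the actual text-mining algorithms; the lexicon, the example.org URIs and the "accept" / "change:<uri>" decision format are all made up for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon mapping surface terms to resource URIs.
LEXICON = {
    "bace1": "http://example.org/gene/BACE1",
    "amyloid beta": "http://example.org/gene/APP",
    "neuron": "http://example.org/cell/neuron",
}


@dataclass
class Suggestion:
    term: str    # text as it appears in the document
    start: int   # character offsets, usable for highlighting
    end: int
    uri: str     # resource proposed for this term
    status: str = "suggested"


def suggest(text):
    """Step 1: the algorithm proposes terms (here, plain dictionary lookup)."""
    found = []
    for term, uri in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append(Suggestion(m.group(0), m.start(), m.end(), uri))
    return sorted(found, key=lambda s: s.start)


def review(suggestions, decisions):
    """Step 2: an editor accepts, changes or deletes each suggestion.

    `decisions` maps a suggestion index to "accept" or "change:<new uri>".
    Anything without a decision is treated as deleted, so nothing is
    attached to the document without review.
    """
    kept = []
    for i, s in enumerate(suggestions):
        action = decisions.get(i, "delete")
        if action == "accept":
            s.status = "accepted"
            kept.append(s)
        elif action.startswith("change:"):
            s.uri = action[len("change:"):]
            s.status = "accepted"
            kept.append(s)
    return kept
```

For example, `suggest("BACE1 cleaves APP to release amyloid beta in the neuron.")` proposes three terms in reading order, and `review` then attaches only those the editor has accepted or corrected.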
So, to recap: “algorithm finds core terms”.
Relationships to other entities are established automatically: the gene points to the protein, the protein to the antibody, and so on.
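One way to picture this chain of relationships is as a small directed graph that is walked outward from the annotated term. The sketch below is only an illustration under that assumption; the identifiers and the `RELATED` table are hypothetical, and in practice the links would come from linked data rather than a hard-coded dictionary.

```python
from collections import deque

# Hypothetical links between entities: gene -> protein -> antibody.
RELATED = {
    "gene:BACE1": ["protein:BACE1"],
    "protein:BACE1": ["antibody:anti-BACE1"],
    "antibody:anti-BACE1": [],
}


def reachable(start):
    """Breadth-first walk: every entity linked, directly or not, to `start`."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in RELATED.get(current, []):
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen
```

Starting from the gene, `reachable("gene:BACE1")` also yields the protein and the antibody, so a document tagged only with the gene can still be connected to content about the other two.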
Thus, powerful searches across communities are made possible.