Study about OpenCalais API practical usage in
               linked data context

                              Căciulă Maricel

         „Faculty of Computer Science, A. I. Cuza University of Iasi”




Abstract. This paper presents OpenCalais. It gives a short description
of the web service API and presents some projects that use this
API. At the end, it outlines some personal ideas for using the API.


Keywords: Web Service, API, resource management, linked data.




1         Introduction

       OpenCalais is a project that makes your text more valuable. It identifies
named entities, facts and events in the text and returns the result in Resource
Description Framework (RDF) format. The project was initiated by Thomson Reuters and
initially aimed to eliminate the manual tagging step for publishers. Over time,
OpenCalais proved useful for improving the user search experience, and more recently
it has been used to generate content hubs. OpenCalais is free to use and can be
called up to 40,000 times per day, in both commercial and noncommercial
applications. The motivation behind the free, open usage is to improve Thomson
Reuters' natural language processing tools and to semantically link web content.



1.1       OpenCalais Web Service

  OpenCalais can be accessed through a web service. It supports SOAP, REST and
HTTP traffic compression.

Access through SOAP uses the web method “Enlighten” at this
URL: http://api.opencalais.com/enlighten?wsdl


                  String Enlighten(String licenseID, String content, String paramsXML)



The parameters are described in the following table, taken from the official Calais SOAP
documentation:

Field Name   Type     Definition                           Notes
licenseID    String   API access key                       Obtained through registration
content      String   Content to be annotated              Max input length is 100,000 characters
paramsXML    String   Processing and user directives and   Max parameters length is 16,000 characters
                      external metadata


Accessing through REST can be done at the following URL :
http://api.opencalais.com/enlighten/rest



   licenseID=url-encoded-string&content=url-encoded-string&paramsXML=url-encoded-string
This can be used with GET, appending the argument string to the REST URL, or with POST,
including the argument string in the HTTP request body.

There is a nice tutorial example on the official site:
http: //opencalais/files/HTMLform.zip


Access with HTTP traffic compression is done through a gzip request. The
client should add the “Accept-Encoding: gzip” header to the web request to
tell the server that the client can handle a gzip response.
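
To make the REST and compression options more concrete, here is a minimal sketch in Python that only builds the POST request and does not send it. It uses the REST URL and the three argument names given above; the license key, the sample text and the `<params/>` placeholder are made-up values, and the helper name `build_rest_request` is my own.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

REST_URL = "http://api.opencalais.com/enlighten/rest"


def build_rest_request(license_id, content, params_xml):
    """Build (but do not send) a POST request for the REST endpoint."""
    # urlencode percent-escapes every value, which covers the
    # "url-encoded-string" requirement for all three arguments.
    body = urlencode({
        "licenseID": license_id,
        "content": content,
        "paramsXML": params_xml,
    }).encode("utf-8")
    return Request(
        REST_URL,
        data=body,  # passing a body makes urllib use POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept-Encoding": "gzip",  # tells the server gzip is accepted
        },
    )


# Placeholder values, for illustration only.
request = build_rest_request(
    "YOUR-LICENSE-ID",
    "Thomson Reuters started OpenCalais.",
    "<params/>",
)

# Decode the body again to confirm the arguments survive the encoding.
fields = parse_qs(request.data.decode("utf-8"))
```

Note that Python's `urllib` does not decompress responses by itself, so a caller that really sends this request would have to check the `Content-Encoding` response header and apply `gzip.decompress` when it says `gzip`.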



1.2    OpenCalais API



The input API parameters are set in XML format. The parameters refer to processing
directives, user directives and external metadata. The entire input XML (that is, the
paramsXML) must be HTTP-encoded (escaped).


Here is a table that describes the API input parameters, from the official OpenCalais
API documentation:

Parameter                 Section      Definition                       Values                          Default

contentType               Processing   Format of the input content      “TEXT/HTML”, “TEXT/XML”,        None
                          Directives                                    “TEXT/HTMLRAW”, “TEXT/RAW”

outputFormat              Processing   Format of the returned results   “XML/RDF”, “Text/Simple”,       XML/RDF
                          Directives                                    “Text/Microformats”,
                                                                        “Application/JSON”

reltagBaseURL             Processing   Base URL to be put in rel-tag    <the base URL>, for example     None
                          Directives   microformats                     “http://myblog.com/tag”

calculateRelevanceScore   Processing   Indicates whether the            “true” or “false”               True
                          Directives   extracted metadata will
                                       include a relevance score for
                                       each unique entity

enableMetadataType        Processing   Indicates whether the output     “GenericRelations”,             None
                          Directives   will include Generic Relation    “SocialTags”,
                                       extraction (RDF) and/or          “GenericRelations,SocialTags”
                                       SocialTags

docRDFaccessible          Processing   Indicates whether the entire     “true” or “false”               None
                          Directives   XML/RDF document is saved in
                                       the Calais Linked Data
                                       repository

allowDistribution         User         Indicates whether the            “true” or “false”               False
                          Directives   extracted metadata can be
                                       distributed

allowSearch               User         Indicates whether future         “true” or “false”               False
                          Directives   searches can be performed on
                                       the extracted metadata

externalID                User         User-generated ID for the        Any string                      None
                          Directives   submission

submitter                 User         Identifier for the content       Any string                      None
                          Directives   submitter
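
As an illustration of how the directives in the table could be packed into a paramsXML string and then escaped, here is a small Python sketch. The table only gives the parameter names and their sections, so the XML layout used here (the element names `c:params`, `c:processingDirectives`, `c:userDirectives`, `c:externalMetadata` and the namespace URI) is an assumption on my side and should be checked against the official documentation. The helper name and the `post-42` ID are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote
from xml.sax.saxutils import quoteattr

# Assumed namespace for the "c:" prefix; not taken from this paper.
CALAIS_NS = "http://s.opencalais.com/1/pred/"


def build_params_xml(processing, user):
    """Render two dicts of directives as one paramsXML string (assumed layout)."""
    def attrs(directives):
        # quoteattr adds the surrounding quotes and escapes special characters.
        return " ".join(
            f"c:{name}={quoteattr(value)}" for name, value in directives.items()
        )

    return (
        f'<c:params xmlns:c="{CALAIS_NS}">'
        f"<c:processingDirectives {attrs(processing)}/>"
        f"<c:userDirectives {attrs(user)}/>"
        "<c:externalMetadata/>"
        "</c:params>"
    )


params_xml = build_params_xml(
    {"contentType": "TEXT/RAW", "outputFormat": "XML/RDF"},
    {"allowDistribution": "false", "allowSearch": "false", "externalID": "post-42"},
)

# Parsing the string back is a cheap check that it is well-formed XML.
root = ET.fromstring(params_xml)

# Percent-escape the whole document so it can travel as one request argument.
encoded_params = quote(params_xml, safe="")
```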




The input content can be TEXT/HTML, TEXT/HTMLRAW, TEXT/XML or
TEXT/RAW. If the content type is not specified, Calais tries to detect the
type automatically.

OpenCalais uses English as its default language, but it also supports French and
Spanish. If the input text is shorter than 100 characters, the default language is
used.



The API can also be used over SSL, by accessing it through https.



1.3    Data structure

   OpenCalais returns the response in RDF format by default. The RDF header
includes a summary of all entities extracted from the text, sorted alphabetically
by entity type.

  INFORMATION

   For each unique element, the information includes the element type (for example
Company, Person or Acquisition), the attribute values and the ID of the unique
element.
   If the relevance feature is enabled, the resulting RDF also includes the
relevance score of each unique entity.
   When an attribute value is referred to by its ID, the output includes a comment
containing the actual value, for easier understanding.

  INSTANCES

   According to the official documentation, the response contains one or more individual
instances (mentions) for each unique metadata element. Each element instance includes the
following:

   c:docId: URI of the document this mention was detected in
   c:subject: URI of the unique entity
   c:detection: snippet of the input content where the metadata element was
identified
   c:prefix: snippet of the input content that precedes the current instance
   c:exact: snippet of the input content in the matched portion of text
   c:offset: the character offset relative to the input content after it has been converted
into XML
   c:length: length of the instance
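
The instance fields listed above map naturally onto a small record type. The following Python sketch is only an illustration of how offset and length relate to the exact snippet. It does not parse a real OpenCalais response, a plain string stands in for the converted content, and the URIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    doc_id: str     # c:docId
    subject: str    # c:subject
    detection: str  # c:detection
    prefix: str     # c:prefix
    exact: str      # c:exact
    offset: int     # c:offset
    length: int     # c:length

    def matches(self, converted_text: str) -> bool:
        """Check that offset and length point at the exact snippet."""
        end = self.offset + self.length
        return converted_text[self.offset:end] == self.exact


# A plain string stands in for the input content after conversion.
text = "Thomson Reuters started the OpenCalais project."
exact = "OpenCalais"
start = text.index(exact)

mention = Mention(
    doc_id="http://example.org/doc/1",      # hypothetical URI
    subject="http://example.org/entity/1",  # hypothetical URI
    detection=text,
    prefix=text[:start],
    exact=exact,
    offset=start,
    length=len(exact),
)
```

Calling `mention.matches(text)` is a simple sanity check a client could run before highlighting a mention in the original document.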

1.4    OpenCalais and linked data


        With the last significant update of OpenCalais, version 4.0, users are now
able to connect to the Linked Data web standard.
        Linked Data is a method of exposing, sharing and connecting data through
dereferenceable URIs on the web.
        To be compatible, OpenCalais respects the four principles of Linked Data:
     - It uses URIs to identify things.
     - It uses HTTP URIs so that these things can be referred to and looked up by
       people and user agents.
     - It provides useful information (structured description, metadata) about the
       thing when its URI is dereferenced.
     - It includes links to other URIs in the exposed data to improve discovery of
       other related information on the Web.

The image below shows the most recent linkage among the Linking
Open Data datasets. OpenCalais appears among them.




The Calais ecosystem is exposed via Linked Data endpoints: when it extracts an
entity from a given text, it also returns an entity URI. This URI is dereferenceable. You
can submit an HTTP request programmatically or through a browser and get in
response useful information and links to other Linked Data and web assets.
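
Dereferencing an entity URI programmatically could look like the Python sketch below, which only builds the request and does not send it. It assumes the server chooses between RDF and HTML based on the `Accept` header, which is a common Linked Data practice but is not confirmed here for OpenCalais. The entity URI is a hypothetical placeholder and the helper name is my own.

```python
from urllib.request import Request


def build_dereference_request(entity_uri, want_rdf=True):
    """Build (but do not send) a GET request for an entity URI."""
    # Assumption: the server picks the representation from the Accept header.
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(entity_uri, headers={"Accept": accept})


# Hypothetical entity URI, used only for illustration.
request = build_dereference_request("http://example.org/entity/123")
```

A program would typically ask for RDF and follow the links it finds, while a browser would receive the HTML view of the same entity.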

        According to the official site, OpenCalais is currently linked to the
following assets:
     - DBpedia
     - Wikipedia
     - Freebase
     - Reuters.com
     - GeoNames
     - Shopping.com
     - IMDB
     - LinkedMDB




2      Practical usage


OpenCalais was primarily used for tagging blogs and WordPress articles. As its founder
says, the team noticed that the OpenCalais project is also used for other purposes, such as
creating content hubs.

OpenCalais can be used for:

Triage – Filter a large influx of content.

Workflow – Use the metadata returned by OpenCalais to route documents to the right
person or system.

Content Enhancement – OpenCalais can be the entry point to the huge world of
linked data.

Alerting – Allow advanced alerting, giving users the ability to interact more
naturally with the application.

Media Monitoring – Take in a content feed (social media, press releases, news) that can
be categorized and organized using OpenCalais.

Content Harmonization – Mix different sources of information that can be
integrated in a CMS (Content Management System).

Automated News Portal – Publish relevant information taken from different sources
after it has been filtered using OpenCalais.

SEO – Improve search.

News Presentation – With consistent metadata extraction it is possible to create new
navigation and search tools on your site.



2.1    Blog tagging



As expected, one of the first implementations based on OpenCalais was designed
for bloggers. Tagaroo is a tool initiated by the OpenCalais team itself; it is a
plugin for the wordpress.com blogging site. The tool improves your blog by enhancing
both the user experience and searchability. It analyzes your text as you are
writing and suggests intelligent tags for the things and events you are writing about.
A nice feature of this tool is that it uses the generated tags to automatically fetch
images from Flickr to include in your post.

    Link : http://tagaroo.opencalais.com

Another site that uses OpenCalais for blog tagging is Al Jazeera English's new
blogging network. All posts in the new blog are semantically tagged using
OpenCalais for optimal search and navigation.

    Link: http://blogs.aljazeera.net

I *heart* Sea is a hyperlocal news aggregation site that collects some of the best blogs
in Seattle. It uses OpenCalais to automatically tag the keywords of the blog posts it
aggregates, to make it easier to find related information.

    Link: http://iheartsea.com



2.2     Press tagging


The new website of “The New Republic” uses an OpenCalais-enabled, Drupal-powered
Content Management System to increase editorial productivity and improve
search engine optimization.
Link: http://www.tnr.com

Slate Magazine's “News Dots” network visualizes the most recent topics in the
news as a concise network of related topics. Like a human social network, the news
tends to cluster around popular topics, and most stories are more closely related than
one might think. In the background, News Dots scans all the articles from major
publications and submits them to OpenCalais to identify the relevant people, places,
companies, topics, etc.

Link: http://slatest.slate.com/features/news_dots/default.htm




2.3     Media monitoring

Tattler is an open source topic monitoring tool for the Web. Tattler finds and
aggregates content from the web on topics users ask it to monitor. In the background it
uses OpenCalais together with other Semantic Web technologies. It mines news,
websites, blogs, multimedia sites and other social media such as Twitter to find
mentions of the issues most relevant to the user's selected topics, making it easy for the
user to filter, organize, share and take action on content gathered from the Web.

Link : http://tattlerapp.com

Interceder is a social media monitoring tool that makes it easy to track trending topics
and search through the latest content from major news websites, blogs, Twitter and
YouTube.

Link: http://www.interceder.net

AskJot is a tool that analyzes web pages for keywords and displays them as links to
search results from various services around the web. Behind the scenes, AskJot uses
OpenCalais, the NYT article search API, DBpedia, the Yahoo! Answers API, the Flickr
API and others.



2.4    Intelligent Content

   Feedly is a Firefox plugin that brings to life user-selected inputs from Google
Reader, FriendFeed, Twitter, RSS feeds and others in an easy-to-read and engaging
magazine-style format. It uses OpenCalais and other semantic technologies for
clustering, linking and organizing the content experience in an intuitive fashion that is
nicely integrated into the browsing experience.

  Link : http://www.feedly.com


   OpenPublish is based on the Drupal platform. It is a next-generation CMS that
has been tailored to the needs of today's online publishers (magazines, newspapers,
journals, trade publications, broadcast and wire services). It uses metatagging from
OpenCalais to streamline content operations, automatically create topic hubs, and
recommend related articles and more archived stories from the same authors.

  Link: http://www.opensourceopenminds.com/openpublish

DocumentCloud was founded by The New York Times and ProPublica. DocumentCloud
is a unique online resource that offers public access to news reporters' original source
materials, including documents, media files and more. OpenCalais processes
materials available through DocumentCloud to make it easy for users to explore
connections between newsmakers, corporations, transactions and even quotations
across documents and across the full collection of sources.

  Link : http://www.documentcloud.org


3      Personal ideas for OpenCalais API usage


   When I tested the OpenCalais API with the Document Viewer, I noticed that on short texts
the relevance is not accurate. Text taken from Twitter, for example, shows this:
the service adds irrelevant topics and social tags.
   When I tested OpenCalais on a large text, processing took several hours, which is not
acceptable. This means the service is not reliable for books and other long texts.
   Finally, I concluded that the optimal text is an article or
blog post that has more than 100 words and is shorter than 2 pages.
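
The size rule above can be written as a simple pre-check before a text is submitted. This is a minimal Python sketch; since “2 pages” is not an exact measure, the figure of 500 words per page is my own assumption and can be adjusted.

```python
MIN_WORDS = 100
WORDS_PER_PAGE = 500              # assumption: rough size of one page
MAX_WORDS = 2 * WORDS_PER_PAGE    # "shorter than 2 pages"


def is_good_candidate(text):
    """Return True when the text falls in the size range that tagged well."""
    word_count = len(text.split())
    return MIN_WORDS < word_count < MAX_WORDS


tweet = "Just tried OpenCalais on my blog"
article = "word " * 300
```

With these thresholds, `is_good_candidate(tweet)` is false and `is_good_candidate(article)` is true, so a client could skip tweets and split books into article-sized parts before tagging.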



3.1    Blog tagging and filter


   Manual tagging of blogs is not always the best way to describe a post. I have seen
blogs that are not completely described, omitting key words that could be
essential for finding what readers are interested in. And because people see things in a
personal way, they can tag the same post with different key words.

   Essentially, the idea is to tag blogs using the semantic web (OpenCalais),
covering as many blogs as possible, added manually or through a crawler that
recursively adds new blogs (using the contacts of friends or of people who added a
comment).

The easiest way to watch blogs is to use the RSS feed. In this way we can gather
blogs from different sites in a standard way.

By creating a new service that gathers posts from blogs and tags the text using
OpenCalais, we can create a database of feeds. With such a database we can build a
site or application that lets a user create a new custom-generated RSS feed
from the entire database.

This way a user can see posts he is interested in from thousands of blogs. The
generated result RSS feed can be consumed by the already existing applications for
RSS.

The original part of this idea is that you could see posts from thousands of common blogs
and filter them by semantic tags.
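
A minimal sketch of the custom feed generation could look like this in Python. It assumes the posts were already gathered and tagged, so the tags are filled in by hand where the real service would store the OpenCalais results. The `Post` type, the function name and the example.org links are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    link: str
    tags: set = field(default_factory=set)  # semantic tags, e.g. from OpenCalais


def build_feed(posts, wanted_tags, title="Custom feed",
               link="http://example.org/feed"):
    """Return RSS 2.0 XML with only the posts that carry a wanted tag."""
    wanted = set(wanted_tags)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = (
        "Posts tagged: " + ", ".join(sorted(wanted))
    )
    for post in posts:
        if post.tags & wanted:  # keep posts sharing at least one tag
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = post.title
            ET.SubElement(item, "link").text = post.link
            for tag in sorted(post.tags):
                ET.SubElement(item, "category").text = tag
    return ET.tostring(rss, encoding="unicode")


# Hand-made sample data standing in for the tagged post database.
posts = [
    Post("Semantic web in Iasi", "http://example.org/a", {"Iasi", "Semantic Web"}),
    Post("Cooking at home", "http://example.org/b", {"Food"}),
]
feed_xml = build_feed(posts, {"Semantic Web"})
```

Because the output is plain RSS, any existing feed reader could consume it without knowing that the filtering was done on semantic tags.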


3.2    Language abstraction


   Another interesting idea is to abstract the language. Right now the languages
supported by OpenCalais are English, French and Spanish.
   It would be interesting to semantically tag texts in Romanian or other languages.
   I was thinking of integrating the Google Translate service with
OpenCalais.
   The idea is to first translate the text from blogs or news and then use OpenCalais
to tag it semantically.

   This may not be fully reliable, because the translation may not be very accurate, but for
texts longer than 100 words it will probably still produce the most relevant tags, because the
translation is likely to render the key words correctly.
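
The translate-then-tag idea can be sketched as a small pipeline in Python. The translation and tagging steps are passed in as functions, and the two stand-ins below are toy stubs written only for this example. They do not call Google Translate or OpenCalais, and the 100-word threshold is the rough figure from the text above.

```python
def tag_translated(text, translate, tag, min_words=100):
    """Translate the text to English, tag it, and flag short inputs."""
    english = translate(text)
    tags = tag(english)
    # Short texts are marked as less reliable, following the estimate above.
    reliable = len(english.split()) > min_words
    return tags, reliable


# Toy stand-ins for the real services, for illustration only.
def fake_translate(text):
    glossary = {"universitatea": "university", "din": "of", "Iași": "Iasi"}
    return " ".join(glossary.get(word, word) for word in text.split())


def fake_tag(english_text):
    # Pretend that every capitalized word is a named entity.
    return sorted({word for word in english_text.split() if word[0].isupper()})


tags, reliable = tag_translated("universitatea din Iași", fake_translate, fake_tag)
```

Keeping the two services behind plain function arguments means the same pipeline could be tried with another translator or tagger without changing the rest of the application.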




12   Căciulă Maricel

Más contenido relacionado

Similar a Open Calais

Approaches to machine actionable links
Approaches to machine actionable linksApproaches to machine actionable links
Approaches to machine actionable linksStephen Richard
 
OpenCalais in Linked Data context
OpenCalais in Linked Data contextOpenCalais in Linked Data context
OpenCalais in Linked Data contexteldorina
 
Build your APIs with apigility
Build your APIs with apigilityBuild your APIs with apigility
Build your APIs with apigilityChristian Varela
 
Data Virtualization Primer -
Data Virtualization Primer -Data Virtualization Primer -
Data Virtualization Primer -Kenneth Peeples
 
Deploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application ServerDeploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application Serverwebhostingguy
 
Session 8 Android Web Services - Part 1.pdf
Session 8 Android Web Services - Part 1.pdfSession 8 Android Web Services - Part 1.pdf
Session 8 Android Web Services - Part 1.pdfEngmohammedAlzared
 
Adding Rules on Existing Hypermedia APIs
Adding Rules on Existing Hypermedia APIsAdding Rules on Existing Hypermedia APIs
Adding Rules on Existing Hypermedia APIsMichael Petychakis
 
Innovate2014 Better Integrations Through Open Interfaces
Innovate2014 Better Integrations Through Open InterfacesInnovate2014 Better Integrations Through Open Interfaces
Innovate2014 Better Integrations Through Open InterfacesSteve Speicher
 
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...Edward Blurock
 
Document Databases & RavenDB
Document Databases & RavenDBDocument Databases & RavenDB
Document Databases & RavenDBBrian Ritchie
 
Web Tech Java Servlet Update1
Web Tech   Java Servlet Update1Web Tech   Java Servlet Update1
Web Tech Java Servlet Update1vikram singh
 
Using Semantics to personalize medical research
Using Semantics to personalize medical researchUsing Semantics to personalize medical research
Using Semantics to personalize medical researchMark Wilkinson
 
Oracle Database Management REST API
Oracle Database Management REST APIOracle Database Management REST API
Oracle Database Management REST APIJeff Smith
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreAndy Powell
 
Ruby On Rails Siddhesh
Ruby On Rails SiddheshRuby On Rails Siddhesh
Ruby On Rails SiddheshSiddhesh Bhobe
 

Similar a Open Calais (20)

The Glory of Rest
The Glory of RestThe Glory of Rest
The Glory of Rest
 
Approaches to machine actionable links
Approaches to machine actionable linksApproaches to machine actionable links
Approaches to machine actionable links
 
OpenCalais in Linked Data context
OpenCalais in Linked Data contextOpenCalais in Linked Data context
OpenCalais in Linked Data context
 
Introduction to Hydra
Introduction to HydraIntroduction to Hydra
Introduction to Hydra
 
Build your APIs with apigility
Build your APIs with apigilityBuild your APIs with apigility
Build your APIs with apigility
 
Data Virtualization Primer -
Data Virtualization Primer -Data Virtualization Primer -
Data Virtualization Primer -
 
dvprimer-concepts
dvprimer-conceptsdvprimer-concepts
dvprimer-concepts
 
Deploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application ServerDeploying PHP applications using Virtuoso as Application Server
Deploying PHP applications using Virtuoso as Application Server
 
Rest web service
Rest web serviceRest web service
Rest web service
 
API Basics
API BasicsAPI Basics
API Basics
 
Session 8 Android Web Services - Part 1.pdf
Session 8 Android Web Services - Part 1.pdfSession 8 Android Web Services - Part 1.pdf
Session 8 Android Web Services - Part 1.pdf
 
Adding Rules on Existing Hypermedia APIs
Adding Rules on Existing Hypermedia APIsAdding Rules on Existing Hypermedia APIs
Adding Rules on Existing Hypermedia APIs
 
Innovate2014 Better Integrations Through Open Interfaces
Innovate2014 Better Integrations Through Open InterfacesInnovate2014 Better Integrations Through Open Interfaces
Innovate2014 Better Integrations Through Open Interfaces
 
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
ChemConnect: Characterizing CombusAon KineAc Data with ontologies and meta-­‐...
 
Document Databases & RavenDB
Document Databases & RavenDBDocument Databases & RavenDB
Document Databases & RavenDB
 
Web Tech Java Servlet Update1
Web Tech   Java Servlet Update1Web Tech   Java Servlet Update1
Web Tech Java Servlet Update1
 
Using Semantics to personalize medical research
Using Semantics to personalize medical researchUsing Semantics to personalize medical research
Using Semantics to personalize medical research
 
Oracle Database Management REST API
Oracle Database Management REST APIOracle Database Management REST API
Oracle Database Management REST API
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
 
Ruby On Rails Siddhesh
Ruby On Rails SiddheshRuby On Rails Siddhesh
Ruby On Rails Siddhesh
 

Último

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Último (20)

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 

Open Calais

  • 1. Study about OpenCalais API practical usage in linked data context. Căciulă Maricel, Faculty of Computer Science, A. I. Cuza University of Iasi. Abstract. A presentation of OpenCalais. It gives a short description of the web service API, presents some projects that use this API, and closes with some personal ideas for using the API. Keywords: Web Service, API, resource management, linked data.
  • 2. 1 Introduction
       OpenCalais is a project that makes your text more valuable. It identifies named entities, facts and events in a text and returns the result in Resource Description Framework (RDF) format. The project was initiated by Thomson Reuters and originally aimed to eliminate the manual tagging step for publishers. Over time, OpenCalais proved useful for improving the user search experience and, more recently, for generating content hubs. OpenCalais is free to use and can be accessed up to 40,000 times per day, in both commercial and noncommercial applications. The motivation behind the free usage is to improve Thomson Reuters' natural language processing tools and to semantically link web content.
       1.1 OpenCalais Web Service
       OpenCalais can be accessed through a web service. It supports SOAP, REST and HTTP traffic compression.
       Access through SOAP uses the web method "Enlighten" at this URL: http://api.opencalais.com/enlighten?wsdl
           String Enlighten(String licenseID, String content, String paramsXML)
       The parameters are described below, following the official Calais SOAP documentation (field name, type, definition, notes):
       - licenseID (String): API access key. Obtained through registration.
       - content (String): Content to be annotated. Maximum input length is 100,000 characters.
       - paramsXML (String): Processing and user directives and external metadata. Maximum parameters length is 16,000 characters.
       Access through REST uses the following URL: http://api.opencalais.com/enlighten/rest
  • 3. The REST argument line has the form:
           licenseID=url-encoded-string&content=url-encoded-string&paramsXML=url-encoded-string
       It can be used with GET, by appending the argument line to the REST URL, or with POST, by including the argument line in the request body. There is a tutorial example on the official site: http://opencalais/files/HTMLform.zip
       HTTP traffic compression is available through gzip. The client should add the "Accept-Encoding: gzip" header to the web request to tell the server that it can handle a gzip response.
       1.2 Open Calais API
       The input API parameters are set in XML format. They cover processing directives, user directives and external metadata. The entire input XML (that is, paramsXML) must be HTTP encoded (escaped).
       The API input parameters, as described in the official OpenCalais API documentation (parameter, section, definition, values, default):
       - contentType (Processing Directives): Format of the input content. Values: "TEXT/HTML", "TEXT/XML", "TEXT/HTMLRAW", "TEXT/RAW". Default: None.
       - outputFormat (Processing Directives): Format of the returned results. Values: "XML/RDF", "Text/Simple", "Text/Microformats", "Application/JSON". Default: XML/RDF.
       - reltagBaseURL (Processing Directives): Base URL to be put in Rel-tag microformats. Values: <the base URL>, for example "http://myblog.com/tag". Default: None.
  • 4. API input parameters, continued (parameter, section, definition, values, default):
       - calculateRelativeScore (Processing Directives): Indicates whether the extracted metadata will include a relevance score for each unique entity. Values: "true" or "false". Default: True.
       - enableMetadataType (Processing Directives): Indicates whether the output will include Generic Relation extraction (RDF) and/or SocialTags. Values: "GenericRelations", "SocialTags", "GenericRelations,SocialTags". Default: None.
       - docRDFaccessible (Processing Directives): Indicates whether the entire XML/RDF document is saved in the Calais Linked Data repository. Values: "true" or "false". Default: None.
       - allowDistribution (User Directives): Indicates whether the extracted metadata can be distributed. Values: "true" or "false". Default: False.
       - allowSearch (User Directives): Indicates whether future searches can be performed on the extracted metadata. Values: "true" or "false". Default: False.
       - externalID (User Directives): User-generated ID for the submission. Values: any string. Default: None.
       - Submitter (User Directives): Identifier for the content submitter. Values: any string. Default: None.
       The input content can be TEXT/HTML, TEXT/HTMLRAW, TEXT/XML or TEXT/RAW. If the content type is not specified, Calais tries to detect it automatically. OpenCalais uses English as its default language, but also supports French and Spanish. If the input text is shorter than 100 characters, the default language is used.
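The REST argument line described in section 1.1 can be sketched in Python as follows. This is a minimal illustration, not the official client: the function name `build_enlighten_body` is hypothetical, the `params_xml` value is a placeholder rather than the real paramsXML schema, and the length checks simply apply the limits quoted in the SOAP parameter table. The sketch only builds the URL-encoded body; it does not contact the service.

```python
from urllib.parse import urlencode

# REST endpoint as quoted in the text.
REST_URL = "http://api.opencalais.com/enlighten/rest"

# Limits quoted in the parameter table of section 1.1.
MAX_CONTENT_CHARS = 100_000
MAX_PARAMS_CHARS = 16_000


def build_enlighten_body(license_id: str, content: str, params_xml: str) -> str:
    """Return the URL-encoded argument line for a GET query or POST body."""
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds 100,000 characters")
    if len(params_xml) > MAX_PARAMS_CHARS:
        raise ValueError("paramsXML exceeds 16,000 characters")
    # urlencode escapes every value, so '&', '<' and spaces in the
    # content or in the XML cannot break the argument line.
    return urlencode(
        {"licenseID": license_id, "content": content, "paramsXML": params_xml}
    )


# Placeholder directive XML (hypothetical, NOT the real paramsXML layout).
body = build_enlighten_body(
    "MY-KEY",
    "Reuters & Thomson merged.",
    "<params outputFormat='XML/RDF'/>",
)
```

For GET, the resulting string would be appended to `REST_URL` after a `?`; for POST, it would be sent as the request body.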
  • 5. The API can also be used over SSL, by accessing it through https.
       1.3 Data structure
       By default, OpenCalais returns the response in RDF format. The RDF header includes a summary of all entities extracted from the text, sorted alphabetically by entity type.
       INFORMATION
       For each unique element, the information includes the element type (for example Company, Person or Acquisition), the attribute values and the ID of the unique element. If the Relevance feature is enabled, the resulting RDF also includes the relevance score for this unique entity. When an attribute value is referred to by its ID, a comment containing the actual value is included for easier understanding.
       INSTANCES
       According to the official documentation, the response contains one or more individual instances (mentions) for each unique metadata element. Each element instance includes the following:
       - c:docId: URI of the document this mention was detected in
       - c:subject: URI of the unique entity
       - c:detection: snippet of the input content where the metadata element was identified
       - c:prefix: snippet of the input content that precedes the current instance
       - c:exact: snippet of the input content in the matched portion of text
       - c:offset: the character offset relative to the input content after it has been converted into XML
       - c:length: length of the instance
       1.4 OpenCalais and linked data
       With the last significant update of OpenCalais, version 4.0, users are able to connect to the Linked Data web standard. Linked Data is a method of exposing, sharing and connecting data through dereferenceable URIs on the web. To be compatible, OpenCalais respects the four principles of Linked Data:
       - It has URIs to identify things.
  • 6.  - It uses HTTP URIs so that these things can be referred to and looked up by people and user agents.
       - It provides useful information (a structured description, that is, metadata) about the thing when its URI is dereferenced.
       - It includes links to other URIs in the exposed data to improve discovery of other related information on the Web.
       The figure (not reproduced in this transcript) shows the latest linkage among the Linking Open Data datasets, in which OpenCalais appears.
       The Calais ecosystem is exposed via Linked Data endpoints: when it extracts an entity from a given text, it also returns an entity URI. This URI is dereferenceable. You can submit an HTTP request, programmatically or through a browser, and get in response useful information and links to other Linked Data and web assets.
       According to the official site, OpenCalais is currently linked to the following assets:
       - DBpedia
       - Wikipedia
       - Freebase
       - Reuters.com
       - GeoNames
       - Shopping.com
       - IMDB
       - LinkedMDB
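Dereferencing an entity URI programmatically, as described in section 1.4, might look like the sketch below. It rests on assumptions: the entity URI is a made-up placeholder, and requesting `application/rdf+xml` through the `Accept` header is the usual Linked Data convention rather than something the text confirms for the Calais endpoints. The sketch only prepares the request object and does not send it.

```python
from urllib.request import Request


def build_dereference_request(
    entity_uri: str, accept: str = "application/rdf+xml"
) -> Request:
    """Prepare an HTTP GET that asks for a machine-readable description."""
    # The Accept header states which representation the client prefers;
    # a browser would normally ask for HTML instead.
    return Request(entity_uri, headers={"Accept": accept})


# Hypothetical entity URI, used only for illustration.
req = build_dereference_request("http://example.org/entity/123")
```

Passing `req` to `urllib.request.urlopen` would perform the lookup; the response body would then be parsed with an RDF library of choice.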
  • 7. 2 Practical usage
       OpenCalais is used primarily for tagging blogs and WordPress articles. As its founder says, the team noticed that the project is also used for other purposes, such as creating content hubs.
       OpenCalais can be used for:
       - Triage: filtering a large influx of content.
       - Workflow: using the metadata returned from OpenCalais to route documents to the right person or system.
       - Content Enhancement: OpenCalais can be the entry point to the huge world of linked data.
       - Alerting: enabling advanced alerts, giving users the ability to interact more naturally with the application.
       - Media Monitoring: an incoming content feed (social media, press releases, news) can be categorized and organized using OpenCalais.
       - Content Harmonization: mixing different sources of information so that they can be integrated in a CMS (Content Management System).
       - Automated News Portal: publishing relevant information taken from different sources after it has been filtered using OpenCalais.
       - SEO: improving search.
       - News Presentation: with consistent metadata extraction it is possible to create new navigation and search tools on your site.
       2.1 Blog tagging
       As expected, one of the first implementations based on OpenCalais was designed for bloggers. Tagaroo is a tool initiated by the OpenCalais team itself; it is a plugin for the wordpress.com blogging site. This tool improves your blog by improving
  • 8. both the user experience and searchability. It analyzes your text as you write and suggests intelligent tags for the things and events you are writing about. A nice feature is that the generated tags can be used to automatically fetch images from Flickr to include in your post. Link: http://tagaroo.opencalais.com
       Another site that uses OpenCalais for blog tagging is Al Jazeera English's new blogging network. All posts in the new blog are semantically tagged using OpenCalais for optimal search and navigation. Link: http://blogs.aljazeera.net
       I *heart* Sea is a hyperlocal news aggregation site that collects some of the best blogs in Seattle. It uses OpenCalais to automatically tag the keywords of the blog posts it aggregates, making it easier to find related information. Link: http://iheartsea.com
       2.2 Press tagging
       The new website of The New Republic uses an OpenCalais-enabled, Drupal-powered Content Management System to increase editorial productivity and improve search engine optimization. Link: http://www.tnr.com
       Slate Magazine's News Dots Network visualizes the most recent topics in the news as a concise network of related topics. Like a human social network, the news tends to cluster around popular topics, and most stories are more closely related than one might think. In the background, News Dots scans all the articles from major publications and submits them to OpenCalais to identify the relevant people, places, companies, topics, etc. Link: http://slatest.slate.com/features/news_dots/default.htm
       2.3 Media monitoring
       Tattler is an open source topic monitoring tool for the Web. Tattler finds and aggregates content from the web on topics users ask it to monitor. In the background it uses OpenCalais together with other Semantic Web technologies. It mines news, websites, blogs, multimedia sites and other social media such as Twitter to find
  • 9. mentions of the issues most relevant to the user's selected topics, making it easy for the user to filter, organize, share and take action on content gathered from the Web. Link: http://tattlerapp.com
       Interceder is a social media monitoring tool that makes it easy to track trending topics and search through the latest content from major news websites, blogs, Twitter and YouTube. Link: http://www.interceder.net
       AskJot is a tool that analyzes web pages for keywords and displays them as links to search results from various services around the web. Behind the scenes, AskJot uses OpenCalais, the NYT article search API, DBpedia, the Yahoo! Answers API, the Flickr API and others.
       2.4 Intelligent Content
       Feedly is a Firefox plugin that brings to life user-selected inputs from Google Reader, FriendFeed, Twitter, RSS feeds and others in an easy-to-read and engaging magazine-style format. It uses OpenCalais and other semantic technologies for clustering, linking and organizing the content in an intuitive fashion that is nicely integrated into the browsing experience. Link: http://www.feedly.com
       OpenPublish is a next-generation CMS based on the Drupal platform, tailored to the needs of today's online publishers (magazines, newspapers, journals, trade publications, broadcast and wire services). It uses metatagging from OpenCalais to streamline content operations, automatically create topic hubs, and recommend related articles and more archived stories from the same authors. Link: http://www.opensourceopenminds.com/openpublish
       DocumentCloud was founded by The New York Times and ProPublica. It is a unique online resource that offers public access to news reporters' original source materials, including documents, media files and more. OpenCalais processes the materials available through DocumentCloud to make it easy for users to explore connections between newsmakers, corporations, transactions and even quotations across documents and across the full collection of sources. Link: http://www.documentcloud.org
  • 10. 3 Personal ideas for OpenCalais API usage
       When I tested the OpenCalais API with the Document Viewer, I noticed that on short texts the relevance is not accurate. Text taken from Twitter, for example, shows this: irrelevant topics and social tags are added. When I tested OpenCalais on a large text, processing took several hours, which is not acceptable. This means it is not reliable for books and other long texts. I concluded that the optimal input is an article or blog post that has more than 100 words and is shorter than 2 pages.
       3.1 Blog tagging and filter
       Manual tagging of blogs is not always the best way to describe a post. I have seen blog posts that are not completely described, omitting key words that could be essential for readers to find what they are interested in. And because people see things in their own personal way, they can tag the same post with different key words.
       The idea, essentially, is to tag blogs using the semantic web (OpenCalais), covering as many blogs as possible, either added manually or found by a crawler that recursively adds new blogs (using the contacts of friends or of people who added a comment). The easiest way to watch blogs is through their RSS feeds, which lets us gather posts from different sites in a standard way. By creating a new service that gathers posts from blogs and tags the text using OpenCalais, we can build a database of feeds. With such a database we can build a site or application that lets a user create a custom generated RSS feed from the entire database. This way a user can see the posts he is interested in from thousands of blogs, and the generated RSS feed can be consumed by already existing RSS applications. The original part of this idea is that you could see posts from thousands of common blogs and filter them by semantic tags.
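The filtering step of the idea in section 3.1 can be sketched as below. This is an assumption-laden illustration: the `Post` record and `filter_posts` function are hypothetical names, the tags are assumed to have already been produced by OpenCalais and stored with each post, and matching is a simple case-insensitive overlap. Gathering the RSS feeds and calling the service are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """One blog post from the feed database, with its semantic tags."""
    title: str
    link: str
    tags: set[str] = field(default_factory=set)


def filter_posts(posts: list[Post], wanted_tags: list[str]) -> list[Post]:
    """Keep the posts that carry at least one of the wanted tags."""
    wanted = {tag.lower() for tag in wanted_tags}
    return [
        post for post in posts
        if wanted & {tag.lower() for tag in post.tags}
    ]


# Made-up sample data standing in for the tagged feed database.
database = [
    Post("Merger news", "http://example.org/a", {"Thomson Reuters", "Acquisition"}),
    Post("Holiday photos", "http://example.org/b", {"Travel"}),
    Post("Linked data intro", "http://example.org/c", {"Semantic Web"}),
]
custom_feed = filter_posts(database, ["semantic web", "acquisition"])
```

The entries in `custom_feed` would then be serialized as a new RSS feed so that existing RSS readers can consume it.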
  • 11. 3.2 Language abstraction
       Another interesting idea is to abstract away the language. Right now the languages supported by OpenCalais are English, French and Spanish. It would be interesting to semantically tag texts in Romanian or other languages. My thought was to integrate the Google Translate service with OpenCalais: first translate the text from blogs or news, and then use OpenCalais to tag the translation semantically. This may not be fully reliable, because the translation may not be very accurate, but for texts longer than 100 words it will probably produce the most relevant tags correctly, because the key words are likely to be translated correctly.
       4 References
       1. http://opencalais.com
       2. http://en.wikipedia.com/wiki/Linked_Data
       3. http://facebook/note.php?none_id=160609314491
       4. http://readwriteweb.com/archives/calais_4/linked_data.php
       5. http://tagaroo.opencalais.com
       6. http://viewer.opencalais.com
       7. http://vator.tv/news/shows/2009-06-19-opencalais-makes-content-discoverable
       8. http://video.google.com/videoplay?docid=1419547095322807081&ei=mUgRS7yIO5CW2wKs7ImKAg&q=opencalais#