This presentation was given during the first Lightning Talk session at the Alfresco DevCon 2012 in San Jose. It covers the port of the OpenCalais Integration and its Share UI extension to work with Apache Stanbol. These integrations support auto-tagging, semantic tag clouds, and semantic geo-tagged maps. Both integrations are open source and available on Google Code http://code.google.com/p/semantics4alfresco/
An Alfresco Apache Stanbol Integration (port of OpenCalais integration) - Alfresco DevCon 2012 San Jose
1. An Alfresco Apache Stanbol Integration
(port of OpenCalais Integration)
Steve Reiner
CTO
Integrated Semantics
2. OpenCalais Integration Features
• Share, FlexSpaces, and Explorer UI
• Auto-tagging action (manual and via rules) in all three UIs
• Semantic tags listed in details in all three UIs
• Share, FlexSpaces: Semantic Tag Clouds, Geo-Tagged Map
• FlexSpaces: Suggest Tags, Add / Remove Tags on Doc
• Open Source
3. OpenCalais Share Integration Features
• Auto tag menu in Doc Lib, Repo, Details
• Semantic Tag Cloud Dashlets with category drop-down
• Geo-tagged map dashlet
• Dashlets work both on site and overall dashboards
• Search results list shown when a tag is clicked in these dashlets
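As a rough illustration of how a tag cloud dashlet with a category drop-down could size its tags, here is a minimal Python sketch. It is not the integration's code (which runs in Alfresco and Share); the sample data, the category names, and the `tag_cloud` function are hypothetical and only show the idea of counting tags, filtering by category, and scaling counts to display sizes.

```python
from collections import Counter

# Hypothetical sample: (tag label, semantic category) pairs gathered
# from auto-tagged documents. Not real data from the integration.
SAMPLE_TAGS = [
    ("San Jose", "City"),
    ("Alfresco", "Company"),
    ("San Jose", "City"),
    ("Apache Stanbol", "Technology"),
    ("Alfresco", "Company"),
    ("Alfresco", "Company"),
]


def tag_cloud(tags, category=None, min_size=1, max_size=5):
    """Map each tag label to a display size between min_size and max_size.

    If category is given, only tags in that category are counted,
    which mimics picking a value in the category drop-down.
    """
    counts = Counter(
        label for label, cat in tags if category is None or cat == category
    )
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    cloud = {}
    for label, n in counts.items():
        if span == 0:
            # All tags are equally frequent: give them the same size.
            cloud[label] = max_size
        else:
            # Linear scaling of the count into the size range.
            cloud[label] = min_size + round((n - low) * (max_size - min_size) / span)
    return cloud
```

Calling `tag_cloud(SAMPLE_TAGS)` sizes all tags together, while `tag_cloud(SAMPLE_TAGS, category="City")` restricts the cloud to one category.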
4. OpenCalais Advantages / Disadvantages
• Advantages:
• Good recognition results on Names, Cities, Companies
• Good for news, public website text
• Disadvantages:
• Doc size limit on all versions (100k bytes)
• Daily submission item limits on Free OpenCalais (50k) and Calais Professional (100k)
• OpenCalais keeps the extracted metadata
• Focused on English, some support for French, Spanish
• Not Customizable in Taxonomy or in recognition code
• Not Open Source
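The size and daily limits above suggest a pre-check before a document is submitted. The following Python sketch is only an illustration under stated assumptions: "100k bytes" is read as 100,000 bytes, the daily figures are read as submissions per day, and the names `MAX_DOC_BYTES`, `DAILY_LIMITS`, and `can_submit` are hypothetical, not part of OpenCalais or the integration.

```python
# Assumption: "100k bytes" on the slide is read as 100,000 bytes.
MAX_DOC_BYTES = 100_000

# Daily submission limits from the slide, read as items per day.
DAILY_LIMITS = {"free": 50_000, "professional": 100_000}


def can_submit(text, submitted_today, plan="free"):
    """Return (allowed, reason) for submitting text under the given plan."""
    size = len(text.encode("utf-8"))  # limit is in bytes, not characters
    if size > MAX_DOC_BYTES:
        return False, "document exceeds size limit"
    if submitted_today >= DAILY_LIMITS[plan]:
        return False, "daily submission limit reached"
    return True, "ok"
```

A caller could use the returned reason to skip auto-tagging for an oversized document or to defer it to the next day.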
5. Apache Stanbol
• Disadvantages:
• OpenNLP recognition of Names, Cities, Companies not as good as OpenCalais (can chain other engines/services including OpenCalais)
• Advantages:
• No doc size or submission item limits
• Multi language focused
• Customizable in Taxonomy and in recognition code
• Open Source
6. Apache Stanbol
• More of a full semantic platform, not just text enhancement
• Focused on semantic content management
• Could be used for a more general semantic platform
• Componentized, OSGi based
• Enhancer, Enhancement Engines, Entity Hub, ContentHub, Ontology Mgr, Rules, Reasoners, CMS Adapter, FactStore
7. Port of OpenCalais Integration to Apache Stanbol
• Prototype download available now
• http://code.google.com/p/semantics4alfresco/
• Open Source
• All previous features in Share and Explorer are available
• Alfresco extension (4.x) and Share extension (4.0 and 4.2)
• Share auto-tag menus, semantic tag clouds dashlet, geo-tagged dashlet, semantic tags listed in details
• Action can be used in content rules to auto-tag all submissions to a folder, etc.
• Auto tag action also available in Explorer, semantic tags listed on details page
• Suggest tag webscript not complete
• FlexSpaces doesn’t have support yet (need to add additional calls to different webscript URLs and add preference options to use OpenCalais or
• Leveraged a Java client API library contributed by Zaizi to Stanbol that makes REST calls to Stanbol
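To give a feel for the kind of REST call such a client makes, here is a minimal Python sketch of a request to the Stanbol Enhancer. It is not the Zaizi Java client and not the integration's code. It assumes a Stanbol launcher running at `http://localhost:8080`, posts plain text to the `/enhancer` endpoint (or to `/enhancer/chain/<name>` when a chain is named), and asks for RDF/XML back; the function names `build_enhancer_request` and `enhance` are hypothetical.

```python
import urllib.request


def build_enhancer_request(text, base_url="http://localhost:8080",
                           chain=None, accept="application/rdf+xml"):
    """Build (but do not send) a POST request to the Stanbol Enhancer.

    base_url is an assumption: the address of a locally running Stanbol.
    chain optionally names an enhancement chain to use instead of the default.
    """
    path = "/enhancer" if chain is None else "/enhancer/chain/" + chain
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the request and return the raw response body as a string.

    Requires a reachable Stanbol instance; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(build_enhancer_request(text, **kwargs)) as resp:
        return resp.read().decode("utf-8")
```

With a Stanbol instance running, `enhance("Steve Reiner spoke in San Jose.")` would return the enhancement results as RDF, from which entity labels and types could be read to create tags.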
8. Apache Stanbol Integration Features Roadmap
• Finish Suggest Tag WebScript and add support to FlexSpaces for Stanbol
• Display DBpedia info / webpage for the entity next to the search results list on the page shown after a semantic tag is clicked
• Add use of the Stanbol ContentHub instead of the stateless EntityHub to retain semantic enhancements of docs
• If the Zaizi Stanbol integration is not made available as open source, will add some things such as a Solr facets search UI of semantic categories / entities
• Other things being considered
• SKOS taxonomy editor
• Semantic Categories Graph (single doc, multiple docs)
• Tie in Alfresco as the content mgr of versions of Protégé GWT Web UI ontology editor / tie in with Stanbol
• Stanbol support for enhancing any CMIS repository
• Stanbol as a platform for semantic data integration of structured data in addition to unstructured
9. Links to Find out more
• http://code.google.com/p/semantics4alfresco/
• www.integratedsemantics.org blog
• www.integratedsemantics.com
• http://stanbol.apache.org
• http://www.iks-project.eu/
• http://www.opencalais.com/
• Twitter: @stevereiner