SlideShare una empresa de Scribd logo
1 de 48
Descargar para leer sin conexión
When ECM Meets the
                             Semantic Web


                             20 Oct 2011 - Olivier Grisel & Stefane Fermigier



           Open Source ECM


Thursday, October 20, 2011
Business Motivations


                               2




Thursday, October 20, 2011
Source: Wikipedia
Thursday, October 20, 2011
Source: Wikipedia
Thursday, October 20, 2011
The DIKW hierarchy




                             5




Thursday, October 20, 2011
But every coin has
                               another side


Thursday, October 20, 2011
Infobesity!




Thursday, October 20, 2011
A few figures
                     • 50% more data / content / information
                             produced every year
                     • 1.8 zettabytes of data produced in 2011
                             (= 1 billion terabytes)
                     • Employees are drowning in a sea of email,
                             status messages, etc., and spend on average
                             more than 6 hours / weeks unsuccessfully
                             searching for or recreating lost documents


Thursday, October 20, 2011
A Solution: the Semantic
        Web


                                   9




Thursday, October 20, 2011
A Brief History of the Web
            • Web 1.0 (1990-now): web of sites and pages,
              aka the World Wide Web
            • Web 2.0 (2000-now): web of people and of
              participation, aka the Social Web (Blogs, RSS,
              tags, Facebook, Wikipedia, etc.)
            • Web 3.0 (2010-now): web of data, of meaning
              and connected knowledge, aka the Semantic
              Web
                                                               10




Thursday, October 20, 2011
11




Thursday, October 20, 2011
“To a computer, then, the web is a flat,
                              boring world devoid of meaning”


          Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/           12




Thursday, October 20, 2011
“This is a pity, as in fact documents on the
                   web describe real objects and imaginary
                 concepts, and give particular relationships
                                   between them”
          Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/     13




Thursday, October 20, 2011
“Adding semantics to the web involves two things:
       allowing documents which have information in
      machine-readable forms, and allowing links to be
            created with relationship values.”
          Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/
                                                               14




Thursday, October 20, 2011
“The Semantic Web is not a separate Web but an
        extension of the current one, in which information
           is given well-defined meaning, better enabling
        computers and people to work in cooperation.”

          Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/
                                                               15




Thursday, October 20, 2011
Means and Tools


                             16




Thursday, October 20, 2011
4 stages


            • Extract meaning from raw data / content
            • Connect information to form knowledge
            • Reason about this knowledge
            • Present this knowledge in actionable form

                                                          17




Thursday, October 20, 2011
Extracting

            • Leverage metadata embedded in or associated with
              documents (when they exist)
            • Or use machine learning, NLP (Natural Language
              Processing) and image processing algorithms to
              extract meaning from text / images
            • Examples include: named entities extraction,
              automatic categorization / tagging, sentiment
              analysis, etc.
                                                                 18




Thursday, October 20, 2011
Interlude:
        Linked Open Data


                             19




Thursday, October 20, 2011
2007
                                    2008




                             2009   2010




                                       20




Thursday, October 20, 2011
2011!                 21




Thursday, October 20, 2011
Linking

            • Many Linked Open Data repositories have been
              made available over the last 10 years
            • RDF and graph database systems are now available
              to manage this huge mass of information (billions of
              triples)
            • Match information extracted from content with
              these public (or internal) data/knowledge bases
                                                                     22




Thursday, October 20, 2011
Reasoning

            • When you are working on reliable metadata (ex:
              RDFa embedded in web pages), you can use rule /
              inference engines to infer actionable knowledge
              from your content (ex: shopping recommendation
              engine)
            • Rules can also be used to clean up / flag errors
              when working with unreliable (e.g. automatically
              extracted) information
                                                                 23




Thursday, October 20, 2011
Presenting

            • Allow the users of your system to interact with the
              knowledge thus extracted or produced, in a way
              that allows them to do their jobs better
            • A smart presentation system solves the information
              overload issue by contextualizing the information,
              i.e. presenting only information relevant to what the
              user is currently doing

                                                                      24




Thursday, October 20, 2011
R&D Projects
        Involving Nuxeo


                             25




Thursday, October 20, 2011
IKS project

            • European R&D project under the FP7, with 13
                    partners (6 SMEs) and a 8.5M EUR budget

            • Goal: create a semantic software “stack” that
                    will be used by CMS vendors to add semantic
                    features to their products

            • Started in Jan. 2009, will last until Dec. 2012
            • First tangible result: Apache Stanbol
                    (more about this later)                       26




Thursday, October 20, 2011
SAMAR project
            • French collaborative R&D project with 10
              partners, and a 4.5M EUR budget
            • Goal: create a platform for managing
              multimedia content in arabic, for news agencies
              such as AFP
            • Will include: automated translation, named
              entities extraction, content classification
            • First results: integration between Nuxeo and
              Temis (more later)                                27




Thursday, October 20, 2011
State of the Art
        Semantic ECM at Nuxeo


                                28




Thursday, October 20, 2011
The Semantic Engine

           • From unstructured content to Knowledge

           • Language guessing

           • Topic classification (Business, Sports, Media, ...)

           • Named Entities extraction and linking

           • Relationships and properties extraction

                                                                  29




Thursday, October 20, 2011
Demo time!



                             30




Thursday, October 20, 2011
31




Thursday, October 20, 2011
32




Thursday, October 20, 2011
33




Thursday, October 20, 2011
RESTful
                                is
                             Beautiful




                                         34




Thursday, October 20, 2011
35




Thursday, October 20, 2011
36




Thursday, October 20, 2011
=
                                  Semantic Engines
                                 (Apache OpenNLP)
                                          +
                             Fast Linked Data local index
                                    (Apache Solr)
                                          +
                                Semantic Rule Engine        37


                                    (Apache Jena)
Thursday, October 20, 2011
Apache Stanbol

                                                                     Engine 1          DBpedia
                                                                     Engine 2


                                 2
           1                                                         Engine 3



                                                                                       Freebase

                    Nuxeo DM
                                                              3
                             addon
                                                                                       Geonames
                                                                                LDAP
                                     Local IT infrastructure (LAN)                                38




Thursday, October 20, 2011
How to build engines?



                                39




Thursday, October 20, 2011
Training statistical models for NER with
        Wikipedia and DBpedia

           •       Extract sentences with link positions in Wikipedia articles

           •       DBPedia to the find type of the target entity (Person,
                   Location, Organization)

           •       Apache Pig scripts to compute the join + format the result as
                   training files for OpenNLP

           •       Apache OpenNLP to build and evaluate the models

           •       Apache Hadoop for distributed processing

           •       Apache Whirr for deployment and management on Amazon
                   EC2 cluster
                                                                                   40




Thursday, October 20, 2011
41




Thursday, October 20, 2011
42




Thursday, October 20, 2011
43




Thursday, October 20, 2011
44




Thursday, October 20, 2011
Training statistical models for topic
        classification from Wikipedia and DBpedia


           •       Filter category tree from DBpedia SKOS entries (~500k)

           •       Pig scripts to compute the joins with articles abstracts for all
                   the articles categorized in Wikipedia

           •       Export as 2.8GB TSV file to be indexed in Apache Solr

           •       Use Solr MoreLikeThisHandler to find the top 3 most related
                   Wikipedia category for any kind of text

           •       Apache Whirr & Hadoop for deployment and management on
                   Amazon EC2 cluster
                                                                                      45




Thursday, October 20, 2011
Wrap Up on Recent Work

            • Full offline mode: Stanbol EntityHub
            • Multi-lingual Indexes
            • New UI for occurrences reviews
            • Temis Luxid Annotation Factory integration

                                                           46




Thursday, October 20, 2011
What’s next?

           • Stanbol and Temis connection in Admin Center

           • Embedded Stanbol mode for easy deployment

           • More OpenNLP models for more languages

           • Finalize topic classification - handle hierarchy

           • Tight integration with Nuxeo DM search features

                                                               47




Thursday, October 20, 2011
Thank you for your attention!




                                        48




Thursday, October 20, 2011

Más contenido relacionado

La actualidad más candente

LIBER and its EU projects
LIBER and its EU projectsLIBER and its EU projects
LIBER and its EU projectsLIBER Europe
 
Computing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approachComputing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approachSteve Pepper
 
20101015 linked openeuropeanafi
20101015 linked openeuropeanafi20101015 linked openeuropeanafi
20101015 linked openeuropeanafiStefan Gradmann
 
Connecting Smart Things through Web services Orchestrations
Connecting Smart Things through Web services OrchestrationsConnecting Smart Things through Web services Orchestrations
Connecting Smart Things through Web services OrchestrationsAntonio Pintus
 
Nuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo World Session: Semantic Technologies - Update on Recent ResearchNuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo World Session: Semantic Technologies - Update on Recent ResearchNuxeo
 
The Future of Business Intelligence
The Future of Business IntelligenceThe Future of Business Intelligence
The Future of Business IntelligenceTim O'Reilly
 
Social web & linked data
Social web & linked dataSocial web & linked data
Social web & linked dataSerge Garlatti
 
When the Wikipedians talk: network and tree structure of Wikipedia discussion...
When the Wikipedians talk: network and tree structure of Wikipedia discussion...When the Wikipedians talk: network and tree structure of Wikipedia discussion...
When the Wikipedians talk: network and tree structure of Wikipedia discussion...David Laniado
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsJon Voss
 
Open Source in the Cloud Computing Era
Open Source in the Cloud Computing EraOpen Source in the Cloud Computing Era
Open Source in the Cloud Computing EraTim O'Reilly
 
Accessibility: introduction
Accessibility: introduction  Accessibility: introduction
Accessibility: introduction Andres Baravalle
 
Enhancing user access to european digital heritage
Enhancing user access to european digital heritageEnhancing user access to european digital heritage
Enhancing user access to european digital heritageEuropeanaConnect
 
Twenty Years of Metadata: Lessons from the First Two Decades of the Web
Twenty Years of Metadata: Lessons from the First Two Decades of the WebTwenty Years of Metadata: Lessons from the First Two Decades of the Web
Twenty Years of Metadata: Lessons from the First Two Decades of the WebStuart Weibel
 
Drupalcon keynote: Open Source and Open Data in the age of the cloud
Drupalcon keynote: Open Source and Open Data in the age of the cloudDrupalcon keynote: Open Source and Open Data in the age of the cloud
Drupalcon keynote: Open Source and Open Data in the age of the cloudTim O'Reilly
 
Everything is a Subject: The vision of subject-centric computing
Everything is a Subject: The vision of subject-centric computingEverything is a Subject: The vision of subject-centric computing
Everything is a Subject: The vision of subject-centric computingSteve Pepper
 
Localbysocial sunderland
Localbysocial sunderlandLocalbysocial sunderland
Localbysocial sunderlandlocalgovuk
 

La actualidad más candente (19)

LIBER and its EU projects
LIBER and its EU projectsLIBER and its EU projects
LIBER and its EU projects
 
Computing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approachComputing and Linguistics: A cognitive approach
Computing and Linguistics: A cognitive approach
 
20101015 linked openeuropeanafi
20101015 linked openeuropeanafi20101015 linked openeuropeanafi
20101015 linked openeuropeanafi
 
Connecting Smart Things through Web services Orchestrations
Connecting Smart Things through Web services OrchestrationsConnecting Smart Things through Web services Orchestrations
Connecting Smart Things through Web services Orchestrations
 
Nuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo World Session: Semantic Technologies - Update on Recent ResearchNuxeo World Session: Semantic Technologies - Update on Recent Research
Nuxeo World Session: Semantic Technologies - Update on Recent Research
 
Kick-off meeting Linkflows project
Kick-off meeting Linkflows projectKick-off meeting Linkflows project
Kick-off meeting Linkflows project
 
The Future of Business Intelligence
The Future of Business IntelligenceThe Future of Business Intelligence
The Future of Business Intelligence
 
Social web & linked data
Social web & linked dataSocial web & linked data
Social web & linked data
 
When the Wikipedians talk: network and tree structure of Wikipedia discussion...
When the Wikipedians talk: network and tree structure of Wikipedia discussion...When the Wikipedians talk: network and tree structure of Wikipedia discussion...
When the Wikipedians talk: network and tree structure of Wikipedia discussion...
 
Linked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & MuseumsLinked Open Data in Libraries, Archives & Museums
Linked Open Data in Libraries, Archives & Museums
 
Open Source in the Cloud Computing Era
Open Source in the Cloud Computing EraOpen Source in the Cloud Computing Era
Open Source in the Cloud Computing Era
 
Accessibility: introduction
Accessibility: introduction  Accessibility: introduction
Accessibility: introduction
 
NISO BISG Forum: Bibliographic Roadmap
NISO BISG Forum: Bibliographic RoadmapNISO BISG Forum: Bibliographic Roadmap
NISO BISG Forum: Bibliographic Roadmap
 
Enhancing user access to european digital heritage
Enhancing user access to european digital heritageEnhancing user access to european digital heritage
Enhancing user access to european digital heritage
 
Twenty Years of Metadata: Lessons from the First Two Decades of the Web
Twenty Years of Metadata: Lessons from the First Two Decades of the WebTwenty Years of Metadata: Lessons from the First Two Decades of the Web
Twenty Years of Metadata: Lessons from the First Two Decades of the Web
 
Drupalcon keynote: Open Source and Open Data in the age of the cloud
Drupalcon keynote: Open Source and Open Data in the age of the cloudDrupalcon keynote: Open Source and Open Data in the age of the cloud
Drupalcon keynote: Open Source and Open Data in the age of the cloud
 
Everything is a Subject: The vision of subject-centric computing
Everything is a Subject: The vision of subject-centric computingEverything is a Subject: The vision of subject-centric computing
Everything is a Subject: The vision of subject-centric computing
 
Brink uksg 2013 3
Brink uksg 2013 3Brink uksg 2013 3
Brink uksg 2013 3
 
Localbysocial sunderland
Localbysocial sunderlandLocalbysocial sunderland
Localbysocial sunderland
 

Destacado

The Nuxeo Way: leveraging open source to build a world-class ECM platform
The Nuxeo Way: leveraging open source to build a world-class ECM platformThe Nuxeo Way: leveraging open source to build a world-class ECM platform
The Nuxeo Way: leveraging open source to build a world-class ECM platformNuxeo
 
Challenges du recrutement pour un editeur de logiciel libre
Challenges du recrutement pour un editeur de logiciel libreChallenges du recrutement pour un editeur de logiciel libre
Challenges du recrutement pour un editeur de logiciel libreStefane Fermigier
 
Nuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo World Session: Mobile ECM Apps with Nuxeo EPNuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo World Session: Mobile ECM Apps with Nuxeo EPNuxeo
 
Lessons learned Building Nuxeo EP - Component-based, open source ECM platform
Lessons learned Building Nuxeo EP - Component-based, open source ECM platformLessons learned Building Nuxeo EP - Component-based, open source ECM platform
Lessons learned Building Nuxeo EP - Component-based, open source ECM platformNuxeo
 
Eclipse Apogee and Nuxeo RCP
Eclipse Apogee and Nuxeo RCPEclipse Apogee and Nuxeo RCP
Eclipse Apogee and Nuxeo RCPStefane Fermigier
 
Le Marché du Logiciel Libre en France en 2010
Le Marché du Logiciel Libre en France en 2010Le Marché du Logiciel Libre en France en 2010
Le Marché du Logiciel Libre en France en 2010Stefane Fermigier
 
What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009Stefane Fermigier
 
GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011Stefane Fermigier
 
A Quick Tour of JVM Languages
A Quick Tour of JVM LanguagesA Quick Tour of JVM Languages
A Quick Tour of JVM LanguagesStefane Fermigier
 
Nuxeo on the Cloud - Nuxeo World 2011
Nuxeo on the Cloud - Nuxeo World 2011Nuxeo on the Cloud - Nuxeo World 2011
Nuxeo on the Cloud - Nuxeo World 2011Stefane Fermigier
 

Destacado (13)

The Nuxeo Way: leveraging open source to build a world-class ECM platform
The Nuxeo Way: leveraging open source to build a world-class ECM platformThe Nuxeo Way: leveraging open source to build a world-class ECM platform
The Nuxeo Way: leveraging open source to build a world-class ECM platform
 
Challenges du recrutement pour un editeur de logiciel libre
Challenges du recrutement pour un editeur de logiciel libreChallenges du recrutement pour un editeur de logiciel libre
Challenges du recrutement pour un editeur de logiciel libre
 
Open Cloud Computing @ GTLL
Open Cloud Computing @ GTLLOpen Cloud Computing @ GTLL
Open Cloud Computing @ GTLL
 
Nuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo World Session: Mobile ECM Apps with Nuxeo EPNuxeo World Session: Mobile ECM Apps with Nuxeo EP
Nuxeo World Session: Mobile ECM Apps with Nuxeo EP
 
Lessons learned Building Nuxeo EP - Component-based, open source ECM platform
Lessons learned Building Nuxeo EP - Component-based, open source ECM platformLessons learned Building Nuxeo EP - Component-based, open source ECM platform
Lessons learned Building Nuxeo EP - Component-based, open source ECM platform
 
Eclipse Apogee and Nuxeo RCP
Eclipse Apogee and Nuxeo RCPEclipse Apogee and Nuxeo RCP
Eclipse Apogee and Nuxeo RCP
 
Nuxeo at 10
Nuxeo at 10Nuxeo at 10
Nuxeo at 10
 
Le Marché du Logiciel Libre en France en 2010
Le Marché du Logiciel Libre en France en 2010Le Marché du Logiciel Libre en France en 2010
Le Marché du Logiciel Libre en France en 2010
 
What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009
 
GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011GT Logiciel Libre - Convention Systematic 2011
GT Logiciel Libre - Convention Systematic 2011
 
A Quick Tour of JVM Languages
A Quick Tour of JVM LanguagesA Quick Tour of JVM Languages
A Quick Tour of JVM Languages
 
Nuxeo on the Cloud - Nuxeo World 2011
Nuxeo on the Cloud - Nuxeo World 2011Nuxeo on the Cloud - Nuxeo World 2011
Nuxeo on the Cloud - Nuxeo World 2011
 
Cours ECM à l'EPITA
Cours ECM à l'EPITACours ECM à l'EPITA
Cours ECM à l'EPITA
 

Similar a ECM Meets the Semantic Web - Nuxeo World 2011

6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)bisg
 
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010OpenAIRE
 
APIs and URLs for Social TV
APIs and URLs for Social TVAPIs and URLs for Social TV
APIs and URLs for Social TVDan Brickley
 
DCI - Data, Context and Interaction @ Jug Genova April 2011
DCI - Data, Context and Interaction @ Jug Genova April 2011DCI - Data, Context and Interaction @ Jug Genova April 2011
DCI - Data, Context and Interaction @ Jug Genova April 2011Fabrizio Giudici
 
06 making information pay 2011 -- solomon, madi (pearson)
06   making information pay 2011 -- solomon, madi (pearson)06   making information pay 2011 -- solomon, madi (pearson)
06 making information pay 2011 -- solomon, madi (pearson)bisg
 
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian ParliamentTrove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian ParliamentRose Holley
 
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria
 
Intro to Linked Data: Context
Intro to Linked Data: ContextIntro to Linked Data: Context
Intro to Linked Data: ContextDavid Wood
 
Going Global - Workshop Version - Fall 2011
Going Global - Workshop Version - Fall 2011Going Global - Workshop Version - Fall 2011
Going Global - Workshop Version - Fall 2011Lucy Gray
 
Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...Rose Holley
 
Collaborative Culture
Collaborative   CultureCollaborative   Culture
Collaborative Cultureroger Pitiot
 
Introduction to digital libraries - definitions, examples, concepts and trend...
Introduction to digital libraries - definitions, examples, concepts and trend...Introduction to digital libraries - definitions, examples, concepts and trend...
Introduction to digital libraries - definitions, examples, concepts and trend...Olaf Janssen
 
Authorities as Linked Data Hubs
Authorities  as Linked Data HubsAuthorities  as Linked Data Hubs
Authorities as Linked Data HubsRichard Wallis
 
Migration from FAST ESP to Solr
Migration from FAST ESP to SolrMigration from FAST ESP to Solr
Migration from FAST ESP to SolrTNR Global
 
In the land of the blind the squinter rules
In the land of the blind the squinter rulesIn the land of the blind the squinter rules
In the land of the blind the squinter ruleswremes
 

Similar a ECM Meets the Semantic Web - Nuxeo World 2011 (20)

6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
6 - Making Information Pay 2011 -- SOLOMON, MADI (Pearson)
 
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
OpenAIRE at e-infrastructures DC-NET Brussels, October 2010
 
Semantic Technologies for Cultural Heritage
Semantic Technologies for Cultural HeritageSemantic Technologies for Cultural Heritage
Semantic Technologies for Cultural Heritage
 
APIs and URLs for Social TV
APIs and URLs for Social TVAPIs and URLs for Social TV
APIs and URLs for Social TV
 
DCI - Data, Context and Interaction @ Jug Genova April 2011
DCI - Data, Context and Interaction @ Jug Genova April 2011DCI - Data, Context and Interaction @ Jug Genova April 2011
DCI - Data, Context and Interaction @ Jug Genova April 2011
 
06 making information pay 2011 -- solomon, madi (pearson)
06   making information pay 2011 -- solomon, madi (pearson)06   making information pay 2011 -- solomon, madi (pearson)
06 making information pay 2011 -- solomon, madi (pearson)
 
Antonio Pintus- TouchTheWeb 2010
Antonio Pintus- TouchTheWeb 2010Antonio Pintus- TouchTheWeb 2010
Antonio Pintus- TouchTheWeb 2010
 
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian ParliamentTrove: A Government 2.0 Showcase August 2010, Australian Parliament
Trove: A Government 2.0 Showcase August 2010, Australian Parliament
 
Web heresies
Web heresiesWeb heresies
Web heresies
 
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)
 
Intro to Linked Data: Context
Intro to Linked Data: ContextIntro to Linked Data: Context
Intro to Linked Data: Context
 
Going Global - Workshop Version - Fall 2011
Going Global - Workshop Version - Fall 2011Going Global - Workshop Version - Fall 2011
Going Global - Workshop Version - Fall 2011
 
Sharing knowledge 2011
Sharing knowledge 2011Sharing knowledge 2011
Sharing knowledge 2011
 
Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...Collecting sharing and improving data: changing roles for librarians and user...
Collecting sharing and improving data: changing roles for librarians and user...
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Collaborative Culture
Collaborative   CultureCollaborative   Culture
Collaborative Culture
 
Introduction to digital libraries - definitions, examples, concepts and trend...
Introduction to digital libraries - definitions, examples, concepts and trend...Introduction to digital libraries - definitions, examples, concepts and trend...
Introduction to digital libraries - definitions, examples, concepts and trend...
 
Authorities as Linked Data Hubs
Authorities  as Linked Data HubsAuthorities  as Linked Data Hubs
Authorities as Linked Data Hubs
 
Migration from FAST ESP to Solr
Migration from FAST ESP to SolrMigration from FAST ESP to Solr
Migration from FAST ESP to Solr
 
In the land of the blind the squinter rules
In the land of the blind the squinter rulesIn the land of the blind the squinter rules
In the land of the blind the squinter rules
 

Más de Stefane Fermigier

Pitch Abilian - Paris Open Source Summit 2015
Pitch Abilian - Paris Open Source Summit 2015Pitch Abilian - Paris Open Source Summit 2015
Pitch Abilian - Paris Open Source Summit 2015Stefane Fermigier
 
15 ans de politiques publiques du logiciel libre en France
15 ans de politiques publiques du logiciel libre en France15 ans de politiques publiques du logiciel libre en France
15 ans de politiques publiques du logiciel libre en FranceStefane Fermigier
 
Créer une communauté open source: pourquoi ? comment ?
Créer une communauté open source: pourquoi ? comment ?Créer une communauté open source: pourquoi ? comment ?
Créer une communauté open source: pourquoi ? comment ?Stefane Fermigier
 
L'open source professionnel - un business model open source
L'open source professionnel - un business model open sourceL'open source professionnel - un business model open source
L'open source professionnel - un business model open sourceStefane Fermigier
 
Roadmap du GT Logiciel Libre 2013-2020
Roadmap du GT Logiciel Libre 2013-2020Roadmap du GT Logiciel Libre 2013-2020
Roadmap du GT Logiciel Libre 2013-2020Stefane Fermigier
 
Le MOOC powered by Abilian - Plateforme open source de MOOC
Le MOOC powered by Abilian - Plateforme open source de MOOCLe MOOC powered by Abilian - Plateforme open source de MOOC
Le MOOC powered by Abilian - Plateforme open source de MOOCStefane Fermigier
 
Pourquoi le big data open source ?
Pourquoi le big data open source ?Pourquoi le big data open source ?
Pourquoi le big data open source ?Stefane Fermigier
 
Pleniere du GT Logiciel Libre, janvier 2013
Pleniere du GT Logiciel Libre, janvier 2013Pleniere du GT Logiciel Libre, janvier 2013
Pleniere du GT Logiciel Libre, janvier 2013Stefane Fermigier
 
Nuxeo, an open source platform for content-centric business applications
Nuxeo, an open source platform for content-centric business applicationsNuxeo, an open source platform for content-centric business applications
Nuxeo, an open source platform for content-centric business applicationsStefane Fermigier
 
Open World Forum 2011 - Overview
Open World Forum 2011 - OverviewOpen World Forum 2011 - Overview
Open World Forum 2011 - OverviewStefane Fermigier
 
Plénière du GT Logiciel Libre - Février 2011
Plénière du GT Logiciel Libre - Février 2011Plénière du GT Logiciel Libre - Février 2011
Plénière du GT Logiciel Libre - Février 2011Stefane Fermigier
 
Samar - Premier bilan d'étape - Oct. 2010
Samar - Premier bilan d'étape - Oct. 2010Samar - Premier bilan d'étape - Oct. 2010
Samar - Premier bilan d'étape - Oct. 2010Stefane Fermigier
 
Pleniere du GTLL - Septembre 2010
Pleniere du GTLL - Septembre 2010Pleniere du GTLL - Septembre 2010
Pleniere du GTLL - Septembre 2010Stefane Fermigier
 

Más de Stefane Fermigier (20)

Pitch Abilian - Paris Open Source Summit 2015
Pitch Abilian - Paris Open Source Summit 2015Pitch Abilian - Paris Open Source Summit 2015
Pitch Abilian - Paris Open Source Summit 2015
 
15 ans de politiques publiques du logiciel libre en France
15 ans de politiques publiques du logiciel libre en France15 ans de politiques publiques du logiciel libre en France
15 ans de politiques publiques du logiciel libre en France
 
Créer une communauté open source: pourquoi ? comment ?
Créer une communauté open source: pourquoi ? comment ?Créer une communauté open source: pourquoi ? comment ?
Créer une communauté open source: pourquoi ? comment ?
 
L'open source professionnel - un business model open source
L'open source professionnel - un business model open sourceL'open source professionnel - un business model open source
L'open source professionnel - un business model open source
 
Roadmap du GT Logiciel Libre 2013-2020
Roadmap du GT Logiciel Libre 2013-2020Roadmap du GT Logiciel Libre 2013-2020
Roadmap du GT Logiciel Libre 2013-2020
 
Le MOOC powered by Abilian - Plateforme open source de MOOC
Le MOOC powered by Abilian - Plateforme open source de MOOCLe MOOC powered by Abilian - Plateforme open source de MOOC
Le MOOC powered by Abilian - Plateforme open source de MOOC
 
Pitch Abilian mai 2013
Pitch Abilian mai 2013Pitch Abilian mai 2013
Pitch Abilian mai 2013
 
Open Innovation in Action
Open Innovation in ActionOpen Innovation in Action
Open Innovation in Action
 
Pourquoi le big data open source ?
Pourquoi le big data open source ?Pourquoi le big data open source ?
Pourquoi le big data open source ?
 
Save the date OWF 2013
Save the date OWF 2013Save the date OWF 2013
Save the date OWF 2013
 
Ecosystemes logiciel libre
Ecosystemes logiciel libreEcosystemes logiciel libre
Ecosystemes logiciel libre
 
Pleniere du GT Logiciel Libre, janvier 2013
Pleniere du GT Logiciel Libre, janvier 2013Pleniere du GT Logiciel Libre, janvier 2013
Pleniere du GT Logiciel Libre, janvier 2013
 
OWF 2012 Outcome
OWF 2012 OutcomeOWF 2012 Outcome
OWF 2012 Outcome
 
Demo Cup 2012
Demo Cup 2012Demo Cup 2012
Demo Cup 2012
 
Four Python Pains
Four Python PainsFour Python Pains
Four Python Pains
 
Nuxeo, an open source platform for content-centric business applications
Nuxeo, an open source platform for content-centric business applicationsNuxeo, an open source platform for content-centric business applications
Nuxeo, an open source platform for content-centric business applications
 
Open World Forum 2011 - Overview
Open World Forum 2011 - OverviewOpen World Forum 2011 - Overview
Open World Forum 2011 - Overview
 
Plénière du GT Logiciel Libre - Février 2011
Plénière du GT Logiciel Libre - Février 2011Plénière du GT Logiciel Libre - Février 2011
Plénière du GT Logiciel Libre - Février 2011
 
Samar - Premier bilan d'étape - Oct. 2010
Samar - Premier bilan d'étape - Oct. 2010Samar - Premier bilan d'étape - Oct. 2010
Samar - Premier bilan d'étape - Oct. 2010
 
Pleniere du GTLL - Septembre 2010
Pleniere du GTLL - Septembre 2010Pleniere du GTLL - Septembre 2010
Pleniere du GTLL - Septembre 2010
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

ECM Meets the Semantic Web - Nuxeo World 2011

  • 1. When ECM Meets the Semantic Web 20 Oct 2011 - Olivier Grisel & Stefane Fermigier Open Source ECM Thursday, October 20, 2011
  • 2. Business Motivations 2 Thursday, October 20, 2011
  • 5. The DIKW hierarchy 5 Thursday, October 20, 2011
  • 6. But every coin has another side Thursday, October 20, 2011
  • 8. A few figures • 50% more data / content / information produced every year • 1.8 zettabytes of data produced in 2011 (= 1 billion terabytes) • Employees are drowning in a sea of email, status messages, etc., and spend on average more than 6 hours / weeks unsuccessfully searching for or recreating lost documents Thursday, October 20, 2011
  • 9. A Solution: the Semantic Web 9 Thursday, October 20, 2011
  • 10. A Brief History of the Web • Web 1.0 (1990-now): web of sites and pages, aka the World Wide Web • Web 2.0 (2000-now): web of people and of participation, aka the Social Web (Blogs, RSS, tags, Facebook, Wikipedia, etc.) • Web 3.0 (2010-now): web of data, of meaning and connected knowledge, aka the Semantic Web 10 Thursday, October 20, 2011
  • 12. “To a computer, then, the web is a flat, boring world devoid of meaning” Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/ 12 Thursday, October 20, 2011
  • 13. “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them” Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/ 13 Thursday, October 20, 2011
  • 14. “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.” Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/ 14 Thursday, October 20, 2011
  • 15. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners Lee, http://www.w3.org/Talks/WWW94Tim/ 15 Thursday, October 20, 2011
  • 16. Means and Tools 16 Thursday, October 20, 2011
  • 17. 4 stages • Extract meaning from raw data / content • Connect information to form knowledge • Reason about this knowledge • Present this knowledge in actionable form 17 Thursday, October 20, 2011
  • 18. Extracting • Leverage metadata embedded in or associated with documents (when they exist) • Or use machine learning, NLP (Natural Language Processing) and image processing algorithms to extract meaning from text / images • Examples include: named entities extraction, automatic categorization / tagging, sentiment analysis, etc. 18 Thursday, October 20, 2011
  • 19. Interlude: Linked Open Data 19 Thursday, October 20, 2011
  • 20. 2007 2008 2009 2010 20 Thursday, October 20, 2011
  • 21. 2011! 21 Thursday, October 20, 2011
  • 22. Linking • Many Linked Open Data repositories have been made available over the last 10 years • RDF and graph database systems are now available to manage this huge mass of information (billions of triples) • Match information extracted from content with these public (or internal) data/knowledge bases 22 Thursday, October 20, 2011
  • 23. Reasoning • When you are working on reliable metadata (ex: RDFa embedded in web pages), you can use rule / inference engines to infer actionable knowledge from your content (ex: shopping recommendation engine) • Rules can also be used to clean up / flag errors when working with unreliable (e.g. automatically extracted) information 23 Thursday, October 20, 2011
  • 24. Presenting • Allow the users of your system to interact with the knowledge thus extracted or produced, in a way that allows them to do their jobs better • A smart presentation system solves the information overload issue by contextualizing the information, i.e. presenting only information relevant to what the user is currently doing 24 Thursday, October 20, 2011
  • 25. R&D Projects Involving Nuxeo 25 Thursday, October 20, 2011
  • 26. IKS project • European R&D project under the FP7, with 13 partners (6 SMEs) and a 8.5M EUR budget • Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products • Started in Jan. 2009, will last until Dec. 2012 • First tangible result: Apache Stanbol (more about this later) 26 Thursday, October 20, 2011
  • 27. SAMAR project • French collaborative R&D project with 10 partners, and a 4.5M EUR budget • Goal: create a platform for managing multimedia content in arabic, for news agencies such as AFP • Will include: automated translation, named entities extraction, content classification • First results: integration between Nuxeo and Temis (more later) 27 Thursday, October 20, 2011
  • 28. State of the Art Semantic ECM at Nuxeo 28 Thursday, October 20, 2011
  • 29. The Semantic Engine • From unstructured content to Knowledge • Language guessing • Topic classification (Business, Sports, Media, ...) • Named Entities extraction and linking • Relationships and properties extraction 29 Thursday, October 20, 2011
  • 30. Demo time! 30 Thursday, October 20, 2011
  • 34. RESTful is Beautiful 34 Thursday, October 20, 2011
  • 37. = Semantic Engines (Apache OpenNLP) + Fast Linked Data local index (Apache Solr) + Semantic Rule Engine 37 (Apache Jena) Thursday, October 20, 2011
  • 38. Apache Stanbol Engine 1 DBpedia Engine 2 2 1 Engine 3 Freebase Nuxeo DM 3 addon Geonames LDAP Local IT infrastructure (LAN) 38 Thursday, October 20, 2011
  • 39. How to build engines? 39 Thursday, October 20, 2011
  • 40. Training statistical models for NER with Wikipedia and DBpedia • Extract sentences with link positions in Wikipedia articles • DBPedia to the find type of the target entity (Person, Location, Organization) • Apache Pig scripts to compute the join + format the result as training files for OpenNLP • Apache OpenNLP to build and evaluate the models • Apache Hadoop for distributed processing • Apache Whirr for deployment and management on Amazon EC2 cluster 40 Thursday, October 20, 2011
  • 45. Training statistical models for topic classification from Wikipedia and DBpedia • Filter category tree from DBpedia SKOS entries (~500k) • Pig scripts to compute the joins with articles abstracts for all the articles categorized in Wikipedia • Export as 2.8GB TSV file to be indexed in Apache Solr • Use Solr MoreLikeThisHandler to find the top 3 most related Wikipedia category for any kind of text • Apache Whirr & Hadoop for deployment and management on Amazon EC2 cluster 45 Thursday, October 20, 2011
  • 46. Wrap Up on Recent Work • Full offline mode: Stanbol EntityHub • Multi-lingual Indexes • New UI for occurrences reviews • Temis Luxid Annotation Factory integration 46 Thursday, October 20, 2011
  • 47. What’s next? • Stanbol and Temis connection in Admin Center • Embedded Stanbol mode for easy deployment • More OpenNLP models for more languages • Finalize topic classification - handle hierarchy • Tight integration with Nuxeo DM search features 47 Thursday, October 20, 2011
  • 48. Thank you for your attention! 48 Thursday, October 20, 2011