Slides from the AIMS (http://aims.fao.org/) webinar of 21 September 2017 by Martin Kaltenböck and Timea Turdean (Semantic Web Company): Text Mining in PoolParty Semantic Suite (https://www.poolparty.biz)
Text Mining in PoolParty Semantic Suite
1. Martin Kaltenböck
CFO, Semantic Web Company
Timea Turdean
Technical Consultant, SWC
AIMS Webinar
21st Sept 2017
2. PoolParty
Drupal
Integration
Agenda
▸ Introduction Semantic Web Company (SWC)
▸ Introduction PoolParty Semantic Suite
▸ Using PoolParty for Text & Data Mining
▹ Text Mining for continuous knowledge graph modelling
▹ Entity linking and data integration
▹ Classification and semantic annotation / tagging
▸ DEMO(s) of text mining capability of PoolParty
▸ Customer Success Stories
▹ REEEP ClimateTagger
▹ healthdirect Australia
▹ CTCN Semantic Search
▹ EIP Water Matchmaking
▸ Q&A Session
4. INTRODUCING
SEMANTIC
WEB COMPANY
Semantic Web Company (SWC)
▸ Founded in 2004
▸ Based in Vienna
▸ Privately held
▸ 40+ employees, experts in text
mining & linked data
▸ ~15-20% revenue growth / year
▸ EUR 2.5 million funding for R&D
▸ SWC named to KMWorld’s 2017
‘100 Companies That Matter in
Knowledge Management’
▸ Organising SEMANTiCS
conference series for 13 years
▸ https://www.semantic-web.com
5. INTRODUCING
POOLPARTY
PoolParty Semantic Suite
▸ First release in 2009
▸ Current version 6.0
▸ W3C standards compliant
▸ Over 200 installations
worldwide
▸ 50% of revenue is reinvested
into PoolParty development
▸ PoolParty on-premises or
used as a cloud service
▸ KMWorld listed PoolParty as
Trend-Setting Product 2015,
2016 and 2017
▸ https://www.poolparty.biz/
6. SELECTED
CUSTOMER
REFERENCES
AND PARTNERS
SWC headquarters
Customer References
● Credit Suisse
● Boehringer Ingelheim
● Roche
● adidas
● The Pokémon Company
● Canadian Broadcasting Corporation
● Harvard Business School
● Wolters Kluwer
● Talend
● HealthStream
● TC Media
● Techtarget
● Seek
● Alliander N.V.
● Pearson - Always Learning
● Education Services Australia
● American Physical Society
● Healthdirect Australia
● World Bank Group
● Inter-American Development Bank
● Renewable Energy and Energy Efficiency Partnership (REEEP)
● Wood Mackenzie
● Oxford University Press
● International Atomic Energy Agency
● Norwegian Directorate of Immigration
● Ministry of Finance (AT)
● Council of the E.U.
● Australian National Data Service
Partners
● Accenture
● EPAM Systems
● Enterprise Knowledge
● Mekon Intelligent Content Solutions
● B-S-S Business Software Solutions
● MarkLogic
● Wolters Kluwer
● Digirati
● Quark
Regions shown on the map: US East, US West, AUS/NZL, UK
8. TECHNICAL
CORE
COMPONENTS
Bain Capital is a venture capital
company based in Boston, MA.
Since inception it has invested in
hundreds of companies including AMC
Entertainment, Brookstone, and Burger
King. The company was co-founded by
Mitt Romney.
Taxonomy &
Ontology Server
Entity Extractor &
Text Mining
Data Integration &
Data Linking
Unstructured
Data
Semi-
structured
Data
Structured
Data
Unified
Views
PoolParty
GraphSearch
Identify new
candidate concepts
to be included in a
controlled vocabulary
Controlled vocabularies
as a basis for highly
precise entity
extraction
Entity Extractor informs
all incoming data
streams about its
semantics and links them
Schema mapping
based on ontologies
RDF
Graph Database
11. ‘Elevator
Pitch’
▸ Built as a ‘Semantic Middleware’
▸ Outstanding user-friendliness
▸ Fully standards-compliant
▸ Highly precise entity extraction
▸ Comprehensive API
▸ Excellent maintainability of extraction models
▸ Integrated with leading search engines & graph databases
▸ Integrated with leading content management platforms
▸ Product configuration options for growing requirements
▸ Highly experienced partners / service team
12. Product
Overview
All products are
available as
cloud services or
for on-premises
installation
> PoolParty
Feature & Price
Matrix
PoolParty Basic Server
PoolParty Advanced Server
PoolParty Enterprise Server
PoolParty Semantic Integrator
SKOS Taxonomy Management
Multiple Projects
Taxonomy REST API
Import/Export (incl. Excel)
Rollback and History
Ontologies and Custom Schemes
Quality Management & Reports
Advanced Corpus Management
Vocabulary Mapping, Linked Data Mapping
Linked Data Enrichment, Frontend, and SPARQL endpoint
Entity Extractor, Extractor API
Auto Populate project from DBpedia
Export to Remote Repository
Workflow Management
SKOS-XL (optional)
Integration with Graph databases
Integration with Search engines
Data linking & mapping
Data transformation pipelines with UnifiedViews
Graph Search Server
16. Metadata and
semantic data
The Peggy Guggenheim Collection
is a modern art museum on the
Grand Canal in the Dorsoduro
sestiere of Venice, Italy. It is one of
the most visited attractions in
Venice. The collection is housed in
the Palazzo Venier dei Leoni, an
18th-century palace, which was
the home of the American heiress
Peggy Guggenheim for three
decades. She began displaying her
private collection of modern
artworks to the public seasonally
in 1951. After her death in 1979, it
passed to the Solomon R.
Guggenheim Foundation, which
eventually opened the collection
year-round.
17. Metadata and
semantic data
Peggy Guggenheim
Peggy Guggenheim
Collection
Venice
Canale
Grande
http://my.com/resource/328832
skos:prefLabel
http://my.com/docs/45367
skos:prefLabel
http://my.com/docs/52345
skos:prefLabel
http://my.com/resource/328832
skos:prefLabel
18. Metadata and
semantic data
Peggy Guggenheim
Peggy Guggenheim
Collection
Venice
museum
Canale
Grande
skos:prefLabel
http://my.com/docs/45367
skos:prefLabel
http://my.com/docs/52345
skos:prefLabel
skos:prefLabel
http://my.com/resource/62545
skos:prefLabel
http://www.mycom.com/images/90546089
image
has landmark
named after
http://my.com/resource/328832
http://my.com/resource/328832
hosted in
hosted in
has
19. Metadata and
semantic data
Peggy Guggenheim
Collection
dct:title
Mike Miller
Michael Miller
skos:prefLabel
skos:altLabel
dct:creator
http://my.com/docs/328832
http://my.com/people/32
schema:Article
rdf:type
http://my.com/img/99.jpg
schema:image
skos:subject
Peggy Guggenheim
Collection Venice
museum
skos:prefLabel
skos:subject
skos:altLabel
skos:broader
skos:prefLabel
schema:image
Canale
Grande
skos:prefLabel
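The statements on this slide can be written down as subject-predicate-object triples. The following is a minimal sketch in plain Python (no RDF library), not PoolParty output. Which label belongs to which resource is an assumption read off the flattened diagram, and the concept URI for ‘museum’ is hypothetical (borrowed from the previous slide for illustration).

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to a resource URI."""
    value: str


DOC = "http://my.com/docs/328832"
PERSON = "http://my.com/people/32"
IMAGE = "http://my.com/img/99.jpg"
MUSEUM = "http://my.com/resource/62545"  # hypothetical concept URI

# Assumed reading of the diagram: one document, its creator, its subject.
TRIPLES = [
    (DOC, "rdf:type", "schema:Article"),
    (DOC, "dct:title", Literal("Peggy Guggenheim Collection")),
    (DOC, "dct:creator", PERSON),
    (DOC, "schema:image", IMAGE),
    (DOC, "skos:subject", MUSEUM),
    (PERSON, "skos:prefLabel", Literal("Michael Miller")),
    (PERSON, "skos:altLabel", Literal("Mike Miller")),
    (MUSEUM, "skos:prefLabel", Literal("museum")),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def label_of(resource):
    """Preferred label of a resource, or the URI itself if it has none."""
    labels = objects(resource, "skos:prefLabel")
    return labels[0].value if labels else resource


creator = objects(DOC, "dct:creator")[0]
print(label_of(creator))
```

Because the creator is a resource rather than a string, its names live in one place and every document that points to it picks them up.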
20. Resolving Language Problems
“While most people can deal with
linguistic features as synonyms,
homographs, polyhierarchies,
and even with far more peculiar
characteristics of natural
languages, machines often
struggle with automatic sense-
making because of the lack of a
semantic knowledge model that
can be used programmatically.”
22. PoolParty
Extractor
Uses several components of a knowledge model:
▸ Taxonomies based on the SKOS standard
▸ Ontologies based on RDF Schema or OWL
▸ Word form dictionaries
▸ Blacklists and stop word lists
▸ Disambiguation settings
▸ Domain-specific reference document corpus
▸ Statistical language model
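To make the role of taxonomy labels and stop word lists concrete, here is a minimal sketch of dictionary-based entity extraction in Python. It is not PoolParty's implementation or API: the thesaurus content, the concept ids and the function names are hypothetical, and word form dictionaries, disambiguation and the statistical model are left out.

```python
import re

# Hypothetical mini-thesaurus (concept id -> labels); not PoolParty data.
THESAURUS = {
    "ex:ophthalmoscope": {"pref": "ophthalmoscope", "alt": ["funduscope"]},
    "ex:diagnostic-equipment": {"pref": "diagnostic equipment", "alt": ["device"]},
}
# Labels considered too generic to trigger a match (assumed stop word list).
STOP_WORDS = {"device"}


def build_lookup(thesaurus, stop_words):
    """Map every usable lower-cased label to its concept id."""
    lookup = {}
    for concept, labels in thesaurus.items():
        for label in [labels["pref"], *labels["alt"]]:
            if label.lower() not in stop_words:
                lookup[label.lower()] = concept
    return lookup


def extract(text, lookup):
    """Return (offset, matched label, concept id) for each whole-word match."""
    lowered = text.lower()
    matches = []
    for label, concept in lookup.items():
        for m in re.finditer(r"\b" + re.escape(label) + r"\b", lowered):
            matches.append((m.start(), label, concept))
    return sorted(matches)


TEXT = ("Proper use of a funduscope requires a bit of practice and "
        "familiarity with the functions of your device.")
print(extract(TEXT, build_lookup(THESAURUS, STOP_WORDS)))
```

The synonym ‘funduscope’ resolves to the ophthalmoscope concept, while ‘device’ is ignored because it is on the stop word list.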
23. PoolParty’s
SKOS editor
The Audi Q3 is a compact
crossover SUV made by
Audi.
It is based on the PQ35
platform of Volkswagen.
A5 platform
A series
25. ‘Setting the
rules’ for text
mining & entity
extraction via
thesaurus
Proper use of a funduscope
requires a bit of practice and
familiarity with the functions of
your device.
Diagnostic Equipment
Ophthalmoscope
28. Corpus
analysis results
in a network of
concepts and
terms
I need support to
continuously extend our
taxonomy / controlled
vocabulary!
skos:
Concept
Reference
Corpus
- Websites
- PDF, Word, …
- Abstracts from
DBpedia
- RSS Feeds
skos:
Concept
skos:
Concept
Term 1
Term 3
Term 7
Term 8
Term 6
Term 4
Term 2
Term 5
- Relevant terms and phrases
- Relevancy of concepts
- co-occurrence between concepts and terms
- co-occurrence between terms and terms
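The co-occurrence part of such a corpus analysis can be sketched in a few lines of Python. This is an illustration under simple assumptions, not PoolParty's algorithm: each document is reduced to the set of terms found in it, the sample terms are invented, and raw document counts stand in for whatever relevancy scoring the product applies.

```python
from collections import Counter
from itertools import combinations

# Hypothetical corpus: each document reduced to the set of terms found in it.
DOCS = [
    {"solar energy", "photovoltaics", "feed-in tariff"},
    {"solar energy", "photovoltaics"},
    {"wind energy", "feed-in tariff"},
]


def cooccurrence(docs):
    """Count in how many documents each unordered pair of terms appears together."""
    counts = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts


def related(term, counts):
    """Terms that co-occur with `term`, most frequent first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if term == a:
            scores[b] += n
        elif term == b:
            scores[a] += n
    return scores.most_common()


print(related("solar energy", cooccurrence(DOCS)))
```

Terms that frequently co-occur with an existing concept but are not yet in the vocabulary are natural candidates for a taxonomist to review.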
31. PoolParty as a
supervised
learning
system
Content Manager
Integrator
Taxonomist/
Ontologist
Thesaurus
Server
Extractor
PowerTagging
uses API
is user of
is user of
is basis of
is basis of
Index
annotates
enriches
Reference Corpus
CMS
extends
is basis of
analyzes
uses API
33. PoolParty
Semantic
Integrator -
at a glance
https://youtu.be/l_LppfS3wxk
Deep Data
Analytics
Semantic
Search
Semantic
Integrator
Unstructured
Data
Structured
Data
ETL / Monitoring / Scheduling
39. Use Cases:
Text Mining &
Linked Data
▸ Climate Tagger (PDF)
Streamline and catalogue data and information resources
▸ healthdirect Australia (PDF)
Semantic Search based on the Australian Health Thesaurus
▸ CTCN Semantic Search
Integrating thousands of documents from several sources on climate technology
▸ European Innovation Partnership (EIP) on Water
Online Marketplace including semantic Matchmaking
40. Climate Tagger
Help organizations in the
climate and development
arenas catalogue, categorize,
contextualize, and connect data
and information resources.
Climate Tagger is backed by the
expansive Climate Compatible
Development Thesaurus.
http://www.climatetagger.net
42. EIP Water Matchmaking
Controlled vocabularies enable
accurate matchmaking
between Supply and Demand
for Water Innovation in Europe.
Matchmaking is based upon
the EIP Water Innovation
Thesaurus (GEMET based).
http://www.eip-water.eu
43. CTCN Semantic Search
Help organisations in the climate
technology field to explore and find
relevant content from thousands of
Drupal Nodes and several sources
using PoolParty, PowerTagging and
s0nr webmining.
CTCN is backed by the CTCN
Climate Technology Thesaurus.
https://www.ctc-n.org/semantic-search
44. healthdirect Australia
Integrated views and
semantic search over more
than 100 trusted sources.
Harmonization of various
metadata systems through
the use of a central
vocabulary hub:
Australian Health Thesaurus.
http://www.healthdirect.gov.au
45. SUMMARY
WHY
TAXONOMISTS
AND
INFORMATION
ARCHITECTS
LIKE
POOLPARTY
Different project stakeholders expect specific
qualities from a semantic technology platform:
I am a taxonomist. I need a tool that
provides convenient functionalities and
intuitive user interfaces for my daily work.
I am an information architect. Enterprise
metadata management deserves scalable
technologies, which provide semantic services
on top of rich APIs based on standards.
Welcome - 3’ - Martin & Timea
SWC & PP - 10’ max - Martin
Using PP - 10 - Timea
Demos - 12’ - Timea
Customer Stories - 10’ (max) - Martin
TOTAL = 45’ plus Q&A
At the core of each application that builds upon a semantic information architecture, we clearly distinguish between the content layer, the metadata layer, the semantic layer, and the navigation logic on top.
Metadata layer
Not actionable
Semantic layer
Adding meaning to the metadata
Strings become ‘things’ that can be linked to one another and enriched with more data.
By adding a semantic layer that contains facts, one can quickly find the document when searching for ‘museum’, even if ‘museum’ is not in the text.
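A minimal Python sketch of this idea follows: the document is tagged with a concept, and the query is matched against the tags plus their broader concepts. The broader relations, the ‘cultural institution’ concept and the function names are hypothetical; a real setup would query the graph database rather than in-memory dictionaries.

```python
# Hypothetical semantic layer: concept -> broader concept.
BROADER = {
    "Peggy Guggenheim Collection": "museum",
    "museum": "cultural institution",
}
# Hypothetical metadata layer: document -> concepts it is tagged with.
DOC_TAGS = {
    "http://my.com/docs/45367": {"Peggy Guggenheim Collection", "Venice"},
}


def ancestors(concept):
    """Follow broader links upwards; stop at the top or if a cycle is detected."""
    seen = []
    while concept in BROADER and BROADER[concept] not in seen:
        concept = BROADER[concept]
        seen.append(concept)
    return seen


def search(query):
    """Documents whose tags, or any broader concept of a tag, equal the query."""
    hits = []
    for doc, tags in DOC_TAGS.items():
        expanded = set(tags)
        for tag in tags:
            expanded.update(ancestors(tag))
        if query in expanded:
            hits.append(doc)
    return hits


print(search("museum"))
```

The document is found for ‘museum’ although it is only tagged with ‘Peggy Guggenheim Collection’ and ‘Venice’.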
A semantic layer is a network (or graph) of things, including their relations and attributes such as their various names. This layer acts as glue, linking all the information available for a given business object (‘thing’ or ‘resource’) that is scattered across various repositories and data silos, in order to create a complete picture of it.
* Disambiguation: in ‘Jaguar is owned by Tata Motors’, ‘Jaguar’ is a homograph (car brand or animal).
* Synonyms: different words that refer to the same thing.
* A polyhierarchy describes an entity or concept as a child concept of at least two parent concepts.
Taxonomies based on the SKOS standard
‘Setting the rules’ for text mining & entity extraction via thesaurus
Ontologies based on RDF Schema or OWL
Word form dictionaries
Blacklists and stop word lists
Annotated concepts will then be compared with the surroundings of the potentially ambiguous extracted entities in a given text.
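One simple way to picture this comparison is to count how many context labels of each candidate concept appear in the surrounding text. The Python sketch below does that for the ‘Jaguar’ example; the candidate concepts and their context labels are hypothetical, and counting matches is a stand-in for whatever scoring PoolParty actually uses.

```python
import re

# Hypothetical candidates for the homograph "Jaguar", each with labels of
# concepts it is related to in the knowledge model.
CANDIDATES = {
    "ex:jaguar-cars": {"car", "tata motors", "land rover", "vehicle"},
    "ex:jaguar-animal": {"cat", "rainforest", "predator", "panthera"},
}


def disambiguate(text, candidates):
    """Pick the candidate whose context labels occur most often in the text.

    Returns (best concept or None, score per candidate). On a tie, the
    candidate listed first wins; None means no context label matched.
    """
    lowered = text.lower()
    scores = {
        concept: sum(
            1 for label in context
            if re.search(r"\b" + re.escape(label) + r"\b", lowered)
        )
        for concept, context in candidates.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


print(disambiguate("Jaguar is owned by Tata Motors", CANDIDATES))
```

With no supporting context at all, the sketch returns None rather than guessing, which is one possible policy for ambiguous mentions.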
Domain-specific reference document corpus
Show dupal.poolparty.biz/PoolParty
Show dupal.poolparty.biz