Unstructured content represents more than 80% of the information assets of an enterprise. A triple is a way of encoding information about objects and enables computers to access, mesh, and take action on information. Triples make claims about objects and may be published in knowledge bases accessed by parties that have no particular knowledge of each other. Based on an award-winning natural language processing pipeline that analyzes the content, Luxid® extracts information about the organization’s entities of interest and their relationships to derive precise and relevant triples in 20 languages.
2. scopeKM
Knowledge Management
Text analysis with Triples -2-
Understanding the meaning of naturally spoken
and written text
It’s a potential treasure trove of business information – but are you exploiting it?
Luxid® extracts business information from unstructured content, structures it
into triples and feeds them into the MarkLogic triple store, enabling you to query,
visualize and analyze them for insights that are critical to competitiveness.
What’s a triple?
A triple is a way of encoding information
about objects. It is a key part of RDF
(Resource Description Framework), a W3C
standard that enables computers to access,
mesh, and take action on information that
is distributed across the Web. RDF triples
take the shape of statements that link a
subject to an object via a predicate. In each
of the following three triples, the predicate
(the middle term) links the subject to the object.
Leonardo_DiCaprio stars_in Titanic
James_Cameron directed Titanic
Titanic launched_in 1997
Triples typically make claims about world
objects (also called resources or entities) –
in the above example, actors, directors,
movies – and may be published in
knowledge bases accessed by parties that
have no particular knowledge of each
other. To ensure robust operation, RDF
triples therefore unambiguously identify
each of the entities they refer to with a
Uniform Resource Identifier (URI), and
predicates by reference to a vocabulary (or
ontology) published alongside the
knowledge base.
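To make the role of URIs concrete, here is a minimal sketch that rewrites the three movie triples with full identifiers and prints them in the W3C N-Triples line format. The two `example.org` namespaces and the helper `to_ntriples_line` are assumptions made for illustration; a real knowledge base would publish its own identifiers and vocabulary.

```python
# Hypothetical namespaces, chosen for illustration only.
RESOURCE = "http://example.org/resource/"
VOCAB = "http://example.org/vocab/"

# Each triple is (subject, predicate, object). Subjects and predicates are
# URIs; the object of the last triple is a plain literal (a year).
triples = [
    (RESOURCE + "Leonardo_DiCaprio", VOCAB + "stars_in", RESOURCE + "Titanic"),
    (RESOURCE + "James_Cameron", VOCAB + "directed", RESOURCE + "Titanic"),
    (RESOURCE + "Titanic", VOCAB + "launched_in", "1997"),
]


def to_ntriples_line(subject, predicate, obj):
    """Serialize one triple as an N-Triples line.

    URIs are wrapped in angle brackets; any other value is treated as a
    plain string literal and quoted.
    """
    def term(value):
        if value.startswith(("http://", "https://")):
            return "<" + value + ">"
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'

    return f"{term(subject)} {term(predicate)} {term(obj)} ."


lines = [to_ntriples_line(*t) for t in triples]
for line in lines:
    print(line)
```

Because every entity carries a full identifier, two parties who have never coordinated can still tell that they are talking about the same Titanic.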
How can triples be used?
Triples can be queried, navigated,
visualized and analyzed in the context of
any task that has at its core the
exploitation of knowledge, whether
proprietary to the
organization, or available from a third
party. Recurring use cases that leverage
triples include:
Linked Open Data
DBpedia is an open data initiative that
involves the public sharing of knowledge
housed in a query-able triple store. Similar
query-able information repositories
include Geonames (geographical features),
data.gov (US federal, state, and local data)
as well as legislation.gov.uk (UK statutory
law). In the Life Sciences, UniProt and
DrugBank are similar initiatives that offer
information about proteins and drugs.
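Querying a triple store comes down to matching patterns in which some positions are fixed and others are left open. The sketch below illustrates the idea on a small in-memory list, using short names instead of URIs for readability. The `match` helper is hypothetical; public repositories of this kind are typically queried with the SPARQL language rather than with code like this.

```python
# A tiny in-memory "store": a list of (subject, predicate, object) tuples.
triples = [
    ("Leonardo_DiCaprio", "stars_in", "Titanic"),
    ("James_Cameron", "directed", "Titanic"),
    ("Titanic", "launched_in", "1997"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Every statement whose object is Titanic, whatever the predicate.
about_titanic = match(triples, obj="Titanic")

# Who directed Titanic? Keep only the subject of each matching triple.
directors = [s for (s, _, _) in match(triples, predicate="directed", obj="Titanic")]

print(about_titanic)
print(directors)
```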
Commercial Information
Products
Triple stores can likewise be exploited for
commercial publications. In portals, they
enable new added-value information and
analytics features alongside more
traditional content-driven offerings. The
BBC coverage of the 2012 Olympics is a
well-known example based on this approach to
report key information about countries,
teams, players, and disciplines. Knowledge
bases that rely on query-able triple stores
are also a growing product category. They
enable the seamless integration of
structured information into end-user
workflow applications and analytics tools.
Enterprise Linked Data
Triple stores may also house proprietary
information about any entity present in an
organization’s world view: other
Organizations (suppliers, competitors,
employees, notable individuals), Products
(Parts, Accessories, Options), Objects of
research (molecules, diseases,
investments), etc.
Here again, such information can then be
queried, explored, visualized or analyzed
to answer questions such as the following:
• What business relationships exist
between a potential partner and my
competitors?
• How do our clinical results
compare to publicly available
information about side effects
caused by molecules with
comparable modes of action?
• Which experts are most closely
involved in our area of
investigation yet most remote from
our teams?
Provided relevant information is also
available from third-party triple stores
(commercial or open), it can be conjointly
analyzed with proprietary triples, enabling
insights that would not be available
otherwise.
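As a rough illustration of such conjoint analysis, the sketch below merges a proprietary set of triples with a third-party set and answers the first question above. All company names, predicates and the helper function are invented for the example; a production setup would run an equivalent query inside the triple store.

```python
# All names below are invented for illustration.
proprietary = {
    ("Globex", "competes_with", "MyCompany"),
    ("Initech", "competes_with", "MyCompany"),
}
third_party = {
    ("Acme", "supplies", "Globex"),
    ("Acme", "joint_venture_with", "Initech"),
    ("Acme", "supplies", "Umbrella"),
}

# Triples are self-contained statements, so combining two sources
# is a plain set union.
merged = proprietary | third_party


def relationships_with_competitors(store, partner, me):
    """List the triples that link `partner` (as subject) to a competitor of `me`."""
    competitors = {s for (s, p, o) in store if p == "competes_with" and o == me}
    return sorted(
        (s, p, o) for (s, p, o) in store if s == partner and o in competitors
    )


links = relationships_with_competitors(merged, "Acme", "MyCompany")
for link in links:
    print(link)
```

Neither source can answer the question alone: the list of competitors comes from the proprietary triples, while the partner's relationships come from the third-party ones.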
Semantically enriched triple store
How do you create
triples?
Unstructured content represents on
average more than 80% of an organization’s
information assets, a potential treasure
trove of business insights. Thanks to Luxid®,
a complementary application to MarkLogic
that is accessed via Web Services, a
company can now extract business
information from it and feed it as triples
into the triple store.
Based on an award-winning natural language
processing pipeline that analyzes the
content, Luxid® extracts information
about the organization’s entities of
interest and their relationships to derive
precise and relevant triples in 20
languages. Aligned with the taxonomy or
ontology, these triples then become
natively accessible to any application
leveraging the MarkLogic triple store, in
particular for querying, visualization and
analytics purposes.
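The general idea of turning text into triples can be shown with a deliberately naive sketch. It is not the Luxid® pipeline and does not use its API: it is a toy regular-expression extractor, and the verb-to-predicate mapping, the sample sentences and the function name are all assumptions made for illustration.

```python
import re

# Hypothetical mapping from verbs found in text to vocabulary predicates.
PREDICATES = {"acquired": "acquired", "supplies": "supplies", "hired": "employs"}

# One capitalized word, one known verb, one capitalized word.
PATTERN = re.compile(
    r"\b([A-Z][A-Za-z]+) (" + "|".join(PREDICATES) + r") ([A-Z][A-Za-z]+)\b"
)


def extract_triples(text):
    """Return (subject, predicate, object) triples for each pattern match."""
    return [(s, PREDICATES[verb], o) for s, verb, o in PATTERN.findall(text)]


text = "Acme acquired Globex in 2012. Later, Globex hired Alice. Prices rose sharply."
extracted = extract_triples(text)
for triple in extracted:
    print(triple)
```

A toy like this breaks on multi-word names, passive constructions, negation and other languages, which is why extraction in practice relies on a full linguistic pipeline aligned with a taxonomy or ontology.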
Platform overview and key components
• Robust and scalable platform based on UIMA/XML
architecture
• Extracts RDF triples based on entities, relationships,
sentiments, topics or terminology mentioned in text
• Categorizes documents and performs corpus
clustering
• Extraction engines based on syntax, statistics, taxonomy,
machine learning and domain-specific rules
• 20 languages supported
• Each Skill Cartridge focuses on recurring areas of
interest: people and location names, information
about companies and their relationships,
categorization of news, biology, economy, security, etc.
• The Studio enables the customization of existing Skill
Cartridges as well as the creation of new ones
• Import of the taxonomy/thesaurus into a Skill Cartridge
• Exploit morpho-syntactic reasoning, statistical
models, machine learning and/or domain-specific
rules
• Measure, track and optimize extraction rules
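One common way to measure extraction rules is to compare their output with a hand-checked reference set and compute precision and recall. The source does not say which metrics the Studio reports, so the sketch below is only an assumption about what such a measurement could look like; the triples and the function name are invented.

```python
def precision_recall(extracted, gold):
    """Compare extracted triples with a hand-checked reference set.

    Precision: share of extracted triples that are correct.
    Recall: share of reference triples that were found.
    """
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Invented reference ("gold") triples and rule output, for illustration.
gold = {
    ("Acme", "acquired", "Globex"),
    ("Globex", "employs", "Alice"),
    ("Acme", "supplies", "Initech"),
}
extracted = {
    ("Acme", "acquired", "Globex"),
    ("Globex", "employs", "Alice"),
    ("Acme", "acquired", "Initech"),   # wrong predicate
    ("Alice", "employs", "Globex"),    # subject and object swapped
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tracking both figures after each rule change shows whether a new rule finds more triples at the cost of more errors, or the reverse.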