ISWC 2012 - Industry Track - Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment.
1. Linked Enterprise Data
LEVERAGING THE SEMANTIC WEB STACK
IN A CORPORATE ENVIRONMENT
ISWC 2012 – BOSTON
FABRICE LACROIX – LACROIX@ANTIDOT.NET
Copyright Antidot™
2. Antidot – who we are
French-based Software Vendor
Since 1999 | Paris, Lyon, Aix-en-Provence
Information access | Data management
Mission: Provide our customers with innovative
customizable solutions that help them create
value with their data, and make their employees
more aware and efficient.
5. Structured data
CRM, ERP, directory
knowledge bases
business applications (production, support)
6. IS are bloated
1 practice => 1 need => 1 application => 1 silo
The information system is driven by processes
Data are numerous, varied and scattered
8. Solutions and workarounds
Enterprise Search brings little value to users
Document oriented
Does not solve real business problems
Google-like, Verity-like
10. What we want
(Diagram of data silos: ERP, CRM, Production, LDAP, ECM, Support, Files)
11. Changing the paradigm
Switching from an application view to a data-centric way of thinking.
12. Bring out the implicit
Build the Giant Enterprise Graph
13. LED
Linked Enterprise Data
the application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure
14. What works for the Web…
Federating silos on the Web
http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)
15. …can’t always be used in corporate IS
Legacy apps can’t be "SPARQL’ed"
80% un- or semi-structured data don’t fit the model as such
Defining vocabularies/ontologies for silos is too complex and expensive
Enterprises don’t want RDF per se, but valuable information
External data is available in XML/JSON through Web Services
Staff are trained for RDBs, XML, Web apps
No-risk/stability strategy: SemWeb technology considered new and immature
16. The RDF/storage approach
Setting up a global RDF repository does not work either
IT departments are wary of the "RDF everywhere" activists
17. Semantic Web technology
is still the right solution in a corporate environment
BUT it is not an aim
JUST use it
as a means
18. Just do it
Think of it as a stream paradigm
build new objects using existing data
without interfering with the existing infrastructure
with SemWeb somewhere under the hood
19. Enterprise Graph HowTo
Construct the graph
generate triples from data
create triples from documents
Leverage the graph
enrich
infer
Browse the graph
select resources
build objects
Trash the graph
20. How: extract & normalize
Harvest and normalize
as in an ETL: fetch, clean, transform…
normalize records (names, IDs) to prepare the linking step
For databases
db2triples: an RDB2RDF implementation by Antidot (open source, W3C validated)
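The normalization step can be sketched in a few lines of Python. The field names and cleanup rules below are purely illustrative (this is not db2triples' actual behavior); the point is that records from different silos only link once their names and IDs agree:

```python
import re

def normalize_record(record):
    """Illustrative normalization pass: trim and case-fold names,
    canonicalize IDs so records from different silos can be linked."""
    out = dict(record)
    if "name" in out:
        # collapse whitespace and normalize casing for matching
        out["name"] = re.sub(r"\s+", " ", out["name"]).strip().title()
    if "id" in out:
        # strip separators so "FR-00123" and "fr00123" align
        out["id"] = re.sub(r"[^0-9a-z]", "", out["id"].lower())
    return out

# the same company as seen by a CRM and an ERP
crm = normalize_record({"name": "  acme   corp ", "id": "FR-00123"})
erp = normalize_record({"name": "Acme Corp", "id": "fr00123"})
assert crm == erp  # now they can be meshed
```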
21. How: semantize
Don’t transform everything into RDF
cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts
interesting == needed for linking or inferring
22. How: semantize
Triples generation
Be smart: avoid upfront ontology design, use small vocabularies
Be pragmatic: transform XML tags and field names to predicates
Be agile: only insert what you need. And when you need more, add more.
Semantic Web fuels the modeling, linking and information building process
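A minimal sketch of this pragmatic triple generation, with tuples standing in for a triplestore; the base URI, record fields and `keep` list are hypothetical examples, not an actual vocabulary:

```python
# Turn selected fields of a source record into triples, deriving
# predicates directly from the field names ("be pragmatic").
BASE = "http://example.org/"

def semantize(record, keep):
    """Emit (subject, predicate, object) triples for the chosen fields only."""
    subject = BASE + record["type"] + "/" + record["id"]
    triples = []
    for field in keep:                          # cherry-picked subset
        if field in record:
            predicate = BASE + "vocab#" + field  # field name -> predicate
            triples.append((subject, predicate, record[field]))
    return triples

invoice = {"type": "invoice", "id": "42", "client": "ACME",
           "total": "1200", "internal_note": "draft"}
# only the fields needed for linking or inferring become RDF
triples = semantize(invoice, keep=["client", "total"])
assert len(triples) == 2
```

The fields left out (here `internal_note`) are kept aside rather than loaded into the graph, which matches the later Build step.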
23. Enterprise Graph HowTo
(outline recap: Construct the graph → Leverage the graph → Browse the graph → Trash the graph)
24. How: semantize
Unstructured documents
Extract metadata and transform it as needed into RDF
➡ Ex: author => dc:creator
Use text mining to extract named entities: people, organizations, products…
➡ generate those entity lists using the data sources: directory for employees, CRM for companies and people, ERP for products
➡ create triples like doc_URI quotes entity_URI
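A toy version of this annotation step: the entity list comes from the silos (directory, CRM, ERP), and each mention yields a `quotes` triple. All URIs and the substring-matching "text mining" are illustrative placeholders for the real extraction technology:

```python
# label -> URI, as it would be built from the enterprise sources
ENTITIES = {
    "John Smith": "http://example.org/people/jsmith",      # directory
    "ACME": "http://example.org/companies/acme",           # CRM
    "WidgetPro": "http://example.org/products/widgetpro",  # ERP
}
QUOTES = "http://example.org/vocab#quotes"

def annotate(doc_uri, text):
    """Naive entity spotting; real systems use proper text mining."""
    return [(doc_uri, QUOTES, uri)
            for label, uri in ENTITIES.items() if label in text]

triples = annotate("http://example.org/docs/report1",
                   "John Smith sent the WidgetPro specs to ACME.")
assert len(triples) == 3
```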
25. How: semantize
Unstructured documents
Compare documents using various dedicated algorithms
➡ is the same
➡ is included
➡ is similar
➡ is related
Generate new triples
➡ create triples like <docA> is_sub_version_of <docB>
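As a sketch of the pairwise comparison, Jaccard similarity over word sets stands in for the dedicated algorithms; the threshold and the `is_similar_to` predicate are illustrative:

```python
SIMILAR = "http://example.org/vocab#is_similar_to"

def jaccard(a, b):
    """Similarity of two texts as overlap of their word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def compare(docs, threshold=0.5):
    """Yield one triple per sufficiently similar document pair."""
    uris = list(docs)
    triples = []
    for i, u in enumerate(uris):
        for v in uris[i + 1:]:
            if jaccard(docs[u], docs[v]) >= threshold:
                triples.append((u, SIMILAR, v))
    return triples

docs = {"doc:A": "quarterly sales report draft",
        "doc:B": "quarterly sales report final",
        "doc:C": "holiday party menu"}
assert compare(docs) == [("doc:A", SIMILAR, "doc:B")]
```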
26. Enterprise Graph HowTo
(outline recap: Construct the graph → Leverage the graph → Browse the graph → Trash the graph)
27. How: enrich
Enrich the graph
run specific algorithms to generate more links and triples (classifiers, topic detection, …)
insert external data gathered from the LOD or other external datasets or APIs
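Since external data mostly arrives as JSON from Web Services (as noted earlier in the talk), enrichment can be sketched as mapping selected JSON fields to new triples about an existing resource. The canned response, URIs and predicates below are hypothetical stand-ins for a live API call:

```python
import json

# canned Web Service response standing in for a real API call
response = json.loads(
    '{"company": "ACME", "sector": "Manufacturing", "city": "Lyon"}')

def enrich(subject_uri, record, fields):
    """Map selected JSON fields to triples about an existing resource."""
    return [(subject_uri, "http://example.org/vocab#" + k, record[k])
            for k in fields if k in record]

triples = enrich("http://example.org/companies/acme", response,
                 ["sector", "city"])
assert len(triples) == 2
```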
28. How: infer
Create new knowledge
add rules according to your needs
IF a coworker is quoted in documents
AND this coworker belongs to a business unit
THEN the business unit is bound to the documents
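The rule above can be read as a simple join over the graph. A minimal sketch, with tuples as triples and illustrative predicate names (`quotes`, `member_of`, `bound_to`):

```python
QUOTES, MEMBER_OF, BOUND_TO = "quotes", "member_of", "bound_to"

def infer_bound_units(graph):
    """IF doc quotes person AND person member_of unit
    THEN doc bound_to unit."""
    new = []
    for (doc, p1, person) in graph:
        if p1 != QUOTES:
            continue
        for (s, p2, unit) in graph:
            if p2 == MEMBER_OF and s == person:
                new.append((doc, BOUND_TO, unit))
    return new

graph = [("doc:1", QUOTES, "emp:alice"),
         ("emp:alice", MEMBER_OF, "bu:sales")]
assert infer_bound_units(graph) == [("doc:1", BOUND_TO, "bu:sales")]
```

In a real triplestore the same join would typically be a SPARQL CONSTRUCT query or a forward-chaining rule.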
29. Enterprise Graph HowTo
(outline recap: Construct the graph → Leverage the graph → Browse the graph → Trash the graph)
30. How: build
Build
select resources corresponding to object seeds (using SPARQL queries)
for each seed, follow links smartly in order to create basic objects
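The two build sub-steps can be sketched as seed selection followed by a bounded link traversal. In the real pipeline the selection is a SPARQL query; here a filter over tuples stands in for it, and the predicates and depth limit are illustrative:

```python
def select_seeds(graph, rdf_type):
    """Seed selection: all resources of a given type
    (a SPARQL SELECT in the real pipeline)."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == rdf_type}

def build_object(graph, seed, depth=2):
    """Follow outgoing links from the seed, up to `depth` hops."""
    obj, frontier = {}, {seed}
    for _ in range(depth):
        nxt = set()
        for (s, p, o) in graph:
            if s in frontier:
                obj.setdefault(s, []).append((p, o))
                nxt.add(o)
        frontier = nxt
    return obj

graph = [("inv:42", "rdf:type", "Invoice"),
         ("inv:42", "client", "co:acme"),
         ("co:acme", "name", "ACME Corp")]
seeds = select_seeds(graph, "Invoice")
obj = build_object(graph, next(iter(seeds)))
assert ("name", "ACME Corp") in obj["co:acme"]
```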
31. How: build
Finalize
decorate the new knowledge objects with the data set apart earlier (not loaded in the triplestore)
now we have rich, user-actionable objects
32. Enterprise Graph HowTo
(outline recap: Construct the graph → Leverage the graph → Browse the graph → Trash the graph)
33. How: expose
Make the new information available to
users and to the entire IS
(Pipeline diagram: Relational DB → Harvest → Normalize → Semantize → Classify / Annotate → Enrich → RDF Triplestore (Linked Data) → Indexation → AFS search engine)
34. Conclusion
It works!
The triples we create and the inference rules we add are dictated by the goal / application
➡ usage- and value-oriented
We benefit from the lazy-flexible-dynamic modeling of RDF-RDFS-OWL
➡ we are agile
What matters is the graph. But the graph is not the triplestore
➡ storage independent
35. There’s an app for that
Antidot Information Factory
a software solution designed specifically to:
leverage structured and unstructured data
enable large-scale processing of existing data
automate the publishing of enriched or newly created information
Harvest → Normalize → Semantize → Enrich → Build → Expose
36. The Giant Enterprise Graph
Now we have a path to let SemWeb enter
the enterprise
37. Discuss
Understand
Learn
Exchange
www.antidot.net
info@antidot.net
THANKS FOR YOUR ATTENTION
QUESTIONS?
Speaker notes
Our information system, like any other corporate IS, is blossoming with all types of information. Most of this information is UNstructured.
And part of it is structured, mostly due to the relational databases underlying business applications. These are the applications we run internally: CRM, ERP, support tracking, …
Many approaches have been developed to solve this problem of isolated silos. Most of them only apply to structured data (BI, MDM). And in most cases they entail a long and costly deployment process and make the system more complex.
Enterprise search is not a solution. And we know that for sure, since we are a leading vendor in the realm of search solutions. The problem is related to the very nature of current search engines: they are document oriented. They read documents, they index documents, they return documents.
This is what we want: agile information, meshed, merged, enriched.
What you see is not a data mashup! It is not just data put side by side. Some of the information you see here needs advanced processing that cannot be done on the fly.
The solution is to change the paradigm: forget the applications and the APIs. Just look at the data.
We need to create the Enterprise Graph.
There is a solution: one that has been thought out and designed for the Web. If it works for the Web, it should work for you and us.
The architecture for integrating data on the Web from various silos relies on a federated principle where a query is synchronously distributed over the sources through SPARQL endpoints exposed by each of them. This approach presents many scientific and technological challenges, but considering the rationale behind the Web of Data and the need to work in the gigantic open Web space, it seems to be the only reasonable way to make it work.
Though theoretically correct, this approach is not applicable to the corporate IS for a large variety of reasons:
• The corporate information system is built with numerous legacy or closed applications that cannot be adapted or extended with SPARQL endpoints.
• The enterprise information realm is made up at 80% of unstructured or semi-structured data that cannot fit in the model as such.
• Enterprises do not want access to raw data in RDF format. They want to reap valuable information derived from the data, which requires large and complex computations to create these new informational objects.
• The bottom-up approach of mapping silos and their data to RDF to fit the model requires an enormous amount of work for defining vocabularies or ontologies for each source, which is too heavy an investment.
• Companies dream of seamlessly integrating external data to leverage their internal information. But this external data is mostly available in XML or JSON through Web Services, and not yet in RDF, so using SPARQL as a way to query and integrate does not make sense.
• IT departments have invested heavily in their “relational database for storing / XML for exchanging / Web apps for accessing” infrastructure. Their staffs are trained for this paradigm. They lack in-house skills for integrating the graph way of thinking.
• Stability matters most, and Semantic Web technology is unknown, considered new and immature: CIOs are not ready to take the risk of adding load and technological uncertainty to systems that are critical to the company for its daily business operations.
It does not work because of process issues (modeling, know-how) and technology issues (performance, scalability). Enterprises don’t care about technology, especially a new one.
We tailor the Normalize process by aligning field contents in order to mesh data coming from different sources (such as records from a CRM and an ERP). The R2RML and Direct Mapping compliant module is named db2triples.
“Why do we transform only a subpart of the harvested data into RDF, and what do we do with the rest of it?” Indeed, and not to mention the fact that text documents are not graph-friendly: as stated above, we only transform a selected part of the structured data into RDF.
From a technical standpoint, we don’t feel the technology is mature and stable enough to proceed differently. In industrial projects, millions of seed objects are regularly extracted from the sources (invoices, clients, files, etc.), each having tens of fields. And having billions of triples doesn’t scale well in available triplestores.
Transforming only a subpart of the data largely simplifies the task of choosing the predicates, which reinforces the choice of using many small available vocabularies instead of big ontologies.
The data that is not transformed to RDF is stored by Information Factory for later use during the Build step.
Unstructured documents like office files, PDF files or email content don’t fit the RDF formalism and cannot be linked to the graph as such. Extra work is necessary. First, we transform available metadata (document name, author, creation date, sender and receivers for a mail, subject and so forth) into RDF. Then, we use text-mining technology to extract named entities like people, organizations, products, etc. from the documents. These entity lists are generated using different sources of the enterprise: directories, CRM or ERP provide people and company names, while products are listed in ERPs or taxonomies.
And last, we run various specific algorithms designed to do document-versus-document comparison to detect duplicates, different versions of the same document, inclusions, semantically related documents, etc. Each of these relations is inserted in the graph with an appropriate predicate.
It is like cooking: the rules are your own personal touch. Rules depend on the information and knowledge you want to create by inferring on the graph.
We created the graph by inserting basic triples. Then we grew the graph with enriching and inferring. Now it is time to extract the information we need. For this, we first select the resources we are looking for. Then we follow some links to grab the information and create basic objects.
We agree: we would all like to see those technologies invading the information system. We would like to put these stickers on this beautiful zSeries mainframe. But what does it mean? How can we do that?