2. An “Intense” definition
Enterprise – Linked Data – Clouds
• Enterprise: not all of them
• Linked Data: not exactly what you get when
you google it
• Cloud has a double meaning
3. Knowledge Intensive Enterprises
• Those that will live and die by their ability to
incorporate new, diversely structured
knowledge into their processes and products
– Examples:
• Health Care and Life Sciences (HCLS)
• Scientific and Technical Publishing
• Defense, Intelligence
• …
4. Example story (Pharmaceutical company)
To stay competitive, pharmaceutical companies need to leverage all the data
available from internal sources as well as from the growing number of public
HCLS data sources. Because this data is so diverse in nature, format and
quality, integrating it raises complex issues. Goals:
• The ability to speed up “In silico” scientific workflows
• The ability to create large scale “data maps” or “aggregated views”
• The ability to receive recommendations and suggestions for new data
connections
• The ability to provide their R&D departments with superior tools for
investigating their internal knowledge, such as search engines and data browsing tools
• The ability to leverage the ever-increasing body of public, crowd-curated
open data
10. And this data...
• IS BIG
• Can be Fast
• IS Extremely Variable
• Gartner’s 3 Vs: Volume, Velocity, Variety
11. Scale is only 1 dimension
Multiple dimensions of Web of Data integration
• Flexibility: RDF tool stack
• Scalability: cluster-scalable processing
• Dynamicity: “Cloud” pipelines
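The flexibility claim for the RDF tool stack can be illustrated with a toy triple model. This is a minimal sketch in plain Python, assuming triples are stored as string tuples rather than in a real RDF store; the `ex:` prefix, the compound records and the helpers `record_to_triples` and `match` are all hypothetical. It shows two sources with different fields merging by plain set union, with no schema change, plus a wildcard lookup loosely analogous to a SPARQL triple pattern.

```python
from typing import Iterable, Optional

# A triple is (subject, predicate, object), as in RDF; plain strings here.
Triple = tuple[str, str, str]


def record_to_triples(subject: str, record: dict[str, str]) -> set[Triple]:
    """Turn one flat source record into triples about `subject`."""
    return {(subject, f"ex:{key}", value) for key, value in record.items()}


def match(graph: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two hypothetical sources describing the same compound with different fields.
internal = record_to_triples("ex:compound42", {"name": "Compound 42", "assay": "IC50=3nM"})
public = record_to_triples("ex:compound42", {"name": "Compound 42", "target": "ex:EGFR"})

# Merging needs no schema change: the combined graph is the union of both sets,
# and the shared "name" triple is stored only once.
graph = internal | public

print(len(graph))                   # 3
print(match(graph, p="ex:target"))  # [('ex:compound42', 'ex:target', 'ex:EGFR')]
```

A production stack would keep these triples in an RDF store and query them with SPARQL; the point of the sketch is only that adding a new source adds triples rather than columns.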
12. How we started: a search engine for
the Web of Data (Sindice.com)
Web of Data: 650,000,000 knowledge graphs, 5+ TB of “Big Knowledge Data”
13. SindiceTech
• Incorporating requirements from enterprises
– Scientific and Technical content companies
– Defense
– Pharma and Biotech
• Inheriting 5 years of IP from R&D on:
– Semantic technologies: RDF and a pragmatic
stack around it
– Handling very large amounts of knowledge data
• Hadoop/NoSQL
• Semantic Information Retrieval
14. Middleware for Big Knowledge Processing
[Architecture diagram; recoverable components:]
• Sources: RDBMS, S3, HDFS, FTP
• Adaptors / Inbox → Integration, Transformation & Analytics Pipeline → Loaders / Outbox
• Semantic Layer (RDF), with Semantic IR (SIRen), Solr and NoSQL
• Tools: Pipeline Composer UI, Pivot Browser, SparQLed
• Destinations: BI / DSS systems (e.g. SAS HPA), RDBMS, 3rd party, other
• Event Logging (Splunk / Logstash)
• Big Data Layer (Hadoop, Hive, Pig) / Cloudera
• Cloud Layer (e.g. Amazon, OpenStack)
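The diagram's flow, with adaptors feeding an integration, transformation and analytics pipeline whose output goes to loaders, can be sketched in a few lines. This is a minimal in-memory sketch in Python, not the SindiceTech implementation: the CSV inbox adaptor, the `normalize` and `dedupe` stages, `compose` and the list-backed outbox loader are all hypothetical stand-ins for components that would run on the Hadoop layer.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, str]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def csv_inbox_adaptor(lines: Iterable[str]) -> Iterator[Record]:
    """Hypothetical inbox adaptor: parse 'id,name' lines into records."""
    for line in lines:
        record_id, name = line.strip().split(",", 1)
        yield {"id": record_id, "name": name}


def normalize(records: Iterable[Record]) -> Iterator[Record]:
    """Transformation stage: trim and lower-case the name field."""
    for r in records:
        yield {**r, "name": r["name"].strip().lower()}


def dedupe(records: Iterable[Record]) -> Iterator[Record]:
    """Integration stage: keep the first record seen for each id."""
    seen: set[str] = set()
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            yield r


def compose(*stages: Stage) -> Stage:
    """Chain stages lazily so records stream through without full buffering."""
    def pipeline(records: Iterable[Record]) -> Iterator[Record]:
        for stage in stages:
            records = stage(records)
        return iter(records)
    return pipeline


def list_outbox_loader(records: Iterable[Record], sink: list[Record]) -> int:
    """Hypothetical outbox loader: append records to an in-memory sink."""
    count = 0
    for r in records:
        sink.append(r)
        count += 1
    return count


sink: list[Record] = []
pipeline = compose(normalize, dedupe)
raw = ["1, Aspirin", "2,Ibuprofen", "1,Aspirin"]
loaded = list_outbox_loader(pipeline(csv_inbox_adaptor(raw)), sink)
print(loaded)  # 2
print(sink)    # [{'id': '1', 'name': 'aspirin'}, {'id': '2', 'name': 'ibuprofen'}]
```

Keeping each stage as a function from records to records is what lets a composer UI reorder or swap stages; the third raw line is dropped because `dedupe` has already seen id `1`.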