Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
©2016 Cambridge Semantics Inc. All Rights Reserved
Steve Hamby
Managing Director, Government
Cambridge Semantics Inc.
678.346.6386
steve@cambridgesemantics.com
This webinar is targeted at Federal Government CIOs and
staff who are researching enterprise data management and
mining tools, to help them understand how Smart Data Lakes
provide a viable mechanism for addressing their top priorities.
Cambridge Semantics (CSI)
The Anzo Smart Data Lake® enables enterprise-wide, governed,
self-serve, diverse data discovery & analytics
Company:
▪ Founded by senior team from IBM’s Advanced Internet Technology Group
▪ Complemented by MPP technology team previously at Netezza & ParAccel (the technology behind Amazon Redshift)
▪ Select customers:
Software:
▪ Anzo: Links & harmonizes diverse data using semantic knowledge graphs for governed self-serve data preparation, discovery & analytics
▪ Currently in its third product generation, in production use
MIT Innovation Showcase
Business Intelligence / Analytics Solutions
What is the Smart Data Lake?
• Ovum: a governed, managed, and transparent default ingest point for raw data that
provides inventory, security, and integration to enable holistic enterprise analytics
• CSI: a flexible, scalable system that ingests structured and unstructured data in parallel,
uses semantic graph models to link and contextualize diverse enterprise data, and uses a
clustered, in-memory graph query engine to give users self-service data discovery, analytics,
and visualization across enterprise concepts that represent real-world entities & relationships
Machine-Assisted Smart Data Prep & Integration
Semantic Knowledge Graphs
In-Memory Clustered Graph Query Engine
Interactive Discovery & Analytics with Query Builder Dashboard
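As a rough illustration of how a semantic graph links data from separate systems, the sketch below stores facts as RDF-style (subject, predicate, object) triples in plain Python and answers a question that needs both sources at once. It assumes the two sources have already been mapped to a shared identifier for the same person; all identifiers (`ex:person/42`, `ex:managedBy`, and so on) and helper names are hypothetical, and this is not Anzo's API.

```python
# Each fact is a (subject, predicate, object) triple, as in the RDF data model.
# All identifiers below (ex:person/42, ex:managedBy, ...) are hypothetical.

# Records from an HR system, mapped to triples.
hr_triples = [
    ("ex:person/42", "ex:name", "Jane Doe"),
    ("ex:person/42", "ex:memberOf", "ex:org/acquisitions"),
]

# Records from a separate contracts system; they reuse the same person ID,
# which is what links the two sources into one graph.
contract_triples = [
    ("ex:contract/C-1001", "ex:managedBy", "ex:person/42"),
    ("ex:contract/C-1002", "ex:managedBy", "ex:person/42"),
    ("ex:contract/C-1001", "ex:vendor", "ex:vendor/acme"),
]

graph = set(hr_triples) | set(contract_triples)


def objects(graph, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def subjects(graph, predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is in the graph."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


def contracts_by_org(graph, person):
    """Cross-source question: which organization owns the contracts a person manages?"""
    orgs = objects(graph, person, "ex:memberOf")
    return [(c, org) for c in subjects(graph, "ex:managedBy", person) for org in orgs]


print(contracts_by_org(graph, "ex:person/42"))
```

Neither source alone can answer the question; once both are expressed as triples over shared identifiers, the answer is a two-hop walk through the graph.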
The Data Prep, Discovery & Analytics Problem
Current Approach: Labor-Intensive and Time-Consuming, with Limited Insight Capabilities
• Data Preparation Process Is Iterative and Labor-Intensive
• Data Models & Data Prepared to Answer Specific Questions, so Full Granularity Is Not Exposed
• Harmonizing Unstructured Data Is Too Complex
• Limited Understanding & Overview of Available Data
• Poor/Limited Governance & Provenance across Diverse Data
Enterprise Data Management and Mining
• Enterprise Data Warehouse (1980s and 1990s): central repository(ies) of integrated,
processed data from one or more disparate sources, containing current and historical data
that is used for creating analytical reports for operational users throughout the enterprise
Single, “boil-the-ocean”, static view of the enterprise
• Data Mart (1990s and 2000s): simple form of a data warehouse that focuses on a single
subject or functional area (e.g., sales, finance, or marketing) and is often created and
controlled by a single department within an organization
Several, often competing, “authoritative” views of the enterprise
• Data Lake (2010 - present): a collection of storage instances of various data assets that
presents an unrefined view of data to only the most highly skilled analysts, helping them
explore data refinement and analysis techniques independent of the system-of-record
compromises that may exist in a traditional analytic data store (such as a data mart or
data warehouse)
“Data shredder / blender” → Hire smart data scientists
Feature Comparison of Legacy Approaches
1. Governed: Enforce processes for integrating data sources to improve enterprise data;
2. Non-volatile: Do not overwrite existing data values; instead, create a new record with the new value;
3. Integrated: Provide holistic enterprise view of data across business functions and over time;
4. Agile: Rapidly evolve use cases supported as organization needs change;
5. Data Agnostic: Support all data, regardless of structure or implied authoritativeness;
6. Data Centric: Analytics are attached to data in a horizontally scalable environment versus pushing data to
analytics, which creates significant network load;
7. Standards-based: Uses open standards to promote interoperability and reduce vendor lock-in;
8. Low TCO: Total cost of ownership scales in direct proportion to the number of supported use cases.
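Criterion 2 (non-volatile) describes an append-only pattern: an update records a new version instead of overwriting the old value, so history and point-in-time queries stay available. The sketch below shows that pattern with an in-memory list; the class and method names and the example key are hypothetical, and this is not how Anzo or any particular warehouse implements versioning.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """One immutable recorded value for a key."""
    key: str
    value: object
    recorded_at: datetime


@dataclass
class AppendOnlyStore:
    """Non-volatile store: updates append a new Version; old ones are never overwritten."""
    _log: list = field(default_factory=list)

    def put(self, key, value, recorded_at=None):
        if recorded_at is None:
            recorded_at = datetime.now(timezone.utc)
        self._log.append(Version(key, value, recorded_at))

    def history(self, key):
        """All versions of key, oldest first (ties keep insertion order)."""
        return sorted((v for v in self._log if v.key == key), key=lambda v: v.recorded_at)

    def current(self, key):
        """Most recent value of key, or None if it was never written."""
        versions = self.history(key)
        return versions[-1].value if versions else None

    def as_of(self, key, when):
        """Value of key as it stood at time `when` (a point-in-time query)."""
        past = [v for v in self.history(key) if v.recorded_at <= when]
        return past[-1].value if past else None


store = AppendOnlyStore()
key = "contract/C-1001/ceiling"  # hypothetical key
store.put(key, 1_000_000, datetime(2016, 1, 1, tzinfo=timezone.utc))
store.put(key, 1_250_000, datetime(2016, 6, 1, tzinfo=timezone.utc))

print(store.current(key))  # 1250000
print(store.as_of(key, datetime(2016, 3, 1, tzinfo=timezone.utc)))  # 1000000
print(len(store.history(key)))  # 2
```

Because the earlier value is never destroyed, "what did we believe on March 1?" remains answerable, which an in-place update cannot offer.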
Introducing Anzo Smart Data Lake
Anzo Smart Data Lake
• Data Prep: Combine
• Data Discovery: Explore
• Data Analytics: Act
Provide a Knowledge Graph of All Diverse Data (Structured or Unstructured, Internal or External)
at Big Data Scale, with Governed End-User Self-Service Discovery & Analytics
A fully integrated Data Prep, Data Discovery, and Data Analytics product
that overlays existing systems
Data Analytics: Act - Governed Self-Service on Trusted Data
Traditional Approach:
● Self-serve analytics only after data is loaded into a self-serve BI tool
● Little governance or control
● Poor handling of unstructured and diverse data
Smart Data Lake:
● Explore entities & relationships across all data sets to discover & analyze new insights
● Create individualized analytics dashboards from custom extracts without having to learn a query language
● No more waiting on IT DBAs or data wranglers to integrate data
● Trusted and validated data, with data harmonization for synonyms
● Governance of users: IT can set data policies
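"Data harmonization for synonyms" means that different labels for the same real-world thing are mapped to one canonical term, so records from different systems can be counted and joined together. The sketch below does this with a plain lookup table; the table contents, agency labels, and function names are hypothetical, and an ontology-driven product would typically hold such mappings in its model rather than in a Python dict.

```python
# Hypothetical synonym table: normalized source label -> canonical term.
SYNONYMS = {
    "dod": "Department of Defense",
    "dept of defense": "Department of Defense",
    "department of defense": "Department of Defense",
    "gsa": "General Services Administration",
    "general services admin": "General Services Administration",
}


def normalize(label):
    """Lowercase, drop periods, and collapse runs of whitespace before lookup."""
    return " ".join(label.replace(".", "").lower().split())


def harmonize(label):
    """Return the canonical term for label, or label unchanged if it is unknown."""
    return SYNONYMS.get(normalize(label), label)


# Hypothetical records from three source systems that name agencies differently.
records = [
    {"source": "contracts", "agency": "DoD"},
    {"source": "finance", "agency": "Dept. of Defense"},
    {"source": "hr", "agency": "G.S.A."},
]
harmonized = [{**r, "agency": harmonize(r["agency"])} for r in records]

for r in harmonized:
    print(r["source"], "->", r["agency"])
```

Unknown labels pass through unchanged rather than being dropped, which keeps them visible so a data steward can add them to the table later.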
Open Standards Enable Smart Data
Logical models represented as OWL Ontologies are the key to a Smart Data Lake
• Next step in a decades-long evolution in IT
– Relational theory started a movement focused on modeling data in rich queryable
structures
– Logical models are data too, and are not conflated with database storage constructs
– Rich canonical models allowing unlimited entity types & relationships
– Open Standards based representations (W3C’s OWL & RDF)
• CSI is the market leader in operationalizing ontologies
– Rich detailed data representations in the language of the business or domain
– Simplifies data ingestion through ETL/ELT automation, interlinking & harmonization
– Ideal for merging data from both structured and unstructured sources
– Dramatically simplifies end-user data self-service through automated query (SPARQL) generation
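The automated query (SPARQL) generation point means end users pick an entity type and some columns, and the tool writes the query for them. The sketch below shows the general idea by turning a simple point-and-click selection into SPARQL 1.1 text. The function name, the ontology namespace, and the property IRIs are hypothetical; this is not Anzo's query generator, and it handles only plain string equality filters.

```python
def build_select(class_iri, columns, filters=None, limit=100):
    """Generate a SPARQL SELECT query from a simple query-builder selection.

    class_iri: IRI of the entity type the user picked.
    columns:   {variable_name: property_iri}, one entry per column picked.
    filters:   optional {variable_name: string}, equality filters on those columns.
    """
    filters = filters or {}
    unknown = set(filters) - set(columns)
    if unknown:
        raise ValueError(f"filters on unselected columns: {sorted(unknown)}")

    select_vars = " ".join(["?item"] + [f"?{name}" for name in columns])
    # "a" is SPARQL shorthand for rdf:type.
    lines = [f"SELECT {select_vars}", "WHERE {", f"  ?item a <{class_iri}> ."]
    for name, prop in columns.items():
        lines.append(f"  ?item <{prop}> ?{name} .")
    for name, value in filters.items():
        # Escape backslashes and double quotes so the value is a valid string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  FILTER(?{name} = "{escaped}")')
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


ONT = "http://example.org/ontology#"  # hypothetical ontology namespace
query = build_select(
    ONT + "Contract",
    {"vendor": ONT + "vendorName", "amount": ONT + "obligatedAmount"},
    filters={"vendor": "Acme Corp"},
    limit=10,
)
print(query)
```

The output is ordinary SPARQL text that any SPARQL 1.1 endpoint could run. The equality filter matches only plain string literals; language-tagged or typed values would need `STR()` or a different comparison.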
Work the Way you Think
Graph-Based Data Discovery and Analytics
Anzo Smart Data Lake®
Graph-based Data Discovery & Analytics for the Enterprise
An enterprise-scale offering to easily harmonize, discover and analyze all data
whether from inside or outside the enterprise
Smart Data Prep & Integration: Combine
Smart Data Discovery: Explore
Smart Data Analytics: Act
Enterprise Smart Data Lake
Standard Ontology (e.g., NIEM, BFO)
Anzo Smart Data Lake®
Graph-based Data Discovery for the Enterprise
A Semantic knowledge graph links & harmonizes
diverse data to provide governed self-service
discovery, analytics & data management at big data
scale
Enterprise Corporate Systems
Enterprise Unstructured Data
External Data Feeds
Social Media & Diverse Data Sources
Third-Party & Partner Systems
Cambridge Semantics Query Builder Dashboards
Last Mile Analytics Tools
Machine-Assisted Smart Data Prep & Integration
Semantic Knowledge Graphs
In-Memory Clustered Graph Query Engine
Interactive Discovery & Analytics with Query Builder Dashboard
Data Prep & Ingestion
Data Discovery & Analytics
Easily harmonize, discover and analyze all data whether from inside or outside the
enterprise
Anzo Smart Data Lake®
Example Customer Challenges and Customer Satisfaction Quotes
“It would be nice if somehow all data and information was coming from a single place, like spokes on a wheel.”
“Our highly qualified, highly paid PhDs should not be spending their time searching for and formatting data.”
“In a recent regulatory investigation, our scientists had to sift through tons of documents and systems to manually trace data. Altogether, the investigation spanned multiple months.”
“The integration/connectivity that the tool affords is very impressive.”
“Very powerful … Far exceeded expectations.”
“The ability to handle structured and unstructured data is very impressive.”
Example Government Use Cases Already Transforming
Data Management and Time to Insight
• U.S. Intelligence Organization
– Problem: Need better insight into contract performance, trends, and patterns
– Background:
• Recent push from Congress to leverage small businesses
• A diverse mission set and the classification of specific contracts complicate insight
– Solution: Use Anzo Smart Data Platform® to ingest contract data and give contract analysts
better insight
– Result: Analysts can search, analyze, and conduct interactive data discovery across contracts that
are within their “Need to Know” purview to better understand and optimize contracting
• Defense Organization
– Problem: A new multi-billion-dollar program with 500+ legacy data sources was overspending and
under-delivering; it had an immediate need to determine the optimal mix of legacy data sources to
integrate to perform the mission, and which ones to stop sustaining
– Background:
• Data resided in relational databases, proprietary flat files, XML, and some CSV and Excel files
• Mission requirements and available data sources were documented in Excel
– Solution: Use Anzo Smart Data Platform® to integrate the Excel files containing mission requirements and
available data sources; generate data lineage for all data sources; create reports detailing the
optimal data sources to integrate to accomplish the mission at the least cost
– Result: Mission-oriented data integration roadmap with faster time to implementation and
significant cost reduction
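Choosing which legacy sources to integrate, and which to stop sustaining, can be framed as a weighted set-cover problem: cover every mission requirement with the cheapest set of sources. The sketch below uses the standard greedy heuristic, which repeatedly picks the source with the lowest cost per newly covered requirement. It is fast but not guaranteed to find the optimal mix. The requirement IDs, source names, and costs are hypothetical, and this is not a description of how the reports in this use case were produced.

```python
def choose_sources(requirements, sources):
    """Greedy weighted set cover (a heuristic; not guaranteed to be optimal).

    requirements: set of mission requirement IDs that must be covered.
    sources:      {source_name: (cost, set_of_requirement_ids_it_supports)}
    Returns (chosen source names in pick order, requirements no source covers).
    """
    uncovered = set(requirements)
    remaining = dict(sources)
    chosen = []
    while uncovered:
        best, best_ratio = None, None
        for name, (cost, covers) in remaining.items():
            gain = len(covers & uncovered)
            if gain == 0:
                continue
            ratio = cost / gain  # cost per newly covered requirement
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:  # no remaining source adds coverage
            break
        chosen.append(best)
        uncovered -= remaining.pop(best)[1]
    return chosen, uncovered


# Hypothetical requirements, sources, and annual sustainment costs.
requirements = {"R1", "R2", "R3", "R4"}
sources = {
    "legacy_db_a": (100, {"R1", "R2"}),
    "flat_file_b": (80, {"R2"}),
    "xml_feed_c": (60, {"R3", "R4"}),
    "excel_d": (250, {"R1", "R2", "R3", "R4"}),
}

chosen, gaps = choose_sources(requirements, sources)
retire = sorted(set(sources) - set(chosen))
total_cost = sum(sources[name][0] for name in chosen)
print("integrate:", chosen, "retire:", retire, "cost:", total_cost, "gaps:", gaps)
```

Any requirements left in `gaps` are ones no existing source supports, which is itself a useful finding for a mission-oriented roadmap.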
Anzo Smart Data Lake®
Driving Business Efficiency Use Case
• Every agency has key
performance metrics related to
its mission
– Agencies manage their performance based
on these KPPs / KPIs
• Every agency has a requirement
to share “business”-related data
– Energy efficiency, personnel data,
contracting performance data, etc.
• Anzo Smart Data Lake® can help
agencies drive business efficiency
by analyzing this data
– Better understand the data they share
– Analyze trends in the data
– Compare with other agencies
– Develop metrics (similar to KPP / KPI
on the mission) that provide a
“scoring” mechanism
– Rationalize investments / needed
contractual obligations
– Predict future performance
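A "scoring" mechanism for business-efficiency metrics could be built in many ways. One simple option, shown here purely as an assumption, is a weighted average of how fully each metric attains its target, with each metric's contribution clamped to the range 0 to 1 so overperformance on one metric cannot hide shortfalls on another. The metric names, targets, and weights are hypothetical.

```python
def kpi_score(actuals, targets, weights, higher_is_better):
    """Weighted 0-100 score from each KPI's attainment of its target.

    All four arguments are dicts keyed by KPI name; targets must be positive.
    Each KPI's attainment is clamped to [0, 1] before weighting.
    """
    total_weight = sum(weights.values())
    weighted = 0.0
    for kpi, weight in weights.items():
        actual, target = actuals[kpi], targets[kpi]
        if target <= 0:
            raise ValueError(f"target for {kpi} must be positive")
        if higher_is_better[kpi]:
            attainment = actual / target
        else:
            # Lower is better: meeting or beating the target gives attainment >= 1.
            attainment = target / actual if actual > 0 else 1.0
        weighted += weight * max(0.0, min(attainment, 1.0))
    return 100 * weighted / total_weight


# Hypothetical agency metrics.
actuals = {"energy_kwh_per_sqft": 25.0, "on_time_delivery_pct": 81.0, "small_business_share": 0.25}
targets = {"energy_kwh_per_sqft": 20.0, "on_time_delivery_pct": 90.0, "small_business_share": 0.23}
weights = {"energy_kwh_per_sqft": 1, "on_time_delivery_pct": 2, "small_business_share": 1}
higher = {"energy_kwh_per_sqft": False, "on_time_delivery_pct": True, "small_business_share": True}

score = kpi_score(actuals, targets, weights, higher)
print(round(score, 1))  # 90.0
```

Here energy use reaches 80% of target, on-time delivery 90%, and the small-business share exceeds its target but is capped at 100%, giving (0.8 + 2 × 0.9 + 1.0) / 4 = 0.9.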
Anzo Smart Data Lake®
Enabling Self-Service to Citizens
• Data.gov was created to provide
citizens with data that enables them to:
– Conduct research
– Develop web and mobile
applications
– Design data visualizations
– And more
• Filtering through the data is a
time-consuming task for citizens
who want to use it
– Anzo Smart Data Lake® can
provide a robust, scalable solution
to process large amounts of data
• Including both text and structured data
– Using an ontology to easily
categorize data for citizens
– Enabling citizens to produce
better analyses and build better
services with the results of their
analysis
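Categorizing data with an ontology means a dataset tagged with a specific class also appears under every broader class above it, so a citizen browsing "environmental data" finds air and water datasets without knowing their exact types. The sketch below shows that roll-up over a single-parent hierarchy in the spirit of `rdfs:subClassOf`. The class names and dataset file names are hypothetical, and real OWL ontologies allow multiple parents and much richer axioms.

```python
# Hypothetical category ontology: each class maps to its single parent class,
# in the spirit of rdfs:subClassOf (real ontologies may have several parents).
SUBCLASS_OF = {
    "AirQualityReading": "EnvironmentalData",
    "WaterQualityReading": "EnvironmentalData",
    "HospitalRating": "HealthData",
    "EnvironmentalData": "PublicData",
    "HealthData": "PublicData",
}


def ancestors(cls):
    """All superclasses of cls, nearest first (transitive closure of SUBCLASS_OF)."""
    result, seen = [], {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in seen:  # guard against accidental cycles in the hierarchy
            break
        seen.add(cls)
        result.append(cls)
    return result


def datasets_in(category, dataset_types):
    """Names of datasets typed as category or as any subclass of it."""
    return sorted(
        name
        for name, cls in dataset_types.items()
        if cls == category or category in ancestors(cls)
    )


# Hypothetical datasets, each tagged with the most specific class that fits.
datasets = {
    "air_quality_2015.csv": "AirQualityReading",
    "river_samples.json": "WaterQualityReading",
    "hospital_ratings.csv": "HospitalRating",
}

print(datasets_in("EnvironmentalData", datasets))
print(datasets_in("PublicData", datasets))
```

Tagging each dataset once, with its most specific class, and deriving the broader categories from the hierarchy avoids maintaining the same dataset in several category lists by hand.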
Summary, Q&A and Next Steps
• Anzo Smart Data Lake can help your agency
transform its data management to drive faster
time to insight and cost reduction
– Platform has proven success in Financial Services and
Life Sciences (very large, complex data)
• Business Efficiency is a missing metric for agencies
– Data is already being shared
• Citizens are already doing wonderful things with
data.gov
– Analysis is expensive; Anzo can help
• Next Steps
– Proof of Concept
– Unsolicited Proposal