Data Lakes are early in the Gartner hype cycle, but companies are getting value from their cloud-based data lake deployments. Break through the confusion between data lakes and data warehouses and seek out the most appropriate use cases for your big data lakes.
Data lake benefits
1. Strategic Advisory
Big Data – Cloud – Analytics
InfoStrategy
Fishing in the big data lake
DATA EXPLORATION AND DISCOVERY ANALYTICS FOR DEEPER BUSINESS INSIGHTS
2. InfoStrategy
What is a “data lake”
data lake (plural data lakes)
A massive, easily accessible data repository
built on (relatively) inexpensive computer
hardware for storing "big data". Unlike data marts,
which are optimized for data analysis by storing only some
attributes and dropping data below the level aggregation, a
data lake is designed to retain all attributes,
especially so when you do not yet know what the
scope of data or its use will be.
http://en.wiktionary.org/wiki/data_lake
… Enterprise Data Hub sounds too boring!
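The contrast in the definition above can be sketched in a few lines. This is only an illustration: the records, field names and values are hypothetical and do not come from the deck. The mart keeps an aggregate and drops the detail, while the lake retains every attribute of every record.

```python
from collections import defaultdict

# Hypothetical raw events; every field name and value here is illustrative.
raw_events = [
    {"site": "A", "day": "2014-06-01", "reading": 12.5, "sensor": "s1", "note": "ok"},
    {"site": "A", "day": "2014-06-01", "reading": 7.5, "sensor": "s2", "note": "drift"},
    {"site": "B", "day": "2014-06-01", "reading": 4.0, "sensor": "s9", "note": "ok"},
]


def build_mart(events):
    """Data-mart style: keep only the daily total per site, drop the detail."""
    totals = defaultdict(float)
    for e in events:
        totals[(e["site"], e["day"])] += e["reading"]
    return dict(totals)


def build_lake(events):
    """Data-lake style: retain every record with all of its attributes."""
    return [dict(e) for e in events]


mart = build_mart(raw_events)
lake = build_lake(raw_events)

# A question nobody planned for ("which sensors reported drift?") can only
# be answered from the lake, because the mart discarded sensor and note.
drifting = [e["sensor"] for e in lake if e["note"] == "drift"]
print(mart)
print(drifting)
```

The mart is smaller and faster to query for the question it was built for; the lake costs more storage but leaves later questions open.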
3. InfoStrategy
Optimise business through insights
Insight
Action
Optimise
Move a metric
Change a product
Change behaviour/process
Hindsight
Realtime
Foresight
Trusted information
Act on insights gained
Execute theories
Measure
Outcomes
Sentiment
Feedback
Explore datasets, discover correlations, patterns.
Undiscovered facts
Information Value
Data Volumes
Forecasting, planning & trending
Statistical Analysis
Operational reporting, SCADA control
Alerts & Events
Historical reporting, Proof of operation
Regulatory, statutory, financial
Uncover previously
unknown facts
from enriched data
in the data lake
4. InfoStrategy
Future state of analytics
Strategic Intent
To improve BI and Analytical capabilities to a level where organisations are able to
access and analyse information in a secure, timely and cost-effective manner.
Gain key insights to optimise the operations of your business, predict the best
possible outcomes for growth, new opportunities, and competitive advantage
across all business lines.
Mission Statement
“Providing advanced analytics capability across all business units, empowering our
people with the processes and supporting technologies to exploit our information
assets for business benefit.”
Target Operating Model will deliver:
Rapid access to data to uncover new facts via advanced data exploration and
discovery analytics.
Clarity of who is responsible and accountable for maintaining critical information
assets via a well structured governance and engagement model.
A trusted and highly secure source of data for all analytical information requirements
via a data quality assurance program.
Trawling for value in the big data lake
5. InfoStrategy
‘Fish stocks’ are replenished from existing and future
operational systems plus external sources
Core
Transactional Data
“operational”
Management
Reporting
Unstructured &
External Data
“contextual”
Enterprise Dashboards
Reporting
Consolidation
Data Scientists, Business Analysts, Business Users, Customers
Data Extraction
Discovery Analytics
Platform
Visualisation
Analysis
Data Preparation
Data Collection
Operational
Reporting
Operational Dashboards
Real-time Reports
Alerts & Exceptions
Embedded BI
Production Data Repository
“Data Lake”
Information Governance
Data Management
Supplier &
Industry Data
“comparative”
6. InfoStrategy
Consolidated
Management
Reporting
Operational
Supporting
Capability
Discovery
Analytics
To meet the demand for rapid access to information
users must adopt a flexible multi-platform architecture
What reporting does for established operations … discovery analytics does for new business development.
The trend within industry is to move away from single-platform monolithic data warehouses towards a physically distributed environment
for information delivery. Many businesses are extending their data warehouse environments to include new standalone data platforms that
are conducive to discovery analytics. A holistic view is maintained via a common, single replicated dataset and an enterprise information
management program, governing delivery and access to key information (data lake).
Source Applications
ERP
CRM
HR
Finance
Telemetry
Geospatial GIS
Documents
Email
Files
Real-time Data
Capture
Cleansing
Loading
Data Warehouse
Modelling
Relational DW
Data Marts
Analysis Cubes
Analytics Delivery
Cloud-based Service Model
Actuarial
Applications
Event-Based
Applications
Reporting
Production
Reporting
OLAP Analytics
Ad Hoc Query
External
Data
Exploration &
Discovery
Metadata Integration
Event Processing Results
Detailed Datasets Results
Collection and blending Insights
Portal
PDF
Desktop
Guided
Visualisation
Mobile BI
Active
Dashboards
Data Replication
Historical Data Preparation
Storytelling
Information Governance
Operational Reporting
Dimensional
Modelling
Productionise Insights
7. InfoStrategy
Principles: Easier access to information to discover new
facts about the business.
◦ Described as a ‘sandpit’ environment, providing the ability to explore and discover new
facts about the business, its members and customers, partners and competitive
pressures.
◦ Also used for testing a hypothesis or running scenarios across the data
◦ Getting answers to ‘one-off’ questions which are not addressed through the normal
published, scheduled operational reporting channels
◦ Data is replicated from all operational systems into a single landing area, ensuring
traceability and reconciliation to all consuming applications, such as the data warehouse,
analytical application, and other business applications.
◦ Clearly defined critical business entities/records are synchronised (or Mastered) across
all applications eliminating duplication and confusion. Data quality attributes are defined
and managed for each critical business entity.
◦ A fully integrated Member/Customer view is established across both analytical and
transactional applications.
◦ Using the replicated data to build more dynamic analytical data structures for scheduled
production reporting and ad hoc analysis
◦ Provide users with the tools to access and analyse data, freely explore current and new
datasets, and visualise patterns and discoveries to gain deep insights.
Providing business users with direct
access to data to meet immediate
information needs where the
accuracy of the data is not the
primary objective.
Having a single source of truth
across all business applications at
detailed level from which all
information requests are satisfied.
Improved environment for more
cost effective and faster business
intelligence delivery.
Provide business users with the ability to access production information directly, collect it as needed, and
prepare the data for analysis. Exploring the data to uncover previously unknown facts about the business, and
sharing those facts visually with others. Enrich production data with external “context” to extend insights.
Key Principles Description
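The replication principle above calls for traceability and reconciliation between each operational system and the single landing area. One way this could be checked is sketched below. The approach (row counts plus an order-independent content hash), the function names and the sample rows are all assumptions made for illustration, not a method described in the deck.

```python
import hashlib


def fingerprint(rows, key):
    """Return (row_count, digest) for a dataset, independent of row order."""
    digest = hashlib.sha256()
    for row in sorted(rows, key=lambda r: r[key]):
        # Serialise fields in a fixed order so equal rows hash identically.
        digest.update(repr(sorted(row.items())).encode("utf-8"))
    return len(rows), digest.hexdigest()


def reconcile(source_rows, landed_rows, key="id"):
    """Compare a source extract with its replica in the landing area."""
    src_count, src_hash = fingerprint(source_rows, key)
    lnd_count, lnd_hash = fingerprint(landed_rows, key)
    src_keys = {r[key] for r in source_rows}
    lnd_keys = {r[key] for r in landed_rows}
    return {
        "counts_match": src_count == lnd_count,
        "content_match": src_hash == lnd_hash,
        "missing_in_landing": sorted(src_keys - lnd_keys),
    }


# Hypothetical CRM extract and its replica (one row lost in transit).
crm = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
landing = [{"id": 1, "name": "Ann"}]
report = reconcile(crm, landing)
print(report)
```

A report like this, stored with each load, gives consuming applications a way to trace a figure back to the source extract it came from.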
8. InfoStrategy
Benefits of Discovery Analytics versus traditional data
warehousing
Classic Data Warehouse Issues and the corresponding Discovery Analytics Benefit:

Issue: Lengthy IT backlog and lack of resources to extend the EDW to support new business requirements.
Benefit: Data can be explored and analysed outside of the EDW environment before it is put into production use.

Issue: High costs of supporting increasing data volumes and new types of data.
Benefit: Data can be filtered and transformed before it is loaded into the EDW.

Issue: Lack of flexibility in the EDW data model to support constantly changing business requirements.
Benefit: Data discovery supports a dynamic schema-on-read approach, which reduces the need for detailed up-front modelling.

Issue: Need to have data quality and governance processes in place before users can access the EDW data.
Benefit: The investigative nature of data discovery has lower data quality and governance requirements.

Issue: Growing use of personal data marts to overcome IT barriers and the performance overheads of ad hoc processing.
Benefit: The flexibility and performance of data discovery encourages shared use of data and analytics.
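The schema-on-read approach named in the table above can be illustrated with a short sketch. Everything here is hypothetical (the sample records, the field names and the `read_with_schema` helper); it shows the general idea only and is not a specific product's API. The raw data is stored as it arrived, and each reader applies its own structure when it reads.

```python
import json

# Raw records are stored exactly as they arrived; no table schema was
# designed up front. These sample lines are hypothetical.
raw_lines = [
    '{"customer": "C1", "amount": "19.90", "channel": "web"}',
    '{"customer": "C2", "amount": "5.00"}',
    '{"customer": "C3", "amount": "12.10", "channel": "store", "promo": "X1"}',
]


def read_with_schema(lines, schema):
    """Apply a schema while reading: pick fields, cast types, default gaps."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }


# Two analysts can project different schemas over the same stored data.
sales_view = {"customer": (str, None), "amount": (float, 0.0)}
channel_view = {"customer": (str, None), "channel": (str, "unknown")}

sales = list(read_with_schema(raw_lines, sales_view))
channels = list(read_with_schema(raw_lines, channel_view))
print(sales)
print(channels)
```

Adding a new field such as `promo` to the incoming data requires no change to the stored records or to existing readers; only a reader that wants the field needs to name it.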
A recent proof of concept for Discovery Analytics in the cloud (AWS) delivered considerable cost and time savings in infrastructure and hosting, viz.:
$55 per day to host a 960 GB data warehouse
$32 per day to host a Data Integration server AND a BI server.
2.5 weeks to set up the POC environment and start analysing and visualising results.
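For budgeting, the daily figures quoted above can be projected over longer periods. The sketch below assumes the rates stay constant and that a month has 30 days; actual cloud charges vary with usage, so treat the results as a rough guide only.

```python
# Figures quoted in the proof of concept above (dollars, as stated there).
WAREHOUSE_PER_DAY = 55.0   # 960 GB data warehouse
SERVERS_PER_DAY = 32.0     # data integration server plus BI server
WAREHOUSE_SIZE_GB = 960


def projected_cost(days):
    """Total hosting cost for a number of days at the quoted daily rates."""
    return (WAREHOUSE_PER_DAY + SERVERS_PER_DAY) * days


daily_total = projected_cost(1)
monthly_total = projected_cost(30)    # assumes a 30-day month
yearly_total = projected_cost(365)
warehouse_per_gb_day = WAREHOUSE_PER_DAY / WAREHOUSE_SIZE_GB

print(f"Daily:   ${daily_total:,.2f}")
print(f"Monthly: ${monthly_total:,.2f}")
print(f"Yearly:  ${yearly_total:,.2f}")
print(f"Warehouse per GB per day: ${warehouse_per_gb_day:.4f}")
```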
9. InfoStrategy
Discovery Analytics Target POC Architecture
Structured
Data
Unstructured
Data
ERP
Telemetry
Web/External
Replication of corporate data, enriched with external data and
content, available in a centrally available and scalable repository
ready for exploration, discovery and predictive analysis to gain
deep insights and actionable results.
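The enrichment step described above, where replicated corporate data is combined with external data, can be sketched as a simple keyed join. The datasets, field names and values below are hypothetical and chosen only to show the shape of the operation.

```python
# Hypothetical replicated corporate records (for example, from an ERP).
corporate = [
    {"site": "Brisbane", "month": "2014-01", "units_sold": 120},
    {"site": "Cairns", "month": "2014-01", "units_sold": 80},
]

# Hypothetical external context, keyed by (site, month).
external = {
    ("Brisbane", "2014-01"): {"rainfall_mm": 160},
    ("Cairns", "2014-01"): {"rainfall_mm": 390},
}


def enrich(records, context, keys=("site", "month")):
    """Left-join external context onto corporate records by the given keys."""
    enriched = []
    for rec in records:
        lookup = tuple(rec[k] for k in keys)
        # Records without matching context are kept unchanged.
        extra = context.get(lookup, {})
        enriched.append({**rec, **extra})
    return enriched


lake_rows = enrich(corporate, external)
for row in lake_rows:
    print(row)
```

Keeping unmatched records (a left join) means no corporate data is lost when the external source has gaps.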
10. InfoStrategy
Fishing safely with the appropriate life vests is
important too.
Security and data management standards are available
International Standard on Assurance Engagements
Service Organisation Control framework
Federal Information Security Management Act
Payment Card Industry – Data Security Standard
Federal Information Processing Standard
International Standards Organisation – Information Security Standard
Source: Amazon Web Services
11. InfoStrategy
To learn more about how InfoStrategy
can help you develop your big data
strategy to solve your big business
problems, or to arrange a Proof of
Concept, please contact us today using
the details below.
InfoStrategy Pty Ltd
246 Oxford St, Balmoral
Queensland 4171
Australia
Tel: +61 7 3151 2021
Email: contactus@infostrategy.com.au