Many organizations are immature when it comes to data and analytics use. The answer lies in delivering a greater level of insight from data, straight to the point of need.
Many Data Architecture best practices exist today, accumulated over years of practice. In this webinar, William will look at some Data Architecture best practices that he believes have emerged in the past two years and are not yet worked into many enterprise data programs. These practices are keepers: organizations will need to move toward them one way or another, so it is best to work them into the environment deliberately.
Data Architecture Best Practices for Advanced Analytics
1. Data Architecture
Best Practices for
Advanced Analytics
Presented by: William McKnight
“#1 Global Influencer in Big Data” Thinkers360
President, McKnight Consulting Group
A two-time Inc. 5000 Company
@williammcknight
www.mcknightcg.com
(214) 514-1444
Second Thursday of Every Month, at 2:00 ET
With William McKnight
6. “With the move to ChaosSearch, 98% of all
operational burdens have been lifted from us,
allowing us to focus on Blackboard-specific
tasks.”
Joel Snook, Director, DevOps Engineering
15. A Unified Data Lake Architecture for Log and SQL Analytics
ChaosSearch uniquely solves for known and unknown data and queries
[Figure: The Zone of Confusion Within the Data and Analytics Infrastructure Model. Axes: Questions (known, unknown) and Data (known, unknown). Regions: Traditional DW, Data Lake, Zone of Confusion. Other labels: Expanding, Understanding and Investigating, Foundational, Core, Innovation and Exploration, Establish Value. Source: Gartner. DW = data warehouse]
Bring together search and relational analytics
• Eliminate pipelines, ETL, data movements
• Faster insights
A unified data lake architecture that supports:
• Innovation & exploration
• Investigative queries
• Operational analytics
Disrupts the economics of Big Data
16. A Unified Data Lake Architecture for Log and SQL Analytics
ChaosSearch uniquely solves for the broadest scope of analytics needs
Bring together search and relational analytics
• Eliminate pipelines, ETL, data movements
• Faster insights
A unified data lake architecture that supports:
• Innovation & exploration
• Investigative queries
• Operational analytics
Disrupts the economics of Big Data
[Figure: The Zone of Confusion Within the Data and Analytics Infrastructure Model. Axes: Questions (known, unknown) and Data (known, unknown). Regions: Traditional DW, Data Lake. Other labels: Expanding, Understanding and Investigating, Foundational, Core, Innovation and Exploration, Establish Value. Source: Gartner. DW = data warehouse]
17. William McKnight
President, McKnight Consulting Group
• Consulted to Pfizer, Scotiabank, Fidelity, TD
Ameritrade, Teva Pharmaceuticals, Verizon, and many
other Global 1000 companies
• Frequent keynote speaker and trainer internationally
• Hundreds of articles, blogs and white papers in
publication
• Focused on delivering business value and solving
business problems utilizing proven, streamlined
approaches to information management
• Former Database Engineer, Fortune 50 Information
Technology executive and Ernst & Young Entrepreneur
of the Year Finalist
• Owner/consultant: Data strategy and implementation
consulting firm
Book by William McKnight: Information Management: Strategies for Gaining a Competitive Advantage with Data (The Savvy Manager’s Guide)
19. Data is Under Management when it is…
• In a leverageable platform
• In an appropriate platform for its profile and
usage
• With high non-functionals (availability,
performance, scalability, stability, durability,
security)
• Captured at the most granular level
• At a data quality standard (as
defined by Data Governance)
• Enabling self-service
Best Practice:
Empower
everyone with
true self-service
20. “80% of analysts’ time
is spent simply discovering
and preparing data.”
What’s Your Data Strategy, Thomas Davenport, HBR 2017
Best Practice:
Start getting
concerned with
the tools and
processes of the
analyst
23. Data Lakes
• Parquet format
Best Practice:
Put big data in
data lakes
Best Practice:
Index the data
lake
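One way to read "index the data lake" is to keep a small lookup structure beside the raw files so that a query touches only the matching records. The sketch below builds an in-memory inverted index over a few made-up log lines. The record contents and the function names `build_index` and `search` are illustrative assumptions, not any product's API.

```python
from collections import defaultdict

# Hypothetical raw log records, as they might sit in a data lake object.
RECORDS = [
    "2021-06-10 ERROR payment service timeout",
    "2021-06-10 INFO login succeeded",
    "2021-06-11 ERROR login service timeout",
]


def build_index(records):
    """Map each lowercase token to the set of record positions containing it."""
    index = defaultdict(set)
    for position, line in enumerate(records):
        for token in line.lower().split():
            index[token].add(position)
    return index


def search(index, records, *terms):
    """Return records containing every term, using only index lookups."""
    if not terms:
        return []
    hits = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [records[i] for i in sorted(hits)]


INDEX = build_index(RECORDS)
print(search(INDEX, RECORDS, "error", "timeout"))
```

The raw records stay untouched; only the index is consulted to decide which records to read, which is the property that matters at data lake scale.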
24. Data Lakes
• Common & centralized storage for the enterprise
• No defined data model into which the data is
formed
• No relationships between the datasets
• Historical data retention
• All data formats
• For big data
• Analytical processing
• Data scientists and analysts
• Less governance/quality than data warehouse
– Focus: Ingestion
25. Graph Databases
[Figure: example graph with two vertices labeled "Bridge vertex"]
• Subject: John R Peterson; Predicate: Knows; Object: Frank T Smith
• Subject: Triple #1; Predicate: Confidence Percent; Object: 70
• Subject: Triple #1; Predicate: Provenance; Object: Mary L Jones
Best Practice:
Use graph
databases for
sizable
connected data
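The three triples on this slide show a statement ("John knows Frank") that is itself the subject of two further statements. A minimal Python sketch of that idea follows, assuming a hypothetical `Triple` class and `describe` helper that are not tied to any particular graph database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: object   # a name, or another Triple when describing a statement
    predicate: str
    obj: object


# Triple #1 from the slide, then the two triples that describe it.
knows = Triple("John R Peterson", "Knows", "Frank T Smith")
TRIPLES = [
    knows,
    Triple(knows, "Confidence Percent", 70),
    Triple(knows, "Provenance", "Mary L Jones"),
]


def describe(triples, target):
    """Collect predicate -> object for every triple whose subject is `target`."""
    return {t.predicate: t.obj for t in triples if t.subject == target}


print(describe(TRIPLES, knows))
```

Because a triple can be the subject of another triple, confidence and provenance attach to the relationship itself rather than to either person.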
26. Data Virtualization
“The right answer is not
always to centralize the
data. Data Virtualization
will be of utmost
importance as the
‘perpetual short-term’
solution to the need.”
[Figure: Enterprise Data Virtualization spanning these sources: Data Warehouses, Marts & Cubes, Operational Data Stores, Transactional Sources, File Systems, Big Data]
Best Practice:
Enable data
virtualization for
edge and
temporary
needs
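As a rough illustration of data virtualization, the sketch below answers one question across two sources at query time without copying either into the other. The in-memory SQLite table stands in for a data warehouse and the CSV text for a file system source; the table, column names, and values are all made up for the example.

```python
import csv
import io
import sqlite3

# Source 1: a relational store (stands in for a data warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

# Source 2: a flat file (stands in for a file system or data lake object).
ORDERS_CSV = "customer_id,amount\n1,250\n2,100\n1,75\n"


def orders_by_customer():
    """Resolve the join at query time; neither source is copied into the other."""
    totals = {}
    for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
        key = int(row["customer_id"])
        totals[key] = totals.get(key, 0) + int(row["amount"])
    names = dict(warehouse.execute("SELECT id, name FROM customers"))
    return {names[cid]: total for cid, total in totals.items()}


print(orders_by_customer())
```

A real virtualization layer would add query pushdown, caching, and security; the point here is only that the consumer sees one result while the data stays where it lives.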
27. Enterprise Analytic Stack
• Dedicated Compute
• Storage
• Data Integration
• Streaming
• Analytics
• Data Exploration
• Data Lake
• Business Intelligence
• Machine Learning
• Identity Management
• Data Catalog
• Data Virtualization
Best Practice:
Leverage best of
breed for your
analytics stack
28. Total Cost of Ownership is More Than Just Cloud Costs
• Autonomous Administration
• Lack of Platform Features Leads to Increased Configuration and Management
– stored procedures, referential integrity and uniqueness capabilities
– mission critical options for backup and disaster recovery, which typically include a standby database
– full ANSI-SQL compliance
• Performance
Best Practice:
Get a strong
handle on your
cloud costs
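The point of this slide can be made concrete with simple arithmetic: the provider's invoice is one term in a larger sum. The sketch below assumes a hypothetical `PlatformCosts` class whose categories loosely follow the slide's bullets; the figures are placeholders to exercise the calculation, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class PlatformCosts:
    """Annual cost categories; all figures are the caller's own estimates."""
    cloud_bill: float            # compute and storage invoiced by the provider
    administration: float        # staff time not covered by autonomous administration
    feature_workarounds: float   # configuration and management for missing features
    performance_tuning: float    # effort or extra capacity to hit performance targets

    def total(self) -> float:
        return (self.cloud_bill + self.administration
                + self.feature_workarounds + self.performance_tuning)

    def cloud_share(self) -> float:
        """Fraction of total cost of ownership that is the cloud bill itself."""
        return self.cloud_bill / self.total()


# Placeholder numbers purely to exercise the arithmetic.
example = PlatformCosts(100_000, 60_000, 30_000, 10_000)
print(example.total(), round(example.cloud_share(), 2))
```

Tracking the non-invoice categories explicitly is what makes platform comparisons meaningful, since a cheaper invoice can be offset by more administration or workaround effort.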
29. Capabilities for Data Integration for
Enterprise Data
• Comprehensive Native Connectivity
• Multi-Latency Data Ingestion
• Data Integration (in ETL, ELT, Streaming)
• Data Quality and Data Governance
• Data Cataloging and Metadata
Management
• Enterprise Trust, Enterprise Scale (or Class)
• AI Intelligence and Automation
• Ecosystem and Multi-cloud
30. Data Integration Options
Vendor | Project Technical Environments Recommended For Consideration | Project Scope
Heterogeneous:
Cloudera | Any | Any
IBM | Any | Any
Informatica | Any | Any
Talend | Any | Any
Specialist:
AWS (Glue) | Environments on AWS with core of Redshift, EMR | Any
Azure (Azure Data Factory) | Environments on Azure with core of Synapse, HDInsight | Any
FiveTran | Any | Contained scope
Google | Environments on GCP with core of BQ, DataProc | Any
Matillion | Any | Contained scope
Oracle | Environments with Oracle database | Any
SAP | SAP-only environments | SAP projects
Best Practice:
Use fit-for-
purpose data
integration
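The options table above can be read as a filter: given a project's environment and scope, which vendors remain candidates? The sketch below transcribes the rows into a list and applies that filter. The short environment and scope tags (`"aws"`, `"contained"`, and so on) and the `candidates` function are simplifying assumptions made for this example.

```python
# Rows transcribed from the options table above: (vendor, environment, scope).
OPTIONS = [
    ("Cloudera", "any", "any"),
    ("IBM", "any", "any"),
    ("Informatica", "any", "any"),
    ("Talend", "any", "any"),
    ("AWS (Glue)", "aws", "any"),
    ("Azure (Azure Data Factory)", "azure", "any"),
    ("FiveTran", "any", "contained"),
    ("Google", "gcp", "any"),
    ("Matillion", "any", "contained"),
    ("Oracle", "oracle", "any"),
    ("SAP", "sap", "sap"),
]


def candidates(environment, scope):
    """Return vendors whose environment and scope constraints both match."""
    return [
        vendor
        for vendor, env, scp in OPTIONS
        if env in ("any", environment) and scp in ("any", scope)
    ]


print(candidates("aws", "contained"))
```

The result is a shortlist for consideration, not a ranking; choosing among the remaining candidates still depends on the capabilities listed on the previous slide.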
32. Architecture Component Needs
• Security and Privacy
• Governance and Compliance
• Availability
• Backup and Recovery
• Performance
• Scalability
• Licensing
33. Analytics Reference Architecture
[Diagram. Sources shown: Logs (Apps, Web, Devices), User tracking, Operational Metrics, Offload data, Sensors, Transactional/Context Data (OLTP/ODS), Applications, Files. Movement and processing shown: Raw Data Topics (JSON, AVRO), Processed Data Topics, Stream Processing, ETL or EL with T in Spark, Batch, Low Latency, Reach through or ETL/ELT. Stores and analytics shown: Data Warehouse, Data Lake, In-database analytics, Q.]
35. Data Mesh
[Diagram. Sources shown: Logs (Apps, Web, Devices), User tracking, Operational Metrics, Offload data, Sensors, Transactional/Context Data (OLTP/ODS), Applications, Files. Movement and processing shown: Raw Data Topics (JSON, AVRO), Processed Data Topics, Stream Processing, ETL or EL with T in Spark, Batch, Low Latency, Reach through or ETL/ELT. Stores and analytics shown: Data Warehouse, Data Lake, In-database analytics, Q.]
36. Data Fabric
[Diagram. Sources shown: Logs (Apps, Web, Devices), User tracking, Operational Metrics, Sensors, Transactional/Context Data (OLTP/ODS), Applications, Files. Movement and processing shown: Raw Data Topics (JSON, AVRO), Processed Data Topics, Stream Processing, ETL or EL with T in Spark, Batch, Low Latency. Stores and analytics shown: Data Warehouse, Data Lake, In-database analytics.]
38. Data Cloud (Snowflake)
[Diagram. Sources shown: Logs (Apps, Web, Devices), User tracking, Operational Metrics, Sensors, Transactional/Context Data (OLTP/ODS), Applications, Files. Movement and processing shown: Raw Data Topics (JSON, AVRO), Processed Data Topics, Stream Processing, ETL or EL with T in Spark, Batch, Low Latency. Stores and analytics shown: Data Warehouse, Data Lake, In-database analytics.]
39. Summary
• Get all enterprise data under management
• RDBMS (/columnar), Cloud Storage/Parquet, Graph cover most analytic platform needs
• Cost of ownership is more than the cloud costs
• Data Integration is vital to Data Architecture for Modern Analytics
• The Data Mesh and Data Fabric are decentralizing the architecture
40. Upcoming Topics
• Is Our Information Management Mature?
• The Future based on AI & Analytics
• Organizational Change Management for Data & Analytics
Driven Projects
• Graph Database Use Cases
• Assessing New Databases: Translytical Use Cases
Second Thursday of Every Month, at 2:00 ET
#AdvAnalytics