The Modern Analytics
Architecture
Making Big Data Useful
Joseph D’Antoni, Solutions Architect
Anexinet
May 7-9, 2014 | San Jose, CA
Please silence
cell phones
Joey D’Antoni
Joey has over 15 years of experience with a wide variety of data platforms, at both
Fortune 50 companies and smaller organizations
He is a frequent speaker on database administration, big data, and career
management
He is the co-president of the Philadelphia SQL Server User’s Group
He wants you to make sure you can restore your data
Agenda
• Data Warehouses—how did we get here?
• Big Data—Hadoop and more
• Modern Analytic Tools
• Building Our New Architecture
Data Warehouses—A History
• Data Warehousing had its origins in
the 1970s, when A.C. Nielsen provided
clients with data marts
• In 1988, Barry Devlin and Paul Murphy
(IBM) published “An Architecture for a
Business and Information System”
• In 1996—Ralph Kimball published
“The Data Warehouse Toolkit” which
showcased models for OLAP style
modelling
Data Warehouse Models
• Star Schema
• Advantage is that the DW is easier
to use
• Facts and dimensions allow queries
to perform faster
• Loading and ETL become more
complicated
• Structure changes are very
expensive
Dimensional Model
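To make the dimensional model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date), the columns, and the sample rows are illustrative assumptions, not taken from the talk. It shows the shape the slide describes: one fact table joined to each dimension exactly once.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date    VALUES (10, 2013), (11, 2014);
    INSERT INTO fact_sales  VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0);
""")


def sales_by_category_and_year(connection):
    """Aggregate the fact table, joining each dimension exactly once."""
    query = """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id    = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """
    return connection.execute(query).fetchall()


star_result = sales_by_category_and_year(conn)
```

Every analytic question follows the same pattern (facts joined to a handful of dimensions), which is why the model is easy to query. Adding a new attribute means changing a dimension table and its load process, which is where the ETL and structure-change costs come from.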
Data Warehouse Model
• Tables are grouped by subject area
(consumer, finance, products)
• Tables are linked by joins
• Very easy to add information into
the database
• Queries are harder to write, and
joins can be very expensive
performance-wise
Normalization
Data Warehousing Challenges
Data Quality
ETL
Performance and Scalability
Costs—Licensing and
Hardware
Data Quality
Extract, Transform, Load (ETL) Process
Some Database Business
Doesn’t Care
About
Process
Your
Some
Credit—Buck Woody, Microsoft
Performance and Scalability
Given the volume of data,
DW queries can be very
slow
We use techniques like
data compression to make
them faster
CPU used to be the bottleneck;
now it tends to be storage
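A rough illustration of why compression helps when storage is the bottleneck: fewer bytes have to be read for the same data. The sketch below uses Python's general-purpose zlib module as a stand-in; it is not the page or columnstore compression a database engine actually uses, and the column contents are made up.

```python
import zlib

# Hypothetical column of 100,000 status codes with few distinct values,
# similar to the low-cardinality columns common in warehouse fact tables.
column_values = ["SHIPPED", "PENDING", "SHIPPED", "RETURNED"] * 25_000
raw_bytes = "\n".join(column_values).encode("utf-8")

# Compress the column; the ratio is how many times fewer bytes must be read.
compressed_bytes = zlib.compress(raw_bytes, level=6)
compression_ratio = len(raw_bytes) / len(compressed_bytes)

# Decompressing must give back exactly the original bytes.
restored_bytes = zlib.decompress(compressed_bytes)
```

The trade is extra CPU work to decompress in exchange for less I/O, which is usually a good deal when the CPU is idle waiting on disk.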
Costs
Data Warehouses need large
servers
Database systems are
licensed by the size of the
server (per core)
Data Warehouses need a
whole lot of fast storage
Large volumes of fast storage
(SANs) are expensive
Traditional Solutions
Classic Data Analysis
Data Warehouse &
BI Solutions
ETL
…Uses Just a Subset
Common Technical Themes
There are a lot of “big data” solutions, but most of
them have a lot in common
• Built in HA/DR through multiple copies of the data
• Designed for analytics processing more than OLTP
• Derived from Open Source solutions
• Designed around local storage and commodity hardware
Components Of Modern Architecture
Hadoop
• (And its ecosystem)
EDW
Analytics Engine
Visualization Engine
Big Data Workflow for Combined Data and Analytics
[Diagram, flattened in extraction; labels listed in their original order]
• Stages: Data, Acquire, Organize, Analyze, Decide
• Data types: Structured, Semi-Structured, Unstructured
• Data sources: Master and Reference; Transactions; Machine Generated (Logs); Web; Text, Image, Audio, Video
• Storage: DBMS (OLTP); Files; NoSQL (Key Value Data Store); HDFS
• Integration and processing: ETL/ELT; Change Data Capture; Real-Time; Message-Based; Hadoop MR
• Analytic stores and engines: ODS; Data Warehouse; Streaming (CEP Engine); In-Database Analytics
• Analytics: Reporting and dashboards; Alerting and recommendations; EPM, Social Apps; Text analytics and search; Advanced analytics; Interactive discovery
• Hardware: Big Data Cluster; High Speed Network; RDBMS Cluster; In-Memory Analytics
Source: Gartner, Credit Suisse, 8/12
Are We Leaving the RDBMS?
[Chart: CPUs, annotated with “Hadoop Project Starts” and “Exadata Launched”]
Costs—Big Data versus Data
Warehouse
[Chart: Hadoop and Data Warehouse Costs. Compares Hadoop and Data Warehouse on Server, Storage, Licensing, and Total; axis from $0 to $350,000]
• For the same cost you can build a
15-node Hadoop cluster
• The Hadoop cluster would
have 3840 GB of RAM
versus the 1024 GB in the DW
server
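The memory comparison is easy to check with a few lines of arithmetic. The figures are the ones quoted on the slide; treating the data warehouse figure of 1024 as gigabytes is an assumption based on the context.

```python
# Figures from the slide (DW RAM assumed to be in GB, like the Hadoop figure).
hadoop_nodes = 15
hadoop_total_ram_gb = 3840
dw_server_ram_gb = 1024

# Memory per Hadoop node, and how much more total RAM the cluster has.
ram_per_node_gb = hadoop_total_ram_gb / hadoop_nodes
ram_advantage = hadoop_total_ram_gb / dw_server_ram_gb
```

That works out to 256 GB per node and 3.75 times the total memory of the single data warehouse server for the same spend.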
Enter the Yellow Elephant
Hadoop
Hadoop is the leading Big Data platform
(ecosystem)
Created by Doug Cutting and Mike Cafarella; much of its early development was at Yahoo
• Scales Horizontally (2 socket x86 servers in
massive clusters)
• Uses big, slow, local storage
• Extremely fault-tolerant
• In a nutshell—it’s a Distributed File System (3
copies of data in cluster) and a programming
framework called MapReduce
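The "3 copies of data" point is what makes the cluster fault-tolerant. The toy sketch below places each block on three distinct nodes in round-robin fashion and checks which blocks survive node failures. The round-robin policy, the function names, and the host names are assumptions for illustration; real HDFS placement is rack-aware and more involved.

```python
def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy placement policy only; not the rack-aware policy HDFS uses.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return [block for block, nodes in placement.items() if set(nodes) - failed]


# Hypothetical four-data-node cluster; two nodes fail at once.
nodes = ["host3", "host4", "host5", "host6"]
layout = place_replicas(num_blocks=8, data_nodes=nodes)
surviving = readable_blocks(layout, failed_nodes=["host3", "host5"])
```

With three replicas per block, any two nodes can fail and every block is still readable from the remaining copy.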
Introducing Hadoop
• Host 1: Name Node
• Host 2: Secondary Name Node
• Host 3: Data Node
• Host 4: Data Node
• Host 5: Data Node
• Host 6: Data Node
How MapReduce Works
• Automatic parallelism
• Fault tolerance
Map Phase
Input File: foo.log, stored as HDFS Block 1, HDFS Block 19, and HDFS Block 105
1) Read splits into records
• Split 1: K:0 V…
• Split 2: K:123 V…
• Split 3: K:332 V…, K:368 V…
2) Run Map
• Split 1 goes to Map Task 1
• Split 2 goes to Map Task 2
• Split 3 goes to Map Task 3
3) Write and Sort Output
• Map Task 1: K:INFO V…
• Map Task 2: K:INFO V:1, K:WARN V:1
• Map Task 3: K:Debug V:1, K:INFO V:1
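The map phase of the foo.log example can be sketched in plain Python. This is a single-process illustration of the map, shuffle/sort, and reduce steps, not Hadoop's API; the log lines and function names are hypothetical, and a real cluster would run each map task in parallel on the node holding the block.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit (log_level, 1) for one log record."""
    level = line.split()[0]
    yield level, 1


def shuffle(mapped_pairs):
    """Group values by key and return the groups in sorted key order."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


# Three hypothetical input splits, one per map task.
splits = [
    ["INFO service started"],
    ["INFO request received", "WARN slow response"],
    ["DEBUG cache miss", "INFO request received"],
]

mapped = [pair for split in splits for line in split for pair in map_log_line(line)]
level_counts = dict(reduce_counts(key, values) for key, values in shuffle(mapped))
```

Because each split is mapped independently, the framework can parallelize the work automatically and simply rerun a map task elsewhere if a node fails.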
Hadoop Ecosystem
HDFS
MapReduce
Note: This is only a
subset of ecosystem!
YARN
Spark and Shark
• Hadoop 2
Enhancements
• Spark is in-memory
• Shark integrates Spark
with Hive
Hadoop Architectural Decisions
• Distribution
• Components
• Support
• Cloud vs On-Premises
Choosing Your Hadoop Distribution
Hadoop Vendors
Technology: Hadoop Distributions
• Apache: Completely open source software for distributed clusters and map/reduce
• Cloudera: Industry leading commercial distribution, good management tools
• Hortonworks: Open source distribution, Apache compatible
• MapR: Multiple enhancements to Apache Hadoop (rewrite of HDFS), high performance, enterprise ready
• Pivotal HD: EMC spinoff with strong financial backing; a full high performance RDBMS (with BI connectors) on top of Hadoop
Cloud vs On-Premises
Cloud
• Short Term Use
• Rapid Scale
• Test Use Cases
• Pay as you go
• Internet data source
On-Premises
• Large long term implementations
• Well known workloads
• Shared clusters
• Large initial investment
Analytics Engine
Analytics
Hadoop was
not fast
Full scans of files
So How Do We
Rapidly Analyze
Data?
Columnar Databases
Microsoft SQL Server (2012
& 2014)
PDW
HP Vertica
HBase
ParAccel
InfiniDB
EMC Greenplum
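Columnar engines avoid the full scans mentioned earlier by storing each column contiguously, so an aggregate reads only the column it needs. The sketch below contrasts the two layouts in plain Python; the table, column names, and "values touched" counter are illustrative assumptions, not how any of the listed products is implemented.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 20.0},
    {"order_id": 2, "customer": "bob", "amount": 35.5},
    {"order_id": 3, "customer": "alice", "amount": 14.5},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_store(table):
    """Row store: every full row is visited; returns (sum, values touched)."""
    touched = sum(len(row) for row in table)
    return sum(row["amount"] for row in table), touched


def total_column_store(table):
    """Column store: only the 'amount' column is read."""
    amounts = table["amount"]
    return sum(amounts), len(amounts)


row_total, row_values_touched = total_row_store(rows)
col_total, col_values_touched = total_column_store(columns)
```

Both layouts return the same total, but the columnar scan touches one third of the values here, and the gap widens as tables get wider. Values within a single column also tend to be similar, so they compress well.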
In-Memory Databases
SQL Server 2014
SAP HANA
Oracle TimesTen
VoltDB
Apache Spark
Analytics Tools Past and Present
Data Visualization
Tools for Data Visualization
Excel (Power View and Power
Map)
Tableau
Qlik
Platfora
Pentaho
Bringing This All Together
Power Query (Excel)
Some Database Business
Doesn’t Care
About
Process
Your
Some
Q & A ?
Session Evaluations
Submit by 5pm Friday May
9 to WIN prizes
Your feedback is
important and valuable.
3 ways to access
Go to
passbac2014/evals
Download the PASS EVENT
App from your App Store
and search: PASS BAC
2014
Follow the QR code link
displayed on session
signage throughout the
conference venue and in
the program guide
Thank You
for attending this session and
the PASS Business Analytics
Conference 2014
The modern analytics architecture

  • 1. The Modern Analytics Architecture Making Big Data UsefulJoseph D’Antoni, Solutions Architect Anexinet May 7-9, 2014 | San Jose, CA
  • 3. Joey D’Antoni Joey has over 15 years of experience with a wide variety of data platforms, in both Fortune 50 companies as well as smaller organizations He is a frequent speaker on database administration, big data, and career management He is the co-president of the Philadelphia SQL Server User’s Group He wants you to make sure you can restore your data
  • 4. Agenda • Data Warehouses—how did we get here? • Big Data—Hadoop and more • Modern Analytic Tools • Building Our New Architecture 4
  • 5. Data Warehouses—A History • Data Warehousing had it origins in the 1970s—A.C. Nielsen provided clients with data marts • In 1988—Bill Inmon (IBM) published “An Architecture for a Business Information System” • In 1996—Ralph Kimball published “The Data Warehouse Toolkit” which showcased models for OLAP style modelling 5
  • 6. Data Warehouse Models • Star Schema • Advantage is that the DW is easier to use • Facts and dimensions allow queries to perform faster • Loading and ETL become more complicated • Structure changes are very expensive Dimensional Model 6
  • 7. Data Warehouse Model • Tables are grouped by subject area (consumer, finance, products) • Tables are linked by joins • Very easy to add information into the database • Queries are harder to write, and joins can be very expensive performance wise Normalization 7
  • 8. Data Warehousing Challenges Data Quality ETL Performance and Scalability Costs—Licensing and Hardware 8
  • 10. Extract, Transform, Load (ETL) Process 10 Some Database Business Doesn’t Care About Process Your Some Credit—Buck Woody, Microsoft
  • 11. Performance and Scalability Given the volume of data, DW queries can be very slow We use techniques like data compression to make them faster CPU was older problem— now tends to be storage 11
  • 12. Costs Data Warehouses need large servers Database systems are licensed by the size of the server (core) Data Warehouses need a whole lot fast storage Large volumes of fast storage (SANs) are expensive 12
  • 14. Classic Data Analysis Data Warehouse & BI Solutions ETL …Uses Just a Subset
  • 15. Common Technical Themes There are a lot of “big data” solutions, but most of them have a lot in common: • Built-in HA/DR through multiple copies of the data • Designed for analytics processing more than OLTP • Derived from open source solutions • Designed around local storage and commodity hardware
  • 16. Components of Modern Architecture • Hadoop (and its ecosystem) • EDW • Analytics Engine • Visualization Engine
  • 17. Big Data Workflow for Combined Data and Analytics Data Acquire Organize Analyze Decide Structured Semi-Structured Un-Structured Master and Reference Transactions Machine Generated (Logs) Web Text, Image, Audio, Video DBMS (OLTP) Files NoSQL (Key Value Data Store) HDFS ETL/ELT Change Data Capture Real-Time Message-Based Hadoop MR ODS Data Warehouse Streaming (CEP Engine) In-Database Analytics Analytics • Reporting and dashboards • Alerting and recommendations • EPM, Social Apps • Text analytics and search • Advanced analytics • Interactive discovery Hardware Big Data Cluster High Speed Network RDBMS Cluster In-Memory Analytics Source: Gartner, Credit Suisse, 8/12
  • 18. Are We Leaving the RDBMS?
  • 20. Costs: Big Data versus Data Warehouse [Chart: Hadoop and Data Warehouse Costs, comparing Server, Storage, Licensing, and Total for Hadoop and the data warehouse; axis from $0 to $350,000] • For the same cost you could build a 15-node Hadoop cluster • The Hadoop cluster would have 3840 GB of RAM versus the 1024 GB in the DW server
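The RAM comparison on this slide can be checked with a few lines of arithmetic. The three input figures come from the slide itself. The per-node value is only derived from them, on the assumption that memory is spread evenly across the 15 nodes, which the slide does not state.

```python
# Figures taken from the slide.
HADOOP_NODES = 15
HADOOP_TOTAL_RAM_GB = 3840
DW_SERVER_RAM_GB = 1024

# Derived values (assumes RAM is spread evenly across the Hadoop nodes).
ram_per_node_gb = HADOOP_TOTAL_RAM_GB / HADOOP_NODES  # 256.0 GB per node
ram_ratio = HADOOP_TOTAL_RAM_GB / DW_SERVER_RAM_GB    # 3.75x the DW server
```

At equal cost, the cluster in this comparison ends up with 3.75 times the memory of the single data warehouse server.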
  • 21. Enter the Yellow Elephant
  • 22. Hadoop Hadoop is the leading Big Data platform (ecosystem). Created by Doug Cutting and Mike Cafarella and developed largely at Yahoo • Scales horizontally (2-socket x86 servers in massive clusters) • Uses big, slow, local storage • Extremely fault-tolerant • In a nutshell: it’s a distributed file system (3 copies of data in the cluster) and a programming framework called MapReduce
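Keeping three copies of every block has a direct effect on sizing, which a rough sketch can show. The node count and per-node disk size below are hypothetical, and the calculation ignores space reserved for temporary data and the operating system.

```python
def usable_capacity_tb(nodes, disk_tb_per_node, replication=3):
    """Rough usable capacity of a cluster that stores `replication` copies of each block."""
    raw_tb = nodes * disk_tb_per_node
    return raw_tb / replication


# Hypothetical cluster: 15 nodes with 24 TB of local disk each.
usable = usable_capacity_tb(nodes=15, disk_tb_per_node=24)  # 360 TB raw -> 120.0 TB usable
```

The same replication that costs two thirds of the raw space is what lets the cluster run on big, slow, local disks, because a failed disk or node still leaves two other copies of each block.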
  • 23. Introducing Hadoop • Host 1: Name Node • Host 2: Secondary Name Node • Host 3: Data Node • Host 4: Data Node • Host 5: Data Node • Host 6: Data Node
  • 24. How MapReduce Works • Automatic parallelism • Fault tolerance
  • 25. Map Phase Input file: foo.log (HDFS Block 1, HDFS Block 19, HDFS Block 105) 1) Read splits into records: Split 1 (K:0 V…), Split 2 (K:123 V…), Split 3 (K:332 V…, K:368 V…) 2) Run Map: Map Task 1 (K:INFO V…), Map Task 2 (K:INFO V:1, K:WARN V:1), Map Task 3 (K:Debug V:1, K:INFO V:1) 3) Write and Sort Output
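The map phase pictured here, counting log levels in a file, can be sketched in plain Python without Hadoop. This is a single-process simulation and does not use the Hadoop API. The log lines, the split size, and all function names are hypothetical. The shuffle and reduce steps are not on this slide and are added only so the example produces a final count.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical contents of a log file such as foo.log.
LOG_LINES = [
    "2014-05-07 10:00:01 INFO starting job",
    "2014-05-07 10:00:02 WARN disk nearly full",
    "2014-05-07 10:00:03 INFO job running",
    "2014-05-07 10:00:04 DEBUG heartbeat",
    "2014-05-07 10:00:05 INFO job finished",
]


def make_splits(lines, split_size):
    """1) Break the input into splits; each split feeds one map task."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_task(split):
    """2) Run map: emit (log level, 1) per record. 3) Sort the task's output by key."""
    pairs = [(line.split()[2], 1) for line in split]
    return sorted(pairs)


def shuffle(map_outputs):
    """Group the values from all map tasks by key (not shown on the slide)."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(map_outputs):
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Sum the counts for one key (not shown on the slide)."""
    return key, sum(values)


splits = make_splits(LOG_LINES, split_size=2)
map_outputs = [map_task(split) for split in splits]  # independent, so parallelizable
counts = dict(reduce_task(key, values) for key, values in shuffle(map_outputs).items())
```

Because each map task reads only its own split and shares no state with the others, the framework can run the tasks on different nodes and simply rerun a task if a node fails.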
  • 26. Hadoop Ecosystem HDFS MapReduce Note: This is only a subset of the ecosystem!
  • 27. YARN
  • 28. Spark and Shark • Hadoop 2 Enhancements • Spark is in-memory • Shark integrates Spark with Hive
  • 29. Hadoop Architectural Decisions • Distribution • Components • Support • Cloud vs On-Premises
  • 30. Choosing Your Hadoop Distribution
  • 31. Hadoop Vendors (Hadoop distributions, by vendor) • Apache: Completely open source software for distributed clusters and map/reduce • Cloudera: Industry-leading commercial distribution, good management tools • Hortonworks: Open source distribution, Apache compatible • MapR: Multiple enhancements to Apache Hadoop (rewrite of HDFS), high performance, enterprise ready • Pivotal HD: EMC spinoff with strong financial backing; a full high-performance RDBMS (with BI connectors) on top of Hadoop
  • 32. Cloud vs On-Premises Cloud: • Short-term use • Rapid scale • Test use cases • Pay as you go • Internet data source On-Premises: • Large long-term implementations • Well-known workloads • Shared clusters • Large initial investment
  • 34. Analytics Hadoop was not fast: it relied on full scans of files. So how do we rapidly analyze data?
  • 35. Columnar Databases • Microsoft SQL Server (2012 & 2014) • PDW • HP Vertica • HBase • ParAccel • InfiniDB • EMC Greenplum
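The idea shared by the products on this slide is that values are stored column by column rather than row by row. A minimal sketch follows, with hypothetical data and function names. Real engines add compression and batch processing that are not modeled here.

```python
# Hypothetical order data in row-oriented form: one dict per record.
ROWS = [
    {"order_id": 1, "region": "East", "amount": 20.0},
    {"order_id": 2, "region": "West", "amount": 30.0},
    {"order_id": 3, "region": "East", "amount": 50.0},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


COLUMNS = to_columns(ROWS)


def total_row_store(rows):
    # Models a row store: the aggregate has to walk every full record.
    return sum(row["amount"] for row in rows)


def total_column_store(columns):
    # Models a column store: only the one column the query needs is read.
    return sum(columns["amount"])
```

Both functions return the same total. The difference on real hardware is how much data has to be read to get it, which matters most for analytic queries that touch a few columns of very wide tables.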
  • 36. In-Memory Databases • SQL Server 2014 • SAP HANA • Oracle TimesTen • VoltDB • Apache Spark
  • 37. Analytics Tools Past and Present
  • 39. Tools for Data Visualization • Excel (Power View and Power Map) • Tableau • Qlik • Platfora • Pentaho
  • 40. Bringing This All Together Power Query (Excel)
  • 41. Q & A ?
  • 42. Session Evaluations Submit by 5pm Friday, May 9 to WIN prizes. Your feedback is important and valuable. Ways to access: • Go to passbac2014/evals • Download the PASS EVENT App from your App Store and search: PASS BAC 2014 • Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
  • 43. Thank You for attending this session and the PASS Business Analytics Conference 2014 May 7-9, 2014 | San Jose, CA