HADOOP & THE DATA WAREHOUSE:
WHEN TO USE WHICH
Steve Wooledge – Teradata Labs
Jim Walker – Hortonworks
Topics
• Trends in enterprise data architectures
• The value of an integrated data warehouse
• The value of Hadoop
• Bringing it all together and next steps
Big Data Comes with BIG HEADACHES
“Even free software like Hadoop is causing companies to spend more money… Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy (it can be wrong, it can be duplicative, and it can be irrelevant), which means it requires handling, which is where the real expenses come in.”
Source: The Wall Street Journal. “CIOs’ Big Problem with Big Data”. Aug 2012
“Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.”
Source: Gartner. “Information Innovation: Innovation Key Initiative Overview”. April 2012
Organizations Face Several Obstacles with Big Data
Source: Big Analytics 2012 Survey, Teradata
• Difficulty managing multiple systems and new types of data
• Hard to find the right skills; lack of supportability for new systems and “data scientists”
• Difficulty deploying and integrating new systems
• Difficulty providing accessibility to fast insights on big data
Shift from a Single Platform to an Ecosystem
“Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms, and NoSQL solutions beyond Hadoop.”
“We will abandon the old models based on the desire to implement for high-value analytic applications.”
"Logical" Data Warehouse
Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.
TERADATA UNIFIED DATA ARCHITECTURE
(Diagram: data sources (audio & video, images, text, web & social, machine logs, CRM, SCM, ERP) feed a capture | store | refine layer, a discovery platform and an integrated data warehouse. These are accessed through languages, math & stats, data mining, business intelligence and applications by engineers, data scientists, business analysts, front-line workers, customers/partners, marketing, operational systems and executives.)
The Value of the Data Warehouse
(Diagram: an integrated data warehouse at the center, surrounded by dual systems, data marts, an analytical archive, test/dev systems, an independent data mart and a data lab, serving business analysts, knowledge workers, customers/partners, marketing, executives, front-line workers and operational systems through data mining, business intelligence and applications.)
Integrated Analytics
• Advanced analytics
• Temporal
• OLAP
• Optimization
• Geospatial
• Big data integration
• Application development
• Agile analytics
• Data exploration
Benefits
• Easy to consume data
• Rationalization of data from multiple sources into a single enterprise view
• Clean, safe, secure data
• Cross-functional analysis
• Transform once, use many
• Fast response times
SQL Advantages with an MPP RDBMS
• Full ANSI SQL:
  • The lingua franca of business users when accessing data
  • Decades of standardization (stable, feature rich, portable)
• Mature third-party SQL-based tools that provide business users with self-service, direct access to the data:
  • BI tools
  • In-database statistical packages
  • Analytic applications (CRM, SCM, MDM)
• Easily parallelized
• Scalable when manipulating large data sets
6/27/2013 9
ACID Advantages in an MPP RDBMS
• Guarantees database actions are processed reliably
• Ensures 100% query result accuracy
• Supports updates and deletes
• Needed for applications that require 100% consistency
Atomicity - All of the pieces are committed or none are committed.
Consistency - Creates a new and valid state of data or, if any failure occurs, returns all data to its original state.
Isolation - Transactions that are in process but not yet committed must remain isolated from all other transactions.
Durability - Committed data is saved such that, in the event of a failure and system restart, the data is available in its correct state.
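Atomicity is the easiest of these properties to see in code. The following is a minimal sketch using Python's built-in `sqlite3` module, a single-node embedded database rather than an MPP RDBMS, so it illustrates only the all-or-nothing commit behavior described above. The `accounts` table and the `transfer` function are hypothetical.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts: both updates commit, or neither does."""
    try:
        # Used as a context manager, a sqlite3 connection commits on success
        # and rolls back the open transaction if an exception escapes the block.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers rollback of the debit above
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

ok = transfer(conn, "alice", "bob", 30)       # succeeds: both rows change
failed = transfer(conn, "alice", "bob", 500)  # fails: the partial debit is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)  # alice keeps 70 and bob 30; no half-finished transfer survives
```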
Tight Vertical Integration
• End-to-end management of resources
• Efficient utilization of resources
• Engineered extremely well for known data
• Fine-grained parallelism and resource management
• Consistency of service level delivery
Best Practices Management:
• Workload functions
• Workload groups
• Exceptions
• Priorities
• Time periods
Low Latency Advantages of MPP RDBMS
Multi-temperature storage with automated distribution of data based on access patterns:
• In-memory
• Solid-state drives
• Fast hard drives
• Fat (high-capacity) hard drives
• Indexes
• Statistics
• Advanced partitioning
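The idea behind multi-temperature storage can be sketched as a ranking problem: count how often each data block is accessed and fill the fastest tiers with the hottest blocks. This is a minimal illustration with hypothetical tier names and deliberately tiny capacities; it is not Teradata's placement algorithm, which the slide does not describe.

```python
from collections import Counter

# Hypothetical tiers, hottest first, with a capacity in blocks (None = unbounded).
TIERS = [("in-memory", 1), ("ssd", 1), ("fast-hdd", 1), ("fat-hdd", None)]

def assign_tiers(access_log, tiers=TIERS):
    """Map each data block to a storage tier: the most-accessed blocks get the fastest tiers."""
    ranked = [block for block, _ in Counter(access_log).most_common()]
    placement, start = {}, 0
    for tier, capacity in tiers:
        end = len(ranked) if capacity is None else start + capacity
        for block in ranked[start:end]:
            placement[block] = tier
        start = end
    return placement

# Block "a" was read 5 times, "b" 4 times, "c" 3 times, "d" twice and "e" once.
log = list("aaaaabbbbcccdde")
print(assign_tiers(log))
```

Re-running the assignment as the access log changes is what makes the distribution "automated": data migrates between tiers as it warms up or cools down.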
Cost Based Optimizer Advantages in an MPP RDBMS
• A best-practices optimizer determines how the query will be processed most efficiently, with no need for user-supplied “hints” or degrees of parallelism.
• In chess, you can look a few moves ahead to decide your best next move, but you can’t envision every move and countermove sequence for the entire game:
  • The Grand Master has the knowledge, experience, and intelligence to identify and use the right strategy.
  • With Hadoop, the user takes a heavy role in optimizing the execution of queries.
  • With an MPP RDBMS, the software is the optimizer.
Query Rewrite
• Semantic optimization
• Different types of vendor tools
Fast/Efficient Data Access
• Access path: indexing
• Partitioning (CP & PPI)
• Advanced partitioning schemes (range and case based, multilevel, dynamic)
• I/O optimizations (efficient scans/sync scan, scan optimization)
Query Complexity
• Join costing and planning
• Aggregation
Many ways to process a complex query…
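To make “join costing and planning” concrete, here is a toy cost-based optimizer: it enumerates left-deep join orders and picks the one with the smallest total intermediate result size. The table sizes, selectivities and cost model are hypothetical; production optimizers use collected statistics and far richer cost models, but the principle of letting the software, not the user, choose the plan is the same.

```python
from itertools import permutations

# Hypothetical statistics: row counts per table and selectivity of each join predicate.
ROWS = {"sales": 1_000_000, "customers": 50_000, "stores": 500}
SELECTIVITY = {
    frozenset({"sales", "customers"}): 1 / 50_000,
    frozenset({"sales", "stores"}): 1 / 500,
    frozenset({"customers", "stores"}): 1.0,  # no join predicate: a cross product
}

def plan_cost(order):
    """Cost of a left-deep join order = sum of estimated intermediate result sizes."""
    joined, rows, cost = [order[0]], ROWS[order[0]], 0.0
    for table in order[1:]:
        sel = 1.0
        for prev in joined:
            sel *= SELECTIVITY[frozenset({prev, table})]
        rows = rows * ROWS[table] * sel  # estimated size of the new intermediate result
        cost += rows
        joined.append(table)
    return cost

def best_order(tables):
    """Exhaustively cost every join order and return the cheapest."""
    return min(permutations(tables), key=plan_cost)

best = best_order(list(ROWS))
print(best, plan_cost(best))
```

With these numbers, any order that starts by joining customers to stores builds a 25-million-row cross product, so the optimizer avoids it without any hint from the user.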
Granular Security Advantages in an MPP RDBMS
• Row-level security
• Column-level security
• An MPP RDBMS tightly integrates mature security features:
  • User-level security controls
  • Increased user authentication options
  • Support for security roles
  • Enterprise directory integration
  • Auditing and monitoring controls
  • Encryption
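Row- and column-level restrictions are often expressed as restricted views. A minimal sketch using Python's built-in `sqlite3`: the view below exposes only EMEA rows and omits the `ssn` column. SQLite has no users, roles or GRANT statements, so this shows only the filtering such a view performs; in an MPP RDBMS, analysts would be granted access to the view rather than the base table. Table, column and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT, ssn TEXT);
INSERT INTO customers VALUES
  (1, 'Ann', 'EMEA', '111-11-1111'),
  (2, 'Bo',  'APAC', '222-22-2222'),
  (3, 'Cy',  'EMEA', '333-33-3333');
-- Row-level restriction: only EMEA rows. Column-level restriction: no ssn column.
CREATE VIEW customers_emea_analyst AS
  SELECT id, name, region FROM customers WHERE region = 'EMEA';
""")

rows = conn.execute("SELECT * FROM customers_emea_analyst ORDER BY id").fetchall()
columns = [d[0] for d in conn.execute("SELECT * FROM customers_emea_analyst").description]
print(columns, rows)
```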
MPP RDBMS Customer Examples
© Hortonworks Inc. 2012
By the year 2015, we believe half the world’s data will be processed by Apache Hadoop
Key Hadoop Features for the EDW
•Storage/Processing
•Metadata
The World of Data is Changing
Data Explosion
“By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.”
– Gartner, Mark Beyer, “Information Management in the 21st Century”
• 1 zettabyte (ZB) = 1 billion TB
• 15x growth rate of machine-generated data by 2020
Source: IDC
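Using decimal (SI) prefixes, the zettabyte conversion works out as follows:

```latex
1\,\text{ZB} = 10^{21}\,\text{bytes} = 10^{9} \times 10^{12}\,\text{bytes} = 10^{9}\,\text{TB} = 1\ \text{billion TB}
```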
Apache Hadoop: Center of Big Data Strategy
Open source data management with scale-out storage & distributed processing
Storage: HDFS
• Distributed across “nodes”
• Natively redundant
• NameNode tracks locations
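A simplified sketch of where HDFS gets its native redundancy: a file is split into fixed-size blocks, each block is replicated to several DataNodes, and the NameNode keeps the block-to-node map. A replication factor of 3 matches the HDFS default; the 128 MB block size, node names and rotation-based placement are simplifying assumptions (real HDFS placement is rack-aware).

```python
def hdfs_block_map(file_size_mb, block_size_mb, datanodes, replication=3):
    """Split a file into fixed-size blocks and pick `replication` distinct DataNodes per block.

    Returns a NameNode-style map: block index -> DataNodes holding a replica.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    block_map = {}
    for block in range(num_blocks):
        start = block % len(datanodes)  # rotate the starting node to spread load
        block_map[block] = [datanodes[(start + r) % len(datanodes)] for r in range(replication)]
    return block_map

def readable_after_failure(block_map, failed_node):
    """True if every block still has at least one replica on a surviving node."""
    return all(any(n != failed_node for n in nodes) for nodes in block_map.values())

nodes = ["dn1", "dn2", "dn3", "dn4"]
layout = hdfs_block_map(300, 128, nodes)  # 300 MB file, 128 MB blocks -> 3 blocks
print(layout)
print(readable_after_failure(layout, "dn3"))
```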
Processing: MapReduce
• Splits a task across processors “near” the data and assembles results
Self-healing, high-bandwidth clustered storage
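The map, shuffle and reduce phases can be illustrated with the classic word-count example. This is a single-process Python simulation of the programming model, not Hadoop's Java API; in a real cluster, the framework runs many map tasks in parallel on the nodes that hold the input blocks and performs the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a final result."""
    return key, sum(values)

def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())

print(word_count(["big data big headaches", "Data is messy"]))
```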
Key Characteristics
• Scalable
  – Efficiently store and process petabytes of data
  – Linear scale driven by additional processing and storage
• Reliable
  – Redundant storage
  – Failover across nodes and racks
• Flexible
  – Store all types of data in any format
  – Apply schema on analysis and sharing of the data
• Economical
  – Use commodity hardware
  – Open source software guards against vendor lock-in
Apache HCatalog provides flexible metadata services across tools and external access
• Without HCatalog: raw Hadoop data, inconsistent and unknown, with tool-specific access
• With HCatalog: table access, aligned metadata, REST API
Metadata Services
• Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
• Accessibility: share data as tables in and out of HDFS
• Availability: enables flexible, thin-client access via REST API
Shared table and schema management opens the platform
“How to” deliver an open source enterprise product
• Identify requirements
• Open community delivery
• Enterprise rigor
(Diagram: a design & develop, test & patch, release cycle spanning Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari and other Apache projects)
An Open Apache Community
Fastest path to innovation is an open community
Big Data: It’s About Scale & Structure
RDBMS (EDW, MPP) vs. Hadoop (NoSQL)
• Schema: required on write vs. required on read
• Speed: reads are fast vs. writes are fast
• Governance: standards and structured vs. loosely structured
• Processing: limited, no data processing vs. processing coupled with data
• Data types: structured vs. multi-structured and unstructured
• Best fit use: interactive OLAP analytics, complex ACID transactions, operational data store vs. data discovery, processing unstructured data, massive storage/processing
• Cost: software license vs. support only
• Resources: known entity vs. growing, complexities, wide
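The “schema on write” versus “schema on read” row can be sketched in a few lines. In this minimal illustration, the load-time path validates and types every record against a fixed schema and rejects what does not fit, while the query-time path keeps raw lines untouched and applies whatever schema a given query needs. The record format and both schemas are hypothetical.

```python
import json

# Hypothetical target schema for a clickstream table: column name -> type
SCHEMA = {"user": str, "clicks": int}

def load_schema_on_write(raw_lines, schema=SCHEMA):
    """RDBMS style: parse, validate and type every record at load time; reject what doesn't fit."""
    table, rejected = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
            table.append({col: typ(record[col]) for col, typ in schema.items()})
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
    return table, rejected

def query_schema_on_read(raw_lines, schema):
    """Hadoop style: raw lines are stored untouched; a schema is applied only at query time."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # this query skips unparseable lines, but they stay in storage
        if all(col in record for col in schema):
            yield {col: typ(record[col]) for col, typ in schema.items()}

raw = ['{"user": "a", "clicks": "3"}', '{"user": "b", "page": "/home"}']
table, rejected = load_schema_on_write(raw)
pages = list(query_schema_on_read(raw, {"user": str, "page": str}))
print(table, rejected, pages)
```

The second record is rejected at load time by the fixed schema, yet it remains queryable later under a different schema because the raw line was kept.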
An Emerging Data Architecture
(Diagram: traditional data sources (RDBMS, OLTP, OLAP; OLTP and POS systems) and new sources (web logs, email, sensor data, social media, mobile data) feed the data systems layer, where the Hortonworks Data Platform sits alongside traditional repositories (RDBMS, EDW, MPP). Business analytics, custom applications and enterprise applications consume the data, supported by dev & data tools (build & test) and operational tools (manage & monitor).)
Interoperating With Your Tools
(Diagram: the same traditional and new data sources, applications, dev & data tools and operational tools (e.g., Viewpoint, Microsoft applications) interoperating with Hadoop, overlaid on the Teradata Unified Data Architecture: the capture | store | refine layer, the discovery platform and the integrated data warehouse, serving engineers, data scientists, business analysts, front-line workers, customers/partners, marketing, operational systems and executives.)
By the year 2015, we believe half the world’s data will be processed by Apache Hadoop
Key Hadoop Features for the EDW
•Storage/Processing
•Metadata
•FAMILIARITY
Confidential and proprietary. Copyright © 2013 Teradata Corporation.30
Teradata Unified Data Architecture
• Hadoop: collect ALL interaction data
• Teradata Aster: discover customer behavioral patterns
• Teradata: operationalize insights
The right technology on the right analytical problems, using best-of-breed technologies
Improved Customer Service and Retention
(Diagram: multi-structured raw data sources (social feeds, clickstream data, call center voice records, check images) land in Hadoop, the capture, store and refine layer, which captures, stores and transforms social, image and call records. The discovery platform performs path, pattern and graph analysis and produces sentiment scores. The integrated DW holds dimensional data and analytic results, is also fed by ETL tools in the traditional data flow, and drives analysis plus marketing automation (customer campaigns).)
Teradata Workload-Specific Platforms
• Data Mart Appliance (670): scales up to 12TB; test/development or smaller data marts
• Extreme Data Appliance (1650): up to 186PB; analytical archive, deep-dive analytics
• Data Warehouse Appliance (2700): up to 1.6PB; strategic intelligence, decision support system, fast scan
• Active Enterprise Data Warehouse (6700): up to 61PB; strategic and operational intelligence, real-time update, active workloads
• Appliance for Hadoop: up to 10PB; appliance for capturing, storing and refining data (Hortonworks HDP 1.1)
• Aster Big Analytics Appliance: up to 5PB; discovery platform for big data analytics with embedded SQL-MapReduce for new data types and sources
• SAS High Performance Analytics (700): up to 52TB; dedicated appliance for SAS high-performance analytic model development
Teradata Unified Data Architecture: Connectors
• SQL-H (from both Teradata and Teradata Aster to Hadoop)
• Aster-Teradata Connector
• Aster Connector for Hadoop
• Teradata Connector for Hadoop
Teradata SQL-H™
A Business User’s Bridge to Access Hadoop Data
Teradata SQL-H gives business users a better way to access data stored in Hadoop:
• Trusted: use existing tools/skills and enable self-service BI with granular security
• Allows standard ANSI SQL access to Hadoop data
• Fast: queries run on Teradata; data is accessed from Hadoop
• Efficient: intelligent data access leveraging the Hadoop HCatalog
(Diagram: Teradata SQL-H uses HCatalog metadata to read data from the Hadoop layer (HDFS, alongside Pig, Hive and MapReduce), filtering the data as it is accessed.)
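The slide does not describe SQL-H's internals, but one common way catalog metadata enables “intelligent data access” is partition pruning: the engine consults a table's partition metadata and reads only the files whose partition values can satisfy the query's predicate. A minimal sketch with a hypothetical in-memory catalog standing in for HCatalog (this is not the HCatalog or SQL-H API; table names and paths are invented):

```python
# Hypothetical catalog standing in for HCatalog: table -> partition value (date) -> data files.
CATALOG = {
    "weblogs": {
        "2013-06-25": ["/data/weblogs/dt=2013-06-25/part-0"],
        "2013-06-26": ["/data/weblogs/dt=2013-06-26/part-0",
                       "/data/weblogs/dt=2013-06-26/part-1"],
        "2013-06-27": ["/data/weblogs/dt=2013-06-27/part-0"],
    }
}

def files_to_scan(table, date_from, date_to, catalog=CATALOG):
    """Partition pruning: consult catalog metadata and keep only the partitions the
    query's date predicate can match, so the other files are never read."""
    partitions = catalog[table]
    return [path
            for dt in sorted(partitions)
            if date_from <= dt <= date_to  # ISO-8601 date strings sort chronologically
            for path in partitions[dt]]

# Roughly: SELECT ... FROM weblogs WHERE dt BETWEEN '2013-06-26' AND '2013-06-27'
print(files_to_scan("weblogs", "2013-06-26", "2013-06-27"))
```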
Aster Discovery Portfolio: Accelerate Time to Insights
The App Store of Big Data: some of the 80+ out-of-the-box analytical apps
• Path Analysis: discover patterns in rows of sequential data
• Text Analysis: derive patterns and extract features in textual data
• Statistical Analysis: high-performance processing of common statistical calculations
• Segmentation: discover natural groupings of data points
• Marketing Analytics: analyze customer interactions to optimize marketing decisions
• Data Transformation: transform data for more advanced analysis
• Graph Analysis: graph analytics processing and visualization
• SQL-MapReduce Visualization: graphing and visualization tools linked to key functions of the MapReduce analytics library
Big Data Analytics & Discovery
Example Customers: Teradata Aster Big Analytics Appliance
XL Axiata
Discovering Deep Insights in Retail
Transforming Web Walks into DNA Sequences
Situation
Large retailer with 700M visits per year; 2M customers per day look at 1M products online
Problem
Increase the ability of web content owners to self-serve insights
Solution
Treat web walks like DNA sequences of simple patterns
Impact
• Data: loaded logs into Hortonworks
  • Loaded 2 months of raw data in 1 hour, vs. 1 day on the old system
  • Can load a day’s log data in 60 seconds
• Sessionize: creates a sequence for each visit, e.g., boils 20 customer clicks down to 1 line:
  • <Home – Search – Look at Product – Add to Basket – Pay – Exit>
• Analyze: business analysts can now do path analysis
• Act:
  • Segmentations by behavior can increase conversion rates by 5-10%
  • Web design changes can drive another 10-20% more visitors into the sales funnel
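The sessionize step can be sketched as follows: sort clicks by visitor and time, start a new visit after a period of inactivity, and collapse each visit into one path line. The 30-minute timeout, the collapsing of repeated page views and the `(visitor_id, unix_seconds, page)` input format are assumptions, not details from the case study.

```python
from itertools import groupby
from operator import itemgetter

SESSION_TIMEOUT = 30 * 60  # assumption: 30 minutes of inactivity ends a visit

def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """clicks: iterable of (visitor_id, unix_seconds, page).
    Returns one (visitor_id, path) line per visit, with repeated page views collapsed."""
    visits = []
    ordered = sorted(clicks, key=itemgetter(0, 1))  # by visitor, then time
    for visitor, events in groupby(ordered, key=itemgetter(0)):
        path, last_ts = [], None
        for _, ts, page in events:
            if last_ts is not None and ts - last_ts > timeout:
                visits.append((visitor, " > ".join(path)))  # gap too long: close the visit
                path = []
            if not path or path[-1] != page:
                path.append(page)
            last_ts = ts
        if path:
            visits.append((visitor, " > ".join(path)))
    return visits

clicks = [
    ("v1", 0, "Home"), ("v1", 10, "Search"), ("v1", 20, "Product"),
    ("v1", 25, "Product"), ("v1", 40, "Basket"), ("v1", 60, "Pay"),
    ("v1", 70, "Exit"), ("v1", 5000, "Home"),
    ("v2", 100, "Home"), ("v2", 130, "Exit"),
]
for visitor, path in sessionize(clicks):
    print(visitor, path)
```

Once each visit is a single line, path analysis becomes pattern matching over short strings, which is what makes the “DNA sequence” framing practical.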
Example: Online Checkout Flow Analysis
• Customers who reach the checkout process are expected to follow an “ideal path”:
  • deliveryslots > deliveryinformation > coupons > substitutions > paymentinfo > orderconfirmation
• Determine how and when (and ultimately, why) customers deviate from this path.
• Discover obstacles preventing purchase and optimize visitor flow through the web site.
• The Aster SQL-MapReduce Framework enables a variety of different path visualizations.
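A minimal sketch of the “how and when” part of this analysis: compare each checkout session with the ideal path and report the first step where it departs, or where it stops. This is plain Python, not the Aster SQL-MapReduce API; representing a session as an ordered list of page names is an assumption.

```python
from collections import Counter

IDEAL_PATH = ["deliveryslots", "deliveryinformation", "coupons",
              "substitutions", "paymentinfo", "orderconfirmation"]

def first_deviation(session, ideal=IDEAL_PATH):
    """Return (step, expected_page, actual_page) at the first step where a checkout
    session leaves the ideal path, or None if every step so far matches it."""
    for step, page in enumerate(session):
        expected = ideal[step] if step < len(ideal) else "(end of path)"
        if page != expected:
            return step, expected, page
    return None

def deviation_report(sessions, ideal=IDEAL_PATH):
    """Count completed checkouts, on-path abandonments and off-path deviations."""
    report = Counter()
    for session in sessions:
        if not session:
            continue
        deviation = first_deviation(session, ideal)
        if deviation is not None:
            _, expected, actual = deviation
            report[f"deviated: expected {expected}, went to {actual}"] += 1
        elif len(session) < len(ideal):
            report[f"abandoned after {session[-1]}"] += 1
        else:
            report["completed"] += 1
    return report

sessions = [
    IDEAL_PATH,
    ["deliveryslots", "deliveryinformation", "coupons"],
    ["deliveryslots", "deliveryinformation", "home"],
]
print(deviation_report(sessions))
```

The report answers “where” and “how often”; the “why” still calls for the analyst's judgment, for example by segmenting the deviating sessions further.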
Teradata Portfolio for Hadoop
“Taking Hadoop from Silicon Valley to Main Street”
Most Trusted & Flexible Hadoop Platforms for Your Next-Generation Unified Data Architecture™
1. Teradata Aster Big Analytics Appliance
2. Teradata Appliance for Hadoop
3. Teradata Commodity Offering for Hadoop (Dell)
4. Teradata Software-only for Hadoop (Hortonworks Data Platform)
Complete consulting and training capability
• Big Analytics Services – across the UDA
• Data Integration Optimization – ETL, ELT across the UDA
• Hadoop deployment & mentoring
• Teradata delivering Hortonworks training
• Hadoop Managed Services - operations & administration
Customer Support for Hadoop
• World-class Teradata customer support, backed by Hortonworks
What We Announced Today
Teradata Appliance for Hadoop
Value-Added Software Bringing Hadoop to the Enterprise
• Access: SQL-H™
• Management: Viewpoint, TVI
• Administration: Hadoop Builder, intelligent start/stop, DataNode swap, deferred drive replace
• High availability: NameNode HA, master machine failover
• Refining, metadata, entity resolution
• Security & data access: HCatalog, Kerberos
41 6/27/2013 Teradata Confidential
Complete Consulting and Training Capability
Post-sale Services: Areas of Focus
• Teradata Analytic Architecture Services: services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop
• Teradata DI Optimization: assess structured/non-structured data, discuss data loading techniques, determine the best platform, optimize load scripts/processes
• Teradata Big Analytics: assess data value/cost of capture, identify sources of “exhaust” data, create a conceptual architecture, refine and enrich the data, implement initial analytics in Aster or the best-fit tool
• Teradata Workshop for Hadoop: introduction workshop (across all of UDA)
• Teradata Data Staging for Hadoop: load data into a landing area; set up a data exploration/refining area; scope architecture and analytics; set up a Hadoop repository; load sample data
• Teradata Platform for Hadoop: installation guidance and mentoring for the Hadoop platform; D-I-Y after installation
• Teradata Managed Services for Hadoop: operations, management, administration, backup, security, process control for Hadoop
• Teradata Training Courses for Hadoop: two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop
When to Use Which?
The best approach by workload and data type
Processing as a Function of Schema Requirements and Stage of Data Pipeline
(Matrix: stages of the data pipeline (low-cost storage and fast loading; data pre-processing, refining, cleansing; “simple math at scale” (score, filter, sort, avg., count...); joins, unions, aggregates; analytics (iterative and data mining); reporting) plotted against schema requirements (stable schema; evolving schema; format, no schema). Each cell names the recommended platform: Teradata, Hadoop, Aster (SQL + MapReduce analytics), Aster (MapReduce analytics), or a combination such as Teradata/Hadoop or Aster/Hadoop. Stable-schema workloads map mainly to Teradata; evolving-schema workloads mainly to Aster, with Hadoop and Aster/Hadoop in the early stages; format/no-schema workloads to Hadoop for storage and refining and to Aster for analytics.)
Example workloads:
• Financial analysis, ad-hoc/OLAP
• Enterprise-wide BI and reporting
• Spatial/temporal
• Active execution
• Interactive data discovery
• Web clickstream, set-top box analysis
• CDRs, sensor logs, JSON
• Social feeds, text, image processing
• Audio/video storage and refining
• Storage and batch transformations
Questions and Answers
Thank You!

Más contenido relacionado

La actualidad más candente

Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at AirbnbNeo4j
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architectureSudheer Kondla
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseDatabricks
 
Unlocking Business Value Using Data
Unlocking Business Value Using DataUnlocking Business Value Using Data
Unlocking Business Value Using DataSplunk
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoDataKitchen
 
Metadata Strategies
Metadata StrategiesMetadata Strategies
Metadata StrategiesDATAVERSITY
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations PresentationAdam Doyle
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Seven building blocks for MDM
Seven building blocks for MDMSeven building blocks for MDM
Seven building blocks for MDMKousik Mukherjee
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsDatabricks
 
TOP_407070357-Data-Governance-Playbook.pptx
TOP_407070357-Data-Governance-Playbook.pptxTOP_407070357-Data-Governance-Playbook.pptx
TOP_407070357-Data-Governance-Playbook.pptxSabrinaLameiras1
 
The Data Driven University - Automating Data Governance and Stewardship in Au...
The Data Driven University - Automating Data Governance and Stewardship in Au...The Data Driven University - Automating Data Governance and Stewardship in Au...
The Data Driven University - Automating Data Governance and Stewardship in Au...Pieter De Leenheer
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 

La actualidad más candente (20)

Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at Airbnb
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Exemple de Dashboard
Exemple de DashboardExemple de Dashboard
Exemple de Dashboard
 
Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Unlocking Business Value Using Data
Unlocking Business Value Using DataUnlocking Business Value Using Data
Unlocking Business Value Using Data
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Metadata Strategies
Metadata StrategiesMetadata Strategies
Metadata Strategies
 
Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Seven building blocks for MDM
Seven building blocks for MDMSeven building blocks for MDM
Seven building blocks for MDM
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Data modeling for the business
Data modeling for the businessData modeling for the business
Data modeling for the business
 
TOP_407070357-Data-Governance-Playbook.pptx
TOP_407070357-Data-Governance-Playbook.pptxTOP_407070357-Data-Governance-Playbook.pptx
TOP_407070357-Data-Governance-Playbook.pptx
 
The Data Driven University - Automating Data Governance and Stewardship in Au...
The Data Driven University - Automating Data Governance and Stewardship in Au...The Data Driven University - Automating Data Governance and Stewardship in Au...
The Data Driven University - Automating Data Governance and Stewardship in Au...
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 

Destacado

Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseDataWorks Summit
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsCloudera, Inc.
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data WarehouseCaserta
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...Kai Wähner
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoopMaulik Thaker
 
Teradata Investor Presentation
Teradata Investor Presentation Teradata Investor Presentation
Teradata Investor Presentation teradata2014
 
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...Cloudera, Inc.
 
Maximizing Business Value: Optimizing Technology Investment
Maximizing Business Value: Optimizing Technology InvestmentMaximizing Business Value: Optimizing Technology Investment
Maximizing Business Value: Optimizing Technology InvestmentTeradata
 
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten Ichiba
[RakutenTechConf2013] [B-3_2] DWH/Hadoop in Rakuten IchibaRakuten Group, Inc.
 
BSI Teradata: The Shocking Case of Home Electronics Planet
BSI Teradata: The Shocking Case of Home Electronics PlanetBSI Teradata: The Shocking Case of Home Electronics Planet
BSI Teradata: The Shocking Case of Home Electronics PlanetTeradata
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseGwen (Chen) Shapira
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Build a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesBuild a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesCaserta
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 
NYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkNYC Open Data Meetup-- Thoughtworks chief data scientist talk
NYC Open Data Meetup-- Thoughtworks chief data scientist talkVivian S. Zhang
 
The Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and BeyondThe Intelligent Thing -- Using In-Memory for Big Data and Beyond
The Intelligent Thing -- Using In-Memory for Big Data and BeyondInside Analysis
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with RedshiftAmazon Web Services
 

Hadoop and the Data Warehouse: When to Use Which

  • 1. HADOOP & THE DATA WAREHOUSE: WHEN TO USE WHICH. Steve Wooledge, Teradata Labs; Jim Walker, Hortonworks
  • 2. Topics
    • Trends in enterprise data architectures
    • The value of an integrated data warehouse
    • The value of Hadoop
    • Bringing it all together and next steps
  • 3. Big Data Comes with BIG HEADACHES
    “Even free software like Hadoop is causing companies to spend more money… Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy: it can be wrong, it can be duplicative, and it can be irrelevant, which means it requires handling, which is where the real expenses come in.” (Source: The Wall Street Journal, “CIOs’ Big Problem with Big Data”, Aug 2012)
    “Through 2015, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage.” (Source: Gartner, “Information Innovation: Innovation Key Initiative Overview”, April 2012)
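To make the “handling” in the Wall Street Journal quote concrete, here is a minimal Python sketch of the three problems it names. The record layout (`customer_id`, `email`, `amount`, `source`) and the validity and relevance rules are hypothetical, chosen only for illustration: wrong records fail a validation check, duplicative records share a key, and irrelevant records come from a source the analysis does not need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout, for illustration only.
    customer_id: str
    email: str
    amount: float
    source: str


def is_valid(r: Record) -> bool:
    # "Wrong" data: hypothetical rule requiring a non-negative amount and a plausible email.
    return r.amount >= 0 and "@" in r.email


def handle(records, relevant_sources):
    seen = set()
    cleaned = []
    for r in records:
        if r.source not in relevant_sources:  # "irrelevant": drop sources outside scope
            continue
        if not is_valid(r):  # "wrong": drop records that fail validation
            continue
        key = (r.customer_id, r.email.lower())  # "duplicative": keep the first record per key
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


raw = [
    Record("c1", "a@example.com", 10.0, "crm"),
    Record("c1", "A@example.com", 10.0, "crm"),    # duplicate (email differs only in case)
    Record("c2", "not-an-email", 5.0, "crm"),      # wrong: invalid email
    Record("c3", "c@example.com", -1.0, "erp"),    # wrong: negative amount
    Record("c4", "d@example.com", 7.5, "weblog"),  # irrelevant for this analysis
    Record("c5", "e@example.com", 3.0, "erp"),
]
clean = handle(raw, relevant_sources={"crm", "erp"})
print([r.customer_id for r in clean])  # ['c1', 'c5']
```

Even in this toy case, every rule is a decision someone has to make and maintain, which is the cost the quote is pointing at.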
  • 4. Organizations Face Several Obstacles with Big Data (Source: Big Analytics 2012 Survey, Teradata)
    • Difficulty managing multiple systems and new types of data
    • Hard to find the right skills; lack of supportability for new systems and “data scientists”
    • Difficulty deploying and integrating new systems
    • Difficulty providing access to fast insights on big data
  • 5. Shift from a Single Platform to an Ecosystem. “Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms, and NoSQL solutions beyond Hadoop.” “We will abandon the old models based on the desire to implement for high-value analytic applications.” Diagram label: "Logical" Data Warehouse. Source: “Big Data Comes of Age”, EMA and 9sight Consulting, Nov 2012.
  • 6. TERADATA UNIFIED DATA ARCHITECTURE [diagram]. Data sources: audio & video, images, text, web & social, machine logs, CRM, SCM, ERP. Platforms: capture | store | refine; discovery platform; integrated data warehouse. Access: languages, math & stats, data mining, business intelligence, applications. Users: engineers, data scientists, business analysts, front-line workers, customers/partners, marketing, operational systems, executives.
  • 7. Topics • Trends in enterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 8. The Value of the Data Warehouse [diagram: an integrated data warehouse alongside dual systems, data marts, an analytical archive, test/dev, an independent data mart, and a data lab; serving business analysts, knowledge workers, customers/partners, marketing, executives, front-line workers, and operational systems through data mining, business intelligence, and applications. Capability labels: integrated analytics, advanced analytics, temporal, OLAP, optimization, geospatial, big data integration, application development, agile analytics, data exploration.] Benefits: • Easy-to-consume data • Rationalization of data from multiple sources into a single enterprise view • Clean, safe, secure data • Cross-functional analysis • Transform once, use many • Fast response times
  • 9. SQL Advantages with an MPP RDBMS (6/27/2013) • Full ANSI SQL: • The lingua franca of business users when accessing data • Decades of standardization (stable, feature-rich, portable) • Mature third-party SQL-based tools that give business users self-service, direct access to the data: • BI tools • In-database statistical packages • Analytic applications (CRM, SCM, MDM) • Easily parallelized • Scalable when manipulating large data sets
  • 10. ACID Advantages in an MPP RDBMS • Guarantees that database actions are processed reliably • Ensures 100% query result accuracy • Supports updates and deletes • Needed for applications that require 100% consistency. Atomicity: all of the pieces are committed or none are committed. Consistency: a transaction creates a new, valid state of the data or, if any failure occurs, returns all data to its original state. Isolation: in-process, not-yet-committed transactions must remain isolated from any other transaction. Durability: committed data is saved so that, after a failure and system restart, the data is available in its correct state.
  • 11. Tight Vertical Integration • End-to-end management of resources • Efficient utilization of resources • Engineered extremely well for known data • Fine-grained parallelism and resource management • Consistency of service level delivery Best Practices Management: • Workload functions • Workload groups • Exceptions • Priorities • Time periods
  • 12. Low-Latency Advantages of an MPP RDBMS. Multi-temperature storage with automated distribution of data based on access patterns: • In-memory • Solid-state drives • Fast hard drives • Fat (high-capacity) hard drives. Also: • Indexes • Statistics • Advanced partitioning
  • 13. Cost-Based Optimizer Advantages in an MPP RDBMS • A best-practices optimizer determines how the query will be processed most efficiently, with no “hints” or degrees of parallelism necessary. • In chess, you can look a few moves ahead to decide your best next move, but you can’t envision every move-and-countermove sequence for the entire game: • The grand master has the knowledge, experience, and intelligence to identify and use the right strategy. • With Hadoop, the user takes a heavy role in optimizing the execution of queries. • With an MPP RDBMS, the software is the optimizer. Query rewrite: • Semantic optimization • Different types of vendor tools. Fast, efficient data access: • Access path (indexing) • Partitioning (CP & PPI) • Advanced partitioning schemes (range- and case-based, multilevel, dynamic) • I/O optimizations (efficient scans, synchronized scans). Query complexity: • Join costing and planning • Aggregation. Many ways to process a complex query…
  • 14. Granular Security Advantages in an MPP RDBMS • Row-level security • Column-level security • An MPP RDBMS tightly integrates mature security features: • User-level security controls • Increased user authentication options • Support for security roles • Enterprise directory integration • Auditing and monitoring controls • Encryption
  • 15. MPP RDBMS Customer Examples
  • 16. Topics • Trends in enterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 17. © Hortonworks Inc. 2012. By the year 2015, we believe half the world’s data will be processed by Apache Hadoop. Key Hadoop Features for the EDW: • Storage/Processing • Metadata
  • 18. Data Explosion: The World of Data Is Changing. “By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.” (Gartner, Mark Beyer, “Information Management in the 21st Century”) • 1 zettabyte (ZB) = 1 billion TB • 15x growth rate of machine-generated data by 2020 (Source: IDC)
  • 19. Apache Hadoop: Center of Big Data Strategy. Open-source data management with scale-out storage and distributed processing. Storage (HDFS): • Distributed across “nodes” • Natively redundant • NameNode tracks locations. Processing (MapReduce): • Splits a task across processors “near” the data and assembles the results • Self-healing, high-bandwidth clustered storage. Key characteristics: • Scalable: efficiently store and process petabytes of data; linear scale driven by additional processing and storage • Reliable: redundant storage; failover across nodes and racks • Flexible: store all types of data in any format; apply schema on analysis and sharing of the data • Economical: use commodity hardware; open-source software guards against vendor lock-in
  • 20. Apache HCatalog provides flexible metadata services across tools and external access. The diagram contrasts raw Hadoop data (inconsistent, unknown, tool-specific access) with HCatalog (table access, aligned metadata, REST API). Metadata services: • Consistency of metadata and data models across tools (MapReduce, Pig, HBase, and Hive) • Accessibility: share data as tables in and out of HDFS • Availability: enables flexible, thin-client access via a REST API. Shared table and schema management opens the platform.
  • 21. The “how to” of delivering an open source enterprise product: • Identify requirements • Open community delivery • Enterprise rigor. Cycle: test & patch, design & develop, release. Projects: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari, other Apache projects. An open Apache community: the fastest path to innovation is an open community.
  • 22. Big Data: It’s About Scale & Structure. RDBMS vs. Hadoop (the slide also labels NoSQL, MPP, and EDW): • Schema: required on write vs. required on read • Speed: reads are fast vs. writes are fast • Governance: standards and structured vs. loosely structured • Processing: limited, no data processing vs. processing coupled with data • Data types: structured vs. multi- and unstructured • Best-fit use: interactive OLAP analytics, complex ACID transactions, operational data store vs. data discovery, processing unstructured data, massive storage/processing • Cost: software license vs. support only • Resources: known entity vs. growing, complexities, wide
  • 23. An Emerging Data Architecture [diagram]. Data sources: traditional sources (RDBMS, OLTP, OLAP); new sources (web logs, email, sensor data, social media); mobile data; OLTP, POS systems. Data systems: traditional repositories (RDBMS, EDW, MPP) and the Hortonworks Data Platform. Applications: business analytics, custom applications, enterprise applications. Tooling: dev & data tools (build & test); operational tools (manage & monitor).
  • 24. Interoperating With Your Tools [diagram]. Data sources: mobile data; OLTP, POS systems; traditional sources (RDBMS, OLTP, OLAP); new sources (web logs, email, sensor data, social media). Data systems: Hadoop. Tooling: dev & data tools; operational tools. Applications. Labels shown include Viewpoint and Microsoft.
  • 25. TERADATA UNIFIED DATA ARCHITECTURE [diagram, identical to slide 6]
  • 26. By the year 2015, we believe half the world’s data will be processed by Apache Hadoop. Key Hadoop Features for the EDW: • Storage/Processing • Metadata
  • 27. By the year 2015, we believe half the world’s data will be processed by Apache Hadoop. Key Hadoop Features for the EDW: • Storage/Processing • Metadata • FAMILIARITY
  • 28. Organizations Face Several Obstacles with Big Data (Source: Big Analytics 2012 Survey, Teradata): • Difficulty managing multiple systems and new types of data • Hard to find the right skills; lack of supportability for new systems and “data scientists” • Difficulty deploying and integrating new systems • Difficulty providing access to fast insights on big data
  • 29. Topics • Trends in enterprise data architectures • The value of an integrated data warehouse • The value of Hadoop • Bringing it all together and next steps
  • 30. Confidential and proprietary. Copyright © 2013 Teradata Corporation. Teradata Unified Data Architecture: • Hadoop: collect ALL interaction data • Teradata Aster: discover customer behavioral patterns • Teradata: operationalize insights. The right technology on the right analytical problems, using best-of-breed technologies.
  • 31. Improved Customer Service and Retention [data-flow diagram]. Data sources: multi-structured raw data (call center voice records, check images, social feeds, clickstream data). Capture, store, and refine layer: Hadoop captures, stores, and transforms social data, images, and call records. Traditional data flow: ETL tools into the integrated DW. Discovery platform: path, pattern, and graph analysis. Flow labels: call data, dimensional data, analytic results, sentiment scores. Outcome: analysis plus marketing automation (customer campaign).
  • 32. Teradata Workload-Specific Platforms: • Data Mart Appliance (670): up to 12TB; test/development or smaller data marts • Extreme Data Appliance (1650): up to 186PB; analytical archive, deep-dive analytics • Data Warehouse Appliance (2700): up to 1.6PB; strategic intelligence, decision support system, fast scan • Active Enterprise Data Warehouse (6700): up to 61PB; strategic and operational intelligence, real-time update, active workloads • Appliance for Hadoop: up to 10PB; appliance for storing, capturing, and refining data (Hortonworks HDP 1.1) • Aster Big Analytics Appliance: up to 5PB; discovery platform for big data analytics with embedded SQL-MapReduce for new data types and sources • SAS High Performance Analytics (700): up to 52TB; dedicated appliance for SAS high-performance analytic model development
  • 33. Teradata Unified Data Architecture: • Hadoop: collect ALL interaction data • Teradata Aster: discover customer behavioral patterns • Teradata: operationalize insights. The right technology on the right analytical problems, using best-of-breed technologies. Connectors: SQL-H (shown twice), Aster-Teradata Connector, Aster Connector for Hadoop, Teradata Connector for Hadoop.
  • 34. Teradata SQL-H™: A Business User’s Bridge to Access Hadoop Data. Teradata SQL-H gives business users a better way to access data stored in Hadoop: • Trusted: use existing tools and skills and enable self-service BI with granular security • Allows standard ANSI SQL access to Hadoop data • Fast: queries run on Teradata; data is accessed from Hadoop • Efficient: intelligent data access leveraging HCatalog. [Diagram: the Hadoop layer (HDFS, Pig, Hive, Hadoop MapReduce, HCatalog) supplies data, with data filtering, to SQL-H on Teradata.]
  • 35. The App Store of Big Data. Aster Discovery Portfolio: accelerate time to insights (some of the 80+ out-of-the-box analytical apps): • Path analysis: discover patterns in rows of sequential data • Text analysis: derive patterns and extract features in textual data • Statistical analysis: high-performance processing of common statistical calculations • Segmentation: discover natural groupings of data points • Marketing analytics: analyze customer interactions to optimize marketing decisions • Data transformation: transform data for more advanced analysis • Graph analysis: graph analytics processing and visualization • SQL-MapReduce visualization: graphing and visualization tools linked to key functions of the MapReduce analytics library
  • 36. Big Data Analytics & Discovery Example Customers: Teradata Aster Big Analytics Appliance; XL Axiata
  • 37. Discovering Deep Insights in Retail: Transforming Web Walks into DNA Sequences. Situation: a large retailer with 700M visits per year; 2M customers per day look at 1M products online. Problem: increase the ability of web content owners to self-serve insights. Solution: treat web walks like DNA sequences of simple patterns. Impact: • Data: loaded logs into Hortonworks • Loaded 2 months of raw data in 1 hour, vs. 1 day on the old system • Can load a day’s log data in 60 seconds • Sessionize: creates a sequence for each visit, e.g., boils 20 customer clicks down to 1 line: <Home - Search - Look at Product - Add to Basket - Pay - Exit> • Analyze: business analysts can now do path analysis • Act: segmentation by behavior can increase conversion rates by 5-10%; web design changes can drive another 10-20% more visitors into the sales funnel
  • 38. Example: Online Checkout Flow Analysis • Customers who have reached the checkout process are expected to follow an “ideal path”: deliveryslots > deliveryinformation > coupons > substitutions > paymentinfo > orderconfirmation • Determine how and when (and, ultimately, why) customers deviate from this path. • Discover obstacles preventing purchase and optimize visitor flow through the website. • The Aster SQL-MapReduce framework enables a variety of path visualizations.
  • 39. Teradata Portfolio for Hadoop: “Taking Hadoop from Silicon Valley to Main Street”. Most trusted and flexible Hadoop platforms for your next-generation Unified Data Architecture™: 1. Teradata Aster Big Analytics Appliance 2. Teradata Appliance for Hadoop 3. Teradata Commodity Offering for Hadoop (Dell) 4. Teradata Software-only for Hadoop (Hortonworks Data Platform). Complete consulting and training capability: • Big Analytics Services across the UDA • Data Integration Optimization: ETL, ELT across the UDA • Hadoop deployment and mentoring • Teradata delivering Hortonworks training • Hadoop Managed Services: operations and administration. Customer support for Hadoop: • World-class Teradata customer support, backed by Hortonworks. (Slide label: What We Announced Today)
  • 40. Teradata Appliance for Hadoop: Value-Added Software Bringing Hadoop to the Enterprise • Access: SQL-H™ • Management: Viewpoint, TVI • Administration: Hadoop Builder, intelligent start/stop, DataNode swap, deferred drive replace • High availability: NameNode HA, master machine failover • Refining, metadata, entity resolution • Security and data access: HCatalog, Kerberos
  • 41. Teradata Confidential. Complete Consulting and Training Capability: post-sale services by area of focus • Teradata Analytic Architecture Services: services to scope, design, build, operate, and maintain an optimal UDA approach for Teradata, Aster, and Hadoop • Teradata DI Optimization: assess structured/non-structured data, discuss data loading techniques, determine the best platform, optimize load scripts/processes • Teradata Big Analytics: assess data value/cost of capture, identify sources of “exhaust” data, create a conceptual architecture, refine and enrich the data, implement initial analytics in Aster or the best-fit tool • Teradata Workshop for Hadoop: introduction workshop (across all of the UDA) • Teradata Data Staging for Hadoop: load data into a landing area; set up a data exploration/refining area; scope architecture and analytics; set up a Hadoop repository; load sample data • Teradata Platform for Hadoop: installation guidance and mentoring for the Hadoop platform, D-I-Y after installation • Teradata Managed Services for Hadoop: operations, management, administration, backup, security, process control for Hadoop • Teradata Training Courses for Hadoop: two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop
  • 42. When to Use Which? The best approach by workload and data type: processing as a function of schema requirements and stage of the data pipeline. Pipeline stages, left to right: low-cost storage and fast loading; data pre-processing, refining, cleansing; “simple math at scale” (score, filter, sort, avg., count...); joins, unions, aggregates; analytics (iterative and data mining); reporting. • Stable schema: Teradata/Hadoop, Teradata, Teradata, Teradata, Teradata, Teradata • Evolving schema (Aster, SQL + MapReduce analytics): Hadoop, Aster/Hadoop, Aster/Hadoop, Aster, Aster, Aster • Format, no schema (Aster, MapReduce analytics): Hadoop, Hadoop, Hadoop, Aster, Aster, Aster. Example workloads listed on the slide: financial analysis, ad-hoc/OLAP; enterprise-wide BI and reporting; spatial/temporal; active execution; interactive data discovery; web clickstream, set-top box analysis; CDRs, sensor logs, JSON; social feeds, text, image processing; audio/video storage and refining; storage and batch transformations.
  • 43. When to Use Which? (repeat of the slide 42 chart, without the example-workload labels)
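Slide 10 defines atomicity as "all of the pieces are committed or none are committed." As a minimal sketch, assuming Python's built-in `sqlite3` module as a single-node stand-in for an MPP RDBMS (it says nothing about Teradata internals), the hypothetical `transfer` function below debits one account and credits another inside one transaction. When the debit would overdraw the account, the whole transaction rolls back and no partial update survives.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    with conn:  # commits the inserts as one transaction
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
        )
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes, so either both updates land or neither does.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort mid-transaction: the debit above must not survive.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

On a fresh `make_db()`, transferring 30 from alice to bob commits; a following transfer of 500 fails and leaves the balances at 70 and 30.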
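Slide 12 mentions multi-temperature storage that distributes data across in-memory, solid-state, fast hard drive, and fat hard drive tiers based on access patterns. A greedy sketch of that idea follows. The tier capacities (counted in blocks) and the plain access-count ranking are made-up assumptions, not how any particular product decides: the hottest blocks go to the fastest tier until it is full, and the rest spill downward.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tiers, fastest first: (tier name, capacity in blocks).
# None means "unbounded": the last tier takes whatever is left.
TIERS: List[Tuple[str, Optional[int]]] = [
    ("in-memory", 1),
    ("solid-state drive", 2),
    ("fast hard drive", 2),
    ("fat hard drive", None),
]


def assign_tiers(access_counts: Dict[str, int]) -> Dict[str, str]:
    """Greedily place the most frequently accessed blocks on the fastest tiers."""
    # Hottest first; ties broken by block name so the result is deterministic.
    hottest_first = sorted(access_counts, key=lambda b: (-access_counts[b], b))
    placement: Dict[str, str] = {}
    start = 0
    for tier, capacity in TIERS:
        end = len(hottest_first) if capacity is None else start + capacity
        for block in hottest_first[start:end]:
            placement[block] = tier
        start = end
    return placement
```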
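Slide 13 says a cost-based optimizer decides how to run a query from statistics rather than from user hints. The toy model below illustrates that idea for one decision, full scan versus index lookup. The cost constants, the `TableStats` fields, and the one-random-read-per-matching-row index model are illustrative assumptions, not any vendor's actual cost formulas.

```python
from dataclasses import dataclass

SEQ_PAGE_COST = 1.0     # assumed cost of reading one page sequentially
RANDOM_PAGE_COST = 4.0  # assumed cost of one random page read through an index


@dataclass
class TableStats:
    rows: int            # row count, as collected by statistics
    rows_per_page: int   # rows stored per page/block


@dataclass
class Plan:
    name: str
    cost: float


def choose_access_path(stats: TableStats, selectivity: float) -> Plan:
    """Return the cheaper plan for a filter matching `selectivity` of the rows."""
    pages = -(-stats.rows // stats.rows_per_page)  # ceiling division
    full_scan = Plan("full scan", pages * SEQ_PAGE_COST)
    # Pessimistic index model: every matching row costs one random page read.
    index_lookup = Plan("index lookup", stats.rows * selectivity * RANDOM_PAGE_COST)
    return min((full_scan, index_lookup), key=lambda p: p.cost)
```

With one million rows at 100 rows per page, a filter matching 0.01% of the rows favors the index, while one matching half the table favors the scan. Where the crossover falls depends entirely on the assumed constants.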
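Slide 19 describes MapReduce as splitting a task across processors near the data and assembling the results. A single-process sketch of the classic word-count job shows the three phases: map, shuffle by key, and reduce. A real Hadoop cluster runs mappers in parallel on the nodes that hold each HDFS block, which this illustration does not attempt.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts for one word."""
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over input splits in a single process."""
    # Each split stands in for one HDFS block handled by its own mapper.
    mapped = chain.from_iterable(
        map_phase(line) for split in splits for line in split
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```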
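Slide 22 contrasts schema required on write (RDBMS) with schema required on read (Hadoop). The sketch below uses a hypothetical two-column schema and JSON-lines input to show the practical difference. The write-time table rejects records that do not fit before storing anything. The read-time store keeps every raw line and applies the schema only when a query reads the data.

```python
import json
from typing import Callable, Iterable, Iterator, List, Optional

# Hypothetical schema: column name -> type used to coerce the raw value.
SCHEMA = {"user_id": int, "amount": float}


def parse_with_schema(line: str) -> Optional[dict]:
    """Parse one JSON line and coerce it to SCHEMA; None if it does not fit."""
    try:
        record = json.loads(line)
        return {col: typ(record[col]) for col, typ in SCHEMA.items()}
    except (KeyError, TypeError, ValueError):  # JSONDecodeError is a ValueError
        return None


class SchemaOnWriteTable:
    """RDBMS style: every record is validated and typed before it is stored."""

    def __init__(self) -> None:
        self.rows: List[dict] = []
        self.rejected = 0

    def load(self, lines: Iterable[str]) -> None:
        for line in lines:
            row = parse_with_schema(line)
            if row is None:
                self.rejected += 1  # bad data never reaches the table
            else:
                self.rows.append(row)


class SchemaOnReadStore:
    """Hadoop style: raw lines are kept as-is; a schema is applied per read."""

    def __init__(self) -> None:
        self.raw: List[str] = []

    def load(self, lines: Iterable[str]) -> None:
        self.raw.extend(lines)  # fast write: no parsing or validation

    def read(self, schema: Callable[[str], Optional[dict]]) -> Iterator[dict]:
        for line in self.raw:
            row = schema(line)
            if row is not None:
                yield row
```

Because the raw lines are retained, the read-side store can later be queried with a different schema function. This is one way to picture slide 19's "apply schema on analysis."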
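Slide 37's "Sessionize" step boils a visitor's clicks down to one line per visit. The source does not say how visits are delimited. The sketch below assumes a common convention, a 30-minute inactivity gap, and a hypothetical `Click` record of visitor ID, timestamp, and page name.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed inactivity gap that ends a visit


@dataclass(frozen=True)
class Click:
    visitor: str  # hypothetical visitor/cookie ID
    ts: int       # event time in seconds
    page: str


def sessionize(clicks: List[Click]) -> List[Tuple[str, str]]:
    """Collapse each visit's clicks into one (visitor, path) line."""
    sessions: List[Tuple[str, str]] = []
    ordered = sorted(clicks, key=lambda c: (c.visitor, c.ts))
    for visitor, visits in groupby(ordered, key=lambda c: c.visitor):
        path: List[str] = []
        last_ts: Optional[int] = None
        for click in visits:
            # A long enough pause closes the current visit and starts a new one.
            if last_ts is not None and click.ts - last_ts > SESSION_TIMEOUT_SECONDS:
                sessions.append((visitor, " - ".join(path)))
                path = []
            path.append(click.page)
            last_ts = click.ts
        sessions.append((visitor, " - ".join(path)))
    return sessions
```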
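For the checkout example on slide 38, one simple way to find where customers leave the "ideal path" is to compare each checkout session against it step by step and record the first mismatch. This is a hypothetical illustration in plain Python, not the Aster SQL-MapReduce path functions the slide refers to. `<exit>` marks a session that ended early.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple

# The "ideal path" listed on slide 38.
IDEAL_PATH: Tuple[str, ...] = (
    "deliveryslots", "deliveryinformation", "coupons",
    "substitutions", "paymentinfo", "orderconfirmation",
)


def first_deviation(
    session: Sequence[str], ideal: Sequence[str] = IDEAL_PATH
) -> Optional[Tuple[int, str, str]]:
    """Return (step, expected page, actual page) at the first departure
    from the ideal path, or None if the session follows it exactly."""
    for step, expected in enumerate(ideal):
        actual = session[step] if step < len(session) else "<exit>"
        if actual != expected:
            return step, expected, actual
    if len(session) > len(ideal):
        # Extra pages after order confirmation also count as a deviation.
        return len(ideal), "<end>", session[len(ideal)]
    return None


def deviation_report(sessions: Iterable[Sequence[str]]) -> Counter:
    """Count (expected, actual) pairs across sessions that left the path."""
    report: Counter = Counter()
    for session in sessions:
        deviation = first_deviation(session)
        if deviation is not None:
            _, expected, actual = deviation
            report[(expected, actual)] += 1
    return report
```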