Hortonworks and HP Vertica Webinar
1. Using HP Vertica and Apache Hadoop
…for customer analytics
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
We do Hadoop.
2. Your speakers…
John Kreisa, VP Strategic Alliance Marketing
Hortonworks
Chris Selland, VP Business Development
HP Software, Big Data Group
3. Poll
Where are you in your Hadoop journey?
• Researching our options
• Currently evaluating some software
• Deep in a trial
• What’s Hadoop?
4. Big Data Market Trends & Projections
Big Data Explosion
• % by which orgs leveraging modern info management systems outperform peers by 2015
• 85% from new data types
• Hadoop-enabled DBMSs
• 50x data growth 2010 to 2020
• 1 Zettabyte (ZB) = 1 billion TBs
• 15x growth rate of machine-generated data by 2020
The US has 1/3 of the world’s data
Big Data is 1 of 5 US GDP Game Changers: $325 billion incremental annual GDP from big data analytics in retail and manufacturing by 2020
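The unit relationship and the growth multiple on this slide can be checked with a little arithmetic. The sketch below assumes decimal (SI) units and steady compound growth, neither of which the slide states; the function names are illustrative only.

```python
# Assumption: decimal (SI) units, so 1 ZB = 10**21 bytes and 1 TB = 10**12 bytes.
BYTES_PER_ZB = 10**21
BYTES_PER_TB = 10**12


def zb_to_tb(zettabytes: int) -> int:
    """Convert zettabytes to terabytes using decimal units."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_TB


def annual_growth_factor(total_multiple: float, years: int) -> float:
    """Yearly factor that compounds to total_multiple over the given years.

    Assumes steady compound growth, which is a simplification.
    """
    return total_multiple ** (1 / years)


if __name__ == "__main__":
    # Terabytes in one zettabyte (the "1 billion TBs" on the slide).
    print(zb_to_tb(1))
    # Implied yearly factor for 50x growth between 2010 and 2020.
    print(round(annual_growth_factor(50, 10), 3))
```

Under these assumptions, 50x growth over ten years corresponds to data volume growing by a little under half each year.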
5. Over 50% of Internet connections are things:
2011: 15+ billion permanent, 50+ billion intermittent
2020: 30+ billion permanent, >200 billion intermittent
• Cameras and microphones widely deployed
• New routes to market via intelligent objects
• Content and services via connected products
• Everything has a URL
• Remote sensing of objects and environment
• Augmented reality
• Situational decision support
• Building and infrastructure management
Source: Gartner Keynote at Hadoop Summit 2013
6. A Data Architecture Under Pressure From New Data
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
New data: 2.8 ZB in 2012; 40 ZB by 2020; 85% from new data types; 15x machine data by 2020 (Source: IDC)
Data sources shown: OLTP, ERP, CRM systems; unstructured documents, emails; server logs; clickstream; sentiment, web data; sensor, machine data; geolocation
7. Hadoop Within An Emerging Modern Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONS TOOLS: Provision, Manage & Monitor
DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP); Governance & Integration, Security, Operations, Data Access, Data Management
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
8. Hadoop: Typically Used For New Analytic Applications
SCALE SCOPE
New Analytic Apps
New types of data
LOB-driven
9. Hadoop Value: New types of data
• Clickstream: Capture and analyze website visitors’ data trails and optimize your website
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines
• Server Logs: Research logs to diagnose process failures and prevent security breaches
• Sentiment: Understand how your customers feel about your brand and products, right now
• Geographic: Analyze location-based data to manage operations where they occur
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents
10. New Analytic Applications For New Types Of Data
Financial Services
• New Account Risk Screens
• Fraud Prevention
• Trading Risk
• Maximize Deposit Spread
• Insurance Underwriting
• Accelerate Loan Processing
Retail
• 360° View of the Customer
• Analyze Brand Sentiment
• Localized, Personalized Promotions
• Website Optimization
• Optimal Store Layout
Telecom
• Call Detail Records (CDRs)
• Infrastructure Investment
• Next Product to Buy (NPTB)
• Real-time Bandwidth Allocation
• New Product Development
Manufacturing
• Supplier Consolidation
• Supply Chain and Logistics
• Assembly Line Quality Assurance
• Proactive Maintenance
• Crowdsourced Quality Assurance
Healthcare
• Genomic data for medical trials
• Monitor patient vitals
• Reduce re-admittance rates
• Store medical research data
• Recruit cohorts for pharmaceutical trials
Utilities, Oil & Gas
• Smart meter stream analysis
• Slow oil well decline curves
• Optimize lease bidding
• Compliance reporting
• Proactive equipment repair
• Seismic image processing
Public Sector
• Analyze public sentiment
• Protect critical networks
• Prevent fraud and waste
• Crowdsource reporting for repairs to infrastructure
• Fulfill open records requests
11. 360° Customer View for Home Supply Retailer
Problem
Lack of a unified customer record across all channels
• Global distribution online, in home and across 2000+ stores
• No “golden record” for analytics on customer buying behavior across all channels
• Data repositories on website traffic, POS transactions and in-home services existed in isolation from each other
• Limited ability for targeted marketing to specific segments
• Data storage costs increasing
Solution
HDP delivers targeted marketing & data storage savings
• Golden record enables targeted, customized marketing
• Data warehouse offload saved millions in recurring expense
• Customer team continues to find unexpected, unplanned uses for their 360 degree view of customer buying behavior
• New use case: price optimization versus competitors → several millions in top-line revenue growth
Creating Opportunity
Data: Clickstream, Unstructured, Structured
Retail: Major home improvement retailer
• >$74B in revenue
• >300K employees
• >2,200 stores
12. Hadoop Incrementally Delivers A ‘Data Lake’
A Modern Data Architecture/Data Lake
SCALE SCOPE
RDBMS
MPP
EDW
New Analytic Apps
New types of data
LOB-driven
Governance & Integration, Security, Operations, Data Access, Data Management
Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
13. Hadoop: An Integrated Part Of The Modern Data Architecture
DEPTH: Hortonworks engages in deep engineered relationships with the leaders in the data center, applications and operations
BREADTH: Hundreds of partners work with us to certify their applications to work with Hadoop so they can extend big data to their users
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Provision, Manage & Monitor
DATA SYSTEM: REPOSITORIES; HDP 2.1 (Governance & Integration, Security, Operations, Data Access, Data Management)
INFRASTRUCTURE: On Premise or in the Cloud
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
14. Customer Analytics with HP Vertica + Hortonworks
Chris Selland, VP Business Development
HP Software, Big Data Group
15. Completing Analytical Vision
Data Types: Structured, Semi-Structured, Unstructured
CRM, ERP, Data Warehouse, Web, Social, Log Files, Machine Data, Images, Audio, Video
Traditional Enterprise Data; Big Data; Dark Data
Accuracy and Insight
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
16. Customer Analytics in the Big Data Era
Structured Data
Select Customers with < 2 Months Remaining on Contract with 5+ dropped calls per week and lifetime value > $500
From a database, get me all matches from the CRM and Call Detail Records that match the query
Unstructured Data
Customer expressed negative sentiment through social media, web log and/or support within the last 3 months
From unstructured sources, get me all matches for weblogs, calls, chat, email that were negative for the structured results
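The two-step query on this slide can be sketched in plain Python over in-memory records. This is only an illustration of the logic, not how Vertica or Hadoop would execute it; the record types, field names, and sample rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical structured record combining CRM and call-detail data.
    customer_id: int
    months_remaining: int
    dropped_calls_per_week: int
    lifetime_value: float


@dataclass
class Interaction:
    # Hypothetical scored item from weblogs, calls, chat, or email.
    customer_id: int
    channel: str
    sentiment: str
    months_ago: int


def at_risk_customers(customers):
    """Step 1 (structured): contract nearly over, many dropped calls, high value."""
    return [
        c for c in customers
        if c.months_remaining < 2
        and c.dropped_calls_per_week >= 5
        and c.lifetime_value > 500
    ]


def negative_recent(candidates, interactions, window_months=3):
    """Step 2 (unstructured): keep candidates with recent negative sentiment."""
    candidate_ids = {c.customer_id for c in candidates}
    flagged = {
        i.customer_id for i in interactions
        if i.customer_id in candidate_ids
        and i.sentiment == "negative"
        and i.months_ago <= window_months
    }
    return sorted(flagged)


# Made-up sample rows for illustration.
CUSTOMERS = [
    Customer(1, 1, 6, 900.0),
    Customer(2, 1, 7, 300.0),   # lifetime value too low
    Customer(3, 8, 9, 1200.0),  # contract not close to ending
    Customer(4, 0, 5, 650.0),
]
INTERACTIONS = [
    Interaction(1, "chat", "negative", 1),
    Interaction(3, "email", "negative", 1),
    Interaction(4, "weblog", "negative", 5),  # outside the 3-month window
]

if __name__ == "__main__":
    print(negative_recent(at_risk_customers(CUSTOMERS), INTERACTIONS))
```

The structured filter narrows the population first, so the sentiment match only has to consider interactions belonging to those candidates.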
17. Introducing HP Vertica Dragline
Faster answers from Big Data at a fraction of the cost of traditional data warehouses
• Store all your data in any format cost-effectively across Vertica + Hadoop
• Explore all your data directly in Hadoop without moving or changing it
• Serve all of your data consumers without compromise from individualized queries to large complex reports
HP Vertica
18. HP Vertica Dragline:
The Richest, Most Open SQL on Hadoop
Challenge: Extracting data from Hadoop requires complex and brittle ETL processes
Solution: Hadoop Navigation and Analytics
Benefits:
• Navigate Hadoop data using its native catalog
• Quickly & easily load native data types from Hadoop to Vertica
• Avoid creating and maintaining time-consuming schemas
• Use the full power of HP Vertica SQL and Analytics
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DEV & DATA TOOLS: Build & Test
OPERATIONAL TOOLS: Provision, Manage & Monitor
DATA SYSTEM: REPOSITORIES; HDP 2.1 (Governance & Integration, Security, Operations, Data Access, Data Management)
INFRASTRUCTURE: On Premise or in the Cloud
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
19. Flexible Vertica Hadoop Connectivity
Leverage existing tools in shared Vertica and Hadoop storage environment
HDP 2.1 (Hortonworks Data Platform)
Vertica connectivity: HCatalog Connector; Hadoop Connector; HDFS Connector; External Tables and Copy; Storage Tiering
Interfaces: webHCAT, webHDFS, ANSI SQL
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
DATA ACCESS (BATCH, INTERACTIVE & REAL-TIME): Batch (MapReduce); Script (Pig); SQL (Hive, HCatalog); NoSQL (HBase, Accumulo); Stream (Storm); Search (Solr); In-Memory (Spark); Tez
YARN: Data Operating System
DATA MANAGEMENT: HDFS (Hadoop Distributed File System), nodes 1 to N
SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
20. Data Tiering and Cost Optimization
Tier-off older data
Tiers: Hot, Cool, Cold
Data classes: Interactive Data (frequently queried; Vertica data cache), Batch Data, Archive Data, Dark Data
Value Discovery
Location and format by stage:
• Serve: Convert data to Vertica storage format
• Explore: Any format
• Store: Any format
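One way to picture tiering off older data is a rule that assigns each dataset to a tier by how recently it was queried. The sketch below is an assumption-laden illustration: the slide names hot, cool, and cold tiers but gives no cut-offs, so both thresholds and the function name are hypothetical.

```python
from datetime import date

# Hypothetical thresholds; the slide names the tiers but gives no cut-offs.
HOT_MAX_AGE_DAYS = 90
COOL_MAX_AGE_DAYS = 365


def assign_tier(last_queried: date, today: date) -> str:
    """Return 'hot', 'cool', or 'cold' based on days since the last query."""
    age_days = (today - last_queried).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"    # frequently queried, interactive data
    if age_days <= COOL_MAX_AGE_DAYS:
        return "cool"   # batch data
    return "cold"       # archive data


if __name__ == "__main__":
    today = date(2014, 6, 30)
    for last in (date(2014, 6, 1), date(2014, 1, 1), date(2012, 1, 1)):
        print(last.isoformat(), assign_tier(last, today))
```

A real deployment would likely weigh query frequency and storage cost as well as age; recency alone is used here to keep the rule easy to read.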
21. More than 140 Characters
JSON Record-Unstructured Data
{"filter_level":"medium","contributors":null,"text":"Listening to Meg Whitman talk about the New
Style of IT at #HPDiscover","geo":null,"retweeted":false,"in_reply_to_screen_name":null,"truncated":false,
"lang":"en","entities":{"symbols":[],"urls":[],"hashtags":[{"text":"nope","indices":[51,56]}], "user_mentions":
[]},"in_reply_to_status_id_str":null,"id":346104750565097474,"source":"
<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
"in_reply_to_user_id_str":null,"favorited":false,"in_reply_to_status_id":null,"retweet_count":
0,"created_at":"Tue Jun 11 03:19:37 +0000 2013","in_reply_to_user_id":null, "favorite_count":0,"id_str":
"346104750565097474","place":null,"user":
{"location":"","default_profile":false,"profile_background_tile":true,"statuses_count":
2354,"lang":"en","profile_link_color":"FF0000","profile_banner_url":"https://pbs.twimg.com/profile_banners/
271588683/1370571522","id":271588683,"following":null,"protected":false,"favourites_count":
121,"profile_text_color":"3D1957","description":"Dance It is a part of me A part of who I am It has entered my
life Taken over my body It is in my walk In my movements In my thoughts I have become a
DANCER","verified":false,"contributors_enabled": false,"profile_sidebar_border_color":"65B0DA","name":"ashley
tousignant", "profile_background_color":"642D8B","created_at": "Thu Mar 24 20:25:59 +0000
2011","default_profile_image":false,"followers_count":434,"profile_image_url_https":"https://si0.twimg.com/
profile_images/3765534455/
eee814d484d70b8eb9ca5db08a122cbb_normal.jpeg","geo_enabled":true,"profile_background_image_url":"http://
a0.twimg.com/images/themes/theme10/bg.gif","profile_background_image_url_https":"https://si0.twimg.com/images/
themes/theme10/
bg.gif","follow_request_sent":null,"url":null,"utc_offset":null,"time_zone":null,"notifications":null,"profile_u
se_background_image":true,"friends_count":
844,"profile_sidebar_fill_color":"7AC3EE","screen_name":"01ashleymt","id_str":"271588683","profile_image_url":"h
ttp://a0.twimg.com/profile_images/3765534455/eee814d484d70b8eb9ca5db08a122cbb_normal.jpeg","listed_count":
0,"is_translator":false},"coordinates":null}
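A record like the one on this slide carries far more than the message text, and most of it can be read with a standard JSON parser. The sketch below uses Python's `json` module on a trimmed copy of the record; the selection of fields and the `summarize` helper are illustrative choices, not part of the deck.

```python
import json

# Trimmed copy of the record on this slide; only a few fields are kept.
RAW = '''{"text": "Listening to Meg Whitman talk about the New Style of IT at #HPDiscover",
 "lang": "en", "retweet_count": 0,
 "created_at": "Tue Jun 11 03:19:37 +0000 2013",
 "user": {"screen_name": "01ashleymt", "followers_count": 434}}'''


def summarize(raw: str) -> dict:
    """Parse one JSON record and pull out a handful of analysis fields."""
    record = json.loads(raw)
    user = record.get("user", {})
    return {
        "text": record.get("text"),
        "lang": record.get("lang"),
        "created_at": record.get("created_at"),
        "screen_name": user.get("screen_name"),
        "followers_count": user.get("followers_count"),
    }


if __name__ == "__main__":
    for field, value in summarize(RAW).items():
        print(f"{field}: {value}")
```

Using `.get` keeps the helper tolerant of records that omit a field, which is common in semi-structured feeds.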
22. HP Vertica Flex Zone
Avoid creating and maintaining time-consuming schemas
Faster SQL querying
Auto-schematization
Flexible parsers
on semi-structured data
semi-structured data loading
for JSON and delimited data
One-step schema
for blazing-fast performance
Load, manage, and explore semi-structured data
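The idea of deriving a schema from the data, instead of defining it up front, can be sketched generically. The code below is not HP Vertica's implementation or API; it is a plain-Python illustration that flattens nested JSON keys into dotted column names and records the value types seen, with all names hypothetical.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted keys, e.g. {'user': {'id': 1}} -> {'user.id': 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def infer_columns(json_lines):
    """Map each discovered column name to the sorted Python type names observed for it."""
    columns = {}
    for line in json_lines:
        for name, value in flatten(json.loads(line)).items():
            columns.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in columns.items()}


if __name__ == "__main__":
    sample = [
        '{"id": 1, "lang": "en", "user": {"followers_count": 434}}',
        '{"id": 2, "geo": null, "user": {"verified": false}}',
    ]
    for column, types in infer_columns(sample).items():
        print(column, types)
```

Columns appear as records introduce them, so new fields in later records extend the inferred schema without any upfront definition.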
23. Major Computer Products Manufacturer
Analyzing Billions of Clicks Online
Challenge
• Millions of website visitors generate billions of clicks per month
• Must store 5 years' worth of data to get full value of year-over-year clickstream analysis
• Legacy database had sluggish performance: queries took 48 hours after each day’s transactions
• Extremely complex website: many pages are generated dynamically, creating complex clickstream trails
HP Vertica Solution
• Queries run in hours or even minutes; 48x to 100x faster
• Industry-standard SQL accelerated acceptance and proficiency
• Speed of HP Vertica allows iterative and recursive analysis for deeper dives
• Functionality tailored to individual interactions based on nuanced understanding of user behavior at an individual level
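The quoted speedup range can be related to the 48-hour baseline with simple division. The sketch below assumes the speedup factors apply directly to that 48-hour query time, which the slide implies but does not state; the function name is illustrative.

```python
def runtime_after_speedup(baseline_hours: float, speedup: float) -> float:
    """Runtime in hours once a query runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


BASELINE_HOURS = 48  # legacy query time quoted on the slide

if __name__ == "__main__":
    for factor in (48, 100):
        hours = runtime_after_speedup(BASELINE_HOURS, factor)
        print(f"{factor}x faster: {hours:.2f} h ({hours * 60:.1f} min)")
```

Under that assumption, a 48x speedup brings the query down to about one hour and a 100x speedup to roughly half an hour, in line with "hours or even minutes".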
24. Next steps…
More about HP Vertica & Hortonworks
http://hortonworks.com/partner/HP/
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
Don’t miss our next webinar! HP Converged Systems and Hortonworks
Planning for the Impacts of Big Data in the Data Center
http://info.hortonworks.com/hpconvergedandhortonworks.html
25. End