What is Big Data? The 5 V's and Real-World Use Cases
2. What is Big Data?
A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.
3. Big Data’s impact can be expressed by the Five V’s
[Diagram: Velocity, Variety, Volume + Visualization, Value]
4. Demo Context: E-Commerce Site Fed by Outsourced Ad Servers
Ads appear on a wide range of sites with various offers.
These servers generate a massive amount of data:
• Web logs and clickstream data from the e-commerce site
• Ad logs and clickstream data from the ad servers
• The resulting relational transactions on the site
Goal: Maximize the business value of traffic analysis
• Velocity demo: Pinpoint activity in real time and react
• Variety demo: Examine historical trends across sources
• Visualization demo: Enable ad-hoc data analysis for insights
5. Velocity Architecture
How can we identify when ad clicks result in site traffic?
A high-volume stream of log activity comes in:
• Web logs and ad server logs
Real-time stream analysis pinpoints activity as it happens.
A persistent query continuously joins structured and unstructured data.
Use cases: A/B testing, offer improvement, dynamic site behavior, and fraud detection
[Diagram: Web Servers, Ad Servers, Log Files]
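A persistent query keeps running and emits results as events arrive. The sketch below illustrates that idea in plain Python as a time-windowed join of ad clicks to later site hits on a shared visitor cookie. It is not the streaming engine used in the demo; the `Event` fields, the `"ad"`/`"web"` source labels, the 30-minute window, and the assumptions that both logs carry the same cookie and arrive in timestamp order are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    ts: int       # event time in seconds (assumed integer timestamps)
    source: str   # "ad" = ad-server click, "web" = site hit (hypothetical labels)
    cookie: str   # visitor id, assumed to appear in both log streams
    detail: str   # ad id or page URI


def ad_to_site_matches(
    events: Iterable[Event], window_s: int = 1800
) -> Iterator[Tuple[Event, Event]]:
    """Standing query: pair each site hit with earlier ad clicks from the
    same cookie that happened at most window_s seconds before it.

    Assumes the merged stream arrives in timestamp order.
    """
    recent_ads: Deque[Event] = deque()  # ad clicks still inside the window, oldest first
    for ev in events:
        # Expire ad clicks older than the window so state stays bounded.
        while recent_ads and ev.ts - recent_ads[0].ts > window_s:
            recent_ads.popleft()
        if ev.source == "ad":
            recent_ads.append(ev)
        elif ev.source == "web":
            for ad in recent_ads:
                if ad.cookie == ev.cookie:
                    yield ad, ev  # emit the match as soon as the site hit arrives


stream = [
    Event(0, "ad", "c-1", "ad-42"),
    Event(60, "web", "c-1", "/offers/shoes"),
    Event(90, "web", "c-2", "/home"),
    Event(4000, "web", "c-1", "/offers/shoes"),  # too late: the ad click has expired
]
for ad, hit in ad_to_site_matches(stream):
    print(f"{hit.cookie}: {ad.detail} -> {hit.detail} after {hit.ts - ad.ts}s")
# prints: c-1: ad-42 -> /offers/shoes after 60s
```

Keeping only the ad clicks that are still inside the window bounds memory, which is what lets a query like this run indefinitely. A production stream processor would also have to handle late or out-of-order events, which this sketch ignores.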
7. Variety Architecture
How can we do historical analysis on unstructured data?
Ad servers and web servers generate log files in different formats, which makes them hard to analyze together.
Map/Reduce processing lets us execute a query across the varied data formats stored in Hadoop.
Hive provides a traditional, SQL-like query interface to Map/Reduce.
Correlate and connect high-variety data for trend analysis.
[Diagram: Web Servers, Ad Servers, Log Files, M/R]
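To show why a common query layer helps with variety, here is a minimal Python sketch of schema on read: two differently formatted logs are parsed into one record shape at query time and then correlated per cookie. The web format follows the space-delimited layout of the Hive `logs` table; the ad-server CSV layout (`timestamp,cookie,ad_id,event`) and the sample lines are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    ts: str       # "YYYY-MM-DD HH:MM:SS"
    source: str   # "web" or "ad"
    cookie: str
    action: str


def parse_web(lines: Iterable[str]) -> List[Record]:
    # Space-delimited: date time action page_uri cookie (same layout as the Hive logs table).
    out = []
    for line in lines:
        parts = line.split(" ")
        if len(parts) == 5:  # skip malformed lines
            date1, time1, action, _page, cookie = parts
            out.append(Record(f"{date1} {time1}", "web", cookie, action.lower()))
    return out


def parse_ad(text: str) -> List[Record]:
    # Hypothetical ad-server CSV with a header row: timestamp,cookie,ad_id,event
    return [
        Record(row["timestamp"], "ad", row["cookie"], row["event"].lower())
        for row in csv.DictReader(io.StringIO(text))
    ]


def activity_by_cookie(records: Iterable[Record]) -> Dict[str, Counter]:
    # Once both formats share one schema, correlating them is a simple group-by.
    result: Dict[str, Counter] = {}
    for r in records:
        result.setdefault(r.cookie, Counter())[f"{r.source}:{r.action}"] += 1
    return result


web_lines = [
    "2013-05-01 10:00:09 view /offers/shoes c-1",
    "2013-05-01 10:01:00 click /offers/shoes c-1",
]
ad_csv = (
    "timestamp,cookie,ad_id,event\n"
    "2013-05-01 10:00:05,c-1,ad-42,Click\n"
    "2013-05-01 10:00:07,c-3,ad-7,impression\n"
)
activity = activity_by_cookie(parse_web(web_lines) + parse_ad(ad_csv))
print(dict(activity["c-1"]))  # {'web:view': 1, 'web:click': 1, 'ad:click': 1}
```

In the architecture above, this normalization happens inside the Map/Reduce jobs that Hive generates and runs across the cluster; here it runs in a single process.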
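8. Hive HQL Queries
Access Azure blob storage through a Hive external table (used as a “view” over the files) and aggregate session data
-- External table over the space-delimited log files in Azure blob storage;
-- Hive reads the files in place at query time (schema on read).
CREATE EXTERNAL TABLE logs (
date1 STRING,
time1 STRING,
action STRING,
page_uri STRING,
cookie STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE
LOCATION 'asv://logs/logs/';
-- One summary row per (cookie, page_uri) pair, i.e. per visitor per page.
CREATE TABLE log_summary AS
SELECT l.cookie
-- Demo-only pseudo geo bucket: the cookie with dashes removed, modulo 36.
,MAX(regexp_replace(l.cookie, '[-]', '') % 36) AS geo_hash
-- Note: time1 and date1 are maximized independently, so they may come from different rows.
,MAX(l.time1) AS time1
,l.page_uri
-- Latest click and earliest view as 'date time' strings.
,MAX(CASE LOWER(l.action) WHEN 'click' THEN concat(l.date1, ' ', l.time1) ELSE NULL END) AS click_time
,MIN(CASE LOWER(l.action) WHEN 'view' THEN concat(l.date1, ' ', l.time1) ELSE NULL END) AS view_time
,MAX(l.date1) AS date1
FROM logs l
GROUP BY l.cookie, l.page_uri;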
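To make the semantics of `log_summary` concrete, the following Python sketch reproduces the same per-(cookie, page_uri) aggregation over in-memory lines. It is an illustration, not how Hive executes the query; it omits the demo-specific `geo_hash` column, and the sample lines are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Summary = Dict[str, Optional[str]]


def _max(current: Optional[str], value: str) -> str:
    # Like Hive's MAX: a NULL (None) never wins over a real value.
    return value if current is None else max(current, value)


def _min(current: Optional[str], value: str) -> str:
    return value if current is None else min(current, value)


def summarize(lines: Iterable[str]) -> Dict[Tuple[str, str], Summary]:
    """Mirror of log_summary, minus geo_hash: one entry per (cookie, page_uri)."""
    groups: Dict[Tuple[str, str], Summary] = {}
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 5:
            continue  # this sketch skips malformed lines; Hive would read missing fields as NULL
        date1, time1, action, page_uri, cookie = parts
        stamp = f"{date1} {time1}"
        row = groups.setdefault(
            (cookie, page_uri),
            {"time1": None, "click_time": None, "view_time": None, "date1": None},
        )
        row["time1"] = _max(row["time1"], time1)
        row["date1"] = _max(row["date1"], date1)
        if action.lower() == "click":
            row["click_time"] = _max(row["click_time"], stamp)  # latest click
        elif action.lower() == "view":
            row["view_time"] = _min(row["view_time"], stamp)  # earliest view
    return groups


sample = [
    "2013-05-01 10:00:01 view /offers/shoes c-1",
    "2013-05-01 10:00:09 Click /offers/shoes c-1",
    "2013-05-01 10:05:00 view /offers/shoes c-1",
]
print(summarize(sample)[("c-1", "/offers/shoes")])
```

Because the timestamps are zero-padded strings, plain string comparison orders them chronologically, which is also what the Hive query relies on when it applies MAX and MIN to the concatenated date and time.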
10. Hadoop Ecosystem Overview
Hadoop is an open-source framework for building large-scale, distributed, data-intensive applications.
• Hadoop is HDFS, the kernel & M/R
• MapReduce brings the code to the data
• An open set of tools exists to extend its functional uses and representations
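"Bringing the code to the data" means the scheduler prefers to run each map task on a node that already stores a replica of the task's input block, so the block is read from local disk instead of over the network. The sketch below shows a greedy version of that idea; the block and node names, the slot counts, and the fallback rule are hypothetical, and Hadoop's real schedulers are more elaborate (for example, they also consider rack locality).

```python
from typing import Dict, List, Tuple


def assign_map_tasks(
    block_replicas: Dict[str, List[str]], free_slots: Dict[str, int]
) -> Dict[str, Tuple[str, bool]]:
    """Greedy locality-aware assignment: block -> (node, ran_locally)."""
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    plan: Dict[str, Tuple[str, bool]] = {}
    for block, hosts in sorted(block_replicas.items()):
        # Prefer a node that holds a replica of this block and has a free slot.
        local = [h for h in hosts if slots.get(h, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with a free slot; the block is then read remotely.
            candidates = [n for n, s in sorted(slots.items()) if s > 0]
            if not candidates:
                raise RuntimeError(f"no free slot for {block}")
            node, is_local = candidates[0], False
        slots[node] -= 1
        plan[block] = (node, is_local)
    return plan


replicas = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1"]}
free = {"n1": 1, "n2": 1, "n3": 1}
for block, (node, is_local) in assign_map_tasks(replicas, free).items():
    print(block, "->", node, "(local)" if is_local else "(remote read)")
```

Here `b3` ends up on `n3` as a remote read because `n1`, its only replica holder, is already busy with `b1`.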
11. Map/Reduce Distributes Processing of Operations
The "Map" step
The mappers are responsible for reading the input data and emitting key/value pairs. The input file can be CSV, XML, or any format as long as it can be converted into k/v pairs.
The "Reduce" step
Each reducer executes a function on all values for a given key. The framework ensures that all values for the same key are sent to the same reducer.
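The Map and Reduce steps can be simulated in a single Python process to show the data flow: a mapper emits key/value pairs, a shuffle groups every value by key, and one reducer call sees all values for its key. The log layout matches the Hive `logs` table; the sample lines and the choice to count views and clicks per page are hypothetical, and nothing here is distributed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, str]]:
    """Map step: parse one log line and emit a (page_uri, action) pair."""
    fields = line.split(" ")
    if len(fields) != 5:  # skip malformed lines instead of failing the job
        return
    _date, _time, action, page_uri, _cookie = fields
    yield page_uri, action.lower()


def reducer(key: str, values: List[str]) -> Tuple[int, int]:
    """Reduce step: receives every value for one key; returns (views, clicks)."""
    return values.count("view"), values.count("click")


def run_job(
    lines: Iterable[str],
    map_fn: Callable[[str], Iterator[Tuple[str, str]]],
    reduce_fn: Callable[[str, List[str]], Tuple[int, int]],
) -> Dict[str, Tuple[int, int]]:
    # Shuffle: group all mapped values by key so each key reaches exactly one reducer call.
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


logs = [
    "2013-05-01 10:00:01 view /offers/shoes c-1",
    "2013-05-01 10:00:09 click /offers/shoes c-1",
    "2013-05-01 10:02:30 view /offers/shoes c-2",
    "2013-05-01 10:03:00 view /home c-2",
]
print(run_job(logs, mapper, reducer))  # {'/home': (1, 0), '/offers/shoes': (2, 1)}
```

On a cluster, many mapper and reducer instances run in parallel and the framework performs the shuffle over the network; the grouping guarantee is the same.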
12. Visualization Architecture
How can we do ad-hoc data discovery and visualization?
Ad servers and web servers generate log files in different formats, which makes them hard to analyze together.
Map/Reduce processing lets us execute a query across the varied data formats stored in Hadoop.
Hive provides a traditional, SQL-like query interface to Map/Reduce.
Correlate and connect high-variety data for trend analysis.
[Diagram: Web Servers, Ad Servers, Log Files, M/R]
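As one example of the ad-hoc questions this layer is meant to answer, the Python sketch below computes a per-page click-through rate from rows shaped like the `log_summary` table and prints a crude text bar chart in place of a BI tool. The `SummaryRow` type, the click-through definition (sessions with a click divided by sessions with a view), and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class SummaryRow(NamedTuple):
    """One row shaped like the log_summary table (subset of its columns)."""
    cookie: str
    page_uri: str
    view_time: Optional[str]
    click_time: Optional[str]


def click_through_by_page(rows: List[SummaryRow]) -> Dict[str, float]:
    """Share of viewing sessions per page that also recorded a click."""
    views: Dict[str, int] = defaultdict(int)
    clicks: Dict[str, int] = defaultdict(int)
    for row in rows:
        if row.view_time is None:
            continue  # only sessions with a view count toward the denominator
        views[row.page_uri] += 1
        if row.click_time is not None:
            clicks[row.page_uri] += 1
    return {page: clicks[page] / views[page] for page in sorted(views)}


def text_bar_chart(rates: Dict[str, float], width: int = 20) -> str:
    """Render rates as a text bar chart, standing in for a visualization tool."""
    lines = []
    for page, rate in rates.items():
        bar = "#" * round(rate * width)
        lines.append(f"{page:<15} {bar} {rate:.0%}")
    return "\n".join(lines)


rows = [
    SummaryRow("c-1", "/offers/shoes", "2013-05-01 10:00:01", "2013-05-01 10:00:09"),
    SummaryRow("c-2", "/offers/shoes", "2013-05-01 10:02:30", None),
    SummaryRow("c-2", "/home", "2013-05-01 10:03:00", None),
]
rates = click_through_by_page(rows)
chart = text_bar_chart(rates)
print(chart)
```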
14. Summary
Big Data & Analytics projects are often additive:
• New capabilities are layered on top of existing data & apps
• Analytics can drive applications in new ways
Visualizations put Big Data in the hands of the business.
17. Take the Next Steps: Our Offerings
• Envisioning & Strategy Briefing: Big Data, Analytics & Collaboration
• Envisioning Session: Data Is the App, Envisioning the Next-Generation, Data-Driven Enterprise
• Architecture Design Session: Big Data & Analytics
• Healthcare / Life Sciences: Strategy Briefing or Architecture Design Session on Big Data architecture, cloud & use-case-driven analytics and applications, portal, m-health, and UX design for providers, patients, pharma & biotechnology
• Financial Services: Strategy Briefing or Architecture Design Session on Big Data & Analytics for banking, capital markets, retail brokerage, or insurance
19. Who We Are
[Diagram, three layers: Design (Differentiation); UX, Data, Social (Specialization); Code (Foundation)]
20. Who We Are
[Diagram, three layers:
Differentiation: Design (Strategy, Analysis, Creative)
Specialization: UX (Desktop, Mobile, Web Client), Data (Analytics, Big Data, Core SQL), Social (Web Content, Intranets, Collaboration)
Foundation: .NET, Java, Services, PPP, On-Premise, Cloud]