Ben Marden - Making sense of Big Data

Ben Marden
Hortonworks
Making sense of Big Data

© Hortonworks Inc. 2013
Benedict Marden
June 2013

Why Data Driven Business?
“Data-driven decisions are better decisions – it's as simple as that. Using big data enables managers to decide on the basis of evidence rather than intuition. For that reason it has the potential to revolutionize management.”
Harvard Business Review, October 2012

A Brief History of Apache Hadoop
Focus on INNOVATION
• 2005: Yahoo! creates team under E14 to work on Hadoop
Focus on OPERATIONS
• 2008: Yahoo! team extends focus to operations to support multiple projects & growing clusters
STABILITY
• 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo!
Timeline axis: 2004, 2006, 2008, 2010, 2012, 2013
Milestones shown on the timeline: Yahoo! begins to operate at scale; Enterprise Hadoop; Apache Project Established; Hortonworks Data Platform

Leadership that Starts at the Core
• Driving next generation Hadoop
– YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
• 420k+ lines authored since 2006
– More than twice nearest contributor
• Deeply integrating w/ecosystem
– Enabling new deployment platforms (e.g. Windows & Azure, Linux & VMware HA)
– Creating deeply engineered solutions (e.g. Teradata big data appliance)
• All Apache, NO holdbacks
– 100% of code contributed to Apache

Hortonworks Snapshot
• We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
• We engineer, test & certify HDP for enterprise usage
• We employ the core architects, builders and operators of Apache Hadoop
• We drive innovation within Apache Software Foundation projects
• We are uniquely positioned to deliver the highest quality of Hadoop support
• We enable the ecosystem to work better with Hadoop
Develop, Distribute, Support: we develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
Endorsed by Strategic Partners
Headquarters: Palo Alto, CA
Employees: 200+ and growing
Investors: Benchmark, Index, Yahoo

HDP: Enterprise Hadoop Distribution
HORTONWORKS DATA PLATFORM (HDP)
• HADOOP CORE: Distributed Storage & Processing
• PLATFORM SERVICES: Enterprise Readiness
• DATA SERVICES: Store, Process and Access Data
• OPERATIONAL SERVICES: Manage & Operate at Scale
• OS, Cloud, VM, Appliance
Hortonworks Data Platform (HDP): Enterprise Hadoop
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability

6 Key Hadoop DATA TYPES
1. Sentiment: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Text: Understand patterns in text across millions of web pages, emails, and documents
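As an illustration of the Server Logs data type, the sketch below counts failed requests per URL path. It is a minimal single-machine example, not Hadoop code: the log lines are invented samples in Common Log Format, and the names `LOG_LINE`, `count_failures` and `SAMPLE` are hypothetical. On a real cluster the same counting logic would run as a distributed job over far larger files.

```python
import re
from collections import Counter

# Assumed input format: Common Log Format (host, identity, user, time, request, status, size).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)$'
)


def count_failures(lines):
    """Count HTTP 5xx responses per requested path, skipping unparseable lines."""
    failures = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # malformed line: ignore rather than fail
        if match.group("status").startswith("5"):
            parts = match.group("request").split()
            path = parts[1] if len(parts) >= 2 else "<unknown>"
            failures[path] += 1
    return failures


# Invented sample data, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [10/Jun/2013:13:55:36 +0000] "GET /checkout HTTP/1.1" 500 512',
    '10.0.0.2 - - [10/Jun/2013:13:55:37 +0000] "GET /home HTTP/1.1" 200 2326',
    '10.0.0.3 - - [10/Jun/2013:13:55:38 +0000] "POST /checkout HTTP/1.1" 503 0',
    'malformed line',
]

if __name__ == "__main__":
    for path, count in count_failures(SAMPLE).most_common():
        print(path, count)
```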

Existing Data Architecture
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP)
• DATA SOURCES: OLTP, POS SYSTEMS; Traditional Sources (RDBMS, OLTP, OLAP)
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR

Next-Generation Data Architecture
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP); ENTERPRISE HADOOP PLATFORM
• DATA SOURCES: OLTP, POS SYSTEMS; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR

Interoperating With Your Tools
• APPLICATIONS
• DATA SYSTEMS: TRADITIONAL REPOS; HORTONWORKS DATA PLATFORM
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
• DEV & DATA TOOLS
• OPERATIONAL TOOLS
Partner tools named on the slide: Viewpoint, Microsoft Applications

Hadoop Common Patterns of Use
Big Data: Transactions, Interactions, Observations
Business Cases
HORTONWORKS DATA PLATFORM
• Refine (Batch)
• Explore (Interactive)
• Enrich (Online)
“Right-time” Access to Data

Operational Data Refinery
Transform & refine ALL sources of data. Also known as Data Reservoir or Catch Basin.
1. Capture
2. Process
3. Distribute & Retain
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); TRADITIONAL REPOS (RDBMS, EDW, MPP)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)
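The Capture, Process, and Distribute & Retain steps of the refinery pattern can be sketched in a few lines of Python. This is only a toy, in-memory illustration under stated assumptions: the record fields (`customer`, `event`), the source names and the functions `capture`, `process` and `distribute` are all hypothetical, and a real refinery would run these steps as jobs on the platform rather than in one process.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def capture(sources: Dict[str, Iterable[Record]]) -> List[Record]:
    """Step 1 (Capture): land raw records from every source, tagged with their origin."""
    raw = []
    for name, records in sources.items():
        for record in records:
            raw.append({**record, "source": name})
    return raw


def process(raw: Iterable[Record]) -> List[Record]:
    """Step 2 (Process): normalize the customer id and drop records that lack one."""
    refined = []
    for record in raw:
        customer = record.get("customer", "").strip().lower()
        if customer:
            refined.append({**record, "customer": customer})
    return refined


def distribute(refined: Iterable[Record]) -> Dict[str, List[Record]]:
    """Step 3 (Distribute & Retain): group refined records per customer for downstream systems."""
    by_customer: Dict[str, List[Record]] = {}
    for record in refined:
        by_customer.setdefault(record["customer"], []).append(record)
    return by_customer


# Invented sample data: one traditional source and one new source.
SOURCES = {
    "oltp": [{"customer": "Alice", "event": "order"}],
    "web_logs": [
        {"customer": " alice ", "event": "page_view"},
        {"event": "page_view"},  # no customer id, dropped during processing
    ],
}

result = distribute(process(capture(SOURCES)))
```

Here `result` holds a single key, `"alice"`, with the order and the page view side by side, which is the kind of combined view the slide describes handing to the traditional repositories and applications.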

Big Data Exploration & Visualization
Leverage “data lake” to perform iterative investigation for value
1. Capture
2. Process
3. Explore & Visualize
• APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
• DATA SYSTEMS: HORTONWORKS DATA PLATFORM; TRADITIONAL REPOS (RDBMS, EDW, MPP)
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources

Application Enrichment
Create intelligent applications
Collect data, create analytical models and deliver to online apps
1. Capture
2. Process & Compute
3. Deliver Model
• APPLICATIONS: Custom Applications, Enterprise Applications
• DATA SYSTEMS: HORTONWORKS DATA PLATFORM; TRADITIONAL REPOS (RDBMS, EDW, MPP); NOSQL
• DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources

Transferring Our Hadoop Expertise to You
The expert source for Apache Hadoop training & certification
• World class training programs designed to help you learn fast
– Role-based hands-on classes with 50% lab time
• Expert consulting services
– Programs designed to transfer knowledge
• Industry leading Hadoop Sandbox program
– Fastest way to learn Apache Hadoop
– Multi-level tutorials for wide applicability
– Customizable and updateable

Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.
• Best in industry support with flexible pricing model
• Find out more
– www.hortonworks.com
– http://hortonworks.com/hadoop-training/
