Deutsche Telekom and T-Systems are large European telecommunications companies. Deutsche Telekom has revenue of $75 billion and over 230,000 employees, while T-Systems has revenue of $13 billion and over 52,000 employees providing data center, networking, and systems integration services. Hadoop is an open source platform that provides more cost effective storage, processing, and analysis of large amounts of structured and unstructured data compared to traditional data warehouse solutions. Hadoop can help companies gain value from all their data by allowing them to ask bigger questions.
Deutsche Telekom Perspective on HADOOP and Big Data Technologies
1. Deutsche Telekom Perspective on HADOOP and Big Data Technologies
Gregory Smith
VP Solution Design and Emerging Technologies and Architectures
T-Systems North America
Gregory.Smith@t-systems.com
2. Deutsche Telekom and T-Systems Key Stats
Deutsche Telekom is Europe’s largest telecom service provider
– Revenue: $75 billion
– Employees: 232,342
T-Systems is the enterprise division of Deutsche Telekom
– Revenue: $13 billion
– Employees: 52,742
– Services: data center, end user computing, networking, systems integration, cloud and big data
3. Overwhelmed by new data types?
Big Data = Transactions, Interactions, Observations
– Sentiment data
– Call detail records (CDRs)
– Sensor- / machine-based data
– Clickstream data
4. 80% of new data in 2015 will land on Hadoop!
Hadoop is like a data warehouse, but it can store more data and more kinds of data, and perform more flexible analyses.
Hadoop is open source and runs on industry-standard hardware, so it is 1-2 orders of magnitude more economical than conventional data warehouse solutions.
Hadoop provides more cost-effective storage, processing, and analysis; some existing workloads run faster, cheaper, and better.
Hadoop can deliver a foundation for profitable growth: gain value from all your data by asking bigger questions.
5. Reference architecture view of Hadoop
– Presentation: data visualization and reporting; clients
– Application: analytics apps and transactional apps; analytics middleware
– Data Processing: batch processing; real-time/stream processing; search and indexing
– Data Management (Hadoop Core): distributed processing (MapReduce); distributed storage (HDFS); non-relational DB; structured in-memory
– Data Integration: real-time ingestion; batch ingestion; data connectors; metadata services
– Infrastructure: virtualization; compute / storage / network
– Security (cross-cutting): data isolation; access management; data encryption
– Operations (cross-cutting): workflow and scheduling; management and monitoring
The layers span Hadoop Core, Hadoop Projects, and adjacent categories.
6. Example application landscape
– ETL (Informatica, Talend, Spring Integration)
– Real-time streams (social, sensors)
– Structured and unstructured data (HDFS, MapR)
– Real-time database (Shark, GemFire, HBase, Cassandra)
– Interactive analytics (Impala, Greenplum, Aster Data, Netezza, …)
– Batch processing (MapReduce) with HIVE
– Real-time processing (S4, Storm, Spark)
– Machine learning (Mahout, etc.)
– Data visualization (Excel, Tableau)
– Cloud infrastructure: compute, storage, networking
Source: VMware
7. Disruptive innovations in Big Data
Hadoop, NoSQL databases, and MPP analytics disrupt the traditional database / data warehouse:
– Schema: traditional is pre-defined and fixed, required on write; disruptive is required on read ("store first, ask questions later")
– Processing: traditional has no or limited data processing; disruptive couples processing with data and scales out via parallel processing
– Data types: traditional handles structured data; disruptive handles any, including unstructured
– Physical infrastructure: traditional requires enterprise-grade, mission-critical gear; disruptive can run on commodity hardware, with much cheaper storage
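The "required on write" vs. "required on read" distinction above can be sketched in a few lines of Python; the record formats and field names here are invented purely for illustration:

```python
import json

# Landing zone: accept raw records as-is; no schema is enforced on write.
raw_store = []

def land(record_json):
    """Write path: store anything ("store first")."""
    raw_store.append(record_json)

def read_as(schema):
    """Read path: project a schema onto the raw data at query time
    ("ask questions later"). Missing fields come back as None."""
    for line in raw_store:
        rec = json.loads(line)
        yield {field: rec.get(field) for field in schema}

# Ingest heterogeneous records: a call detail record and a clickstream event.
land('{"caller": "491701234567", "duration_s": 42}')
land('{"url": "/home", "user": "alice"}')

# Ask a question later: view everything through a call-record schema.
calls = list(read_as(["caller", "duration_s"]))
```

A schema-on-write store would have rejected the clickstream event at ingest time; here it is retained and simply surfaces empty fields under the call-record view.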
8. Legacy BI vs. High Performance BI vs. the "Hadoop" ecosystem
Legacy BI
– Business problem: backward-looking analysis using data out of business applications
– Technology solution / selected vendors: SAP BusinessObjects, IBM Cognos, MicroStrategy
– Data type / scalability: structured; limited (2-3 TB in RAM)
High Performance BI
– Business problem: quasi-real-time analysis using data out of business applications
– Technology solution / selected vendors: Oracle Exadata, SAP HANA
– Data type / scalability: structured; limited (2-8 TB in RAM)
"Hadoop" ecosystem ("true" big data; the first two columns reflect the legacy vendor definition of big data)
– Business problem: forward-looking predictive analysis; questions defined in the moment, using data from many sources
– Technology solution: Hadoop distributions; no ACID transactions; limited SQL set (joins)
– Data type / scalability: structured or unstructured; unlimited (20-30 PB)
Innovations: Hadoop is 100x cheaper per TB than in-memory appliances like HANA, and handles unstructured data as well.
9. Innovations: store first, ask questions later
Illustrative acquisition cost per GB (much cheaper storage, but not just storage…):
– SAN storage: 3-5 €/GB; based on HDS SAN storage
– NAS filers: 1-3 €/GB; based on NetApp FAS series
– White-box DAS 1): 0.50-1.00 €/GB; hardware can be self-assembled
– Data cloud 1): 0.10-0.30 €/GB; based on large-scale object storage interfaces
– Enterprise-class Hadoop storage: ??? €/GB; based on NetApp E-Series (NOSH)
1) Hadoop offers storage + compute (incl. search). Data Cloud offers Amazon S3 and native storage functions.
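Using the illustrative per-GB figures quoted above (taking the midpoint of each range), a back-of-the-envelope comparison can be sketched; the tier names and the 100 TB workload are assumptions for the example:

```python
# Midpoints of the illustrative acquisition-cost ranges quoted on the slide.
cost_eur_per_gb = {
    "SAN storage": 4.00,     # 3-5 EUR/GB
    "NAS filers": 2.00,      # 1-3 EUR/GB
    "white-box DAS": 0.75,   # 0.50-1.00 EUR/GB
    "data cloud": 0.20,      # 0.10-0.30 EUR/GB
}

def acquisition_cost(tb, tier):
    """Acquisition cost in EUR for `tb` terabytes on the given storage tier."""
    return tb * 1000 * cost_eur_per_gb[tier]

# Storing 100 TB: SAN vs. the commodity DAS typically used under Hadoop.
san_cost = acquisition_cost(100, "SAN storage")    # 400,000 EUR
das_cost = acquisition_cost(100, "white-box DAS")  # 75,000 EUR
savings_factor = san_cost / das_cost
```

These are acquisition costs only, as the slide notes; Hadoop's DAS tier also bundles compute alongside the storage.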
10. Target use cases
Use cases range from shorter to longer time to value and from lower to higher potential value, moving from cost-effective storage, processing, and analysis toward a foundation for profitable growth:
– Lower-cost storage
– Enterprise data warehouse offload
– Enterprise data warehouse archive
– ETL offload
– Capacity planning & utilization
– Enterprise data lake
– Customer profiling & revenue analytics
– Targeted advertising analytics
– Service renewal implementation
– CDR-based data analytics
– Fraud management
– New business models
Stakeholders, roughly in that order: IT Infrastructure & Operations; Business Intelligence & Data Warehousing; Line of Business & Business Analysts; CXO.
11. Enterprise data warehouse offload use case
The Challenge
– Many EDWs are at capacity
– Running out of budget before running out of relevant data
– Older data archived "in the dark", not available for exploration
The Solution
– Hadoop for data storage and processing: parse, cleanse, apply structure and transform
– Free the EDW for valuable queries
– Retain all data for analysis!
Before: the data warehouse spends its capacity on operational workloads (44%), ETL processing (42%), and analytics (11%).
After: Hadoop takes over storage & processing at 1/10th the cost, and the data warehouse splits between operational (50%) and analytics (50%).
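The "parse, cleanse, apply structure" step can be sketched as a tiny batch job; the pipe-delimited CDR layout and field names below are hypothetical stand-ins for a real landing-zone format:

```python
# Hypothetical raw landing-zone file: timestamp|caller|callee|duration_seconds
RAW_CDRS = """\
2013-05-01 10:00:01|491701234567|491809876543|42
2013-05-01 10:00:05|BADLINE
2013-05-01 10:01:12|491702223344|491808887766|305
"""

def parse_cdrs(raw):
    """Parse pipe-delimited CDRs: drop malformed lines (cleanse) and
    apply structure (timestamp, caller, callee, duration in seconds)."""
    rows = []
    for line in raw.splitlines():
        parts = line.split("|")
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # cleanse: skip records that do not match the layout
        ts, caller, callee, duration = parts
        rows.append({"ts": ts, "caller": caller,
                     "callee": callee, "duration_s": int(duration)})
    return rows

cdrs = parse_cdrs(RAW_CDRS)
total_call_seconds = sum(r["duration_s"] for r in cdrs)
```

In production this transformation would run as a distributed job over the full raw data set, so that only the structured result needs to reach the EDW.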
12. From data puddles and ponds to lakes and oceans
GOAL: a platform that natively supports mixed workloads as a shared service.
AVOID: systems separated by workload type due to contention (per-business-unit big data silos for BU1, BU2, BU3).
Big Data (transactions, interactions, observations) supports refine, explore, and enrich workloads across batch, interactive, and online access.
13. Questions to ask in designing a solution for a particular business use case
– Which distribution is right for your needs today vs. tomorrow?
– Which distribution will ensure you stay on the main path of open source innovation, vs. trap you in proprietary forks?
Note: distributions include more than just the Data Management layer, but are discussed at this point in the presentation. Not shown: Intel, Fujitsu and other distributions.
– Widely adopted, mature distribution; GTM partners include Oracle, HP, Dell, IBM
– Fully open source distribution (incl. management tools); reputation for cost-effective licensing; strong developer ecosystem momentum; GTM partners include Microsoft, Teradata, Informatica, Talend
– More proprietary distribution with features that appeal to some business-critical use cases; GTM partner AWS (M3 and M5 versions only)
– Just announced by EMC, very early stage; differentiator is HAWQ, which claims manifold query speed improvement and a full SQL instruction set
14. Common objections to Hadoop
– We don’t have big data problems
– We don’t have petabytes of data
– We can’t justify the budget for a new project
– We don’t have the skills
– We’re not sure Hadoop is mature / secure / enterprise-ready
– We already have a scale-out strategy for our EDW/ETL
15. MYTH: Big Data means “Petabytes”
– Not just volume: remember variety and velocity
– Plenty of issues at smaller scales: data processing, unstructured data
– Often warehouse volumes are small because the technology is expensive, not because there is no relevant data
– Scalability is about growing with the business, affordably and predictably
Every organization has data problems! Hadoop can help…
MYTH: Big Data means Data Science
– Hadoop solves existing problems faster, better, cheaper than conventional technology, e.g.:
– Landing zone: capturing and refining multi-structured data types with unknown future value
– Cost-effective platform for retaining lots of data for long periods of time
– Walk before you run
Big Data Is a State of Mind
16. Waves of adoption: crossing the chasm
Wave 1: Batch Orientation
– Adoption today*: mainstream, 70% of organizations
– Example use cases: refine (archival and transformation)
– Response time: hour(s)
– Data characteristic: volume
– Architectural characteristic: EDW / RDBMS talk to Hadoop
– Example technologies: MapReduce, Pig, Hive
Wave 2: Interactive Orientation
– Adoption today*: early adopters, 20% of organizations
– Example use cases: explore (query and visualization)
– Response time: minutes
– Architectural characteristic: analytic apps talk directly to Hadoop
– Example technologies: ODBC/JDBC, Hive
Wave 3: Real-Time Orientation
– Adoption today*: bleeding edge, 10% of organizations
– Example use cases: enrich (real-time decisions)
– Response time: seconds
– Data characteristic: velocity
– Architectural characteristic: derived data also stored in Hadoop
– Example technologies: HBase, NoSQL, SQL
* Among organizations using Hadoop
17. Hadoop in a nutshell
The Hadoop open source ecosystem delivers powerful innovation
in storage, databases and business intelligence, promising
unprecedented price / performance compared to existing
technologies.
Hadoop is becoming an enterprise-wide landing zone for large
amounts of data. Increasingly it is also used to transform data.
Large enterprises have realized substantial cost reductions by
offloading some enterprise data warehouse, ETL and archiving
workloads to a Hadoop cluster.
18. Challenges in the Enterprise
– Use-case identification and cost justification
– Cooperation and coordination from independent business units
– As Hadoop increases its footprint in business-critical areas, the business will demand mature enterprise capabilities, e.g. DR, snapshots, etc.
– Hadoop’s disruptive approach is challenging entrenched legacy EDW people, processes and technologies
– Data harmonization is often a significant challenge
– Fear of forking (think UNIX)
– Proprietary absorption (getting “Borged”)
– Audience: Hadoop addresses business problems, not IT problems
– Fear of data complexity (“I hated statistics class!”)
Big Data = Transactions + Interactions + Observations
Transactions are pretty simple to understand. This is our ERP data: the data that we maintain and track in our OLTP systems. It can be any record of a system-to-system or human-to-system interaction, or even a human-to-human interaction as long as it is captured electronically. We use a lot of this data in our analytics today.
Interactions are the points in time at which we relate with a system. It could be a tweet or a Facebook post, an electronic or paper customer satisfaction survey, web logs, or A/B tests. We have a lot of this data but typically no efficient way to understand or extract value from it.
Observations are interesting because they represent a world of net-new data sources that we once never thought of analyzing. This is data that was once considered low-to-medium value, or even exhaust data that was too bulky and just too expensive to store: machine-generated data from sensors, web logs and clickstreams, audio/video, or largely unstructured content.
Layers: Presentation, Application, Data Processing, Infrastructure, Data Ingestion, Security, Management & Monitoring.
– Ambari: Apache Ambari is a monitoring, administration and lifecycle management project for Apache Hadoop clusters. Hadoop clusters require many inter-related components that must be installed, configured, and managed across the entire cluster.
– ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. ZooKeeper is used heavily by many distributed applications such as HBase.
– HBase: HBase is the distributed Hadoop database, scalable and able to collect and store big data volumes on HDFS. This class of database is often categorized as NoSQL (Not only SQL).
– Pig: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
– Hive: Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL, while also allowing traditional map/reduce programmers to plug in custom mappers and reducers when it is inconvenient or inefficient to express the logic in HiveQL.
– HCatalog: Apache HCatalog is a table and storage management service for data created using Apache Hadoop; it provides deep integration with enterprise data warehouses (e.g. Teradata) and with data integration tools such as Talend.
– MapReduce: Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes.
– HDFS: the Hadoop Distributed File System is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid parallel computations.
– Talend Open Studio for Big Data: a 100% open source, graphical code generator for extract-transform-load (ETL) and extract-load-transform (ELT) data movement and cleansing in and out of Hadoop.
– Data Integration Services: HDP integrates Talend Open Studio for Big Data, the leading open source data integration platform for Apache Hadoop. Included is a visual development environment and hundreds of pre-built connectors to leading applications that let you connect to any data source without writing code.
– Centralized Metadata Services: HDP includes HCatalog, a metadata and table management system that simplifies data sharing both between Hadoop applications running on the platform and between Hadoop and other enterprise data systems. HDP’s open metadata infrastructure also enables deep integration with third-party tools.
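The MapReduce programming model (map emits key/value pairs, the framework shuffles them by key, reduce aggregates each group) can be mimicked in-process. This is a single-machine Python sketch of the classic word-count job, not how Hadoop actually executes work across a cluster:

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in an input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: aggregate all counts emitted for one word."""
    return word, sum(counts)

def map_reduce(lines):
    """Drive map -> shuffle (group by key) -> reduce, as the framework
    would, but sequentially on one machine instead of across a cluster."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

counts = map_reduce(["big data big questions", "big data big value"])
# counts["big"] == 4, counts["data"] == 2
```

In Hadoop proper, the map and reduce functions run on many compute nodes near the HDFS blocks they process; only the shuffle moves data between nodes.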
– Line of Business: demand a 360-degree view of customer, employee, market, etc., but cannot be certain about what matters for analysis.
– Business Analysts: need to incorporate more data into analysis while LOBs are not sure what matters; want to reuse existing skill sets.
– Data Warehouse Owners: must efficiently store, process, organize, and deliver massive and growing data volume and variety while meeting SLAs.
– IT Management: drive innovation, reduce costs, meet growing analytic demands of LOBs, mitigate risk of adopting new technology.
– System Administrators: ensure stability and reliability of systems.
Buyers: VP Analytics; VP/Director Business Intelligence; VP/Director Data Warehousing/Management; VP/Director Infrastructure; VP/Director Operations/IT Systems.
Value drivers: faster customer acquisition, better product development, better quality, lower churn.