2. Big Data – Volume, Velocity, Variety (& Value)
7.9 ZB by 2015; 3x more bits in the digital universe than stars in the physical universe
450 billion business transactions per day by 2020 (IDC)
Therapies tailored to a person's genome
Decoding the human genome:
• From 10 years to hours
• On track to hit <$1000 per person
Explosive growth: 30 TB/month of billing data
Radical overhaul of customer service:
• Self-service, real-time access
• 30x performance increase
$600B potential value to US healthcare
90% of the data in the world was created in the last 2 years
100 years' worth of video uploaded to YouTube every 10 days
>5 billion people calling, texting, tweeting & browsing on cell phones
“In God we trust, all others bring data” — NASA, Johnson Space Center
How Will Businesses Manage 50x Data Growth by 2020 in an Affordable Way?
3. Sources of Big Data
• Machine generated
• Human generated
• Business generated
Big data requires different approaches: edge, scale-up, and distributed
6. Big Data Use Cases Across Industries
• Education
• Financial Services
7. Telco - China Mobile Group Guangdong
Hadoop & Xeon-optimized big data storage & analytics
• Challenge: Deliver real-time access to call data records (CDRs) for self-service billing
• Solution: Chose Hadoop + Xeon over an RDBMS to remove data access bottlenecks, increase storage capacity, and scale the system
• Benefits: Lower TCO, 30x performance increase,
stable operation, analytics on subscriber usage for
targeted promotions
• Data Characteristics:
• 30 TB of billing data per month
• Real-time retrieval of 30 days of CDRs
• 300k records/second; 800k inserts/second
• 15 analytics queries
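As a rough illustration of what the data characteristics above imply for storage sizing, this Python sketch converts the 30 TB/month figure into daily and per-second averages and estimates raw capacity for the 30-day retrieval window. It assumes decimal units (1 TB = 10^12 bytes), a 30-day month, and the HDFS default replication factor of 3; none of these are stated on the slide, and the function name `sizing` is hypothetical.

```python
# Back-of-envelope sizing for the China Mobile CDR figures.
# Assumptions (not from the slide): decimal units (1 TB = 10**12 bytes),
# a 30-day month, and the HDFS default replication factor of 3.

SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30
TB = 10**12


def sizing(tb_per_month: float, retention_days: int, replication: int = 3) -> dict:
    """Hypothetical helper: average ingest rates and raw storage for a retention window."""
    bytes_per_month = tb_per_month * TB
    bytes_per_day = bytes_per_month / DAYS_PER_MONTH
    avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
    # Logical (single-copy) size of the window kept online for real-time retrieval.
    logical_window = bytes_per_day * retention_days
    return {
        "tb_per_day": bytes_per_day / TB,
        "avg_mb_per_sec": avg_bytes_per_sec / 10**6,
        "window_tb": logical_window / TB,
        # Raw disk needed once every block is stored `replication` times.
        "raw_hdfs_tb": logical_window * replication / TB,
    }


if __name__ == "__main__":
    for key, value in sizing(30, 30).items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions, 30 TB/month averages about 1 TB/day, or roughly 11.6 MB/s of sustained ingest, and a 30-day window needs about 90 TB of raw HDFS capacity before any compression. Peak rates such as the 800k inserts/second on the slide are a separate, burst-oriented requirement that a monthly average does not capture.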
8. Government - Smart Traffic Intelligent Transport System
Hadoop for Predictive Analytics
Crime prevention, information sharing & predictive traffic analytics
Machine Generated Data:
• Embedded HBase client in camera for real-time inserts of
structured/unstructured data
• 30,000+ camera data collection points
• 2 billion HBase records
• Petabytes of traffic data
• Terabytes of images
• 1 week of data mining
Results:
• Automated queries for traffic violations
• Crime prevention: identify fake licenses in <1 minute
• Traffic routing
[Architecture diagram: app servers, regional data collection, distributed processing across district nodes, and derived analytics services (crime prevention, citizen and traffic services)]
9. Options For Hadoop Deployment
On-premises (or private cloud)
• Limited scalability
• Internal IT resources needed to manage the cluster
• CapEx: hardware, data center space, power & cooling
On AWS (public cloud)
• Scalability
• Flexibility
• Easy to deploy to multiple locations
• Additional resources on demand
• OpEx
Hybrid cloud model
• Provides bursting capacity
• Flexibility
• Scalability
• IT still needs to manage the on-premises cluster
Security Is Addressed In All Models
10. “Where do I start…?”
1. What is your business problem?
2. Do you have a (lots of) data problem?
3. Will big data analytics work for your business problem?
Speak To AWS Today!