1. © Copyright 2013 Pivotal. All rights reserved.
Hadoop: A Foundation for Change
Milind Bhandarkar
Chief Scientist, Pivotal
Twitter: @techmilind
2.
About Me
http://www.linkedin.com/in/milindb
Founding member of Hadoop team at Yahoo! [2005-2010]
Contributor to Apache Hadoop since v0.1
Built and led Grid Solutions Team at Yahoo! [2007-2010]
Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)
Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly EMC-Greenplum)
3.
"First, technology is good. Then it gets bad. Then it gets stable."
- Alistair Croll
(http://strata.oreilly.com/2013/01/data-warefare.html)
7.
W-1-W
WebMap: Graph processing for WWW
Dreadnaught: Infrastructure for WebMap
Juggernaut: Infrastructure for W-1-W
JFS, JMR, Condor: Abandoned for Hadoop
10.
Lessons Learned
Multi-Tenancy from ground-up
Agility in lieu of Performance
Provisioning vs Procurement
“Weird” use cases as learning experience
Academic collaboration
11.
(From Hadoop Summit 2010)
Who Uses Hadoop?
12.
http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
Big Data Landscape (June 2012)
13.
http://www.datameer.com/blog/perspectives/hadoop-ecosystem-as-of-january-2013-now-an-app.html
Hadoop Ecosystem (January 2013)
17.
Hadoop Economics Is a Game Changer
[Chart: Big Data Platform Price/TB, 2008-2013. Y-axis: $0 to $80,000. Series: Big Data DB, Hadoop.]
18.
“Typical” Hadoop Use-Case
“User” Modeling
Objective: Determine user interests by mining user activities
Large dimensionality of possible user activities
Typical user has sparse activity vector
Event attributes change over time
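
A minimal sketch of the sparse activity vector described on this slide, assuming a hypothetical event log of (user_id, activity) pairs. The activity names and the helper `build_activity_vectors` are illustrative and do not come from the talk. Each user stores only the activities actually observed, which keeps memory use manageable when the space of possible activities is very large.

```python
from collections import Counter

# Hypothetical event log: (user_id, activity) pairs.
# The activity names are illustrative only.
events = [
    ("u1", "purchase:book"),
    ("u1", "ad_click:shoes"),
    ("u2", "search:hadoop"),
    ("u1", "purchase:book"),
]

def build_activity_vectors(events):
    """Map each user to a sparse vector {activity: count}.

    Activities a user never performed are simply absent, so the
    vector size tracks observed activity, not the full dimensionality.
    """
    vectors = {}
    for user_id, activity in events:
        vectors.setdefault(user_id, Counter())[activity] += 1
    return vectors

vectors = build_activity_vectors(events)
```

Looking up an activity a user never performed returns a count of zero without storing an entry, which is the property the "sparse" representation relies on.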
19.
Domain: Retail
User = Customer
Activities
– Online: Purchase, Ad click, FB Likes
– Offline: Brick-and-mortar purchases, returns, coupon clipping, gift cards
Personalized Product Recommendation
20.
Domain: IT Infrastructure
“User” = HW & SW Components
Activities
– Log messages, Metrics, connectivity, communication events
Goal: Proactive alerting of imminent failures
21.
Domain: Healthcare
User = Patient
Activities
– Doctor Visits, Medicine refills, Medical History
– 3G/WiFi-enabled Pillbox...
Goal: Prevent Hospital Readmissions
22.
Domain: Telecom
User = Subscriber
Activities
– Calls made, duration, calls dropped, locations, ...
– “social” graph, status updates
Goal: Reduce customer churn
23.
Domain: Ad-Supported Web
User = User :-)
Activities
– Clicks on content, Likes, Repost
– Search Queries, Comments, Participation
Goal: Increase Engagement, Increase Clicks on revenue-generating content (ads/premium content)
24.
User-Modeling Pipeline
Sessionization
Feature and Target Generation
Model Training
Offline Scoring & Evaluation
Batch Scoring & Upload to serving
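
The first two pipeline stages above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the talk: events are hypothetical (user_id, timestamp, activity) tuples, the 30-minute inactivity gap is an assumed threshold, and the "purchase" target is invented for the example. On Hadoop, the per-user grouping would typically be the shuffle key of a MapReduce job, with the session splitting done in the reducer.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold, not from the talk

def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, activity) events into per-user sessions.

    A new session starts whenever the time since the user's previous
    event exceeds `gap` seconds.
    """
    by_user = defaultdict(list)
    for user_id, ts, activity in events:
        by_user[user_id].append((ts, activity))

    sessions = {}
    for user_id, user_events in by_user.items():
        user_events.sort()  # order each user's events by timestamp
        user_sessions, current, last_ts = [], [], None
        for ts, activity in user_events:
            if last_ts is not None and ts - last_ts > gap:
                user_sessions.append(current)
                current = []
            current.append((ts, activity))
            last_ts = ts
        user_sessions.append(current)
        sessions[user_id] = user_sessions
    return sessions

def session_features(session):
    """Turn one session into (features, target).

    Target (hypothetical): 1 if the session contains a "purchase".
    """
    activities = [activity for _, activity in session]
    features = {
        "num_events": len(session),
        "duration_seconds": session[-1][0] - session[0][0],
        "num_distinct_activities": len(set(activities)),
    }
    return features, int("purchase" in activities)

# Hypothetical input data for illustration.
events = [
    ("u1", 0, "search"),
    ("u1", 600, "ad_click"),
    ("u1", 10_000, "purchase"),
    ("u2", 50, "search"),
]
sessions = sessionize(events)
training_rows = [
    session_features(s) for user_sessions in sessions.values() for s in user_sessions
]
```

The resulting `training_rows` list of (features, target) pairs is the kind of input the Model Training stage would consume.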
28.
Storage Wars
HDFS
KosmosFS, LocalFS, Quantcast FS, S3
MapR
GPFS, Isilon, Atmos, Swift, NetApp
Lustre, Gluster, Ceph, PanFS, PVFS
EMC ViPR
29.
NoSQL = Not Yet SQL?
Pivotal HAWQ
Cloudera Impala
Apache Drill, Spire (Drawn to Scale)
Cascading Lingual, Optiq
Hortonworks Stinger
More to come...
30.
Prepare for Convergence
HPC: Cache Coherence, Prefetching, Zero-copy, Low-contention locks
“Big Data”: Caching, Mirroring, Sharding (various flavors), relaxed consistency
Databases: Indexing, MVCC, Columnar storage/processing, Cost-based optimization
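
As an illustration of the "various flavors" of sharding mentioned on this slide, here is a small sketch of two common ones, hash sharding and range sharding. The function names, the choice of SHA-256, and the split points are assumptions made for the example and are not taken from the talk.

```python
import bisect
import hashlib

def hash_shard(key, num_shards):
    """Hash sharding: a stable digest of the key picks the shard.

    hashlib is used instead of the built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def range_shard(key, split_points):
    """Range sharding: sorted split points divide the key space.

    Keys below split_points[0] go to shard 0, keys in
    [split_points[0], split_points[1]) go to shard 1, and so on.
    """
    return bisect.bisect_right(split_points, key)

# Hypothetical split points giving four shards over string keys.
SPLITS = ["g", "n", "t"]
```

Hash sharding spreads keys evenly but scatters adjacent keys across shards, while range sharding keeps adjacent keys together at the risk of hot spots on popular ranges.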
31.
Convergence
Resource Allocation, Scheduling, Lifecycle Management
Compute, Storage, and Communication isolation, Multi-tenancy, Performance SLAs
Authentication & Authorization, Data/System Provisioning and Management, Monitoring, Metadata Management, Metering
32.
Hadoop As A Service
Hadoop Platform-As-A-Service
– EMR competitor proliferation
– OpenStack, CloudStack, Joyent...
Application-As-A-Service (Hadoop Inside)
– Cetas, Continuuity, Causata, Claritics, Tresata, Wibidata, ...
Pivotal One
– CloudFoundry, Hadoop, HAWQ, Analytics
– Spring, Redis, RabbitMQ
33.
New Hardware Platforms
Mellanox - Hadoop Acceleration through Network Levitated Merge
RoCE - Brocade, Cisco, Extreme, Arista...
ARM - Low power Hadoop servers
SSD - Velobit, Violin, FusionIO, Samsung...
Niche - Compression, Encryption...
34.
IaaS as the New Hardware
AWS, GCE, Azure
vSphere, OpenStack
Easy Provisioning
Scalable
Elastic
Ubiquitous
Needs bundling with Data & Analytics as Services
35.
Big Data Platform of the Future?
[Diagram: deploy to Public Cloud, Private Cloud, On Premise]