© Copyright 2013 Pivotal. All rights reserved.
Hadoop: A Foundation for Change
Milind Bhandarkar
Chief Scientist, Pivotal
Twitter: @techmilind

About Me
• http://www.linkedin.com/in/milindb
• Founding member of Hadoop team at Yahoo! [2005-2010]
• Contributor to Apache Hadoop since v0.1
• Built and led Grid Solutions Team at Yahoo! [2007-2010]
• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)
• Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly EMC-Greenplum)

First, technology is good. Then it gets bad. Then it gets stable.
- Alistair Croll
(http://strata.oreilly.com/2013/01/data-warefare.html)

History (2003-2010)

Google Papers

Yahoo! Search + =

W-1-W
• WebMap: Graph processing for WWW
• Dreadnaught: Infrastructure for WebMap
• Juggernaut: Infrastructure for W-1-W
• JFS, JMR, Condor: Abandoned for Hadoop

Lucene, Nutch

Kryptonite

Lessons Learned
• Multi-tenancy from the ground up
• Agility in lieu of performance
• Provisioning vs. procurement
• “Weird” use cases as a learning experience
• Academic collaboration

Who Uses Hadoop?
(From Hadoop Summit 2010)

Big Data Landscape (June 2012)
http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/

Hadoop Ecosystem (January 2013)
http://www.datameer.com/blog/perspectives/hadoop-ecosystem-as-of-january-2013-now-an-app.html

Hadoop Economics Is a Game Changer
[Chart: Big Data Platform Price/TB, 2008-2013; price axis from $0 to $80,000; series: Big Data DB, Hadoop]

“Typical” Hadoop Use-Case
• “User” Modeling
• Objective: Determine user interests by mining user activities
• Large dimensionality of possible user activities
• Typical user has a sparse activity vector
• Event attributes change over time
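
To make the "sparse activity vector" point concrete, here is a minimal Python sketch. It is not from the talk: the event fields (`activity`, `item`) and the helper names are hypothetical, chosen only for illustration. The idea is that with a very large space of possible activities, each user's vector stores only the activities that actually occurred, and every other dimension is implicitly zero.

```python
from collections import Counter

def activity_vector(events):
    """Build a sparse activity vector from a user's events.

    Each event is a dict with hypothetical 'activity' and 'item' fields.
    Only observed (activity, item) pairs are stored; all others count as 0.
    """
    return Counter((e["activity"], e["item"]) for e in events)

def dot(u, v):
    """Dot product of two sparse vectors, touching only the non-zero entries."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(count * v[key] for key, count in u.items() if key in v)

# Example: three events collapse into two non-zero dimensions.
events = [
    {"activity": "purchase", "item": "sku-1"},
    {"activity": "ad_click", "item": "ad-9"},
    {"activity": "purchase", "item": "sku-1"},
]
vector = activity_vector(events)
```

A dictionary-backed vector like this grows with the number of distinct activities a user performed, not with the total number of possible activities, which is why it suits the high-dimensional case described above.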

Domain: Retail
• User = Customer
• Activities
  – Online: Purchase, Ad click, FB Likes
  – Offline: Brick-and-mortar purchases, returns, coupon clipping, gift cards
• Personalized Product Recommendation

Domain: IT Infrastructure
• “User” = HW & SW Components
• Activities
  – Log messages, Metrics, connectivity, communication events
• Goal: Proactive alerting of imminent failures

Domain: Healthcare
• User = Patient
• Activities
  – Doctor Visits, Medicine refills, Medical History
  – 3G/WiFi-enabled Pillbox...
• Goal: Prevent Hospital Readmissions

Domain: Telecom
• User = Subscriber
• Activities
  – Calls made, duration, calls dropped, locations, ...
  – “social” graph, status updates
• Goal: Reduce customer churn

Domain: Ad-Supported Web
• User = User :-)
• Activities
  – Clicks on content, Likes, Repost
  – Search Queries, Comments, Participation
• Goal: Increase Engagement, Increase Clicks on revenue-generating content (ads/premium content)

User-Modeling Pipeline
• Sessionization
• Feature and Target Generation
• Model Training
• Offline Scoring & Evaluation
• Batch Scoring & Upload to serving
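
The first pipeline stage, sessionization, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation behind the talk: the event layout `(user_id, timestamp, activity)`, the function name, and the 30-minute inactivity timeout are all assumed. On a Hadoop cluster the sort-and-group step would typically be done by the MapReduce shuffle, keyed on the user; here it runs in memory.

```python
from itertools import groupby
from operator import itemgetter

# Assumption: a gap of more than 30 minutes of inactivity starts a new session.
SESSION_GAP_SECONDS = 30 * 60

def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Split (user_id, timestamp, activity) tuples into per-user sessions.

    Returns a list of (user_id, [(timestamp, activity), ...]) pairs,
    ordered by user and then by time.
    """
    sessions = []
    ordered = sorted(events, key=itemgetter(0, 1))  # by user, then by time
    for user_id, user_events in groupby(ordered, key=itemgetter(0)):
        current = []
        last_ts = None
        for _, ts, activity in user_events:
            # Close the running session when the inactivity gap is exceeded.
            if last_ts is not None and ts - last_ts > gap:
                sessions.append((user_id, current))
                current = []
            current.append((ts, activity))
            last_ts = ts
        sessions.append((user_id, current))
    return sessions

# Example: user u1 returns after a long pause, so u1 gets two sessions.
events = [
    ("u1", 60, "click"),
    ("u2", 10, "like"),
    ("u1", 0, "search"),
    ("u1", 4000, "click"),
]
sessions = sessionize(events)
```

Each resulting session is the unit from which the next stage, feature and target generation, would derive its inputs.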

What’s Next?

Trough of Disillusionment?

Or, Hadoop Everywhere?

Storage Wars
• HDFS
• KosmosFS, LocalFS, Quantcast FS, S3
• MapR
• GPFS, Isilon, Atmos, Swift, NetApp
• Lustre, Gluster, Ceph, PanFS, PVFS
• EMC ViPR

NoSQL = Not Yet SQL?
• Pivotal HAWQ
• Cloudera Impala
• Apache Drill, Spire (Drawn to Scale)
• Cascading Lingual, Optiq
• Hortonworks Stinger
• More to come...

Prepare for Convergence
• HPC: Cache Coherence, Prefetching, Zero-copy, Low-contention locks
• “Big Data”: Caching, Mirroring, Sharding (various flavors), relaxed consistency
• Databases: Indexing, MVCC, Columnar storage/processing, Cost-based optimization

Convergence
• Resource Allocation, Scheduling, Lifecycle Management
• Compute, Storage, and Communication isolation, Multi-tenancy, Performance SLAs
• Auth & Auth, Data/System Provisioning and Management, Monitoring, Metadata Management, Metering

Hadoop As A Service
• Hadoop Platform-As-A-Service
  – EMR competitor proliferation
  – OpenStack, CloudStack, Joyent...
• Application-As-A-Service (Hadoop Inside)
  – Cetas, Continuuity, Causata, Claritics, Tresata, Wibidata, …
• Pivotal One
  – CloudFoundry, Hadoop, HAWQ, Analytics
  – Spring, Redis, RabbitMQ

New Hardware Platforms
• Mellanox - Hadoop Acceleration through Network Levitated Merge
• RoCE - Brocade, Cisco, Extreme, Arista...
• ARM - Low power Hadoop servers
• SSD - Velobit, Violin, FusionIO, Samsung...
• Niche - Compression, Encryption…

IaaS as the New Hardware
• AWS, GCE, Azure
• vSphere, OpenStack
• Easy Provisioning
• Scalable
• Elastic
• Ubiquitous
• Needs bundling with Data & Analytics as Services

Big Data Platform of the Future?
[Diagram: deploy to Public Cloud, Private Cloud, On Premise]

Questions?

A NEW PLATFORM FOR A NEW ERA

