YAHOO & HADOOP
USING AND IMPROVING APACHE HADOOP AT YAHOO!

Eric Baldeschwieler
VP, Hadoop Software

© 2011 IBM Corporation

AGENDA

• Brief Overview
• Hadoop @ Yahoo!
• Hadoop Momentum
• The Future of Hadoop

what’s happening
- Big Data is here!
- unstructured data
- petabyte scale
- operationally critical

(Photo: Flickr / sub_lime79)

turning data into insights

machine learning · logistic regression · time series · content clustering · algorithms · ad inventory modeling · user interest prediction · factorization models

(Photo: Flickr / NASA Goddard Photo and Video)

making YAHOO relevant

(Photo: Flickr / ogimogi)

hadoop: Powering Yahoo!

science + big data + insight = personal relevance = VALUE

(Photo: Flickr / DDFic)

WHAT IS HADOOP?

[Diagram: Pig and Hive run on top of MapReduce, which runs on top of HDFS.
Built on commodity • computers • network. Focus on • simplicity • redundancy • scale • availability.]

Transforms commodity equipment into a service that:
• HDFS – Stores petabytes of data reliably
• MapReduce – Allows huge distributed computations

Key Attributes
• Redundant and reliable – Doesn’t stop or lose data even as hardware fails
• Easy to program – Our rocket scientists use it directly!
• Very powerful – Allows the development of big data algorithms & tools
• Batch processing centric

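To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word-count job. It only imitates the three stages (map, shuffle/group-by-key, reduce) in memory; it does not use Hadoop's actual APIs, and the function names are illustrative rather than part of Hadoop. On a real cluster the framework runs many map and reduce tasks in parallel over HDFS blocks and re-runs tasks whose machines fail.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over an in-memory list of records."""
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(word_count(["Big data is here", "big data at petabyte scale"]))
```

Because each map call sees only one record and each reduce call sees only one key, the framework is free to spread both across many machines; that separation is one reason the model is simple to program yet scales out.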
WHAT HADOOP ISN’T

• A replacement for relational and data warehouse systems
• A transactional / online / serving system
• A low latency or streaming solution

HADOOP IN THE ENTERPRISE

                     Business Intelligence Applications

   HADOOP CLUSTER(S)                       RDBMS     EDW     Data Marts

   Interactions                            Transactions, Structured Data
   Semi-Structured or Unstructured Data

   Web Logs, Server Logs,                  Business Applications
   Social Media, etc.

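The slide places semi-structured interaction data (web and server logs) on the Hadoop side and structured transactional data on the RDBMS/EDW side. As an illustration only, here is a minimal Python sketch of the kind of step that bridges the two: parsing raw web-server log lines into structured records and aggregating them into rows a warehouse could load. It assumes the Apache Common Log Format; `Hit`, `parse` and `hits_per_path` are hypothetical names, and everything runs in one process rather than as a Hadoop job.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Apache Common Log Format: host ident user [timestamp] "METHOD path PROTOCOL" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


class Hit(NamedTuple):
    """One structured row extracted from a semi-structured log line."""
    host: str
    path: str
    status: int
    size: int


def parse(line: str) -> Optional[Hit]:
    """Turn one raw log line into a Hit, or None if the line is malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    size = 0 if match["size"] == "-" else int(match["size"])
    return Hit(match["host"], match["path"], int(match["status"]), size)


def hits_per_path(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count successful (HTTP 200) requests per path, sorted by path."""
    hits = (parse(line) for line in lines)
    counts = Counter(h.path for h in hits if h is not None and h.status == 200)
    return sorted(counts.items())
```

Malformed lines are skipped rather than failing the whole run, a common choice when input is semi-structured. On a cluster, `parse` would play the role of the map step and the counting the role of the reduce step.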
HADOOP @ YAHOO!

HADOOP @ YAHOO!
“Where Science meets Data”

[Diagram: dimensional data, content and data pipelines (terabytes / day, compressed) flow into the
Hadoop clusters (tens of thousands of servers; 10s of petabytes), which support the products and applied science below.]

PRODUCTS
• Data Analytics
• Content Optimization
• Content Enrichment
• Yahoo! Mail Anti-Spam
• Advertising Products
• Ad Optimization
• Ad Selection
• Big Data Processing & ETL

APPLIED SCIENCE
• User Interest Prediction
• Ad inventory prediction
• Machine learning – search ranking
• Machine learning – ad targeting
• Machine learning – spam filtering

FROM PROJECT TO CORE PLATFORM

[Chart, 2006–2010: growth of Hadoop at Yahoo! in thousands of servers (left axis, 0–90) and petabytes of storage (right axis, 0–250).]
• 40K+ Servers
• 170 PB Storage
• 5M+ Monthly Jobs

HADOOP POWERS THE YAHOO! NETWORK

advertising optimization · data analytics · machine learning · search ranking · advertising data systems · Yahoo! Mail anti-spam · audience, ad and search pipelines · ad selection · Yahoo! Homepage Content Optimization · ad inventory prediction · user interest prediction

CASE STUDY
YAHOO! HOMEPAGE

Personalized for each visitor

Result: twice the engagement

Recommended links          News Interests            Top Searches
+79% clicks                +160% clicks              +43% clicks
vs. randomly selected      vs. one size fits all     vs. editor selected

CASE STUDY
YAHOO! HOMEPAGE

• Serving Maps
  • Users – Interests
• Five Minute Production
• Weekly Categorization models

SCIENCE HADOOP CLUSTER: user behavior → categorization models (weekly)
» Machine learning to build ever better categorization models

PRODUCTION HADOOP CLUSTER: user behavior → serving maps (every 5 minutes)
» Identify user interests using categorization models

SERVING SYSTEMS → ENGAGED USERS
Build customized home pages with latest data (thousands / second)

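The homepage case study describes two loops: a science cluster that rebuilds categorization models weekly, and a production cluster that applies the current models to fresh user behavior every five minutes to produce serving maps (users → interests). The deck does not describe the models themselves, so the sketch below assumes a toy "model" that is just a term-to-category lookup table; `build_serving_map`, `top_k` and the event shape are hypothetical. It shows only the shape of the production step: score each user's recent activity against the model and keep the top interests per user.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a weekly categorization model:
# each known term maps to the interest category it signals.
Model = Dict[str, str]

# One user-behavior event: (user_id, text of the item the user engaged with).
Event = Tuple[str, str]


def build_serving_map(
    events: Iterable[Event], model: Model, top_k: int = 2
) -> Dict[str, List[str]]:
    """Score each user's events against the model and keep their top_k interests."""
    scores: Dict[str, Counter] = defaultdict(Counter)
    for user_id, text in events:
        for term in text.lower().split():
            category = model.get(term)
            if category is not None:
                scores[user_id][category] += 1
    # most_common orders categories by descending score.
    return {
        user: [category for category, _ in counts.most_common(top_k)]
        for user, counts in scores.items()
    }
```

Keeping the model as an input rather than baking it in mirrors the split on the slide: the weekly job can swap in a new model without changing the five-minute job.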
CASE STUDY
YAHOO! MAIL
Enabling quick response in the spam arms race

• 450M mailboxes
• 5B+ deliveries/day
• Antispam models retrained every few hours on Hadoop

[Diagram labels: SCIENCE, PRODUCTION]

“40% less spam than Hotmail and 55% less spam than Gmail”

YAHOO! & APACHE HADOOP
Yahoo! has contributed 70+% of Apache Hadoop code to date

Hadoop is not our business, but Hadoop is key to our business
• Yahoo! benefits from the open source ecosystem around Hadoop
• Hadoop drives revenue at Yahoo! by making our core products better

We need Hadoop to be rock solid
• We invest heavily in core Hadoop development
• We focus on scalability, reliability, availability

We fix bugs before you see them
• We run very large clusters
• We have a large QA effort
• We run a huge variety of workloads

We are good Apache Hadoop citizens
• We contribute our work to Apache
• We share the exact code we run

HADOOP MOMENTUM

HADOOP IS GOING MAINSTREAM

[Chart, 2007–2010. Source: The Datagraph Blog]

THE PLATFORM EFFECT
BIRTH OF AN ECOSYSTEM

Apache Hadoop
Enhance Hadoop Ecosystem:
• [logo] and other Early Adopters – Scale and productize Hadoop
• Orgs with Internet Scale Problems – Add tools / frameworks, enhance Hadoop
• Service Providers – Grow ecosystem: training, support, enhancements
• Mainstream / Enterprise adoption – Drive further development, enhancements

Virtuous Circle!
• Investment -> Adoption
• Adoption -> Investment

THE FUTURE OF HADOOP

MAKING HADOOP ENTERPRISE-READY
WHAT’S NEXT

Hadoop is far from “done”
• Current implementation is showing its age
• Need to address several deficiencies in scalability, flexibility, ease of use & performance

Yahoo! is working on the Next Generation of Hadoop
• MapReduce: Rewrite to improve performance; pluggable support for new programming models
• HDFS: Adding volumes to improve scalability; flush & sync support for applications that log to HDFS

Apache should remain the hub of the Hadoop ecosystem
• Yahoo! contributes all Hadoop changes back to Apache Hadoop
• Everyone benefits from a shared, neutral foundation

Questions?

 

Recently uploaded

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Recently uploaded (20)

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Yahoo & Hadoop

  • 1. YAHOO & HADOOP: USING AND IMPROVING APACHE HADOOP AT YAHOO! Eric Baldeschwieler, VP, Hadoop Software. © 2011 IBM Corporation
  • 2. AGENDA: Brief Overview; Hadoop @ Yahoo!; Hadoop Momentum; The Future of Hadoop.
  • 3. what’s happening: Big Data is here! Unstructured data; petabyte scale; operationally critical. (Photo: Flickr / sub_lime79)
  • 4. turning data into insights: machine learning, logistic regression, time series, content clustering, algorithms, ad inventory modeling, user interest prediction, factorization models. (Photo: Flickr / NASA Goddard Photo and Video)
  • 5. making YAHOO relevant (Photo: Flickr / ogimogi)
  • 6. hadoop: Powering Yahoo! science + big data + insight = personal relevance = VALUE (Photo: Flickr / DDFic)
  • 7. WHAT IS HADOOP? Stack: Pig and Hive on top of MapReduce, on top of HDFS. Runs on commodity computers and network. Focus on simplicity, redundancy, scale, and availability. Transforms commodity equipment into a service that: HDFS stores petabytes of data reliably; MapReduce allows huge distributed computations. Key attributes: redundant and reliable (doesn’t stop or lose data even as hardware fails); easy to program (our rocket scientists use it directly!); very powerful (allows the development of big data algorithms & tools); batch-processing centric.
  • 8. WHAT HADOOP ISN’T: a replacement for relational and data warehouse systems; a transactional / online / serving system; a low-latency or streaming solution.
  • 9. HADOOP IN THE ENTERPRISE. Diagram components: business intelligence applications; Hadoop cluster(s) for interactions (semi-structured or unstructured data: web logs, server logs, social media, etc.); RDBMS, EDW, and data marts for transactions (structured data from business applications).
  • 10. HADOOP @ YAHOO!
  • 11. HADOOP @ YAHOO! “Where Science meets Data.” Products: data analytics, content optimization, content enrichment, Yahoo! Mail anti-spam, advertising products, ad optimization, ad selection. Hadoop clusters: tens of thousands of servers; 10s of petabytes; big data processing & ETL. Applied science: user interest prediction, ad inventory prediction, and machine learning for search ranking, ad targeting, and spam filtering. Data flowing in: dimensional data, content, and data pipelines (terabytes/day, compressed).
  • 12. FROM PROJECT TO CORE PLATFORM [Chart: thousands of servers and petabytes of storage, 2006 to 2010.] Now: 40K+ servers, 170 PB storage, 5M+ monthly jobs.
  • 13. HADOOP POWERS THE YAHOO! NETWORK: advertising optimization; data analytics; machine learning; search ranking; advertising data systems; Yahoo! Mail anti-spam; audience, ad, and search pipelines; ad selection; Yahoo! Homepage content optimization; ad inventory prediction; user interest prediction.
  • 14. CASE STUDY: YAHOO! HOMEPAGE. Personalized for each visitor. Result: twice the engagement. Recommended links: +79% clicks vs. randomly selected. News interests: +160% clicks vs. one size fits all. Top searches: +43% clicks vs. editor selected.
  • 15. CASE STUDY: YAHOO! HOMEPAGE. Science Hadoop cluster: machine learning builds ever better categorization models from user behavior (weekly). Production Hadoop cluster: identifies user interests using the categorization models and produces serving maps of users to interests (every 5 minutes). Serving systems: build customized home pages for engaged users with the latest data (thousands / second).
  • 16. CASE STUDY: YAHOO! MAIL. Enabling quick response in the spam arms race: 450M mailboxes; 5B+ deliveries/day; antispam models retrained every few hours on Hadoop (diagram labels: science, production). “40% less spam than Hotmail and 55% less spam than Gmail.”
  • 17. YAHOO! & APACHE HADOOP. Yahoo! has contributed 70+% of Apache Hadoop code to date. Hadoop is not our business, but Hadoop is key to our business: Yahoo! benefits from the open source ecosystem around Hadoop, and Hadoop drives revenue at Yahoo! by making our core products better. We need Hadoop to be rock solid: we invest heavily in core Hadoop development and focus on scalability, reliability, and availability. We fix bugs before you see them: we run very large clusters, have a large QA effort, and run a huge variety of workloads. We are good Apache Hadoop citizens: we contribute our work to Apache and share the exact code we run.
  • 18. HADOOP MOMENTUM
  • 19. HADOOP IS GOING MAINSTREAM [Chart, 2007 to 2010. Source: The Datagraph Blog.]
  • 20. THE PLATFORM EFFECT: BIRTH OF AN ECOSYSTEM. Diagram around Apache Hadoop: [logo] and other early adopters scale and productize Hadoop; orgs with Internet-scale problems add tools / frameworks and enhance Hadoop; ecosystem service providers grow the ecosystem (training, support, enhancements); mainstream / enterprise adoption drives further development and enhancements. Virtuous circle! Investment -> adoption; adoption -> investment.
  • 21. THE FUTURE OF HADOOP
  • 22. MAKING HADOOP ENTERPRISE-READY: WHAT’S NEXT. Hadoop is far from “done”: the current implementation is showing its age, and several deficiencies in scalability, flexibility, ease of use, and performance need to be addressed. Yahoo! is working on the next generation of Hadoop. MapReduce: a rewrite to improve performance, with pluggable support for new programming models. HDFS: adding volumes to improve scalability; flush & sync support for applications that log to HDFS. Apache should remain the hub of the Hadoop ecosystem: Yahoo! contributes all Hadoop changes back to Apache Hadoop, and everyone benefits from a shared, neutral foundation.
  • 23. Questions?
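Slide 7 describes MapReduce only as allowing “huge distributed computations.” As a rough illustration, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word count as the example. It is not the Hadoop API: real Hadoop jobs are usually written against the Java MapReduce API or expressed in Pig or Hive, and the function names below are hypothetical. On a cluster, many map and reduce calls would run in parallel on different machines, with the framework performing the shuffle between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key; in Hadoop the framework does this step.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine every value seen for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hypothetical driver wiring the three phases together in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is here", "Big data at petabyte scale"]
    print(word_count(docs))
```

Because each map call sees only one record and each reduce call sees only one key, neither needs shared state, which is what lets the real system spread the work across commodity machines.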
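Slide 15 splits the homepage pipeline into two cadences: a science cluster that rebuilds categorization models weekly, and a production cluster that applies those models every five minutes to produce serving maps of users to interests. The sketch below mimics that split in plain Python under loudly simplified assumptions: the “model” is a hypothetical term-to-category lookup learned by majority vote, the `(user, term)` event format is invented for illustration, and none of this reflects Yahoo!’s actual machine-learning method.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical event format: (user_id, term), e.g. a clicked keyword.
Event = Tuple[str, str]


def build_model(labeled: List[Tuple[str, str]]) -> Dict[str, str]:
    """Slow path (weekly on the slide): learn term -> category by majority vote.

    A toy stand-in for a real machine-learned categorization model.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for term, category in labeled:
        votes[term][category] += 1
    return {term: counts.most_common(1)[0][0] for term, counts in votes.items()}


def build_serving_map(
    model: Dict[str, str], events: List[Event], top_k: int = 2
) -> Dict[str, List[str]]:
    """Fast path (every 5 minutes on the slide): map each user to top interests."""
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for user, term in events:
        category = model.get(term)
        if category is not None:  # terms the model has never seen are ignored
            per_user[user][category] += 1
    return {
        user: [category for category, _ in counts.most_common(top_k)]
        for user, counts in per_user.items()
    }


if __name__ == "__main__":
    weekly_model = build_model(
        [("nba", "sports"), ("goal", "sports"), ("ipo", "finance"), ("nasdaq", "finance")]
    )
    recent_events = [("u1", "nba"), ("u1", "goal"), ("u1", "ipo"), ("u2", "nasdaq"), ("u2", "unknown")]
    print(build_serving_map(weekly_model, recent_events))
```

The point the sketch tries to capture is that the expensive step (building the model) and the cheap step (applying it to fresh behavior) can run on different schedules, so serving maps stay current even though the model itself changes only weekly.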