Leveraging Open Source Big Data Stack
                                              Prasanth M Sasidharan




                 Copyright © 2011 Flytxt B.V. All rights reserved   1/16/2012
What is data?
       Data is information in raw or unorganized form, such as letters,
        numbers, or symbols.


What is Big data?
        Big Data refers to datasets so large that they are difficult to store,
         manage, and analyze.

        Every day, we create 2.5 quintillion bytes of data, so much that 90% of
         the data in the world today has been created in the last two years alone.




Data Explosion!
Global Data Trends




Big Data & Distributed Computing
    Multiple servers, each working on part of the job and each performing the same task.
    Key Challenges:
       • Work distribution and orchestration
       • Error recovery
       • Scalability and management
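
The three challenges above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for servers, and the names `split`, `count_words`, `run_with_retry`, and `run_job` are hypothetical helpers, not part of any framework mentioned in these slides.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Work distribution: cut data into at most `parts` roughly equal chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_words(lines):
    """The single task that every worker runs on its own chunk."""
    return sum(len(line.split()) for line in lines)


def run_with_retry(task, chunk, attempts=3):
    """Error recovery: re-run a failed task a few times before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == attempts:
                raise


def run_job(data, workers=4):
    """Orchestration: hand out chunks, collect partial results, combine them."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: run_with_retry(count_words, c), chunks))
    return sum(partials)


lines = ["big data needs many servers", "each server does the same task"] * 50
total = run_job(lines)
print(total)
```

Raising `workers` is the scalability knob in this sketch; on a real cluster the same roles (splitting, retrying, combining) are handled by the framework rather than by application code.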




FOSS in Aadhaar
    Aadhaar is a 12-digit unique number which the Unique Identification Authority
    of India (UIDAI) will issue to all residents of India.

    The number will be stored in a centralized database and linked to the basic
    demographics and biometric information – photograph, ten fingerprints and iris
    – of each individual.

    It is unique and robust enough to eliminate the large number of duplicate and
    fake identities in government and private databases




Let's Meet a Stack!

 Application Layer

 Infrastructure Layer




Infrastructure for Big Data Analysis
     What is Virtualization?
      Virtualization allows multiple operating system instances to run
      concurrently on a single computer; it is a means of separating
      hardware from a single operating system.




What is a Hypervisor?
   ◦ A hypervisor, also called a virtual machine manager (VMM), is one of many
     hardware virtualization techniques that allow multiple operating systems,
     termed guests, to run concurrently on a host computer

   ◦ Hypervisors originated on IBM mainframes in the 1960s, with systems such as
     CP-40 and CP-67 running on System/360 hardware




   Xen® hypervisor




Advantages of FOSS

   Flexibility and Freedom



   Reliability


   Auditability


   Fast Deployment



   Cost




Cost For Reproducing YouTube

                              Capital Expenditures ($M)       Annual Expenses, excl. HW Support ($M)
   System                     Hardware   Software   Total     Staff   Support   Total

   Oracle Exadata             $147.4     $442.0     $589.4    $1.6    $97.4     $99.0
   Alternative: open source,  $104.2     $0.0       $104.2    $2.2    $12.9     $15.1
   commodity hardware
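
The arithmetic behind the table above can be checked with a short Python sketch. The figures are copied from the slide; the three-year horizon used for the combined total is an assumption made only for illustration, and the helper names are hypothetical.

```python
# All figures in $M, copied from the table above.
costs = {
    "Oracle Exadata": {
        "hardware": 147.4, "software": 442.0, "staff": 1.6, "support": 97.4,
    },
    "Open source on commodity hardware": {
        "hardware": 104.2, "software": 0.0, "staff": 2.2, "support": 12.9,
    },
}


def capex(c):
    """One-time capital expenditure: hardware plus software."""
    return c["hardware"] + c["software"]


def annual(c):
    """Recurring yearly expense: staff plus support."""
    return c["staff"] + c["support"]


def total_cost(c, years):
    """Capital expenditure plus `years` of recurring expenses."""
    return capex(c) + years * annual(c)


YEARS = 3  # assumed horizon, not stated on the slide
for name, c in costs.items():
    print(f"{name}: capex ${capex(c):.1f}M, "
          f"annual ${annual(c):.1f}M, "
          f"{YEARS}-year total ${total_cost(c, YEARS):.1f}M")
```

The computed capital and annual totals match the Total columns on the slide, and the gap between the two options comes almost entirely from the software and support lines.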




Get Involved!
  Find out about Apache projects (http://projects.apache.org/)
  Join the mailing lists
  Pick up a bug
  Suggest ideas or fixes
  Check out the latest code or download a release
  Change the source files to incorporate your change or addition
  Provide appropriate source code documentation and follow the project's
   coding conventions
  Check whether the software still compiles and runs correctly
  Run any unit or regression tests the software may have
  Send the patch for review and committing
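
To make the last step concrete, the sketch below shows what a patch contains: a unified diff between the original and the modified source. It uses Python's standard `difflib` module purely for illustration; in practice a contributor would normally generate the patch with the project's version control tool. The file name `greet.py` and its contents are made up.

```python
import difflib

# Hypothetical source file before and after the contributor's change.
original = [
    "def greet(name):\n",
    "    print('Hello ' + name)\n",
]
modified = [
    "def greet(name):\n",
    "    \"\"\"Print a greeting for name.\"\"\"\n",
    "    print('Hello, ' + name)\n",
]

# A patch is a unified diff: removed lines start with '-', added lines with '+'.
patch = "".join(
    difflib.unified_diff(original, modified,
                         fromfile="a/greet.py", tofile="b/greet.py")
)
print(patch)
```

Reviewers read exactly this kind of output, which is why small, focused changes that follow the project's coding conventions are easier to get committed.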


Notable Users of Hadoop
(Source: http://en.wikipedia.org/wiki/Hadoop)

    •    Adobe                                                                •   Meebo
    •    Amazon                                                               •   The New York Times
    •    AOL                                                                  •   Rackspace
    •    eBay                                                                 •   StumbleUpon
    •    Facebook                                                             •   Twitter
    •    Fox Interactive Media                                                •   Yahoo
    •    IBM
    •    Last.fm
    •    LinkedIn

References
        • Hadoop: The Definitive Guide (MapReduce for the Cloud)

        • HBase: The Definitive Guide

        • Hive Wiki (http://wiki.apache.org/hadoop/Hive)

        • Pig Wiki (http://wiki.apache.org/pig/)



Open Source Initiatives @ FlyTXT
    Customization Specific to our business lines

    Mahout Enhancements for additional Machine Learning Algorithms

    Hive Customization

    Oozie Enhancements

    Hadoop Enhancements

    We won the IEEE cloud computing challenge




THANK YOU




Extra Slides




Major Contributors to Hadoop




Quantity of Global Data (Exabytes)

   Year      Exabytes
   2005         130
   2012       2,720
   2015*      7,910
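
The growth implied by these three data points can be worked out with a short Python sketch. The values are copied from the chart (the 2015 figure is starred in the source); the compound annual growth rate is a derived quantity that does not appear on the slide, and the function names are hypothetical.

```python
# Global data volume in exabytes, as read from the chart above.
data_eb = {2005: 130, 2012: 2720, 2015: 7910}


def growth_factor(start_year, end_year):
    """How many times larger the end-year volume is than the start-year volume."""
    return data_eb[end_year] / data_eb[start_year]


def cagr(start_year, end_year):
    """Compound annual growth rate between two years, as a fraction."""
    years = end_year - start_year
    return growth_factor(start_year, end_year) ** (1 / years) - 1


for start, end in [(2005, 2012), (2012, 2015)]:
    print(f"{start}-{end}: x{growth_factor(start, end):.1f} overall, "
          f"{cagr(start, end):.1%} per year")
```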




           Copyright © 2011 Flytxt B.V. All rights reserved            1/16/2012   19
Numbers Behind the News!


    Twitter produces over 230 million tweets per day


    Wal-Mart logs one million transactions per hour


    Facebook creates over 30 billion pieces of content, including
     web links, news, blogs, and photos


    India's mobile subscription base stands at 873.61 million


    India has a population of 1.21 billion
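
Converting these headline figures to per-second rates gives a better feel for the load a system has to absorb. The sketch below uses only the numbers quoted above and assumes, for simplicity, that activity is spread evenly over time; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60

# Figures quoted on the slide.
tweets_per_day = 230_000_000
walmart_tx_per_hour = 1_000_000
india_mobile_subscriptions = 873_610_000
india_population = 1_210_000_000

# Average rates, assuming an even spread over time (real traffic is bursty).
tweets_per_second = tweets_per_day / SECONDS_PER_DAY
walmart_tx_per_second = walmart_tx_per_hour / SECONDS_PER_HOUR

# Subscriptions are not unique users, so this is a density, not a headcount.
subscriptions_per_100_people = 100 * india_mobile_subscriptions / india_population

print(f"Tweets per second:            {tweets_per_second:,.0f}")
print(f"Wal-Mart transactions/second: {walmart_tx_per_second:,.0f}")
print(f"Subscriptions per 100 people: {subscriptions_per_100_people:.1f}")
```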
Let's Meet the Big Data Stack
 •   Oozie – Open-source workflow/coordination service for
     managing data processing jobs on Apache Hadoop™.
     Developed at Yahoo!

 •   HBase – Column-store database based on Google's
     BigTable. Holds extremely large data sets (petabytes)

 •   Hive – SQL-based data warehousing application with features
     for analyzing very large data sets. Developed at Facebook

 •   ZooKeeper – Distributed consensus engine providing
     leader election, service discovery, and distributed locking /
     mutual exclusion

 •   Pig – Platform for analyzing large data sets, built around a
     high-level language for expressing data analysis steps

 •   Ganglia – Scalable distributed monitoring system for high-
     performance computing systems such as clusters and grids

 •   Apache Mahout – Free implementations of distributed or
     otherwise scalable machine learning algorithms on
     the Hadoop platform


DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Último (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Leveraging open source for big data stack

  • 1. Leveraging Open Source Big Data Stack Prasanth M Sasidharan Copyright © 2011 Flytxt B.V. All rights reserved 1/16/2012
  • 2. What is data? Data is information in raw or unorganized form, such as letters, numbers, or symbols. What is Big Data? Big Data refers to datasets so large that they are difficult to store, manage, and analyze. Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
  • 4. Global Data Trends
  • 5. Big Data & Distributed Computing: Multiple servers each work on part of a job, and each performs the same task. Key challenges: • Work distribution and orchestration • Error recovery • Scalability and management
  • 6. FOSS in Aadhaar: Aadhaar is a 12-digit unique number that the Unique Identification Authority of India (UIDAI) will issue to all residents of India. The number will be stored in a centralized database and linked to each individual's basic demographic and biometric information: photograph, ten fingerprints, and iris. It is unique and robust enough to eliminate the large number of duplicate and fake identities in government and private databases.
  • 7. Let's Meet a Stack! Application Layer; Infrastructure Layer
  • 8. Infrastructure for Big Data Analysis. What's virtualization? Virtualization allows multiple operating system instances to run concurrently on a single computer; it is a means of separating hardware from a single operating system.
  • 9. What's a hypervisor? ◦ Also called a virtual machine manager (VMM), a hypervisor is one of many hardware virtualization techniques that allow multiple operating systems, termed guests, to run concurrently on a host computer ◦ Originally developed in the 1970s as part of the IBM S/360 ◦ Xen® hypervisor
  • 10. Advantages of FOSS: flexibility and freedom; reliability; auditability; fast deployment; cost
  • 11. Cost for Reproducing YouTube (capital expenditures in $M; annual expenses, excluding hardware support, in $M)
      System                                       | CapEx hardware | CapEx software | CapEx total | Annual staff | Annual support | Annual total
      Oracle Exadata                               | $147.4         | $442.0         | $589.4      | $1.6         | $97.4          | $99.0
      Alternative: open source, commodity hardware | $104.2         | $0.0           | $104.2      | $2.2         | $12.9          | $15.1
  • 12. Get Involved! Find out about Apache projects (http://projects.apache.org/). Join mailing lists. Pick up a bug. Suggest ideas or fixes. Check out the latest code or download releases. Change the source files to incorporate your change or addition. Provide appropriate source code documentation and follow the project's coding conventions. Check whether the software still compiles and runs correctly. Run any unit or regression tests the software may have. Send the patch for review and committing.
  • 13. Notable Users of Hadoop (source: http://en.wikipedia.org/wiki/Hadoop): Adobe, Amazon, AOL, eBay, Facebook, Fox Interactive Media, IBM, Last.fm, LinkedIn, Meebo, The New York Times, Rackspace, StumbleUpon, Twitter, Yahoo. References: • Hadoop: The Definitive Guide - MapReduce for the Cloud • HBase: The Definitive Guide • Hive Wiki (http://wiki.apache.org/hadoop/Hive) • Pig Wiki (http://wiki.apache.org/pig/)
  • 14. Open Source Initiatives @ Flytxt: customization specific to our business lines; Mahout enhancements for additional machine learning algorithms; Hive customization; Oozie enhancements; Hadoop enhancements. We won the IEEE cloud computing challenge.
  • 15. THANK YOU
  • 16. Extra Slides
  • 17. Major Contributors to Hadoop
  • 19. Quantity of Global Data (exabytes): 2005: 130; 2012: 2,720; 2015*: 7,910
  • 20. Numbers behind the News! Twitter produces over 230 million tweets per day. Wal-Mart logs one million transactions per hour. Facebook creates over 30 billion pieces of content, ranging from web links, news, and blogs to photos. India's mobile subscription base stands at 873.61 million users. India has a population of 1.21 billion.
  • 21. Let's Meet the Big Data Stack • Oozie: open-source workflow/coordination service to manage data processing jobs for Apache Hadoop™, developed at Yahoo! • HBase: column-store database based on Google's Bigtable; holds extremely large data sets (petabytes) • Hive: SQL-based data warehousing application with features for analyzing very large data sets, developed at Facebook • ZooKeeper: distributed consensus engine providing leader election, service discovery, and distributed locking / mutual exclusion • Pig: platform for analyzing large data sets that consists of a high-level language for expressing data analysis steps • Ganglia: scalable distributed monitoring system for high-performance computing systems such as clusters and grids • Apache Mahout: free implementations of distributed or otherwise scalable machine learning algorithms on the Hadoop platform

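Slide 5 describes the distributed model as multiple servers, each working on part of a job and each doing the same task. A minimal Python sketch of that idea follows; it is illustrative only and is not taken from the deck. Threads on one machine stand in for servers, and the names `count_words`, `split_job`, and `run_job` are hypothetical. It covers only work distribution and merging, not error recovery or cluster management.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """The single task that every worker runs on its own part of the job."""
    return Counter(word for line in chunk for word in line.lower().split())


def split_job(lines, parts):
    """Divide the input into roughly equal, contiguous parts."""
    size = max(1, -(-len(lines) // parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def run_job(lines, workers=3):
    """Distribute the parts, run the same task on each, then merge results."""
    total = Counter()
    # Assumption: local threads stand in for the servers named on the slide.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, split_job(lines, workers)):
            total.update(partial)
    return total


log_lines = [
    "hadoop stores big data",
    "hive queries big data",
    "pig analyzes big data",
    "oozie schedules hadoop jobs",
]
counts = run_job(log_lines)
print(counts["big"], counts["hadoop"])
```

Splitting first and merging last keeps each worker independent, which is the property that lets frameworks such as Hadoop rerun a failed part without redoing the whole job.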
Editor's notes

  1. An exabyte is 1 billion gigabytes. At 7,910 exabytes, the digital universe holds 3 times more bits of information than there are stars in the physical universe
  2. Indian telecom added 7.9 million new subscribers in September. The Indian population figure can be related to the Aadhaar project
  3. A mahout is a person who drives an elephant. Analogy: an algorithm for catching a taxi from the airport
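The unit conversion in note 1 and the figures on slide 19 can be checked with a little arithmetic. The sketch below uses only the numbers given there (130, 2,720, and 7,910 exabytes) and the stated ratio of 1 exabyte to 1 billion gigabytes; the names `eb_to_gb` and `growth_factor` are hypothetical helpers, and the 2015 value is the one the slide marks with an asterisk.

```python
GB_PER_EB = 10**9  # note 1: an exabyte is 1 billion gigabytes

# Slide 19 figures, in exabytes; the slide marks 2015 with an asterisk.
global_data_eb = {2005: 130, 2012: 2720, 2015: 7910}


def eb_to_gb(exabytes):
    """Convert exabytes to gigabytes using the ratio from note 1."""
    return exabytes * GB_PER_EB


def growth_factor(start_year, end_year):
    """How many times larger the end-year figure is than the start-year one."""
    return global_data_eb[end_year] / global_data_eb[start_year]


for year, eb in global_data_eb.items():
    print(year, f"{eb_to_gb(eb):,} GB")

print(round(growth_factor(2005, 2012), 1))  # roughly 21-fold
print(round(growth_factor(2005, 2015), 1))  # roughly 61-fold
```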