Tackling Big Data with Hadoop and
Graphical Open Source Integration




                             Michaël Hirt
        Data Integration Product Manager
Agenda



1. What is Big Data?
2. Talend’s Goal
3. What’s next? Big Data Quality and Big Data Management
4. Talend Open Studio for Big Data in action



© Talend 2011                                               2
What is Big Data?
What Is BIG Data?
    "Big data" is information of extreme size, diversity, complexity
    and need for rapid processing.
    Ted Friedman - Information Infrastructure and Big Data Projects
    Key Initiative Overview - July 2011

    • 2,300 tweets per second (June 2011)
    • 50 gigabytes of data per person on Earth (50,000,000,000),
      300 exabytes total
    • 2015: 200 billion intelligent devices (200,000,000,000)
    • 2020: 275 exabytes of data flowing over the Internet each day
      (275,000,000,000,000,000,000)
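The figures on this slide can be sanity-checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, which is what the spelled-out numbers suggest; the variable names are illustrative and not from the original deck.

```python
# Decimal (SI) byte units, assumed here because the slide's spelled-out
# figures are powers of ten.
GB = 10**9
EB = 10**18

per_person_bytes = 50 * GB   # "50 gigabytes of data per person on Earth"
total_bytes = 300 * EB       # "300 exabytes total"

# Population implied by taking the two figures together.
implied_population = total_bytes // per_person_bytes

# "275 exabytes ... each day", written out in bytes.
daily_internet_bytes = 275 * EB

print(f"{implied_population:,}")    # 6,000,000,000
print(f"{daily_internet_bytes:,}")  # 275,000,000,000,000,000,000
```

Under that assumption, the per-person and total figures are consistent with a world population of about 6 billion, and the long number on the slide is 275 exabytes expressed in bytes.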
How to define Big Data is…
  [Image caption: Hans Rosling uses big data to analyze world health trends]




     Key Takeaway #1
    volume, variety, velocity

© Talend 2011 – Strictly Private & Confidential
The 6 Dimensions of BIG Data
  Primary challenges
    Volume
    Velocity
    Variety
    Complexity


  And also
    Validation
    Lineage


Key Takeaway #2
                 Forces us to think differently
Traditional Data Flows

  [Diagram: CRM, ERP and Finance feed ETL and Data Quality, producing Normalized Data in a Traditional Data Warehouse used by the Warehouse Administrator, Business Analyst, Business User and Executives]

 • Scheduled daily or weekly, sometimes more frequently.
 • Volumes rarely exceed terabytes.
The new world of big data

  [Diagram: CRM, ERP and Finance, joined by Social Networking, Mobile Devices, Transactions, Network Devices and Sensors, all feeding Big Data]
Data driven business

  [Diagram: data, governance, information, decisions and your business, linked by arrows labeled "enables", "supports" and "drives"]

  Information provides value to the business.
  If you can't rely on your information then the result can be missed
  opportunities, or higher costs.
      Matthew West and Julian Fowler (1999). Developing High Quality Data Models.
      The European Process Industries STEP Technical Liaison Executive (EPISTLE).
BIG Data Management

  Big Data Production:
    RDBMS, Analytical DB, NoSQL DB, ERP/CRM, SaaS, Social Media,
    Web Analytics, Log Files, RFID, Call Data Records, Sensors,
    Machine-Generated

  Big Data Management:
    Big Data Integration: Storage, Processing, Filtering
    Big Data Quality
    Big Data Consumption: Mining, Analytics, Search, Enrichment

  Turn Big Data into actionable information
BIG data driven business

  [Diagram: the same flow scaled up: BIG data, governance, BIG information, BIG decisions and BIG business, linked by arrows labeled "enables", "supports" and "drives"]
Our goal
Talend – The Market Leading Unified Integration Platform

  Talend Enterprise (commercial license, subscription model)
    Data Quality, Data Integration, MDM, ESB, BPM

  Studio, Repository, Deployment, Execution, Monitoring

  Talend Open Studio for (open source license, free of charge, optional support)
    Data Quality, Data Integration, MDM, ESB

Recognized as the open source leader in each of its market
            categories by all industry analysts
Trying to get from this…




to this…




 Why Talend…

 Only Talend generates code that is executed within MapReduce. This
 open approach removes the limitation of a proprietary “engine” and
 provides a unique and powerful set of tools for big data.




“Big Data for the Masses”
Goal: Democratize Big Data

Talend Open Studio for Big Data
  • Improves efficiency of big data job design with graphic interface
  • Abstracts and generates code
  • Runs transforms inside Hadoop
  • Native support for HDFS, Pig, HBase, Sqoop and Hive
  • Apache License 2.0
  • Embedded in Hortonworks Data Platform
  • Certified with Cloudera, MapR and Greenplum

  …an open source ecosystem
Big Data – How about Data Quality?




© Talend 2012
Poor Data Quality + Big Data = Big Problems
Poor Data Quality * Big Data = Big Problems^2




           Key Takeaway #3
           In big data…
           poor data quality can be magnified at huge scale

Two methods for inserting data quality into a big data job




 1. Pipelining: clean the data as part of the load process


 2. Load the cluster, then implement and execute
    a data quality MapReduce job




E-T-L
      Extract – Transform – Load

E-DQ-L
      Extract – Improve/Cleanse – Load
Pipelining: data quality with big data

  [Diagram: CRM, ERP, Finance, Social Networking and Mobile Devices pass through DQ steps on the way into Big Data]

  • Use traditional data quality tools
  • No new programming, no PhDs
  • Once and done
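The pipelining approach can be pictured as an Extract, Cleanse, Load chain in which records are improved before they reach the cluster. The following is only a minimal sketch in plain Python, not Talend-generated code; the field names, the cleansing rules and the in-memory `warehouse` list standing in for the big data store are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]

def extract(rows: Iterable[Record]) -> Iterator[Record]:
    """Stream records from a source system (here, an in-memory list)."""
    yield from rows

def cleanse(rows: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical data quality step: normalize, validate, de-duplicate."""
    seen_emails = set()
    for row in rows:
        email = row.get("email", "").strip().lower()
        if "@" not in email:
            continue  # drop records that fail the (assumed) validity rule
        if email in seen_emails:
            continue  # drop duplicates of an already accepted record
        seen_emails.add(email)
        name = row.get("name", "").strip().title()
        yield {**row, "email": email, "name": name}

def load(rows: Iterable[Record], target: List[Record]) -> int:
    """Append cleansed records to the target and return how many were loaded."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count

source_rows = [
    {"name": " ada lovelace ", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "no email", "email": ""},
]
warehouse: List[Record] = []
loaded = load(cleanse(extract(source_rows)), warehouse)
print(loaded, warehouse)
```

Because the cleansing step sits between extract and load, only the single valid, de-duplicated record reaches the target, which is the "once and done" property listed on the slide.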
Big data alternative: Load and improve within the cluster

  [Diagram: CRM, ERP, Finance, Social Networking and Mobile Devices load into Big Data, where the DQ steps run]

  • Load first, improve later
  • Really complex to build, limited tools
  • Constantly on, incremental
  • Insane performance
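To make the "load first, improve later" idea concrete, here is a small in-process simulation of a map, shuffle and reduce job that de-duplicates records already sitting in the cluster. It is a sketch under stated assumptions: it does not use the Hadoop API, the function names `map_phase`, `reduce_phase` and `run_job` are hypothetical, and the "keep the most complete record" rule is just one possible data quality policy.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]

def map_phase(record: Record) -> Iterator[Tuple[str, Record]]:
    """Emit (normalized email, record); records without an email are dropped."""
    key = record.get("email", "").strip().lower()
    if key:
        yield key, record

def reduce_phase(key: str, records: List[Record]) -> Record:
    """Keep the most complete record in the group (most non-empty fields)."""
    best = max(records, key=lambda r: sum(1 for v in r.values() if v.strip()))
    return {**best, "email": key}

def run_job(records: Iterable[Record]) -> List[Record]:
    """Simulate map, shuffle (group by key) and reduce in a single process."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return [reduce_phase(key, groups[key]) for key in sorted(groups)]

# Raw data as it might look after being loaded without any cleansing.
raw = [
    {"email": "Ada@Example.com", "name": ""},
    {"email": "ada@example.com ", "name": "Ada Lovelace"},
    {"email": "", "name": "Unknown"},
    {"email": "bob@example.com", "name": "Bob"},
]
cleaned = run_job(raw)
print(cleaned)
```

On a real cluster the map and reduce functions would run in parallel across nodes, which is where the performance noted on the slide comes from; the grouping logic stays the same.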
Let us show you…




What’s next for Talend Big Data?




Talend Open Studio for Big Data




  Releases:
  • 4.0: HDFS
  • 4.1: Hive & Sqoop
  • 4.2: Pig
  • 5.0: HBase
  • 5.1: HCatalog & Oozie

  [Timeline: big data, 2012, now, Q4, 2013]

Talend Open Studio for Big Data
Packaged within Hortonworks Data Platform
     …Eclipse tools for Hive, HDFS, Pig, Sqoop

     …supports Oozie, HCatalog, Kerberos


Free to download and use under the Apache license
 …democratizing big data through intuitive tools
Questions / Thanks for attending
                      mhirt_at_talend.com

Más contenido relacionado

La actualidad más candente

Hadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesHadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesDataWorks Summit
 
IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020Anjan Roy, PMP
 
Egress Switch Introduction
Egress Switch IntroductionEgress Switch Introduction
Egress Switch Introductionyonifine
 
ActuateOne for Utility Analytics
ActuateOne for Utility AnalyticsActuateOne for Utility Analytics
ActuateOne for Utility Analyticskatsoulis
 
Site/Location Hubs - A Hot Trend In Master Data Management (MDM)
Site/Location Hubs - A Hot Trend In Master Data Management (MDM)Site/Location Hubs - A Hot Trend In Master Data Management (MDM)
Site/Location Hubs - A Hot Trend In Master Data Management (MDM)Rhapsody Technologies, Inc.
 
Big Value from Big Data (CIO-Forum)
Big Value from Big Data (CIO-Forum)Big Value from Big Data (CIO-Forum)
Big Value from Big Data (CIO-Forum)Avanade Norway
 
The Future of ERP by Bertrand Andries
The Future of ERP by Bertrand Andries  The Future of ERP by Bertrand Andries
The Future of ERP by Bertrand Andries CONFENIS 2012
 
Presentation dell - into the cloud with dell
Presentation   dell - into the cloud with dellPresentation   dell - into the cloud with dell
Presentation dell - into the cloud with dellxKinAnx
 
Module 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience FinalModule 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience FinalVivastream
 
The Best Analytics Tools
The Best Analytics ToolsThe Best Analytics Tools
The Best Analytics ToolsDatalicious
 
Hadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotHadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotInside Analysis
 
Big Data Challenges
Big Data ChallengesBig Data Challenges
Big Data ChallengesDatalicious
 
Valuendo cyberwar and security (okt 2011) handout
Valuendo cyberwar and security (okt 2011) handoutValuendo cyberwar and security (okt 2011) handout
Valuendo cyberwar and security (okt 2011) handoutMarc Vael
 
Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop InnoTech
 
Smarter Computing in a New Era of IT - Dr. Gururaj Rao
Smarter Computing in a New Era of IT - Dr. Gururaj RaoSmarter Computing in a New Era of IT - Dr. Gururaj Rao
Smarter Computing in a New Era of IT - Dr. Gururaj RaoJyothi Satyanathan
 
SmartData - Monetizing Data Assets
SmartData - Monetizing Data AssetsSmartData - Monetizing Data Assets
SmartData - Monetizing Data AssetsEd Dodds
 

La actualidad más candente (20)

Hadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesHadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation Architectures
 
IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020IP&A109 Next-Generation Analytics Architecture for the Year 2020
IP&A109 Next-Generation Analytics Architecture for the Year 2020
 
Tera stream ETL
Tera stream ETLTera stream ETL
Tera stream ETL
 
101 ab 1415-1445
101 ab 1415-1445101 ab 1415-1445
101 ab 1415-1445
 
MDM - Oracle Site Hub 101
MDM - Oracle Site Hub 101MDM - Oracle Site Hub 101
MDM - Oracle Site Hub 101
 
Egress Switch Introduction
Egress Switch IntroductionEgress Switch Introduction
Egress Switch Introduction
 
ActuateOne for Utility Analytics
ActuateOne for Utility AnalyticsActuateOne for Utility Analytics
ActuateOne for Utility Analytics
 
Site/Location Hubs - A Hot Trend In Master Data Management (MDM)
Site/Location Hubs - A Hot Trend In Master Data Management (MDM)Site/Location Hubs - A Hot Trend In Master Data Management (MDM)
Site/Location Hubs - A Hot Trend In Master Data Management (MDM)
 
Big Value from Big Data (CIO-Forum)
Big Value from Big Data (CIO-Forum)Big Value from Big Data (CIO-Forum)
Big Value from Big Data (CIO-Forum)
 
The Future of ERP by Bertrand Andries
The Future of ERP by Bertrand Andries  The Future of ERP by Bertrand Andries
The Future of ERP by Bertrand Andries
 
Presentation dell - into the cloud with dell
Presentation   dell - into the cloud with dellPresentation   dell - into the cloud with dell
Presentation dell - into the cloud with dell
 
Module 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience FinalModule 3 Adapative Customer Experience Final
Module 3 Adapative Customer Experience Final
 
The Best Analytics Tools
The Best Analytics ToolsThe Best Analytics Tools
The Best Analytics Tools
 
Hadoop: What It Is and What It's Not
Hadoop: What It Is and What It's NotHadoop: What It Is and What It's Not
Hadoop: What It Is and What It's Not
 
Big Data Challenges
Big Data ChallengesBig Data Challenges
Big Data Challenges
 
Valuendo cyberwar and security (okt 2011) handout
Valuendo cyberwar and security (okt 2011) handoutValuendo cyberwar and security (okt 2011) handout
Valuendo cyberwar and security (okt 2011) handout
 
Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop
 
Smarter Computing in a New Era of IT - Dr. Gururaj Rao
Smarter Computing in a New Era of IT - Dr. Gururaj RaoSmarter Computing in a New Era of IT - Dr. Gururaj Rao
Smarter Computing in a New Era of IT - Dr. Gururaj Rao
 
Taming the Big Data Tsunami using Intel Architecture
Taming the Big Data Tsunami using Intel ArchitectureTaming the Big Data Tsunami using Intel Architecture
Taming the Big Data Tsunami using Intel Architecture
 
SmartData - Monetizing Data Assets
SmartData - Monetizing Data AssetsSmartData - Monetizing Data Assets
SmartData - Monetizing Data Assets
 

Destacado

OWF12/PAUG Conf Days Android system development, maxime ripard, free electrons
OWF12/PAUG Conf Days Android system development, maxime ripard, free electronsOWF12/PAUG Conf Days Android system development, maxime ripard, free electrons
OWF12/PAUG Conf Days Android system development, maxime ripard, free electronsParis Open Source Summit
 
OWF13 - Catalyzing the discovery, analysis and adoption of OSS community-ba...
OWF13 - Catalyzing the discovery, analysis and adoption of   OSS community-ba...OWF13 - Catalyzing the discovery, analysis and adoption of   OSS community-ba...
OWF13 - Catalyzing the discovery, analysis and adoption of OSS community-ba...Paris Open Source Summit
 
EOLE / OWF 12 - Foss licences before courts in europe-philippe laurent (eole2...
EOLE / OWF 12 - Foss licences before courts in europe-philippe laurent (eole2...EOLE / OWF 12 - Foss licences before courts in europe-philippe laurent (eole2...
EOLE / OWF 12 - Foss licences before courts in europe-philippe laurent (eole2...Paris Open Source Summit
 
OWF14 - Plenary Session : Patrice Bertrand, President, CNLL
OWF14 - Plenary Session : Patrice Bertrand, President, CNLLOWF14 - Plenary Session : Patrice Bertrand, President, CNLL
OWF14 - Plenary Session : Patrice Bertrand, President, CNLLParis Open Source Summit
 
OWF14 - Plenary Session : Jean-Baptiste KEMPF, Président VLC
OWF14 - Plenary Session : Jean-Baptiste KEMPF, Président VLCOWF14 - Plenary Session : Jean-Baptiste KEMPF, Président VLC
OWF14 - Plenary Session : Jean-Baptiste KEMPF, Président VLCParis Open Source Summit
 

Destacado (7)

OWF12/PAUG Conf Days Android system development, maxime ripard, free electrons
OWF12/PAUG Conf Days Android system development, maxime ripard, free electronsOWF12/PAUG Conf Days Android system development, maxime ripard, free electrons
OWF12/PAUG Conf Days Android system development, maxime ripard, free electrons
 
OWF13 - Catalyzing the discovery, analysis and adoption of OSS community-ba...
OWF13 - Catalyzing the discovery, analysis and adoption of   OSS community-ba...OWF13 - Catalyzing the discovery, analysis and adoption of   OSS community-ba...
OWF13 - Catalyzing the discovery, analysis and adoption of OSS community-ba...
 
EOLE / OWF 12 - Foss licences before courts in europe-philippe laurent (eole2...
EOLE / OWF 12 - Foss licences before courts in europe-philippe laurent (eole2...EOLE / OWF 12 - Foss licences before courts in europe-philippe laurent (eole2...
EOLE / OWF 12 - Foss licences before courts in europe-philippe laurent (eole2...
 
OWF12/Java Sacha labourey
OWF12/Java Sacha laboureyOWF12/Java Sacha labourey
OWF12/Java Sacha labourey
 
OWF14 - Plenary Session : Patrice Bertrand, President, CNLL
OWF14 - Plenary Session : Patrice Bertrand, President, CNLLOWF14 - Plenary Session : Patrice Bertrand, President, CNLL
OWF14 - Plenary Session : Patrice Bertrand, President, CNLL
 
Practica
PracticaPractica
Practica
 
OWF14 - Plenary Session : Jean-Baptiste KEMPF, Président VLC
OWF14 - Plenary Session : Jean-Baptiste KEMPF, Président VLCOWF14 - Plenary Session : Jean-Baptiste KEMPF, Président VLC
OWF14 - Plenary Session : Jean-Baptiste KEMPF, Président VLC
 

Similar a Tackling Big Data with Hadoop and Graphical Open Source Integration

Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forumbigdatawf
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessTeradata Aster
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshowAccenture
 
Investigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxInvestigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxData Science London
 
EDF2013 - Richard Benjamins: Big Data – Big opportunities – Big risks? And ...
EDF2013 - Richard Benjamins: Big Data –  Big opportunities –  Big risks? And ...EDF2013 - Richard Benjamins: Big Data –  Big opportunities –  Big risks? And ...
EDF2013 - Richard Benjamins: Big Data – Big opportunities – Big risks? And ...European Data Forum
 
Big Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentBig Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentStrategy 2 Market, Inc,
 
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageWebinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageCloudera, Inc.
 
Scenari evolutivi nello snellimento dei sistemi informativi
Scenari evolutivi nello snellimento dei sistemi informativiScenari evolutivi nello snellimento dei sistemi informativi
Scenari evolutivi nello snellimento dei sistemi informativiFondazione CUOA
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analyticsdmurph4
 
PromptCloud Nasscom Emerge 50 Presentation
PromptCloud Nasscom Emerge 50 PresentationPromptCloud Nasscom Emerge 50 Presentation
PromptCloud Nasscom Emerge 50 PresentationPromptCloud
 
Infor i: Setting The Scene. Infor is the largest IBM i ISV in the World.
Infor i: Setting The Scene. Infor is the largest IBM i ISV in the World.Infor i: Setting The Scene. Infor is the largest IBM i ISV in the World.
Infor i: Setting The Scene. Infor is the largest IBM i ISV in the World.Inforsystemi
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...Vladimir Bacvanski, PhD
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...DATAVERSITY
 
Big data appliances for BI on Cloud
Big data appliances for BI on CloudBig data appliances for BI on Cloud
Big data appliances for BI on Cloudtdwiindia
 
CII Panel Discussion on Cloud Computing
CII Panel Discussion on Cloud ComputingCII Panel Discussion on Cloud Computing
CII Panel Discussion on Cloud ComputingAnand Deshpande
 
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...Cloudera, Inc.
 

Similar a Tackling Big Data with Hadoop and Graphical Open Source Integration (20)

Big Data a big deal?
Big Data a big deal?Big Data a big deal?
Big Data a big deal?
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forum
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the Business
 
Hortonworks roadshow
Hortonworks roadshowHortonworks roadshow
Hortonworks roadshow
 
Accelerate Return on Data
Accelerate Return on DataAccelerate Return on Data
Accelerate Return on Data
 
Investigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxInvestigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists Toolbox
 
EDF2013 - Richard Benjamins: Big Data – Big opportunities – Big risks? And ...
EDF2013 - Richard Benjamins: Big Data –  Big opportunities –  Big risks? And ...EDF2013 - Richard Benjamins: Big Data –  Big opportunities –  Big risks? And ...
EDF2013 - Richard Benjamins: Big Data – Big opportunities – Big risks? And ...
 
Big Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentBig Data: A Big Trap for Product Development
Big Data: A Big Trap for Product Development
 
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data AdvantageWebinar | Using Hadoop Analytics to Gain a Big Data Advantage
Webinar | Using Hadoop Analytics to Gain a Big Data Advantage
 
Scenari evolutivi nello snellimento dei sistemi informativi
Scenari evolutivi nello snellimento dei sistemi informativiScenari evolutivi nello snellimento dei sistemi informativi
Scenari evolutivi nello snellimento dei sistemi informativi
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analytics
 
PromptCloud Nasscom Emerge 50 Presentation
PromptCloud Nasscom Emerge 50 PresentationPromptCloud Nasscom Emerge 50 Presentation
PromptCloud Nasscom Emerge 50 Presentation
 
Infor i: Setting The Scene. Infor is the largest IBM i ISV in the World.
Infor i: Setting The Scene. Infor is the largest IBM i ISV in the World.Infor i: Setting The Scene. Infor is the largest IBM i ISV in the World.
Infor i: Setting The Scene. Infor is the largest IBM i ISV in the World.
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
 
Big data appliances for BI on Cloud
Big data appliances for BI on CloudBig data appliances for BI on Cloud
Big data appliances for BI on Cloud
 
CII Panel Discussion on Cloud Computing
CII Panel Discussion on Cloud ComputingCII Panel Discussion on Cloud Computing
CII Panel Discussion on Cloud Computing
 
101 ab 1445-1515
101 ab 1445-1515101 ab 1445-1515
101 ab 1445-1515
 
101 ab 1445-1515
101 ab 1445-1515101 ab 1445-1515
101 ab 1445-1515
 
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
 

Tackling Big Data with Hadoop and Graphical Open Source Integration

  • 1. Tackling Big Data with Hadoop and Graphical Open Source Integration. Michaël Hirt, Data Integration Product Manager
  • 2. Agenda: 1. What is Big Data? 2. Talend’s Goal 3. What’s next? Big Data Quality and Big Data Management 4. Talend Open Studio for Big Data in action. © Talend 2011
  • 3. What is Big Data?
  • 4. What Is BIG Data? "Big data" is information of extreme size, diversity, complexity and need for rapid processing. (Ted Friedman, Information Infrastructure and Big Data Projects Key Initiative Overview, July 2011.) Figures on the slide: 2,300 tweets per second (June 2011); 50 gigabytes (50,000,000,000 bytes) of data per person on Earth, 300 exabytes total; 200 billion (200,000,000,000) intelligent devices; 275 exabytes (275,000,000,000,000,000,000 bytes) of data flowing over the Internet each day. Timeline labels: 2015, 2020.
  • 5. How to define big data… Hans Rosling uses big data to analyze world health trends. Key Takeaway #1: volume, variety, velocity. © Talend 2011 – Strictly Private & Confidential
  • 6. The 6 Dimensions of BIG Data. Primary challenges: Volume, Velocity, Variety, Complexity. And also: Validation, Lineage.
  • 7. Key Takeaway #2: Forces us to think differently
  • 8. Traditional Data Flows. Diagram labels: CRM, ERP, Finance; ETL; Traditional Data Quality; Normalized Data; Data Warehouse; Business Analyst, Business User, Warehouse Administrator, Executives. • Scheduled: daily or weekly, sometimes more frequently. • Volumes rarely exceed terabytes.
  • 9. The new world of big data. Diagram labels: Social Networking, CRM, ERP, Finance, Big Data.
  • 10. The new world of big data. Diagram labels: Social Networking, CRM, Mobile Devices, ERP, Finance, Big Data.
  • 11. The new world of big data. Diagram labels: Social Networking, CRM, Mobile Devices, ERP, Transactions, Finance, Network Devices, Sensors, Big Data.
  • 12. Data driven business. Diagram labels: enables, data, governance, supports, information, decisions, drives, your business. Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs. Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • 13. BIG Data Management. Big Data Production: RDBMS, Analytical DB, NoSQL DB, ERP/CRM, SaaS, Social Media, Web Analytics, Log Files, RFID, Call Data Records, Sensors, Machine-Generated. Big Data Management: Big Data Integration, Big Data Quality. Big Data Consumption. Other labels: Storage, Processing, Mining, Analytics, Search, Filtering, Enrichment. Turn Big Data into actionable information.
  • 14. BIG data driven business. Diagram labels: enables, BIG data, governance, supports, BIG information, BIG decisions, drives, BIG business. Information provides value to the business. If you can't rely on your information then the result can be missed opportunities, or higher costs. Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
  • 16. Talend – The Market Leading Unified Integration Platform. Talend Enterprise (Data Quality, Data Integration, MDM, ESB, BPM): commercial license, subscription model. Shared components: Studio, Repository, Deployment, Execution, Monitoring. Talend Open Studio for Data Quality, Data Integration, MDM, ESB: open source license, free of charge, optional support. Recognized as the open source leader in each of its market categories by all industry analysts.
  • 17. Trying to get from this…
  • 18. to this… Why Talend… ONLY Talend generates code that is executed within MapReduce. This open approach removes the limitation of a proprietary “engine” to provide a truly unique and powerful set of tools for big data.
  • 19. “Big Data for the Masses”
  • 20. Goal: Democratize Big Data. Talend Open Studio for Big Data: “Big Data for the Masses”; improves efficiency of big data job design with a graphical interface; abstracts and generates code; runs transforms inside Hadoop; native support for HDFS, Pig, HBase, Sqoop and Hive; Apache License 2.0; embedded in Hortonworks Data Platform; certified with Cloudera, MapR and Greenplum. …an open source ecosystem
  • 21. Big Data – How about Data Quality? © Talend 2012
  • 22. Poor Data Quality + Big Data = Big Problems. Poor Data Quality * Big Data = Big Problems^2. Key Takeaway #3: In big data… poor data quality can be magnified at huge scale.
  • 23. Two methods for inserting data quality into a big data job: 1. Pipelining: as part of the load process. 2. Load the cluster, then implement and execute a data quality MapReduce job.
  • 24. E-T-L: Extract, Transform, Load
  • 25. E-DQ-L: Extract, Improve/Cleanse, Load
  • 26. Pipelining: data quality with big data. Diagram labels: CRM, ERP, Finance, Social Networking, Mobile Devices; DQ; Big Data. • Use traditional data quality tools • No new programming, no PhDs • Once and done
  • 27. Big data alternative: Load and improve within the cluster. Diagram labels: CRM, ERP, Finance, Social Networking, Mobile Devices; Big Data; DQ. • Load first, improve later • Really complex to build, limited tools • Constant on, increments • Insane performance
  • 28. Let us show you…
  • 29. What’s next for Talend Big Data?
  • 30. Talend Open Studio for Big Data. 4.0: HDFS; 4.1: Hive & Sqoop; 4.2: Pig; 5.0: HBase; 5.1: HCatalog & Oozie.
  • 31. big data. Timeline labels: 2012, now, Q4, 2013. Talend Open Studio for Big Data: packaged within Hortonworks Data Platform; …Eclipse tools for Hive, HDFS, Pig, Sqoop; …supports Oozie, HCatalog, Kerberos; free to download and use under the Apache license; …democratizing big data through intuitive tools
  • 32. Questions / Thanks for attending mhirt_at_talend.com
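
Slides 23 to 26 describe pipelining, where records are cleansed between extract and load (E-DQ-L). The following is a minimal Python sketch of that idea, not Talend-generated code. The record fields (`name`, `email`), the cleansing rules and all function names are hypothetical, and a plain list stands in for the cluster.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def extract(rows: Iterable[Record]) -> Iterator[Record]:
    """Extract step: stream raw records one at a time."""
    for row in rows:
        yield row


def cleanse(rows: Iterable[Record]) -> Iterator[Record]:
    """DQ step (hypothetical rules): normalise fields, drop rows without a usable email."""
    for row in rows:
        email = row.get("email", "").strip().lower()
        if "@" not in email:
            continue  # reject the record before it ever reaches the target
        yield {"name": row.get("name", "").strip().title(), "email": email}


def load(rows: Iterable[Record], target: List[Record]) -> int:
    """Load step: append to the target (a stand-in for a cluster write); return the row count."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count


def run_pipeline(source: Iterable[Record], target: List[Record]) -> int:
    """Chain the three steps so cleansing happens as part of the load process."""
    return load(cleanse(extract(source)), target)
```

Because each step is a generator, records flow through one at a time and only cleansed records reach the target, which is the "once and done" property slide 26 attributes to pipelining.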

Editor's notes

  1. This is a classic diagram that maps how business and data are related. Nothing is new. This never changes. In fact, it becomes even more important today.
  2. We accomplish this innovation by offering two editions of our products. Talend Open Studio, at the bottom of this diagram, is a set of free open source products for Data Quality, Data Integration, Master Data Management, Enterprise Service Bus and Business Process Management. When you are ready to deploy, you can purchase a Talend Enterprise commercial license, which includes the features found in world-class integration solutions, such as extreme scalability, high availability and 24x7 mission-critical support, all backed by a large services and partner ecosystem. Unlike competitors' “non-integrated” integration products, Talend's products are unified: they are built from the same platform, maximizing your productivity and providing greater software reuse and repeatability. An analogy would be the user experience you see with the integration of the iPod, iPad and iPhone. As shown in this picture, our products share the same studio, repository, and deployment, execution and monitoring tools. Because the products are modular, you can buy what you need when you need it, or easily combine them to solve more comprehensive integration problems.
  3. For instance, this is a SIMPLE drawing of how MapReduce works. It is abstract and does not reflect the complexity of the code. Still pretty complex.
  4. Big data has an operational DI (data integration) challenge. This is the core of what Talend was built on and part of our DNA. We simplify the process of implementation to speed projects and increase adoption. Note: I am trying to get a recording that can be embedded in the slide that will build an HDFS load as you speak. It is so simple that it was completed in the time it took for me to present this slide!
  5. Finally, the entire big data world has been built as an open source ecosystem. This all makes sense… Talend is the open source leader. To this end we will introduce the first complete set of tools that will democratize big data: Talend Open Studio for Big Data.
  6. However, with big data come significant challenges. For example, poor data quality can be magnified at huge scale. Consider a small company with 100 customers. Assume they had a bad address for three customers and sent a mailer out to their list. Three mailers would be returned and they would have wasted about 5 dollars or so. Now imagine the world of big data, where this number of customers expands across business lines, companies and partners to millions. The costs are big. Even more interesting is the ability not only to use the data but to analyze it. Across your customer base, how could you monitor and analyze every interaction they ever had with you (social media, web, stores, etc.)? This is a large amount of data. A small problem with the data can lead to very LARGE issues with analysis, invalidating the entire reason for big data. Data quality is KEY for big data: it is a core tenet of our strategy.
  7. demo
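
Note 3 refers to a drawing of how MapReduce works, and slide 23 lists a data quality MapReduce job as the second method. The following single-process Python sketch imitates the map, shuffle and reduce phases for one possible data quality check, finding duplicate customer records. It is an illustration only: it is not Hadoop code, it is not what Talend generates, and the `email` field, the duplicate rule and the function names are assumptions.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit (key, record) pairs, using a normalised email as the key."""
    for record in records:
        key = record["email"].strip().lower()
        yield key, record


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group to a single (key, count) result."""
    for key, values in groups.items():
        yield key, len(values)


def find_duplicates(records):
    """Run the three phases and keep only keys seen more than once."""
    counts = dict(reduce_phase(shuffle_phase(map_phase(records))))
    return {key: n for key, n in counts.items() if n > 1}
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.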