June 2012




IBM Big Data
The Marriage of Hadoop and Data Warehousing


James Kobielus
Senior Program Director, Product Marketing, Big Data, IBM




                                                            © 2012 IBM Corporation
Hadoop and DW are fast being joined into a new platform paradigm: the Hadoop DW


Agenda




     Big Data: “3 Vs” and myriad use cases
     Big Data: diverse workloads
     Big Data: emergence of the “Hadoop DW”




Scalability Imperative: 3 “Vs” Drive Big Data Everywhere




       Information from Everywhere    Radical Flexibility    Extreme Scalability

    Volume:    12 terabytes of Tweets created daily
    Velocity:  5 million trade events per second
    Variety:   100s of video feeds from surveillance cameras
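To put the Volume and Velocity figures on a common footing, this short Python sketch converts them into sustained rates. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even arrival rate over 24 hours. Neither assumption comes from the slide, and real traffic is bursty, so peak rates will exceed these averages.

```python
# Assumptions (not from the slide): decimal units (1 TB = 10**12 bytes)
# and a perfectly uniform arrival rate across a 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def avg_bytes_per_second(bytes_per_day: float) -> float:
    """Average ingest rate needed to absorb a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def events_per_day(events_per_second: int) -> int:
    """Daily event count implied by a sustained per-second rate."""
    return events_per_second * SECONDS_PER_DAY


tweets_rate = avg_bytes_per_second(12 * 10**12)  # Volume: 12 TB of Tweets per day
trade_events = events_per_day(5_000_000)         # Velocity: 5 million trade events per second

print(f"Tweets: about {tweets_rate / 10**6:.1f} MB/s sustained")
print(f"Trades: about {trade_events / 10**9:.0f} billion events per day")
```

Running it prints about 138.9 MB/s for the Tweet stream and about 432 billion trade events per day.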

More Business Use Cases for Big Data Across Enterprise




More Mission-Critical Apps Ride on Big Data Platforms


        Advanced Analytic Applications

        Big Data Platform: process and analyze any type of data
          Accelerators

     • Integrate and manage the full variety, velocity, and volume of data
     • Apply advanced analytics to information in its native form
     • Visualize all available data for ad-hoc analysis and discovery
     • Development environment for building new analytic applications
     • Integrate and deploy applications with enterprise-grade availability,
       manageability, security, and performance

     • Analyze data in motion          • Visualization and exploration
     • MapReduce / NoSQL               • Scalability
     • Machine Learning                • Hardware acceleration
     • Text Analytics                  • Stream computing
     • Text Search
     • Data Discovery


Big Data: Business Crucible for Practical Data Science


     • Business and IT identify the information sources available.
     • IT delivers a platform that enables creative exploration of all available
       data and content.
     • Business determines what questions to ask by exploring the data and
       relationships.
     • New insights drive integration into traditional technology.


Big Data Initiatives: Fueled by Practical Data Science
                                      Analyze a Variety of Information
                                      Novel analytics on a broad set of mixed
                                      information that could not be analyzed before



                                      Analyze Information in Motion
                                      Streaming data analysis
                                      Large volume data bursts and ad-hoc analysis


                                      Analyze Extreme Volumes of Information
                                      Cost-efficiently process and analyze PBs of
                                      information
                                      Manage & analyze high volumes of structured,
                                      relational data


                                      Discover and Experiment
                                      Ad-hoc analytics, data discovery and
                                      experimentation



                                      Manage and Plan
                                      Enforce data structure, integrity and control to
                                      ensure consistency for repeatable queries
Big Data: Marriage of Established & Emerging Approaches


     Established approach: structured, analytical, logical
       Platform:         DW
       Sources:          traditional (transaction data, internal app data, mainframe data,
                         OLTP system data, ERP data)
       Characteristics:  structured, repeatable, linear
       Example uses:     monthly sales reports, profitability analysis, customer surveys

     Emerging approaches: creative, holistic thought, intuition
       Platform:         Hadoop, etc.
       Sources:          new (web logs, social data, text data such as emails,
                         sensor data such as images, RFID)
       Characteristics:  unstructured, exploratory, iterative
       Example uses:     brand sentiment, product strategy, maximum asset utilization

     The two approaches meet in enterprise integration.




Agenda




      Big Data: “3 Vs” and myriad use cases
      Big Data: diverse workloads
      Big Data: emergence of the “Hadoop DW”




Continuous Social Media Monitoring and Analytics




     Data set                              Information extracted
     • 1.1B tweets                         • Buzz and sentiment
     • 5.7M blog and forum posts           • Gender, location, and occupation
     • 3.5M relevant messages              • Fans
     • 97K referencing Product A           • Intent to purchase
     • 18K referencing Product B           • Specific attributes of products
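The funnel above, from 1.1B tweets down to a few thousand product mentions, is essentially filter-then-aggregate. The following minimal Python sketch shows that pattern. It assumes a tiny hand-written sentiment lexicon and made-up product names (AcmeCam, ZoomLite). It is not how any IBM product scores sentiment; it only illustrates counting buzz and net sentiment per product.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny sentiment lexicon (illustration only).
POSITIVE = {"love", "great", "awesome", "recommend"}
NEGATIVE = {"hate", "broken", "awful", "refund"}


def tokenize(text: str) -> list[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return [t.strip(".,!?#@").lower() for t in text.split()]


def monitor(messages: list[str], products: list[str]) -> dict[str, dict[str, int]]:
    """Keep messages that mention a product, then aggregate buzz and net sentiment."""
    stats: dict[str, dict[str, int]] = defaultdict(lambda: {"buzz": 0, "net_sentiment": 0})
    for msg in messages:
        tokens = tokenize(msg)
        # Net sentiment of one message: positive hits minus negative hits.
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        for product in products:
            if product.lower() in tokens:  # relevance filter
                stats[product]["buzz"] += 1
                stats[product]["net_sentiment"] += score
    return dict(stats)


messages = [
    "I love my new AcmeCam, great photos!",
    "AcmeCam arrived broken, want a refund",
    "Thinking about the ZoomLite",
    "Lunch was great",  # no product mention, so it is filtered out
]
print(monitor(messages, ["AcmeCam", "ZoomLite"]))
```

AcmeCam's one positive and one negative mention cancel to a net sentiment of 0 while its buzz is 2. This is why buzz and sentiment are tracked separately.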




Content mining, natural language processing, & classification


      How it works
       – Parses text and detects meaning with extractors
       – Understands the context in which the text is analyzed
       – Hundreds of pre-built extractors for names, addresses, phone numbers,
         organizations, URLs, dates and times, etc.

      Accuracy
       – Highly accurate in deriving meaning from complex text

      Performance
       – AQL rule language optimized for MapReduce

      Example input: unstructured text (document, email, etc.)
       "Football World Cup 2010, one team distinguished themselves well, losing to
        the eventual champions 1-0 in the Final. Early in the second half, Netherlands’
        striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas
        made the save. Winger Andres Iniesta scored for Spain for the win."

      Classification and insight: World Cup 2010 Highlights
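Plain regular expressions can show why context matters for extraction. This Python sketch is a stand-in, not AQL. A naive "two capitalized words" rule also fires on "Football World" and "Winger Andres". A rule that requires a player role (striker, keeper, winger) before the name picks out only the three players. All patterns are hypothetical and far simpler than production extractors.

```python
import re

TEXT = (
    "Football World Cup 2010, one team distinguished themselves well, losing to "
    "the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ "
    "striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas "
    "made the save. Winger Andres Iniesta scored for Spain for the win."
)

# Hypothetical extractors; real pre-built extractors are far richer.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
SCORE = re.compile(r"\b\d+-\d+\b")
# Naive rule: any two consecutive capitalized words.
NAIVE_NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")
# Context-aware rule: a name counts only if a player role precedes it.
PLAYER = re.compile(r"\b(?:striker|keeper for \w+|[Ww]inger),?\s+([A-Z][a-z]+ [A-Z][a-z]+)")


def extract(text: str) -> dict[str, list[str]]:
    return {
        "years": YEAR.findall(text),
        "scores": SCORE.findall(text),
        "naive_names": NAIVE_NAME.findall(text),
        "players": PLAYER.findall(text),
    }


for label, found in extract(TEXT).items():
    print(f"{label}: {found}")
```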




Entity Extraction and Integration




Statistical Analysis, Predictive Modeling, & Machine Learning

          • Enables machine learning (ML) on massive datasets
          • R- and MATLAB-like syntax for smooth adoption
          • Optimizations that generate low-level execution plans
          • Out-of-the-box and write-your-own analytic algorithms, e.g., regression,
            clustering, classification, pattern mining, ranking
          • Scales to massively parallel clusters, from 10s to 1000s of machines and
            from terabytes to petabytes



     What are people saying about a product on social media?
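Regression is one of the algorithms listed above, and it shows why such workloads scale out. A least-squares line y = a + b·x depends only on five sums (n, Σx, Σy, Σx², Σxy), and sums computed on separate partitions can simply be added. This Python sketch simulates that map/reduce split on one machine with toy data. It illustrates the general technique and is not the product's actual implementation.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for simple linear regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other: "Stats") -> "Stats":
        # Statistics from different partitions combine by plain addition.
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def map_partition(rows: list[tuple[float, float]]) -> Stats:
    """'Map' step: each worker summarizes only its own partition."""
    s = Stats()
    for x, y in rows:
        s = s + Stats(1, x, y, x * x, x * y)
    return s


def fit(total: Stats) -> tuple[float, float]:
    """Turn the combined sums into the least-squares intercept and slope."""
    denom = total.n * total.sxx - total.sx ** 2
    slope = (total.n * total.sxy - total.sx * total.sy) / denom
    intercept = (total.sy - slope * total.sx) / total.n
    return intercept, slope


# Toy data on the line y = 1 + 2x, split across three "machines".
partitions = [[(0, 1.0), (1, 3.0)], [(2, 5.0)], [(3, 7.0), (4, 9.0)]]
total = reduce(lambda a, b: a + b, map(map_partition, partitions), Stats())
intercept, slope = fit(total)
print(intercept, slope)
```

Only the five numbers per partition cross the network, no matter how many rows each partition holds.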




Targeted E-Commerce and Next Best Action




Predictive Complex Event Processing




Intent and Sentiment Analysis

     Online flow: data-in-motion analysis (stream computing and analytics)
       Data sources → Data ingest and prep → Text analytics: timely insights
         → Entity analytics: profile resolution → Predictive analytics: action determination
         → Timely decisions (dashboard)

     Offline flow: data-at-rest analysis (Hadoop system and analytics)
       Social media and enterprise data → Text analytics → Entity analytics and integration
         → Comprehensive social media customer profiles → Predictive analytics
         → Customer models (reports)
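The two flows can share the same analytics. The online path updates a customer profile one event at a time and acts the moment a rule fires. The offline path rebuilds profiles over stored history. This Python sketch assumes a hypothetical keyword rule for purchase intent and a threshold of two intent messages per customer. Both are illustrative choices, not part of the slide.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical keyword rule standing in for the text-analytics stage.
BUY_WORDS = {"buy", "need", "recommendations"}


def prep(raw: tuple[str, str]) -> tuple[str, set[str]]:
    """Data ingest and prep: normalize the handle and tokenize the text."""
    handle, text = raw
    return handle.strip().lstrip("@").lower(), {w.strip(".,!?").lower() for w in text.split()}


def has_intent(tokens: set[str]) -> bool:
    """Text analytics: flag a message that looks like purchase intent."""
    return bool(tokens & BUY_WORDS)


def online(stream: Iterable[tuple[str, str]], threshold: int = 2) -> Iterator[str]:
    """Online flow: update each profile per event and act as soon as the rule fires."""
    profiles: dict[str, int] = defaultdict(int)
    for raw in stream:
        user, tokens = prep(raw)  # same normalized handle -> same profile
        if has_intent(tokens):
            profiles[user] += 1
            if profiles[user] == threshold:  # predictive step: act once, at the threshold
                yield user


def offline(history: Iterable[tuple[str, str]], threshold: int = 2) -> list[str]:
    """Offline flow: build complete profiles over data at rest, then score them."""
    profiles: dict[str, int] = defaultdict(int)
    for raw in history:
        user, tokens = prep(raw)
        profiles[user] += has_intent(tokens)  # True counts as 1
    return sorted(user for user, count in profiles.items() if count >= threshold)


events = [
    ("@Ann", "I need a new camera"),
    ("bob", "Great game tonight"),
    ("ann ", "Any recommendations around 300?"),
    ("@Bob", "What should I buy?"),
]
print(list(online(events)))  # decisions emitted while the stream is flowing
print(offline(events))       # the same customers found by the batch report
```

Both paths reuse `prep` and `has_intent`, so a decision made in motion agrees with the report built at rest.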




Agenda




      Big Data: “3 Vs” and myriad use cases
      Big Data: diverse workloads
      Big Data: emergence of the “Hadoop DW”




Big Data: DW & Hadoop are Married in Spirit



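     [Figure: word cloud of data warehousing components, with platform capabilities alongside]
     DW components: models, policies, metadata, aggregates, DQ (data quality), MDM hubs,
       marts, cubes, ETL, databases, views, storage, staging, production, nodes, memory,
       cache, tables, in-database analytics, operational data stores
     Capabilities shown alongside: cloud-facing architectures, massively parallel
       processing, in-database analytics, mixed workload management, hybrid storage layers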


Hadoop is Core of Next-Gen Big Data DW


     • Vendor-agnostic framework for massively parallel processing of advanced
       analytics against polystructured information
     • Leverages an extensible framework for building advanced analytics and data
       management functions
     • Evolving rapidly in new directions
     • Being commercialized and adopted rapidly in enterprises
     • Vibrant open-source community and industry
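Hadoop's core processing model is MapReduce. Mappers emit key/value pairs, the framework groups them by key (the shuffle), and reducers combine each group. This single-process Python simulation shows that data flow with the classic word count. Real Hadoop jobs are typically written against its Java API and run these phases in parallel across the cluster's nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every intermediate value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values that share one key."""
    return key, sum(values)


def word_count(records: list[str]) -> dict[str, int]:
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    # Keys are reduced in sorted order, mirroring the framework's sort before reduce.
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(intermediate).items()))


print(word_count(["Hadoop and DW", "DW and Hadoop DW"]))
```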


Hadoop, DW, and Other Databases Coexist in the Big Data Ecosystem

     Hadoop & NoSQL:              Big Data staging, ETL, and preprocessing tier
     DW RDBMS:                    Big Data single version of the truth (SVOT) and governance tier
     In-memory, columnar, OLAP:   Big Data access and interaction tier




How Hadoop and DW Complement Each Other




Single Version of Big Data: Where Hadoop DW Will Excel




     Monetizable intent to see a movie
       "Kinda feel like going to movies tonight… Any recommendations? @Texas Angelika Texas"
       "I don’t think anyone understands how much I like watching movies. My 3rd trip to the threatre in 3 days."

     Monetizable intent to buy
       "I need a new digital camera for my food pictures, and recommendations around 300?"
       "What should I buy?? A mini laptop with Windows 7 OR a Apply MacBook!??!"

     Location announcements
       "I’m at Starbucks Parque Tezontle http://4sq.com/fYReSj"

     Life events
       "College: Off to Standard for my MBA! Bbye chicago!"
       "Looks like we’ll be moving to New Orleans sooner than I thought."
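Sorting posts into the categories above can start with simple rules. This Python sketch applies hypothetical regular-expression rules, not taken from any product, to the example posts. A real system would need far more robust text analytics, because rules like these miss misspellings and paraphrases.

```python
import re

# Hypothetical rules: category names follow the slide; the patterns are illustrative only.
RULES = {
    "intent_to_see_movie": re.compile(r"\b(?:movies?|theat(?:er|re))\b", re.IGNORECASE),
    "intent_to_buy": re.compile(r"\b(?:buy|i need a new)\b", re.IGNORECASE),
    "location": re.compile(r"\bi['’]m at\b", re.IGNORECASE),
    "life_event": re.compile(r"\b(?:moving to|off to)\b", re.IGNORECASE),
}


def classify(post: str) -> list[str]:
    """Return every category whose rule matches; an empty list means no signal."""
    return [label for label, rule in RULES.items() if rule.search(post)]


posts = [
    "Kinda feel like going to movies tonight… Any recommendations? @Texas Angelika Texas",
    "I don’t think anyone understands how much I like watching movies. My 3rd trip to the threatre in 3 days.",
    "I need a new digital camera for my food pictures, and recommendations around 300?",
    "What should I buy?? A mini laptop with Windows 7 OR a Apply MacBook!??!",
    "I’m at Starbucks Parque Tezontle http://4sq.com/fYReSj",
    "College: Off to Standard for my MBA! Bbye chicago!",
    "Looks like we’ll be moving to New Orleans sooner than I thought.",
]
for post in posts:
    print(classify(post), "<-", post)
```

The misspelled "threatre" slips past the theater rule. That post is caught only because it also says "movies".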
Hadoop DW Integration: What to Look For
      • Hadoop distro functional depth
      • EDW HDFS connector
      • Software, appliance, and cloud form factors for Hadoop offerings
      • Pluggable storage layer for Hadoop offerings
      • Bundled data management and analytics offerings integrated with Hadoop solutions
      • Modeling, management, acceleration, and optimization tools
      • Real-time/low-latency capabilities integrated into Hadoop offerings
      • Robust availability, security, and workload management tools integrated with
        Hadoop offerings
      • And many more, focused on EDW-grade robustness, scalability, and flexibility!


Consider Big Data Platform Accelerators

      Telecommunications: CDR streaming analytics; deep network analytics
      Retail customer intelligence: customer behavior and lifetime value analysis
      Finance: streaming options trading; insurance and banking DW models
      Social media analytics: sentiment analytics, intent to purchase
      Public transportation: real-time monitoring and routing optimization
      Data mining: streaming statistical analysis

      • Over 100 sample applications
      • User-defined toolkits
      • Standard toolkits
      • Industry data models: banking, insurance, telco, healthcare, retail
How Will You Do MDM on Your Hadoop DW?

     (A1) Unstructured entity integration (on BigInsights)
       – Complex analytics to populate the master data set
       – Text analytics: rule language (AQL) for extracting entities, events, and
         relationships from text and HTML documents
       – Entity integration: rule language (HIL) to express and customize the
         integration, cleansing, and aggregation of the master entities
     (A2) Entity repository (on MDM)
       – BigInsights Bridge: generation of the MDM model for public master entities
         from the BigInsights model, and bulk loading of master entities
       – Query-based application development: supports the generation of custom
         queries for individual applications

     [Figure: external data subscriptions (e.g., Acxiom) and external public data
     sources (e.g., SEC/FDIC, Twitter, blogs, Facebook) feed text analytics and entity
     integration on BigInsights (A1), producing relational tables with master entities
     in InfoSphere MDM with Extensions (A2). Data services and queries expose these to
     MDM DaaS applications and views, with tooling based on the entity model.]

     Query against the entity model:

         select cik, Officers, Directors
         from Company
         where name = 'Citigroup'

     SQL over the relational tables (excerpt):

         SELECT *
         FROM
           (SELECT t2.CIK as CIK, t2.NAME as NAME, t2.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                   t2.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, t2.POSITION_NAME as POSITION_NAME,
                   tp.EARLIEST_DATE as EARLIEST_DATE, tp.IS_EARLIEST_EXACT as IS_EARLIEST_EXACT,
                   tp.LATEST_DATE as LATEST_DATE, tp.IS_LATEST_EXACT as IS_LATEST_EXACT
            FROM
              (SELECT t1.CIK as CIK, t1.NAME as NAME, t1.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                      t1.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, p.NAME as POSITION_NAME,
                      p.POSITIONSPK_ID as POSITIONSPK_ID
               FROM
                 (SELECT o.CIK as CIK, o.NAME as NAME, o.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                         o.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, o.OFFICERSPK_ID as OFFICERSPK_ID
                  FROM DB2ADMIN.OFFICERS o
                  WHERE o.OFFICER_OF = 567830643756635868
                 ) as t1
                 left outer join DB2ADMIN.POSITIONS p on t1.OFFICERSPK_ID = p.POSITIONOF
              ) as t2
              left outer join DB2ADMIN.RANGEOFKNOWNDATES tp
                on t2.POSITIONSPK_ID = tp.RANGE_OF_KNOWN_DATES_FOR_POS )
         UNION
         -- (OUTER UNION)
         …
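The generated SQL chains left outer joins, so an officer still appears even when no position or date range is on record. This Python sketch runs a trimmed version of that query in SQLite rather than DB2. It attaches an in-memory database named DB2ADMIN so the qualified table names resolve. The rows, officer names, and company key 1 are invented for illustration. Fewer columns are selected, and the elided UNION branch is omitted.

```python
import sqlite3

# Toy, hypothetical data: table and column names follow the slide's SQL; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DB2ADMIN")  # emulate the DB2ADMIN schema
conn.executescript("""
CREATE TABLE DB2ADMIN.OFFICERS (OFFICERSPK_ID INTEGER, CIK TEXT, NAME TEXT,
                                IS_FORMER_OFFICER INTEGER, IS_IMPORTANT_OFFICER INTEGER,
                                OFFICER_OF INTEGER);
CREATE TABLE DB2ADMIN.POSITIONS (POSITIONSPK_ID INTEGER, NAME TEXT, POSITIONOF INTEGER);
CREATE TABLE DB2ADMIN.RANGEOFKNOWNDATES (RANGE_OF_KNOWN_DATES_FOR_POS INTEGER,
                                         EARLIEST_DATE TEXT, LATEST_DATE TEXT);
INSERT INTO DB2ADMIN.OFFICERS VALUES (10, '0000000001', 'Officer A', 0, 1, 1),
                                     (11, '0000000002', 'Officer B', 1, 0, 1),
                                     (12, '0000000003', 'Officer C', 0, 0, 2);
INSERT INTO DB2ADMIN.POSITIONS VALUES (100, 'Chief Executive Officer', 10);
INSERT INTO DB2ADMIN.RANGEOFKNOWNDATES VALUES (100, '2009-01-01', NULL);
""")

QUERY = """
SELECT t2.CIK, t2.NAME, t2.POSITION_NAME, tp.EARLIEST_DATE, tp.LATEST_DATE
FROM (SELECT t1.CIK AS CIK, t1.NAME AS NAME, p.NAME AS POSITION_NAME,
             p.POSITIONSPK_ID AS POSITIONSPK_ID
      FROM (SELECT o.CIK AS CIK, o.NAME AS NAME, o.OFFICERSPK_ID AS OFFICERSPK_ID
            FROM DB2ADMIN.OFFICERS o
            WHERE o.OFFICER_OF = ?) AS t1
      LEFT OUTER JOIN DB2ADMIN.POSITIONS p ON t1.OFFICERSPK_ID = p.POSITIONOF) AS t2
LEFT OUTER JOIN DB2ADMIN.RANGEOFKNOWNDATES tp
  ON t2.POSITIONSPK_ID = tp.RANGE_OF_KNOWN_DATES_FOR_POS
ORDER BY t2.CIK
"""
rows = conn.execute(QUERY, (1,)).fetchall()  # officers of the hypothetical company 1
for row in rows:
    print(row)
```

Officer B has no position on record but is still returned, with NULLs (None in Python) in the position and date columns. An inner join would have dropped that row.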



IBM Big Data Platform

New analytic applications drive the requirements for a big data platform.

   Analytic applications: BI / reporting, exploration / visualization, functional app,
   industry app, predictive analytics, content analytics

   IBM Big Data Platform
     Visualization & discovery | Application development | Systems management
     Accelerators
     Hadoop system | Stream computing | Data warehouse
     Information integration & governance

   • Integrate and manage the full variety, velocity, and volume of data
   • Apply advanced analytics to information in its native form
   • Visualize all available data for ad-hoc analysis
   • Development environment for building new analytic applications
   • Workload optimization and scheduling
   • Security and governance



Thank You!




29                © 2012 IBM Corporation

Más contenido relacionado

La actualidad más candente

Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data WarehousingThomas Kejser
 
Investigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxInvestigative Analytics- What's in a Data Scientists Toolbox
Investigative Analytics- What's in a Data Scientists ToolboxData Science London
 
IBM Storage Strategy in the Era of Smarter Computing
IBM Storage Strategy in the Era of Smarter ComputingIBM Storage Strategy in the Era of Smarter Computing
IBM Storage Strategy in the Era of Smarter ComputingTony Pearson
 
NextGen Infrastructure for Big Data
NextGen Infrastructure for Big DataNextGen Infrastructure for Big Data
NextGen Infrastructure for Big DataEd Dodds
 
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMETHE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMEGigaom
 
Big Data: A CIO’s Cut Out and Keep Guide
Big Data: A CIO’s Cut Out and Keep Guide Big Data: A CIO’s Cut Out and Keep Guide
Big Data: A CIO’s Cut Out and Keep Guide EMC
 
Embedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationEmbedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationInside Analysis
 
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Donghui Zhang
 
Agile Data Rationalization for Operational Intelligence
Agile Data Rationalization for Operational IntelligenceAgile Data Rationalization for Operational Intelligence
Agile Data Rationalization for Operational IntelligenceInside Analysis
 
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Cloudera, Inc.
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Presentation dell - into the cloud with dell
Presentation   dell - into the cloud with dellPresentation   dell - into the cloud with dell
Presentation dell - into the cloud with dellxKinAnx
 
Progress with confidence into next generation IT
Progress with confidence into next generation ITProgress with confidence into next generation IT
Progress with confidence into next generation ITPaul Muller
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
The Big Picture: Big Data for the New Wave of Analytics
The Big Picture: Big Data for the New Wave of AnalyticsThe Big Picture: Big Data for the New Wave of Analytics
The Big Picture: Big Data for the New Wave of AnalyticsInside Analysis
 
Using Advanced Analytics for Data-Driven Decision Making
Using Advanced Analytics for Data-Driven Decision MakingUsing Advanced Analytics for Data-Driven Decision Making
Using Advanced Analytics for Data-Driven Decision MakingBooz Allen Hamilton
 
Hw09 Data Processing In The Enterprise
Hw09   Data Processing In The EnterpriseHw09   Data Processing In The Enterprise
Hw09 Data Processing In The EnterpriseCloudera, Inc.
 

La actualidad más candente (20)

Sybase IQ Big Data
Sybase IQ Big DataSybase IQ Big Data
Sybase IQ Big Data
 
Big Data vs Data Warehousing
 
Investigative Analytics- What's in a Data Scientists Toolbox
 
IBM Storage Strategy in the Era of Smarter Computing
 
NextGen Infrastructure for Big Data
 
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
 
Big Data: A CIO’s Cut Out and Keep Guide
 
Embedded Analytics: The Next Mega-Wave of Innovation
 
Big Data on AWS
 
Big Data Platform Landscape by 2017
 
Agile Data Rationalization for Operational Intelligence
 
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
 
Big data analytics, research report
 
Presentation dell - into the cloud with dell
 
Progress with confidence into next generation IT
 
Big Data PPT by Rohit Dubey
 
The Big Picture: Big Data for the New Wave of Analytics
 
Using Advanced Analytics for Data-Driven Decision Making
 
Research paper on big data and hadoop
 
Hw09 Data Processing In The Enterprise
 

Similar to The Marriage of Hadoop and Data Warehousing: How Big Data Platforms are Emerging

Ibm big data ibm marriage of hadoop and data warehousing (DataWorks Summit)
 
01 im overview high level (James Findlay)
 
Analyze This! Best Practices For Big And Fast Data (EMC)
 
Big Data World Forum (bigdatawf)
 
Big Data 視覺化分析解決方案 [Big Data visual analytics solutions] (Etu Solution)
 
IBM Big Data Platform, 2012 (Rob Thomas)
 
Big Data Beyond Hadoop*: Research Directions for the Future (Odinot Stanislas)
 
Big Data = Big Decisions (InnoTech)
 
Robert LeBlanc - Why Big Data? Why Now? (Mauricio Godoy)
 
Big Data and Implications on Platform Architecture (Odinot Stanislas)
 
Big data? No. Big Decisions are What You Want (Stuart Miniman)
 
OSC2012: Big Data Using Open Source: Netapp Project - Technical (Accenture the Netherlands)
 
Information Management and Analytics (AKAGroup)
 
Getting more out of your big data (Nathan Bijnens)
 
Analytic Platforms in the Real World with 451Research and Calpont_July 2012 (Calpont Corporation)
 
Infochimps #1 Big Data Platform for the Cloud (Brian Krpec)
 
Modernizing Your IT Infrastructure with Hadoop - Cloudera Summer Webinar Seri... (Cloudera, Inc.)


The Marriage of Hadoop and Data Warehousing: How Big Data Platforms are Emerging

  • 1. June 2012 IBM Big Data The Marriage of Hadoop and Data Warehousing James Kobielus Senior Program Director, Product Marketing, Big Data, IBM © 2012 IBM Corporation
  • 2. Hadoop and DW are fast being joined into a new platform paradigm: the Hadoop DW.
  • 3. Agenda  Big Data: “3 Vs” and myriad use cases  Big Data: diverse workloads  Big Data: emergence of the “Hadoop DW” 3 © 2012 IBM Corporation
  • 4. Agenda  Big Data: “3 Vs” and myriad use cases  Big Data: diverse workloads  Big Data: emergence of the “Hadoop DW” 4 © 2012 IBM Corporation
  • 5. Scalability Imperative: 3 “Vs” Drive Big Data Everywhere. Information from everywhere, radical flexibility, extreme scalability. Volume: 12 terabytes of Tweets created daily. Velocity: 5 million trade events per second. Variety: 100s of video feeds from surveillance cameras.
  • 6. More Business Use Cases for Big Data Across Enterprise
  • 7. More Mission-Critical Apps Ride on Big Data Platforms. Components: advanced analytic applications, accelerators, and a big data platform. The platform lets you integrate and manage the full variety, velocity, and volume of data; apply advanced analytics to information in its native form; visualize all available data for ad-hoc analysis and discovery; build new analytic applications in a development environment; and integrate and deploy applications with enterprise-grade availability, manageability, security, and performance. Capabilities to process and analyze any type of data: analyze data in motion (stream computing), MapReduce / NoSQL, machine learning, text analytics, text search, data discovery, visualization and exploration, scalability, and hardware acceleration.
  • 8. Big Data: Business Crucible for Practical Data Science. Business and IT identify the available information sources. IT delivers a platform that enables creative exploration of all available data and content. Business determines what questions to ask by exploring the data and relationships. New insights drive integration into traditional technology.
  • 9. Big Data Initiatives: Fueled by Practical Data Science. Analyze a variety of information: novel analytics on a broad set of mixed information that could not be analyzed before. Analyze information in motion: streaming data analysis; large-volume data bursts and ad-hoc analysis. Analyze extreme volumes of information: cost-efficiently process and analyze petabytes of information; manage and analyze high volumes of structured, relational data. Discover and experiment: ad-hoc analytics, data discovery, and experimentation. Manage and plan: enforce data structure, integrity, and control to ensure consistency for repeatable queries.
  • 10. Big Data: Marriage of Established & Emerging Approaches. Established approach (DW): structured, analytical, logical thinking applied to structured data through repeatable, linear processing of traditional sources (transaction data, internal app data, mainframe data, OLTP system data, ERP data), for uses such as monthly sales reports, profitability analysis, and customer surveys. Emerging approaches (Hadoop, etc.): creative, holistic thought and intuition applied to unstructured data through exploratory, iterative processing of new sources (web logs, social data, text data such as emails, sensor data such as images, RFID), for uses such as brand sentiment, product strategy, and maximum asset utilization. Enterprise integration joins the two.
  • 11. Agenda  Big Data: “3 Vs” and myriad use cases  Big Data: diverse workloads  Big Data: emergence of the “Hadoop DW” 11 © 2012 IBM Corporation
  • 12. Continuous Social Media Monitoring and Analytics. Data set: 1.1B tweets; 5.7M blog and forum posts; 3.5M relevant messages; 97K referencing Product A; 18K referencing Product B. Information extracted: buzz and sentiment; gender, location, and occupation; fans; intent to purchase; specific attributes of products.
  • 13. Content mining, natural language processing, & classification. How it works: parses text and detects meaning with extractors; understands the context in which the text is analyzed; hundreds of pre-built extractors for names, addresses, phone numbers, organizations, URLs, dates and times, etc. Accuracy: highly accurate in deriving meaning from complex text. Performance: AQL language optimized for MapReduce. Example input (unstructured text such as a document or email): “Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win.” Resulting classification and insight: World Cup 2010 Highlights.
  • 14. Entity Extraction and Integration
  • 15. Statistical Analysis, Predictive Modeling, & Machine Learning
    – Enables machine learning (ML) on massive datasets
    – R- and MATLAB-like syntax for smooth adoption
    – Optimizations to generate low-level execution plans
    – Out-of-the-box and write-your-own analytic algorithms, e.g., regression, clustering, classification, pattern mining, ranking
    – Scales to massively parallel clusters, from 10s to 1000s of machines and from terabytes to petabytes
    – Example question: what are people saying about a product in social media?
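One common way to scale a model such as linear regression across a cluster is to compute small summary statistics on each partition in parallel and then combine them. Only the sums cross the network. The Python sketch below shows this pattern for a one-feature least-squares fit, with plain lists standing in for data partitions. It illustrates the general technique only. It does not show the slide product's optimizer or syntax, and the function names are hypothetical.

```python
from functools import reduce

def partial_stats(partition):
    """Map step: summarize one partition of (x, y) pairs as sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in partition:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Reduce step: sums are associative, so partitions can merge in any order."""
    return tuple(u + v for u, v in zip(a, b))

def fit_line(partitions):
    """Least-squares slope and intercept computed from the combined sums."""
    n, sx, sy, sxx, sxy = reduce(combine, map(partial_stats, partitions))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values have zero variance")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

For example, points on y = 2x + 1 split across three partitions, `[[(0, 1), (1, 3)], [(2, 5), (3, 7)], [(4, 9)]]`, yield a slope of 2.0 and an intercept of 1.0 whatever order the partitions are combined in.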
  • 16. Targeted E-Commerce and Next Best Action
  • 17. Predictive Complex Event Processing
  • 18. Intent and Sentiment Analysis
    – Online flow (data-in-motion analysis): data sources feed stream computing and analytics (data ingest and prep; text analytics for timely insights; entity analytics for profile resolution; predictive analytics for action determination), producing timely decisions on a dashboard
    – Offline flow (data-at-rest analysis): the Hadoop system and analytics cover social media and enterprise data integration, text analytics, entity analytics, comprehensive social media customer profiles, and predictive customer analytics models, producing reports
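This slide pairs an offline flow (data at rest, used to build customer profiles) with an online flow (data in motion, scored as it arrives). The Python sketch below mimics that split at toy scale. A batch step derives each customer's baseline sentiment from history. A streaming step then scores each new message against a tiny word lexicon and flags customers whose sentiment drops well below their baseline. The lexicon, the `drop` threshold, and all function names are hypothetical, since the slide does not describe the actual models.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon; real text analytics would be far richer.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "broken": -1, "awful": -1}

def score(text):
    """Lexicon sentiment: sum of word scores, ignoring simple punctuation."""
    return sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in text.split())

def build_profiles(history):
    """Offline flow: average historical sentiment per customer (data at rest)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for customer, text in history:
        totals[customer] += score(text)
        counts[customer] += 1
    return {c: totals[c] / counts[c] for c in totals}

def process_stream(events, profiles, drop=2.0):
    """Online flow: score each arriving message and yield (customer, score) when
    it falls at least `drop` below that customer's offline baseline (0.0 if unknown)."""
    for customer, text in events:
        s = score(text)
        if s <= profiles.get(customer, 0.0) - drop:
            yield customer, s
```

The design keeps expensive aggregation in the batch step, so the streaming step only needs a dictionary lookup per message.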
  • 19. Agenda
    – Big Data: “3 Vs” and myriad use cases
    – Big Data: diverse workloads
    – Big Data: emergence of the “Hadoop DW”
  • 20. Big Data: DW & Hadoop are Married in Spirit
    – Diagram elements around the DW: massively parallel processing; cloud-facing architectures; in-database analytics; in-memory cache; mixed workload management; hybrid storage layers; staging, storage, and analytics nodes; production tables; operational data stores; ETL; databases; data quality (DQ); MDM hubs; marts; cubes; aggregates; views; models; policies; metadata
  • 21. Hadoop is Core of Next-Gen Big Data DW
    – Vendor-agnostic framework for massively parallel processing of advanced analytics against polystructured information
    – Leverages an extensible framework for building advanced analytics and data management functions
    – Evolving rapidly in new directions
    – Being commercialized and adopted rapidly in enterprises
    – Vibrant open-source community and industry
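For readers new to the MapReduce model behind Hadoop's massively parallel processing, the classic word-count job can be simulated in a few lines of Python. The sketch runs in a single process, so it only mirrors the map, shuffle, and reduce phases that Hadoop would distribute across nodes. It is not Hadoop's Java API, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for each word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)

def word_count(lines):
    """Run map over every record, shuffle by key, then reduce each group."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster, each mapper and reducer call would run on whichever node holds the data. Because each step depends only on its own inputs, the same logic scales out.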
  • 22. Hadoop, DW, and Other Databases Co-Exist in the Big Data Ecosystem
    – Hadoop & NoSQL: Big Data staging, ETL, and preprocessing tier
    – DW and RDBMS: Big Data single-version-of-truth (SVOT) and governance tier
    – In-memory, columnar, and OLAP: Big Data access and interaction tier
  • 23. How Hadoop and DW Complement Each Other
  • 24. Single Version of Big Data: Where Hadoop DW Will Excel
    – Monetizable intent to see a movie: “Kinda feel like going to movies tonight… Any recommendations? @Texas Angelika Texas”; “I don’t think anyone understands how much I like watching movies. My 3rd trip to the threatre in 3 days.”
    – Monetizable intent to buy: “I need a new digital camera for my food pictures, and recommendations around 300?”; “What should I buy?? A mini laptop with Windows 7 OR a Apply MacBook!??!”
    – Life events: College: “Off to Standard for my MBA! Bbye chicago!”; “Looks like we’ll be moving to New Orleans sooner than I thought.”
    – Location announcements: “I’m at Starbucks Parque Tezontle http://4sq.com/fYReSj”
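Detecting this slide's categories (intent to see a movie, intent to buy, life events, location announcements) is a text-classification problem. The deliberately simple rule-based Python sketch below, applied to posts from the slide, shows the shape of that task. The category labels come from the slide. The regular expressions and the `classify_post` name are assumptions; they are not IBM's models and would miss most real phrasing.

```python
import re

# Hypothetical rules keyed by the slide's categories.
RULES = {
    "intent_to_see_movie": re.compile(
        r"\b(going to (the )?movies?|see a movie|watching movies)\b", re.IGNORECASE),
    "intent_to_buy": re.compile(
        r"\b(what should i buy|i need a new|recommendations? around \$?\d+)", re.IGNORECASE),
    "life_event": re.compile(
        r"\b(moving to|off to .+ for my (mba|degree))\b", re.IGNORECASE),
    "location": re.compile(r"\bI['’]m at\b|4sq\.com", re.IGNORECASE),
}

def classify_post(text):
    """Return every category whose rule matches the post, in rule order."""
    return [label for label, rule in RULES.items() if rule.search(text)]
```

A post can carry several signals at once, so the function returns a list rather than a single label.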
  • 25. Hadoop DW Integration: What to Look For
    – Hadoop distro functional depth
    – EDW HDFS connector
    – Software, appliance, and cloud form factors for Hadoop offerings
    – Pluggable storage layer for Hadoop offerings
    – Bundled data management and analytics offerings integrated with Hadoop solutions
    – Modeling, management, acceleration, and optimization tools
    – Real-time/low-latency capabilities integrated into Hadoop offerings
    – Robust availability, security, and workload management tools integrated with Hadoop offerings
    – And many more, focused on EDW-grade robustness, scalability, and flexibility!
  • 26. Consider Big Data Platform Accelerators
    – Telecommunications: CDR streaming analytics; deep network analytics
    – Retail customer intelligence: customer behavior and lifetime value analysis
    – Finance: streaming options trading; insurance and banking DW models
    – Social media analytics: sentiment analytics; intent to purchase
    – Public transportation: real-time monitoring and routing optimization
    – Data mining: streaming statistical analysis
    – Also: over 100 sample applications; user-defined toolkits; standard toolkits; industry data models for banking, insurance, telco, healthcare, and retail
  • 27. How Will You Do MDM on Your Hadoop DW?
    – (A1) Unstructured entity integration (on BigInsights): complex analytics to populate the master data set
        Text Analytics: rule language (AQL) for extracting entities, events, and relationships from text and HTML documents
        Entity Integration: rule language (HIL) to express and customize the integration, cleansing, and aggregation of the master entities
    – (A2) Entity repository (on MDM)
        BigInsights Bridge: generation of the MDM model for public master entities from the BigInsights model; bulk-loading of master entities
        Query-based application development: supports the generation of custom queries for individual applications
    – Diagram components: external public data sources (e.g., SEC/FDIC, Twitter, blogs, Facebook); Text Analytics and Entity Integration on BigInsights (A1); relational tables with master entities; InfoSphere MDM with Extensions (A2); external data subscriptions (e.g., Acxiom); MDM DaaS data services serving applications and views through tooling-based queries on the entity model, for example:
        select cik, Officers, Directors from Company where name = 'Citigroup'
    – Example of a generated query:
        SELECT * FROM
          (SELECT t2.CIK AS CIK, t2.NAME AS NAME,
                  t2.IS_FORMER_OFFICER AS IS_FORMER_OFFICER,
                  t2.IS_IMPORTANT_OFFICER AS IS_IMPORTANT_OFFICER,
                  t2.POSITION_NAME AS POSITION_NAME,
                  tp.EARLIEST_DATE AS EARLIEST_DATE,
                  tp.IS_EARLIEST_EXACT AS IS_EARLIEST_EXACT,
                  tp.LATEST_DATE AS LATEST_DATE,
                  tp.IS_LATEST_EXACT AS IS_LATEST_EXACT
           FROM (SELECT t1.CIK AS CIK, t1.NAME AS NAME,
                        t1.IS_FORMER_OFFICER AS IS_FORMER_OFFICER,
                        t1.IS_IMPORTANT_OFFICER AS IS_IMPORTANT_OFFICER,
                        p.NAME AS POSITION_NAME,
                        p.POSITIONSPK_ID AS POSITIONSPK_ID
                 FROM (SELECT o.CIK AS CIK, o.NAME AS NAME,
                              o.IS_FORMER_OFFICER AS IS_FORMER_OFFICER,
                              o.IS_IMPORTANT_OFFICER AS IS_IMPORTANT_OFFICER,
                              o.OFFICERSPK_ID AS OFFICERSPK_ID
                       FROM DB2ADMIN.OFFICERS o
                       WHERE o.OFFICER_OF = 567830643756635868) AS t1
                 LEFT OUTER JOIN DB2ADMIN.POSITIONS p
                   ON t1.OFFICERSPK_ID = p.POSITIONOF) AS t2
           LEFT OUTER JOIN DB2ADMIN.RANGEOFKNOWNDATES tp
             ON t2.POSITIONSPK_ID = tp.RANGE_OF_KNOWN_DATES_FOR_POS)
        UNION  -- (OUTER UNION) …
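The entity-integration step on this slide is expressed in HIL, and the sketch below does not reproduce HIL. It illustrates the underlying idea in Python: normalize names from different sources, group records that refer to the same entity, and merge them into one master record. The suffix list, the field names, the sample records, and the `build_master` name are all hypothetical. Real entity resolution uses much richer matching than exact normalized names.

```python
import re
from collections import defaultdict

# Hypothetical corporate suffixes stripped during normalization.
SUFFIXES = {"inc", "corp", "corporation", "co", "group"}

def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop common suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def build_master(records):
    """Group records by normalized name and merge them into master records.

    Each record is a dict with a 'name' key plus arbitrary attributes. For each
    attribute, the first value that is not None or '' wins; the merged record
    also lists the source names (aliases) it was built from.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["name"])].append(rec)
    master = {}
    for key, recs in groups.items():
        merged = {"aliases": sorted({r["name"] for r in recs})}
        for rec in recs:
            for field, value in rec.items():
                if field != "name" and value not in (None, "") and field not in merged:
                    merged[field] = value
        master[key] = merged
    return master
```

With hypothetical inputs such as `{"name": "Citigroup Inc.", "cik": "0000000001"}` from a filing and `{"name": "CITIGROUP", "followers": 1000}` from a social feed, both records collapse into one master entry keyed `"citigroup"` that carries both attributes.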
  • 28. IBM Big Data Platform
    – New analytic applications drive the requirements for a big data platform
    – Analytic applications: BI/reporting, exploration/visualization, functional apps, industry apps, predictive analytics, content analytics
    – Platform components: visualization & discovery, application development, systems management, accelerators, Hadoop system, stream computing, data warehouse, and information integration & governance
    – Requirements: integrate and manage the full variety, velocity, and volume of data; apply advanced analytics to information in its native form; visualize all available data for ad-hoc analysis; provide a development environment for building new analytic applications; workload optimization and scheduling; security and governance
  • 29. Thank You!

Editor's Notes

  1. Now, if you recall, I said that the EDW is not going away and that the Big Data system works with it. A couple of slides ago I talked about the IBM Big Data platform and mentioned IBM Information Server for integration, which is what this slide shows. We now have two complementary analytical approaches, the traditional one and the new one. To bring them together, we need a way to get from the left sphere to the right sphere, and that is enterprise integration. IBM provides it. For example, Information Server (IIS) has readers for HDFS, and DB2 natively includes a UDF that can call a MapReduce program, among other capabilities. As this slide shows, if you live in the SQL world you can talk to the Big Data world, and vice versa.
  2. Key points: IBM Research developed a sophisticated text analytics engine, similar to the technology demonstrated in Watson. Its purpose is to identify meaning within text. We have pre-built hundreds of rules (annotators) that understand textual meaning, such as names (e.g., what is a first name vs. a last name) and addresses (what is a street, an apartment), among others. The annotators are context sensitive and discover the relationship between terms even when other text separates them. For example, the engine discovers that Iker Casillas is a “keeper” even though the phrase “for Spain” sits between them. Accuracy: our text analytics engine is very accurate, and our testing indicates it is 2-3x more accurate than some alternatives. Performance: it is also highly performant, designed for Big Data and MapReduce parallel processing.
  3. Confidential IBM/Expedia 9/13/11
  4. Key points:
     – Integrate the 3 Vs: the point is to have one platform to manage all of the data. There is no point in having separate silos of data, each creating separate silos of insight. From the customer's (solution) point of view, big data has to be bigger than just one technology.
     – Analyze the 3 Vs: a very important point. We see big data as a viable place to analyze and store data; the new technology is not just a preprocessor that gets data into a structured DW for analysis. This is a significant area of value-add for IBM, and the game has changed. Unlike with databases and SQL, the market is asking who gets the better answer, so the sophistication and accuracy of the analytics matter.
     – Visualization: we need to bring big data to the users, and the spreadsheet metaphor is the key to doing so.
     – Development: we need sophisticated development tools for the engines, and across them, to enable the market to develop analytic applications.
     – Workload optimization: improvements upon open source for efficient processing and storage.
     – Security and governance: many are rushing into big data like it is the Wild West, but sensitive data needs to be protected and retention policies need to be determined. All of the governance maturity of the structured world can benefit the big data world.