Kai Wähner
   YOU ARE NOT FACEBOOK OR GOOGLE?
WHY YOU SHOULD STILL CARE ABOUT BIG DATA
Kai Wähner

                                                                                                     Main Tasks
                                                                                          Requirements Engineering
                                                                                 Enterprise Architecture Management
                                                                                      Business Process Management
                                                                         Architecture and Development of Applications
                                                                                        Service-oriented Architecture
                                                                                    Integration of Legacy Applications
                                                                                                   Cloud Computing
                                                                                                          Big Data


                Consulting, Developing, Coaching, Speaking, Writing

                Contact
                Email: kontakt@kai-waehner.de
                Blog: www.kai-waehner.de/blog
                Twitter: @KaiWaehner
                Social Networks: Xing, LinkedIn

© Talend 2013       “You are not Facebook or Google? Why you should still care about Big Data” by Kai Wähner
Key messages




You have to care about big data to be competitive in the future!
Start your big data projects business driven!
Big data from a technical perspective is no (longer) rocket science!

Agenda

            • Big data paradigm shift
            • Use cases for SMEs                                                             Why?
            • Challenges of big data
            • Technology perspective
            • Getting started                                                               How?
            • Live demo




Why should you care about big data?




        “If you can't measure it,
          you can't manage it.”

                                                                                                   William Edwards Deming
                                                                                                          (1900 –1993)
                                                                                                 American statistician, professor,
                                                                                                 author, lecturer and consultant



Why should you care about big data?



      “Silence the HiPPOs” (highest-paid person's opinion)
      Once you can interpret unimaginably large data streams,
       relying on gut feeling is no longer justified!




Data should drive decisions (again)!




    “Accessing data is now [again] the critical path in making good decisions!”

                http://www.inforbix.com/friday-data-stories-big-data-driven-decision-making/


What does big data NOT need to be?




         Big data NEED NOT be analytics (data warehouse, data
         marts)!

         Big data NEED NOT be big volume!

         Big data NEED NOT be expensive (open source, commodity
         hardware)!
What is big data?

     Changing Scale
     Changing Expectations
     Changing Interactions
     Cloud

What is big data? The Vs of big data


     Volume (terabytes, petabytes)
     Velocity (realtime or near-realtime)
     Variety (social networks, blog posts, logs, sensors, etc.)
     Value


Big data tasks to solve
  Big Data Integration
           – Land data in a Big Data cluster
           – Implement or generate parallel processes


 Big Data Manipulation
           – Simplify manipulation, such as sort and filter
           – Computationally expensive functions



  Big Data Quality & Governance
           – Identify linkages and duplicates, validate big data
           – Match component, execute basic quality features



  Big Data Project Management
           – Place frameworks around big data projects
           – Common Repository, scheduling, monitoring
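
The "identify linkages and duplicates" task under Big Data Quality & Governance can be sketched in a few lines. This is only a minimal illustration: the record fields `name` and `email` and the matching rule (exact match after trimming and lowercasing) are assumptions, not something the slides prescribe.

```python
from collections import defaultdict


def normalize(record):
    """Build a match key: trimmed, lowercased name plus email (assumed fields)."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def find_duplicates(records):
    """Group records by match key and return the groups with more than one member."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

On a cluster the same grouping would run as a parallel job, but the matching logic stays the same.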

Big data for SMEs


     ➜     Despite popular belief, there are many benefits for companies of any size, to
           maximizing the opportunities from regular, full and thorough analysis of big
           data insight. Big data can give firms a competitive advantage, lead to better
           client understanding, improve business operations and enhance the
           accessibility of their brand.
     ➜     Arguably, the process of collecting data from target markets is even more
           important for smaller firms than for larger enterprises. SMEs and start-ups
           do not always have the security of a widespread following nor guaranteed
           business revenues and therefore, should be making the most of the
           information the web provides in the form of big data sets.

                http://www.sentronex.com/2012/11/big-data-and-smes




Agenda

            • Big data paradigm shift
            • Use cases for SMEs
            • Challenges of big data
            • Technology perspective
            • Getting started
            • Live demo




Use cases for SMEs

     ➜ Replacing ETL Jobs
     ➜ Forecast

     ➜ Logistics

     ➜ Fraud Protection

     ➜ Storage




Replacing ETL jobs




  http://www.itbusinessedge.com/blogs/integration/
  better-data-integration-with-hadoop-its-possible.html




                                                                                         http://www.enterpriseirregulars.com/33687/hadoop-is-many-things-
                                                                                         including-the-ideal-etl-tool-for-big-data-analytics/

Replacing ETL jobs: Text files




      “The advantage of their new system is that they can now look at their data
      [from their log processing system] in anyway they want:
      ➜ Nightly MapReduce jobs collect statistics about their mail system such as
          spam counts by domain, bytes transferred and number of logins.
      ➜ When they wanted to find out which part of the world their customers
          logged in from, a quick [ad hoc] MapReduce job was created and they had
          the answer within a few hours. Not really possible in your typical ETL
          system.”

      http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
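
A nightly statistic such as "spam counts by domain" follows the map and reduce pattern. The sketch below runs it in plain Python on a single machine. The log line format `<domain> <bytes> <SPAM|HAM>` is a made-up assumption for illustration, not the real log format from the case study.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit (domain, 1) for every log line flagged as spam.

    Assumed line format (hypothetical): "<domain> <bytes> <SPAM|HAM>".
    """
    domain, _bytes_transferred, verdict = line.split()
    if verdict == "SPAM":
        yield domain, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def spam_counts_by_domain(lines):
    """Run the map step over all lines, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_log_line(line))
    return reduce_counts(pairs)
```

An ad hoc question (for example logins by region) only needs a different map function; the reduce step can stay as it is.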


Replacing ETL jobs: Binary files




                http://www.slideshare.net/brocknoland/common-and-unique-use-cases-for-apache-hadoop



Forecast

  “Forecasting is the process of making statements about
  events whose actual outcomes (typically) have not yet
  been observed.”
  [Wikipedia]




Forecast: Recommendation engine




                          Is there really no use case for your business
                           (shop, community, items, etc.)?
Forecast: Preferred digital media



     Deutsche Welle is Germany's international broadcaster. The service is aimed at
     the overseas market. It broadcasts news and information on shortwave,
     internet and satellite radio in 30 languages (DW Radio). It has a satellite
     television service (DW TV), which is available in four languages, and there is
     also an online news site.

     Deutsche Welle collects usage data for over 50,000 videos hourly from
     platforms such as YouTube, Dailymotion, Sevenload and MyVideo. To date, over
     500,000,000 data records have been imported, aggregated and analyzed to improve
     content for future broadcasts.

     http://www.unbelievable-machine.com/um/2012/09/20/case-study-von-deutsche-welle-und-um-im-bitkom-big-data-leitfaden/
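
The "imported, aggregated and analyzed" step can be pictured as a simple roll-up of usage records across platforms. A minimal sketch, assuming each record is a hypothetical `(video_id, platform, views)` tuple; the real data model is not described in the source.

```python
from collections import Counter


def total_views_per_video(records):
    """Sum views per video across all platforms.

    records: iterable of (video_id, platform, views) tuples (assumed shape).
    """
    totals = Counter()
    for video_id, _platform, views in records:
        totals[video_id] += views
    return totals


def top_videos(records, n=3):
    """Return the n most-viewed videos as (video_id, total_views) pairs."""
    return total_views_per_video(records).most_common(n)
```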

Forecast: Risk management

     Spot a flu epidemic before it starts
       thanks to Google Flu Trends

     Cholera in Haiti in 2010:
       Early information from
       analyzing Twitter messages

        Research: Johns Hopkins School of Medicine




                                                                                          http://www.crn.com/slide-shows/applications-os/240147224/
                                                                                          8-ways-big-data-will-change-our-lives.htm?pgno=3

Forecast: Risk management

                                                                                               Deduce
                                                                                              Customer
                                                                                              Defections




                         http://hkotadia.com/archives/5021




Forecast: Risk management




   ➜     Real estate bubble
   ➜     If your business has anything to do with real estate, you
         want this information as soon as possible, right?
   ➜     “The Future of Prediction: How Google Searches Foreshadow
         Housing Prices and Sales”
                      http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2022293

Logistics




     “Logistics is the management of the flow of resources
     between the point of origin and the point of destination in
     order to meet some requirements.”
                                                                                                           Wikipedia

Logistics: Time planning



➜   Use case: Estimating airport flight arrival times

➜   Problem: Every minute of delay costs a lot of money (crew,
    parking, delays of other aircraft, etc.)

➜   Solution: Computation of flight arrival times with public and private
    data (weather forecasts, flight plans, own radar stations,
    statistics of past flights)

       Andrew McAfee, Erik Brynjolfsson, Harvard Business Manager, November 2012

Logistics: Resource management




                http://www.blue-yonder.com/en/customers/overview/dm-drogerie-markt-en.html

Logistics: Flexible pricing




   ➜      With revenue of almost USD 30 billion and a network of
          800 locations, Macy's is considered the largest store operator in the
          USA
   ➜      Daily price check analysis of its 10,000 articles in less than two hours
   ➜      Whenever a neighboring competitor anywhere between New York
          and Los Angeles goes for aggressive price reductions, Macy's follows
          its example
   ➜      If there is no market competitor, the prices remain unchanged
     http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702
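
The pricing rule described above is simple once the competitor prices are available; the hard part is collecting them daily. A minimal sketch of the rule, assuming that "follows its example" means matching the lowest neighboring competitor price whenever it undercuts the current price. The function name and that interpretation are assumptions.

```python
def next_price(own_price, competitor_prices):
    """Return tomorrow's price for one article at one location.

    Assumed rule: match the lowest nearby competitor price if it is below
    our own price; keep the price unchanged if there is no competitor.
    """
    if not competitor_prices:
        return own_price
    return min(own_price, min(competitor_prices))
```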



Logistics: Flexible pricing




                                                                                     http://hkotadia.com/archives/5021

Fraud detection




                                                               “Fraud is a million dollar business
                                                               and it is increasing every year. The
                                                               PwC global economic crime
                                                               survey suggests that close to 30%
                                                               of companies worldwide have
                                                               reported being victims of fraud in
                                                               the past year.”
                                                                            http://www.pwc.com/gx/en/economic-crime-survey




Fraud detection: Anti-counterfeiting




     Solution: A scalable process to automatically monitor the internet
     for any mention of the company’s brand name products.
     Websites, message boards, trade boards, and social media are
     automatically monitored.
       http://www.brightplanet.com/2013/01/pharmaceutical-fraud-and-counterfeiting-finding-a-deep-web-solution/
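
At its core, monitoring for brand mentions means scanning harvested text for product names. The sketch below shows only that matching step; the crawling of websites and boards is left out, and the `(source, text)` document shape is an assumption.

```python
import re


def find_brand_mentions(documents, brand_names):
    """Return (source, brand) pairs for every document that mentions a brand.

    documents: iterable of (source, text) pairs (assumed shape).
    Matching is case-insensitive and on whole words only.
    """
    patterns = {
        brand: re.compile(r"\b" + re.escape(brand) + r"\b", re.IGNORECASE)
        for brand in brand_names
    }
    hits = []
    for source, text in documents:
        for brand, pattern in patterns.items():
            if pattern.search(text):
                hits.append((source, brand))
    return hits
```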

Fraud detection: Fraud mining




       Content manipulation
       at a travel portal:
      Which customer reviews
          are trustworthy?

   Jean-Paul Schmetz, Harvard Business Manager, November 2012

Storage

     “being stored somewhere until needed”
     http://www.macmillandictionary.com




Storage: Compliance


     Global Parcel Service

   ➜ A lot of data must be stored “forever”
   ➜ Numbers increase exponentially

   ➜ Goal: As cheap as possible

   ➜ Problem: (Fast) queries must still be possible

   ➜ Solution: Commodity servers and “Hadoop querying”

   http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up



Storage: Revenue creation




                                                 http://wikibon.org/wiki/v/Case_Study:_PhotoBox_and_Big-Data
Gaming: Analyze players




         All this data work enables Riot to ask questions and receive meaningful answers. Which game
         champions (or the higher-scoring players) and skins (character costumes) are popular in
         particular geographic regions? What are the win rates of champions? “We had lots of
         unexpected results when we first started doing this analysis,” Livingston said. “One of the
         benefits of having all this data is we can be more scientific about it, and we can now check
         everything.”
                http://slashdot.org/topic/bi/for-riot-games-big-data-is-serious-business/
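
A question like "What are the win rates of champions per region?" reduces to a grouped ratio. A minimal sketch, assuming each match record is a hypothetical `(region, champion, won)` tuple; the real event schema is not given in the source.

```python
from collections import defaultdict


def win_rates(matches):
    """Compute the win rate per (region, champion).

    matches: iterable of (region, champion, won) tuples, won being a bool
    (assumed shape).
    """
    games = defaultdict(int)
    wins = defaultdict(int)
    for region, champion, won in matches:
        games[(region, champion)] += 1
        wins[(region, champion)] += int(won)
    return {key: wins[key] / games[key] for key in games}
```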
Agenda

            • Big data paradigm shift
            • Use cases for SMEs
            • Challenges of big data
            • Technology perspective
            • Getting started
            • Live demo




Limited big data experts




                                                                       This is your
                                                                        company
Big Data Geek
Big data tool selection (technical perspective)


                Looking for “your” required big data product?
                             Support your data from scratch?
                                                Good luck!




Big data tool selection (business perspective)

     ➜ Want to buy a big data solution for your industry?
     ➜ Maybe a competitor has a big data solution which
       adds business value?
     ➜ The competitor will never publish it (rat race)!




Data quality




          Big Data + Poor Data Quality = Big Problems

How to solve these big data challenges?




Don't be an expert! Keep it simple!

   ➜ “[Often] simple models and
     big data trump more-elaborate
     [and complex] analytics approaches”

   ➜ “Often someone coming from
     outside an industry can spot
     a better way to use big data
     than an insider”


     Erik Brynjolfsson / Lynn Wu
     http://alfredopassos.tumblr.com/post/32461599327/big-data-the-management-revolution-by-andrew-mcafee

Be creative!


                                                          ➜ Look at use cases of others
                                                            (SMEs, but also large companies)

                                                          ➜ How can you do something similar
                                                            with your data?

                                                          ➜ You have different data sources?
                                                            Use them! Combine them! Play with them!


What is your Big Data process?



                Step 1                                                Step 2                                            Step 3




  1) Do not begin with the data, think about business opportunities
  2) Choose the right data (combine different data sources)
  3) Use easy tooling

           http://hbr.org/2012/10/making-advanced-analytics-work-for-you



Agenda

            • Big data paradigm shift
            • Use cases for SMEs
            • Challenges of big data
            • Technology perspective
            • Getting started
            • Live demo




Technology perspective




         How to process big data?
How to process big data?




     The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing
     nodes. This means that every time a large job is run, the data has to first be read from the source,
     split N ways and then delivered to the individual nodes. Worse, if the partition key of the source
     doesn’t match the partition key of the target, data has to be constantly exchanged among the
     nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The
     network, which is always the slowest part of the process, becomes the weakest link in the
     performance chain.


     http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead
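
The partition key problem from the quote can be made concrete with a toy model: hash-partition the same records once by the source key and once by the target key, then count how many end up on a different node. The field names, node count and the use of CRC32 as the hash are illustrative assumptions.

```python
import zlib


def node_for(key, num_nodes):
    """Deterministic hash partitioning: map a key to a node index."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def records_to_move(records, source_key, target_key, num_nodes):
    """Count records that must cross the network when repartitioning.

    A record moves if the node chosen by its source key differs from the
    node chosen by its target key.
    """
    moved = 0
    for record in records:
        source_node = node_for(record[source_key], num_nodes)
        target_node = node_for(record[target_key], num_nodes)
        if source_node != target_node:
            moved += 1
    return moved
```

When both keys are the same, nothing moves. When they differ, a large share of the records typically has to travel between nodes, which is the network cost the quote describes.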



How to process big data?




        Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java

        Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java



How to process big data?




                The de facto standard for big data processing




How to process big data?



                                                                                            “A big part of [the
                                                                                            company’s strategy]
                                                                                            includes wiring SQL Server
                                                                                            2012 (formerly known by
                                                                                            the codename “Denali”) to
                                                                                            the Hadoop distributed
                                                                                            computing platform, and
                                                                                            bringing Hadoop to
                                                                                            Windows Server and Azure”



Even Microsoft (the .NET house) has relied on Hadoop since 2011
What is Hadoop?




     Apache Hadoop, an open-source software library, is a
     framework that allows for the distributed processing of
     large data sets across clusters of commodity hardware
     using simple programming models. It is designed to scale
     up from single servers to thousands of machines, each
     offering local computation and storage.

Hadoop architecture




Map (Shuffle) Reduce

Simple example
• Input: (very large) text files with lists of strings, such as:
    “318, 0043012650999991949032412004...0500001N9+01111+99999999999...”
• We are interested in only some of the content: year and temperature (marked in red on the slide)
• The MapReduce job has to compute the maximum temperature for every year




                       Example from the book “Hadoop: The Definitive Guide, 3rd Edition”
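
The three phases can be sketched in plain Python on a single machine. This is not the Java code from the book: to avoid guessing field positions in the truncated record above, the sketch assumes a simplified input where each line is already `year,temperature`.

```python
from collections import defaultdict


def map_record(line):
    """Map: parse one simplified 'year,temperature' line (assumed format)."""
    year, temperature = line.split(",")
    yield year.strip(), int(temperature)


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_max(year, temperatures):
    """Reduce: keep only the maximum temperature for one year."""
    return year, max(temperatures)


def max_temperature_per_year(lines):
    """Run map, shuffle and reduce in sequence."""
    pairs = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(pairs)
    return dict(reduce_max(year, temps) for year, temps in grouped.items())
```

In Hadoop, the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; the logic per record stays this small.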



Map (Shuffle) Reduce
     MapReduce is a distributed data processing model and execution environment that
     runs on large clusters of commodity machines. MapReduce is a programming model
     for data processing. The model is simple, yet not too simple to express useful
     programs in.

     A MapReduce job is a unit of work that the client wants to be performed: it consists of
     the input data, the MapReduce program, and configuration information. Hadoop runs
     the job by dividing it into tasks, of which there are two types: map tasks and reduce
     tasks.

     There are two types of nodes that control the job execution process: a jobtracker and
     a number of tasktrackers. The jobtracker coordinates all the jobs run on the system by
     scheduling tasks to run on tasktrackers. Tasktrackers run tasks and send progress
     reports to the jobtracker, which keeps a record of the overall progress of each job. If a
     task fails, the jobtracker can reschedule it on a different tasktracker.

     Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits,
     or just splits. Hadoop creates one map task for each split, which runs the user defined
     map function for each record in the split.
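
The relationship between input splits and map tasks can be sketched as follows. This is a simplification under stated assumptions: real Hadoop splits are sized in bytes rather than records, and tasks run in parallel on tasktrackers instead of in a loop.

```python
def make_splits(records, split_size):
    """Divide the input into fixed-size splits (the last one may be smaller)."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def run_map_tasks(records, split_size, map_fn):
    """Create one map task per split; each task applies map_fn to every record."""
    outputs = []
    for split in make_splits(records, split_size):
        task_output = [map_fn(record) for record in split]
        outputs.append(task_output)
    return outputs
```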
How to process big data?




Agenda

            • Big data paradigm shift
            • Use cases for SMEs
            • Challenges of big data
            • Technology perspective
            • Getting started
            • Live demo




Hadoop alternatives




        Apache Hadoop   ➜   Hadoop Distribution   ➜   Big Data Suite

        Features included:  few  ➜  many




Hadoop alternatives




        Apache Hadoop   ➜   Hadoop Distribution   ➜   Big Data Suite

        Features included:  few  ➜  many

        Apache Hadoop: MapReduce, HDFS, Ecosystem

Apache Hadoop (MapReduce code)




Genealogy of elephants




                A little bit outdated (from February 2012), but you understand the problem?!




                https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know


Hadoop alternatives




        Apache Hadoop   ➜   Hadoop Distribution   ➜   Big Data Suite

        Features included:  few  ➜  many

        Apache Hadoop: MapReduce, HDFS, Ecosystem
        Hadoop Distribution: + Packaging, Deployment-Tooling, Support

Hadoop distributions




                                                                                            (… and more are emerging)

Hortonworks ecosystem




Cloudera Manager




MapR – the different alternative


                                                                                           Replaces (Apache) HDFS
                                                                                           with its own implementation



     MapR Direct Access NFS makes Hadoop radically easier and less expensive to use. Unlike
     the write-once system found in other Hadoop distributions, MapR allows files to be
     modified, overwritten, and enables multiple concurrent reads and writes on any file.

     Users can simply browse files, automatically open associated applications with a mouse
     click, or drag-and-drop files and directories into and out of the cluster. Additionally,
     standard command-line tools and UNIX applications and utilities (such as grep, tar, sort,
     or tail) can be used directly on data in the cluster. With other Hadoop distributions, the
     user must copy the data out of the cluster in order to use standard tools.
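
Because the cluster is exposed as an ordinary file system, the grep-style access described above can be sketched with plain file I/O. This is a minimal illustration, not MapR code: the function name `grep_lines` and the mount path in the usage section are hypothetical, and any directory can stand in for the NFS mount.

```python
from pathlib import Path
from typing import Iterator


def grep_lines(path: Path, needle: str) -> Iterator[str]:
    """Yield the lines of a text file that contain `needle`, like a basic grep.

    Only ordinary file I/O is used, so this works on any mounted file
    system; no Hadoop client library is involved.
    """
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line:
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical mount point; adjust to wherever the cluster is mounted.
    log_file = Path("/mnt/cluster/logs/app.log")
    if log_file.exists():
        for match in grep_lines(log_file, "ERROR"):
            print(match)
```

With a write-once file system that is not mounted locally, the same search would first require copying the file out of the cluster or going through a dedicated client.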

Hadoop alternatives




Features included, from few to many:

• Apache Hadoop: MapReduce, HDFS, ecosystem
• Hadoop distribution: Apache Hadoop + packaging, deployment tooling, support
• Big data suite: Hadoop distribution + tooling / modeling, code generation, scheduling, integration

Big data suites




Example: Talend Open Studio for Big Data




Trying to get from this…




to this…




      No coding…

   … use a modeler and code generator instead!
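
As a rough illustration of the modeler-plus-generator idea, the sketch below turns a small declarative job description into a Pig Latin script. It is a toy, not Talend's generator: the `JobModel` fields, the `generate_pig` function, and the sample paths and schema are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JobModel:
    """Hypothetical, minimal stand-in for a job drawn in a graphical modeler."""
    input_path: str
    delimiter: str
    schema: List[Tuple[str, str]]  # (field name, Pig type) pairs
    filter_expr: str               # Pig boolean expression
    output_path: str


def generate_pig(job: JobModel) -> str:
    """Render the model as a Pig Latin script: LOAD, FILTER, then STORE."""
    fields = ", ".join(f"{name}:{pig_type}" for name, pig_type in job.schema)
    lines = [
        f"raw = LOAD '{job.input_path}' USING PigStorage('{job.delimiter}') AS ({fields});",
        f"kept = FILTER raw BY {job.filter_expr};",
        f"STORE kept INTO '{job.output_path}' USING PigStorage('{job.delimiter}');",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    job = JobModel(
        input_path="/data/orders.csv",
        delimiter=",",
        schema=[("order_id", "long"), ("country", "chararray"), ("amount", "double")],
        filter_expr="amount > 100.0",
        output_path="/data/large_orders",
    )
    print(generate_pig(job))
```

The point of the sketch is the division of labor: the user edits the model, and the script that runs inside Hadoop is derived from it rather than written by hand.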




Hadoop Alternatives

            • Big data paradigm shift
            • Use cases for SMEs
            • Challenges of big data
            • Technology perspective
            • Getting started
            • Live demo




Vision: Democratize big data

Talend Open Studio for Big Data

• Improves the efficiency of big data job design with a graphical interface
• Generates Hadoop code and runs transforms inside Hadoop
• Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive
• 100% open source under an Apache License
• Standards based

[Figure: "Pig", captioned "…an open source ecosystem"]

Vision: Democratize big data

Talend Platform for Big Data

• Builds on Talend Open Studio for Big Data
• Adds data quality, advanced scalability and management functions
    • MapReduce massively parallel data processing
    • Shared repository and remote deployment
    • Data quality and profiling
    • Data cleansing
    • Reporting and dashboards
• Commercial support, warranty/IP indemnity under a subscription license

[Figure: "Pig", captioned "…an open source ecosystem"]

Live demo




     “Talend Open Studio for Big Data” in action...

How to choose the right tooling?




Features included, from few to many: Apache Hadoop, Hadoop distribution, big data suite


        Simplicity (lightweight, setup, coding, deployment, usage)
        Prevalence (open standards, community, extendibility)
        Features (Hadoop ecosystem, connectors, monitoring)
        Pitfalls (data-driven costs, native Hadoop code, just ETL or more)

Did you get the key message?




Key messages




You have to care about big data to be competitive in the future!
Let business needs drive your big data projects!
From a technical perspective, big data is no (longer) rocket science!

Did you get the key message?




Thank you for your attention.
                Questions?

               KAI WÄHNER
          kwaehner@talend.com
            www.kai-waehner.de
                LinkedIn / Xing
                 @KaiWaehner

Más contenido relacionado

La actualidad más candente

Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySamanthaBerlant
 
Enabling digital business with governed data lake
Enabling digital business with governed data lakeEnabling digital business with governed data lake
Enabling digital business with governed data lakeKaran Sachdeva
 
Best Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse QuicklyBest Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse QuicklyWhereScape
 
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and TableauAnalyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and TableauDATAVERSITY
 
IBM Governed Data Lake
IBM Governed Data LakeIBM Governed Data Lake
IBM Governed Data LakeKaran Sachdeva
 
Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Kai Wähner
 
When you need more data in less time...
When you need more data in less time...When you need more data in less time...
When you need more data in less time...Bálint Horváth
 
Tools and techniques for predictive analytics
Tools and techniques for predictive analyticsTools and techniques for predictive analytics
Tools and techniques for predictive analyticsRohanKumarJumnani
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Webinar: Customer Experience in Banking - a CTO's Perspective
Webinar: Customer Experience in Banking - a CTO's PerspectiveWebinar: Customer Experience in Banking - a CTO's Perspective
Webinar: Customer Experience in Banking - a CTO's PerspectiveDataStax
 
Big Data Platform and Architecture Recommendation
Big Data Platform and Architecture RecommendationBig Data Platform and Architecture Recommendation
Big Data Platform and Architecture RecommendationSofyan Hadi AHmad
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
 
Advance Data Visualization and Storytelling Virtual Workshop
Advance Data Visualization and Storytelling Virtual WorkshopAdvance Data Visualization and Storytelling Virtual Workshop
Advance Data Visualization and Storytelling Virtual WorkshopCCG
 
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...Capgemini
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform DATAVERSITY
 
Becoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeBecoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeCloudera, Inc.
 

La actualidad más candente (20)

Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the Ugly
 
Enabling digital business with governed data lake
Enabling digital business with governed data lakeEnabling digital business with governed data lake
Enabling digital business with governed data lake
 
Best Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse QuicklyBest Practices for Building a Warehouse Quickly
Best Practices for Building a Warehouse Quickly
 
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and TableauAnalyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
Analyzing Billions of Data Rows with Alteryx, Amazon Redshift, and Tableau
 
IBM Governed Data Lake
IBM Governed Data LakeIBM Governed Data Lake
IBM Governed Data Lake
 
Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA
 
[XConf Brasil 2020] Data mesh
[XConf Brasil 2020] Data mesh[XConf Brasil 2020] Data mesh
[XConf Brasil 2020] Data mesh
 
When you need more data in less time...
When you need more data in less time...When you need more data in less time...
When you need more data in less time...
 
Tools and techniques for predictive analytics
Tools and techniques for predictive analyticsTools and techniques for predictive analytics
Tools and techniques for predictive analytics
 
How to Streamline DataOps on AWS
How to Streamline DataOps on AWSHow to Streamline DataOps on AWS
How to Streamline DataOps on AWS
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Webinar: Customer Experience in Banking - a CTO's Perspective
Webinar: Customer Experience in Banking - a CTO's PerspectiveWebinar: Customer Experience in Banking - a CTO's Perspective
Webinar: Customer Experience in Banking - a CTO's Perspective
 
Big Data Platform and Architecture Recommendation
Big Data Platform and Architecture RecommendationBig Data Platform and Architecture Recommendation
Big Data Platform and Architecture Recommendation
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
Advance Data Visualization and Storytelling Virtual Workshop
Advance Data Visualization and Storytelling Virtual WorkshopAdvance Data Visualization and Storytelling Virtual Workshop
Advance Data Visualization and Storytelling Virtual Workshop
 
Semantic Data Management
Semantic Data ManagementSemantic Data Management
Semantic Data Management
 
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform 
 
Becoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural ChangeBecoming Data-Driven Through Cultural Change
Becoming Data-Driven Through Cultural Change
 

Similar a You are not Facebook or Google? Why you should still care about Big Data and Apache Hadoop - 33rd Degree 2013

JAZOON'13 - Kai Waehner - Hadoop Integration
JAZOON'13 - Kai Waehner - Hadoop IntegrationJAZOON'13 - Kai Waehner - Hadoop Integration
JAZOON'13 - Kai Waehner - Hadoop Integrationjazoon13
 
Big Data: A CIO’s Cut Out and Keep Guide
Big Data: A CIO’s Cut Out and Keep Guide Big Data: A CIO’s Cut Out and Keep Guide
Big Data: A CIO’s Cut Out and Keep Guide EMC
 
Scandev / SDC2013 - Spoilt for Choice: Which Integration Framework to use – A...
Scandev / SDC2013 - Spoilt for Choice: Which Integration Framework to use – A...Scandev / SDC2013 - Spoilt for Choice: Which Integration Framework to use – A...
Scandev / SDC2013 - Spoilt for Choice: Which Integration Framework to use – A...Kai Wähner
 
Day 2 aziz apj aziz_big_datakeynote_press
Day 2 aziz apj aziz_big_datakeynote_pressDay 2 aziz apj aziz_big_datakeynote_press
Day 2 aziz apj aziz_big_datakeynote_pressIntelAPAC
 
Democratizing Big Data
Democratizing Big DataDemocratizing Big Data
Democratizing Big DataJeff Kelly
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Cloud Migration Strategies that Ensure Greater Value for the Business
Cloud Migration Strategies that Ensure Greater Value for the BusinessCloud Migration Strategies that Ensure Greater Value for the Business
Cloud Migration Strategies that Ensure Greater Value for the BusinessDenodo
 
Big Data Jujitsu Walkthru Client x Client
Big Data Jujitsu Walkthru Client x ClientBig Data Jujitsu Walkthru Client x Client
Big Data Jujitsu Walkthru Client x ClientClient X Client
 
Exploring Big Data value for your business
Exploring Big Data value for your businessExploring Big Data value for your business
Exploring Big Data value for your businessAcunu
 
Business Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachBusiness Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachDATAVERSITY
 
From Paris Hilton to Walmart: welcome to the Big Data Revolution
From Paris Hilton to Walmart: welcome to the Big Data RevolutionFrom Paris Hilton to Walmart: welcome to the Big Data Revolution
From Paris Hilton to Walmart: welcome to the Big Data RevolutionWilliam Visterin
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesDATAVERSITY
 
DAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata ManagementDAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata ManagementDATAVERSITY
 
Data Governance in the Big Data Era
Data Governance in the Big Data EraData Governance in the Big Data Era
Data Governance in the Big Data EraPieter De Leenheer
 
Connecting and Exploiting Big Data
Connecting and Exploiting Big DataConnecting and Exploiting Big Data
Connecting and Exploiting Big DataR A Akerkar
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopCCG
 
Getting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersGetting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersDatameer
 

Similar a You are not Facebook or Google? Why you should still care about Big Data and Apache Hadoop - 33rd Degree 2013 (20)

JAZOON'13 - Kai Waehner - Hadoop Integration
JAZOON'13 - Kai Waehner - Hadoop IntegrationJAZOON'13 - Kai Waehner - Hadoop Integration
JAZOON'13 - Kai Waehner - Hadoop Integration
 
Big Data: A CIO’s Cut Out and Keep Guide
Big Data: A CIO’s Cut Out and Keep Guide Big Data: A CIO’s Cut Out and Keep Guide
Big Data: A CIO’s Cut Out and Keep Guide
 
Scandev / SDC2013 - Spoilt for Choice: Which Integration Framework to use – A...
Scandev / SDC2013 - Spoilt for Choice: Which Integration Framework to use – A...Scandev / SDC2013 - Spoilt for Choice: Which Integration Framework to use – A...
Scandev / SDC2013 - Spoilt for Choice: Which Integration Framework to use – A...
 
Day 2 aziz apj aziz_big_datakeynote_press
Day 2 aziz apj aziz_big_datakeynote_pressDay 2 aziz apj aziz_big_datakeynote_press
Day 2 aziz apj aziz_big_datakeynote_press
 
The value of our data
The value of our dataThe value of our data
The value of our data
 
Democratizing Big Data
Democratizing Big DataDemocratizing Big Data
Democratizing Big Data
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Cloud Migration Strategies that Ensure Greater Value for the Business
Cloud Migration Strategies that Ensure Greater Value for the BusinessCloud Migration Strategies that Ensure Greater Value for the Business
Cloud Migration Strategies that Ensure Greater Value for the Business
 
Big Data Jujitsu Walkthru Client x Client
Big Data Jujitsu Walkthru Client x ClientBig Data Jujitsu Walkthru Client x Client
Big Data Jujitsu Walkthru Client x Client
 
Exploring Big Data value for your business
Exploring Big Data value for your businessExploring Big Data value for your business
Exploring Big Data value for your business
 
Business Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachBusiness Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected Approach
 
From Paris Hilton to Walmart: welcome to the Big Data Revolution
From Paris Hilton to Walmart: welcome to the Big Data RevolutionFrom Paris Hilton to Walmart: welcome to the Big Data Revolution
From Paris Hilton to Walmart: welcome to the Big Data Revolution
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
DAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata ManagementDAS Slides: Best Practices in Metadata Management
DAS Slides: Best Practices in Metadata Management
 
Data Governance in the Big Data Era
Data Governance in the Big Data EraData Governance in the Big Data Era
Data Governance in the Big Data Era
 
Connecting and Exploiting Big Data
Connecting and Exploiting Big DataConnecting and Exploiting Big Data
Connecting and Exploiting Big Data
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
BIG DATA.pptx
BIG DATA.pptxBIG DATA.pptx
BIG DATA.pptx
 
Getting Started with Big Data for Business Managers
Getting Started with Big Data for Business ManagersGetting Started with Big Data for Business Managers
Getting Started with Big Data for Business Managers
 

Más de Kai Wähner

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Kai Wähner
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?Kai Wähner
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKai Wähner
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareKai Wähner
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Kai Wähner
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureKai Wähner
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Kai Wähner
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryKai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryKai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryKai Wähner
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail IndustryKai Wähner
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKai Wähner
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Kai Wähner
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingKai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022Kai Wähner
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesKai Wähner
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Kai Wähner
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Kai Wähner
 

Más de Kai Wähner (20)

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?



You are not Facebook or Google? Why you should still care about Big Data and Apache Hadoop - 33rd Degree 2013

  • 1. Kai Wähner YOU ARE NOT FACEBOOK OR GOOGLE? WHY YOU SHOULD STILL CARE ABOUT BIG DATA
  • 2. Kai Wähner. Main Tasks: Requirements Engineering, Enterprise Architecture Management, Business Process Management, Architecture and Development of Applications, Service-oriented Architecture, Integration of Legacy Applications, Cloud Computing, Big Data. Consulting, Developing, Coaching, Speaking, Writing. Contact: Email: kontakt@kai-waehner.de, Blog: www.kai-waehner.de/blog, Twitter: @KaiWaehner, Social Networks: Xing, LinkedIn. © Talend 2013 “You are not Facebook or Google? Why you should still care about Big Data” by Kai Wähner
  • 3. Key messages: You have to care about big data to be competitive in the future! Start your big data projects business driven! Big data from a technical perspective is no (longer) rocket science!
  • 4. Agenda. Why? • Big data paradigm shift • Use cases for SMEs • Challenges of big data. How? • Technology perspective • Getting started • Live demo
  • 5. Agenda • Big data paradigm shift • Use cases for SMEs • Challenges of big data • Technology perspective • Getting started • Live demo
  • 6. Why should you care about big data? “If you can't measure it, you can't manage it.” William Edwards Deming (1900–1993), American statistician, professor, author, lecturer and consultant
  • 7. Why should you care about big data? “Silence the HiPPOs” (highest-paid person's opinion). Once unimaginably large data streams can be interpreted, relying on gut feeling is no longer justified!
  • 8. Data should drive decisions (again)! “Accessing data is now [again] the critical path in making good decisions!” http://www.inforbix.com/friday-data-stories-big-data-driven-decision-making/
  • 9. What does big data NOT NEED to be? Big data need NOT be analytics (data warehouse, data marts)! Big data need NOT be big volume! Big data need NOT be expensive (open source, commodity hardware)!
  • 10. What is big data? Changing Scale, Changing Expectations, Cloud, Changing Interactions
  • 11. What is big data? The Vs of big data: Volume (terabytes, petabytes), Velocity (real-time or near-real-time), Variety (social networks, blog posts, logs, sensors, etc.), Value
  • 12. Big data tasks to solve. Big Data Integration: land data in a big data cluster; implement or generate parallel processes. Big Data Manipulation: simplify manipulation, such as sort and filter; computationally expensive functions. Big Data Quality & Governance: identify linkages and duplicates, validate big data; match component, execute basic quality features. Big Data Project Management: place frameworks around big data projects; common repository, scheduling, monitoring.
  • 13. Big data for SMEs ➜ Despite popular belief, there are many benefits for companies of any size, to maximizing the opportunities from regular, full and thorough analysis of big data insight. Big data can give firms a competitive advantage, lead to better client understanding, improve business operations and enhance the accessibility of their brand. ➜ Arguably, the process of collecting data from target markets is even more important for smaller firms than for larger enterprises. SMEs and start-ups do not always have the security of a widespread following nor guaranteed business revenues and therefore, should be making the most of the information the web provides in the form of big data sets. http://www.sentronex.com/2012/11/big-data-and-smes
  • 14. Agenda • Big data paradigm shift • Use cases for SMEs • Challenges of big data • Technology perspective • Getting started • Live demo
  • 15. Use cases for SMEs ➜ Replacing ETL Jobs ➜ Forecast ➜ Logistics ➜ Fraud Protection ➜ Storage
  • 16. Replacing ETL jobs
  • 17. Replacing ETL jobs http://www.itbusinessedge.com/blogs/integration/better-data-integration-with-hadoop-its-possible.html http://www.enterpriseirregulars.com/33687/hadoop-is-many-things-including-the-ideal-etl-tool-for-big-data-analytics/
  • 18. Replacing ETL jobs: Text files “The advantage of their new system is that they can now look at their data [from their log processing system] in any way they want: ➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins. ➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.” http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
  • 19. Replacing ETL jobs: Binary files http://www.slideshare.net/brocknoland/common-and-unique-use-cases-for-apache-hadoop
  • 20. Forecast “Forecasting is the process of making statements about events whose actual outcomes (typically) have not yet been observed.” [Wikipedia]
  • 21. Forecast: Recommendation engine. Is there really no use case for your business (shop, community, items, etc.)?
  • 22. Forecast: Preferred digital media. Deutsche Welle is Germany's international broadcaster. The service is aimed at the overseas market. It broadcasts news and information on shortwave, internet and satellite radio in 30 languages (DW Radio). It has a satellite television service (DW TV), which is available in four languages, and there is also an online news site. Every hour, Deutsche Welle collects all usage data for over 50,000 videos on platforms such as YouTube, Dailymotion, Sevenload or MyVideo. To date, over 500,000,000 data records have been imported, aggregated and analyzed to improve content for future broadcasts. http://www.unbelievable-machine.com/um/2012/09/20/case-study-von-deutsche-welle-und-um-im-bitkom-big-data-leitfaden/
  • 23. Forecast: Risk management. Spot a flu epidemic before it starts, thanks to Google Flu Trends. Cholera in Haiti in 2010: early information from analyzing Twitter messages. Research: Johns Hopkins School of Medicine http://www.crn.com/slide-shows/applications-os/240147224/8-ways-big-data-will-change-our-lives.htm?pgno=3
  • 24. Forecast: Risk management. Deduce Customer Defections http://hkotadia.com/archives/5021
  • 25. Forecast: Risk management ➜ Real estate bubble ➜ If your business has anything to do with real estate, you need this information as early as possible, right? ➜ “The Future of Prediction: How Google Searches Foreshadow Housing Prices and Sales” http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2022293
  • 26. Logistics “Logistics is the management of the flow of resources between the point of origin and the point of destination in order to meet some requirements.” [Wikipedia]
  • 27. Logistics: Time planning ➜ Use case: Estimating flight arrival times at airports ➜ Problem: Every minute of delay costs a lot of money (crew, parking, delays of other aircraft, etc.) ➜ Solution: Computation of flight arrival times from public and private data (weather forecast, flight plans, own radar stations, statistics of past flights). Andrew McAfee, Erik Brynjolfsson, Harvard Business Manager, November 2012
  • 28. Logistics: Resource management http://www.blue-yonder.com/en/customers/overview/dm-drogerie-markt-en.html
  • 29. Logistics: Flexible pricing ➜ With revenue of almost USD 30 billion and a network of 800 locations, Macy's is considered the largest store operator in the USA ➜ Daily price check analysis of its 10,000 articles in less than two hours ➜ Whenever a neighboring competitor anywhere between New York and Los Angeles goes for aggressive price reductions, Macy's follows its example ➜ If there is no market competitor, the prices remain unchanged http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702
  • 30. Logistics: Flexible pricing http://hkotadia.com/archives/5021
  • 31. Fraud detection “Fraud is a million dollar business and it is increasing every year. The PwC global economic crime survey suggests that close to 30% of companies worldwide have reported being victims of fraud in the past year.” http://www.pwc.com/gx/en/economic-crime-survey
  • 32. Fraud detection: Anti-counterfeiting. Solution: A scalable process to automatically monitor the internet for any mention of the company’s brand name products. Websites, message boards, trade boards, and social media are automatically monitored. http://www.brightplanet.com/2013/01/pharmaceutical-fraud-and-counterfeiting-finding-a-deep-web-solution/
  • 33. Fraud detection: Fraud mining. Content manipulation at a travel portal: Which customer reviews are trustworthy? Jean-Paul Schmetz, Harvard Business Manager, November 2012
  • 34. Storage “being stored somewhere until needed” http://www.macmillandictionary.com
  • 35. Storage: Compliance. Global Parcel Service ➜ A lot of data must be stored “forever” ➜ Numbers increase exponentially ➜ Goal: As cheap as possible ➜ Problem: (Fast) queries must still be possible ➜ Solution: Commodity servers and “Hadoop querying” http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up
  • 36. Storage: Revenue creation http://wikibon.org/wiki/v/Case_Study:_PhotoBox_and_Big-Data
  • 37. Gaming: Analyze players. All this data work enables Riot to ask questions and receive meaningful answers. Which game champions (or the higher-scoring players) and skins (character costumes) are popular in particular geographic regions? What are the win rates of champions? “We had lots of unexpected results when we first started doing this analysis,” Livingston said. “One of the benefits of having all this data is we can be more scientific about it, and we can now check everything.” http://slashdot.org/topic/bi/for-riot-games-big-data-is-serious-business/
  • 38. Agenda • Big data paradigm shift • Use cases for SMEs • Challenges of big data • Technology perspective • Getting started • Live demo
  • 39. Limited big data experts: This is your company / Big Data Geek
  • 40. Big data tool selection (technical perspective). Looking for ‘your’ required big data product? Support your data from scratch? Good luck!
  • 41. Big data tool selection (business perspective) ➜ Want to buy a big data solution for your industry? ➜ Maybe a competitor has a big data solution which adds business value? ➜ The competitor will never publish it (rat race)!
  • 42. Data quality: Big Data + Poor Data Quality = Big Problems
  • 43. How to solve these big data challenges?
  • 44. Don't be an expert! Keep it simple! “[Often] simple models and big data trump more-elaborate [and complex] analytics approaches” “Often someone coming from outside an industry can spot a better way to use big data than an insider” Erik Brynjolfsson / Lynn Wu http://alfredopassos.tumblr.com/post/32461599327/big-data-the-management-revolution-by-andrew-mcafee
  • 45. Be creative! Look at the use cases of others (SMEs, but also large companies). How can you do something similar with your data? You have different data sources? Use them! Combine them! Play with them!
  • 46. What is your big data process? Step 1: Do not begin with the data; think about business opportunities. Step 2: Choose the right data (combine different data sources). Step 3: Use easy tooling. http://hbr.org/2012/10/making-advanced-analytics-work-for-you
  • 47. Agenda • Big data paradigm shift • Use cases for SMEs • Challenges of big data • Technology perspective • Getting started • Live demo
  • 48. Technology perspective: How to process big data?
  • 49. How to process big data? The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing nodes. This means that every time a large job is run, the data has to first be read from the source, split N ways and then delivered to the individual nodes. Worse, if the partition key of the source doesn’t match the partition key of the target, data has to be constantly exchanged among the nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The network, which is always the slowest part of the process, becomes the weakest link in the performance chain. http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead
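
The partition-key mismatch described in the quote above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the field names `customer` and `region`, the four-node cluster, and the use of CRC32 as the partitioning hash are all hypothetical and not taken from the slides or from any particular ETL tool.

```python
import zlib

def node_for(key, num_nodes):
    """Stable hash partitioning: map a key to one of num_nodes nodes."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def records_to_move(records, source_key, target_key, num_nodes):
    """Count records that land on a different node once the data is
    repartitioned from source_key to target_key."""
    return sum(
        1
        for record in records
        if node_for(record[source_key], num_nodes)
        != node_for(record[target_key], num_nodes)
    )

# Hypothetical sample data: 20 customers spread over 3 regions.
sample = [{"customer": i, "region": i % 3} for i in range(20)]

# Same key on both sides: nothing has to cross the network.
same_key_moves = records_to_move(sample, "customer", "customer", num_nodes=4)

# Different keys: some share of the records must be shipped between nodes.
mismatch_moves = records_to_move(sample, "customer", "region", num_nodes=4)
print(same_key_moves, mismatch_moves)
```

When source and target share a partition key, every record is already on the node that needs it; with mismatched keys the count of moved records grows with the data volume, which is the network cost the quote refers to.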
  • 50. How to process big data? Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java
  • 51. How to process big data? The de facto standard for big data processing
  • 52. How to process big data? “A big part of [the company’s strategy] includes wiring SQL Server 2012 (formerly known by the codename “Denali”) to the Hadoop distributed computing platform, and bringing Hadoop to Windows Server and Azure” Even Microsoft (the .NET house) has relied on Hadoop since 2011
  • 53. What is Hadoop? Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
  • 54. Hadoop architecture
  • 55. Map (Shuffle) Reduce: Simple example • Input: (very large) text files with lists of strings, such as: “318, 0043012650999991949032412004...0500001N9+01111+99999999999...” • We are interested in only some of the content: year and temperature (marked in red) • The MapReduce function has to compute the maximum temperature for every year. Example from the book “Hadoop: The Definitive Guide, 3rd Edition”
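
The maximum-temperature example above can be sketched in a few lines of plain Python to show the three phases. This is a single-process illustration, not Hadoop code: the `"<year>,<temperature>"` record format is a simplified, hypothetical stand-in for the fixed-width weather records on the slide, and all function names are made up for this sketch.

```python
from collections import defaultdict

# Hypothetical, simplified input: "<year>,<temperature>" per line.
records = ["1949,111", "1949,78", "1950,0", "1950,22", "1950,-11"]

def map_record(line):
    """Map: extract a (year, temperature) pair from one input line."""
    year, temperature = line.split(",")
    return year, int(temperature)

def shuffle(pairs):
    """Shuffle: group all temperatures that belong to the same year."""
    groups = defaultdict(list)
    for year, temperature in pairs:
        groups[year].append(temperature)
    return groups

def reduce_max(year, temperatures):
    """Reduce: keep only the maximum temperature of one year."""
    return year, max(temperatures)

def max_temperature_per_year(lines):
    """Run map, shuffle and reduce one after the other."""
    grouped = shuffle(map_record(line) for line in lines)
    return dict(reduce_max(year, temps) for year, temps in sorted(grouped.items()))

print(max_temperature_per_year(records))
```

In a real cluster the map and reduce calls would run in parallel on different machines; only the shape of the computation is the same.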
  • 56. Map (Shuffle) Reduce. MapReduce is a distributed data processing model and execution environment that runs on large clusters of commodity machines. MapReduce is a programming model for data processing. The model is simple, yet not too simple to express useful programs in. A MapReduce job is a unit of work that the client wants to be performed: it consists of the input data, the MapReduce program, and configuration information. Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks. There are two types of nodes that control the job execution process: a jobtracker and a number of tasktrackers. The jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers. Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the overall progress of each job. If a task fails, the jobtracker can reschedule it on a different tasktracker. Hadoop divides the input to a MapReduce job into fixed-size pieces called input splits, or just splits. Hadoop creates one map task for each split, which runs the user-defined map function for each record in the split.
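
The job mechanics described above (fixed-size splits, one map task per split, rescheduling a failed task) might be pictured with the following toy sketch. It is a single-process simulation under loud assumptions: `make_splits`, `run_map_task` and `run_job` are hypothetical helper names, the retry loop only stands in for the jobtracker rescheduling a task, and nothing here uses the actual Hadoop API.

```python
from collections import defaultdict

def make_splits(records, split_size):
    """Divide the input into fixed-size pieces (the last one may be shorter)."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]

def run_map_task(split, map_fn, max_attempts=2):
    """Run the map function over one split; on failure, rerun the whole task
    (a stand-in for rescheduling it on another tasktracker)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return [map_fn(record) for record in split]
        except RuntimeError:
            if attempt == max_attempts:
                raise

def run_job(records, map_fn, reduce_fn, split_size=2):
    """Toy job runner: one map task per split, then one reduce call per key."""
    groups = defaultdict(list)
    for split in make_splits(records, split_size):
        for key, value in run_map_task(split, map_fn):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}

# Hypothetical input: count how often each word occurs.
words = ["hadoop", "hive", "hadoop", "pig", "hadoop"]
counts = run_job(words, map_fn=lambda word: (word, 1), reduce_fn=sum)
print(counts)
```

Because a failed task is rerun from the start of its split, map functions should be free of side effects; the sketch collects a task's output only after the whole split has succeeded.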
  • 57. How to process big data? © Talend 2013 “You are not Facebook or Google? Why you should still care about Big Data” by Kai Wähner
  • 58. Agenda • Big data paradigm shift • Use cases for SMEs • Challenges of Big data • Technology perspective • Getting started • Live demo © Talend 2013 “You are not Facebook or Google? Why you should still care about Big Data” by Kai Wähner
  • 59. Hadoop alternatives • Apache Hadoop → Hadoop Distribution → Big Data Suite • Features included: few → many
  • 60. Hadoop alternatives • Apache Hadoop → Hadoop Distribution → Big Data Suite • Features included: few → many • Apache Hadoop: MapReduce, HDFS, Ecosystem
  • 61. Apache Hadoop (MapReduce code)
  • 62. Genealogy of elephants A little outdated (from February 2012), but you get the problem! https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know
  • 63. Hadoop alternatives • Apache Hadoop → Hadoop Distribution → Big Data Suite • Features included: few → many • Apache Hadoop: MapReduce, HDFS, Ecosystem • Hadoop Distribution: + Packaging, Deployment-Tooling, Support
  • 64. Hadoop distributions (… with more emerging)
  • 65. Hortonworks ecosystem
  • 66. Cloudera Manager
  • 67. MapR – the different alternative • Replaces (Apache) HDFS with its own implementation • MapR Direct Access NFS makes Hadoop radically easier and less expensive to use. Unlike the write-once system found in other Hadoop distributions, MapR allows files to be modified and overwritten, and enables multiple concurrent reads and writes on any file. Users can simply browse files, automatically open associated applications with a mouse click, or drag and drop files and directories into and out of the cluster. Additionally, standard command-line tools and UNIX applications and utilities (such as grep, tar, sort, or tail) can be used directly on data in the cluster. With other Hadoop distributions, the user must copy the data out of the cluster in order to use standard tools.
  • 68. Hadoop alternatives • Apache Hadoop → Hadoop Distribution → Big Data Suite • Features included: few → many • Apache Hadoop: MapReduce, HDFS, Ecosystem • Hadoop Distribution: + Packaging, Deployment-Tooling, Support • Big Data Suite: + Tooling / Modeling, Code Generation, Scheduling, Integration
  • 69. Big data suites
  • 70. Example: Talend Open Studio for Big Data
  • 71. Trying to get from this…
  • 72. …to this • No coding: use a modeler and code generator instead!
  • 73. Hadoop Alternatives • Big data paradigm shift • Use cases for SMEs • Challenges of Big data • Technology perspective • Getting started • Live demo
  • 74. Vision: Democratize big data Talend Open Studio for Big Data • Improves efficiency of big data job design with a graphical interface • Generates Hadoop code and runs transforms inside Hadoop • Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive • 100% open source under an Apache License • Standards based • …an open source ecosystem
  • 75. Vision: Democratize big data Talend Platform for Big Data • Builds on Talend Open Studio for Big Data • Adds data quality, advanced scalability and management functions • MapReduce massively parallel data processing • Shared repository and remote deployment • Data quality and profiling • Data cleansing • Reporting and dashboards • Commercial support, warranty/IP indemnity under a subscription license • …an open source ecosystem
  • 76. Live demo „Talend Open Studio for Big Data“ in action...
  • 77. How to choose the right tooling? • Apache Hadoop → Hadoop Distribution → Big Data Suite • Features included: few → many • Simplicity (lightweight, setup, coding, deployment, usage) • Prevalence (open standards, community, extendibility) • Features (Hadoop ecosystem, connectors, monitoring) • Pitfalls (data-driven costs, native Hadoop code, just ETL or more)
  • 78. Did you get the key message?
  • 79. Key messages • You have to care about big data to be competitive in the future! • Start your big data projects business driven! • Big data from a technical perspective is no (longer) rocket science!
  • 80. Did you get the key message?
  • 81. Thank you for your attention. Questions? KAI WÄHNER kwaehner@talend.com www.kai-waehner.de LinkedIn / Xing @KaiWaehner