BIG DATA ANALYTICS
 IN A HETEROGENEOUS WORLD



JOYDEEP DAS
DIRECTOR, ANALYTICS PRODUCT MANAGEMENT
SYBASE INC, AN SAP COMPANY

FEBRUARY 16, 2012
AGENDA
 The real world means business


Change is afoot – Myriad solution trends


Building bridges across a heterogeneous world


 Summary
BIG DATA ANALYTICS
REAL WORLD MEANS BUSINESS
BIG DATA ANALYTICS ISSUES
DEALING WITH VOLUME, VARIETY, VELOCITY, COSTS, SKILLS

 Volume: Managing and harnessing massive data sets
 Variety: Harmonizing silos of structured and unstructured data
 Velocity: Keeping up with unpredictable data and query flows
 Costs: Too expensive to acquire, operate, and expand
 Skills: Lack of adequate skills for popular APIs
BIG DATA ANALYTICS MATURITY
FROM JARGON TO TRANSFORMATIONAL BUSINESS VALUE*




 [Figure: Big data technologies (Hadoop, NoSQL, In memory, MPP, Column Store) maturing toward Business Value*:
  Operational Efficiencies, Revenue Growth, New Strategies & Business Models]




 *A McKinsey study titled “Big data: The next frontier for innovation, competition, and productivity”, May 2011, found large potential for Big Data
  Analytics, with estimates such as a 60% potential improvement in retail operating margins, an 8% reduction in (US) national healthcare expenditures,
  and about $150B in savings from operational efficiencies in European government administration
BIG DATA ANALYTICS IN THE REAL WORLD
PREVALENT IN DATA INTENSIVE VERTICALS AND FUNCTIONAL AREAS

 Verticals
   • Banking
   • Telecom
   • Global Capital Markets
   • Retail
   • Government
   • Healthcare
   • Information Providers

 Functional areas
   • Marketing Analytics: Digital channels
     Track visits, discover best channel mix: email, social media, search
   • Sales Analytics: Deep correlations
     Predict risks based on deal DNA (emails, meetings) pattern match
   • Operational Analytics: Atomic machine data
     Analyze RFIDs, weblogs, SMS, sensors to spot continuous operational inefficiency
   • Financial Analytics: Detailed simulations
     Liquidity, portfolio simulations: stress tests, error margins
CHANGE IS AFOOT
MYRIAD OF BIG DATA ANALYTICS SOLUTIONS
CAUSAL LINKS: VARIETY, VELOCITY, VOLUME


 Variety: events data, transactional data, multi-media data, eCommerce data, graph data
 Velocity: µSeconds, continuous and/or bursts
 Volume: routinely Petabytes
GROWING USER COMMUNITIES


  Data Scientists   Business Analysts     Developers/Programmers




                    Administrators




  Business users     External consumers      Business Processes
HARDWARE IS SUPERIOR



 Small server farms: scale out
 Larger servers with partitions: scale up

 Spinning disks to SSDs: 1.2x to 2x speed up
 SSDs to main memory: 4x to 200x speed up
 Main memory to CPU caches: 2x to 6x speed up
SOFTWARE EXPECTATIONS HAVE CHANGED




                           Intelligence & Automation
     Execution
   Characteristics



                                                       Performance & Scalability




      Results
   Characteristics


                     Traditional                                         Contemporary
EXECUTION CHARACTERISTICS
PERFORMANCE FOCUS


         1 2 3 4 5 6 7 8 9…
    r1
    r2
    r3
    r4
    r5



           Columnar access                     MPP: Shared Nothing, Shared Everything




         Algorithms




         Computations close to data:
  InDB Analytics (MapReduce), FPGA filtering                  In-memory processing
EXECUTION CHARACTERISTICS
SCALABILITY FOCUS


                                   1 2 3 4 5 6 7 8 9…
                              r1
 Data Compression             r2
                              r3
                              r4
                              r5

                              Natural Compression                  Compression Techniques               Hybrid Columnar
                             Column Store Databases                 Row Store Databases              Compression Databases



                                                                        SAN

  Distributed File Systems
                                                        DAS                               NAS




                                                              Stream Processing Engines
   Data Filtering
                                            Pre-processing Engines              Transformation Engines
EXECUTION CHARACTERISTICS
INTELLIGENCE FOCUS




    Query & Load Optimization                On-demand systems: Virtualization and provisioning




                                    CPU Caches

                           CPU Cache Conscious Computations
EXECUTION CHARACTERISTICS
AUTOMATION FOCUS




     Data conscious federation                  Automatic Workload Balancer/Mixer




                      User community focused collaborative services
RESULTS CHARACTERISTICS
ACCURACY TOLERANCE FOCUS


 Traditional (associated with SQL)
   • Complex schemas
   • Multiple applications
   • Schema on write
   • Atomic level locking
   • Consistency guarantees across system losses
   • Declarative API
   • Interactive
   • Does encapsulate elements of CAP

 Contemporary (associated with NoSQL)
   • Simple schemas, schema on read
   • Single application
   • Batch oriented
   • Snapshot isolations
   • Eventual consistency guarantees
   • Procedural APIs
   • Does encapsulate elements of ACID
BUILDING BIG DATA BRIDGES
ACROSS A HETEROGENEOUS WORLD
COMPREHENSIVE 3-TIER FRAMEWORK
COMMERCIAL AND/OR OPEN SOURCE



                                 Eco-System
     Business Intelligence Tools, Data Integration Tools, DBA Tools, Packaged Apps




                         Application Services
      In-Database Analytics, Multi-lingual Client APIs, Federation, Web Enabled




                            Data Management
                  High Performance, Highly Scalable, Cloud Enabled
RELIABLE DATA MANAGEMENT
                                  Full Mesh High Speed Interconnect



   Data
Management




              Can handle high performance, compression, batch, ad-hoc analysis

              Can routinely scale to Petabyte class problems, thousands of concurrent jobs

              Typical characteristics
                   Massively parallel processing of complex queries
                   In-memory and on-disk optimizations
                   Elastic resources for user communities
                   ACID guarantees
                   Data variety
                   Information lifecycle management
                   User friendly automation tools
                   File systems (schema free) and/or DBMS structures (schema specific)
DATA MANAGEMENT INFRASTRUCTURE
ROBUST, SCALABLE, HIGH PERFORMANCE



                           Data Discovery     Application Modeling   Reports/Dashboards    Business Decisions
                          (Data Scientists)    (Business Analysts)    (BI Programmers)    (Business End Users)

 Infrastructure
 Management                                      Full Mesh High Speed Interconnect
     (DBAs)




                  • Dynamic, elastic MPP grid
                         – Grow, shrink, provision on-demand
                         – Heavy parallelization
                  • Load, prepare, mine, report in a workflow
                         – Privacy through isolation of resources
                         – Collaboration through sharing of results/data via sharing of resources
VERSATILE APPLICATION SERVICES
 Application Services
   Programming APIs: Python, ADO.NET, PERL, PHP, Ruby, Java, C++
   Web Services API
   In-Database Analytics Plug-Ins: SQL, PMML, C++, JAVA, …




                                 Comprehensive declarative and procedural APIs
                                 In-Database Analytics Plug-In APIs
                                 In-Database Web Services
                                 Query and data federation APIs
                                 Multi-lingual Client APIs
VERSATILE APPLICATION SERVICES
RICH ALGORITHMS CLOSE TO DATA
 [Figure: in-process deployment (User’s DLL “A” and User’s DLL “B” loaded into the in-memory Sybase IQ process) versus
  out-process deployment (User’s DLLs loaded into a separate Library Access Process, reached from the Sybase IQ process via RPC calls)]

 In-database + In-process
   • In-process dynamically loaded shared libraries
   • Highest possible performance
   • Incurs security risks, but manageable via privileges
   • Incurs robustness risks, but manageable via multiplex

 In-database + Out-process
   • Out of process shared library
   • Lower security risks
   • Lower robustness risks
   • Lower performance than in-process but better than out of database

 Multi-lingual APIs
   • Scalar to Scalar
   • Scalar sets to Aggregate
   • Scalar sets to Dimensional Aggregates
   • Scalar sets to Multi-attribute (bulk)
   • Multi-attribute (bulk) to Multi-attribute (bulk)
VERSATILE APPLICATION SERVICES
NATIVE MAPREDUCE

  For stocks in enterprise software sector, find max relative strength of a stock for a trading day*

 Input: Key (k1) = 30-min interval time; Value (v1) = ticker symbol, tick value Day 1, tick value Day 2

 30-min interval time   Ticker Symbol   TickValue Day 1   TickValue Day 2
 9:30 am                SAP             51                52.4
 9:30 am                ORCL            31                28.2
 9:30 am                TDC             22                21.3
 10:00 am               SAP             50.9              53.1
 10:00 am               TDC             21.8              20.9
 10:00 am               ORCL            29.4              27.1
 …..                    ORCL            ……                …..

 Map Fn output: Key (k2) = ticker symbol; Value (v2) = 30-min interval time, weighted variance
 Weighted variance = (A given stock’s variance / Average Variance across All “N” stocks)

 Ticker Symbol   30-min interval time   Weighted variance
 SAP             9:30 am                +1.4 / (SUM (+1.4-2.8-0.7….)/“N” stocks)
 SAP             10:00 am               +2.2 / (SUM (+2.2-2.3-1.1 ….)/“N” stocks)
 SAP             ……                     ……
 ORCL            9:30 am                -2.8 / (SUM (+1.4-2.8-0.7….)/“N” stocks)
 ORCL            10:00 am               -2.3 / (SUM (+2.2-2.3-1.1 ….)/“N” stocks)
 ORCL            …….                    …..
 TDC             9:30 am                -0.7 / (SUM (+1.4-2.8-0.7….)/“N” stocks)
 TDC             10:00 am               -1.1 / (SUM (+2.2-2.3-1.1 ….)/“N” stocks)
 TDC             …..                    ……

 Reduce Fn output: Value (v3)

 Ticker Symbol   Max Absolute Weighted Variance (v3)
 SAP             Max (ABS(9:30 Wt Var), ABS(10:00 Wt Var), …..)
 ORCL            Max (ABS(9:30 Wt Var), ABS(10:00 Wt Var), …..)
 TDC             Max (ABS(9:30 Wt Var), ABS(10:00 Wt Var), …..)




 *Calculate max variance for the day by comparing each 30-min interval tick values across two days: the trading day & the
  day before, weighted by average variance of all stocks for each 30-min interval
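
 The map and reduce steps on this slide can be sketched outside the database in plain Python. This is only an illustration under stated assumptions, not the Sybase IQ implementation: the function names (`partition`, `map_fn`, `reduce_fn`, `run`) are hypothetical, "variance" is taken to mean the Day 2 minus Day 1 tick difference as in the footnote, and the data are the six rows shown above. Note that the tick values give a TDC change of -0.9 at 10:00 am, whereas the slide lists -1.1.

```python
from collections import defaultdict

# Tick rows from the slide: (interval, ticker, value_day1, value_day2).
TICKS = [
    ("9:30 am", "SAP", 51.0, 52.4),
    ("9:30 am", "ORCL", 31.0, 28.2),
    ("9:30 am", "TDC", 22.0, 21.3),
    ("10:00 am", "SAP", 50.9, 53.1),
    ("10:00 am", "TDC", 21.8, 20.9),
    ("10:00 am", "ORCL", 29.4, 27.1),
]


def partition(rows, key_index):
    """Group rows by one column, mimicking PARTITION BY."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return groups


def map_fn(interval_rows):
    """One map instance: gets every row of a single 30-min interval (k1)."""
    changes = [(ticker, day2 - day1) for _, ticker, day1, day2 in interval_rows]
    # Average change across the N stocks in this interval.
    # Assumption: the average is non-zero; a real job would need to guard this.
    average = sum(change for _, change in changes) / len(changes)
    interval = interval_rows[0][0]
    return [(ticker, interval, change / average) for ticker, change in changes]


def reduce_fn(ticker_rows):
    """One reduce instance: gets every mapped row of a single ticker (k2)."""
    ticker = ticker_rows[0][0]
    return ticker, max(abs(weighted) for _, _, weighted in ticker_rows)


def run(rows):
    mapped = []
    for interval_rows in partition(rows, 0).values():
        mapped.extend(map_fn(interval_rows))
    return dict(reduce_fn(group) for group in partition(mapped, 0).values())


RESULT = run(TICKS)
```

 Each partition is processed independently, which is what allows the map and reduce functions to be spread over parallel instances.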
VERSATILE APPLICATION SERVICES
NATIVE MAPREDUCE – DECLARATIVE WAY
 For stocks in enterprise software sector, find max relative strength of a stock for a trading day

  • Map function declaration: CREATE PROCEDURE MapVarTPF (IN XY TABLE (a1 char, a2 datetime, a3 float, a4 float))
                             RESULT SET YZ (b1 char, b2 datetime, b3 float)
  • Reduce function declaration: CREATE PROCEDURE RedMaxVarTPF (IN XY TABLE (a1 char, a2 datetime, a3 float))
                             RESULT SET YZ (b1 char, b2 float)
  • Query: SELECT RedMaxVarTPF.TickSymb, RedMaxVarTPF.MaxVar
           FROM RedMaxVarTPF (TABLE (SELECT MapVarTPF.TickSymb, MapVarTPF.30MinIntTime, MapVarTPF.Var
                                     FROM MapVarTPF (TABLE (SELECT TickDataTab.TickSymb, TickDataTab.30MinIntTime,
                                                                   TickDataTab.30MinValDay1, TickDataTab.30MinValDay2
                                                            FROM TickDataTab)
                                                     OVER (PARTITION BY TickDataTab.30MinIntTime)))
                              OVER (PARTITION BY MapVarTPF.TickSymb))
           ORDER BY RedMaxVarTPF.TickSymb
  • Native MapReduce parallel execution workflow:

    MapVarTPF (Partitioned to 15 parallel instances)   RedMaxVarTPF (Partitioned to 25 parallel instances)   SQL Query collates output using 1 node

                                    …….                                                …….                                         …..

                      SAN Fabric                                         SAN Fabric                                   SAN Fabric




  • Native MapReduce with unstructured data: Native MapReduce can also be applied to unstructured data (e.g. text,
    multi-media, …) stored in the DBMS, or to unstructured data brought into the DBMS at execution time from external files
RICH ECO-SYSTEM
              Source                                                               Answers
                       Data preparation                                 Data Usage


 Eco-System
                                               DBMS /
                                              Filesystem

                          Event Processing   Data Federation        Business Intelligence



                             Data Modeling / Database Design Tool



                        Business Intelligence Tools

                        Data Integration Tools

                        Data Mining Tools

                        Application Tools

                        DBA Tools
RICH ECO-SYSTEM
DBMS <–> HADOOP BRIDGE I

 Feature
   Client Side Federation: Join data from DBMS AND Hadoop at a client application level

 Characteristics
   • Client tool capable of querying DBMS and Hadoop
   • Better performance when results from sources are pre-computed/pre-aggregated

 Big Data Use Cases
   • Ideal for bringing together Big Data Analytics pre-computations from different domains
   • Example, in Telecommunication: DBMS has aggregated customer loyalty data & Hadoop has aggregated network
     utilization data; Quest Toad for Cloud can bring data from both sources, linking customer loyalty to
     network utilization or network faults (e.g. dropped calls)

 [Figure: Quest Toad for Cloud connected to DBMS and Hadoop/Hive]
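
 Client-side federation amounts to joining two small, already aggregated result sets inside the client application. The sketch below illustrates that idea only; it does not show how Quest Toad for Cloud works. The data, the `region` join key, and the `federate` function are hypothetical.

```python
# Hypothetical pre-aggregated result sets, one per source.
LOYALTY_FROM_DBMS = [
    {"region": "north", "loyalty_score": 0.82},
    {"region": "south", "loyalty_score": 0.64},
]
DROPPED_CALLS_FROM_HADOOP = [
    {"region": "north", "dropped_calls": 120},
    {"region": "south", "dropped_calls": 940},
    {"region": "west", "dropped_calls": 55},
]


def federate(left, right, key):
    """Hash join two small result sets on `key` inside the client."""
    index = {row[key]: row for row in right}
    # Inner join: keep only left rows that have a match on the right.
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


JOINED = federate(LOYALTY_FROM_DBMS, DROPPED_CALLS_FROM_HADOOP, "region")
```

 Because the join runs in client memory, it stays cheap only when both sides are pre-aggregated, which matches the performance note on this slide.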
RICH ECO-SYSTEM
DBMS <–> HADOOP BRIDGE II

 Feature
   ETL: Load Hadoop Data into DBMS column store. Extract, Transform, Load data from HDFS (Hadoop
   Distributed File System) into DBMS schemas

 Characteristics
   • Extract & load subsets of HDFS data into DBMS store
       • Raw data from HDFS
       • Results of Hadoop MR jobs
   • HDFS Data stored in DBMS is treated like other DBMS data
       • Gets ACID properties of a DBMS
       • Can be indexed, joined, parallelized
       • Can be queried in an ad-hoc way
   • Visible to BI and other client tools via DBMS ANSI SQL API only
   • Currently, the bulk data transfer utility SQOOP (built by Cloudera) can be used to provide this ETL capability

 Big Data Use Cases
   • Ideal for combining subsets of HDFS unstructured data or summary of HDFS data into DBMS for mid to long
     term usage in business reports
   • Example, in eCommerce: clickstream data from weblogs stored in HDFS and outputs of MR jobs on that data
     (to study browsing behavior) ETL’d into DBMS. The transactional sales data in DBMS joined with clickstream
     data to understand and predict customer browsing to buying behavior

 [Figure: Hadoop/Hive (clickstream data), SQOOP, DBMS (sales data)]
RICH ECO-SYSTEM
DBMS <–> HADOOP BRIDGE III

 Feature
   Join HDFS data with DBMS data on the fly: Fetch and join subsets of HDFS data on-demand using SQL
   queries from DBMS (Data Federation technique)

 Characteristics
   • Scan and fetch specified data subsets from HDFS via table UDF
       • Can read and fetch HDFS data subsets
       • Called as part of SQL query
       • Output joinable with DBMS data
       • Multiple, simultaneous UDF calls possible
       • Sample UDFs provided in JAVA, C++
   • HDFS data not stored in DBMS
       • Fetched into DBMS In-memory tables
       • ACID properties not applicable
       • Repeated use: put fetched data in tables
   • Visible to BI/other client tools via ANSI SQL API only

 Big Data Use Cases
   • Ideal for combining subsets of HDFS data with DBMS data for operational (transient) business reports
   • Example, in Retail: Point Of Sale (POS) detailed data stored in HDFS. DBMS EDW fetches POS data at fixed
     intervals from HDFS of specific hot selling SKUs, combines with inventory data in DBMS to predict and
     prevent inventory “stockouts”.

 [Figure: Hadoop/HDFS (POS data), UDF Bridge, DBMS (inventory data)]
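
 The fetch-and-join pattern from the Retail example can be sketched as follows. Everything here is a stand-in and an assumption: Python's built-in `sqlite3` plays the DBMS, a plain list plays HDFS, and `fetch_pos_subset` plays the table UDF. None of it is the Sybase IQ UDF API; the SKU names and quantities are invented for illustration.

```python
import sqlite3

# Stand-in for detailed POS rows living in HDFS: (sku, units_sold).
HDFS_POS_ROWS = [("sku-1", 40), ("sku-2", 5), ("sku-1", 35), ("sku-3", 12)]


def fetch_pos_subset(skus):
    """Plays the role of the table UDF: scan 'HDFS', return only requested SKUs."""
    wanted = set(skus)
    return [row for row in HDFS_POS_ROWS if row[0] in wanted]


def stockout_risk(hot_skus):
    db = sqlite3.connect(":memory:")
    # Inventory data that permanently lives in the DBMS.
    db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER)")
    db.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        [("sku-1", 60), ("sku-2", 500), ("sku-3", 100)],
    )
    # Fetched HDFS rows go into a temporary table so SQL can join them.
    db.execute("CREATE TEMP TABLE pos_fetch (sku TEXT, units_sold INTEGER)")
    db.executemany("INSERT INTO pos_fetch VALUES (?, ?)", fetch_pos_subset(hot_skus))
    rows = db.execute(
        "SELECT i.sku, i.on_hand, SUM(p.units_sold) "
        "FROM inventory i JOIN pos_fetch p ON p.sku = i.sku "
        "GROUP BY i.sku, i.on_hand "
        "HAVING SUM(p.units_sold) > i.on_hand "
        "ORDER BY i.sku"
    ).fetchall()
    db.close()
    return rows


AT_RISK = stockout_risk(["sku-1", "sku-3"])
```

 Only the requested SKUs leave the "HDFS" side, and the fetched rows disappear with the connection, mirroring the transient, non-ACID nature of the fetched data described above.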
RICH ECO-SYSTEM
DBMS <–> HADOOP BRIDGE IV

 Feature
   Combine results of Hadoop MR jobs with DBMS data on the fly: Initiate and join results of Hadoop MR jobs
   on-demand using SQL queries from DBMS (Query Federation technique)

 Characteristics
   • Trigger and fetch Hadoop MR job results via table UDF
       • Can trigger Hadoop MR jobs
       • Called as part of Sybase IQ SQL query
       • Output joinable with Sybase IQ data
       • No multiple, simultaneous UDF calls
       • Sample UDFs provided in JAVA only
   • HDFS data not stored in DBMS
       • Fetched into DBMS In-memory tables
       • ACID properties not applicable
       • Repeated use: put fetched data in tables
   • Visible to BI and other client tools via DBMS ANSI SQL API only

 Big Data Use Cases
   • Ideal for combining Hadoop MR job results with DBMS data for operational (transient) business reports
   • Example, in Utilities: Smart meter and smart grid data can be combined for load monitoring and demand
     forecast. Smart grid transmission quality data (multi-attribute time series data) stored in HDFS can be
     computed via Hadoop MR jobs triggered from DBMS and combined with Smart meter data stored in DBMS to
     analyze demand and workload.

 [Figure: Hadoop/HDFS (smart grid transmission data), UDF Bridge, DBMS (smart meter consumption data)]
RICH ECO-SYSTEM
DBMS <–> PREDICTIVE TOOLS BRIDGE
               Express Complex Computations In Industry Standard Predictive Modeling
                Markup Language (PMML), Plug In Models Close To data for execution


 [Figure labels: Applications, SQL, Predictions, Database Server (DBMS), Bridge, Universal Plug-In, UDFs,
  PMML (models), PMML Preprocessor (convert & validate)]
RICH ECO-SYSTEM
FUNDAMENTALS OF STREAMS TECHNOLOGY

                                     Process data without storing it



 Input Streams
   Events arrive on input streams

 Derived Streams, Windows
   Apply continuous query operators to one or more input streams to produce a new stream

 Continuous Queries create a new “derived” stream or window
   SELECT FROM one or more input streams/windows
   WHERE…
   GROUP BY…

 Windows can Have State
   • Retention rules define how many or how long events are kept
   • Opcodes in events can indicate insert/update/delete and can be automatically applied to the window
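
 The window behaviour on this slide (retention rules, opcodes, a continuous query that re-evaluates as events arrive) can be sketched in plain Python. This is a hypothetical illustration, not CCL and not any ESP engine's API: the `Window` class, the event layout (`op`, `key`, `fields`), and the count-based retention rule are all assumptions.

```python
from collections import OrderedDict


class Window:
    """Keyed window with a count-based retention rule (keep the newest N rows)."""

    def __init__(self, max_events):
        self.max_events = max_events
        self.rows = OrderedDict()

    def apply(self, event):
        opcode, key = event["op"], event["key"]
        if opcode == "delete":
            self.rows.pop(key, None)
        elif opcode in ("insert", "update"):
            self.rows.pop(key, None)  # re-insert so the row counts as newest
            self.rows[key] = event["fields"]
            while len(self.rows) > self.max_events:
                self.rows.popitem(last=False)  # retention: evict the oldest row
        else:
            raise ValueError(f"unknown opcode: {opcode}")


def total_by_symbol(window, min_qty):
    """Query body: SELECT symbol, SUM(qty) WHERE qty >= min_qty GROUP BY symbol."""
    totals = {}
    for fields in window.rows.values():
        if fields["qty"] >= min_qty:
            totals[fields["symbol"]] = totals.get(fields["symbol"], 0) + fields["qty"]
    return totals


def run(events, max_events=3, min_qty=10):
    window = Window(max_events)
    snapshots = []
    for event in events:  # event-driven: the result is refreshed on every arrival
        window.apply(event)
        snapshots.append(total_by_symbol(window, min_qty))
    return snapshots


EVENTS = [
    {"op": "insert", "key": 1, "fields": {"symbol": "SAP", "qty": 100}},
    {"op": "insert", "key": 2, "fields": {"symbol": "ORCL", "qty": 5}},
    {"op": "update", "key": 2, "fields": {"symbol": "ORCL", "qty": 50}},
    {"op": "insert", "key": 3, "fields": {"symbol": "SAP", "qty": 20}},
    {"op": "delete", "key": 1},
    {"op": "insert", "key": 4, "fields": {"symbol": "TDC", "qty": 30}},
    {"op": "insert", "key": 5, "fields": {"symbol": "TDC", "qty": 10}},
]
SNAPSHOTS = run(EVENTS)
```

 For simplicity the aggregate is recomputed from the whole window on each event; a production engine would more likely update it incrementally.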
RICH ECO-SYSTEM
STREAMS DATA PROCESSING VS TRADITIONAL DATA PROCESSING



 SQL                                                  CCL
 Tables                                               Windows on Event Streams
 Rows                                                 Events
 Columns                                              Fields
 On-Demand: query runs when information is needed     Event-Driven: query updates when information arrives
RICH ECO-SYSTEM
STREAMS PRE-PROCESSING
   Why store Big Data when you can deal with Small Data? Pre-filter unnecessary data on the fly with Streams technologies



                                                    ESP Engine



                                                                                          Alerts Actions

               Updates



               Memory


                 Disk




                                      Hadoop/HDFS                      DBMS
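
 Pre-filtering on the fly can be sketched with a Python generator that drops uninteresting events before anything is persisted. This is a minimal illustration under assumptions: the sensor readings, the 10 to 90 "normal" range, and the `prefilter`/`ingest` names are hypothetical, and a plain list stands in for Hadoop/HDFS or the DBMS.

```python
def prefilter(readings, low, high):
    """Yield only out-of-range readings; in-range ones are never stored."""
    for reading in readings:
        if not low <= reading["value"] <= high:
            yield reading


def ingest(readings, low=10.0, high=90.0):
    """Persist what survives the filter (a list stands in for HDFS/DBMS)."""
    return list(prefilter(readings, low, high))


READINGS = [{"sensor": "s1", "value": v} for v in (42.0, 95.5, 50.1, 3.2, 60.0)]
STORED = ingest(READINGS)
KEPT_FRACTION = len(STORED) / len(READINGS)
```

 The generator processes one event at a time, so the full stream never has to be held in memory or written to disk.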
SUMMARY
3-LAYER LOGICAL INTEGRATION
   STREAM PROCESSING <-> NoSQL <-> DBMS




              BI TOOLS      DI TOOLS    DBA TOOLS    DATA MINING TOOLS
Eco-System
                                                                                    Unstructured
                                                                                    Data
     App                                                         Ingest + Persist   (Hadoop,
 Services      Web 2.0       Java       C/C++        SQL             Federation
                                                                                    Content Mgmt)



                                                                                    Structured Data
                                                                                    (DBMS)


   DBMS
                                                                                    Streaming Data
                                                                                    (ESP)




             The heterogeneous world will require co-existence and playing nice!
Q&A

Learn More: http://www.sybase.com/sybaseiqbigdata
Contact: 1-800-SYBASE5 (792.2735)

The Big Data Concept in Heterogeneous Environments - Big Data by Sybase

  • 1. BIG DATA ANALYTICS IN A HETEROGENEOUS WORLD. JOYDEEP DAS, DIRECTOR, ANALYTICS PRODUCT MANAGEMENT, SYBASE INC, AN SAP COMPANY. FEBRUARY 16, 2012
  • 2. AGENDA: The real world means business; Change is afoot – Myriad solution trends; Building bridges across a heterogeneous world; Summary
  • 3. BIG DATA ANALYTICS: REAL WORLD MEANS BUSINESS
  • 4. BIG DATA ANALYTICS ISSUES: DEALING WITH VOLUME, VARIETY, VELOCITY, COSTS, SKILLS. Volume: managing and harnessing massive data sets. Variety: harmonizing silos of structured and unstructured data. Velocity: keeping up with unpredictable data and query flows. Costs: too expensive to acquire, operate, and expand. Skills: lack of adequate skills for popular APIs.
  • 5. BIG DATA ANALYTICS MATURITY: FROM JARGON TO TRANSFORMATIONAL BUSINESS VALUE*. Diagram labels: Big data (Hadoop, NoSQL, Column Store, In memory, MPP); Business Value* (Operational Efficiencies, Revenue Growth, New Strategies & Business Models). *A McKinsey study titled “Big Data: Next frontier for innovation, competition, and productivity”, May 2011, has found huge potential for Big Data Analytics, with metrics as impressive as 60% improvements in Retail operating margins, 8% reduction in (US) national healthcare expenditures, and $150M savings in operational efficiencies in European economies.
  • 6. BIG DATA ANALYTICS IN THE REAL WORLD: PREVALENT IN DATA INTENSIVE VERTICALS AND FUNCTIONAL AREAS.
      Verticals: Banking, Telecom, Global Capital Markets, Retail, Government, Healthcare, Information Providers.
      Functional areas:
      • Marketing Analytics (digital channels): track visits, discover the best channel mix across email, social media, and search
      • Sales Analytics (deep correlations): predict risks based on deal DNA (emails, meetings) pattern match
      • Operational Analytics (atomic machine data): analyze RFIDs, weblogs, SMS, sensors for continuous operational inefficiency
      • Financial Analytics (detailed simulations): liquidity and portfolio simulations, stress tests, error margins
  • 7. CHANGE IS AFOOT MYRIAD OF BIG DATA ANALYTICS SOLUTIONS
  • 8. CAUSAL LINKS: VARIETY, VELOCITY, VOLUME. Variety: events data, transactional data, multi-media data, eCommerce data, graph data. Velocity: µseconds; continuous and/or bursts. Volume: routinely petabytes.
  • 9. GROWING USER COMMUNITIES: Data Scientists, Business Analysts, Developers/Programmers, Administrators, Business Users, External Consumers, Business Processes
  • 10. HARDWARE IS SUPERIOR. Small server farms: scale out. Larger servers with partitions: scale up. Spinning disks to SSDs: 1.2x to 2x speed-up. SSDs to main memory: 4x to 200x speed-up. Main memory to CPU caches: 2x to 6x speed-up.
  • 11. SOFTWARE EXPECTATIONS HAVE CHANGED. Execution characteristics: performance and scalability, intelligence and automation. Results characteristics: traditional, contemporary.
  • 12. EXECUTION CHARACTERISTICS: PERFORMANCE FOCUS. Columnar access. MPP: shared nothing, shared everything. Algorithms. Computations close to data: in-database analytics (MapReduce), FPGA filtering. In-memory processing.
  • 13. EXECUTION CHARACTERISTICS: SCALABILITY FOCUS. Data compression: natural compression (column store databases), compression techniques (row store databases), hybrid columnar compression databases. Storage: SAN, DAS, NAS, distributed file systems. Data filtering: stream processing engines, pre-processing engines, transformation engines.
  • 14. EXECUTION CHARACTERISTICS: INTELLIGENCE FOCUS. Query and load optimization. On-demand systems: virtualization and provisioning. CPU cache conscious computations.
  • 15. EXECUTION CHARACTERISTICS: AUTOMATION FOCUS. Data conscious federation. Automatic workload balancer/mixer. User community focused collaborative services.
  • 16. RESULTS CHARACTERISTICS: ACCURACY TOLERANCE FOCUS.
      Traditional (associated with SQL): complex schemas; multiple applications; write on schema; atomic level locking; consistency guarantees across system losses; declarative API; interactive; does encapsulate elements of CAP.
      Contemporary (associated with NoSQL): simple read on schemas; single application; batch oriented; snapshot isolations; eventual consistency guarantees; procedural APIs; does encapsulate elements of ACID.
  • 17. BUILDING BIG DATA BRIDGES ACROSS A HETEROGENEOUS WORLD
  • 18. COMPREHENSIVE 3-TIER FRAMEWORK: COMMERCIAL AND/OR OPEN SOURCE. Eco-System: business intelligence tools, data integration tools, DBA tools, packaged apps. Application Services: in-database analytics, multi-lingual client APIs, federation, web enabled. Data Management: high performance, highly scalable, cloud enabled.
  • 19. RELIABLE DATA MANAGEMENT (full mesh high speed interconnect). Can handle high performance, compression, batch, and ad-hoc analysis. Can routinely scale to petabyte class problems and thousands of concurrent jobs. Typical characteristics: massively parallel processing of complex queries; in-memory and on-disk optimizations; elastic resources for user communities; ACID guarantees; data variety; information lifecycle management; user friendly automation tools; file systems (schema free) and/or DBMS structures (schema specific).
  • 20. DATA MANAGEMENT INFRASTRUCTURE: ROBUST, SCALABLE, HIGH PERFORMANCE. Roles on a full mesh high speed interconnect: data discovery (data scientists), application modeling (business analysts), reports/dashboards (BI programmers), business decisions (business end users), infrastructure management (DBAs).
      • Dynamic, elastic MPP grid: grow, shrink, provision on-demand; heavy parallelization
      • Load, prepare, mine, report in a workflow: privacy through isolation of resources; collaboration through sharing of results/data via sharing of resources
  • 21. VERSATILE APPLICATION SERVICES. Programming APIs: Python, ADO.NET, Perl, PHP, Ruby, Java, C++. Web Services API. In-database analytics plug-ins: SQL, PMML, C++, Java, … Summary: comprehensive declarative and procedural APIs; in-database analytics plug-in APIs; in-database web services; query and data federation APIs; multi-lingual client APIs.
  • 22. VERSATILE APPLICATION SERVICES: RICH ALGORITHMS CLOSE TO DATA. Diagram: in-process, the Sybase IQ process loads the user's DLLs “A” and “B” into its own memory; out-of-process, the Sybase IQ process makes RPC calls to a library access process that loads the user's DLLs.
      • In-database + in-process: in-process dynamically loaded shared libraries; highest possible performance; incurs security risks, but manageable via privileges; incurs robustness risks, but manageable via multiplex
      • In-database + out-process: out of process shared library; lower security risks; lower robustness risks; lower performance than in-process but better than out of database
      • Multi-lingual APIs: scalar to scalar; scalar sets to aggregate; scalar sets to dimensional aggregates; scalar sets to multi-attribute (bulk); multi-attribute (bulk) to multi-attribute (bulk)
  • 23. VERSATILE APPLICATION SERVICES: NATIVE MAPREDUCE. For stocks in the enterprise software sector, find the max relative strength of a stock for a trading day*.
      Map input: Key (k1) = 30-min interval time; Value (v1) = ticker symbol, tick value day 1, tick value day 2
          9:30 am    SAP    51      52.4
          9:30 am    ORCL   31      28.2
          9:30 am    TDC    22      21.3
          10:00 am   SAP    50.9    53.1
          10:00 am   TDC    21.8    20.9
          10:00 am   ORCL   29.4    27.1
          …
      Map function output: Key (k2) = ticker symbol; Value (v2) = 30-min interval time, weighted variance = (a given stock's variance / average variance across all “N” stocks)
          SAP    9:30 am    +1.4 / (SUM (+1.4-2.8-0.7…) / “N” stocks)
          SAP    10:00 am   +2.2 / (SUM (+2.2-2.3-1.1…) / “N” stocks)
          SAP    …
          ORCL   9:30 am    -2.8 / (SUM (+1.4-2.8-0.7…) / “N” stocks)
          ORCL   10:00 am   -2.3 / (SUM (+2.2-2.3-1.1…) / “N” stocks)
          ORCL   …
          TDC    9:30 am    -0.7 / (SUM (+1.4-2.8-0.7…) / “N” stocks)
          TDC    10:00 am   -1.1 / (SUM (+2.2-2.3-1.1…) / “N” stocks)
          TDC    …
      Reduce function output: Value (v3) = ticker symbol, max absolute weighted variance
          SAP    Max (ABS(9:30 Wt Var), ABS(10:00 Wt Var), …)
          ORCL   Max (ABS(9:30 Wt Var), ABS(10:00 Wt Var), …)
          TDC    Max (ABS(9:30 Wt Var), ABS(10:00 Wt Var), …)
      *Calculate max variance for the day by comparing each 30-min interval tick value across two days (the trading day and the day before), weighted by the average variance of all stocks for each 30-min interval.
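The map and reduce steps on this slide can be sketched in plain Python. This is a minimal illustration and not Sybase IQ code: the function names are hypothetical, the rows are the sample tick values from the slide, and skipping an interval whose average change is zero is an assumption made here to avoid dividing by zero.

```python
from collections import defaultdict

# Sample rows from the slide: (30-min interval, ticker, tick value day 1, tick value day 2)
TICKS = [
    ("09:30", "SAP", 51.0, 52.4),
    ("09:30", "ORCL", 31.0, 28.2),
    ("09:30", "TDC", 22.0, 21.3),
    ("10:00", "SAP", 50.9, 53.1),
    ("10:00", "TDC", 21.8, 20.9),
    ("10:00", "ORCL", 29.4, 27.1),
]


def map_weighted_variance(rows):
    """Map step: partition by interval, emit (ticker, interval, weighted variance)."""
    by_interval = defaultdict(list)
    for interval, ticker, day1, day2 in rows:
        by_interval[interval].append((ticker, day2 - day1))
    for interval, changes in by_interval.items():
        average = sum(change for _, change in changes) / len(changes)
        if average == 0:
            continue  # assumption: skip intervals where the weight is undefined
        for ticker, change in changes:
            yield ticker, interval, change / average


def reduce_max_abs(mapped):
    """Reduce step: partition by ticker, keep the largest absolute weighted variance."""
    best = {}
    for ticker, _interval, weighted in mapped:
        best[ticker] = max(best.get(ticker, 0.0), abs(weighted))
    return best


result = reduce_max_abs(map_weighted_variance(TICKS))
```

Each interval could be handled by a separate map instance and each ticker by a separate reduce instance, which is the partitioning the slide describes; the sketch runs them sequentially for clarity.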
  • 24. VERSATILE APPLICATION SERVICES: NATIVE MAPREDUCE, THE DECLARATIVE WAY. For stocks in the enterprise software sector, find the max relative strength of a stock for a trading day.
      • Map function declaration:
          CREATE PROCEDURE MapVarTPF (IN XY TABLE (a1 char, a2 datetime, a3 float, a4 float)) RESULT SET YZ (b1 char, b2 datetime, b3 float)
      • Reduce function declaration:
          CREATE PROCEDURE RedMaxVarTPF (IN XY TABLE (a1 char, a2 datetime, a3 float)) RESULT SET YZ (b1 char, b2 float)
      • Query:
          SELECT RedMaxVarTPF.TickSymb, RedMaxVarTPF.MaxVar
          FROM RedMaxVarTPF (TABLE (
                 SELECT MapVarTPF.TickSymb, MapVarTPF.30MinIntTime, MapVarTPF.Var
                 FROM MapVarTPF (TABLE (
                        SELECT TickDataTab.TickSymb, TickDataTab.30MinIntTime, TickDataTab.30MinValDay1, TickDataTab.30MinValDay2
                        FROM TickDataTab)
                      OVER (PARTITION BY TickDataTab.30MinInt)))
               OVER (PARTITION BY MapVarTPF.TickSymb))
          ORDER BY RedMaxVarTPF.TickSymb
      • Native MapReduce parallel execution workflow: MapVarTPF (partitioned to 15 parallel instances) feeds RedMaxVarTPF (partitioned to 25 parallel instances); the SQL query collates the output using 1 node; each stage reads through the SAN fabric.
      • Native MapReduce with unstructured data: native MapReduce can also be applied to unstructured data (e.g. text, multi-media, …) stored in the DBMS, or to unstructured data brought into the DBMS at execution time from external files.
  • 25. RICH ECO-SYSTEM. Flow: source, data preparation, data usage, answers. Components: DBMS / filesystem, event processing, data federation, business intelligence, data modeling / database design tool. Eco-system tools: business intelligence tools, data integration tools, data mining tools, application tools, DBA tools.
  • 26. RICH ECO-SYSTEM: DBMS <–> HADOOP BRIDGE I.
      Feature: Client side federation. Join data from DBMS and Hadoop at a client application level.
      Characteristics:
      • Client tool capable of querying DBMS and Hadoop
      • Better performance when results from sources are pre-computed/pre-aggregated
      Big Data use cases:
      • Ideal for bringing together Big Data Analytics pre-computations from different domains
      • Example, in Telecommunication: the DBMS has aggregated customer loyalty data and Hadoop has aggregated network utilization data; Quest Toad for Cloud can bring data from both sources, linking customer loyalty to network utilization or network faults (e.g. dropped calls)
      Diagram: Quest Toad for Cloud connects to DBMS and Hadoop/Hive.
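Client-side federation amounts to joining two small, already aggregated result sets in the client. The sketch below illustrates that idea only: it does not use Quest Toad for Cloud or any real connector, and the row layouts, field names, and values are hypothetical stand-ins for what each source would return.

```python
# Hypothetical pre-aggregated result sets; in practice each would come from
# a separate query against the DBMS and against Hadoop/Hive.
dbms_loyalty = [
    {"region": "north", "loyalty_score": 0.82},
    {"region": "south", "loyalty_score": 0.64},
]
hadoop_network = [
    {"region": "north", "dropped_calls": 120},
    {"region": "south", "dropped_calls": 940},
    {"region": "west", "dropped_calls": 300},
]


def federate(left, right, key):
    """Inner-join two small result sets on `key` inside the client application."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


joined = federate(dbms_loyalty, hadoop_network, "region")
```

Because the join runs in the client, it stays cheap only while both inputs are small, which is why the slide recommends pre-computed or pre-aggregated results from each source.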
  • 27. RICH ECO-SYSTEM: DBMS <–> HADOOP BRIDGE II.
      Feature: ETL. Load Hadoop data into the DBMS column store: extract, transform, and load data from HDFS (Hadoop Distributed File System) into DBMS schemas.
      Characteristics:
      • Extract and load subsets of HDFS data into the DBMS store: raw data from HDFS; results of Hadoop MR jobs
      • HDFS data stored in the DBMS is treated like other DBMS data: gets the ACID properties of a DBMS; can be indexed, joined, parallelized; can be queried in an ad-hoc way; visible to BI and other client tools via the DBMS ANSI SQL API only
      Big Data use cases:
      • Ideal for combining subsets of HDFS unstructured data, or summaries of HDFS data, into the DBMS for mid to long term usage in business reports
      • Example, in eCommerce: clickstream data from weblogs stored in HDFS, and outputs of MR jobs on that data (to study browsing behavior), are ETL'd into the DBMS. The transactional sales data in the DBMS is joined with the clickstream data to understand and predict customer browsing to buying behavior
      • Currently, the bulk data transfer utility SQOOP (built by Cloudera) can be used to provide this ETL capability
      Diagram: clickstream data (Hadoop/Hive) flows through SQOOP to sales data (DBMS).
  • 28. RICH ECO-SYSTEM: DBMS <–> HADOOP BRIDGE III.
      Feature: Join HDFS data with DBMS data on the fly. Fetch and join subsets of HDFS data on-demand using SQL queries from the DBMS (data federation technique).
      Characteristics:
      • Scan and fetch specified data subsets from HDFS via table UDF: can read and fetch HDFS data subsets; called as part of a SQL query; output joinable with DBMS data; multiple, simultaneous UDF calls possible; sample UDFs provided in Java and C++
      • HDFS data not stored in the DBMS: fetched into DBMS in-memory tables; ACID properties not applicable; for repeated use, put fetched data in tables; visible to BI/other client tools via the ANSI SQL API only
      Big Data use cases:
      • Ideal for combining subsets of HDFS data with DBMS data for operational (transient) business reports
      • Example, in Retail: detailed Point Of Sale (POS) data is stored in HDFS. The DBMS EDW fetches POS data for specific hot selling SKUs from HDFS at fixed intervals and combines it with inventory data in the DBMS to predict and prevent inventory “stockouts”
      Diagram: POS data (Hadoop/HDFS) flows through the UDF bridge to inventory data (DBMS).
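The retail example can be sketched as a generator that plays the role of the table UDF, feeding a transient join. Everything here is an assumption made for illustration: the in-memory file stands in for an HDFS file, and the function names, file layout, SKUs, and quantities are hypothetical rather than part of any Sybase IQ or Hadoop API.

```python
import io

# Stand-in for a POS file in HDFS; a real bridge would read through an HDFS client.
POS_FILE = io.StringIO("sku,units\nA1,40\nB2,5\nA1,35\nC3,12\n")


def pos_table_udf(lines, hot_skus):
    """Table-function stand-in: scan the file and yield only rows for the requested SKUs."""
    next(lines)  # skip the header row
    for line in lines:
        sku, units = line.strip().split(",")
        if sku in hot_skus:
            yield sku, int(units)


def stockout_risk(inventory, pos_rows):
    """Join fetched POS rows with inventory and flag SKUs that sold more than is on hand."""
    sold = {}
    for sku, units in pos_rows:
        sold[sku] = sold.get(sku, 0) + units
    return sorted(sku for sku, units in sold.items() if units > inventory.get(sku, 0))


inventory = {"A1": 60, "C3": 50}  # hypothetical on-hand counts held in the DBMS
at_risk = stockout_risk(inventory, pos_table_udf(POS_FILE, {"A1", "C3"}))
```

The fetched rows are consumed as they stream in and are never persisted, mirroring the slide's point that the data stays transient unless it is explicitly written to tables for repeated use.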
  • 29. RICH ECO-SYSTEM: DBMS <–> HADOOP BRIDGE IV.
      Feature: Combine results of Hadoop MR jobs with DBMS data on the fly. Initiate and join results of Hadoop MR jobs on-demand using SQL queries from the DBMS (query federation technique).
      Characteristics:
      • Trigger and fetch Hadoop MR job results via table UDF: can trigger Hadoop MR jobs; called as part of a Sybase IQ SQL query; output joinable with Sybase IQ data; no multiple, simultaneous UDF calls; sample UDFs provided in Java only
      • HDFS data not stored in the DBMS: fetched into DBMS in-memory tables; ACID properties not applicable; for repeated use, put fetched data in tables; visible to BI and other client tools via the DBMS ANSI SQL API only
      Big Data use cases:
      • Ideal for combining results of Hadoop MR jobs with DBMS data for operational (transient) business reports
      • Example, in Utilities: smart meter and smart grid data can be combined for load monitoring and demand forecasting. Smart grid transmission quality data (multi-attribute time series data) stored in HDFS can be computed via Hadoop MR jobs triggered from the DBMS and combined with smart meter data stored in the DBMS to analyze demand and workload
      Diagram: smart grid transmission data (Hadoop/HDFS) flows through the UDF bridge to smart meter consumption data (DBMS).
  • 30. RICH ECO-SYSTEM: DBMS <–> PREDICTIVE TOOLS BRIDGE. Express complex computations in the industry standard Predictive Modeling Markup Language (PMML), and plug in models close to the data for execution. Diagram: SQL applications call the database server (DBMS); PMML models pass through a PMML preprocessor (convert and validate) and the Universal Predictions Plug-In bridge to run as PMML UDFs.
  • 31. RICH ECO-SYSTEM: FUNDAMENTALS OF STREAMS TECHNOLOGY. Process data without storing it.
      • Input streams: events arrive on input streams
      • Derived streams, windows: apply continuous query operators to one or more input streams to produce a new stream
      • Continuous queries create a new “derived” stream or window: SELECT … FROM one or more input streams/windows WHERE … GROUP BY …
      • Windows can have state: retention rules define how many or how long events are kept; opcodes in events can indicate insert/update/delete and can be automatically applied to the window
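A stateful window with a retention rule and opcode handling can be sketched as follows. This is a simplified illustration and not CCL or any event stream processor's API: the `Window` class, the opcode strings, and the count-based retention rule ("how many" events are kept) are all assumptions made for the example.

```python
from collections import OrderedDict


class Window:
    """Keyed window that keeps at most `max_events` rows (a count-based retention rule)."""

    def __init__(self, max_events):
        self.max_events = max_events
        self.rows = OrderedDict()

    def apply(self, opcode, key, value=None):
        """Apply one event; its opcode says whether to insert, update, or delete."""
        if opcode == "delete":
            self.rows.pop(key, None)
            return
        self.rows[key] = value  # insert a new row or update an existing one in place
        while len(self.rows) > self.max_events:
            self.rows.popitem(last=False)  # retention: evict the oldest row


# Hypothetical event stream: (opcode, key, value)
events = [
    ("insert", "a", 10),
    ("insert", "b", 20),
    ("update", "a", 15),
    ("insert", "c", 30),
    ("delete", "b", None),
]

window = Window(max_events=2)
for opcode, key, value in events:
    window.apply(opcode, key, value)
```

A time-based rule ("how long" events are kept) would follow the same shape, evicting rows by timestamp instead of by count.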
  • 32. RICH ECO-SYSTEM: STREAMS DATA PROCESSING VS TRADITIONAL DATA PROCESSING.
          SQL                                               CCL
          Tables                                            Windows on event streams
          Rows                                              Events
          Columns                                           Fields
          On-demand: query runs when information is needed  Event-driven: query updates when information arrives
  • 33. RICH ECO-SYSTEM: STREAMS PRE-PROCESSING. Why store Big Data when you can deal with Small Data? Pre-filter unnecessary data on the fly with streams technologies. Diagram labels: ESP engine; alerts, actions, updates; memory, disk; Hadoop/HDFS; DBMS.
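Pre-filtering can be sketched as a generator that drops events before they reach storage. This is an illustration only: the function name, the sensor readings, and the temperature threshold are hypothetical, and a plain list stands in for Hadoop/HDFS or a DBMS.

```python
def prefilter(events, is_relevant):
    """Yield only relevant events; the rest are dropped before they reach storage."""
    for event in events:
        if is_relevant(event):
            yield event


# Hypothetical sensor readings arriving on an input stream.
readings = [
    {"sensor": "s1", "temp_c": 21.0},
    {"sensor": "s2", "temp_c": 95.5},
    {"sensor": "s1", "temp_c": 22.1},
    {"sensor": "s3", "temp_c": 101.2},
]

# Only out-of-range readings are persisted; the list stands in for the real store.
stored = list(prefilter(readings, lambda event: event["temp_c"] > 90.0))
```

Because the filter is a generator, events are examined one at a time as they arrive, so the discarded data never has to be held or written anywhere.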
  • 35. 3-LAYER LOGICAL INTEGRATION: STREAM PROCESSING <-> NoSQL <-> DBMS. Eco-System: BI tools, DI tools, DBA tools, data mining tools. App Services: Web 2.0, Java, C/C++, SQL, federation. DBMS (ingest + persist): unstructured data (Hadoop, content management), structured data (DBMS), streaming data (ESP). The heterogeneous world will require co-existence and playing nice!